llama.run package

Utilities for working with a Run Directory, i.e. a directory containing multiple subdirectories, one per event. Includes tools for automatically finding events that can be updated and keeping them updated. Useful for running a pipeline or batch job.

class llama.run.ParseRunsAction(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)

Bases: argparse.Action

Take a bunch of pathnames and parse them into Run instances with associated eventid glob filters. See Parsers docstring for details.
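The general mechanism of a custom argparse.Action can be sketched as follows. This is a minimal illustration, not the ParseRunsAction source: the class name SplitRunPaths and the argument name run are hypothetical, and it returns plain (rundir, glob) tuples rather than Run instances. It also ignores the default-directory and relative-path rules described in the Parsers docstring.

```python
import argparse


class SplitRunPaths(argparse.Action):
    """Hypothetical stand-in for an Action that parses run/event paths."""

    def __call__(self, parser, namespace, values, option_string=None):
        pairs = []
        for value in values:
            # Split on the LAST slash: the left part is the run directory,
            # the right part is the event glob (default "*" if empty).
            rundir, _, glob = value.rpartition("/")
            pairs.append((rundir, glob or "*"))
        setattr(namespace, self.dest, pairs)


parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("run", nargs="*", action=SplitRunPaths, default=[])

args = parser.parse_args(["/run/directory/S*", "/other/run/"])
print(args.run)
```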

class llama.run.Parsers

Bases: object

Each LLAMA trigger gets its own directory. The name of this directory is called the eventid and the trigger itself is a LLAMA Event (see: llama.event). For a given LLAMA run, all event directories should go in a common directory called a “run directory”; the collection of events is called a Run (see: llama.run). Most things the pipeline does work on a single Run and are meant to affect one or more matching Event instances. When you specify directories, you are implicitly specifying the Run (i.e. collection of triggers) as well as a UNIX-style glob (like the asterisk matching all files, *) which describes the eventid pattern you want to match. For example, matching all event IDs that start with “S” (corresponding to O3 LIGO/Virgo superevents) would require using S* as your event glob.
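As a small illustration of how such a glob selects eventids, the standard-library fnmatch module (the glob style named in the downselect documentation) can be applied to a list of directory names. The eventids below are illustrative placeholders, not the contents of any real run directory.

```python
from fnmatch import fnmatchcase

# Illustrative eventid names only; a real run directory will differ.
eventids = ["S190425z", "S190521g", "G298048", "test-event"]

# Keep only eventids matching the glob "S*" (case-sensitive match).
superevents = [e for e in eventids if fnmatchcase(e, "S*")]
print(superevents)
```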

If you want to explicitly print which currently-existing Event directories will be impacted by the arguments you provide, you can use --dry-run-dirs to print the impacted directories and exit without taking further action. This is good practice while getting used to this interface.

The syntax for specifying the Run and Event glob is the path of the run directory followed by a slash followed by the event glob with no slash at the end (be sure to escape the * so the shell doesn’t expand it):

'/run/directory/event*glob'

Specify only the event glob by leaving the run directory out but keeping the leading / (if for some insane reason your root directory is your run directory, a double-leading / will communicate your perverse desire). In this case the default Run directory /root/.local/share/llama/current_run/ is implied, so the following are equivalent:

'/event*glob'
/root/.local/share/llama/current_run/'event*glob'

Specify only the Run directory by leaving a trailing slash and omitting the event glob; in this case, the default event glob * will be used, so the following are equivalent:

/run/directory/
/run/directory/'*'

You can use relative paths for the Run directory; the final part of the path will not be expanded and will be treated as the base directory. The only exception to this is if you are using relative paths and don’t put any / in the specified path, in which case the relative path will be expanded. This allows the common and intuitive behavior of running specific events in the current directory when you pass their name alone, or alternatively to treat the current directory as the only event directory by passing a single . as the run argument. Something like ./., however, will be interpreted as meaning you want the current directory to be the run directory, matching only the literal eventid “.”.

The following examples assume you are currently in the directory /some/directory/. Let’s say this is an event directory, and you want to update only the contents of this directory. You can specify the run as /some/ and the event glob as directory with either of the following paths:

.
/some/directory

Alternatively, if /some/directory/ is a run directory, and you want to affect the event directories it contains that match the default event glob *, you can use any of the following (note again that the event glob is in quotes to prevent your shell from expanding it into multiple arguments):

./
./'*'
/some/directory/
/some/directory/'*'

If you want to use the name of the current directory as your event glob (so that only eventids that have the same basename as your current directory are used) while keeping the default run directory /root/.local/share/llama/current_run/, you would have to place a leading slash followed by the actual name of the current directory; as noted above, /. will not work because the dot will be treated literally as the eventid you want to use. (Note that you usually wouldn’t want to do this; why would you be in this directory if you want to operate on an event stored in a different run directory?):

/directory
/root/.local/share/llama/current_run/directory

See llama.run and llama.event for more information on Run and Event objects.
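The path rules above can be summarized in a short sketch. This is a reading of the documented behavior, not the llama implementation: the function name split_run_and_glob and its cwd parameter are hypothetical, and posixpath is used so the result does not depend on the host platform.

```python
import posixpath

# Default run directory quoted in the text above.
DEFAULT_RUNDIR = "/root/.local/share/llama/current_run"


def split_run_and_glob(arg, cwd, default_rundir=DEFAULT_RUNDIR):
    """Return (rundir, event_glob) for one path argument (hypothetical helper)."""
    if "/" not in arg:
        # No slash at all: expand relative to cwd, then split off the last part.
        full = posixpath.normpath(posixpath.join(cwd, arg))
        return posixpath.dirname(full), posixpath.basename(full)
    # Otherwise the text after the last slash is taken literally as the glob.
    rundir, glob = arg.rsplit("/", 1)
    if not glob:
        glob = "*"  # trailing slash: use the default event glob
    if rundir == "":
        rundir = default_rundir  # single leading slash: default run directory
    elif rundir == "/":
        pass  # double leading slash: the root directory is the run directory
    else:
        rundir = posixpath.normpath(posixpath.join(cwd, rundir))
    return rundir, glob


for example in [".", "/some/directory", "./", "./.", "/directory"]:
    print(example, "->", split_run_and_glob(example, cwd="/some/directory"))
```

Running the loop with cwd set to /some/directory reproduces the cases walked through above, including the distinction between . (expanded) and ./. (dot taken literally as the eventid).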

eventfiltering = CliParser(prog='sphinx-build', usage=None, description=None, formatter_class=<class 'argparse.HelpFormatter'>, conflict_handler='error', add_help=False)

class llama.run.Run

Bases: llama.run.RunTuple

A single directory containing multiple event directories combined with a pipeline (i.e. a selection of analysis steps to use) and a set of downselection criteria for picking events:

Run Directory
├─ Event Directory 1
├─ Event Directory 2
└─ Event Directory 3

This should ordinarily correspond to a run of some sort (an observing run, engineering run, offline run, test run, etc.) where the events are somehow related. Since this class mostly just provides methods for organizing and selecting Event instances with tailored Pipeline instances, it’s up to you to decide how to best organize a run. Run objects are immutable to simplify hashing and uniqueness checks.

These tools allow the user to conveniently check on the status of all events in a given Run. A dictionary of downselection arguments (as fed to downselect) can be used to restrict the set of events that will be returned by events.

Parameters
  • rundir (str) – The directory where all events are stored. Files for individual events are stored in per-event subdirectories of rundir. Will be converted to a canonical path with os.path.realpath to help ensure unique Run definitions.

  • pipeline (llama.pipeline.Pipeline, optional) – A Pipeline instance holding FileHandler classes that should be used for this analysis. Defaults to the main pipeline in production use.

  • downselection (tuple, optional) – A tuple of dictionaries of keyword arguments of the type passed to downselect. The events returned by events will match these downselection criteria with each downselection dict applied in the order they appear in this argument (to allow more complex chained downselections). You probably don’t want to manually specify this; a more pythonic way to provide downselection arguments is to use the downselect method to return a downselection from a starting Run.
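The combination of immutability, path canonicalization, and chained downselections can be sketched with a namedtuple subclass. This is an illustrative sketch only: SketchRun is a hypothetical name, the pipeline is left as None, and each downselection step is stored as a sorted tuple of (key, value) pairs (an assumption made here so the record stays hashable; the documentation above describes dictionaries).

```python
import os
from collections import namedtuple

RunTuple = namedtuple("RunTuple", ["rundir", "pipeline", "downselection"])


class SketchRun(RunTuple):
    """Hypothetical immutable run record; not the llama implementation."""

    __slots__ = ()

    def __new__(cls, rundir, pipeline=None, downselection=()):
        # Canonicalize so different spellings of one directory compare equal.
        rundir = os.path.realpath(rundir)
        return super().__new__(cls, rundir, pipeline, tuple(downselection))

    def downselect(self, **kwargs):
        # Assumption: store each step as sorted (key, value) pairs so the
        # whole record remains hashable. A new instance is returned; the
        # original is never modified.
        step = tuple(sorted(kwargs.items()))
        return type(self)(self.rundir, self.pipeline,
                          self.downselection + (step,))


a = SketchRun("/tmp/demo_run")
b = SketchRun("/tmp/./demo_run/")
c = a.downselect(eventid_filter="S*")
print(a == b, len({a, b}), c.downselection)
```

Because the two spellings of the directory canonicalize to the same path, a and b compare equal and collapse to one entry in a set, while c carries one extra downselection step and a is left unchanged.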

downselect(**kwargs)

Get another Run instance identical to the current one but with the following downselection criteria applied to the Event instances returned by self.events. Can also specify a sorting function and a maximum number of returned values:

Parameters
  • invert (bool, optional) – Invert what matches and what doesn’t. Default: False

  • eventid_filter (str, optional) – A glob (as taken by fnmatch) that the eventid must match.

  • fileexists (str, optional) – The event directory contains a file with this name.

  • fhexists (llama.filehandler.FileHandler, optional) – The eventdir contains the file for this FileHandler.

  • fhnameexists (str, optional) – The eventdir contains the file for the FileHandler with this name.

  • fhmeta (llama.filehandler.FileHandler, optional) – The eventdir contains a metadata rider for the file for this FileHandler.

  • fhnamemeta (str, optional) – The eventdir contains a metadata rider for the file for the FileHandler with this name.

  • vetoed (bool, optional) – Whether the events have been vetoed or not.

  • sortkey (function, optional) – A function taking Event instances that can be passed to sorted to sort the downselected Event instances. Default: None (i.e. no sorting)

  • reverse (bool, optional) – Whether to reverse the order of sorting (i.e. put the results in descending order) before applying limit. Default: False

  • limit (int, optional) – Return up to this number of events. Most useful if sortkey has also been provided. Default: None (i.e. no limit)
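The ordering of these options (filter, then sort, then reverse, then limit) can be illustrated on plain eventid strings. This is a simplified sketch under stated assumptions: downselect_ids is a hypothetical function, it handles only eventid_filter among the filtering criteria, and it operates on strings rather than Event instances.

```python
from fnmatch import fnmatchcase


def downselect_ids(eventids, eventid_filter="*", invert=False,
                   sortkey=None, reverse=False, limit=None):
    """Filter, then sort, then truncate a list of eventid strings."""
    # invert flips the match result, keeping only IDs that do NOT match.
    kept = [e for e in eventids if fnmatchcase(e, eventid_filter) != invert]
    if sortkey is not None:
        # reverse only has an effect when a sort key is supplied.
        kept = sorted(kept, key=sortkey, reverse=reverse)
    # The limit is applied last, after any sorting and reversal.
    return kept if limit is None else kept[:limit]


ids = ["S3", "G1", "S1", "S2"]
print(downselect_ids(ids, "S*"))
print(downselect_ids(ids, "S*", invert=True))
print(downselect_ids(ids, "S*", sortkey=str, reverse=True, limit=2))
```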

downselect_pipeline(invert=False, **kwargs)

Return a Run instance with a pipeline that has been downselected using Pipeline.downselect.

property events

Return a list of events in this run directory with self.downselection criteria applied (see downselect for a list of possible downselection criteria).

Parameters
  • sortkey (function, optional) – A sorting key (as passed to sorted) to use to sort the returned events. If none is provided, the events will be sorted based on astrophysical event time using Event.gpstime; beware that an error will be raised if this quantity is ill-defined for ANY of the returned events.

  • reverse (bool, optional) – Whether to reverse the default sort order, i.e. put in descending order. True by default so that the most-recently-occurring events are first in the list.

update()

Get a list of Event instances matching this Run instance’s downselection criteria and update each event directory.

property vis

A collection of visualization methods for this Run instance.

class llama.run.RunTuple(rundir, pipeline, downselection)

Bases: tuple

property downselection

Alias for field number 2

property pipeline

Alias for field number 1

property rundir

Alias for field number 0

class llama.run.RunVisualization(run)

Bases: object

Provide methods for visualizing the status of a run directory.

finished(outfile=None)

Create a bar plot showing the proportion of complete to incomplete files for each FileHandler in this Run instance.

Parameters

outfile (str, optional) – If provided, save the plot to this filename.

Returns

fig – The bar plot figure.

Return type

matplotlib.figure

wall_times(outfile=None)

Create histograms of wall times (i.e. how long each file took to generate) for each FileHandler in this Run instance.

Parameters

outfile (str, optional) – If provided, save all plots as PNG files to a gzipped tarfile with this filename.

Returns

plots – A dictionary of matplotlib.figure instances whose keys are the names of each FileHandler class and whose values are histograms of wall times for each FileHandler class.

Return type

dict

llama.run.downselect_events(events, **kwargs)

Take a list of events and downselect them using the checks described in Run.downselect.

llama.run.past_runs(paths=('/root/.local/share/llama/past_runs', ), pipeline=frozenset({'Advok', 'CoincScatterI3LvcPdf', 'CoincScatterI3LvcPng', 'CoincScatterZtfI3LvcPdf', 'CoincScatterZtfI3LvcPng', 'CoincScatterZtfLVCPdf', 'CoincScatterZtfLVCPng', 'CoincSignificanceI3Lvc', 'CoincSummaryI3LvcPdf', 'CoincSummaryI3LvcTex', 'FermiGRBsJSON', 'IceCubeNeutrinoList', 'IceCubeNeutrinoListCoincTxt', 'IceCubeNeutrinoListTex', 'IceCubeNeutrinoListTxt', 'LVAlertAdvok', 'LVAlertJSON', 'LVCGraceDbEventData', 'LVCInitialXml', 'LVCPreliminaryXml', 'LvcDistancesJson', 'LvcSkymapFits', 'LvcSkymapHdf5', 'PAstro', 'RctSlkI3CoincSummaryI3LvcPdf', 'RctSlkLmaCoincScatterI3LvcPdf', 'RctSlkLmaCoincScatterI3LvcPng', 'RctSlkLmaCoincScatterZtfI3LvcPdf', 'RctSlkLmaCoincScatterZtfI3LvcPng', 'RctSlkLmaCoincScatterZtfLVCPdf', 'RctSlkLmaCoincScatterZtfLVCPng', 'RctSlkLmaCoincSignificanceI3Lvc', 'RctSlkLmaCoincSummaryI3LvcPdf', 'RctSlkLmaLVAlertJSON', 'RctSlkLmaLVCGraceDbEventData', 'RctSlkLmaLVCInitialXml', 'RctSlkLmaLVCPreliminaryXml', 'RctSlkLmaLvcDistancesJson', 'RctSlkLmaSkymapInfo', 'SkymapInfo', 'ZtfTriggerList'}))

Get a dictionary of run names and corresponding Run instances, looking for run directories in the specified paths.

Parameters
  • paths (tuple, optional) – Directories in which to search for past run directories.

  • pipeline (llama.pipeline.Pipeline) – The pipeline to use for the returned Run instances in runs.

Returns

runs – A dictionary of past runs whose keys are the name of the rundir and whose values are the corresponding Run. If multiple paths are given, then the keys will have the corresponding pathname from paths prepended using os.path.join to ensure uniqueness of keys.

Return type

dict
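The key-naming rule for the returned dictionary can be sketched as below. This is an assumption-laden illustration, not the past_runs source: find_past_runs is a hypothetical name, and the values are canonical directory paths rather than Run instances.

```python
import os


def find_past_runs(paths):
    """Map run names to canonical run directory paths (hypothetical helper)."""
    runs = {}
    for path in paths:
        for name in sorted(os.listdir(path)):
            full = os.path.join(path, name)
            if not os.path.isdir(full):
                continue  # only subdirectories count as run directories
            # With several search paths, prefix the key with its search path
            # so identically named runs cannot collide.
            key = full if len(paths) > 1 else name
            runs[key] = os.path.realpath(full)
    return runs
```

With a single search path the keys are bare run directory names; with two or more, each key is the search path joined to the run name.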

llama.run.postprocess_dry_run(self: llama.cli.CliParser, namespace: argparse.Namespace)

If --dry-run-dirs is true, print the directories that would be affected by the given arguments and quit without taking further action. If you want to extend this, print more dry run information and then call this function to print run/event information before quitting.
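The print-and-exit pattern described here can be sketched with plain argparse. Everything except the --dry-run-dirs flag name is hypothetical: the helper names, the rundir and glob positional arguments, and the use of SystemExit are illustrative choices, not the llama.cli API.

```python
import argparse
import fnmatch
import os


def matching_event_dirs(rundir, glob):
    """List existing subdirectories of rundir whose names match glob."""
    names = sorted(n for n in os.listdir(rundir)
                   if os.path.isdir(os.path.join(rundir, n)))
    return [os.path.join(rundir, n) for n in fnmatch.filter(names, glob)]


def dry_run_and_exit(namespace):
    """If the dry-run flag is set, print affected directories and exit."""
    if namespace.dry_run_dirs:
        for path in matching_event_dirs(namespace.rundir, namespace.glob):
            print(path)
        raise SystemExit(0)


parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("--dry-run-dirs", action="store_true")
parser.add_argument("rundir")
parser.add_argument("glob", nargs="?", default="*")
```

A caller that wants to extend the dry run would print its own information first and then call dry_run_and_exit(parser.parse_args()), which returns normally when the flag is absent.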