llama.run package

Utilities for working with a Run Directory, i.e. a directory containing multiple subdirectories, one per event. Includes tools for automatically finding events that can be updated and keeping them updated. Useful for running a pipeline or batch job.

class llama.run.ParseRunsAction(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)

Bases: argparse.Action

Take a bunch of pathnames and parse them into Run instances with associated eventid glob filters. See Parsers docstring for details.

class llama.run.Parsers(downselect=None, run=('/root/.local/share/llama/current_run/*', ))

Bases: object

Each LLAMA trigger gets its own directory. The name of this directory is the eventid, and the trigger itself is a LLAMA Event (see: llama.event). For a given LLAMA run, all event directories should go in a common directory called a “run directory”; the collection of events is called a Run (see: llama.run). Most things the pipeline does work on a single Run and are meant to affect one or more matching Event instances. When you specify directories, you are implicitly specifying the Run (i.e. collection of triggers) as well as a UNIX-style glob (like the asterisk matching all files, *) which describes the eventid pattern you want to match. For example, matching all event IDs that start with “S” (corresponding to O3 LIGO/Virgo superevents) would require using S* as your event glob.

If you want to explicitly print which currently-existing Event directories will be impacted by the arguments you provide, you can use --dry-run-dirs to print the impacted directories and exit without taking further action. This is good practice while getting used to this interface.

The syntax for specifying the Run and Event glob is the path of the run directory followed by a slash followed by the event glob with no slash at the end (be sure to escape the * so the shell doesn’t expand it):

'/run/directory/event*glob'

Specify only the event glob by leaving the run directory out but keeping the leading / (if for some insane reason your root directory is your run directory, a double-leading / will communicate your perverse desire). In this case the default Run directory /root/.local/share/llama/current_run/ is implied, so the following are equivalent:

'/event*glob'
/root/.local/share/llama/current_run/'event*glob'

Specify only the Run directory by leaving a trailing slash and omitting the event glob; in this case, the default event glob * will be used, so the following are equivalent:

/run/directory/
/run/directory/'*'

You can use relative paths for the Run directory; the final part of the path will not be expanded and will be treated as the base directory. The only exception is a relative path that contains no / at all, in which case the relative path will be expanded. This allows the common and intuitive behavior of running specific events in the current directory when you pass their names alone, or alternatively of treating the current directory as the only event directory by passing a single . as the run argument. Something like ./., however, will be interpreted as meaning that you want the current directory to be the run directory, matching only the eventid . (a literal dot).

The following examples assume your current directory is /some/directory/. Suppose this is an event directory and you want to update only its contents. You can specify the run as /some/ and the event glob as directory with either of the following paths:

.
/some/directory

Alternatively, if /some/directory/ is a run directory and you want to affect the event directories it contains that match the default event glob *, you can use any of the following (note again that the event glob is in quotes to prevent your shell from expanding it into multiple arguments):

./
./'*'
/some/directory/
/some/directory/'*'

If you want to use the name of the current directory as your event glob (so that only eventids that have the same basename as your current directory are used) while keeping the default run directory /root/.local/share/llama/current_run/, you would have to place a leading slash followed by the actual name of the current directory; as noted above, /. will not work because the dot will be treated literally as the eventid you want to use. (Note that you usually wouldn’t want to do this; why would you be in this directory if you want to operate on an event stored in a different run directory?):

/directory
/root/.local/share/llama/current_run/directory
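
The path rules above can be condensed into a short function. This is a minimal sketch, not LLAMA's actual parser: the name split_run_and_glob is hypothetical, the current directory is passed in explicitly so the logic can be checked without touching the filesystem, and posixpath.normpath stands in for whatever canonicalization the real code performs.

```python
import posixpath

# Default run directory, as quoted in the documentation above.
DEFAULT_RUNDIR = "/root/.local/share/llama/current_run"


def split_run_and_glob(arg, cwd, default_rundir=DEFAULT_RUNDIR):
    """Split a command-line path into (rundir, eventglob).

    Hypothetical helper illustrating the documented rules only.
    """
    if "/" not in arg:
        # A bare relative name is expanded first, then split, so "."
        # means "the current directory is the only event directory".
        full = posixpath.normpath(posixpath.join(cwd, arg))
        rundir, eventglob = posixpath.split(full)
        return rundir, eventglob or "*"
    # Otherwise the text after the last slash is taken literally as the glob.
    head, _, eventglob = arg.rpartition("/")
    eventglob = eventglob or "*"  # trailing slash: default glob
    if head == "":
        # A single leading slash with no run directory: use the default.
        return default_rundir, eventglob
    # Relative run directories are resolved against cwd; "//glob" leaves
    # head == "/", which selects the root directory as the run directory.
    return posixpath.normpath(posixpath.join(cwd, head)), eventglob


cwd = "/some/directory"
for example in (".", "./.", "./", "/directory", "/run/directory/S*"):
    print(example, "->", split_run_and_glob(example, cwd))
```

Passing cwd as an argument is a choice made for this sketch; a real command-line tool would more likely read it from os.getcwd().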

You can further specify which types of events should be processed by specifying --downselect followed by a string to be passed as the arguments to Run.downselect (run --print-downselections to see possible options).

See llama.run and llama.event for more information on Run and Event objects.

property eventfiltering

A CliParser to be used for downselecting runs and events.

property pipeline_and_eventfiltering

Get a combination of the llama.pipeline.Parsers.pipeline and llama.run.Parsers.eventfiltering processors in the correct order. This includes the extra step of applying the pipeline specified in the first parser to the Run instances returned by the second parser.

class llama.run.PrintDownselectionsAction(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)

Bases: argparse.Action

Print a dedented docstring for Run.downselect and exit.
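
An argparse action of this kind can be sketched in a few lines. The example below is an illustration under stated assumptions, not the LLAMA source: PrintDocAction and downselect_stub are hypothetical names, and the stub's docstring stands in for the real Run.downselect documentation.

```python
import argparse
import inspect


def downselect_stub(**kwargs):
    """
    Hypothetical stand-in for Run.downselect.

        invert : bool, optional
            Invert what matches and what doesn't.
    """


class PrintDocAction(argparse.Action):
    """Print a dedented docstring and exit, much like --help does."""

    def __init__(self, option_strings, dest, **kwargs):
        kwargs["nargs"] = 0  # the flag consumes no values
        super().__init__(option_strings, dest, **kwargs)

    def __call__(self, parser, namespace, values, option_string=None):
        # inspect.getdoc strips the common leading indentation.
        print(inspect.getdoc(downselect_stub))
        parser.exit()  # raises SystemExit with status 0


parser = argparse.ArgumentParser()
parser.add_argument("--print-downselections", action=PrintDocAction)
```

Calling parser.parse_args(["--print-downselections"]) prints the docstring and exits before any other argument handling runs.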

class llama.run.Run

Bases: llama.run.RunTuple

A single directory containing multiple event directories combined with a pipeline (i.e. a selection of analysis steps to use) and a set of downselection criteria for picking events:

Run Directory
├─ Event Directory 1
├─ Event Directory 2
└─ Event Directory 3

This should ordinarily correspond to a run of some sort (an observing run, engineering run, offline run, test run, etc.) where the events are somehow related. Since this class mostly just provides methods for organizing and selecting Event instances with tailored Pipeline instances, it’s up to you to decide how to best organize a run. Run objects are immutable to simplify hashing and uniqueness checks.

These tools allow the user to conveniently check on the status of all events in a given Run. A dictionary of downselection arguments (as fed to downselect) can be used to restrict the set of events that will be returned by events.

Parameters
  • rundir (str) – The directory where all events are stored. Files for individual events are stored in per-event subdirectories of rundir. Will be converted to a canonical path with os.path.realpath to help ensure unique Run definitions.

  • pipeline (llama.pipeline.Pipeline, optional) – A Pipeline instance holding FileHandler classes that should be used for this analysis. Defaults to the main pipeline in production use.

  • downselection (tuple, optional) – A tuple of dictionaries of keyword arguments of the type passed to downselect. The events returned by events will match these downselection criteria with each downselection dict applied in the order they appear in this argument (to allow more complex chained downselections). You probably don’t want to manually specify this; a more pythonic way to provide downselection arguments is to use the downselect method to return a downselection from a starting Run.
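
The pattern of an immutable run that returns a new, narrower run on each downselect call can be sketched with a namedtuple. SketchRun is a hypothetical class, not llama.run.Run. As an assumption made for this sketch, each downselection step is stored as a sorted tuple of (key, value) pairs rather than a dict, so that the whole object stays hashable.

```python
from collections import namedtuple

_Fields = namedtuple("_Fields", ("rundir", "pipeline", "downselection"))


class SketchRun(_Fields):
    """Hypothetical immutable run description (illustration only)."""

    __slots__ = ()

    def downselect(self, **kwargs):
        # Append one more step; the original instance is left untouched.
        step = tuple(sorted(kwargs.items()))
        return self._replace(downselection=self.downselection + (step,))


base = SketchRun("/some/directory", None, ())
narrowed = base.downselect(eventid_filter="S*").downselect(vetoed=False)
print(narrowed.downselection)
```

Because every call returns a fresh tuple, chained downselections are applied in the order they were requested and instances can be used as dictionary keys or set members.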

downselect(**kwargs)

Get another Run instance identical to the current one but with the following downselection criteria applied to the Event instances returned by self.events. Can also specify a sorting function and a maximum number of returned values:

Parameters
  • invert (bool, optional) – Invert what matches and what doesn’t. Default: False

  • eventid_filter (str, optional) – A glob (as taken by fnmatch) that the eventid must match.

  • fileexists (str, optional) – The event directory contains a file with this name.

  • fhexists (llama.filehandler.FileHandler, optional) – The eventdir contains the file for this FileHandler.

  • fhnameexists (str, optional) – The eventdir contains the file for the FileHandler with this name.

  • fhmeta (llama.filehandler.FileHandler, optional) – The eventdir contains a metadata rider for the file for this FileHandler.

  • fhnamemeta (str, optional) – The eventdir contains a metadata rider for the file for the FileHandler with this name.

  • vetoed (bool, optional) – Whether the events have been vetoed by the VETOED flag or not.

  • manual (bool, optional) – Whether the events have been marked as manual by the MANUAL flag.

  • modbefore (float, optional) – Select events whose directory modtimes were before this timestamp.

  • modafter (float, optional) – Select events whose directory modtimes were after this timestamp.

  • sec_since_mod_gt (float, optional) – Select events whose directory modtimes are more than this many seconds ago.

  • sec_since_mod_lt (float, optional) – Select events whose directory modtimes are less than this many seconds ago.

  • v0before (float, optional) – Select events whose first event state version was generated before this timestamp. Will IGNORE directories that do not have any versioned files.

  • v0after (float, optional) – Select events whose first event state version was generated after this timestamp. Will IGNORE directories that do not have any versioned files.

  • sec_since_v0_gt (float, optional) – Select events whose first event state version was generated more than this many seconds ago. Will IGNORE directories that do not have any versioned files.

  • sec_since_v0_lt (float, optional) – Select events whose first event state version was generated less than this many seconds ago. Will IGNORE directories that do not have any versioned files.

  • sortkey (function, optional) – A function taking Event instances that can be passed to sorted to sort the downselected Event instances. Default: None (i.e. no sorting)

  • reverse (bool, optional) – Whether to reverse the order of sorting (i.e. put the results in descending order) before applying limit. Default: False

  • limit (int, optional) – Return up to this number of events. Most useful if sortkey has also been provided. Default: None (i.e. no limit)
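
The interplay of eventid_filter, invert, sortkey, reverse, and limit can be shown on plain strings. This is a sketch only: select_eventids is a hypothetical function operating on event IDs rather than Event instances, it covers just these five options, and it uses fnmatch.fnmatchcase so that matching is case-sensitive on every platform.

```python
from fnmatch import fnmatchcase


def select_eventids(eventids, eventid_filter="*", invert=False,
                    sortkey=None, reverse=False, limit=None):
    """Filter, optionally sort, then truncate a list of event IDs."""
    # With invert=True, keep exactly the IDs that do NOT match the glob.
    selected = [e for e in eventids if fnmatchcase(e, eventid_filter) != invert]
    if sortkey is not None:
        # Sorting (and reversal) happens before the limit is applied.
        selected = sorted(selected, key=sortkey, reverse=reverse)
    # A limit of None slices the full list, i.e. no limit.
    return selected[:limit]


ids = ["S190425z", "G298048", "S190814bv", "S190521g"]
print(select_eventids(ids, "S*", sortkey=str, reverse=True, limit=2))
print(select_eventids(ids, "S*", invert=True))
```

In this sketch reverse has no effect unless a sortkey is given; whether the real method behaves the same way is not stated in the documentation above.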

downselect_pipeline(invert=False, **kwargs)

Return a Run instance with a pipeline that has been downselected using Pipeline.downselect.

property events

Return a list of events in this run directory with self.downselection criteria applied (see downselect for a list of possible downselection criteria).

Parameters
  • sortkey (function, optional) – A sorting key (as passed to sorted) to use to sort the returned events. If none is provided, the events will be sorted based on astrophysical event time using Event.gpstime; beware that an error will be raised if this quantity is ill-defined for ANY of the returned events.

  • reverse (bool, optional) – Whether to reverse the default sort order, i.e. put in descending order. True by default so that the most-recently-occurring events are first in the list.

update(**downselect)

Get a list of Event instances matching this Run instance’s downselection criteria and update each event directory. Run until all events are up-to-date. Will queue files from each event that are ready to update, allowing them to be handled in parallel, and will track outstanding jobs. Optionally specify a FileGraph.downselect downselection argument to pass to each FileGraph being updated (default is to regenerate all files needing regeneration). Be careful with this argument, as it will cause file generation attempts for matching files without checking whether they need to be generated.
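
The queue-and-track loop described here can be sketched with the standard library. Everything in this example is an assumption made for illustration: update_all is a hypothetical function, readiness is modeled as a mapping from each file name to the names it depends on, and a thread pool stands in for whatever job mechanism LLAMA actually uses.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def update_all(dependencies, generate, max_workers=4):
    """Generate every file once its dependencies are done.

    dependencies maps a file name to the names it needs first;
    generate is called with one file name. Returns completion order.
    """
    done, order = set(), []
    outstanding = {}  # future -> file name, i.e. the jobs still running
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while len(done) < len(dependencies):
            # Queue everything that is ready and not already queued.
            for name, deps in dependencies.items():
                queued = name in done or name in outstanding.values()
                if not queued and set(deps) <= done:
                    outstanding[pool.submit(generate, name)] = name
            if not outstanding:
                raise RuntimeError("remaining files can never become ready")
            # Block until at least one outstanding job finishes.
            finished, _ = wait(outstanding, return_when=FIRST_COMPLETED)
            for future in finished:
                future.result()  # re-raise any error from generate
                name = outstanding.pop(future)
                done.add(name)
                order.append(name)
    return order


graph = {"skymap": [], "neutrinos": [], "coinc": ["skymap", "neutrinos"]}
print(update_all(graph, generate=lambda name: name))
```

The two independent files may finish in either order, but the dependent file is always generated last.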

property vis

A collection of visualization methods for this Run instance.

class llama.run.RunTuple(rundir, pipeline, downselection)

Bases: tuple

property downselection

Alias for field number 2

property pipeline

Alias for field number 1

property rundir

Alias for field number 0

class llama.run.RunVisualization(run)

Bases: object

Provide methods for visualizing the status of a run directory.

finished(outfile=None)

Create a bar plot showing the proportion of complete to incomplete files for each FileHandler in this Run instance.

Parameters

outfile (str, optional) – If provided, save the plot to this filename.

Returns

fig – The bar plot figure.

Return type

matplotlib.figure

wall_times(outfile=None)

Create histograms of wall times (i.e. how long each file took to generate) for each FileHandler in this Run instance.

Parameters

outfile (str, optional) – If provided, save all plots as PNG files to a gzipped tarfile with this filename.

Returns

plots – A dictionary of matplotlib.figure instances whose keys are the names of each FileHandler class and whose values are histograms of wall times for each FileHandler class.

Return type

dict
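
Writing several PNG images into one gzipped tarfile, as the outfile option describes, can be done with the standard tarfile module. This sketch assumes the images are already available as bytes keyed by FileHandler name; save_pngs_to_tarball and the archive member naming are hypothetical, not taken from the LLAMA source.

```python
import io
import tarfile


def save_pngs_to_tarball(png_bytes_by_name, outfile):
    """Store each PNG as <name>.png inside a gzip-compressed tar archive."""
    with tarfile.open(outfile, "w:gz") as tar:
        for name, data in sorted(png_bytes_by_name.items()):
            info = tarfile.TarInfo(name=f"{name}.png")
            info.size = len(data)  # must be set before adding the member
            tar.addfile(info, io.BytesIO(data))
```

With matplotlib, the bytes for one figure can be obtained by calling fig.savefig(buffer, format="png") on an io.BytesIO buffer and then reading buffer.getvalue().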

llama.run.downselect_events(events, **kwargs)

Take a list of events and downselect them using the checks described in Run.downselect.

llama.run.past_runs(paths=('/root/.local/share/llama/past_runs', ), pipeline=frozenset({'Advok', 'CoincScatterI3LvcPdf', 'CoincScatterI3LvcPng', 'CoincScatterZtfI3LvcPdf', 'CoincScatterZtfI3LvcPng', 'CoincScatterZtfLVCPdf', 'CoincScatterZtfLVCPng', 'CoincSignificanceI3Lvc', 'CoincSignificanceSubthresholdI3Lvc', 'CoincSummaryI3LvcPdf', 'CoincSummaryI3LvcTex', 'FermiGRBsJSON', 'IceCubeNeutrinoList', 'IceCubeNeutrinoListCoincTxt', 'IceCubeNeutrinoListTex', 'IceCubeNeutrinoListTxt', 'LVAlertAdvok', 'LVAlertJSON', 'LVCGraceDbEventData', 'LvcDistancesJson', 'LvcGcnXml', 'LvcRetractionXml', 'LvcSkymapFits', 'LvcSkymapHdf5', 'PAstro', 'RctSlkI3CoincSummaryI3LvcPdf', 'RctSlkLmaCoincScatterI3LvcPdf', 'RctSlkLmaCoincScatterI3LvcPng', 'RctSlkLmaCoincScatterZtfI3LvcPdf', 'RctSlkLmaCoincScatterZtfI3LvcPng', 'RctSlkLmaCoincScatterZtfLVCPdf', 'RctSlkLmaCoincScatterZtfLVCPng', 'RctSlkLmaCoincSignificanceI3Lvc', 'RctSlkLmaCoincSignificanceSubthresholdI3Lvc', 'RctSlkLmaCoincSummaryI3LvcPdf', 'RctSlkLmaLVAlertJSON', 'RctSlkLmaLVCGraceDbEventData', 'RctSlkLmaLvcDistancesJson', 'RctSlkLmaLvcGcnXml', 'RctSlkLmaLvcRetractionXml', 'RctSlkLmaSkymapInfo', 'SkymapInfo', 'ZtfTriggerList'}))

Get a dictionary of run names and corresponding Run instances, looking for run directories in the specified paths.

Parameters
  • paths (tuple, optional) – Directories in which to search for past run directories.

  • pipeline (llama.pipeline.Pipeline) – The pipeline to use for the returned Run instances in runs.

Returns

runs – A dictionary of past runs whose keys are the name of the rundir and whose values are the corresponding Run. Absolute paths are used as the keys.

Return type

dict
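
The directory scan behind such a lookup can be sketched as follows. This is an assumption-laden illustration: find_run_dirs is a hypothetical function, it treats every subdirectory of each search path as a run directory, and it returns (rundir, pipeline) tuples where the real function returns Run instances.

```python
import os


def find_run_dirs(paths, pipeline=None):
    """Map the absolute path of each run directory to a (rundir, pipeline) pair."""
    runs = {}
    for path in paths:
        if not os.path.isdir(path):
            continue  # silently skip search paths that do not exist
        for entry in sorted(os.listdir(path)):
            rundir = os.path.abspath(os.path.join(path, entry))
            if os.path.isdir(rundir):  # plain files are not run directories
                runs[rundir] = (rundir, pipeline)
    return runs
```

Keying the result by absolute path keeps entries distinct even when two search paths contain run directories with the same name.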

llama.run.postprocess_downselect(_self: llama.cli.CliParser, namespace: argparse.Namespace)

If namespace.downselect is not None, parse it as a comma-separated list of key=value pairs, where value will be parsed as a boolean if it equals either True or False and as a string otherwise. Use these arguments to downselect each of the runs specified in namespace.run.
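
The parsing rule can be sketched directly. parse_downselect is a hypothetical name, and raising ValueError on a malformed pair is an assumption of this sketch rather than documented behavior.

```python
def parse_downselect(text):
    """Parse 'key=value,key=value' into keyword arguments.

    The exact strings 'True' and 'False' become booleans; every
    other value is kept as a string.
    """
    kwargs = {}
    for pair in text.split(","):
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        kwargs[key] = {"True": True, "False": False}.get(value, value)
    return kwargs


print(parse_downselect("vetoed=False,eventid_filter=S*"))
```

The resulting dictionary could then be unpacked into a call such as run.downselect(**kwargs) for each run named on the command line.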

llama.run.postprocess_dry_run(_self: llama.cli.CliParser, namespace: argparse.Namespace)

If --dry-run-dirs is true, print the directories that would be affected by the given arguments and quit without taking further action. If you want to extend this, print more dry run information and then call this function to print run/event information before quitting.

llama.run.postprocess_select_pipeline(_self: llama.cli.CliParser, namespace: argparse.Namespace)

Take the pipeline specified by --pipeline and/or --filehandlers and set the llama.Run instances selected in namespace to use that pipeline instead of the default.