llama.run package
Utilities for working with a Run Directory, i.e. a directory containing multiple subdirectories, one per event. Includes tools for automatically finding events that can be updated and keeping them updated. Useful for running a pipeline or batch job.
-
class llama.run.ParseRunsAction(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)
Bases: argparse.Action
Take a bunch of pathnames and parse them into Run instances with associated eventid glob filters. See the Parsers docstring for details.
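As a rough illustration of the general technique (not LLAMA's actual implementation), a custom argparse.Action can convert path arguments while they are being parsed. The class name SplitRunPathsAction and the plain split at the final slash are assumptions made for this sketch; the real ParseRunsAction builds Run instances and follows the fuller rules in the Parsers docstring.

```python
import argparse
import posixpath


class SplitRunPathsAction(argparse.Action):
    """Hypothetical stand-in for ParseRunsAction (illustration only)."""

    def __call__(self, parser, namespace, values, option_string=None):
        # Split each path at its final slash into (run directory, event glob).
        pairs = [posixpath.split(path) for path in values]
        setattr(namespace, self.dest, pairs)


parser = argparse.ArgumentParser()
parser.add_argument("run", nargs="+", action=SplitRunPathsAction)
args = parser.parse_args(["/run/directory/S*", "/other/run/G*"])
print(args.run)
```

Doing the conversion inside the action means every command that accepts run paths sees already-parsed values on the namespace.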
-
class llama.run.Parsers(downselect=None, run=('/root/.local/share/llama/current_run/*', ))
Bases: object
Each LLAMA trigger gets its own directory. The name of this directory is called the eventid, and the trigger itself is a LLAMA Event (see: llama.event). For a given LLAMA run, all event directories should go in a common directory called a “run directory”; the collection of events is called a Run (see: llama.run). Most things the pipeline does work on a single Run and are meant to affect one or more matching Event instances. When you specify directories, you are implicitly specifying the Run (i.e. collection of triggers) as well as a UNIX-style glob (like the asterisk matching all files, *) which describes the eventid pattern you want to match. For example, matching all event IDs that start with “S” (corresponding to O3 LIGO/Virgo superevents) would require using S* as your event glob.
If you want to explicitly print which currently-existing Event directories will be impacted by the arguments you provide, you can use --dry-run-dirs to print the impacted directories and exit without taking further action. This is good practice while getting used to this interface.
The syntax for specifying the Run and Event glob is the path of the run directory followed by a slash followed by the event glob with no slash at the end (be sure to escape the * so the shell doesn’t expand it):
    '/run/directory/event*glob'
Specify only the event glob by leaving the run directory out but keeping the leading / (if for some insane reason your root directory is your run directory, a double-leading / will communicate your perverse desire). In this case the default Run directory /root/.local/share/llama/current_run/ is implied, so the following are equivalent:
    '/event*glob'
    /root/.local/share/llama/current_run/'event*glob'
Specify only the Run directory by leaving a trailing slash and omitting the event glob; in this case, the default event glob * will be used, so the following are equivalent:
    /run/directory/
    /run/directory/'*'
You can use relative paths for the Run directory; the final part of the path will not be expanded and will be treated as the base directory. The only exception to this is if you are using relative paths and don’t put any / in the specified path, in which case the relative path will be expanded. This allows the common and intuitive behavior of running specific events in the current directory when you pass their name alone, or alternatively to treat the current directory as the only event directory by passing a single “.” as the run argument. Something like “./.”, however, will be interpreted as meaning you want the current directory to be the run directory, matching only Event ids of “.”.
The following examples assume you are currently in the directory /some/directory/. Let’s say this is the event directory, and you want to update only the contents of this directory. You can specify the run as /some/ and the event glob as directory with either of the following paths:
    .
    /some/directory
Alternatively, if /some/directory/ is a run directory, and you want to affect the event directories it contains that match the default event glob *, you can use any of the following (note again that the event glob is in quotes to prevent your shell from expanding it into multiple arguments):
    ./
    ./'*'
    /some/directory/
    /some/directory/'*'
If you want to use the name of the current directory as your event glob (so that only eventids that have the same basename as your current directory are used) while keeping the default run directory /root/.local/share/llama/current_run/, you would have to place a leading slash followed by the actual name of the current directory; as noted above, /. will not work because the dot will be treated literally as the eventid you want to use. (Note that you usually wouldn’t want to do this; why would you be in this directory if you want to operate on an event stored in a different run directory?):
    /directory
    /root/.local/share/llama/current_run/directory
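The path rules described in this docstring can be sketched as a small pure function. This is only an illustrative reading of the prose, not the parser LLAMA uses: the name split_run_and_glob, the explicit cwd argument, and the use of posixpath are all assumptions of the sketch.

```python
import posixpath

# Default run directory as quoted in this docstring.
DEFAULT_RUN = "/root/.local/share/llama/current_run"


def split_run_and_glob(path, cwd, default_run=DEFAULT_RUN):
    """Return (run directory, event glob) for one path argument."""
    if "/" not in path:
        # No slash at all: expand relative to cwd, then split off the basename.
        full = posixpath.normpath(posixpath.join(cwd, path))
        return posixpath.split(full)
    head, _, glob = path.rpartition("/")
    glob = glob or "*"  # a trailing slash means "use the default event glob"
    if head == "":
        return default_run, glob  # single leading slash: default run directory
    if head == "/":
        return "/", glob  # double leading slash: root is the run directory
    # Otherwise everything before the final slash is the run directory.
    return posixpath.normpath(posixpath.join(cwd, head)), glob


for example in [".", "./", "./.", "/directory", "/run/directory/event*glob"]:
    print(example, "->", split_run_and_glob(example, cwd="/some/directory"))
```

Passing cwd explicitly keeps the function deterministic, which makes it easy to check each documented case without changing directories.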
You can further specify which types of events should be processed by specifying --downselect followed by a string to be passed as the arguments to Run.downselect (run --print-downselections to see possible options).
See llama.run and llama.event for more information on Run and Event objects.
-
property eventfiltering
A CliParser to be used for downselecting runs and events.
-
property pipeline_and_eventfiltering
Get a combination of llama.pipeline.Parsers.pipeline and llama.run.Parsers.eventfiltering processors in the correct order, including the extra step of using the pipeline specified in the first parser in the Run instances returned by the second parser.
-
class llama.run.PrintDownselectionsAction(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)
Bases: argparse.Action
Print a dedented docstring for Run.downselect and exit.
-
class llama.run.Run
Bases: llama.run.RunTuple
A single directory containing multiple event directories combined with a pipeline (i.e. a selection of analysis steps to use) and a set of downselection criteria for picking events:
    Run Directory
    ├─ Event Directory 1
    ├─ Event Directory 2
    └─ Event Directory 3
This should ordinarily correspond to a run of some sort (an observing run, engineering run, offline run, test run, etc.) where the events are somehow related. Since this class mostly just provides methods for organizing and selecting Event instances with tailored Pipeline instances, it’s up to you to decide how to best organize a run. Run objects are immutable to simplify hashing and uniqueness checks.
These tools allow the user to conveniently check on the status of all events in a given Run. A dictionary of downselection arguments (as fed to downselect) can be used to restrict the set of events that will be returned by events.
- Parameters
rundir (str) – The directory where all events are stored. Files for individual events are stored in per-event subdirectories of rundir. Will be converted to a canonical path with os.path.realpath to help ensure unique Run definitions.
pipeline (llama.pipeline.Pipeline, optional) – A Pipeline instance holding FileHandler classes that should be used for this analysis. Defaults to the main pipeline in production use.
downselection (tuple, optional) – A tuple of dictionaries of keyword arguments of the type passed to downselect. The events returned by events will match these downselection criteria with each downselection dict applied in the order they appear in this argument (to allow more complex chained downselections). You probably don’t want to manually specify this; a more pythonic way to provide downselection arguments is to use the downselect method to return a downselection from a starting Run.
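The combination of immutability, path canonicalization, and chained downselections can be illustrated with a namedtuple subclass. This is a minimal sketch under stated assumptions, not the LLAMA class: RunSketch and RunTupleSketch are hypothetical names, and each downselection is stored as a sorted tuple of items (rather than a dict) purely so that the sketch stays hashable.

```python
import os
from collections import namedtuple

# Hypothetical analogue of RunTuple with the same three fields.
RunTupleSketch = namedtuple(
    "RunTupleSketch", ("rundir", "pipeline", "downselection")
)


class RunSketch(RunTupleSketch):
    """Immutable run description (illustration only)."""

    __slots__ = ()

    def __new__(cls, rundir, pipeline=None, downselection=()):
        # Canonicalize the path so equivalent spellings compare equal.
        rundir = os.path.realpath(rundir)
        return super().__new__(cls, rundir, pipeline, tuple(downselection))

    def downselect(self, **kwargs):
        # Return a new instance with one more downselection appended;
        # the original instance is left unchanged.
        step = tuple(sorted(kwargs.items()))
        return self._replace(downselection=self.downselection + (step,))


a = RunSketch("/tmp/../tmp/run")
b = RunSketch("/tmp/run")
c = a.downselect(vetoed=False).downselect(eventid_filter="S*")
print(a == b, len({a, b}), c.downselection)
```

Because every method returns a new tuple, instances can be used as dictionary keys or set members without worrying about later mutation.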
-
downselect(**kwargs)
Get another Run instance identical to the current one but with the following downselection criteria applied to the Event instances returned by self.events. Can also specify a sorting function and a maximum number of returned values:
- Parameters
invert (bool, optional) – Invert what matches and what doesn’t. Default: False
eventid_filter (str, optional) – A glob (as taken by fnmatch) that the eventid must match.
fileexists (str, optional) – The event directory contains a file with this name.
fhexists (llama.filehandler.FileHandler, optional) – The eventdir contains the file for this FileHandler.
fhnameexists (str, optional) – The eventdir contains the file for the FileHandler with this name.
fhmeta (llama.filehandler.FileHandler, optional) – The eventdir contains a metadata rider for the file for this FileHandler.
fhnamemeta (str, optional) – The eventdir contains a metadata rider for the file for the FileHandler with this name.
vetoed (bool, optional) – Whether the events have been vetoed by the VETOED flag or not.
manual (bool, optional) – Whether the events have been marked as manual by the MANUAL flag.
modbefore (float, optional) – Select events whose directory modtimes were before this timestamp.
modafter (float, optional) – Select events whose directory modtimes were after this timestamp.
sec_since_mod_gt (float, optional) – Select events whose directory modtimes are more than this many seconds ago.
sec_since_mod_lt (float, optional) – Select events whose directory modtimes are less than this many seconds ago.
v0before (float, optional) – Select events whose first event state version was generated before this timestamp. Will IGNORE directories that do not have any versioned files.
v0after (float, optional) – Select events whose first event state version was generated after this timestamp. Will IGNORE directories that do not have any versioned files.
sec_since_v0_gt (float, optional) – Select events whose first event state version was generated more than this many seconds ago. Will IGNORE directories that do not have any versioned files.
sec_since_v0_lt (float, optional) – Select events whose first event state version was generated less than this many seconds ago. Will IGNORE directories that do not have any versioned files.
sortkey (function, optional) – A function taking Event instances that can be passed to sorted to sort the downselected Event instances. Default: None (i.e. no sorting)
reverse (bool, optional) – Whether to reverse the order of sorting (i.e. put the results in descending order) before applying limit. Default: False
limit (int, optional) – Return up to this number of events. Most useful if sortkey has also been provided. Default: None (i.e. no limit)
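A few of these criteria (eventid_filter, invert, sortkey, reverse, limit) can be sketched on plain event ID strings. The helper downselect_ids is hypothetical and works on strings rather than Event instances; it is meant only to show the order of operations the parameter list describes: filter, then sort, then reverse, then limit.

```python
from fnmatch import fnmatch


def downselect_ids(eventids, eventid_filter="*", invert=False,
                   sortkey=None, reverse=False, limit=None):
    """Filter, optionally sort, and truncate a list of event IDs."""
    # Keep IDs whose match result differs from `invert`
    # (i.e. matches when invert=False, non-matches when invert=True).
    selected = [e for e in eventids if fnmatch(e, eventid_filter) != invert]
    if sortkey is not None:
        # In this sketch, `reverse` only has an effect when sorting.
        selected = sorted(selected, key=sortkey, reverse=reverse)
    # The limit is applied last, after any sorting and reversal.
    return selected if limit is None else selected[:limit]


ids = ["S190425z", "S190814bv", "G298048", "S190521g"]  # sample strings
print(downselect_ids(ids, "S*"))
print(downselect_ids(ids, "S*", invert=True))
print(downselect_ids(ids, "S*", sortkey=str, reverse=True, limit=2))
```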
-
downselect_pipeline(invert=False, **kwargs)
Return a Run instance with a pipeline that has been downselected using Pipeline.downselect.
-
property events
Return a list of events in this run directory with self.downselection criteria applied (see downselect for a list of possible downselection criteria).
- Parameters
sortkey (function, optional) – A sorting key (as passed to sorted) to use to sort the returned events. If none is provided, the events will be sorted based on astrophysical event time using Event.gpstime; beware that an error will be raised if this quantity is ill-defined for ANY of the returned events.
reverse (bool, optional) – Whether to reverse the default sort order, i.e. put in descending order. True by default so that the most-recently-occurring events are first in the list.
-
update(**downselect)
Get a list of Event instances matching this Run instance’s downselection criteria and update each event directory. Run until all events are up-to-date. Will queue files from each event that are ready to update, allowing them to be handled in parallel, and will track outstanding jobs. Optionally specify a FileGraph.downselect downselection argument to pass to each FileGraph being updated (default is to regenerate all files needing regeneration). Be careful with this argument, as it will cause file generation attempts for matching files without checking whether they need to be generated.
-
property vis
A collection of visualization methods for this Run instance.
-
class llama.run.RunTuple(rundir, pipeline, downselection)
Bases: tuple
-
property downselection
Alias for field number 2
-
property pipeline
Alias for field number 1
-
property rundir
Alias for field number 0
-
class llama.run.RunVisualization(run)
Bases: object
Provide methods for visualizing the status of a run directory.
-
finished(outfile=None)
Create a bar plot showing the proportion of complete to incomplete files for each FileHandler in this Run instance.
- Parameters
outfile (str, optional) – If provided, save the plot to this filename.
- Returns
fig – The bar plot figure.
- Return type
matplotlib.figure
-
wall_times(outfile=None)
Create histograms of wall times (i.e. how long each file took to generate) for each FileHandler in this Run instance.
- Parameters
outfile (str, optional) – If provided, save all plots as PNG files to a gzipped tarfile with this filename.
- Returns
plots – A dictionary of matplotlib.figure instances whose keys are the names of each FileHandler class and whose values are histograms of wall times for each FileHandler class.
- Return type
dict
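Saving several PNG files into one gzipped tarfile can be sketched with the standard library alone. The helper save_pngs_to_tarball is hypothetical and takes already-rendered PNG bytes keyed by name; with matplotlib, such bytes could be produced by calling a figure's savefig on an io.BytesIO buffer with format="png". How wall_times does this internally is not stated here.

```python
import io
import tarfile


def save_pngs_to_tarball(png_bytes_by_name, outfile):
    """Write each PNG (given as raw bytes) into a gzipped tarfile."""
    with tarfile.open(outfile, "w:gz") as tar:
        for name, data in sorted(png_bytes_by_name.items()):
            # One archive member per plot, named after its dictionary key.
            info = tarfile.TarInfo(name=name + ".png")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
```

Writing members from in-memory buffers avoids creating temporary PNG files on disk before archiving them.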
-
llama.run.downselect_events(events, **kwargs)
Take a list of events and downselect them using the checks described in Run.downselect.
-
llama.run.past_runs(paths=('/root/.local/share/llama/past_runs', ), pipeline=frozenset({'Advok', 'CoincScatterI3LvcPdf', 'CoincScatterI3LvcPng', 'CoincScatterZtfI3LvcPdf', 'CoincScatterZtfI3LvcPng', 'CoincScatterZtfLVCPdf', 'CoincScatterZtfLVCPng', 'CoincSignificanceI3Lvc', 'CoincSignificanceSubthresholdI3Lvc', 'CoincSummaryI3LvcPdf', 'CoincSummaryI3LvcTex', 'FermiGRBsJSON', 'IceCubeNeutrinoList', 'IceCubeNeutrinoListCoincTxt', 'IceCubeNeutrinoListTex', 'IceCubeNeutrinoListTxt', 'LVAlertAdvok', 'LVAlertJSON', 'LVCGraceDbEventData', 'LvcDistancesJson', 'LvcGcnXml', 'LvcRetractionXml', 'LvcSkymapFits', 'LvcSkymapHdf5', 'PAstro', 'RctSlkI3CoincSummaryI3LvcPdf', 'RctSlkLmaCoincScatterI3LvcPdf', 'RctSlkLmaCoincScatterI3LvcPng', 'RctSlkLmaCoincScatterZtfI3LvcPdf', 'RctSlkLmaCoincScatterZtfI3LvcPng', 'RctSlkLmaCoincScatterZtfLVCPdf', 'RctSlkLmaCoincScatterZtfLVCPng', 'RctSlkLmaCoincSignificanceI3Lvc', 'RctSlkLmaCoincSignificanceSubthresholdI3Lvc', 'RctSlkLmaCoincSummaryI3LvcPdf', 'RctSlkLmaLVAlertJSON', 'RctSlkLmaLVCGraceDbEventData', 'RctSlkLmaLvcDistancesJson', 'RctSlkLmaLvcGcnXml', 'RctSlkLmaLvcRetractionXml', 'RctSlkLmaSkymapInfo', 'SkymapInfo', 'ZtfTriggerList'}))
Get a dictionary of run names and corresponding Run instances, looking for run directories in the specified paths.
- Parameters
paths (tuple, optional) – Directories in which to search for past run directories.
pipeline (llama.pipeline.Pipeline) – The pipeline to use for the returned Run instances in runs.
- Returns
runs – A dictionary of past runs whose keys are the name of the rundir and whose values are the corresponding Run. Absolute paths are used as the keys.
- Return type
dict
-
llama.run.postprocess_downselect(_self: llama.cli.CliParser, namespace: argparse.Namespace)
If namespace.downselect is not None, parse it as a comma-separated list of key=value pairs, where value will be parsed as a boolean if it equals either True or False and as a string otherwise. Use these arguments to downselect each of the runs specified in namespace.run.
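The key=value parsing described here can be sketched as follows. The function name parse_downselect is hypothetical, and the sketch covers only the string-to-keyword-arguments step, not the application of those arguments to each run.

```python
def parse_downselect(arg):
    """Turn 'key=value,key=value' into a dict of keyword arguments."""
    if arg is None:
        return {}
    kwargs = {}
    for pair in arg.split(","):
        key, _, value = pair.partition("=")
        if value == "True":
            kwargs[key] = True
        elif value == "False":
            kwargs[key] = False
        else:
            # Anything other than True/False stays a string.
            kwargs[key] = value
    return kwargs


print(parse_downselect("vetoed=False,eventid_filter=S*"))
```

The resulting dictionary has the shape expected by a call such as run.downselect(**kwargs).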
-
llama.run.postprocess_dry_run(_self: llama.cli.CliParser, namespace: argparse.Namespace)
If --dry-run-dirs is true, print the directories that would be affected by the given arguments and quit without taking further action. If you want to extend this, print more dry run information and then call this function to print run/event information before quitting.
-
llama.run.postprocess_select_pipeline(_self: llama.cli.CliParser, namespace: argparse.Namespace)
Take the pipeline specified by --pipeline and/or --filehandlers and set the llama.Run instances selected in namespace to use that pipeline instead of the default.