llama package
The Gravitational Wave/High Energy Neutrino (LLAMA) analysis pipeline.
-
class llama.Event
Bases: llama.event.EventTuple, llama.flags.FlagsMixin, llama.versioning.GitDirMixin
An event object, used to update and access data associated with a specific trigger (which receives its own directory, or eventdir). An Event can also be initialized with a FileHandler or another Event (or anything with ‘eventid’ and ‘rundir’ properties) as an input argument, in which case it will correspond to the same event as the object provided as an argument. One can also provide a module or dictionary containing FileHandlers, which will be used to create a FileGraph for the Event (i.e. it will specify which files should be made for this event). Defaults to the files module for now, though eventually this should be refactored out.
- Parameters
eventid_or_event (str or EventTuple or llama.filehandler.FileHandlerTuple) – This can be a string with the unique ID of this event (which can simply be a filename-friendly descriptive string for tests or manual analyses), in which case the next arguments, rundir and pipeline, will be used; OR, alternatively, it can be an EventTuple (e.g. another Event instance), a FileHandlerTuple (e.g. any FileHandler instance), or any object with valid eventid and rundir attributes. In this case, those attributes from the provided object will be re-used, and the rundir argument will be ignored. This makes it easy to get a new Event instance describing the same underlying event but with a different Pipeline specified, or alternatively to get the Event corresponding to a given FileHandler (though in this case you should take care to manually specify the Pipeline you want to use!).
rundir (str, optional) – The rundir, i.e. the directory where all events for a given run are stored, if it differs from the default and is not specified by eventid_or_event.
pipeline (llama.pipeline.Pipeline, optional) – The pipeline, i.e. the set of FileHandlers that we want to generate, if it differs from the default pipeline. If none is provided, use DEFAULT_PIPELINE.
- Returns
event – A new Event instance with the given properties.
- Return type
Event
- Raises
ValueError – If the eventid_or_event argument does not conform to the above expectations, or if the rundir directory for the run does not exist, a ValueError with a descriptive message will be raised.
-
property auxiliary_paths
Names of possible auxiliary paths in the directory that are used to track the state of the Event as a whole.
-
change_time()
The time at which the permissions of this event directory were last changed (according to the underlying storage system). Note that you are probably more interested in modification_time.
-
clone(commit='HEAD', rundir=None, clobber=False)
Make a clone of this event in a temporary directory for quick manipulations on a specific version of a file.
- Parameters
commit (str, optional) – The commit hash to check out when cloning this event. If not specified, the most recent commit will be used. Unsaved changes will be discarded.
rundir (str, optional) – The run directory in which to store the cloned event. If not specified, a temporary directory will be created and used. The contents of this directory will NOT be deleted automatically.
clobber (bool, optional) – Whether this cloned event should overwrite existing state.
- Returns
clone_event (llama.event.Event) – A clone of this event. The full history is saved, but the specified commit is checked out. Any uncommitted changes in the working directory will not be copied over to the clone_event. If clone_event already seems to be a valid event with the correct commit hash, no further action will be taken (thus repeated cloning has little performance penalty).
- Raises
llama.versioning.GitRepoUninitialized – If this is called on an Event that has not had its git history initialized.
IOError – If this event already exists in the specified rundir and is checked out to a different hash, unless clobber is True, in which case that working directory will be deleted and replaced with the desired commit.
-
compare_contents(other)
Compare the file contents of this event to another event using filecmp.cmpfiles (though results are given as FileHandler instances rather than file paths). Use this to see whether two event directories contain the same contents under a given pipeline.
- Parameters
other (Event, str) – The other Event instance to compare this one to, or else a directory containing files that can be compared to this Event (though in that case the filenames must still follow the expected format).
- Returns
match (FileGraph) – A FileGraph for this Event whose files have the same contents as those corresponding to the other event.
mismatch (FileGraph) – A FileGraph for this Event whose files have contents differing from those corresponding to the other event.
errors (FileGraph) – A FileGraph for this Event whose corresponding files do not exist or otherwise could not be accessed for comparison (either for the files corresponding to this Event or the other one).
- Raises
ValueError – If the Pipeline instances of this Event and the other one are not equal, it does not make sense to compare them, and a ValueError will be raised.
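The three-way split returned above comes from the standard-library filecmp.cmpfiles, which returns (match, mismatch, errors) lists of filenames. The sketch below calls it directly on two directories; the filenames are hypothetical, and unlike compare_contents it returns plain names rather than FileGraph objects.

```python
import filecmp
import os
import tempfile
from typing import List, Tuple


def compare_event_dirs(dir_a: str, dir_b: str,
                       filenames: List[str]) -> Tuple[List[str], List[str], List[str]]:
    """Return (match, mismatch, errors) filename lists for two directories.

    shallow=False forces a byte-by-byte comparison instead of trusting
    os.stat() signatures (file type, size, modification time).
    """
    return filecmp.cmpfiles(dir_a, dir_b, filenames, shallow=False)


def _write(path: str, text: str) -> None:
    with open(path, "w") as f:
        f.write(text)


a, b = tempfile.mkdtemp(), tempfile.mkdtemp()
_write(os.path.join(a, "skymap.fits"), "same")
_write(os.path.join(b, "skymap.fits"), "same")
_write(os.path.join(a, "coinc.json"), "{}")
_write(os.path.join(b, "coinc.json"), '{"n": 1}')
_write(os.path.join(a, "only_in_a.txt"), "x")  # missing from b, so it lands in errors

match, mismatch, errors = compare_event_dirs(
    a, b, ["skymap.fits", "coinc.json", "only_in_a.txt"])
print(match, mismatch, errors)  # ['skymap.fits'] ['coinc.json'] ['only_in_a.txt']
```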
-
property cruft_files
Return a list of files in the event directory that are not associated with any file handler nor with event state directories.
-
property eventdir
The full path to the directory containing files related to this event.
-
exists()
Check whether this event already exists.
-
property files
Get a FileGraph full of FileHandler instances for the files in this event with this particular pipeline.
-
classmethod fromdir(eventdir='.', **kwargs)
Initialize an event just by providing a filepath to its event directory. If no directory is specified, default to the current directory and try to treat that like an event. Note that the returned event will resolve symbolic links when determining paths for rundir and eventid. Useful for quickly making events during interactive work.
- Parameters
eventdir (str, optional) – The event directory from which to initialize a new event.
**kwargs – Remaining keyword arguments to pass to Event().
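Since an event directory lives directly inside its rundir, the rundir and eventid can be recovered by resolving symbolic links and splitting off the last path component. The sketch below shows that idea with os.path.realpath; split_eventdir is a hypothetical helper, and whether fromdir uses exactly these calls is an assumption.

```python
import os
from typing import Tuple


def split_eventdir(eventdir: str = ".") -> Tuple[str, str]:
    """Split an event directory path into (rundir, eventid).

    os.path.realpath resolves symbolic links and strips trailing separators,
    so a symlinked or slash-terminated path maps to the real rundir/eventid.
    """
    real = os.path.realpath(eventdir)
    rundir, eventid = os.path.split(real)
    return rundir, eventid
```

For example, a symlink ~/latest pointing at /data/run1/S1 would yield ("/data/run1", "S1") rather than ("~", "latest").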
-
gpstime()
Return the GPS time of this event. Returns -1 if none can be parsed.
-
init()
Initialize the directory for this event, making sure it is in a proper state for processing data. Make sure the eventdir exists by creating it if necessary. Also initialize version control and set flags to the defaults specified in FlagsMixin.DEFAULT_FLAGS (which Event inherits).
- Returns
Returns this Event instance to allow command chaining.
- Return type
self
- Raises
ValueError – If the eventdir path exists but is not a directory or a link to a directory, since we don’t want to overwrite it to make the directory.
-
modification_time()
The time at which this event directory was last modified (according to the underlying storage system).
-
printstatus(cruft=False, highlight=None, unicode=True, plot=None)
Get a user-readable message indicating the current status of this event. Include a list of files not in the selected pipeline with cruft=True. Bold lines in the summary table containing strings in highlight as substrings. Use nice unicode characters and terminal colors with unicode=True, or use plain ASCII with unicode=False. Include a status graph plot with plot=True, or exclude it with plot=False; if plot=None, include the plot only if the underlying Graph::Easy Perl library is available on the host.
-
save_tarball(outfile)
Save this event and all its contents as a gzipped tarball. You should probably use a .tar.gz extension for the outfile name.
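Writing a directory to a gzipped tarball can be done with the standard-library tarfile module in "w:gz" mode. This is a minimal sketch under that assumption; save_dir_tarball is a hypothetical name, and it is not known here whether save_tarball itself uses tarfile or what it includes.

```python
import os
import tarfile


def save_dir_tarball(eventdir: str, outfile: str) -> None:
    """Write eventdir and everything below it to a gzip-compressed tarball."""
    with tarfile.open(outfile, "w:gz") as tar:
        # arcname stores paths relative to the event directory's own name,
        # so the archive does not embed the full absolute path.
        tar.add(eventdir, arcname=os.path.basename(os.path.normpath(eventdir)))
```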
-
update(**downselect)
Generate any files that fit the FileGraph downselection criteria specified in downselect. By default, generate all files that have not been generated and regenerate all files that have been obsoleted because their data dependencies have changed. Returns True if files were updated, False if no files in need of update were found.
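One way to decide whether a file has been "obsoleted because its data dependencies have changed" is to record a digest of each dependency when the file is generated and compare against the current digests later. The sketch below uses SHA-256 content hashes for that; it is a hypothetical illustration (needs_update, digest and the recorded mapping are invented names), not a description of how llama tracks obsolescence internally.

```python
import hashlib
from typing import Dict, Iterable


def digest(content: bytes) -> str:
    """SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(content).hexdigest()


def needs_update(name: str, contents: Dict[str, bytes],
                 deps: Dict[str, Iterable[str]],
                 recorded: Dict[str, Dict[str, str]]) -> bool:
    """True if name was never generated or any dependency changed since then.

    recorded[name] maps each dependency to its digest at generation time.
    """
    if name not in contents or name not in recorded:
        return True  # never generated
    current = {dep: digest(contents[dep])
               for dep in deps.get(name, ()) if dep in contents}
    return current != recorded[name]  # any changed or missing dependency
```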
-
v0_time()
Return the timestamp of the first file version commit, catching the error if the event does not have versioning initialized or has no versions, and returning False in that case.
-
class llama.Run
Bases: llama.run.RunTuple
A single directory containing multiple event directories combined with a pipeline (i.e. a selection of analysis steps to use) and a set of downselection criteria for picking events:
Run Directory
├─ Event Directory 1
├─ Event Directory 2
└─ Event Directory 3
This should ordinarily correspond to a run of some sort (an observing run, engineering run, offline run, test run, etc.) where the events are somehow related. Since this class mostly just provides methods for organizing and selecting Event instances with tailored Pipeline instances, it’s up to you to decide how best to organize a run. Run objects are immutable to simplify hashing and uniqueness checks.
These tools allow the user to conveniently check on the status of all events in a given Run. A dictionary of downselection arguments (as fed to downselect) can be used to restrict the set of events returned by events.
- Parameters
rundir (str) – The directory where all events are stored. Files for individual events are stored in per-event subdirectories of rundir. Will be converted to a canonical path with os.path.realpath to help ensure unique Run definitions.
pipeline (llama.pipeline.Pipeline, optional) – A Pipeline instance holding FileHandler classes that should be used for this analysis. Defaults to the main pipeline in production use.
downselection (tuple, optional) – A tuple of dictionaries of keyword arguments of the type passed to downselect. The events returned by events will match these downselection criteria, with each downselection dict applied in the order it appears in this argument (to allow more complex chained downselections). You probably don’t want to specify this manually; a more pythonic way to provide downselection arguments is to use the downselect method to return a downselection from a starting Run.
-
downselect(**kwargs)
Get another Run instance identical to the current one but with the following downselection criteria applied to the Event instances returned by self.events. Can also specify a sorting function and a maximum number of returned values:
- Parameters
invert (bool, optional) – Invert what matches and what doesn’t. Default: False
eventid_filter (str, optional) – A glob (as taken by fnmatch) that the eventid must match.
fileexists (str, optional) – The event directory contains a file with this name.
fhexists (llama.filehandler.FileHandler, optional) – The eventdir contains the file for this FileHandler.
fhnameexists (str, optional) – The eventdir contains the file for the FileHandler with this name.
fhmeta (llama.filehandler.FileHandler, optional) – The eventdir contains a metadata rider for the file for this FileHandler.
fhnamemeta (str, optional) – The eventdir contains a metadata rider for the file for the FileHandler with this name.
vetoed (bool, optional) – Whether the events have been vetoed by the VETOED flag or not.
manual (bool, optional) – Whether the events have been marked as manual by the MANUAL flag.
modbefore (float, optional) – Select events whose directory modtimes were before this timestamp.
modafter (float, optional) – Select events whose directory modtimes were after this timestamp.
sec_since_mod_gt (float, optional) – Select events whose directory modtimes are more than this many seconds ago.
sec_since_mod_lt (float, optional) – Select events whose directory modtimes are less than this many seconds ago.
v0before (float, optional) – Select events whose first event state version was generated before this timestamp. Will IGNORE directories that do not have any versioned files.
v0after (float, optional) – Select events whose first event state version was generated after this timestamp. Will IGNORE directories that do not have any versioned files.
sec_since_v0_gt (float, optional) – Select events whose first event state version was generated more than this many seconds ago. Will IGNORE directories that do not have any versioned files.
sec_since_v0_lt (float, optional) – Select events whose first event state version was generated less than this many seconds ago. Will IGNORE directories that do not have any versioned files.
sortkey (function, optional) – A function taking Event instances that can be passed to sorted to sort the downselected Event instances. Default: None (i.e. no sorting)
reverse (bool, optional) – Whether to reverse the order of sorting (i.e. put the results in descending order) before applying limit. Default: False
limit (int, optional) – Return up to this number of events. Most useful if sortkey has also been provided. Default: None (i.e. no limit)
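The filter, sort, reverse, and limit steps listed above, together with the chained application of several downselection dicts, can be sketched over simple stand-in objects. EventStub and downselect_events are hypothetical, and only two of the documented filters (eventid_filter and vetoed) are modeled; this is an illustration of the selection order, not Run.downselect itself.

```python
import fnmatch
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass(frozen=True)
class EventStub:
    """Hypothetical stand-in for llama.Event with only the fields used here."""
    eventid: str
    gpstime: float
    vetoed: bool = False


def downselect_events(events: List[EventStub], invert: bool = False,
                      eventid_filter: Optional[str] = None,
                      vetoed: Optional[bool] = None,
                      sortkey: Optional[Callable[[EventStub], Any]] = None,
                      reverse: bool = False,
                      limit: Optional[int] = None) -> List[EventStub]:
    """Filter events, then optionally sort (possibly reversed) and truncate."""
    def matches(event: EventStub) -> bool:
        checks = []
        if eventid_filter is not None:
            checks.append(fnmatch.fnmatch(event.eventid, eventid_filter))
        if vetoed is not None:
            checks.append(event.vetoed == vetoed)
        return all(checks) != invert  # != acts as XOR, so invert flips the result

    selected = [e for e in events if matches(e)]
    if sortkey is not None:
        selected = sorted(selected, key=sortkey, reverse=reverse)
    return selected if limit is None else selected[:limit]


evs = [EventStub("S1", 10.0), EventStub("S2", 30.0, vetoed=True),
       EventStub("S3", 20.0), EventStub("G1", 40.0)]

# Chained downselection: each dict is applied in order, like Run.downselection.
chained = evs
for criteria in ({"eventid_filter": "S*"}, {"vetoed": False}):
    chained = downselect_events(chained, **criteria)
```

Sorting is applied before limit so that, for example, sortkey=lambda e: e.gpstime with reverse=True and limit=1 returns the most recent matching event.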
-
downselect_pipeline(invert=False, **kwargs)
Return a Run instance with a pipeline that has been downselected using Pipeline.downselect.
-
property events
Return a list of events in this run directory with self.downselection criteria applied (see downselect for a list of possible downselection criteria).
- Parameters
sortkey (function, optional) – A sorting key (as passed to sorted) to use to sort the returned events. If none is provided, the events will be sorted based on astrophysical event time using Event.gpstime; beware that an error will be raised if this quantity is ill-defined for ANY of the returned events.
reverse (bool, optional) – Whether to reverse the default sort order, i.e. put in descending order. True by default so that the most recently occurring events are first in the list.
-
update(**downselect)
Get a list of Event instances matching this Run instance’s downselection criteria and update each event directory. Run until all events are up-to-date. Will queue files from each event that are ready to update, allowing them to be handled in parallel, and will track outstanding jobs. Optionally specify a FileGraph.downselect downselection argument to pass to each FileGraph being updated (default is to regenerate all files needing regeneration). Be careful with this argument, as it will cause file generation attempts for matching files without checking whether they need to be generated.
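Queuing files "that are ready to update" in parallel amounts to scheduling a DAG: a file becomes ready once every file it depends on has been generated. The sketch below uses the standard-library graphlib.TopologicalSorter (Python 3.9+) with a thread pool; this is a swapped-in technique for illustration, and update_in_parallel plus the file names are hypothetical rather than llama's scheduler.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter
from typing import Callable, Dict, Iterable, List


def update_in_parallel(dependencies: Dict[str, Iterable[str]],
                       generate: Callable[[str], None],
                       max_workers: int = 4) -> List[str]:
    """Generate each node after all of its dependencies; return completion order.

    dependencies maps a file name to the names it depends on (its predecessors).
    Every batch from get_ready() has all dependencies finished, so the nodes in
    a batch are independent and can run concurrently.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    done_order: List[str] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            ready = sorter.get_ready()
            # pool.map yields results in submission order as each job finishes.
            for name, _ in zip(ready, pool.map(generate, ready)):
                sorter.done(name)
                done_order.append(name)
    return done_order
```

This version waits for a whole batch before asking for the next one; a finer-grained scheduler would submit newly ready nodes as soon as any single job completes.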
-
property vis
A collection of visualization methods for this Run instance.
-
class llama.Pipeline
Bases: llama.classes.ImmutableDict, llama.classes.NamespaceMappable
A pipeline specifies a set of data inputs and the functions that act on them, expressed as intermediate data products and the functions used to generate them, arranged in a Directed Acyclic Graph (DAG); these products are bundled into FileHandlers. FileHandlers are graph nodes with DEPENDENCIES (edges) specified. A Pipeline DAG can therefore be built purely by specifying its FileHandlers, which can be done trivially and clearly at the file-system level by putting the FileHandler code into a single directory for each pipeline.
- Parameters
kwargs (dict) – Names of FileHandler classes mapped to the classes themselves.
args (array-like) – FileHandler classes. The __name__ of each FileHandler will be used as the key.
- Returns
pipeline – A new Pipeline instance containing all of the FileHandler classes specified in args and kwargs.
- Return type
Pipeline
- Raises
TypeError – If there are any name collisions between classes in the input args and kwargs, if any of the FileHandler classes it contains are abstract (non-implemented) classes, or if any of the FileHandler classes it contains have missing required_attributes.
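The construction rules above (positional classes keyed by __name__, keyword classes keyed by keyword, TypeError on collisions or abstract classes) can be sketched with a read-only mapping. build_pipeline is a hypothetical name, types.MappingProxyType stands in for llama.classes.ImmutableDict, and the required_attributes check is not modeled.

```python
import inspect
from types import MappingProxyType
from typing import Mapping


def build_pipeline(*args: type, **kwargs: type) -> Mapping[str, type]:
    """Map names to classes, rejecting name collisions and abstract classes."""
    mapping = dict(kwargs)
    for cls in args:
        if cls.__name__ in mapping:
            raise TypeError(f"name collision for {cls.__name__!r}")
        mapping[cls.__name__] = cls
    for name, cls in mapping.items():
        # inspect.isabstract is True for classes with unimplemented abstractmethods.
        if inspect.isabstract(cls):
            raise TypeError(f"{name!r} is an abstract class")
    return MappingProxyType(mapping)  # read-only view of the mapping
```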
-
check_consistency(other)
Check whether two Pipeline instances use the same keys to describe the same FileHandler classes, raising a ValueError if they don’t.
-
dependency_graph(outfile: str = None, title: str = 'Pipeline', url: function = None, bgcolor: str = 'black')
Return a graphviz .dot graph of DEPENDENCIES between file handlers in this pipeline. Optionally plot the graph to an output image file visualizing the graph. Supported file extensions for outfile:
dot: just save the dotfile in .dot format.
png: save the image in PNG format.
pdf: save the image in PDF format.
svg: save the image in svg format.
- Parameters
outfile (str, optional) – If not provided, return a string in .dot file format specifying graph relations. If an output file is specified, infer the filetype from its extension and write to that file.
title (str, optional) – The title of the pipeline graph plot.
url (FunctionType, optional) – A function taking FileHandler classes as input and returning a URL that will be added to each FileHandler class’s node in the output graph. Allows you to add links. If not included, URLs will not be included.
bgcolor (str, optional) – The background color to use for the generated plot.
- Returns
dot – The dependency graph in .dot format (can be used as input to dot at the command line). This is returned regardless of whether an outfile is specified.
- Return type
str
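Producing .dot source from a dependency mapping is mostly string assembly: one node statement per FileHandler (optionally carrying a URL attribute) and one edge per dependency. The sketch below is a hypothetical helper over plain names; the edge direction (dependency to consumer) and the attribute choices are assumptions, and titles or names containing double quotes are not escaped.

```python
from typing import Callable, Dict, Iterable, Optional


def dependency_dot(dependencies: Dict[str, Iterable[str]], title: str = "Pipeline",
                   url: Optional[Callable[[str], str]] = None,
                   bgcolor: str = "black") -> str:
    """Render a {node: dependencies} mapping as Graphviz DOT source.

    Edges point from each dependency to the node that consumes it.
    """
    lines = [f'digraph "{title}" {{', f'  bgcolor="{bgcolor}";', f'  label="{title}";']
    for node in sorted(dependencies):
        attrs = f' [URL="{url(node)}"]' if url else ""
        lines.append(f'  "{node}"{attrs};')
        for dep in sorted(dependencies[node]):
            lines.append(f'  "{dep}" -> "{node}";')
    lines.append("}")
    return "\n".join(lines)
```

Saved to pipeline.dot, the output can be rendered with the Graphviz CLI, e.g. dot -Tpng pipeline.dot -o pipeline.png (or -Tsvg, -Tpdf).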
-
downselect(invert=False, reducer=<built-in function all>, **kwargs)
Return a Pipeline instance whose FileHandler classes match the given query parameters (ALL of them by default; see reducer).
- Parameters
invert (bool, optional) – Invert results. (Default: False)
reducer (function, optional) – Specify the any builtin to match if any check passes. Specify all to match only when every check passes. (Default: all)
type (type, optional) – The type of the FileHandler must exactly match the given FileHandler.
typename (str, optional) – The FileHandler type’s name must match this string.
subclass (type, optional) – The FileHandler must be a subclass of this FileHandler.
subgraph (type, optional) – The FileHandler must be either this FileHandler or one of its UR_DEPENDENCIES; use this to make a Pipeline that only generates the subgraph leading to this FileHandler.
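The subgraph query keeps a FileHandler together with all of its direct and indirect dependencies (its UR_DEPENDENCIES). Computing that set is a transitive closure over the DEPENDENCIES edges, sketched below with an iterative depth-first walk over plain names; ur_dependencies and subgraph are hypothetical helpers, not llama functions.

```python
from typing import Dict, Iterable, Set


def ur_dependencies(node: str, dependencies: Dict[str, Iterable[str]]) -> Set[str]:
    """Return every direct or indirect dependency of node (excluding node)."""
    seen: Set[str] = set()
    stack = list(dependencies.get(node, ()))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(dependencies.get(dep, ()))  # follow edges transitively
    return seen


def subgraph(node: str, dependencies: Dict[str, Iterable[str]]) -> Dict[str, Iterable[str]]:
    """Keep only node and its ur-dependencies, analogous to downselect(subgraph=...)."""
    keep = ur_dependencies(node, dependencies) | {node}
    return {name: deps for name, deps in dependencies.items() if name in keep}
```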
-
file_handler_instances(*args, **kwargs)
Return a FileHandlerMap with FileHandler instances sharing the same initialization arguments, e.g. for FileHandler instances that all refer to the same event.
-
classmethod from_module(module)
Create a pipeline by extracting all FileHandler objects from a given submodule.