The Gravitational Wave/High Energy Neutrino (LLAMA) analysis pipeline.
An event object, used to update and access data associated with a specific trigger (which receives its own directory or
eventdir). Accepts a FileHandler or another Event (or anything with ‘eventid’ and ‘rundir’ properties) as an input argument, in which case it will correspond to the same event as the object provided as an argument. One can also provide a module or dictionary containing FileHandlers, which will be used to create a
FileGraph for the Event (i.e. it will specify which files should be made for this event). Defaults to the files module for now, though eventually this should be refactored out.
eventid_or_event (str or EventTuple or llama.filehandler.FileHandlerTuple) – This can be a string with the unique ID of this event (which can simply be a filename-friendly descriptive string for tests or manual analyses), in which case the next arguments,
pipeline, will be used; OR, alternatively, it can be an
FileHandler instance), or any object with valid
rundir attributes. In this case, those attributes from the provided object will be re-used, and the
rundir argument will be ignored. This makes it easy to get a new
Event instance describing the same underlying event but with a different
Pipeline specified, or alternatively to get the
Event corresponding to a given
FileHandler (though in this case you should take care to manually specify the ``Pipeline`` you want to use!).
rundir (str, optional) – The
rundir, i.e. the directory where all events for a given run are stored, if it differs from the default and is not specified by
pipeline (llama.pipeline.Pipeline, optional) – The
pipeline, i.e. the set of FileHandlers that we want to generate, if it differs from the default pipeline. If none is provided, use
event – A new
Event instance with the given properties.
- Return type
ValueError – If the
eventid_or_event argument does not conform to the above expectations or if the
rundir directory for the run does not exist, a ValueError with a descriptive message will be raised.
Names of possible auxiliary paths in the directory that are used to track the state of the Event as a whole.
The time at which the permissions of this event directory were last changed (according to the underlying storage system). Note that you probably are more interested in
clone(commit='HEAD', rundir=None, clobber=False)
Make a clone of this event in a temporary directory for quick manipulations on a specific version of a file.
commit (str, optional) – The commit hash to check out when cloning this event. If not specified, the most recent commit will be used. Unsaved changes will be discarded.
rundir (str, optional) – The run directory in which to store the cloned event. If not specified, a temporary directory will be created and used. The contents of this directory will NOT be deleted automatically.
clobber (bool, optional) – Whether this cloned event should overwrite existing state.
clone_event (llama.event.Event) – A clone of this event. The full history is saved, but the specified
commit is checked out. Any uncommitted changes in the working directory will not be copied over to the
clone_event already seems to be a valid event with the correct
commit hash, no further action will be taken (thus repeated cloning has little performance penalty).
llama.versioning.GitRepoUninitialized – If this is called on an
Event that has not had its git history initialized.
IOError – If this event already exists in the specified
rundir and is checked out to a different hash, unless
clobber is True, in which case that working directory will be deleted and replaced with the desired commit.
Compare the file contents of this event to another event using
filecmp.cmpfiles (though results are given as
FileHandler instances rather than file paths). Use this to see whether two event directories contain the same contents under a given pipeline.
other (Event, str) – The other
Event instance to compare this one to, or else a directory containing files that can be compared to this
Event (though in that case the filenames must still follow the expected format).
match (FileGraph) – A
Event whose files have the same contents as those corresponding to the
mismatch (FileGraph) – A
Event whose files have different contents from those corresponding to the
errors (FileGraph) – A
Event whose corresponding files do not exist or otherwise could not be accessed for comparison (either for the files corresponding to this
ValueError – If the
Pipeline instances of this
other one are not equal, it does not make sense to compare them, and a
ValueError will be raised.
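The three-way match/mismatch/errors split described above mirrors what the standard library's filecmp.cmpfiles returns. The following is a minimal sketch of that underlying comparison on plain directories, not the Event method itself; the helper names compare_dirs and demo and the filenames are hypothetical, chosen only for illustration.

```python
import filecmp
import os
import tempfile


def compare_dirs(dir_a, dir_b, filenames):
    """Compare the named files in two directories by content.

    Returns (match, mismatch, errors) lists of filenames, exactly as
    filecmp.cmpfiles does. shallow=False forces a content comparison
    instead of trusting os.stat signatures.
    """
    return filecmp.cmpfiles(dir_a, dir_b, filenames, shallow=False)


def demo():
    """Build two throwaway directories and compare hypothetical files."""
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        # Same contents in both directories.
        for dirname in (a, b):
            with open(os.path.join(dirname, "skymap.txt"), "w") as f:
                f.write("same")
        # Different contents in each directory.
        for dirname, text in ((a, "old"), (b, "new")):
            with open(os.path.join(dirname, "neutrinos.txt"), "w") as f:
                f.write(text)
        # "missing.txt" exists in neither directory, so it lands in errors.
        names = ["skymap.txt", "neutrinos.txt", "missing.txt"]
        return compare_dirs(a, b, names)
```

In this sketch the results are filenames; the documentation above states that the real method reports them as FileHandler instances instead.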
Return a list of files in the event directory that are not associated with any file handler nor with event state directories.
The full path to the directory containing files related to this event.
Check whether this event already exists.
FileHandler instances for the files in this event with this particular
Initialize an event just by providing a filepath to its event directory. If no directory is specified, default to the current directory and try to treat that like an event. Note that the returned event will eliminate symbolic links when determining paths for
eventid. Useful for quickly making events during interactive work.
eventdir (str, optional) – The event directory from which to initialize a new event.
**kwargs – Remaining keyword arguments to pass to
Return the GPS time of this event. Returns -1 if none can be parsed.
Initialize the directory for this event, making sure it is in a proper state for processing data. Make sure the
eventdir exists by creating it if necessary. Also initializes version control and sets flags to the defaults specified in
Event instance to allow command chaining.
- Return type
ValueError – If the
eventdir path exists but is not a directory or a link to a directory, we don’t want to overwrite it to make the directory.
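The directory check described above can be sketched with the standard library alone. This is an illustrative stand-in that assumes only the behavior stated in the text (create the directory if missing, raise ValueError if the path exists but is not a directory); the function name ensure_eventdir is hypothetical, and version control and flag initialization are deliberately left out.

```python
import os
import tempfile


def ensure_eventdir(eventdir):
    """Create eventdir if needed; refuse to touch a non-directory path."""
    if os.path.exists(eventdir) and not os.path.isdir(eventdir):
        # os.path.isdir follows symlinks, so a link to a directory passes.
        raise ValueError(f"{eventdir} exists but is not a directory")
    os.makedirs(eventdir, exist_ok=True)
    # Returning the path loosely mimics returning an object for chaining.
    return eventdir
```

Calling the function twice on the same path is harmless, since exist_ok=True makes the second call a no-op.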
The time at which this event directory was modified (according to the underlying storage system).
printstatus(cruft=False, highlight=None, unicode=True, plot=None)
Get a user-readable message indicating the current status of this event. Include a list of files not in the selected pipeline with
cruft=True. Bold lines in the summary table containing strings in
highlight as substrings. Use nice unicode characters and terminal colors with
unicode=True, or use plain ascii with
unicode=False. Include a status graph plot with
plot=True, or exclude it with
plot=None, include the plot only if the underlying
Graph::Easy Perl library is available on the host.
Save this event and all its contents as a gzipped tarball. You should probably use a
.tar.gz extension for the
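As a rough illustration of saving an event directory as a gzipped tarball, the sketch below uses the standard tarfile module. It is an assumption about the general technique, not the method's actual implementation; the function name save_tarball and its arguments are hypothetical.

```python
import os
import tarfile
import tempfile


def save_tarball(eventdir, outfile):
    """Write eventdir and everything under it to a gzipped tarball."""
    with tarfile.open(outfile, "w:gz") as tar:
        # arcname keeps only the final directory name inside the archive,
        # so the absolute path of the run directory is not stored.
        arcname = os.path.basename(os.path.normpath(eventdir))
        tar.add(eventdir, arcname=arcname)
    return outfile
```

The "w:gz" mode is what makes a .tar.gz extension the natural choice for the output filename.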
Generate any files that fit the
FileGraph downselection criteria specified in
downselect. By default, generate all files that have not been generated and regenerate all files that have been obsoleted because their data dependencies have changed. Returns
True if files were updated,
False if no files in need of update were found.
Return the timestamp of the first file version commit, catching the error if the event does not have versioning initialized/has no versions and returning
A single directory containing multiple event directories combined with a pipeline (i.e. a selection of analysis steps to use) and a set of downselection criteria for picking events:
Run Directory
├─ Event Directory 1
├─ Event Directory 2
└─ Event Directory 3
This should ordinarily correspond to a run of some sort (an observing run, engineering run, offline run, test run, etc.) where the events are somehow related. Since this class mostly just provides methods for organizing and selecting
Event instances with tailored
Pipeline instances, it’s up to you to decide how to best organize a run. Run objects are immutable to simplify hashing and uniqueness checks.
These tools allow the user to conveniently check on the status of all events in a given
Run. A dictionary of downselection arguments (as fed to
downselect) can be used to restrict the set of events that will be returned
rundir (str) – The directory where all events are stored. Files for individual events are stored in per-event subdirectories of
rundir. Will be converted to a canonical path with
os.path.realpath to help ensure unique
pipeline (llama.pipeline.Pipeline, optional) – A
FileHandler classes that should be used for this analysis. Defaults to the main pipeline in production use.
downselection (tuple, optional) – A tuple of dictionaries of keyword arguments of the type passed to
downselect. The events returned by
events will match these downselection criteria with each downselection dict applied in the order they appear in this argument (to allow more complex chained downselections). You probably don’t want to manually specify this; a more pythonic way to provide downselection arguments is to use the
downselect method to return a downselection from a starting
Run instance identical to the current one but with the following downselection criteria applied to the
Event instances returned by
self.events. Can also specify a sorting function and a maximum number of returned values:
invert (bool, optional) – Invert what matches and what doesn’t. Default: False
eventid_filter (str, optional) – A glob (as taken by
fnmatch) that the
fileexists (str, optional) – The event directory contains a file with this name.
fhexists (llama.filehandler.FileHandler, optional) – The eventdir contains the file for this FileHandler.
fhnameexists (str, optional) – The eventdir contains the file for the FileHandler with this name.
fhmeta (llama.filehandler.FileHandler, optional) – The eventdir contains a metadata rider for the file for this FileHandler.
fhnamemeta (str, optional) – The eventdir contains a metadata rider for the file for the FileHandler with this name.
vetoed (bool, optional) – Whether the events have been vetoed by the VETOED flag or not.
manual (bool, optional) – Whether the events have been marked as manual by the MANUAL flag.
modbefore (float, optional) – Select events whose directory modtimes were before this timestamp.
modafter (float, optional) – Select events whose directory modtimes were after this timestamp.
sec_since_mod_gt (float, optional) – Select events whose directory modtimes are more than this many seconds ago.
sec_since_mod_lt (float, optional) – Select events whose directory modtimes are less than this many seconds ago.
v0before (float, optional) – Select events whose first event state version was generated before this timestamp. Will IGNORE directories that do not have any versioned files.
v0after (float, optional) – Select events whose first event state version was generated after this timestamp. Will IGNORE directories that do not have any versioned files.
sec_since_v0_gt (float, optional) – Select events whose first event state version was generated more than this many seconds ago. Will IGNORE directories that do not have any versioned files.
sec_since_v0_lt (float, optional) – Select events whose first event state version was generated less than this many seconds ago. Will IGNORE directories that do not have any versioned files.
sortkey (function, optional) – A function taking
Event instances that can be passed to
sorted to sort the downselected
Event instances. Default: None (i.e. no sorting)
reverse (bool, optional) – Whether to reverse the order of sorting (i.e. put the results in descending order) before applying
limit. Default: False
limit (int, optional) – Return up to this number of events. Most useful if
sortkey has also been provided. Default: None (i.e. no limit)
Run instance with a pipeline that has been downselected using
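To make the interplay of filtering, inversion, sorting, and limiting concrete, here is a minimal sketch covering a few of the criteria listed above (eventid_filter, modbefore, modafter, invert, sortkey, reverse, limit). EventStub and the free function downselect are hypothetical stand-ins, not the library's classes, and the assumption that invert flips the combined result of all checks is ours.

```python
import fnmatch
from dataclasses import dataclass


@dataclass(frozen=True)
class EventStub:
    """Hypothetical stand-in for an Event: an ID and a directory modtime."""
    eventid: str
    modtime: float


def downselect(events, eventid_filter=None, modbefore=None, modafter=None,
               invert=False, sortkey=None, reverse=False, limit=None):
    """Return the events passing every given check (or failing, if invert)."""
    def matches(event):
        checks = []
        if eventid_filter is not None:
            # fnmatchcase avoids platform-dependent case normalization.
            checks.append(fnmatch.fnmatchcase(event.eventid, eventid_filter))
        if modbefore is not None:
            checks.append(event.modtime < modbefore)
        if modafter is not None:
            checks.append(event.modtime > modafter)
        return all(checks)

    # Assumption: invert flips the outcome of the combined checks.
    selected = [e for e in events if matches(e) != invert]
    if sortkey is not None:
        # reverse is applied while sorting, i.e. before the limit is taken.
        selected = sorted(selected, key=sortkey, reverse=reverse)
    return selected if limit is None else selected[:limit]
```

Applying the limit last is what makes limit most useful together with sortkey: sorting by modtime with reverse=True and limit=1 yields the single most recently modified match.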
Return a list of events in this run directory with
self.downselection criteria applied (see
downselect for a list of possible downselection criteria).
sortkey (function, optional) – A sorting key (as passed to
sorted) to use to sort the returned events. If none is provided, the events will be sorted based on astrophysical event time using
Event.gpstime; beware that an error will be raised if this quantity is ill-defined for ANY of the returned events.
reverse (bool, optional) – Whether to reverse the default sort order, i.e. put in descending order.
True by default so that the most recently occurring events are first in the list.
Get a list of
Event instances matching this
Run instance’s downselection criteria and update each event directory. Run until all events are up-to-date. Will queue files from each event that are ready to update, allowing them to be handled in parallel, and will track outstanding jobs. Optionally specify a
FileGraph.downselect downselection argument to pass to each
FileGraph being updated (default is to regenerate all files needing regeneration). Be careful with this argument, as it will cause file generation attempts for matching files without checking whether they need to be generated.
A collection of visualization methods for this
A pipeline specifies a set of data inputs, the intermediate data products derived from them, and the functions used to generate those products, arranged as a Directed Acyclic Graph (DAG); these products are bundled into FileHandlers. FileHandlers are graph nodes with
DEPENDENCIES (edges) specified. A Pipeline DAG can be built purely by specifying the FileHandlers it contains, which can be done trivially and clearly at the file-system level by putting the FileHandler code into a single directory for each pipeline.
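The idea that a set of classes with DEPENDENCIES attributes fully determines a DAG can be sketched with the standard library's graphlib (Python 3.9+). The classes below (FileHandlerStub, Trigger, Skymap, Neutrinos, Coincidence) are hypothetical stand-ins invented for this example; only the DEPENDENCIES attribute name comes from the text above.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class FileHandlerStub:
    """Hypothetical stand-in for a FileHandler: a node with DEPENDENCIES."""
    DEPENDENCIES = ()


class Trigger(FileHandlerStub):
    pass


class Skymap(FileHandlerStub):
    DEPENDENCIES = (Trigger,)


class Neutrinos(FileHandlerStub):
    DEPENDENCIES = (Trigger,)


class Coincidence(FileHandlerStub):
    DEPENDENCIES = (Skymap, Neutrinos)


def generation_order(handlers):
    """Return handler names ordered so dependencies always come first."""
    # Map each node to the set of nodes it depends on (its predecessors).
    graph = {cls: set(cls.DEPENDENCIES) for cls in handlers}
    # static_order raises graphlib.CycleError if the graph is not acyclic.
    return [cls.__name__ for cls in TopologicalSorter(graph).static_order()]


order = generation_order([Coincidence, Skymap, Neutrinos, Trigger])
```

Because the edges live on the classes themselves, no separate graph description is needed: listing the classes is enough to recover a valid generation order.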
kwargs (dict) – Names of
FileHandler classes mapped to the classes themselves.
args (array-like) –
FileHandler will be used as the key.
pipeline – A new
Pipeline instance containing all of the
FileHandler classes specified in
- Return type
TypeError – If there are any name collisions between classes in the input
kwargs, if any of the
FileHandler classes it contains are abstract (non-implemented) classes, or if any of the
FileHandler classes it contains have missing
Check whether two
Pipeline instances use the same keys to describe the same
FileHandler classes, raising a
ValueError if they don’t.
dependency_graph(outfile: str = None, title: str = 'Pipeline', url: function = None, bgcolor: str = 'black')
Return a graphviz .dot graph of
DEPENDENCIES between file handlers in this pipeline. Optionally plot the graph to an output image file visualizing the graph.
Optional file extensions for outfile:
dot: just save the dotfile in .dot format.
png: save the image in PNG format.
pdf: save the image in PDF format.
svg: save the image in SVG format.
outfile (str, optional) – If not provided, return a string in
.dot file format specifying graph relations. If an output file is specified, infer the filetype and write to that file.
title (str, optional) – The title of the pipeline graph plot.
url (FunctionType, optional) – A function taking
FileHandler classes as input and returning a URL that will be added to each
FileHandler class’s node in the output graph. Allows you to add links. If not included, URLs will not be included.
bgcolor (str, optional) – The background color to use for the generated plot.
dot – The dependency graph in
.dot format (can be used as input to
dot at the command line). This is returned regardless of whether an outfile is specified.
- Return type
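For readers unfamiliar with the .dot format, the sketch below builds a small Graphviz digraph string from a plain mapping of node names to dependency names. It is an illustration under stated assumptions, not the method's implementation: the function name to_dot, the dict-based input, and the choice to draw edges from dependency to dependent are all ours. Rendering to PNG, PDF, or SVG (which requires Graphviz) and the url option are not covered.

```python
def to_dot(dependencies, title="Pipeline", bgcolor="black", outfile=None):
    """Return a .dot digraph for a {node: [dependency, ...]} mapping.

    If outfile is given, the same .dot text is also written to that path.
    """
    lines = [f'digraph "{title}" {{', f'    bgcolor="{bgcolor}";']
    for node in sorted(dependencies):
        # Declare every node so isolated nodes still appear in the graph.
        lines.append(f'    "{node}";')
        for dep in sorted(dependencies[node]):
            # Assumption: edges point from a dependency to its dependent.
            lines.append(f'    "{dep}" -> "{node}";')
    lines.append("}")
    dot = "\n".join(lines)
    if outfile is not None:
        with open(outfile, "w") as f:
            f.write(dot)
    # The string is returned whether or not an outfile was written.
    return dot
```

The returned text can be saved to a file and passed to dot at the command line to produce an image.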
downselect(invert=False, reducer=<built-in function all>, **kwargs)
FileHandler classes match ALL the given query parameters.
invert (bool, optional) – Invert results. (Default: False)
reducer (function, optional) – Specify
any builtin to match if any check passes. Specify
all to match only when every check passes. (Default:
type (type, optional) – The type of the
FileHandler must exactly match the given
typename (str, optional) – The
FileHandler type’s name must match this string.
subclass (type, optional) – The
FileHandler must be a subclass of this
subgraph (type, optional) – The
FileHandler must be either this
FileHandler or one of its
UR_DEPENDENCIES; use this to make a
Pipeline that only generates the subgraph leading to this
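The role of the reducer argument is easiest to see in a small example. The sketch below implements only the typename and subclass checks over a plain dict of classes; the free function downselect and the stub classes (Base, JSONHandler, Skymap, Neutrinos) are hypothetical, and the real method may combine checks differently.

```python
class Base:
    """Hypothetical root class standing in for FileHandler."""


class JSONHandler(Base):
    pass


class Skymap(JSONHandler):
    pass


class Neutrinos(Base):
    pass


def downselect(handlers, invert=False, reducer=all, typename=None,
               subclass=None):
    """Return the {name: class} entries whose checks satisfy the reducer."""
    selected = {}
    for name, cls in handlers.items():
        checks = []
        if typename is not None:
            checks.append(cls.__name__ == typename)
        if subclass is not None:
            checks.append(issubclass(cls, subclass))
        # reducer is the builtin all (every check passes) or any (one does).
        if reducer(checks) != invert:
            selected[name] = cls
    return selected


handlers = {"Skymap": Skymap, "Neutrinos": Neutrinos}
```

Note that with no checks given, all([]) is True while any([]) is False, so in this sketch an empty query keeps everything under the default reducer and nothing under any.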
Return a FileHandlerMap with FileHandler instances sharing the same initialization arguments, e.g. for FileHandler instances that all refer to the same event.
Create a pipeline by extracting all FileHandler objects from a given submodule.