llama package

The Gravitational Wave/High Energy Neutrino (LLAMA) analysis pipeline.

class llama.Event

Bases: llama.event.EventTuple, llama.flags.FlagsMixin, llama.versioning.GitDirMixin

An event object, used to update and access data associated with a specific trigger (which receives its own directory, or eventdir). The constructor can also take a FileHandler or another Event (or anything with ‘eventid’ and ‘rundir’ properties) as an input argument, in which case it will correspond to the same event as the object provided as an argument. One can also provide a module or dictionary containing FileHandlers, which will be used to create a FileGraph for the Event (i.e. it will specify which files should be made for this event). Defaults to the files module for now, though eventually this should be refactored out.

Parameters
  • eventid_or_event (str or EventTuple or llama.filehandler.FileHandlerTuple) – This can be a string with the unique ID of this event (which can simply be a filename-friendly descriptive string for tests or manual analyses), in which case the next arguments, rundir and pipeline, will be used; OR, alternatively, it can be an EventTuple (e.g. another Event instance), FileHandlerTuple (e.g. any FileHandler instance), or any object with valid eventid and rundir attributes. In this case, those attributes from the provided object will be re-used, and the rundir argument will be ignored. This makes it easy to get a new Event instance describing the same underlying event but with a different Pipeline specified, or alternatively to get the Event corresponding to a given FileHandler (though in this case you should take care to manually specify the Pipeline you want to use!).

  • rundir (str, optional) – The rundir, i.e. the directory where all events for a given run are stored, if it differs from the default and is not specified by eventid_or_event.

  • pipeline (llama.pipeline.Pipeline, optional) – The pipeline, i.e. the set of FileHandlers that we want to generate, if it differs from the default pipeline. If none is provided, use DEFAULT_PIPELINE.

Returns

event – A new Event instance with the given properties.

Return type

Event

Raises

ValueError – If the eventid_or_event argument does not conform to the above expectations or if the rundir directory for the run does not exist, a ValueError with a descriptive message will be thrown.
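The argument handling described above can be pictured with a small stand-in. This is a sketch only, not LLAMA's implementation: resolve_event_args, DEFAULT_RUNDIR and the EventTuple defined here are hypothetical names introduced for illustration, and pipelines are represented by plain strings.

```python
import os
import tempfile
from collections import namedtuple

# Hypothetical stand-ins; the real llama classes carry more state than this.
EventTuple = namedtuple("EventTuple", ("eventid", "rundir", "pipeline"))
DEFAULT_RUNDIR = tempfile.gettempdir()  # assumption: any existing directory


def resolve_event_args(eventid_or_event, rundir=None, pipeline=None):
    """Return an EventTuple from a string ID or an event-like object."""
    if isinstance(eventid_or_event, str):
        eventid = eventid_or_event
        rundir = DEFAULT_RUNDIR if rundir is None else rundir
    elif hasattr(eventid_or_event, "eventid") and hasattr(eventid_or_event, "rundir"):
        # Reuse the identity of the object passed in; the rundir argument is ignored.
        eventid = eventid_or_event.eventid
        rundir = eventid_or_event.rundir
    else:
        raise ValueError("expected a str or an object with eventid and rundir")
    if not os.path.isdir(rundir):
        raise ValueError(f"rundir does not exist: {rundir}")
    return EventTuple(eventid, rundir, pipeline)


first = resolve_event_args("test-event", pipeline="pipeline_a")
# Same underlying event, but with a different pipeline specified.
second = resolve_event_args(first, pipeline="pipeline_b")
```

Passing an existing event-like object is what makes it cheap to look at the same event under a different pipeline.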

property auxiliary_paths

Names of possible auxiliary paths in the directory that are used to track the state of the Event as a whole.

change_time()

The time at which the permissions of this event directory were last changed (according to the underlying storage system). Note that you probably are more interested in modification_time.
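The difference between a change time and a modification time can be seen directly with os.stat. The sketch below assumes these methods report the standard filesystem timestamps, which the description implies but does not state; eventdir here is just a temporary directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as eventdir:
    info = os.stat(eventdir)
    # On Unix, st_ctime is the last metadata change (permissions, ownership).
    # On Windows it is the creation time instead.
    changed = info.st_ctime
    # st_mtime is the last time the directory's entries were added or removed.
    modified = info.st_mtime
```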

clone(commit='HEAD', rundir=None, clobber=False)

Make a clone of this event in a temporary directory for quick manipulations on a specific version of a file.

Parameters
  • commit (str, optional) – The commit hash to check out when cloning this event. If not specified, the most recent commit will be used. Unsaved changes will be discarded.

  • rundir (str, optional) – The run directory in which to store the cloned event. If not specified, a temporary directory will be created and used. The contents of this directory will NOT be deleted automatically.

  • clobber (bool, optional) – Whether this cloned event should overwrite existing state.

Returns

  • clone_event (llama.event.Event) – A clone of this event. The full history is saved, but the specified commit is checked out. Any uncommitted changes in the working directory will not be copied over to the clone_event. If clone_event already seems to be a valid event with the correct commit hash, no further action will be taken (thus repeated cloning has little performance penalty).

Raises

  • llama.versioning.GitRepoUninitialized – If this is called on an Event that has not had its git history initialized.

  • IOError – If this event already exists in the specified rundir and is checked out to a different hash, unless clobber is True, in which case that working directory will be deleted and replaced with the desired commit.

compare_contents(other)

Compare the file contents of this event to another event using filecmp.cmpfiles (though results are given as FileHandler instances rather than file paths). Use this to see whether two event directories contain the same contents under a given pipeline.

Parameters

other (Event, str) – The other Event instance to compare this one to, or else a directory containing files that can be compared to this Event (though in that case the filenames must still follow the expected format).

Returns

  • match (FileGraph) – A FileGraph for this Event whose files have the same contents as those corresponding to the other event.

  • mismatch (FileGraph) – A FileGraph for this Event whose files have contents that differ from those corresponding to the other event.

  • errors (FileGraph) – A FileGraph for this Event whose corresponding files do not exist or otherwise could not be accessed for comparison (either for the files corresponding to this Event or the other one).

Raises

ValueError – If the Pipeline instances of this Event and the other one are not equal, it does not make sense to compare them, and a ValueError will be raised.
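The three return values mirror what filecmp.cmpfiles itself returns. The following sketch calls the standard-library function directly on two temporary directories; the file names are made up for illustration, and the results are plain file names rather than FileGraph instances.

```python
import filecmp
import os
import tempfile


def write(directory, name, text):
    """Helper for the example: create a small text file."""
    with open(os.path.join(directory, name), "w") as handle:
        handle.write(text)


with tempfile.TemporaryDirectory() as this_dir, tempfile.TemporaryDirectory() as other_dir:
    write(this_dir, "skymap.json", "same")
    write(other_dir, "skymap.json", "same")
    write(this_dir, "neutrinos.json", "old contents")
    write(other_dir, "neutrinos.json", "new")
    write(this_dir, "summary.txt", "only present here")

    names = ["skymap.json", "neutrinos.json", "summary.txt"]
    # shallow=False compares file contents instead of trusting os.stat signatures.
    match, mismatch, errors = filecmp.cmpfiles(this_dir, other_dir, names, shallow=False)
```

A file that is missing on either side lands in errors rather than in mismatch.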

property cruft_files

Return a list of files in the event directory that are not associated with any file handler nor with event state directories.

property eventdir

The full path to the directory containing files related to this event.

exists()

Check whether this event already exists.

property files

Get a FileGraph full of FileHandler instances for the files in this event with this particular pipeline.

classmethod fromdir(eventdir='.', **kwargs)

Initialize an event just by providing a filepath to its event directory. If no directory is specified, default to the current directory and try to treat that like an event. Note that the returned event will eliminate symbolic links when determining paths for rundir and eventid. Useful for quickly making events during interactive work.

Parameters
  • eventdir (str, optional) – The event directory from which to initialize a new event.

  • **kwargs – Remaining keyword arguments to pass to Event().
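The note about symbolic links can be illustrated with os.path.realpath. This is a minimal sketch assuming that rundir is the parent of the resolved event directory and eventid is its final path component; split_eventdir is a hypothetical helper, not part of llama.

```python
import os
import tempfile


def split_eventdir(eventdir="."):
    """Resolve symlinks, then split the path into (rundir, eventid)."""
    resolved = os.path.realpath(eventdir)
    return os.path.dirname(resolved), os.path.basename(resolved)


with tempfile.TemporaryDirectory() as tmp:
    rundir = os.path.join(tmp, "run")
    real_eventdir = os.path.join(rundir, "event-001")
    os.makedirs(real_eventdir)
    link = os.path.join(tmp, "latest")
    os.symlink(real_eventdir, link)  # may need extra privileges on Windows
    found_rundir, found_eventid = split_eventdir(link)
    expected_rundir = os.path.realpath(rundir)
```

The link name "latest" does not survive: the event is identified by the directory the link points to.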

gpstime()

Return the GPS time of this event. Returns -1 if none can be parsed.

init()

Initialize the directory for this event, making sure it is in a proper state for processing data. Make sure the eventdir exists by creating it if necessary. Also initializes version control and sets flags to the defaults specified in FlagsMixin.DEFAULT_FLAGS (which Event inherits).

Returns

Returns this Event instance to allow command chaining.

Return type

self

Raises

ValueError – If the eventdir path exists but is not a directory or a link to a directory, since we don’t want to overwrite it to make the directory.
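The directory check can be sketched with the standard library alone. ensure_eventdir is a hypothetical helper that covers only the directory part of init; version control and flags are left out.

```python
import os
import tempfile


def ensure_eventdir(eventdir):
    """Create eventdir if needed; refuse to replace a non-directory."""
    # os.path.isdir follows symlinks, so a link to a directory is accepted.
    if os.path.lexists(eventdir) and not os.path.isdir(eventdir):
        raise ValueError(f"{eventdir} exists but is not a directory")
    os.makedirs(eventdir, exist_ok=True)
    return eventdir


with tempfile.TemporaryDirectory() as rundir:
    created = ensure_eventdir(os.path.join(rundir, "event-001"))
    created_ok = os.path.isdir(created)

    blocker = os.path.join(rundir, "not-a-dir")
    open(blocker, "w").close()  # a regular file where a directory is expected
    try:
        ensure_eventdir(blocker)
        refused = False
    except ValueError:
        refused = True
```

Returning the path from the helper plays the same role as returning self from init: the call can be chained.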

modification_time()

The time at which this event directory was modified (according to the underlying storage system).

printstatus(cruft=False, highlight=None, unicode=True, plot=None)

Get a user-readable message indicating the current status of this event. Include a list of files not in the selected pipeline with cruft=True. Bold lines in the summary table containing strings in highlight as substrings. Use nice unicode characters and terminal colors with unicode=True, or use plain ascii with unicode=False. Include a status graph plot with plot=True, or exclude it with plot=False; if plot=None, include the plot only if the underlying Graph::Easy Perl library is available on the host.

save_tarball(outfile)

Save this event and all its contents as a gzipped tarball. You should probably use a .tar.gz extension for the outfile name.
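Writing a gzipped tarball of a directory takes only a few lines with the tarfile module. The function below is a stand-alone sketch that takes the event directory explicitly; how the method lays out paths inside the archive is an assumption.

```python
import os
import tarfile
import tempfile


def save_tarball(eventdir, outfile):
    """Write eventdir and everything under it to a gzipped tarball."""
    with tarfile.open(outfile, "w:gz") as archive:
        # arcname keeps only the event directory name inside the archive.
        archive.add(eventdir, arcname=os.path.basename(os.path.normpath(eventdir)))


with tempfile.TemporaryDirectory() as tmp:
    eventdir = os.path.join(tmp, "event-001")
    os.makedirs(eventdir)
    with open(os.path.join(eventdir, "skymap.json"), "w") as handle:
        handle.write("{}")
    outfile = os.path.join(tmp, "event-001.tar.gz")
    save_tarball(eventdir, outfile)
    with tarfile.open(outfile, "r:gz") as archive:
        members = sorted(archive.getnames())
```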

update(**downselect)

Generate any files that fit the FileGraph downselection criteria specified in downselect. By default, generate all files that have not been generated and regenerate all files that have been obsoleted because their data dependencies have changed. Returns True if files were updated, False if no files in need of update were found.
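One way to picture the default behaviour is a loop that keeps generating products whose inputs are ready until nothing is left to do. This sketch is not LLAMA's algorithm: it works on an in-memory mapping of made-up product names, handles only missing products (obsolescence tracking is omitted), and the names update_products, dependencies and generated are hypothetical.

```python
def update_products(dependencies, generated):
    """Generate every missing product whose inputs are ready.

    dependencies maps each product name to the names it depends on.
    generated is the set of products that already exist; it is mutated.
    Returns True if anything was generated, False otherwise.
    """
    pending = {name for name in dependencies if name not in generated}
    updated = False
    progress = True
    while pending and progress:
        progress = False
        for name in sorted(pending):  # sorted() copies, so pending can shrink
            if all(dep in generated for dep in dependencies[name]):
                generated.add(name)  # stand-in for actually writing the file
                pending.discard(name)
                updated = progress = True
    return updated


dependencies = {
    "trigger": (),
    "skymap": ("trigger",),
    "neutrinos": ("trigger",),
    "significance": ("skymap", "neutrinos"),
}
generated = {"trigger"}
first_pass = update_products(dependencies, generated)
second_pass = update_products(dependencies, generated)
```

The second call finds nothing to do, which corresponds to the False return value described above.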

v0_time()

Return the timestamp of the first file version commit. If the event does not have versioning initialized or has no versions, the resulting error is caught and False is returned.

class llama.Run

Bases: llama.run.RunTuple

A single directory containing multiple event directories combined with a pipeline (i.e. a selection of analysis steps to use) and a set of downselection criteria for picking events:

Run Directory
├─ Event Directory 1
├─ Event Directory 2
└─ Event Directory 3

This should ordinarily correspond to a run of some sort (an observing run, engineering run, offline run, test run, etc.) where the events are somehow related. Since this class mostly just provides methods for organizing and selecting Event instances with tailored Pipeline instances, it’s up to you to decide how to best organize a run. Run objects are immutable to simplify hashing and uniqueness checks.

These tools allow the user to conveniently check on the status of all events in a given Run. A dictionary of downselection arguments (as fed to downselect) can be used to restrict the set of events that will be returned by events.

Parameters
  • rundir (str) – The directory where all events are stored. Files for individual events are stored in per-event subdirectories of rundir. Will be converted to a canonical path with os.path.realpath to help ensure unique Run definitions.

  • pipeline (llama.pipeline.Pipeline, optional) – A Pipeline instance holding FileHandler classes that should be used for this analysis. Defaults to the main pipeline in production use.

  • downselection (tuple, optional) – A tuple of dictionaries of keyword arguments of the type passed to downselect. The events returned by events will match these downselection criteria with each downselection dict applied in the order they appear in this argument (to allow more complex chained downselections). You probably don’t want to manually specify this; a more pythonic way to provide downselection arguments is to use the downselect method to return a downselection from a starting Run.
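The combination of immutability and chained downselections can be sketched with a NamedTuple. RunSketch is a hypothetical stand-in, not the real Run: it stores each downselection step as a sorted tuple of items rather than a dictionary so that the record stays hashable, and it represents the pipeline as a string.

```python
import os
from typing import NamedTuple


class RunSketch(NamedTuple):
    """Hypothetical stand-in for a Run: an immutable, hashable record."""

    rundir: str
    pipeline: str = "default"
    downselection: tuple = ()

    def downselect(self, **kwargs):
        # Never mutate: return a new record with one more step appended.
        step = tuple(sorted(kwargs.items()))
        return self._replace(downselection=self.downselection + (step,))


base = RunSketch(os.path.realpath("."))
recent = base.downselect(vetoed=False).downselect(limit=5)
```

Each call adds one step, so the order of the steps records the order in which the criteria should be applied.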

downselect(**kwargs)

Get another Run instance identical to the current one but with the following downselection criteria applied to the Event instances returned by self.events. Can also specify a sorting function and a maximum number of returned values:

Parameters
  • invert (bool, optional) – Invert what matches and what doesn’t. Default: False

  • eventid_filter (str, optional) – A glob (as taken by fnmatch) that the eventid must match.

  • fileexists (str, optional) – The event directory contains a file with this name.

  • fhexists (llama.filehandler.FileHandler, optional) – The eventdir contains the file for this FileHandler.

  • fhnameexists (str, optional) – The eventdir contains the file for the FileHandler with this name.

  • fhmeta (llama.filehandler.FileHandler, optional) – The eventdir contains a metadata rider for the file for this FileHandler.

  • fhnamemeta (str, optional) – The eventdir contains a metadata rider for the file for the FileHandler with this name.

  • vetoed (bool, optional) – Whether the events have been vetoed by the VETOED flag or not.

  • manual (bool, optional) – Whether the events have been marked as manual by the MANUAL flag.

  • modbefore (float, optional) – Select events whose directory modtimes were before this timestamp.

  • modafter (float, optional) – Select events whose directory modtimes were after this timestamp.

  • sec_since_mod_gt (float, optional) – Select events whose directory modtimes are more than this many seconds ago.

  • sec_since_mod_lt (float, optional) – Select events whose directory modtimes are less than this many seconds ago.

  • v0before (float, optional) – Select events whose first event state version was generated before this timestamp. Will IGNORE directories that do not have any versioned files.

  • v0after (float, optional) – Select events whose first event state version was generated after this timestamp. Will IGNORE directories that do not have any versioned files.

  • sec_since_v0_gt (float, optional) – Select events whose first event state version was generated more than this many seconds ago. Will IGNORE directories that do not have any versioned files.

  • sec_since_v0_lt (float, optional) – Select events whose first event state version was generated less than this many seconds ago. Will IGNORE directories that do not have any versioned files.

  • sortkey (function, optional) – A function taking Event instances that can be passed to sorted to sort the downselected Event instances. Default: None (i.e. no sorting)

  • reverse (bool, optional) – Whether to reverse the order of sorting (i.e. put the results in descending order) before applying limit. Default: False

  • limit (int, optional) – Return up to this number of events. Most useful if sortkey has also been provided. Default: None (i.e. no limit)
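A few of these criteria can be sketched on plain strings: glob matching with fnmatch, inversion, then sorting and truncation. The select function and the event IDs are made up for illustration, and the real sortkey receives Event instances rather than strings.

```python
from fnmatch import fnmatch


def select(eventids, eventid_filter="*", invert=False, sortkey=None,
           reverse=False, limit=None):
    """Filter IDs with a glob, then optionally sort and truncate."""
    # Comparing with invert flips which IDs are kept.
    chosen = [eid for eid in eventids if fnmatch(eid, eventid_filter) != invert]
    if sortkey is not None:
        chosen = sorted(chosen, key=sortkey, reverse=reverse)
    return chosen[:limit]  # a slice bound of None means no limit


ids = ["S190425z", "S190521g", "test-1", "S190814bv"]
newest_two = select(ids, "S19*", sortkey=str, reverse=True, limit=2)
everything_else = select(ids, "S19*", invert=True)
```

Applying limit after sorting is what makes it most useful together with sortkey.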

downselect_pipeline(invert=False, **kwargs)

Return a Run instance with a pipeline that has been downselected using Pipeline.downselect.

property events

Return a list of events in this run directory with self.downselection criteria applied (see downselect for a list of possible downselection criteria).

Parameters
  • sortkey (function, optional) – A sorting key (as passed to sorted) to use to sort the returned events. If none is provided, the events will be sorted based on astrophysical event time using Event.gpstime; beware that an error will be raised if this quantity is ill-defined for ANY of the returned events.

  • reverse (bool, optional) – Whether to reverse the default sort order, i.e. put in descending order. True by default so that the most-recently-occurring events are first in the list.

update(**downselect)

Get a list of Event instances matching this Run instance’s downselection criteria and update each event directory. Run until all events are up-to-date. Will queue files from each event that are ready to update, allowing them to be handled in parallel, and will track outstanding jobs. Optionally specify a FileGraph.downselect downselection argument to pass to each FileGraph being updated (default is to regenerate all files needing regeneration). Be careful with this argument, as it will cause file generation attempts for matching files without checking whether they need to be generated.

property vis

A collection of visualization methods for this Run instance.

class llama.Pipeline

Bases: llama.classes.ImmutableDict, llama.classes.NamespaceMappable

A pipeline specifies a set of data inputs, the intermediate data products derived from them, and the functions used to generate those products, arranged as a Directed Acyclic Graph (DAG); these products are bundled into FileHandlers. FileHandlers are graph nodes with DEPENDENCIES (edges) specified. A Pipeline DAG can be built purely by specifying its FileHandlers, which can be done trivially and clearly at the file-system level by putting the FileHandler code for each pipeline into a single directory.

Parameters
  • kwargs (dict) – Names of FileHandler classes mapped to the classes themselves.

  • args (array-like) – FileHandler classes. The __name__ of each FileHandler will be used as the key.

Returns

pipeline – A new Pipeline instance containing all of the FileHandler classes specified in args and kwargs.

Return type

Pipeline

Raises

TypeError – If there are any name collisions between classes in the input args and kwargs, if any of the FileHandler classes it contains are abstract (non-implemented) classes, or if any of the FileHandler classes it contains have missing required_attributes.
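The construction rules can be sketched with a plain function. This is an illustration under stated assumptions rather than the Pipeline constructor: build_pipeline, Handler and SkyMap are hypothetical names, a read-only MappingProxyType stands in for the immutable mapping, and the required_attributes check is left out.

```python
import inspect
from abc import ABC, abstractmethod
from types import MappingProxyType


def build_pipeline(*args, **kwargs):
    """Map names to classes, rejecting collisions and abstract classes."""
    pipeline = dict(kwargs)
    for cls in args:
        if cls.__name__ in pipeline:
            raise TypeError(f"name collision: {cls.__name__}")
        pipeline[cls.__name__] = cls
    for name, cls in pipeline.items():
        if inspect.isabstract(cls):
            raise TypeError(f"{name} is abstract")
    return MappingProxyType(pipeline)  # read-only view of the mapping


class Handler(ABC):  # hypothetical abstract base class
    @abstractmethod
    def generate(self): ...


class SkyMap(Handler):
    def generate(self):
        return "skymap"


def rejected(*args, **kwargs):
    """Report whether build_pipeline refuses these inputs."""
    try:
        build_pipeline(*args, **kwargs)
    except TypeError:
        return True
    return False


pipeline = build_pipeline(SkyMap)
collision_rejected = rejected(SkyMap, SkyMap=SkyMap)
abstract_rejected = rejected(Handler)
```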

check_consistency(other)

Check whether two Pipeline instances use the same keys to describe the same FileHandler classes, raising a ValueError if they don’t.

dependency_graph(outfile: str = None, title: str = 'Pipeline', url: function = None, bgcolor: str = 'black')

Return a graphviz .dot graph of DEPENDENCIES between file handlers in this pipeline. Optionally plot the graph to an output image file visualizing the graph.

Optional file extensions for outfile:

  • dot: just save the dotfile in .dot format.

  • png: save the image in PNG format.

  • pdf: save the image in PDF format.

  • svg: save the image in SVG format.

Parameters
  • outfile (str, optional) – If not provided, return a string in .dot file format specifying graph relations. If an output file is specified, infer the filetype and write to that file.

  • title (str, optional) – The title of the pipeline graph plot.

  • url (FunctionType, optional) – A function taking FileHandler classes as input and returning a URL that will be added to each FileHandler class’s node in the output graph. Allows you to add links. If not included, URLs will not be included.

  • bgcolor (str, optional) – The background color to use for the generated plot.

Returns

dot – The dependency graph in .dot format (can be used as input to dot at the command line). This is returned regardless of whether an outfile is specified.

Return type

str
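To show what a .dot string for a dependency graph looks like, here is a minimal sketch that renders a mapping of made-up node names. It is not the method's actual output format: to_dot is a hypothetical function, and the edge direction (dependency pointing to dependent) is an assumption.

```python
def to_dot(dependencies, title="Pipeline", bgcolor="black"):
    """Render a {node: [dependencies]} mapping as Graphviz .dot text."""
    lines = ["digraph {", f'    label="{title}";', f'    bgcolor="{bgcolor}";']
    for node in sorted(dependencies):
        lines.append(f'    "{node}";')
        for dep in sorted(dependencies[node]):
            # Assumed convention: arrows run from an input to what it feeds.
            lines.append(f'    "{dep}" -> "{node}";')
    lines.append("}")
    return "\n".join(lines)


deps = {"Trigger": [], "SkyMap": ["Trigger"], "Significance": ["SkyMap"]}
dot = to_dot(deps)
```

Rendering the text to PNG, PDF or SVG would additionally require Graphviz, which the sketch leaves out.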

downselect(invert=False, reducer=<built-in function all>, **kwargs)

Return a Pipeline instance whose FileHandler classes match the given query parameters. By default a class must match ALL of them; see reducer.

Parameters
  • invert (bool, optional) – Invert results. (Default: False)

  • reducer (function, optional) – Specify the builtin any to match if any check passes. Specify all to match only when every check passes. (Default: all)

  • type (type, optional) – The type of the FileHandler must exactly match the given FileHandler.

  • typename (str, optional) – The FileHandler type’s name must match this string.

  • subclass (type, optional) – The FileHandler must be a subclass of this FileHandler.

  • subgraph (type, optional) – The FileHandler must be either this FileHandler or one of its UR_DEPENDENCIES; use this to make a Pipeline that only generates the subgraph leading to this FileHandler.
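The interplay of reducer, invert and subgraph can be sketched on a mapping of made-up node names to their direct dependencies. The functions below are hypothetical illustrations that support only two of the criteria, and they assume that UR_DEPENDENCIES means all direct and indirect dependencies.

```python
def ancestors(name, dependencies):
    """Return name plus everything it depends on, directly or indirectly."""
    seen = set()
    stack = [name]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(dependencies[current])
    return seen


def downselect(dependencies, invert=False, reducer=all, typename=None, subgraph=None):
    """Keep nodes for which reducer(checks) is true (or false if invert)."""
    keep = {}
    for name, deps in dependencies.items():
        checks = []
        if typename is not None:
            checks.append(name == typename)
        if subgraph is not None:
            checks.append(name in ancestors(subgraph, dependencies))
        if reducer(checks) != invert:
            keep[name] = deps
    return keep


deps = {
    "Trigger": [],
    "SkyMap": ["Trigger"],
    "Neutrinos": ["Trigger"],
    "Significance": ["SkyMap", "Neutrinos"],
}
leading_to_skymap = downselect(deps, subgraph="SkyMap")
either = downselect(deps, reducer=any, typename="Neutrinos", subgraph="SkyMap")
both = downselect(deps, typename="Neutrinos", subgraph="SkyMap")
```

With the default reducer no node satisfies both checks here, so both is empty, while reducer=any keeps every node that satisfies at least one.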

file_handler_instances(*args, **kwargs)

Return a FileHandlerMap with FileHandler instances sharing the same initialization arguments, e.g. for FileHandler instances that all refer to the same event.

classmethod from_module(module)

Create a pipeline by extracting all FileHandler objects from a given submodule.
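Extracting classes from a module can be done with inspect.getmembers. The sketch below builds a throwaway module so that it runs without any package installed; FileHandler here is a hypothetical empty base class, and the rule used to decide what counts as a FileHandler (any subclass other than the base itself) is an assumption.

```python
import inspect
import types


class FileHandler:  # hypothetical base class for the sketch
    pass


def classes_from_module(module, base=FileHandler):
    """Collect subclasses of base found among the module's attributes."""
    return {
        name: obj
        for name, obj in inspect.getmembers(module, inspect.isclass)
        if issubclass(obj, base) and obj is not base
    }


# Build a throwaway module so the example does not depend on any package.
demo = types.ModuleType("demo_handlers")
demo.SkyMap = type("SkyMap", (FileHandler,), {})
demo.Neutrinos = type("Neutrinos", (FileHandler,), {})
demo.helper = lambda: None       # not a class, so it is ignored
demo.FileHandler = FileHandler   # the base itself is skipped

found = classes_from_module(demo)
```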