llama.filehandler package

Abstract definitions of FileHandler classes. FileHandlers provide methods for defining and working with data associated with a trigger.

class llama.filehandler.EventTriggeredFileHandler

Bases: llama.filehandler.FileHandler

A data file that is downloaded or received as the result of an event notification from some external source, e.g. a post request or a GCN notification. The generate() method should be able to take any arguments necessary for the creation of this file, since this method is likely to be invoked by an event-handler script. These FileHandler classes should not have DEPENDENCIES because they are created by events external to the dependency graph.

DEPENDENCIES = ()

class llama.filehandler.FileGraph

Bases: llama.classes.ImmutableDict, llama.classes.NamespaceMappable

Used to store a list of FileHandler instances, e.g. those associated with a particular GraceDB event. In that example, one might access the LvcSkymapHdf5 file handler associated with some event using

> event.files.LvcSkymapHdf5

It can take a dictionary as an initialization argument and create a map of that dictionary’s key-value pairs that is accessible with dot notation.
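The dot-notation access described above can be sketched with a small read-only mapping. This is a minimal, standard-library-only sketch; `DotMap` is a hypothetical name and not llama's actual `ImmutableDict`/`NamespaceMappable` implementation.

```python
from collections.abc import Mapping
from types import MappingProxyType


class DotMap(Mapping):
    """Hypothetical read-only mapping whose keys can also be read as attributes."""

    def __init__(self, data=None, **kwargs):
        # Copy the input so later changes to the caller's dict are not visible here.
        object.__setattr__(self, "_store", MappingProxyType(dict(data or {}, **kwargs)))

    def __getitem__(self, key):
        return self._store[key]

    def __iter__(self):
        return iter(self._store)

    def __len__(self):
        return len(self._store)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so fall back to the keys.
        if name == "_store":
            raise AttributeError(name)
        try:
            return self._store[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        raise AttributeError(f"{type(self).__name__} is immutable")


files = DotMap({"LvcSkymapHdf5": "skymap handler", "SkymapInfo": "info handler"})
skymap = files.LvcSkymapHdf5  # same object as files["LvcSkymapHdf5"]
```

Routing writes through `object.__setattr__` in `__init__` is what lets the class forbid every later assignment.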

dependency_graph(outfile: str = None, title: str = 'FileGraph', urls: str = None, bgcolor: str = 'black')

Return a graphviz .dot graph of the FileHandler instances in this FileGraph and their DEPENDENCIES on each other. Plot their status (whether the file exists) as well as their metadata. Optionally plot the graph to an output image file visualizing the graph.

Parameters
  • outfile (str, optional) – If not provided, return a string in .dot file format specifying graph relations. If an output file is specified, infer the filetype and write to that file.

  • title (str, optional) – The name of the graph to use. If not provided, “FileGraph” will be used.

  • urls (str, optional) – A format string for adding URL attributes to the FileHandler nodes. If provided, each node gets a URL attribute pointing to its FILENAME for use on the summary page website; the FILENAME of each FileHandler instance is passed to the urls format string as its only format argument. The URL attribute is not compatible with all graphviz output formats (http://www.graphviz.org/doc/info/attrs.html#d:URL), so leave it unset when publishing to an incompatible format.

  • bgcolor (str, optional) – The background color to use for the generated plot.

Returns

dot – The dependency graph in .dot format (can be used as input to dot at the command line). This is returned regardless of whether an outfile is specified.

Return type

str

Raises
  • ValueError – If the FileHandler instances in this FileGraph do not all refer to the same event.

Optional file extensions for outfile:
  • dot – just save the dotfile in .dot format.

  • png – save the image in PNG format.

  • pdf – save the image in PDF format.

  • svg – save the image in SVG format.
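To illustrate the kind of .dot text dependency_graph returns, here is a minimal sketch that builds a graph from a plain `{filename: (input filename, ...)}` mapping. The function name `to_dot`, the green/red status colours, and the mapping shape are assumptions for illustration; llama's real output includes more metadata and its own legend colours.

```python
def to_dot(deps, exists, title="FileGraph", urls=None, bgcolor="black"):
    """Hypothetical sketch: .dot text for a {filename: (input filename, ...)} mapping."""
    lines = [f'digraph "{title}" {{', f'    bgcolor="{bgcolor}";']
    for name in deps:
        # Hypothetical status colouring: green if the file exists, red otherwise.
        attrs = {"color": "green" if exists.get(name) else "red"}
        if urls is not None:
            # The filename is the format string's only argument.
            attrs["URL"] = urls.format(name)
        attr_text = ", ".join(f'{key}="{value}"' for key, value in attrs.items())
        lines.append(f'    "{name}" [{attr_text}];')
    for name, inputs in deps.items():
        for parent in inputs:
            # Edges run from each input file to the file generated from it.
            lines.append(f'    "{parent}" -> "{name}";')
    lines.append("}")
    return "\n".join(lines)


deps = {"skymap_info.json": (), "skymap.fits.gz": ("skymap_info.json",)}
dot = to_dot(deps, {"skymap_info.json": True}, urls="/events/S1234a/{}")
```

Saved to a file, text like this can be rendered with the Graphviz command-line tool, e.g. `dot -Tsvg graph.dot -o graph.svg`.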

dependency_graph_term(unicode=True, plot=None, highlight=None)

Like dependency_graph, but return a terminal-friendly text plot of this FileGraph with current statuses as a string. By default, uses unicode characters to print a more legible graph; use pure ascii by passing unicode=False. Highlight a specific line in the output table by specifying a substring in that row as highlight.

Skip the graph plot with plot=False; if plot=None, try to make the graph but give up if Graph::Easy is not installed.

The plot requires Perl and the Graph::Easy module to be installed (with the graph-easy script in the current path); if plot=True and graph-easy is not available, a FileNotFoundError will be raised.
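The three-way plot flag described above can be sketched with `shutil.which`, which searches the PATH for an executable. The helper name `resolve_plot` is hypothetical, and this sketch only decides whether to plot; it does not call graph-easy.

```python
import shutil


def resolve_plot(plot=None, executable="graph-easy"):
    """Hypothetical helper: decide whether to draw the text graph (plot=None/True/False)."""
    available = shutil.which(executable) is not None
    if plot is None:
        return available  # best effort: plot only if the script is on the PATH
    if plot and not available:
        raise FileNotFoundError(f"{executable} was not found on the PATH")
    return bool(plot)


missing = "no-such-graph-easy-script"  # hypothetical name that should not exist
best_effort = resolve_plot(None, executable=missing)
try:
    resolve_plot(True, executable=missing)
    raised = False
except FileNotFoundError:
    raised = True
```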

downselect(invert=False, reducer=<built-in function all>, **kwargs)

Get a FileGraph of file handlers that match all of the provided criteria. Checks in this docstring are defined in DOWNSELECT_CHECKS.

Parameters
  • invert (bool) – Invert results. (Default: False)

  • reducer (function) – Pass the any builtin to match if any check passes, or all to match only when every check passes. (Default: all)

  • equals (AbstractFileHandler or list) – The filehandler must equal the provided FileHandler instance or, if a list of instances is provided, it must be equal to one of them.

  • instanceof (type) – The FileHandler must be an instance of this class.

  • type (type) – The type of the FileHandler must exactly match.

  • typename (str or list) – The FileHandler type’s name must match this string (or one of these strings if given an iterable of strings).

  • extension (str or list) – The file extension for the file name (or one of these strings if given an iterable of strings).

  • startswith (str or list) – The file name starts with this string (or one of these strings if given an iterable of strings).

  • endswith (str or list) – The file name ends with this string (or one of these strings if given an iterable of strings).

  • inname (str or list) – The file name contains this substring (or one of these strings if given an iterable of strings).

  • nameis (str or list) – The filename equals this string (or one of these strings if given an iterable of strings).

  • depends (type or list) – Has this FileHandler class (or one of these FileHandler classes) as a dependency.

  • ancestor (type or list) – Has this FileHandler class (or one of these FileHandler classes) as an ur dependency, i.e. steps AFTER this FileHandler.

  • descendent (type or list) – Has this FileHandler class (or one of these FileHandler classes) as a descendent, i.e. steps BEFORE this FileHandler.

  • subgraph (type or list) – Same as descendent, but also include FileHandlers that are instances of the query arguments. Defines a subgraph of the original FileGraph graph containing only the given FileHandler instances and their DEPENDENCIES.

  • dependsname (str or list) – Has the FileHandler with this name (or one of these names if given an iterable of strings) as a dependency.

  • dependsfile (str or list) – Has the FileHandler with this filename (or one of these names if given an iterable of strings) as a dependency.

  • exists (bool) – Whether the returned FileHandler instances’ output files exist.

  • cooldown (bool) – Whether the returned FileHandler instances are currently cooling down.

  • intent (bool) – Whether the returned FileHandler instances are currently being generated.

  • vetoed (bool) – Whether the returned FileHandler instances have been permanently vetoed.

  • urvetoed (bool) – Whether the returned FileHandler instances have had any of their ancestors vetoed. Does not check the instances themselves (combine with vetoed for that).

  • obsolete (bool) – Whether the returned FileHandler instances’ output files exist but are obsolete because newer input data is available. NB: Obsolescence implies that the file exists! FileHandler instances whose files don’t exist will not be included.

  • selfobsolete (bool) – Same as obsolete, but don’t run obsolescence checks on each file’s ancestors; only run the checks relevant to each FileHandler instance itself.

  • depsmet (bool) – Whether the returned FileHandler instances’ DEPENDENCIES have been met. These can be generated immediately if they don’t exist.

  • needregen (bool) – Whether the returned FileHandler instances have their DEPENDENCIES met and either exist (but are obsolete) or have not been generated yet.

Returns

matches – A new FileGraph instance containing a subset of the FileHandler instances for this FileGraph that match the given downselection criteria.

Return type

FileGraph
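How reducer and invert combine the individual checks can be sketched on plain filenames instead of FileHandler instances. The `CHECKS` table is a hypothetical stand-in for DOWNSELECT_CHECKS that mirrors only three of the documented keywords, and its semantics are simplified assumptions.

```python
def _as_tuple(value):
    # Accept either a single string or an iterable of strings.
    return (value,) if isinstance(value, str) else tuple(value)


# Hypothetical check table in the spirit of DOWNSELECT_CHECKS, keyed by criterion name.
CHECKS = {
    "extension": lambda name, ext: name.rsplit(".", 1)[-1] in _as_tuple(ext),
    "startswith": lambda name, prefix: name.startswith(_as_tuple(prefix)),
    "inname": lambda name, sub: any(s in name for s in _as_tuple(sub)),
}


def downselect(names, invert=False, reducer=all, **criteria):
    """Keep names whose per-criterion results pass reducer (or fail it, if invert)."""
    kept = []
    for name in names:
        results = [CHECKS[key](name, value) for key, value in criteria.items()]
        if reducer(results) != invert:
            kept.append(name)
    return kept


names = ["skymap_info.json", "skymap.fits.gz", "coinc.xml"]
json_only = downselect(names, extension="json")
either = downselect(names, reducer=any, extension="xml", startswith="skymap_info")
not_json = downselect(names, invert=True, extension="json")
```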

status()

Return a status description of this graph for use in status-checking scripts and webpages. Values are taken from NODELEGEND_COLOR_FMT so that they can be colored appropriately.

update(**downselect)

Generate any files that fit the FileGraph downselection criteria specified in downselect. By default, generate all files that have not been generated and regenerate all files that have been obsoleted because their data DEPENDENCIES have changed. Only files that are immediately generatable will be generated; if some of the files you want depend on files that haven’t been generated yet, you’ll need to keep running this method until you’ve generated each successive layer of dependencies (and, finally, your target files). You can do this by running the method repeatedly until it returns False.

Parameters

**downselect – Keyword arguments that can be passed to Event.downselect to narrow down the set of FileHandler instances that should be generated.

Returns

files_submitted_for_generation – An iterable of FileHandler instances that have been checked out and submitted for generation.

Return type

Iterable
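The layer-by-layer pattern described above can be sketched with plain dictionaries standing in for the FileGraph. The `update` function, the `deps`/`exists` shapes, and the no-op generator are hypothetical. Like the real method, each call only handles files whose inputs already exist, and an empty (falsy) result signals that nothing more can be done.

```python
def update(deps, exists, generate):
    """Hypothetical sketch: generate each missing file whose inputs all exist."""
    ready = [
        name
        for name, inputs in deps.items()
        # Files with no inputs cannot be generated (see are_dependencies_met).
        if not exists[name] and inputs and all(exists[i] for i in inputs)
    ]
    for name in ready:
        generate(name)
        exists[name] = True
    return ready


deps = {"gcn": (), "skymap": ("gcn",), "plot": ("skymap",)}
exists = {"gcn": True, "skymap": False, "plot": False}
layers = []
while True:
    submitted = update(deps, exists, generate=lambda name: None)  # stand-in generator
    if not submitted:  # an empty result is falsy, so stop
        break
    layers.append(submitted)
```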

class llama.filehandler.FileGraphTuple(eventid, rundir, pipeline)

Bases: tuple

property eventid

Alias for field number 0

property pipeline

Alias for field number 2

property rundir

Alias for field number 1

class llama.filehandler.FileHandler

Bases: llama.classes.AbstractFileHandler, llama.classes.RequiredAttributeMixin, llama.intent.IntentMixin, llama.intent.CoolDownMixin, llama.versioning.GitDirMixin, llama.flags.FlagsMixin, llama.vetoes.VetoMixin, llama.lock.LockMixin, llama.meta.MetaDataMixin, llama.io.classes.IOMixin

A class for generating, opening, and checking existence of data files associated with these events. Specify the maximum amount of time that should be spent generating each file by setting the TIMEOUT attribute of the relevant implementation class. Instances are immutable other than their parent and graph attributes.

FileHandler instances can check whether their corresponding output data has been generated and stored. If the data in their input DEPENDENCIES have changed, they can dynamically check whether their corresponding outputs need to be regenerated. Use DEP_CHECKSUM_KWARGS to specify which subsets of input data are relevant to each FileHandler; these keyword arguments will be fed to the checksum methods of each dependency to see whether relevant subsets of input data have changed (causing the current version of the FileHandler instance to become obsolete and triggering automatic file regeneration on the next update).

The following class attributes must be defined in subclasses, either manually or programmatically (see: FileHandler.set_class_attributes and its implementations in subclasses).

FILENAME (str)

The base filename for this filehandler as it will appear in an event directory.

DEPENDENCIES (Tuple[FileHandler])

A tuple of other FileHandler subclasses whose data this FileHandler uses in order to generate its own output.

MANIFEST_TYPES (Tuple[FileHandler])

A tuple of other FileHandler subclasses that are generated at the same time as this one. In other words, running self.generate for any of the subclasses in MANIFEST_TYPES will produce all of the files in that tuple.

UR_DEPENDENCIES (Tuple[FileHandler])

A tuple of ur-dependencies, i.e. DEPENDENCIES (of DEPENDENCIES, etc.) of this FileHandler: the FileHandler classes whose data is ultimately used (after some number of steps in the DAG) to generate this FileHandler. It is ordered such that its files can be generated in order without encountering missing dependencies (i.e. items deepest in cls.UR_DEPENDENCY_TREE come first).

UR_DEPENDENCY_TREE (ImmutableDict)

A dict of all DEPENDENCIES of DEPENDENCIES going back to the original input files that are ultimately required to generate this file. Maps FileHandler classes to dictionaries of their own ancestry trees recursively starting at the current FileHandler. The deepest items in the dictionary are the furthest DEPENDENCIES up the dependency graph (and consequently the files that must be generated first in order for the shallower files to be generated). See FileHandler.UR_DEPENDENCIES for a flattened, ordered version.
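The relationship between the nested tree and the flattened, ordered tuple can be sketched with plain string names in place of FileHandler classes. The function names are hypothetical, and a post-order depth-first walk is used here as one way to place every file after its own inputs. llama may compute the order differently.

```python
def ur_dependency_tree(name, deps):
    """Hypothetical sketch: map each input of name to its own ancestry tree."""
    return {parent: ur_dependency_tree(parent, deps) for parent in deps[name]}


def ur_dependencies(name, deps):
    """Flatten the tree so that every file appears after all of its own inputs."""
    order = []

    def visit(node):
        for parent in deps[node]:
            visit(parent)  # place the parent's own inputs first
            if parent not in order:
                order.append(parent)

    visit(name)
    return tuple(order)


deps = {
    "gcn": (),
    "skymap": ("gcn",),
    "skymap_info": ("gcn",),
    "coinc": ("skymap", "skymap_info"),
}
tree = ur_dependency_tree("coinc", deps)
flat = ur_dependencies("coinc", deps)
```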

Parameters
  • eventid_or_fh (str, llama.Event, or llama.FileHandler) – the eventid of the file handler or else another FileHandler or Event instance (though anything with eventid and rundir properties will work). If such an object is provided, the rundir will be inferred therefrom.

  • rundir (str, optional) – The directory in which events from this run are being stored. If eventid_or_fh is an object with a rundir attribute, then that value of rundir will be used and this argument will be ignored. (See DEFAULT_RUN_DIR for default value.) Overrides the rundir specified in eventid_or_fh.

  • parent (FileHandler, optional) – If this FileHandler instance is being used to generate a different FileHandler instance, specify the original instance as the parent. Overrides the rundir specified in eventid_or_fh.

Raises

AssertionError – If class constants in cls.required_attributes are not defined or if cls.MANIFEST_TYPES has any inconsistencies.

COOLDOWN_PARAMS = CoolDownParams(base=60, increment=60, maximum=14400)
DEPENDENCIES = None
DEP_CHECKSUM_KWARGS = frozenset({})
FILENAME = None
MANIFEST_TYPES = None
TIMEOUT = 20
UR_DEPENDENCIES = None
UR_DEPENDENCY_TREE = None

are_dependencies_met()

Check whether the data needed to generate this filetype exists. If there are no DEPENDENCIES (i.e. no input), this file cannot possibly be made (since outputs are considered to be pseudo-functional mappings from inputs, having no inputs means that no meaningful output can exist).

are_ur_dependencies_met()

Recursively check whether DEPENDENCIES for this file can be generated (by in turn running this same check on their DEPENDENCIES). In short, this is a check as to whether we can eventually get to generating this file or whether the required data to get to this node in the FileHandler DAG is simply not in the eventdir. If there are no DEPENDENCIES (i.e. no input), this file cannot possibly be made (since outputs are considered to be pseudo-functional mappings from inputs, having no inputs means that no meaningful output can exist).
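The difference between the immediate check and the recursive one can be sketched over plain dictionaries. The function signatures and data shapes are hypothetical. The recursion assumes an input is acceptable if it already exists or can itself eventually be generated, and that a node with no inputs can never be generated.

```python
def are_dependencies_met(name, deps, exists):
    """Hypothetical sketch: True if name has inputs and every one already exists."""
    inputs = deps[name]
    return bool(inputs) and all(exists[i] for i in inputs)


def are_ur_dependencies_met(name, deps, exists):
    """True if every input either exists or can itself eventually be generated."""
    inputs = deps[name]
    if not inputs:
        return False  # no inputs means no meaningful output can be made
    return all(exists[i] or are_ur_dependencies_met(i, deps, exists) for i in inputs)


deps = {"gcn": (), "skymap": ("gcn",), "plot": ("skymap",)}
exists = {"gcn": True, "skymap": False, "plot": False}
now = are_dependencies_met("plot", deps, exists)
eventually = are_ur_dependencies_met("plot", deps, exists)
```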

property auxiliary_paths

Return all names of possible auxiliary (rider) files associated with this FileHandler instance. These are things like metadata, cooldown, veto, and locking files.

checkin(gen_result)

Copy all output files produced by the temporary FileHandler created with checkout (as listed in its manifest) into this event directory once generation is complete.

Parameters

gen_result (str) – The generation result to be checked in, containing the temporary FileHandler with the results of the generation attempt as well as any errors raised.

Returns

self – Returns the successfully checked-in FileHandler instance.

Return type

FileHandler

checkout()

Create a temporary directory for generating new files in. This ensures that, in the event of file generation failure, the original event directory is unchanged. It also ensures that the state of the original event directory is not modified until the file generation is completed, assisting in parallel file generation. DEPENDENCIES are hardlinked to the temporary event directory to ensure that no extra data is being drawn from files not explicitly declared as such.

Returns

tmp_self – A copy of the current FileHandler pointing to the temporary directory where this file will actually be generated.

Return type

FileHandler
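The checkout/checkin pattern described above can be sketched with the standard library. The function names and arguments are hypothetical, and the error handling that the real generate performs on failure is omitted. The scratch directory is created next to the event directory so that hard links and `os.replace` stay on one filesystem.

```python
import os
import shutil
import tempfile
from pathlib import Path


def checkout(eventdir, inputs):
    """Hypothetical sketch: scratch dir holding hard links to only the declared inputs."""
    eventdir = Path(eventdir)
    # Same parent directory, so hard links and os.replace stay on one filesystem.
    tmpdir = Path(tempfile.mkdtemp(prefix=f".{eventdir.name}-", dir=eventdir.parent))
    for name in inputs:
        os.link(eventdir / name, tmpdir / name)
    return tmpdir


def checkin(tmpdir, eventdir, manifest):
    """Move the generated outputs into the event directory, then drop the scratch dir."""
    for name in manifest:
        os.replace(Path(tmpdir) / name, Path(eventdir) / name)
    shutil.rmtree(tmpdir)


rundir = Path(tempfile.mkdtemp())
eventdir = rundir / "S1234a"
eventdir.mkdir()
(eventdir / "skymap_info.json").write_text("{}")
scratch = checkout(eventdir, ["skymap_info.json"])
(scratch / "summary.txt").write_text("done")  # stand-in for the real generation step
checkin(scratch, eventdir, ["summary.txt"])
```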

checksum(**kwargs)

Get the sha256 checksum of this file’s contents. Use this to version files and check whether their contents have changed.

Parameters

kwargs (dict) – (Ignored in the base FileHandler implementation). Subclass implementations can optionally use input arguments to identify relevant subsets of the file’s contents for versioning or diff-checking purposes. In this way, checks can be made for changes on only specific subsets of file contents. This is useful for ignoring changes to unused input data when checking for obsolescence.
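A whole-file sha256 checksum like the one the base implementation describes can be computed with `hashlib`. This is a minimal sketch with a hypothetical function name. Reading in chunks keeps memory use flat for large files.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_checksum(path, chunk_size=1 << 20):
    """Hypothetical sketch: hex sha256 digest of a file's contents, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as stream:
        # iter() with a sentinel stops at the first empty read (end of file).
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


path = Path(tempfile.mkdtemp()) / "skymap_info.json"
path.write_bytes(b"example contents")
checksum = sha256_checksum(path)
```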

property chmod

Change read/write/execute permissions of a file. mode has the same meaning as in os.chmod.

compare_contents(other)

Check whether the contents of this FileHandler instance’s file are the same as the contents of the other FileHandler instance’s file.

Parameters
  • other (FileHandler, str) – The FileHandler to compare to this one or else a path to a file to compare to this one.

Returns

same (bool) – Returns True if both files exist and have the same contents. Otherwise, returns False.
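The comparison semantics above (False unless both files exist and match byte for byte) can be sketched with `filecmp.cmp`. The function below works on paths only and its name is hypothetical. `shallow=False` forces a content comparison instead of trusting matching `os.stat()` signatures.

```python
import filecmp
import os
import tempfile


def compare_contents(path, other):
    """Hypothetical sketch: True only if both files exist and their bytes are identical."""
    if not (os.path.isfile(path) and os.path.isfile(other)):
        return False
    # shallow=False compares bytes instead of trusting matching os.stat() signatures.
    return filecmp.cmp(path, other, shallow=False)


tmp = tempfile.mkdtemp()
a, b, c = (os.path.join(tmp, n) for n in ("a.txt", "b.txt", "c.txt"))
for p, text in ((a, "same"), (b, "same"), (c, "different")):
    with open(p, "w") as f:
        f.write(text)
```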

delete()

Delete this file if it exists, along with all of its rider files, and commit that change to version control.

dep_checksums()

Recalculate the sha256 sums of the contents of the input files (i.e. DEPENDENCIES) for this FileHandler instance’s manifest_filehandlers (either to store them or to check whether they have changed from the stored values).

Returns

  • dep_checksums (dict) – Keys are FileHandler.clsname values for each of this FileHandler class’s DEPENDENCIES and values are the corresponding checksums of each file. These checksums uniquely determine the exact input files used and are suitable for creating snapshots of pipeline state.

  • dep_subset_checksums (dict) – Same as checksums but with self.DEP_CHECKSUM_KWARGS applied to each dependency’s checksum method to only check whether the data used by self.generate has changed. These checksums determine whether the DEPENDENCIES have changed in a way that is meaningful to this specific filehandler (since unused fields are ignored) and are suitable for determining whether a FileHandler needs to be regenerated based on the availability of new data. (To avoid unnecessary computation, these checksums are only calculated separately from checksums when a dependency has a set of DEP_CHECKSUM_KWARGS defined for it.)

diff_contents(other, force=False)

Return a diff of two text files. If the files are not text files (i.e. if their file extensions are not in TEXT_FILE_EXTENSIONS), prints “Binary files self and other differ” (with self and other replaced with their full file paths).

Parameters
  • other (FileHandler, str) – The FileHandler to diff against this one or a path to a file to diff against this one.

  • force (bool, optional) – Whether to force the diff (as if self and other are both text files) even if the files are not recognized as a form of text file.

Returns

diff (str) – A textual diff of the two files’ contents, provided they are both recognized as text files (or force is True); or else a message saying that they differ. Prints nothing if the files are the same.
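The text-versus-binary behaviour described above can be sketched with `difflib.unified_diff`. The extension set below is a hypothetical subset, not llama's TEXT_FILE_EXTENSIONS. Unlike the documented method, which prints its binary-file message, this sketch returns every result as a string.

```python
import difflib
import tempfile
from pathlib import Path

# Hypothetical subset of extensions; llama's TEXT_FILE_EXTENSIONS may differ.
TEXT_FILE_EXTENSIONS = {".txt", ".json", ".xml", ".csv"}


def diff_contents(path, other, force=False):
    """Hypothetical sketch: unified diff of text files, or a notice for binaries."""
    path, other = Path(path), Path(other)
    is_text = path.suffix in TEXT_FILE_EXTENSIONS and other.suffix in TEXT_FILE_EXTENSIONS
    if not (is_text or force):
        if path.read_bytes() == other.read_bytes():
            return ""
        return f"Binary files {path} and {other} differ"
    diff = difflib.unified_diff(
        path.read_text().splitlines(keepends=True),
        other.read_text().splitlines(keepends=True),
        fromfile=str(path),
        tofile=str(other),
    )
    return "".join(diff)  # empty string when the files are identical


tmp = Path(tempfile.mkdtemp())
(tmp / "old.txt").write_text("one\ntwo\n")
(tmp / "new.txt").write_text("one\nthree\n")
(tmp / "a.bin").write_bytes(b"\x00\x01")
(tmp / "b.bin").write_bytes(b"\x00\x02")
text_diff = diff_contents(tmp / "old.txt", tmp / "new.txt")
binary_diff = diff_contents(tmp / "a.bin", tmp / "b.bin")
```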

property eventdir

The directory where data for the event this FileHandler corresponds to is stored.

exists()

Check whether the file associated with this handler has been generated yet.

property filename_for_download

Get a filename that includes the eventid, revision number, and version hash for this file (i.e. what version number this is in the version history; e.g. if three versions of this file exist in the version history, then this is version 3). If this file does not appear in the git history, it will be marked ‘v0’ and the hash will be ‘UNVERSIONED’. The output format is eventid, version, first 7 digits of commit hash, and filename, split by hyphens, so that the third version of skymap_info.json for event S1234a with git hash dedb33f would be called S1234a-v3-dedb33f-skymap_info.json. Use this for file downloads or files sent to other services in order to facilitate data product tracking outside the highly-organized confines of a pipeline run directory.
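The naming scheme above reduces to a format string once the version number and commit hash are known. In this sketch they are passed in directly; looking them up in the git history (for instance with `git log --format=%H -- <file>`) is left out. The function signature is hypothetical.

```python
def filename_for_download(eventid, filename, version=None, commit_hash=None):
    """Hypothetical sketch: eventid-v<version>-<first 7 hash chars>-<filename>."""
    if version is None or commit_hash is None:
        # Files missing from the git history are labelled v0 / UNVERSIONED.
        return f"{eventid}-v0-UNVERSIONED-{filename}"
    return f"{eventid}-v{version}-{commit_hash[:7]}-{filename}"


shared = filename_for_download("S1234a", "skymap_info.json", 3, "dedb33f0c0ffee")
```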

property fullpath

The full path to the file referred to by this FileHandler.

generate(*args, **kwargs)

Make the next version of a file synchronously, assuming it does not exist or is in need of updating. If some sort of error causes file generation to fail, delete the output file (if it exists). You should ALWAYS use this instead of _generate() in order to generate and version files safely and atomically without risk of disrupting parallel file manipulations. If you want to do something more complicated, like updating multiple files in an event directory, or even multiple events, use functions like Run.events.update, Event.update, FileHandler.subgraph.update, or FileGraph.downselect().update.

generate_unsafe(*args, **kwargs)

Generate this file without performing any of the usual checkout and checkin procedures; no cleanup will happen if file generation fails. You almost certainly want to call generate instead, or if you need to execute more than a single step of the pipeline, FileGraph.update with an appropriate downselection.

Returns

gen_result – A GenerationResult containing any errors raised during any part of file generation.

Return type

GenerationResult

property graph

Get the FileGraph to which this FileHandler instance belongs. This can be set dynamically to associate a FileHandler with a given FileGraph (and hence a given Pipeline). If graph is not manually set, it defaults to the subgraph of this FileHandler.

is_obsolete(checked=None, ancestors=True)

Check whether this file exists but needs to be regenerated by seeing whether any of its DEPENDENCIES have updated their file contents since this file was made. This is a somewhat conservative check to see whether any files need to be regenerated automatically; it will return False if there is any ambiguity (in which case you will need to manually delete and regenerate child files). You can extend this definition with extra obsolescence criteria, but make sure to call super to keep this automatic regeneration functionality in response to regenerated DEPENDENCIES.

If the file is marked as “locked” (see: llama.lock.LockHandler), the file will never be marked obsolete. This provides a way to manually prevent file obsolescence in situations in which it does not apply.

If any of this file’s ancestors are obsolete, then this file will be marked obsolete. You can skip this check with ancestors=False.

Failing that, if the output file does not exist, or if its input DEPENDENCIES do not exist, it is not considered obsolete, and this method will return False. Likewise, if the file has no DEPENDENCIES, it cannot be naively obsoleted, and this method will return False.

Failing that, it will be marked obsolete if its dependency subset checksums (the second return value of FileHandler.dep_checksums, i.e. dep_checksums()[1]) have changed from their previously recorded values. If those checksums are not recorded, it will not be marked as obsolete.

Parameters
  • checked (dict, optional) – A dictionary mapping FileHandler instances whose obsolescence has already been checked to whether they are obsolete; used internally to track whether this FileHandler instance’s DEPENDENCIES are obsolete without recomputing them.

  • ancestors (bool, optional) – If False, don’t check whether ancestors are obsolete; only run this FileHandler instance’s obsolescence checks.
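The order of checks described above can be sketched as a pure function over a snapshot of one file's state. `FileState` and its fields are hypothetical stand-ins for what a FileHandler would look up on disk. The ancestor check is reduced to a boolean argument.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FileState:
    """Hypothetical snapshot of one file's state, for illustration only."""
    exists: bool
    inputs_exist: bool
    locked: bool = False
    recorded: Optional[dict] = None  # subset checksums stored at generation time
    current: dict = field(default_factory=dict)  # subset checksums recomputed now


def is_obsolete(state, ancestor_obsolete=False):
    """Conservative obsolescence test following the order of checks described above."""
    if state.locked:
        return False  # locked files are never marked obsolete
    if ancestor_obsolete:
        return True
    if not (state.exists and state.inputs_exist) or not state.current:
        return False  # missing output or inputs, or no DEPENDENCIES at all
    if state.recorded is None:
        return False  # nothing recorded, so stay conservative
    return state.recorded != state.current


changed = FileState(True, True, recorded={"SkymapInfo": "aaa"}, current={"SkymapInfo": "bbb"})
unchanged = FileState(True, True, recorded={"SkymapInfo": "aaa"}, current={"SkymapInfo": "aaa"})
```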

property manifest

A set of filenames generated by this FileHandler (not including temp files that should be deleted after generation). By default, this set only includes the filename for this FileHandler. If generate fails, these files will be removed to enforce atomicity.

In general, you should not need to modify this, since it will return filenames from the FileHandler instances in self.manifest_filehandlers, which should contain an exhaustive list of FileHandler instances associated with the _generate method that creates them.

property manifest_filehandlers

A set of FileHandler instances generated by this FileHandler (not including temp files that should be deleted after generation). By default, this set only includes this FileHandler instance itself. If generate fails, these files will be removed to enforce atomicity.

If you are making a bunch of FileHandler subclasses that are all generated with the same method, you should define an abstract base class for those FileHandler classes with a suitable _generate method and a manifest_filehandlers property that returns all of their filenames. The subclasses then only need to return those FileHandler instances (and any other relevant methods and properties unique to each subclass FileHandler).

modtime(ts=None)

Get a datetime object with the modification time of this file, assuming it exists.

Parameters

ts (int or float, optional) – If the file does not exist, return a datetime parsed from the UNIX timestamp given by ts if provided; otherwise, return None. Use this to provide a fallback modification time in cases when modification times might need to be compared between existing files and files whose existence is uncertain.
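The fallback behaviour of ts can be sketched with `os.path.getmtime` and `datetime.fromtimestamp`. This sketch takes a path instead of a FileHandler and assumes local-time datetimes. The function name is hypothetical.

```python
import os
import tempfile
from datetime import datetime


def modtime(path, ts=None):
    """Hypothetical sketch: modification time of path, or a fallback parsed from ts."""
    if os.path.exists(path):
        return datetime.fromtimestamp(os.path.getmtime(path))
    # The fallback lets callers compare against files that may not exist yet.
    return None if ts is None else datetime.fromtimestamp(ts)


missing = os.path.join(tempfile.mkdtemp(), "skymap.fits.gz")  # never created
fallback = modtime(missing, ts=1_000_000_000)
```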

open(mode='r')

Open this file in readonly mode and return the resulting object.

property read_bytes

Read the full file into memory as a bytes object. If memmap is True, create a memory-map object (a workalike to a bytes object) mapped to the data on disk (what you want to use if you don’t know the file size, since it could be larger than available memory). Note that if memmap=True, you’ll need to call the returned object’s close method when you’re done with it to free up the file descriptor. Returns the resulting bytes-like object containing the file’s data. Raises a FileNotFoundError if the file does not exist.
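The two read modes described above can be sketched with the standard `mmap` module. This is a function over a path with a hypothetical name, not llama's property. The returned map stays usable after the file object is closed, and the caller must close it.

```python
import mmap
import tempfile
from pathlib import Path


def read_bytes(path, memmap=False):
    """Hypothetical sketch: file contents as bytes, or as a read-only memory map."""
    with open(path, "rb") as stream:
        if not memmap:
            return stream.read()
        # The map stays usable after the file object closes; the caller must call
        # .close() on it. Note that mapping an empty file raises ValueError.
        return mmap.mmap(stream.fileno(), 0, access=mmap.ACCESS_READ)


path = Path(tempfile.mkdtemp()) / "data.bin"
path.write_bytes(b"hello world")
mapped = read_bytes(path, memmap=True)
head = mapped[:5]  # slicing an mmap returns a bytes object
mapped.close()
```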

classmethod set_class_attributes(subclass)

Decorator for a new subclass that sets its MANIFEST_TYPES class attribute to the default FileHandler value without requiring it to be manually specified. Also determines the UR_DEPENDENCIES and UR_DEPENDENCY_TREE based on subclass.DEPENDENCIES. NB: manually set class attributes will be overwritten by this decorator.

Parameters

subclass (type) – The subclass whose class attributes need to be set.

Returns

subclass – The same decorated subclass.

Return type

type

size()

Get the size in bytes of this file. Raises a FileNotFoundError if this file does not exist.

status(obscheck=None)

Return the status string (see NODELEGEND_COLORS) for a given FileHandler instance.

property subgraph

Get all ur DEPENDENCIES along with this FileHandler instance in a single FileGraph instance.

classmethod sync_shared_manifests(*filehandlers)

When implementing a FileHandler with its own subclasses in its manifest (the easiest pattern for ensuring that the FileHandler classes in the manifest share a common generate method and DEPENDENCIES), use this decorator before each class definition to make sure that the base FileHandler (with the shared generate function) and its subclasses (which distinguish between its outputs by each having their own FILENAME attributes) share a consistent manifest.

Parameters

*filehandlers (FileHandler or str) – FileHandler subclasses that are generated together.

Returns

first_filehandler – The first FileHandler subclass from filehandlers (this allows the function to operate as a class decorator).

Return type

type

Raises
  • ValueError – If provided filehandlers are not subclasses of this class or if no filehandlers are provided.

  • TypeError – If provided filehandlers do not share the same DEPENDENCIES and _generate methods as this class, or if any of them lack a FILENAME attribute.

property write_bytes

Write data (a bytes-like object) to file, replacing any existing file contents. data can be a memory-mapped object or a file-object opened in binary mode (autodetected), allowing write_bytes to work on memory-mapped byte arrays returned by read_bytes. Raises a ValueError if bytes can’t be read from data.
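The autodetection described above can be sketched by trying `memoryview` first, which accepts bytes, bytearray, and mmap objects, and falling back to a file object's `read`. The function name is hypothetical. The type check runs before the output file is opened, so a bad argument does not truncate existing contents.

```python
import io
import shutil
import tempfile
from pathlib import Path


def write_bytes(path, data):
    """Hypothetical sketch: replace contents with bytes-like data or a binary file object."""
    try:
        view = memoryview(data)  # bytes, bytearray and mmap objects expose a buffer
    except TypeError:
        view = None
        if not hasattr(data, "read"):
            # Check before opening so an existing file is not truncated for nothing.
            raise ValueError("cannot read bytes from data") from None
    with open(path, "wb") as out:
        if view is None:
            shutil.copyfileobj(data, out)  # stream from the file object in chunks
        else:
            with view:  # release the buffer so an mmap source can be closed later
                out.write(view)


target = Path(tempfile.mkdtemp()) / "out.bin"
write_bytes(target, b"abc")
from_bytes = target.read_bytes()
write_bytes(target, io.BytesIO(b"xyz"))
from_stream = target.read_bytes()
```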

class llama.filehandler.GenerateOnceMixin

Bases: object

An object that, once generated, never becomes obsolete; must be manually regenerated.

is_obsolete(checked=None, **kwargs)

Because this class inherits from GenerateOnceMixin, it is never marked obsolete automatically; it must be manually regenerated. This function always returns False. See FileHandler.is_obsolete for the meaning of checked.
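Because GenerateOnceMixin overrides is_obsolete, it only takes effect if it comes before FileHandler in a subclass's bases. That is ordinary Python method resolution order. The classes below are hypothetical stand-ins that show the effect, assuming the real classes are combined the same way.

```python
class FileHandlerStandIn:
    """Hypothetical stand-in for FileHandler with a checksum-based is_obsolete."""

    def is_obsolete(self, checked=None, **kwargs):
        return True  # pretend the input checksums have changed


class GenerateOnceMixin:
    """Overrides is_obsolete so the file is never regenerated automatically."""

    def is_obsolete(self, checked=None, **kwargs):
        return False


# The mixin must precede the FileHandler base in the bases list so that its
# is_obsolete wins in the method resolution order.
class ArchivedAlert(GenerateOnceMixin, FileHandlerStandIn):
    pass


class WrongOrder(FileHandlerStandIn, GenerateOnceMixin):
    pass
```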

class llama.filehandler.JSONFile

Bases: llama.filehandler.FileHandler

A FileHandler abstract subclass providing tools to work with JSON dictionaries.

checksum(fields=None)

Get the sha256 checksum of this file’s contents. Use this to version files and check whether their contents have changed. Optionally only use specific fields in generating the checksum to ignore irrelevant changes (e.g. when determining file obsolescence).

Parameters
  • fields (tuple or list, optional) – A tuple of tuples of strings, with each sub-tuple containing strings or integers indexing into this JSON-file’s contents (strings for dict fields, integers for list fields) to specify only relevant fields. See example below.

  • kwargs (dict) – Remaining keyword arguments are ignored (see FileHandler.checksum note on kwargs).

Raises
  • KeyError – If a field that is expected in a sub-dictionary is not found.

  • IndexError – If an entry that is expected in a sub-list is not found.

  • TypeError – If you attempt to index into a sub field that is not a dictionary using a string or if fields cannot be indexed into.

Examples

For a JSON dictionary like:

>>> foo = {
...     "names": ["stef", "countryman"],
...     "age": 27,
...     "eyes": {
...         "left": "brown",
...         "right": "brown",
...     }
... }

You can calculate the checksum using only the first name and left eye-color values by using these fields:

>>> fields = (
...     ('names', 0),
...     ('eyes', 'left')
... )
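Field-restricted checksums like the one in the example can be sketched by walking each index path, collecting the selected values, and hashing a canonical JSON encoding of them. The function name and the `json.dumps(..., sort_keys=True)` serialization are assumptions, so the digests will not match llama's. The point is that changing an unselected field (here, age) leaves the restricted checksum unchanged.

```python
import hashlib
import json


def json_checksum(data, fields=None):
    """Hypothetical sketch: sha256 of JSON data, optionally limited to selected fields."""
    if fields is not None:
        selected = []
        for path in fields:
            value = data
            for key in path:
                # Missing keys or indices raise KeyError or IndexError.
                value = value[key]
            selected.append(value)
        data = selected
    # sort_keys makes the digest independent of dictionary key order.
    encoded = json.dumps(data, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


foo = {
    "names": ["stef", "countryman"],
    "age": 27,
    "eyes": {"left": "brown", "right": "brown"},
}
fields = (("names", 0), ("eyes", "left"))
aged = dict(foo, age=28)  # change a field that is not selected
```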

property html_table

Generate an HTML table representation of the JSON data contained in this FileHandler.

read_json()

Read in this JSON file as a dictionary.

llama.filehandler.NOW()

Returns a new datetime object representing the current time local to tz.

Parameters

tz – Timezone object. If no tz is specified, uses the local timezone.

class llama.filehandler.Status(status, stats, color)

Bases: tuple

property color

Alias for field number 2

property stats

Alias for field number 1

property status

Alias for field number 0

class llama.filehandler.TriggerList

Bases: llama.filehandler.FileHandler

A file that contains lists of triggers of some sort.

DETECTORS = None

abstract property num_triggers

The number of triggers described by this file. Useful mostly for quickly determining if this trigger list is empty.

llama.filehandler.fromtimestamp()

timestamp[, tz] -> tz’s local time from POSIX timestamp.

llama.filehandler.recursive_obsolescence(func)

Store is_obsolete values for repeated calls to make sure that we don’t recompute them while recursively checking is_obsolete values.
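The caching described above can be sketched as a decorator that threads one `checked` dict through a recursive pass. The decorator body and the `Node` class are hypothetical stand-ins, and the sketch assumes an acyclic graph. In the diamond-shaped example, root is shared by two parents but evaluated once, so four nodes cost four calls instead of five.

```python
import functools


def recursive_obsolescence(func):
    """Hypothetical sketch: cache is_obsolete results in a shared `checked` dict."""
    @functools.wraps(func)
    def wrapper(self, checked=None, **kwargs):
        if checked is None:
            checked = {}  # fresh cache for each top-level call
        if self not in checked:
            checked[self] = func(self, checked=checked, **kwargs)
        return checked[self]
    return wrapper


class Node:
    """Hypothetical stand-in for a FileHandler in a diamond-shaped DAG."""
    calls = 0

    def __init__(self, parents=(), stale=False):
        self.parents, self.stale = parents, stale

    @recursive_obsolescence
    def is_obsolete(self, checked=None):
        Node.calls += 1
        return self.stale or any(p.is_obsolete(checked=checked) for p in self.parents)


root = Node()
left, right = Node((root,)), Node((root,))
leaf = Node((left, right))
result = leaf.is_obsolete()
```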

llama.filehandler.utcnow()

Return a new datetime representing UTC day and time.

Submodules