llama.versioning module

Classes for versioning files in a given directory. Currently implemented with git.

class llama.versioning.GitDirMixin

Bases: object

A mixin for EventTuple and FileHandlerTuple subclasses that lets you manipulate their event directories through a git property, which returns a GitHandler pointing to that directory.

static decorate_checkin(func)

If generation and check in succeeded, commit changes to event history.

static decorate_checkout(func)

Commit the state of the event before the file generation attempt to the event’s history, then proceed with checkout.
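The two decorators above can be pictured with a minimal sketch. It is not the library's implementation: commit_before, commit_after, RecordingGit and FakeFileHandler are hypothetical names, the commit messages are invented, and the sketch assumes is_clean() returns True when nothing has changed since the last commit.

```python
import functools


def commit_before(func):
    """Hypothetical analogue of decorate_checkout: snapshot first, then run func."""
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        # Only commit when something changed, since commit_changes is
        # documented to fail when there are no new changes.
        if not self.git.is_clean():
            self.git.commit_changes(f"state before {func.__name__}")
        return func(self, *args, **kwargs)
    return wrapper


def commit_after(func):
    """Hypothetical analogue of decorate_checkin: commit only if func succeeds."""
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        result = func(self, *args, **kwargs)  # an exception here skips the commit
        self.git.commit_changes(f"result of {func.__name__}")
        return result
    return wrapper


class RecordingGit:
    """Fake handler that records commit messages instead of running git."""

    def __init__(self):
        self.messages = []
        self.dirty = True

    def is_clean(self):
        return not self.dirty

    def commit_changes(self, message):
        self.messages.append(message)
        self.dirty = False


class FakeFileHandler:
    """Stand-in for a FileHandlerTuple subclass that exposes a git attribute."""

    def __init__(self):
        self.git = RecordingGit()

    @commit_before
    def checkout(self):
        return "checked out"

    @commit_after
    def checkin(self):
        return "checked in"
```

Calling FakeFileHandler().checkout() records one commit before the method body runs, and checkin() records one after it returns.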

property git

Get a GitHandler for manipulating the eventdir as a git repository. Used for versioning events.

class llama.versioning.GitHandler

Bases: llama.versioning.GitHandlerTuple

A class that performs git operations on an eventdir.

You can also call an instance as if it were a function to perform git commands conveniently; the interface is the same as subprocess.Popen with cwd set to the GitHandler’s eventdir (for convenience at the command line).
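As a rough illustration of the call behavior described above, the sketch below wraps subprocess.Popen so that commands run inside a fixed directory. GitCallSketch and its executable parameter are hypothetical, and the way arguments are passed (a list of git arguments with the program name prepended) is an assumption; the real GitHandler may differ.

```python
import subprocess


class GitCallSketch:
    """Hypothetical stand-in that mimics calling a GitHandler like a function."""

    def __init__(self, eventdir, executable="git"):
        self.eventdir = eventdir
        # Assumption: the program name is configurable here only to make the
        # sketch easy to exercise without git installed.
        self.executable = executable

    def __call__(self, args, **kwargs):
        # Default the working directory to the event directory, but let the
        # caller override it like any other Popen keyword argument.
        kwargs.setdefault("cwd", self.eventdir)
        return subprocess.Popen([self.executable, *args], **kwargs)
```

With git installed, something like GitCallSketch("/path/to/eventdir")(["status", "--short"], stdout=subprocess.PIPE) would return a Popen object whose output can be read with communicate().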

eventdir: str

The path to the directory that the new GitHandler instance will manipulate.

add(*files)

Run git add for all files. Raises a GitRepoUninitialized exception if not a git repository.

commit_changes(message)

Run git add on all files in the eventdir and commit the changes using message as the commit message. Raises a GitRepoUninitialized exception if not a git repository. This will fail with a GenerationError if there are no new changes.

copy_file(filename, outpath, commit_hash=None, serial_version=None)

Check out a copy of a file, optionally specifying a particular version of the file from this event’s history, to the given outpath. If no version is specified with commit_hash or serial_version, the latest version will be copied.

Parameters
  • filename (str) – The relative path to the file from self.eventdir (in most cases just the filename).

  • outpath (str) – The path to the output file, or, if this path corresponds to an existing directory, the directory in which it should be saved (with the os.path.basename of filename). If the file exists, it will be overwritten without warning.

  • commit_hash (str, optional) – The commit hash, or partial commit hash containing the starting characters of the full hash (as long as enough characters are provided to disambiguate hashes), of the version of filename that is to be copied to outpath. You can only specify one of commit_hash or serial_version.

  • serial_version (int, optional) – The serial_version (i.e. the numbered version) of filename to check out. This is potentially more ambiguous than using commit_hash. You can only specify one of commit_hash or serial_version.

Returns

outfile – Path to the final output file.

Return type

str

Raises
  • GitRepoUninitialized – If the event directory is not a git directory.

  • ValueError – If both commit_hash and serial_version are specified or if they do not correspond to available file versions.

  • IOError – If the outpath cannot be written to.

  • FileNotFoundError – If the file checkout fails.
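The argument rules for copy_file can be sketched without touching git. The helpers below are hypothetical (check_version_args and resolve_outpath are not part of the documented API) and only illustrate two documented behaviors: rejecting both version selectors at once, and treating an existing directory as the destination folder.

```python
import os


def check_version_args(commit_hash=None, serial_version=None):
    """Raise ValueError if both version selectors are given (as documented)."""
    if commit_hash is not None and serial_version is not None:
        raise ValueError("Specify only one of commit_hash or serial_version.")


def resolve_outpath(filename, outpath):
    """Return the final output file path for a copy of filename.

    If outpath is an existing directory, the file keeps its basename and is
    placed inside that directory; otherwise outpath is used as given.
    """
    if os.path.isdir(outpath):
        return os.path.join(outpath, os.path.basename(filename))
    return outpath
```

Under these assumptions, resolve_outpath("skymap_info.json", "/tmp") gives "/tmp/skymap_info.json" when /tmp exists, while a non-directory outpath is returned unchanged.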

property current_hash

Get the current git hash for this directory.

diff(*args)

Return the git diff for the given file paths (from their last commits) as a string. Raises a GitRepoUninitialized exception if not a git repository. This diff can be applied using git apply.

Parameters

*args (str, optional) – File paths relative to the root of the git directory whose diffs should be taken. If no args are provided, the result will always be an empty string.

Returns

diff – The exact text returned by git diff ARG1 ARG2... for the provided arguments. An empty string is returned if none of the file contents of the given paths have changed since the last commit OR if no paths are specified (note that this differs from standard git diff behavior, where ALL diffs from the last commit are provided if no arguments are specified).

Return type

str
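The difference from plain git diff (an empty result when no paths are given) can be shown with a small sketch. diff_for_paths is a hypothetical helper, not the library's code; it assumes git is on the PATH whenever at least one path is supplied.

```python
import subprocess


def diff_for_paths(repo_dir, *paths):
    """Return the text of `git diff PATH...`, or "" when no paths are given."""
    if not paths:
        # Unlike plain `git diff`, which would show every change,
        # an empty argument list yields an empty diff.
        return ""
    result = subprocess.run(
        ["git", "diff", *paths],
        cwd=repo_dir,
        check=True,           # raise CalledProcessError if git fails
        capture_output=True,
        text=True,
    )
    return result.stdout
```

The early return means the no-argument case never invokes git at all.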

property eventid

Parse an eventid from the eventdir by splitting off the basename.

filename_for_download(filename, last_hash=None)

Get a filename that includes the eventid, version number, and version hash for filename. The version number is the file's position in its version history; e.g. if three versions of this file exist in the version history, then this is version 3. If this filename does not appear in the git history, it will be marked ‘v0’ and the hash will be ‘UNVERSIONED’. The output format is eventid, version, first 7 characters of the commit hash, and filename, joined by hyphens, so that the third version of skymap_info.json for event S1234a with git hash dedb33f would be called S1234a-v3-dedb33f-skymap_info.json. Use this for file downloads or files sent to other services in order to facilitate data product tracking outside the highly-organized confines of a pipeline run directory.
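The naming scheme can be sketched as a pure function. download_name is a hypothetical helper; it assumes the caller already has the list of commit hashes that touched the file, ordered newest first, and that the version number is simply the length of that list. Placing ‘UNVERSIONED’ in the hash slot for untracked files is also an assumption.

```python
def download_name(eventid, filename, file_hashes):
    """Build an eventid-version-hash-filename string for a file.

    file_hashes holds the full commit hashes that touched filename,
    newest first (assumption); it is empty for files not in the history.
    """
    if file_hashes:
        version = len(file_hashes)          # three commits means version 3
        short_hash = file_hashes[0][:7]     # first 7 characters of newest hash
    else:
        version, short_hash = 0, "UNVERSIONED"
    return f"{eventid}-v{version}-{short_hash}-{filename}"
```

With three hashes whose newest entry starts with dedb33f, download_name("S1234a", "skymap_info.json", hashes) produces the S1234a-v3-dedb33f-skymap_info.json name from the description.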

hashes(*filenames, pretty='', last_hash=None)

Get a list of full commit hashes for all commits related to the provided filenames. Returns an empty list if no filenames are provided or if the filename is not being tracked by git.

Parameters
  • filenames (list) – Relative paths from the eventdir whose commits should be retrieved. Returns an empty list if no filenames are specified. To match all paths in the commit history, specify ‘--’ as the only filename.

  • pretty (str, optional) – The git format string specifying what to return for each commit. By default, only returns the git hash for each commit pertaining to the given filenames.

  • last_hash (str, optional) – If specified, only return hashes up to and including this hash; does not return hashes appearing topologically later than this one. This can be a partial hash containing only the starting characters of the full hash (e.g. the first 7 characters, as is typically seen elsewhere) as long as enough characters are provided to disambiguate the available hashes.

Returns

hashes – A list of git checksums for the commits related to the specified filenames (or some other per-commit string whose contents are defined by pretty).

Return type

list

Raises
  • GitRepoUninitialized – If not a git repository.

  • ValueError – If the command cannot be run with the given filenames in the given eventdir.

  • ValueError – If the input last_hash is ambiguous (matches more than one hash) or if it matches no hashes.
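The last_hash handling described above (partial hashes, with errors for ambiguous or unknown prefixes) might look like the following sketch. hashes_up_to is a hypothetical helper, and it assumes the input list is ordered newest first, so that "up to and including" means everything from the match onward.

```python
def hashes_up_to(all_hashes, last_hash=None):
    """Return the hashes at or before last_hash in a newest-first list.

    last_hash may be a partial hash; it must match exactly one entry.
    """
    if last_hash is None:
        return list(all_hashes)
    matches = [i for i, h in enumerate(all_hashes) if h.startswith(last_hash)]
    if not matches:
        raise ValueError(f"No hash starts with {last_hash!r}.")
    if len(matches) > 1:
        raise ValueError(f"Partial hash {last_hash!r} is ambiguous.")
    # Drop everything newer than the matched commit.
    return list(all_hashes[matches[0]:])
```

A prefix shared by two commits raises ValueError, mirroring the documented behavior for ambiguous input.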

init()

Initialize the eventdir as a git repository.

is_ancestor(possible_ancestor_hash, commit_hash)

Check whether possible_ancestor_hash is a topological ancestor of commit_hash. Returns True if the hashes refer to the same commit. Raises a GitRepoUninitialized exception if not a git repository. Useful for figuring out if one commit came after another (from a data flow perspective).

Returns

is_ancestor – True if possible_ancestor_hash is an ancestor of commit_hash, False otherwise. NOTE that a value of False does not imply that commit_hash is an ancestor of possible_ancestor_hash (since they can be from different branches altogether).

Return type

bool
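The ancestry semantics, including the note about separate branches, can be illustrated on a toy commit graph. This is a pure-Python sketch over a hypothetical parents mapping, not the library's implementation; git itself offers git merge-base --is-ancestor for this kind of check, though whether GitHandler uses it is not stated here.

```python
def is_ancestor(parents, possible_ancestor, commit):
    """Return True if possible_ancestor is reachable from commit.

    parents maps each commit name to a list of its parent commit names.
    A commit counts as its own ancestor.
    """
    stack = [commit]
    seen = set()
    while stack:
        current = stack.pop()
        if current == possible_ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, []))
    return False


# Toy history: two branches, "a" and "b", that both fork from "root".
history = {"root": [], "a": ["root"], "a2": ["a"], "b": ["root"]}
```

Here "a" and "b" are unrelated in both directions, so is_ancestor returns False either way, which is the situation the note above warns about.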

is_clean()

Return whether there are any changes made to the eventdir since the last commit. Raises a GitRepoUninitialized exception if not a git repository.

is_repo()

Checks whether this event directory is a git repo by seeing if it contains a .git subdirectory. Raises a GitRepoUninitialized exception if not a git repository.
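The .git subdirectory check, and the way other methods might turn a failed check into an exception, can be sketched as follows. Both functions are hypothetical, and the exception class is a local stand-in that mirrors the documented ValueError base class.

```python
import os


class GitRepoUninitialized(ValueError):
    """Local stand-in for llama.versioning.GitRepoUninitialized."""


def is_repo(eventdir):
    """Return True if eventdir contains a .git subdirectory."""
    return os.path.isdir(os.path.join(eventdir, ".git"))


def require_repo(eventdir):
    """Raise GitRepoUninitialized unless eventdir looks like a git repository."""
    if not is_repo(eventdir):
        raise GitRepoUninitialized(f"{eventdir} is not a git repository")
```

A guard like require_repo, called at the top of each git operation, would produce the exception that several methods in this class are documented to raise.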

remove(*files)

Run git rm for all files. Raises a GitRepoUninitialized exception if not a git repository.

reset_hard(ref=None)

Hard reset the status of the branch to a given ref, losing all subsequent changes. If ref is not provided, reset to the last commit.

serial_version(last_hash=None)

The serial version of this file as stored in the version history. Note that this is merely a count of how many versions of the file exist in this history; it is not an unambiguous label (in the way that the hash value is). Use this for human interpretation. If the file does not exist, this function returns 0 (unversioned), so versioned files effectively start at 1.

show_log(ref='HEAD')

Show the git commit message and notes for the given ref.

text_graph(*filenames, style='html')

Print a text graph of all files in the past history.

Parameters
  • *filenames (str, optional) – An arbitrary list of filenames that will be spliced onto the end of the argument list for git log. Use this to narrow down the history shown. Use -- to specify all files in the past history of the HEAD state.

  • style (str, optional) – The format to put the output in. Options include ‘html’ (if this is going to go on a summary page).

exception llama.versioning.GitRepoUninitialized

Bases: ValueError

An exception indicating that a git repository has not been initialized.