llama batch

Batch process large sets of organized data using LLAMA. Input and output files can be stored locally or in the cloud. Useful for large-scale simulations on injected or randomized data. Use the command-line interface to specify where input data should be loaded from, where outputs should be stored, and how many events should be processed. Source/destination paths can be specified as Python format strings, in which case their variables will be populated from a list of possible values (specified with --varlist) or from the environment (if no --varlist is specified for that variable name).

usage: llama batch [-h] [-V] [-D] [-f [FILEHANDLER [FILEHANDLER ...]]]
                   [+f [FILEHANDLER [FILEHANDLER ...]]]
                   [-p {DEFAULT_PIPELINE,SUBTHRESHOLD_PIPELINE,LLAMA2_REVIEW_PIPELINE}]
                   [--print-default-pipeline] [--dry-run-pipeline]
                   [--flags [FLAGNAME=value [FLAGNAME=value ...]]]
                   [--flag-presets] [-l LOGFILE]
                   [-v {debug,info,warning,error,critical,none}]
                   [--get [GET [GET ...]]] [--put [PUT [PUT ...]]]
                   [--errdump [ERRDUMP [ERRDUMP ...]]] [--erralert]
                   [--format [FORMAT [FORMAT ...]]] [--eventdir EVENTDIR]
                   [--random N] [--public]
                   [params [params ...]]

Named Arguments

-V, --version

Print the version number and exit.

Default: False

-D, --dev-mode

If specified, allow the program to run even if the LLAMA version does not conform to semantic version naming. You should not do this during production except in an emergency. If the flag is not specified but local changes to the source code exist, llama run will complain and quit immediately (the default behavior).

Default: False

--flags

A single flag preset (see: llama.flags) to use (print choices with --flag-presets) OR individual flag settings in the format FLAGNAME=value. YOU SHOULD PROBABLY USE A PRESET rather than individual flag settings. If you don’t specify a flag preset or a set of flags manually, you’ll be prompted to provide one; just provide --flags with no arguments to accept the default/existing flags. Flags set overall behaviors and intentions for an event, e.g. marking an event as “ONLINE” and therefore allowed to communicate with partner web APIs and send out products and alerts. Flag names and their default values (for new events) are: ROLE: ‘test’; ONLINE: ‘true’; and VETOED, UPLOAD, BLINDED_NEUTRINOS, MANUAL, and ICECUBE_UPLOAD: ‘false’. The full set of allowed values is (‘test’, ‘observation’) for ROLE and (‘true’, ‘false’) for every other flag.

--flag-presets

Print available flag presets.

choose pipeline (see ``llama.pipeline``)

-f, --filehandlers

Possible choices: FermiGRBsJSON, CoincSummaryI3LvcTex, ZtfTriggerList, CoincScatterZtfLVCPdf, RctSlkLmaCoincSignificanceSubthresholdI3Lvc, RctSlkLmaSkymapInfo, SkymapInfo, CoincScatterZtfI3LvcPdf, LvcSkymapHdf5, LVAlertJSON, CoincScatterI3LvcPng, RctSlkLmaLvcGcnXml, RctSlkI3CoincSummaryI3LvcPdf, CoincScatterZtfLVCPng, LVCGraceDbEventData, RctSlkLmaCoincScatterZtfI3LvcPdf, LvcDistancesJson, IceCubeNeutrinoListCoincTxt, LvcRetractionXml, CoincScatterI3LvcPdf, IceCubeNeutrinoListTxt, LvcSkymapFits, RctSlkLmaCoincSignificanceI3Lvc, RctSlkLmaCoincSummaryI3LvcPdf, CoincScatterZtfI3LvcPng, LvcGcnXml, RctSlkLmaCoincScatterZtfLVCPng, Advok, RctSlkLmaLvcRetractionXml, RctSlkLmaLvcDistancesJson, LVAlertAdvok, PAstro, RctSlkLmaCoincScatterI3LvcPdf, RctSlkLmaCoincScatterI3LvcPng, CoincSummaryI3LvcPdf, RctSlkLmaLVCGraceDbEventData, RctSlkLmaCoincScatterZtfI3LvcPng, RctSlkLmaLVAlertJSON, RctSlkLmaCoincScatterZtfLVCPdf, CoincSignificanceSubthresholdI3Lvc, IceCubeNeutrinoListTex, IceCubeNeutrinoList, CoincSignificanceI3Lvc

A list of FileHandler class names which should be used. If provided, FileHandler classes whose names are in this list will be included in the pipeline. If the dependencies of a requested file are not available, no attempt will be made to generate them unless they too are listed explicitly. Available filehandlers are drawn from the DEFAULT_PIPELINE (print them with --print-default-pipeline).

+f, ++filehandlers

Possible choices: FermiGRBsJSON, CoincSummaryI3LvcTex, ZtfTriggerList, CoincScatterZtfLVCPdf, RctSlkLmaCoincSignificanceSubthresholdI3Lvc, RctSlkLmaSkymapInfo, SkymapInfo, CoincScatterZtfI3LvcPdf, LvcSkymapHdf5, LVAlertJSON, CoincScatterI3LvcPng, RctSlkLmaLvcGcnXml, RctSlkI3CoincSummaryI3LvcPdf, CoincScatterZtfLVCPng, LVCGraceDbEventData, RctSlkLmaCoincScatterZtfI3LvcPdf, LvcDistancesJson, IceCubeNeutrinoListCoincTxt, LvcRetractionXml, CoincScatterI3LvcPdf, IceCubeNeutrinoListTxt, LvcSkymapFits, RctSlkLmaCoincSignificanceI3Lvc, RctSlkLmaCoincSummaryI3LvcPdf, CoincScatterZtfI3LvcPng, LvcGcnXml, RctSlkLmaCoincScatterZtfLVCPng, Advok, RctSlkLmaLvcRetractionXml, RctSlkLmaLvcDistancesJson, LVAlertAdvok, PAstro, RctSlkLmaCoincScatterI3LvcPdf, RctSlkLmaCoincScatterI3LvcPng, CoincSummaryI3LvcPdf, RctSlkLmaLVCGraceDbEventData, RctSlkLmaCoincScatterZtfI3LvcPng, RctSlkLmaLVAlertJSON, RctSlkLmaCoincScatterZtfLVCPdf, CoincSignificanceSubthresholdI3Lvc, IceCubeNeutrinoListTex, IceCubeNeutrinoList, CoincSignificanceI3Lvc

Exactly the same as --filehandlers, but all ancestors of the files listed with the + prefix will also be included. This means that if you ask to generate a single file, an attempt will be made to generate everything it depends on as well, if necessary. If you want all of those files made, this is a handy shortcut; if you want file generation to fail when ancestors are missing, use the - prefix instead.

-p, --pipeline

Possible choices: DEFAULT_PIPELINE, SUBTHRESHOLD_PIPELINE, LLAMA2_REVIEW_PIPELINE

The name of the pipeline to use. Must be the name of a Pipeline instance from llama.pipeline. Available choices: [‘DEFAULT_PIPELINE’, ‘SUBTHRESHOLD_PIPELINE’, ‘LLAMA2_REVIEW_PIPELINE’]. If both this and filehandlers are specified, then the resulting pipeline will include all requested filehandlers from both options.

--print-default-pipeline

Print the contents of the default pipeline and quit.

--dry-run-pipeline

Print the pipeline selected by the user and quit without taking further action. Use this to make sure you’ve specified the correct pipeline.

Default: False

logging settings

-l, --logfile

File where logs should be written. By default, all logging produced by llama run goes to both an archival logfile shared by all instances of the process as well as STDERR. The archival logfile can be overridden with this argument. If you specify /dev/null or a path that resolves to the same, logfile output will be suppressed automatically. Logs written to the logfile are always at maximum verbosity, i.e. DEBUG. (default: /root/.local/share/llama/logs/llama.batch.log)

Default: “/root/.local/share/llama/logs/llama.batch.log”

-v, --verbosity

Possible choices: debug, info, warning, error, critical, none

Set the verbosity level at which to log to STDOUT; the --logfile will ALWAYS receive maximum-verbosity logs (unless it is completely suppressed by writing to /dev/null). Available choices correspond to logging severity levels from the logging library, with the addition of none if you want to completely suppress logging to standard out. (default: info)

Default: “info”

simulation configuration

Remote file paths in the arguments below can be specified as regular URLs (by prefixing http:// or https:// to the path) or as Amazon/DigitalOcean S3 paths (by prefixing s3://{bucket}/ to the path, where {bucket} is replaced with the relevant bucket name). Note that HTTP/HTTPS downloads only work for unauthenticated endpoints, and S3 transfers only work if you have S3 configured (see: llama.com.s3 documentation). Further note that uploads to HTTP/HTTPS endpoints are not supported.
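The scheme prefix determines how each path is handled. As a rough sketch (this is illustrative only, not LLAMA's actual dispatch code, which lives behind llama.com.s3), the three cases can be distinguished with a standard URL parse:

```python
from urllib.parse import urlparse

def classify_path(path):
    """Classify a path the way the docs describe: http(s):// URLs,
    s3://{bucket}/{key} paths, or plain local paths."""
    parsed = urlparse(path)
    if parsed.scheme in ("http", "https"):
        return ("http", path)
    if parsed.scheme == "s3":
        # netloc holds the bucket name; the remainder is the object key
        return ("s3", parsed.netloc, parsed.path.lstrip("/"))
    return ("local", path)

print(classify_path("s3://llama/data/neutrino_filenames.json"))
# ('s3', 'llama', 'data/neutrino_filenames.json')
```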

params

Specify variable names and the files that contain their possible values. Uses the same syntax as bash variable assignments, but with each value set to the file containing the possible values. Each list of possible values can be a JSON or newline-delimited list. For example, if you have a list of values for graceid located at data/graceids.txt and a list of values for neutrino_filename located in the S3 bucket llama under the path data/neutrino_filenames.json, you would write --params graceid=data/graceids.txt neutrino_filename=s3://llama/data/neutrino_filenames.json. Note that the pipeline will run through the full Cartesian product of the variable values from these lists; you can instead run through a pre-determined number of random vectors from this space (see --random below).
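The Cartesian-product behavior can be sketched with itertools (the GraceID and filename values below are hypothetical stand-ins for whatever your value files contain):

```python
from itertools import product

# Hypothetical value lists, as they might be read from the files named
# in the params arguments (JSON or newline-delimited).
values = {
    "graceid": ["S190412m", "S190425z"],
    "neutrino_filename": ["nu_a.json", "nu_b.json"],
}

# The full Cartesian product: one simulation run per combination of
# variable values.
names = list(values)
runs = [dict(zip(names, combo)) for combo in product(*values.values())]
print(len(runs))  # 4 runs: 2 graceids x 2 neutrino files
```

With longer value lists this product grows multiplicatively, which is why --random exists as an alternative sampling strategy.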

--get

Specify paths from which files can be loaded to be used as inputs for a simulation run. This looks a lot like the --params syntax, but with the variable names equal to the output filename and the option of including variables using python format string syntax. For example, to download skymap_info.json files from S3 directories with the GraceID as part of the path name, you might specify --get skymap_info.json=s3://llama/sim/{graceid}/skymap_info.json.

Default: OrderedDict()
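The brace substitution in --get paths follows Python format-string semantics; a minimal sketch using the example template above (the GraceID value is a hypothetical stand-in for one vector of --params values):

```python
# Template from the --get example; variables in braces are filled from
# the current parameter combination.
template = "s3://llama/sim/{graceid}/skymap_info.json"
params = {"graceid": "S190425z"}  # one hypothetical vector of --params values
print(template.format(**params))
# s3://llama/sim/S190425z/skymap_info.json
```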

--put

Like --get, but instead you specify the destination to which an output file should be saved. This can be a local path or an S3 path.

Default: OrderedDict()

--errdump

Specify directories in which to dump the contents of the active event directory in case of an unhandled exception. You can specify multiple places for this dump (e.g. upload to S3 and save to a local directory) in order to aid debugging. Git history is not copied; everything else is. This argument is ignored during normal operation.

Default: ()

--erralert

If specified, alert maintainers to every error via Slack.

Default: False

--format

Specify filenames (as relative paths within the event directory) whose contents should be loaded and string formatted with values from --params and the environment. This operation happens after --get and replaces the contents of the file, allowing you to use boilerplate files with parametric values inserted.
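The substitution --format performs on file contents can be sketched the same way; here the boilerplate text and variables are hypothetical, with a stand-in dict playing the role of the environment to show --params values taking precedence (an assumption about precedence based on the description above):

```python
# Hypothetical boilerplate file contents with parametric placeholders.
boilerplate = "Event: {graceid}\nOperator: {USER}\n"

# Stand-in for os.environ; --params values override it in this sketch.
environment = {"USER": "llama-op", "graceid": "stale"}
variables = {**environment, "graceid": "S190425z"}

# --format would overwrite the file in place with the filled text.
filled = boilerplate.format(**variables)
print(filled)
```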

--eventdir

Specify a local path in which each simulated event should be saved. Once again, you can use a format string to specify a path that includes the --params. For example, save events to directories based on their graceid and neutrino_filename with --eventdir ~/sim/{graceid}-{neutrino_filename}/. If this argument is omitted, then only outputs specified using --put will be persisted, and the actual simulation directory will be discarded immediately after those files have been copied.

--random

If the --random flag is specified, the pipeline will run N times on random combinations of input values drawn from the --params files. Set N to a negative integer to run indefinitely. If --random is not specified, the full Cartesian product of input variables from the --params will be used in sequence.
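One random input vector from the Cartesian space can be sketched as an independent choice per variable (an illustrative sampling scheme, not necessarily LLAMA's exact one; the value lists are hypothetical):

```python
import random

# Hypothetical value lists as read from the --params files.
values = {
    "graceid": ["S190412m", "S190425z", "S190426c"],
    "neutrino_filename": ["nu_a.json", "nu_b.json"],
}

def random_vector(values, rng=random):
    """Draw one random input vector from the Cartesian space of
    possible values (with replacement), instead of enumerating all
    combinations in sequence."""
    return {name: rng.choice(options) for name, options in values.items()}

sample = random_vector(values)
```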

--public

If specified, files uploaded to S3 will be publicly available for download. Otherwise, those files will require S3 credentials for access.

Default: False