llama batch
Batch process large sets of organized data using LLAMA. Input and output files
can be stored locally or in cloud storage. Good for large-scale simulations of
injected or randomized data. Use the command-line interface to specify where
input data should be loaded from, where outputs should be stored, and how many
events should be processed. These source/destination paths can be specified as
python format strings, in which case their variables will be populated with
variables sourced from a list of possible values (specified with --varlist)
or from the environment (if no --varlist is specified for that variable name).
usage: llama batch [-h] [-V] [-D] [--err-alert]
[-f [FILEHANDLER [FILEHANDLER ...]]]
[+f [FILEHANDLER [FILEHANDLER ...]]]
[-p {DEFAULT_PIPELINE,SUBTHRESHOLD_PIPELINE,LLAMA2_REVIEW_PIPELINE}]
[--print-default-pipeline] [--dry-run-pipeline]
[--flags [FLAGNAME=value [FLAGNAME=value ...]]]
[--flag-presets] [--dry-run-flags] [-l LOGFILE]
[-v {debug,info,warning,error,critical,none}]
[--get [GET [GET ...]]] [--put [PUT [PUT ...]]]
[--errdump [ERRDUMP [ERRDUMP ...]]]
[--format [FORMAT [FORMAT ...]]] [--eventdir EVENTDIR]
[--random N] [--public]
[params [params ...]]
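For instance, a minimal sketch of a batch run (the parameter list, S3 bucket, and paths below are illustrative and mirror the examples in the argument descriptions that follow):

    llama batch \
        --params graceid=data/graceids.txt \
        --get skymap_info.json=s3://llama/sim/{graceid}/skymap_info.json \
        --eventdir ~/sim/{graceid}/ \
        --flags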
Named Arguments
- -V, --version
Print the version number and exit.
Default: False
- -D, --dev-mode
If specified, allow the program to run even if the LLAMA version does not conform to semantic version naming. You should not do this during production except in an emergency. If the flag is not specified but local changes to the source code exist, llama run will complain and quit immediately (the default behavior).
Default: False
- --err-alert
Alert maintainers to unhandled exceptions.
- --flags
A single flag preset (see: llama.flags) to use (print choices with --flag-presets) in --outdir OR individual flag settings in the format FLAGNAME=value. YOU SHOULD PROBABLY USE A PRESET rather than individual flag settings. Any omitted flags will take on the default values in DEFAULT_FLAGS. If you don't specify --flags, you'll be prompted to provide one; just provide --flags with no arguments to accept the default/existing flags. Flags are used to set overall behaviors for an event and set intentions, e.g. to mark an event as "ONLINE" and therefore allowed to communicate with partner web APIs and send out products and alerts. Flag name options and default values (for new events) are FlagPreset({'VETOED': 'false', 'ROLE': 'test', 'UPLOAD': 'false', 'ICECUBE_UPLOAD': 'false', 'ONLINE': 'true', 'MANUAL': 'false', 'BLINDED_NEUTRINOS': 'false'}); the full set of allowed values is ('true', 'false') for each of MANUAL, UPLOAD, BLINDED_NEUTRINOS, ONLINE, VETOED, and ICECUBE_UPLOAD, and ('test', 'observation') for ROLE. (See the example at the end of this argument list.)
- --flag-presets
Print available flag presets.
- --dry-run-flags
Print flags parsed by the CLI and exit.
Default: False
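For example (a sketch; the flag names and values are among the defaults listed under --flags above):

    # list the available flag presets
    llama batch --flag-presets
    # set individual flags by name and preview the parsed result
    llama batch --flags ONLINE=false ROLE=test --dry-run-flags
    # pass --flags with no arguments to accept the default/existing flags
    llama batch --flags --dry-run-flags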
choose pipeline (see llama.pipeline)
- -f, --filehandlers
Possible choices: SkymapInfo, RctSlkLmaCoincSummaryI3LvcPdf, LvcGcnXml, IceCubeNeutrinoListTex, LVAlertAdvok, LvcSkymapFits, RctSlkLmaCoincScatterZtfLVCPdf, CoincSignificanceSubthresholdI3Lvc, CoincScatterI3LvcPdf, RctSlkLmaLvcRetractionXml, PAstro, CoincSummaryI3LvcPdf, RctSlkLmaCoincScatterZtfI3LvcPng, ZtfTriggerList, RctSlkLmaLvcDistancesJson, LVCGraceDbEventData, RctSlkI3CoincSummaryI3LvcPdf, IceCubeNeutrinoListCoincTxt, LvcRetractionXml, RctSlkLmaCoincSignificanceSubthresholdI3Lvc, IceCubeNeutrinoList, IceCubeNeutrinoListTxt, RctSlkLmaLVAlertJSON, FermiGRBsJSON, RctSlkLmaLvcGcnXml, RctSlkLmaCoincScatterZtfLVCPng, CoincSummaryI3LvcTex, CoincSignificanceI3Lvc, RctSlkLmaCoincScatterI3LvcPng, CoincScatterZtfI3LvcPdf, RctSlkLmaCoincScatterZtfI3LvcPdf, CoincScatterI3LvcPng, CoincScatterZtfLVCPdf, LvcDistancesJson, LvcSkymapHdf5, CoincScatterZtfLVCPng, RctSlkLmaSkymapInfo, CoincScatterZtfI3LvcPng, RctSlkLmaCoincSignificanceI3Lvc, RctSlkLmaCoincScatterI3LvcPdf, Advok, RctSlkLmaLVCGraceDbEventData, LVAlertJSON
A list of FileHandler class names which should be used. If provided, FileHandler classes whose names are in this list will be included in the pipeline. If the dependencies of a requested file are not available, no attempt will be made to generate them unless they too are listed explicitly. Available filehandlers are drawn from the DEFAULT_PIPELINE (print them with --print-default-pipeline).
- +f, ++filehandlers
Possible choices: SkymapInfo, RctSlkLmaCoincSummaryI3LvcPdf, LvcGcnXml, IceCubeNeutrinoListTex, LVAlertAdvok, LvcSkymapFits, RctSlkLmaCoincScatterZtfLVCPdf, CoincSignificanceSubthresholdI3Lvc, CoincScatterI3LvcPdf, RctSlkLmaLvcRetractionXml, PAstro, CoincSummaryI3LvcPdf, RctSlkLmaCoincScatterZtfI3LvcPng, ZtfTriggerList, RctSlkLmaLvcDistancesJson, LVCGraceDbEventData, RctSlkI3CoincSummaryI3LvcPdf, IceCubeNeutrinoListCoincTxt, LvcRetractionXml, RctSlkLmaCoincSignificanceSubthresholdI3Lvc, IceCubeNeutrinoList, IceCubeNeutrinoListTxt, RctSlkLmaLVAlertJSON, FermiGRBsJSON, RctSlkLmaLvcGcnXml, RctSlkLmaCoincScatterZtfLVCPng, CoincSummaryI3LvcTex, CoincSignificanceI3Lvc, RctSlkLmaCoincScatterI3LvcPng, CoincScatterZtfI3LvcPdf, RctSlkLmaCoincScatterZtfI3LvcPdf, CoincScatterI3LvcPng, CoincScatterZtfLVCPdf, LvcDistancesJson, LvcSkymapHdf5, CoincScatterZtfLVCPng, RctSlkLmaSkymapInfo, CoincScatterZtfI3LvcPng, RctSlkLmaCoincSignificanceI3Lvc, RctSlkLmaCoincScatterI3LvcPdf, Advok, RctSlkLmaLVCGraceDbEventData, LVAlertJSON
Exactly the same as --filehandlers, but all ancestors of the files listed with the + prefix will also be included. This means that, if you ask to generate a single file, an attempt will be made to also generate everything it depends on if necessary. If you want all those files made, this is a handy shortcut; if you want file generation to fail when ancestors are missing, use the - prefix instead. (See the examples at the end of this argument group.)
- -p, --pipeline
Possible choices: DEFAULT_PIPELINE, SUBTHRESHOLD_PIPELINE, LLAMA2_REVIEW_PIPELINE
The name of the pipeline to use. Must be the name of a Pipeline instance from llama.pipeline. Available choices: ['DEFAULT_PIPELINE', 'SUBTHRESHOLD_PIPELINE', 'LLAMA2_REVIEW_PIPELINE']. If both this and filehandlers are specified, then the resulting pipeline will include all requested filehandlers from both options.
- --print-default-pipeline
Print the contents of the default pipeline and quit.
- --dry-run-pipeline
Print the pipeline selected by the user and quit without taking further action. Use this to make sure you’ve specified the correct pipeline.
Default: False
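For example (a sketch; the file handler names are drawn from the choices listed above):

    # print the contents of the default pipeline
    llama batch --print-default-pipeline
    # preview an alternative pipeline without running anything
    llama batch -p SUBTHRESHOLD_PIPELINE --dry-run-pipeline
    # request one file plus everything it depends on, and preview the result
    llama batch +f LvcSkymapFits --dry-run-pipeline
    # request exactly these files; generation fails if ancestors are missing
    llama batch -f SkymapInfo LvcSkymapFits --dry-run-pipeline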
logging settings
- -l, --logfile
File where logs should be written. By default, all logging produced by llama run goes to both an archival logfile shared by all instances of the process as well as STDERR. The archival logfile can be overridden with this argument. If you specify /dev/null or a path that resolves to the same, logfile output will be suppressed automatically. Logs written to the logfile are always at maximum verbosity, i.e. DEBUG.
Default: "/root/.local/share/llama/logs/llama.batch.log"
- -v, --verbosity
Possible choices: debug, info, warning, error, critical, none
Set the verbosity level at which to log to STDOUT; the --logfile will ALWAYS receive maximum verbosity logs (unless it is completely suppressed by writing to /dev/null). Available choices correspond to logging severity levels from the logging library, with the addition of none if you want to completely suppress logging to standard out. (See the example below.)
Default: "info"
simulation configuration
Remote file paths in the below arguments can be specified using regular URLs (by prefixing http:// or https:// before the path) or Amazon/DigitalOcean S3 paths by prefixing s3://{bucket}/ before the path (where {bucket} is replaced with the relevant bucket name). Note that this will only work for unauthenticated HTTP/HTTPS endpoints, and the S3 downloads will only work if you have S3 configured (see: llama.com.s3 documentation). Further note that uploads to HTTP/HTTPS endpoints are not supported.
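For instance, an input file for --get (described below) could be pulled from a local path, an unauthenticated HTTPS endpoint, or an S3 bucket; the host and bucket names here are illustrative:

    --get skymap_info.json=archived/skymap_info.json
    --get skymap_info.json=https://example.com/sim/{graceid}/skymap_info.json
    --get skymap_info.json=s3://llama/sim/{graceid}/skymap_info.json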
- params
Specify variable names and files that contain their possible values. Same syntax as bash variable assignments, but with the value set as the file containing possible values. The list of possible values can be a JSON or newline-delimited list. For example, if you have a list of values for the graceid located at data/graceids.txt and a list of values for neutrino_filename located in S3 bucket llama under the path data/neutrino_filenames.json, you would write --params graceid=data/graceids.txt neutrino_filename=s3://llama/data/neutrino_filenames.json. Note that the pipeline will run through the full Cartesian product of variable values from these lists; you can instead run through a pre-determined number of random vectors from this space (see --random below). (See also the worked examples at the end of this section.)
- --get
Specify paths from which files can be loaded to be used as inputs for a simulation run. This looks a lot like the --params syntax, but with the variable names equal to the output filename and the option of including variables using python format string syntax. For example, to download skymap_info.json files from S3 directories with the GraceID as part of the path name, you might specify --get skymap_info.json=s3://llama/sim/{graceid}/skymap_info.json.
Default: OrderedDict()
- --put
Like --get, but instead you specify the destination to which an output file should be saved. This can be a local path or an S3 path.
Default: OrderedDict()
- --errdump
Specify directories in which to dump the contents of the active event directory in case of an unhandled exception. You can specify multiple places for this dump (e.g. upload to S3 and save to a local directory) in order to aid debugging. Git history is not copied; everything else is. This argument is ignored during normal operation.
Default: ()
- --format
Specify filenames (as relative paths within the event directory) whose contents should be loaded and string formatted with values from --params and the environment. This operation happens after --get and replaces the contents of the file, allowing you to use boilerplate files with parametric values inserted.
- --eventdir
Specify a local path in which each simulated event should be saved. Once again, you can use a format string to specify a path that includes the --params. For example, save events to directories based on their graceid and neutrino_filename with --eventdir ~/sim/{graceid}-{neutrino_filename}/. If this argument is omitted, then only outputs specified using --put will be persisted, and the actual simulation directory will be discarded immediately after those files have been copied.
- --random
If the --random flag is specified, then the pipeline will be run on random sequences of the variables specified in the --params files, and the simulation will run N times on randomized inputs from those files. Set N to a negative integer to run indefinitely. If --random is not specified, then the full Cartesian product of input variables from the --params will be used in sequence. (See the randomized example at the end of this section.)
- --public
If specified, files uploaded to S3 will be publicly available for download. Otherwise, those files will require S3 credentials for access.
Default: False
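As a worked example (a sketch only: the parameter lists, bucket, and paths mirror the examples above, and the output filename significance.json passed to --put is purely hypothetical), a sweep over the full Cartesian product of graceid and neutrino_filename values might look like:

    # Sweep every combination of graceid and neutrino_filename; format-string
    # variables like {graceid} are filled in from --params for each run.
    llama batch \
        --params graceid=data/graceids.txt \
                 neutrino_filename=s3://llama/data/neutrino_filenames.json \
        --get skymap_info.json=s3://llama/sim/{graceid}/skymap_info.json \
        --format skymap_info.json \
        --eventdir ~/sim/{graceid}-{neutrino_filename}/ \
        --put significance.json=s3://llama/results/{graceid}/significance.json \
        --flags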
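And a randomized variant (again a sketch; N=100 and the errdump destinations are arbitrary illustrations):

    # Instead of the full Cartesian product, run 100 random parameter vectors;
    # dump the event directory locally and to S3 on unhandled exceptions, and
    # make files uploaded to S3 publicly downloadable.
    llama batch \
        --params graceid=data/graceids.txt \
                 neutrino_filename=s3://llama/data/neutrino_filenames.json \
        --eventdir ~/sim/{graceid}-{neutrino_filename}/ \
        --random 100 \
        --errdump ~/llama-errdumps/ s3://llama/errdumps/ \
        --public \
        --flags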