SmartSim API

SmartSim API#

Experiment#

`Experiment.__init__`(name[, exp_path, launcher])	Initialize an Experiment instance.
`Experiment.start`(*args[, block, summary, ...])	Start passed instances using Experiment launcher
`Experiment.stop`(*args)	Stop specific instances launched by this `Experiment`
`Experiment.create_ensemble`(name[, params, ...])	Create an `Ensemble` of `Model` instances
`Experiment.create_model`(name, run_settings)	Create a general purpose `Model`
`Experiment.create_database`([port, path, ...])	Initialize an `Orchestrator` database
`Experiment.create_run_settings`(exe[, ...])	Create a `RunSettings` instance.
`Experiment.create_batch_settings`([nodes, ...])	Create a `BatchSettings` instance
`Experiment.generate`(*args[, tag, overwrite, ...])	Generate the file structure for an `Experiment`
`Experiment.poll`([interval, verbose, ...])	Monitor jobs through logging to stdout.
`Experiment.finished`(entity)	Query if a job has completed.
`Experiment.get_status`(*args)	Query the status of launched entity instances
`Experiment.reconnect_orchestrator`(checkpoint)	Reconnect to a running `Orchestrator`
`Experiment.preview`(*args[, verbosity_level, ...])	Preview entity information prior to launch.
`Experiment.summary`([style])	Return a summary of the `Experiment`
`Experiment.telemetry`	Return the telemetry configuration for this entity.

class Experiment(name: str, exp_path: str | None = None, launcher: str = 'local')[source]#

Bases: object

Experiment is a factory class that creates stages of a workflow and manages their execution.

The instances created by an Experiment represent executable code that is either user-specified, like the Model instance created by Experiment.create_model, or pre-configured, like the Orchestrator instance created by Experiment.create_database.

Experiment methods that accept a variable list of arguments, such as Experiment.start or Experiment.stop, accept any number of the instances created by the Experiment.

In general, the Experiment class is designed to be initialized once and utilized throughout runtime.

Initialize an Experiment instance.

With the default settings, the Experiment will use the local launcher, which will start all Experiment created instances on the localhost.

Example of initializing an Experiment with the local launcher

exp = Experiment(name="my_exp", launcher="local")

SmartSim supports multiple launchers which also can be specified based on the type of system you are running on.

exp = Experiment(name="my_exp", launcher="slurm")

If you want your Experiment driver script to be run across multiple system with different schedulers (workload managers) you can also use the auto argument to have the Experiment detect which launcher to use based on system installed binaries and libraries.

exp = Experiment(name="my_exp", launcher="auto")

The Experiment path will default to the current working directory and if the Experiment.generate method is called, a directory with the Experiment name will be created to house the output from the Experiment.

Parameters:

name (str) – name for the Experiment
exp_path (Optional[str], default: None) – path to location of Experiment directory
launcher (str, default: 'local') – type of launcher being used, options are “slurm”, “pbs”, “lsf”, “sge”, or “local”. If set to “auto”, an attempt will be made to find an available launcher on the system.

create_batch_settings(nodes: int = 1, time: str = '', queue: str = '', account: str = '', batch_args: Dict[str, str] | None = None, **kwargs: Any) → smartsim.settings.base.BatchSettings[source]#

Create a BatchSettings instance

Batch settings parameterize batch workloads. The result of this function can be passed to the Ensemble initialization.

the batch_args parameter can be used to pass in a dictionary of additional batch command arguments that aren’t supported through the smartsim interface

# i.e. for Slurm
batch_args = {
    "distribution": "block"
    "exclusive": None
}
bs = exp.create_batch_settings(nodes=3,
                               time="10:00:00",
                               batch_args=batch_args)
bs.set_account("default")

Parameters:

nodes (int, default: 1) – number of nodes for batch job
time (str, default: '') – length of batch job
queue (str, default: '') – queue or partition (if slurm)
account (str, default: '') – user account name for batch system
batch_args (Optional[Dict[str, str]], default: None) – additional batch arguments

Return type:

BatchSettings

Returns:

a newly created BatchSettings instance

Raises:

SmartSimError – if batch creation fails

create_database(port: int = 6379, path: str | None = None, db_nodes: int = 1, batch: bool = False, hosts: str | List[str] | None = None, run_command: str = 'auto', interface: str | List[str] = 'ipogif0', account: str | None = None, time: str | None = None, queue: str | None = None, single_cmd: bool = True, db_identifier: str = 'orchestrator', **kwargs: Any) → smartsim.database.orchestrator.Orchestrator[source]#

Initialize an Orchestrator database

The Orchestrator database is a key-value store based on Redis that can be launched together with other Experiment created instances for online data storage.

When launched, Orchestrator can be used to communicate data between Fortran, Python, C, and C++ applications.

Machine Learning models in Pytorch, Tensorflow, and ONNX (i.e. scikit-learn) can also be stored within the Orchestrator database where they can be called remotely and executed on CPU or GPU where the database is hosted.

To enable a SmartSim Model to communicate with the database the workload must utilize the SmartRedis clients. For more information on the database, and SmartRedis clients see the documentation at https://www.craylabs.org/docs/smartredis.html

Parameters:

port (int, default: 6379) – TCP/IP port
db_nodes (int, default: 1) – number of database shards
batch (bool, default: False) – run as a batch workload
hosts (Union[str, List[str], None], default: None) – specify hosts to launch on
run_command (str, default: 'auto') – specify launch binary or detect automatically
interface (Union[str, List[str]], default: 'ipogif0') – Network interface
account (Optional[str], default: None) – account to run batch on
time (Optional[str], default: None) – walltime for batch ‘HH:MM:SS’ format
queue (Optional[str], default: None) – queue to run the batch on
single_cmd (bool, default: True) – run all shards with one (MPMD) command
db_identifier (str, default: 'orchestrator') – an identifier to distinguish this orchestrator in multiple-database experiments

Raises:

SmartSimError – if detection of launcher or of run command fails
SmartSimError – if user indicated an incompatible run command for the launcher

Return type:

Orchestrator

Returns:

Orchestrator or derived class

create_ensemble(name: str, params: Dict[str, Any] | None = None, batch_settings: smartsim.settings.base.BatchSettings | None = None, run_settings: smartsim.settings.base.RunSettings | None = None, replicas: int | None = None, perm_strategy: str = 'all_perm', path: str | None = None, **kwargs: Any) → smartsim.entity.ensemble.Ensemble[source]#

Create an Ensemble of Model instances

Ensembles can be launched sequentially or as a batch if using a non-local launcher. e.g. slurm

Ensembles require one of the following combinations of arguments:

run_settings and params

run_settings and replicas

batch_settings

batch_settings, run_settings, and params

batch_settings, run_settings, and replicas

If given solely batch settings, an empty ensemble will be created that Models can be added to manually through Ensemble.add_model(). The entire Ensemble will launch as one batch.

Provided batch and run settings, either params or replicas must be passed and the entire ensemble will launch as a single batch.

Provided solely run settings, either params or replicas must be passed and the Ensemble members will each launch sequentially.

The kwargs argument can be used to pass custom input parameters to the permutation strategy.

Parameters:

name (str) – name of the Ensemble
params (Optional[Dict[str, Any]], default: None) – parameters to expand into Model members
batch_settings (Optional[BatchSettings], default: None) – describes settings for Ensemble as batch workload
run_settings (Optional[RunSettings], default: None) – describes how each Model should be executed
replicas (Optional[int], default: None) – number of replicas to create
perm_strategy (str, default: 'all_perm') – strategy for expanding params into Model instances from params argument options are “all_perm”, “step”, “random” or a callable function.

Raises:

SmartSimError – if initialization fails

Return type:

Ensemble

Returns:

Ensemble instance

create_model(name: str, run_settings: smartsim.settings.base.RunSettings, params: Dict[str, Any] | None = None, path: str | None = None, enable_key_prefixing: bool = False, batch_settings: smartsim.settings.base.BatchSettings | None = None) → smartsim.entity.model.Model[source]#

Create a general purpose Model

The Model class is the most general encapsulation of executable code in SmartSim. Model instances are named references to pieces of a workflow that can be parameterized, and executed.

Model instances can be launched sequentially, as a batch job, or as a group by adding them into an Ensemble.

All Models require a reference to run settings to specify which executable to launch as well provide options for how to launch the executable with the underlying WLM. Furthermore, batch a reference to a batch settings can be added to launch the Model as a batch job through Experiment.start. If a Model with a reference to a set of batch settings is added to a larger entity with its own set of batch settings (for e.g. an Ensemble) the batch settings of the larger entity will take precedence and the batch setting of the Model will be strategically ignored.

Parameters supplied in the params argument can be written into configuration files supplied at runtime to the Model through Model.attach_generator_files. params can also be turned into executable arguments by calling Model.params_to_args

By default, Model instances will be executed in the exp_path/model_name directory if no path argument is supplied. If a Model instance is passed to Experiment.generate, a directory within the Experiment directory will be created to house the input and output files from the Model.

Example initialization of a Model instance

from smartsim import Experiment
run_settings = exp.create_run_settings("python", "run_pytorch_model.py")
model = exp.create_model("pytorch_model", run_settings)

# adding parameters to a model
run_settings = exp.create_run_settings("python", "run_pytorch_model.py")
train_params = {
    "batch": 32,
    "epoch": 10,
    "lr": 0.001
}
model = exp.create_model("pytorch_model", run_settings, params=train_params)
model.attach_generator_files(to_configure="./train.cfg")
exp.generate(model)

New in 0.4.0, Model instances can be colocated with an Orchestrator database shard through Model.colocate_db. This will launch a single Orchestrator instance on each compute host used by the (possibly distributed) application. This is useful for performant online inference or processing at runtime.

New in 0.4.2, Model instances can now be colocated with an Orchestrator database over either TCP or UDS using the Model.colocate_db_tcp or Model.colocate_db_uds method respectively. The original Model.colocate_db method is now deprecated, but remains as an alias for Model.colocate_db_tcp for backward compatibility.

Parameters:

name (str) – name of the Model
run_settings (RunSettings) – defines how Model should be run
params (Optional[Dict[str, Any]], default: None) – Model parameters for writing into configuration files
path (Optional[str], default: None) – path to where the Model should be executed at runtime
enable_key_prefixing (bool, default: False) – If True, data sent to the Orchestrator using SmartRedis from this Model will be prefixed with the Model name.
batch_settings (Optional[BatchSettings], default: None) – Settings to run Model individually as a batch job.

Raises:

SmartSimError – if initialization fails

Return type:

Model

Returns:

the created Model

Create a RunSettings instance.

run_command=”auto” will attempt to automatically match a run command on the system with a RunSettings class in SmartSim. If found, the class corresponding to that run_command will be created and returned.

If the local launcher is being used, auto detection will be turned off.

If a recognized run command is passed, the RunSettings instance will be a child class such as SrunSettings

If not supported by smartsim, the base RunSettings class will be created and returned with the specified run_command and run_args will be evaluated literally.

Run Commands with implemented helper classes:

aprun (ALPS)
srun (SLURM)
mpirun (OpenMPI)
jsrun (LSF)

Parameters:

run_command (str, default: 'auto') – command to run the executable
exe (str) – executable to run
exe_args (Optional[List[str]], default: None) – arguments to pass to the executable
run_args (Optional[Dict[str, Union[int, str, float, None]]], default: None) – arguments to pass to the run_command
env_vars (Optional[Dict[str, Optional[str]]], default: None) – environment variables to pass to the executable
container (Optional[Container], default: None) – if execution environment is containerized

Return type:

RunSettings

Returns:

the created RunSettings

finished(entity: smartsim.entity.entity.SmartSimEntity) → bool[source]#

Query if a job has completed.

An instance of Model or Ensemble can be passed as an argument.

Passing Orchestrator will return an error as a database deployment is never finished until stopped by the user.

Parameters:: entity (SmartSimEntity) – object launched by this Experiment
Return type:: bool
Returns:: True if the job has finished, False otherwise
Raises:: SmartSimError – if entity has not been launched by this Experiment

generate(*args: smartsim.entity.entity.SmartSimEntity | smartsim.entity.entityList.EntitySequence[smartsim.entity.entity.SmartSimEntity], tag: str | None = None, overwrite: bool = False, verbose: bool = False) → None[source]#

Generate the file structure for an Experiment

Experiment.generate creates directories for each entity passed to organize Experiments that launch many entities.

If files or directories are attached to Model objects using Model.attach_generator_files(), those files or directories will be symlinked, copied, or configured and written into the created directory for that instance.

Instances of Model, Ensemble and Orchestrator can all be passed as arguments to the generate method.

Parameters:

tag (Optional[str], default: None) – tag used in to_configure generator files
overwrite (bool, default: False) – overwrite existing folders and contents
verbose (bool, default: False) – log parameter settings to std out

Return type:

None

get_status(*args: smartsim.entity.entity.SmartSimEntity | smartsim.entity.entityList.EntitySequence[smartsim.entity.entity.SmartSimEntity]) → List[smartsim.status.SmartSimStatus][source]#

Query the status of launched entity instances

Return a smartsim.status string representing the status of the launched instance.

exp.get_status(model)

As with an Experiment method, multiple instance of varying types can be passed to and all statuses will be returned at once.

statuses = exp.get_status(model, ensemble, orchestrator)
complete = [s == smartsim.status.STATUS_COMPLETED for s in statuses]
assert all(complete)

Return type:: List[SmartSimStatus]
Returns:: status of the instances passed as arguments
Raises:: SmartSimError – if status retrieval fails

poll(interval: int = 10, verbose: bool = True, kill_on_interrupt: bool = True) → None[source]#

Monitor jobs through logging to stdout.

This method should only be used if jobs were launched with Experiment.start(block=False)

The internal specified will control how often the logging is performed, not how often the polling occurs. By default, internal polling is set to every second for local launcher jobs and every 10 seconds for all other launchers.

If internal polling needs to be slower or faster based on system or site standards, set the SMARTSIM_JM_INTERNAL environment variable to control the internal polling interval for SmartSim.

For more verbose logging output, the SMARTSIM_LOG_LEVEL environment variable can be set to debug

If kill_on_interrupt=True, then all jobs launched by this experiment are guaranteed to be killed when ^C (SIGINT) signal is received. If kill_on_interrupt=False, then it is not guaranteed that all jobs launched by this experiment will be killed, and the zombie processes will need to be manually killed.

Parameters:

interval (int, default: 10) – frequency (in seconds) of logging to stdout
verbose (bool, default: True) – set verbosity
kill_on_interrupt (bool, default: True) – flag for killing jobs when SIGINT is received

Raises:

SmartSimError – if poll request fails

Return type:

None

preview(*args: Any, verbosity_level: smartsim._core.control.previewrenderer.Verbosity = Verbosity.INFO, output_format: smartsim._core.control.previewrenderer.Format = Format.PLAINTEXT, output_filename: str | None = None) → None[source]#

Preview entity information prior to launch. This method aggregates multiple pieces of information to give users insight into what and how entities will be launched. Any instance of Model, Ensemble, or Orchestrator created by the Experiment can be passed as an argument to the preview method.

Verbosity levels:

info: Display user-defined fields and entities.
debug: Display user-defined field and entities and auto-generated
fields.
developer: Display user-defined field and entities, auto-generated
fields, and run commands.

Parameters:

verbosity_level (Verbosity, default: <Verbosity.INFO: 'info'>) – verbosity level specified by user, defaults to info.
output_format (Format, default: <Format.PLAINTEXT: 'plain_text'>) – Set output format. The possible accepted output formats are plain_text. Defaults to plain_text.
output_filename (Optional[str], default: None) – Specify name of file and extension to write preview data to. If no output filename is set, the preview will be output to stdout. Defaults to None.

Return type:

None

reconnect_orchestrator(checkpoint: str) → smartsim.database.orchestrator.Orchestrator[source]#

Reconnect to a running Orchestrator

This method can be used to connect to a Orchestrator deployment that was launched by a previous Experiment. This can be helpful in the case where separate runs of an Experiment wish to use the same Orchestrator instance currently running on a system.

Parameters:: checkpoint (str) – the smartsim_db.dat file created when an Orchestrator is launched
Return type:: Orchestrator

start(*args: smartsim.entity.entity.SmartSimEntity | smartsim.entity.entityList.EntitySequence[smartsim.entity.entity.SmartSimEntity], block: bool = True, summary: bool = False, kill_on_interrupt: bool = True) → None[source]#

Start passed instances using Experiment launcher

Any instance Model, Ensemble or Orchestrator instance created by the Experiment can be passed as an argument to the start method.

exp = Experiment(name="my_exp", launcher="slurm")
settings = exp.create_run_settings(exe="./path/to/binary")
model = exp.create_model("my_model", settings)
exp.start(model)

Multiple entity instances can also be passed to the start method at once no matter which type of instance they are. These will all be launched together.

exp.start(model_1, model_2, db, ensemble, block=True)
# alternatively
stage_1 = [model_1, model_2, db, ensemble]
exp.start(*stage_1, block=True)

If block==True the Experiment will poll the launched instances at runtime until all non-database jobs have completed. Database jobs must be killed by the user by passing them to Experiment.stop. This allows for multiple stages of a workflow to produce to and consume from the same Orchestrator database.

If kill_on_interrupt=True, then all jobs launched by this experiment are guaranteed to be killed when ^C (SIGINT) signal is received. If kill_on_interrupt=False, then it is not guaranteed that all jobs launched by this experiment will be killed, and the zombie processes will need to be manually killed.

Parameters:

block (bool, default: True) – block execution until all non-database jobs are finished
summary (bool, default: False) – print a launch summary prior to launch
kill_on_interrupt (bool, default: True) – flag for killing jobs when ^C (SIGINT) signal is received.

Return type:

None

stop(*args: smartsim.entity.entity.SmartSimEntity | smartsim.entity.entityList.EntitySequence[smartsim.entity.entity.SmartSimEntity]) → None[source]#

Stop specific instances launched by this Experiment

Instances of Model, Ensemble and Orchestrator can all be passed as arguments to the stop method.

Whichever launcher was specified at Experiment initialization will be used to stop the instance. For example, which using the slurm launcher, this equates to running scancel on the instance.

Example

exp.stop(model)
# multiple
exp.stop(model_1, model_2, db, ensemble)

Parameters:

args (Union[SmartSimEntity, EntitySequence[SmartSimEntity]]) – One or more SmartSimEntity or EntitySequence objects.

Raises:

TypeError – if wrong type
SmartSimError – if stop request fails

Return type:

None

summary(style: str = 'github') → str[source]#

Return a summary of the Experiment

The summary will show each instance that has been launched and completed in this Experiment

Parameters:: style (str, default: 'github') – the style in which the summary table is formatted, for a full list of styles see the table-format section of: astanin/python-tabulate
Return type:: str
Returns:: tabulate string of Experiment history

property telemetry: TelemetryConfiguration#

Return the telemetry configuration for this entity.

Returns:: configuration of telemetry for this entity

Settings#

Settings are provided to Model and Ensemble objects to provide parameters for how a job should be executed. Some are specifically meant for certain launchers like SbatchSettings is solely meant for system using Slurm as a workload manager. MpirunSettings for OpenMPI based jobs is supported by Slurm and PBSPro.

Types of Settings:

`RunSettings`(exe[, exe_args, run_command, ...])	Run parameters for a `Model`
`SrunSettings`(exe[, exe_args, run_args, ...])	Initialize run parameters for a slurm job with `srun`
`AprunSettings`(exe[, exe_args, run_args, ...])	Settings to run job with `aprun` command
`MpirunSettings`(exe[, exe_args, run_args, ...])	Settings to run job with `mpirun` command (MPI-standard)
`MpiexecSettings`(exe[, exe_args, run_args, ...])	Settings to run job with `mpiexec` command (MPI-standard)
`OrterunSettings`(exe[, exe_args, run_args, ...])	Settings to run job with `orterun` command (MPI-standard)
`JsrunSettings`(exe[, exe_args, run_args, ...])	Settings to run job with `jsrun` command
`DragonRunSettings`(exe[, exe_args, env_vars])	Initialize run parameters for a Dragon process
`SbatchSettings`([nodes, time, account, ...])	Specify run parameters for a Slurm batch job
`QsubBatchSettings`([nodes, ncpus, time, ...])	Specify `qsub` batch parameters for a job
`BsubBatchSettings`([nodes, time, project, ...])	Specify `bsub` batch parameters for a job

Settings objects can accept a container object that defines a container runtime, image, and arguments to use for the workload. Below is a list of supported container runtimes.

Types of Containers:

Singularity(*args, **kwargs)

Singularity (apptainer) container type.

RunSettings#

When running SmartSim on laptops and single node workstations, the base RunSettings object is used to parameterize jobs. RunSettings include a run_command parameter for local launches that utilize a parallel launch binary like mpirun, mpiexec, and others.

`RunSettings.add_exe_args`(args)	Add executable arguments to executable
`RunSettings.update_env`(env_vars)	Update the job environment variables

Run parameters for a Model

The base RunSettings class should only be used with the local launcher on single node, workstations, or laptops.

If no run_command is specified, the executable will be launched locally.

run_args passed as a dict will be interpreted literally for local RunSettings and added directly to the run_command e.g. run_args = {“-np”: 2} will be “-np 2”

Example initialization

rs = RunSettings("echo", "hello", "mpirun", run_args={"-np": "2"})

Parameters:

exe (str) – executable to run
exe_args (Union[str, List[str], None], default: None) – executable arguments
run_command (str, default: '') – launch binary (e.g. “srun”)
run_args (Optional[Dict[str, Union[int, str, float, None]]], default: None) – arguments for run command (e.g. -np for mpiexec)
env_vars (Optional[Dict[str, Optional[str]]], default: None) – environment vars to launch job with
container (Optional[Container], default: None) – container type for workload (e.g. “singularity”)

add_exe_args(args: str | List[str]) → None[source]#

Add executable arguments to executable

Parameters:: args (Union[str, List[str]]) – executable arguments
Return type:: None

property env_vars: Dict[str, str | None]#

Return an immutable list of attached environment variables.

Returns:: attached environment variables

property exe_args: str | List[str]#

Return an immutable list of attached executable arguments.

Returns:: attached executable arguments

format_env_vars() → List[str][source]#

Build environment variable string

Return type:: List[str]
Returns:: formatted list of strings to export variables

format_run_args() → List[str][source]#

Return formatted run arguments

For RunSettings, the run arguments are passed literally with no formatting.

Return type:: List[str]
Returns:: list run arguments for these settings

make_mpmd(settings: smartsim.settings.base.RunSettings) → None[source]#

Make job an MPMD job

Parameters:: settings (RunSettings) – RunSettings instance
Return type:: None

reserved_run_args: set[str] = {}#

property run_args: Dict[str, int | str | float | None]#

Return an immutable list of attached run arguments.

Returns:: attached run arguments

property run_command: str | None#

Return the launch binary used to launch the executable

Attempt to expand the path to the executable if possible

Returns:: launch binary e.g. mpiexec

set(arg: str, value: str | None = None, condition: bool = True) → None[source]#

Allows users to set individual run arguments.

A method that allows users to set run arguments after object instantiation. Does basic formatting such as stripping leading dashes. If the argument has been set previously, this method will log warning but ultimately comply.

Conditional expressions may be passed to the conditional parameter. If the expression evaluates to True, the argument will be set. In not an info message is logged and no further operation is performed.

Basic Usage

rs = RunSettings("python")
rs.set("an-arg", "a-val")
rs.set("a-flag")
rs.format_run_args()  # returns ["an-arg", "a-val", "a-flag", "None"]

Slurm Example with Conditional Setting

import socket

rs = SrunSettings("echo", "hello")
rs.set_tasks(1)
rs.set("exclusive")

# Only set this argument if condition param evals True
# Otherwise log and NOP
rs.set("partition", "debug",
       condition=socket.gethostname()=="testing-system")

rs.format_run_args()
# returns ["exclusive", "None", "partition", "debug"] iff
  socket.gethostname()=="testing-system"
# otherwise returns ["exclusive", "None"]

Parameters:

arg (str) – name of the argument
value (Optional[str], default: None) – value of the argument
conditon – set the argument if condition evaluates to True

Return type:

None

set_binding(binding: str) → None[source]#

Set binding

Parameters:: binding (str) – Binding
Return type:: None

set_broadcast(dest_path: str | None = None) → None[source]#

Copy executable file to allocated compute nodes

Parameters:: dest_path (Optional[str], default: None) – Path to copy an executable file
Return type:: None

set_cpu_bindings(bindings: int | List[int]) → None[source]#

Set the cores to which MPI processes are bound

Parameters:: bindings (Union[int, List[int]]) – List specifing the cores to which MPI processes are bound
Return type:: None

set_cpus_per_task(cpus_per_task: int) → None[source]#

Set the number of cpus per task

Parameters:: cpus_per_task (int) – number of cpus per task
Return type:: None

set_excluded_hosts(host_list: str | List[str]) → None[source]#

Specify a list of hosts to exclude for launching this job

Parameters:: host_list (Union[str, List[str]]) – hosts to exclude
Return type:: None

set_hostlist(host_list: str | List[str]) → None[source]#

Specify the hostlist for this job

Parameters:: host_list (Union[str, List[str]]) – hosts to launch on
Return type:: None

set_hostlist_from_file(file_path: str) → None[source]#

Use the contents of a file to specify the hostlist for this job

Parameters:: file_path (str) – Path to the hostlist file
Return type:: None

set_memory_per_node(memory_per_node: int) → None[source]#

Set the amount of memory required per node in megabytes

Parameters:: memory_per_node (int) – Number of megabytes per node
Return type:: None

set_mpmd_preamble(preamble_lines: List[str]) → None[source]#

Set preamble to a file to make a job MPMD

Parameters:: preamble_lines (List[str]) – lines to put at the beginning of a file.
Return type:: None

set_node_feature(feature_list: str | List[str]) → None[source]#

Specify the node feature for this job

Parameters:: feature_list (Union[str, List[str]]) – node feature to launch on
Return type:: None

set_nodes(nodes: int) → None[source]#

Set the number of nodes

Parameters:: nodes (int) – number of nodes to run with
Return type:: None

set_quiet_launch(quiet: bool) → None[source]#

Set the job to run in quiet mode

Parameters:: quiet (bool) – Whether the job should be run quietly
Return type:: None

set_task_map(task_mapping: str) → None[source]#

Set a task mapping

Parameters:: task_mapping (str) – task mapping
Return type:: None

set_tasks(tasks: int) → None[source]#

Set the number of tasks to launch

Parameters:: tasks (int) – number of tasks to launch
Return type:: None

set_tasks_per_node(tasks_per_node: int) → None[source]#

Set the number of tasks per node

Parameters:: tasks_per_node (int) – number of tasks to launch per node
Return type:: None

set_time(hours: int = 0, minutes: int = 0, seconds: int = 0) → None[source]#

Automatically format and set wall time

Parameters:

hours (int, default: 0) – number of hours to run job
minutes (int, default: 0) – number of minutes to run job
seconds (int, default: 0) – number of seconds to run job

Return type:

None

set_verbose_launch(verbose: bool) → None[source]#

Set the job to run in verbose mode

Parameters:: verbose (bool) – Whether the job should be run verbosely
Return type:: None

set_walltime(walltime: str) → None[source]#

Set the formatted walltime

Parameters:: walltime (str) – Time in format required by launcher``
Return type:: None

update_env(env_vars: Dict[str, str | int | float | bool]) → None[source]#

Update the job environment variables

To fully inherit the current user environment, add the workload-manager-specific flag to the launch command through the add_exe_args() method. For example, --export=ALL for slurm, or -V for PBS/aprun.

Parameters:: env_vars (Dict[str, Union[str, int, float, bool]]) – environment variables to update or add
Raises:: TypeError – if env_vars values cannot be coerced to strings
Return type:: None

SrunSettings#

SrunSettings can be used for running on existing allocations, running jobs in interactive allocations, and for adding srun steps to a batch.

`SrunSettings.set_nodes`(nodes)	Set the number of nodes
`SrunSettings.set_node_feature`(feature_list)	Specify the node feature for this job
`SrunSettings.set_tasks`(tasks)	Set the number of tasks for this job
`SrunSettings.set_tasks_per_node`(tasks_per_node)	Set the number of tasks for this job
`SrunSettings.set_walltime`(walltime)	Set the walltime of the job
`SrunSettings.set_hostlist`(host_list)	Specify the hostlist for this job
`SrunSettings.set_excluded_hosts`(host_list)	Specify a list of hosts to exclude for launching this job
`SrunSettings.set_cpus_per_task`(cpus_per_task)	Set the number of cpus to use per task
`SrunSettings.add_exe_args`(args)	Add executable arguments to executable
`SrunSettings.format_run_args`()	Return a list of slurm formatted run arguments
`SrunSettings.format_env_vars`()	Build bash compatible environment variable string for Slurm
`SrunSettings.update_env`(env_vars)	Update the job environment variables

Initialize run parameters for a slurm job with srun

SrunSettings should only be used on Slurm based systems.

If an allocation is specified, the instance receiving these run parameters will launch on that allocation.

Parameters:

exe (str) – executable to run
exe_args (Union[str, List[str], None], default: None) – executable arguments
run_args (Optional[Dict[str, Union[int, str, float, None]]], default: None) – srun arguments without dashes
env_vars (Optional[Dict[str, Optional[str]]], default: None) – environment variables for job
alloc (Optional[str], default: None) – allocation ID if running on existing alloc

add_exe_args(args: str | List[str]) → None#

Add executable arguments to executable

Parameters:: args (Union[str, List[str]]) – executable arguments
Return type:: None

check_env_vars() → None[source]#

Warn a user trying to set a variable which is set in the environment

Given Slurm’s env var precedence, trying to export a variable which is already present in the environment will not work.

Return type:: None

colocated_db_settings: t.Optional[t.Dict[str, t.Union[bool, int, str, None, t.List[str], t.Iterable[t.Union[int, t.Iterable[int]]], t.List[DBModel], t.List[DBScript], t.Dict[str, t.Union[int, None]], t.Dict[str, str]]]]#

property env_vars: Dict[str, str | None]#

Return an immutable list of attached environment variables.

Returns:: attached environment variables

property exe_args: str | List[str]#

Return an immutable list of attached executable arguments.

Returns:: attached executable arguments

format_comma_sep_env_vars() → Tuple[str, List[str]][source]#

Build environment variable string for Slurm

Slurm takes exports in comma separated lists the list starts with all as to not disturb the rest of the environment for more information on this, see the slurm documentation for srun

Return type:: Tuple[str, List[str]]
Returns:: the formatted string of environment variables

format_env_vars() → List[str][source]#

Build bash compatible environment variable string for Slurm

Return type:: List[str]
Returns:: the formatted string of environment variables

format_run_args() → List[str][source]#

Return a list of slurm formatted run arguments

Return type:: List[str]
Returns:: list of slurm arguments for these settings

make_mpmd(settings: smartsim.settings.base.RunSettings) → None[source]#

Make a mpmd workload by combining two srun commands

This connects the two settings to be executed with a single Model instance

Parameters:: settings (RunSettings) – SrunSettings instance
Return type:: None

reserved_run_args: set[str] = {'D', 'chdir'}#

property run_args: Dict[str, int | str | float | None]#

Return an immutable list of attached run arguments.

Returns:: attached run arguments

property run_command: str | None#

Return the launch binary used to launch the executable

Attempt to expand the path to the executable if possible

Returns:: launch binary e.g. mpiexec

set(arg: str, value: str | None = None, condition: bool = True) → None#

Allows users to set individual run arguments.

A method that allows users to set run arguments after object instantiation. Does basic formatting such as stripping leading dashes. If the argument has been set previously, this method will log warning but ultimately comply.

Conditional expressions may be passed to the conditional parameter. If the expression evaluates to True, the argument will be set. In not an info message is logged and no further operation is performed.

Basic Usage

rs = RunSettings("python")
rs.set("an-arg", "a-val")
rs.set("a-flag")
rs.format_run_args()  # returns ["an-arg", "a-val", "a-flag", "None"]

Slurm Example with Conditional Setting

import socket

rs = SrunSettings("echo", "hello")
rs.set_tasks(1)
rs.set("exclusive")

# Only set this argument if condition param evals True
# Otherwise log and NOP
rs.set("partition", "debug",
       condition=socket.gethostname()=="testing-system")

rs.format_run_args()
# returns ["exclusive", "None", "partition", "debug"] iff
  socket.gethostname()=="testing-system"
# otherwise returns ["exclusive", "None"]

Parameters:

arg (str) – name of the argument
value (Optional[str], default: None) – value of the argument
conditon – set the argument if condition evaluates to True

Return type:

None

set_binding(binding: str) → None#

Set binding

Parameters:: binding (str) – Binding
Return type:: None

set_broadcast(dest_path: str | None = None) → None[source]#

Copy executable file to allocated compute nodes

This sets --bcast

Parameters:: dest_path (Optional[str], default: None) – Path to copy an executable file
Return type:: None

set_cpu_bindings(bindings: int | List[int]) → None[source]#

Bind by setting CPU masks on tasks

This sets --cpu-bind using the map_cpu:<list> option

Parameters:: bindings (Union[int, List[int]]) – List specifing the cores to which MPI processes are bound
Return type:: None

set_cpus_per_task(cpus_per_task: int) → None[source]#

Set the number of cpus to use per task

This sets --cpus-per-task

Parameters:: num_cpus – number of cpus to use per task
Return type:: None

set_excluded_hosts(host_list: str | List[str]) → None[source]#

Specify a list of hosts to exclude for launching this job

Parameters:: host_list (Union[str, List[str]]) – hosts to exclude
Raises:: TypeError –
Return type:: None

set_het_group(het_group: Iterable[int]) → None[source]#

Set the heterogeneous group for this job

this sets –het-group

Parameters:: het_group (Iterable[int]) – list of heterogeneous groups
Return type:: None

set_hostlist(host_list: str | List[str]) → None[source]#

Specify the hostlist for this job

This sets --nodelist

Parameters:: host_list (Union[str, List[str]]) – hosts to launch on
Raises:: TypeError – if not str or list of str
Return type:: None

set_hostlist_from_file(file_path: str) → None[source]#

Use the contents of a file to set the node list

This sets --nodefile

Parameters:: file_path (str) – Path to the hostlist file
Return type:: None

set_memory_per_node(memory_per_node: int) → None[source]#

Specify the real memory required per node

This sets --mem in megabytes

Parameters:: memory_per_node (int) – Amount of memory per node in megabytes
Return type:: None

set_mpmd_preamble(preamble_lines: List[str]) → None#

Set preamble to a file to make a job MPMD

Parameters:: preamble_lines (List[str]) – lines to put at the beginning of a file.
Return type:: None

set_node_feature(feature_list: str | List[str]) → None[source]#

Specify the node feature for this job

This sets -C

Parameters:: feature_list (Union[str, List[str]]) – node feature to launch on
Raises:: TypeError – if not str or list of str
Return type:: None

set_nodes(nodes: int) → None[source]#

Set the number of nodes

Effectively this is setting: srun --nodes <num_nodes>

Parameters:: nodes (int) – number of nodes to run with
Return type:: None

set_quiet_launch(quiet: bool) → None[source]#

Set the job to run in quiet mode

This sets --quiet

Parameters:: quiet (bool) – Whether the job should be run quietly
Return type:: None

set_task_map(task_mapping: str) → None#

Set a task mapping

Parameters:: task_mapping (str) – task mapping
Return type:: None

set_tasks(tasks: int) → None[source]#

Set the number of tasks for this job

This sets --ntasks

Parameters:: tasks (int) – number of tasks
Return type:: None

set_tasks_per_node(tasks_per_node: int) → None[source]#

Set the number of tasks for this job

This sets --ntasks-per-node

Parameters:: tasks_per_node (int) – number of tasks per node
Return type:: None

set_time(hours: int = 0, minutes: int = 0, seconds: int = 0) → None#

Automatically format and set wall time

Parameters:

hours (int, default: 0) – number of hours to run job
minutes (int, default: 0) – number of minutes to run job
seconds (int, default: 0) – number of seconds to run job

Return type:

None

set_verbose_launch(verbose: bool) → None[source]#

Set the job to run in verbose mode

This sets --verbose

Parameters:: verbose (bool) – Whether the job should be run verbosely
Return type:: None

set_walltime(walltime: str) → None[source]#

Set the walltime of the job

format = “HH:MM:SS”

Parameters:: walltime (str) – wall time
Return type:: None

update_env(env_vars: Dict[str, str | int | float | bool]) → None#

Update the job environment variables

To fully inherit the current user environment, add the workload-manager-specific flag to the launch command through the add_exe_args() method. For example, --export=ALL for slurm, or -V for PBS/aprun.

Parameters:: env_vars (Dict[str, Union[str, int, float, bool]]) – environment variables to update or add
Raises:: TypeError – if env_vars values cannot be coerced to strings
Return type:: None

AprunSettings#

AprunSettings can be used on any system that supports the Cray ALPS layer. SmartSim supports using AprunSettings on PBSPro WLM systems.

AprunSettings can be used in interactive session (on allocation) and within batch launches (e.g., QsubBatchSettings)

`AprunSettings.set_cpus_per_task`(cpus_per_task)	Set the number of cpus to use per task
`AprunSettings.set_hostlist`(host_list)	Specify the hostlist for this job
`AprunSettings.set_tasks`(tasks)	Set the number of tasks for this job
`AprunSettings.set_tasks_per_node`(tasks_per_node)	Set the number of tasks for this job
`AprunSettings.make_mpmd`(settings)	Make job an MPMD job
`AprunSettings.add_exe_args`(args)	Add executable arguments to executable
`AprunSettings.format_run_args`()	Return a list of ALPS formatted run arguments
`AprunSettings.format_env_vars`()	Format the environment variables for aprun
`AprunSettings.update_env`(env_vars)	Update the job environment variables

Settings to run job with aprun command

AprunSettings can be used for the pbs launcher.

Parameters:

exe (str) – executable
exe_args (Union[str, List[str], None], default: None) – executable arguments
run_args (Optional[Dict[str, Union[int, str, float, None]]], default: None) – arguments for run command
env_vars (Optional[Dict[str, Optional[str]]], default: None) – environment vars to launch job with

add_exe_args(args: str | List[str]) → None#

Add executable arguments to executable

Parameters:: args (Union[str, List[str]]) – executable arguments
Return type:: None

colocated_db_settings: t.Optional[t.Dict[str, t.Union[bool, int, str, None, t.List[str], t.Iterable[t.Union[int, t.Iterable[int]]], t.List[DBModel], t.List[DBScript], t.Dict[str, t.Union[int, None]], t.Dict[str, str]]]]#

property env_vars: Dict[str, str | None]#

Return an immutable list of attached environment variables.

Returns:: attached environment variables

property exe_args: str | List[str]#

Return an immutable list of attached executable arguments.

Returns:: attached executable arguments

format_env_vars() → List[str][source]#

Format the environment variables for aprun

Return type:: List[str]
Returns:: list of env vars

format_run_args() → List[str][source]#

Return a list of ALPS formatted run arguments

Return type:: List[str]
Returns:: list of ALPS arguments for these settings

make_mpmd(settings: smartsim.settings.base.RunSettings) → None[source]#

Make job an MPMD job

This method combines two AprunSettings into a single MPMD command joined with ‘:’

Parameters:: settings (RunSettings) – AprunSettings instance
Return type:: None

reserved_run_args: set[str] = {}#

property run_args: Dict[str, int | str | float | None]#

Return an immutable list of attached run arguments.

Returns:: attached run arguments

property run_command: str | None#

Return the launch binary used to launch the executable

Attempt to expand the path to the executable if possible

Returns:: launch binary e.g. mpiexec

set(arg: str, value: str | None = None, condition: bool = True) → None#

Allows users to set individual run arguments.

A method that allows users to set run arguments after object instantiation. Does basic formatting such as stripping leading dashes. If the argument has been set previously, this method will log warning but ultimately comply.

Conditional expressions may be passed to the conditional parameter. If the expression evaluates to True, the argument will be set. In not an info message is logged and no further operation is performed.

Basic Usage

rs = RunSettings("python")
rs.set("an-arg", "a-val")
rs.set("a-flag")
rs.format_run_args()  # returns ["an-arg", "a-val", "a-flag", "None"]

Slurm Example with Conditional Setting

import socket

rs = SrunSettings("echo", "hello")
rs.set_tasks(1)
rs.set("exclusive")

# Only set this argument if condition param evals True
# Otherwise log and NOP
rs.set("partition", "debug",
       condition=socket.gethostname()=="testing-system")

rs.format_run_args()
# returns ["exclusive", "None", "partition", "debug"] iff
  socket.gethostname()=="testing-system"
# otherwise returns ["exclusive", "None"]

Parameters:

arg (str) – name of the argument
value (Optional[str], default: None) – value of the argument
conditon – set the argument if condition evaluates to True

Return type:

None

set_binding(binding: str) → None#

Set binding

Parameters:: binding (str) – Binding
Return type:: None

set_broadcast(dest_path: str | None = None) → None#

Copy executable file to allocated compute nodes

Parameters:: dest_path (Optional[str], default: None) – Path to copy an executable file
Return type:: None

set_cpu_bindings(bindings: int | List[int]) → None[source]#

Specifies the cores to which MPI processes are bound

This sets --cpu-binding

Parameters:: bindings (Union[int, List[int]]) – List of cpu numbers
Return type:: None

set_cpus_per_task(cpus_per_task: int) → None[source]#

Set the number of cpus to use per task

This sets --cpus-per-pe

Parameters:: cpus_per_task (int) – number of cpus to use per task
Return type:: None

set_excluded_hosts(host_list: str | List[str]) → None[source]#

Specify a list of hosts to exclude for launching this job

Parameters:: host_list (Union[str, List[str]]) – hosts to exclude
Raises:: TypeError – if not str or list of str
Return type:: None

set_hostlist(host_list: str | List[str]) → None[source]#

Specify the hostlist for this job

Parameters:: host_list (Union[str, List[str]]) – hosts to launch on
Raises:: TypeError – if not str or list of str
Return type:: None

set_hostlist_from_file(file_path: str) → None[source]#

Use the contents of a file to set the node list

This sets --node-list-file

Parameters:: file_path (str) – Path to the hostlist file
Return type:: None

set_memory_per_node(memory_per_node: int) → None[source]#

Specify the real memory required per node

This sets --memory-per-pe in megabytes

Parameters:: memory_per_node (int) – Per PE memory limit in megabytes
Return type:: None

set_mpmd_preamble(preamble_lines: List[str]) → None#

Set preamble to a file to make a job MPMD

Parameters:: preamble_lines (List[str]) – lines to put at the beginning of a file.
Return type:: None

set_node_feature(feature_list: str | List[str]) → None#

Specify the node feature for this job

Parameters:: feature_list (Union[str, List[str]]) – node feature to launch on
Return type:: None

set_nodes(nodes: int) → None#

Set the number of nodes

Parameters:: nodes (int) – number of nodes to run with
Return type:: None

set_quiet_launch(quiet: bool) → None[source]#

Set the job to run in quiet mode

This sets --quiet

Parameters:: quiet (bool) – Whether the job should be run quietly
Return type:: None

set_task_map(task_mapping: str) → None#

Set a task mapping

Parameters:: task_mapping (str) – task mapping
Return type:: None

set_tasks(tasks: int) → None[source]#

Set the number of tasks for this job

This sets --pes

Parameters:: tasks (int) – number of tasks
Return type:: None

set_tasks_per_node(tasks_per_node: int) → None[source]#

Set the number of tasks for this job

This sets --pes-per-node

Parameters:: tasks_per_node (int) – number of tasks per node
Return type:: None

set_time(hours: int = 0, minutes: int = 0, seconds: int = 0) → None#

Automatically format and set wall time

Parameters:

hours (int, default: 0) – number of hours to run job
minutes (int, default: 0) – number of minutes to run job
seconds (int, default: 0) – number of seconds to run job

Return type:

None

set_verbose_launch(verbose: bool) → None[source]#

Set the job to run in verbose mode

This sets --debug arg to the highest level

Parameters:: verbose (bool) – Whether the job should be run verbosely
Return type:: None

set_walltime(walltime: str) → None[source]#

Set the walltime of the job

Walltime is given in total number of seconds

Parameters:: walltime (str) – wall time
Return type:: None

update_env(env_vars: Dict[str, str | int | float | bool]) → None#

Update the job environment variables

To fully inherit the current user environment, add the workload-manager-specific flag to the launch command through the add_exe_args() method. For example, --export=ALL for slurm, or -V for PBS/aprun.

Parameters:: env_vars (Dict[str, Union[str, int, float, bool]]) – environment variables to update or add
Raises:: TypeError – if env_vars values cannot be coerced to strings
Return type:: None

DragonRunSettings#

DragonRunSettings can be used on systems that support Slurm or PBS, if Dragon is available in the Python environment (see _dragon_install for instructions on how to install it through smart).

DragonRunSettings can be used in interactive sessions (on allcation) and within batch launches (i.e. SbatchSettings or QsubBatchSettings, for Slurm and PBS sessions, respectively).

`DragonRunSettings.set_nodes`(nodes)	Set the number of nodes
`DragonRunSettings.set_tasks_per_node`(...)	Set the number of tasks for this job

class DragonRunSettings(exe: str, exe_args: str | List[str] | None = None, env_vars: Dict[str, str | None] | None = None, **kwargs: Any) → None[source]#

Initialize run parameters for a Dragon process

DragonRunSettings should only be used on systems where Dragon is available and installed in the current environment.

If an allocation is specified, the instance receiving these run parameters will launch on that allocation.

Parameters:

exe (str) – executable to run
exe_args (Union[str, List[str], None], default: None) – executable arguments, defaults to None
env_vars (Optional[Dict[str, Optional[str]]], default: None) – environment variables for job, defaults to None
alloc – allocation ID if running on existing alloc, defaults to None

add_exe_args(args: str | List[str]) → None#

Add executable arguments to executable

Parameters:: args (Union[str, List[str]]) – executable arguments
Return type:: None

colocated_db_settings: t.Optional[t.Dict[str, t.Union[bool, int, str, None, t.List[str], t.Iterable[t.Union[int, t.Iterable[int]]], t.List[DBModel], t.List[DBScript], t.Dict[str, t.Union[int, None]], t.Dict[str, str]]]]#

property env_vars: Dict[str, str | None]#

Return an immutable list of attached environment variables.

Returns:: attached environment variables

property exe_args: str | List[str]#

Return an immutable list of attached executable arguments.

Returns:: attached executable arguments

format_env_vars() → List[str]#

Build environment variable string

Return type:: List[str]
Returns:: formatted list of strings to export variables

format_run_args() → List[str]#

Return formatted run arguments

For RunSettings, the run arguments are passed literally with no formatting.

Return type:: List[str]
Returns:: list run arguments for these settings

make_mpmd(settings: smartsim.settings.base.RunSettings) → None#

Make job an MPMD job

Parameters:: settings (RunSettings) – RunSettings instance
Return type:: None

reserved_run_args: set[str] = {}#

property run_args: Dict[str, int | str | float | None]#

Return an immutable list of attached run arguments.

Returns:: attached run arguments

property run_command: str | None#

Return the launch binary used to launch the executable

Attempt to expand the path to the executable if possible

Returns:: launch binary e.g. mpiexec

set(arg: str, value: str | None = None, condition: bool = True) → None#

Allows users to set individual run arguments.

A method that allows users to set run arguments after object instantiation. Does basic formatting such as stripping leading dashes. If the argument has been set previously, this method will log warning but ultimately comply.

Conditional expressions may be passed to the conditional parameter. If the expression evaluates to True, the argument will be set. In not an info message is logged and no further operation is performed.

Basic Usage

rs = RunSettings("python")
rs.set("an-arg", "a-val")
rs.set("a-flag")
rs.format_run_args()  # returns ["an-arg", "a-val", "a-flag", "None"]

Slurm Example with Conditional Setting

import socket

rs = SrunSettings("echo", "hello")
rs.set_tasks(1)
rs.set("exclusive")

# Only set this argument if condition param evals True
# Otherwise log and NOP
rs.set("partition", "debug",
       condition=socket.gethostname()=="testing-system")

rs.format_run_args()
# returns ["exclusive", "None", "partition", "debug"] iff
  socket.gethostname()=="testing-system"
# otherwise returns ["exclusive", "None"]

Parameters:

arg (str) – name of the argument
value (Optional[str], default: None) – value of the argument
conditon – set the argument if condition evaluates to True

Return type:

None

set_binding(binding: str) → None#

Set binding

Parameters:: binding (str) – Binding
Return type:: None

set_broadcast(dest_path: str | None = None) → None#

Copy executable file to allocated compute nodes

Parameters:: dest_path (Optional[str], default: None) – Path to copy an executable file
Return type:: None

set_cpu_affinity(devices: List[int]) → None[source]#

Set the CPU affinity for this job

Parameters:: devices (List[int]) – list of CPU indices to execute on
Return type:: None

set_cpu_bindings(bindings: int | List[int]) → None#

Set the cores to which MPI processes are bound

Parameters:: bindings (Union[int, List[int]]) – List specifing the cores to which MPI processes are bound
Return type:: None

set_cpus_per_task(cpus_per_task: int) → None#

Set the number of cpus per task

Parameters:: cpus_per_task (int) – number of cpus per task
Return type:: None

set_excluded_hosts(host_list: str | List[str]) → None#

Specify a list of hosts to exclude for launching this job

Parameters:: host_list (Union[str, List[str]]) – hosts to exclude
Return type:: None

set_gpu_affinity(devices: List[int]) → None[source]#

Set the GPU affinity for this job

Parameters:: devices (List[int]) – list of GPU indices to execute on.
Return type:: None

set_hostlist(host_list: str | List[str]) → None#

Specify the hostlist for this job

Parameters:: host_list (Union[str, List[str]]) – hosts to launch on
Return type:: None

set_hostlist_from_file(file_path: str) → None#

Use the contents of a file to specify the hostlist for this job

Parameters:: file_path (str) – Path to the hostlist file
Return type:: None

set_memory_per_node(memory_per_node: int) → None#

Set the amount of memory required per node in megabytes

Parameters:: memory_per_node (int) – Number of megabytes per node
Return type:: None

set_mpmd_preamble(preamble_lines: List[str]) → None#

Set preamble to a file to make a job MPMD

Parameters:: preamble_lines (List[str]) – lines to put at the beginning of a file.
Return type:: None

set_node_feature(feature_list: str | List[str]) → None[source]#

Specify the node feature for this job

Parameters:: feature_list (Union[str, List[str]]) – a collection of strings representing the required node features. Currently supported node features are: “gpu”
Return type:: None

set_nodes(nodes: int) → None[source]#

Set the number of nodes

Parameters:: nodes (int) – number of nodes to run with
Return type:: None

set_quiet_launch(quiet: bool) → None#

Set the job to run in quiet mode

Parameters:: quiet (bool) – Whether the job should be run quietly
Return type:: None

set_task_map(task_mapping: str) → None#

Set a task mapping

Parameters:: task_mapping (str) – task mapping
Return type:: None

set_tasks(tasks: int) → None#

Set the number of tasks to launch

Parameters:: tasks (int) – number of tasks to launch
Return type:: None

set_tasks_per_node(tasks_per_node: int) → None[source]#

Set the number of tasks for this job

Parameters:: tasks_per_node (int) – number of tasks per node
Return type:: None

set_time(hours: int = 0, minutes: int = 0, seconds: int = 0) → None#

Automatically format and set wall time

Parameters:

hours (int, default: 0) – number of hours to run job
minutes (int, default: 0) – number of minutes to run job
seconds (int, default: 0) – number of seconds to run job

Return type:

None

set_verbose_launch(verbose: bool) → None#

Set the job to run in verbose mode

Parameters:: verbose (bool) – Whether the job should be run verbosely
Return type:: None

set_walltime(walltime: str) → None#

Set the formatted walltime

Parameters:: walltime (str) – Time in format required by launcher``
Return type:: None

update_env(env_vars: Dict[str, str | int | float | bool]) → None#

Update the job environment variables

To fully inherit the current user environment, add the workload-manager-specific flag to the launch command through the add_exe_args() method. For example, --export=ALL for slurm, or -V for PBS/aprun.

Parameters:: env_vars (Dict[str, Union[str, int, float, bool]]) – environment variables to update or add
Raises:: TypeError – if env_vars values cannot be coerced to strings
Return type:: None

JsrunSettings#

JsrunSettings can be used on any system that supports the IBM LSF launcher.

JsrunSettings can be used in interactive session (on allocation) and within batch launches (i.e. BsubBatchSettings)

`JsrunSettings.set_num_rs`(num_rs)	Set the number of resource sets to use
`JsrunSettings.set_cpus_per_rs`(cpus_per_rs)	Set the number of cpus to use per resource set
`JsrunSettings.set_gpus_per_rs`(gpus_per_rs)	Set the number of gpus to use per resource set
`JsrunSettings.set_rs_per_host`(rs_per_host)	Set the number of resource sets to use per host
`JsrunSettings.set_tasks`(tasks)	Set the number of tasks for this job
`JsrunSettings.set_tasks_per_rs`(tasks_per_rs)	Set the number of tasks per resource set
`JsrunSettings.set_binding`(binding)	Set binding
`JsrunSettings.make_mpmd`(settings)	Make step an MPMD (or SPMD) job.
`JsrunSettings.set_mpmd_preamble`(preamble_lines)	Set preamble used in ERF file.
`JsrunSettings.update_env`(env_vars)	Update the job environment variables
`JsrunSettings.set_erf_sets`(erf_sets)	Set resource sets used for ERF (SPMD or MPMD) steps.
`JsrunSettings.format_env_vars`()	Format environment variables.
`JsrunSettings.format_run_args`()	Return a list of LSF formatted run arguments

Settings to run job with jsrun command

JsrunSettings should only be used on LSF-based systems.

Parameters:

exe (str) – executable
exe_args (Union[str, List[str], None], default: None) – executable arguments
run_args (Optional[Dict[str, Union[int, str, float, None]]], default: None) – arguments for run command
env_vars (Optional[Dict[str, Optional[str]]], default: None) – environment vars to launch job with

add_exe_args(args: str | List[str]) → None#

Add executable arguments to executable

Parameters:: args (Union[str, List[str]]) – executable arguments
Return type:: None

colocated_db_settings: t.Optional[t.Dict[str, t.Union[bool, int, str, None, t.List[str], t.Iterable[t.Union[int, t.Iterable[int]]], t.List[DBModel], t.List[DBScript], t.Dict[str, t.Union[int, None]], t.Dict[str, str]]]]#

property env_vars: Dict[str, str | None]#

Return an immutable list of attached environment variables.

Returns:: attached environment variables

property exe_args: str | List[str]#

Return an immutable list of attached executable arguments.

Returns:: attached executable arguments

format_env_vars() → List[str][source]#

Format environment variables. Each variable needs to be passed with --env. If a variable is set to None, its value is propagated from the current environment.

Return type:: List[str]
Returns:: formatted list of strings to export variables

format_run_args() → List[str][source]#

Return a list of LSF formatted run arguments

Return type:: List[str]
Returns:: list of LSF arguments for these settings

make_mpmd(settings: smartsim.settings.base.RunSettings) → None[source]#

Make step an MPMD (or SPMD) job.

This method will activate job execution through an ERF file.

Optionally, this method adds an instance of JsrunSettings to the list of settings to be launched in the same ERF file.

Parameters:: settings (RunSettings) – JsrunSettings instance
Return type:: None

reserved_run_args: set[str] = {'chdir', 'h'}#

property run_args: Dict[str, int | str | float | None]#

Return an immutable list of attached run arguments.

Returns:: attached run arguments

property run_command: str | None#

Return the launch binary used to launch the executable

Attempt to expand the path to the executable if possible

Returns:: launch binary e.g. mpiexec

set(arg: str, value: str | None = None, condition: bool = True) → None#

Allows users to set individual run arguments.

A method that allows users to set run arguments after object instantiation. Does basic formatting such as stripping leading dashes. If the argument has been set previously, this method will log warning but ultimately comply.

Conditional expressions may be passed to the conditional parameter. If the expression evaluates to True, the argument will be set. In not an info message is logged and no further operation is performed.

Basic Usage

rs = RunSettings("python")
rs.set("an-arg", "a-val")
rs.set("a-flag")
rs.format_run_args()  # returns ["an-arg", "a-val", "a-flag", "None"]

Slurm Example with Conditional Setting

import socket

rs = SrunSettings("echo", "hello")
rs.set_tasks(1)
rs.set("exclusive")

# Only set this argument if condition param evals True
# Otherwise log and NOP
rs.set("partition", "debug",
       condition=socket.gethostname()=="testing-system")

rs.format_run_args()
# returns ["exclusive", "None", "partition", "debug"] iff
  socket.gethostname()=="testing-system"
# otherwise returns ["exclusive", "None"]

Parameters:

arg (str) – name of the argument
value (Optional[str], default: None) – value of the argument
conditon – set the argument if condition evaluates to True

Return type:

None

set_binding(binding: str) → None[source]#

Set binding

This sets --bind

Parameters:: binding (str) – Binding, e.g. packed:21
Return type:: None

set_broadcast(dest_path: str | None = None) → None#

Copy executable file to allocated compute nodes

Parameters:: dest_path (Optional[str], default: None) – Path to copy an executable file
Return type:: None

set_cpu_bindings(bindings: int | List[int]) → None#

Set the cores to which MPI processes are bound

Parameters:: bindings (Union[int, List[int]]) – List specifing the cores to which MPI processes are bound
Return type:: None

set_cpus_per_rs(cpus_per_rs: int) → None[source]#

Set the number of cpus to use per resource set

This sets --cpu_per_rs

Parameters:: cpus_per_rs (int) – number of cpus to use per resource set or ALL_CPUS
Return type:: None

set_cpus_per_task(cpus_per_task: int) → None[source]#

Set the number of cpus per tasks.

This function is an alias for set_cpus_per_rs.

Parameters:: cpus_per_task (int) – number of cpus per resource set
Return type:: None

set_erf_sets(erf_sets: Dict[str, str]) → None[source]#

Set resource sets used for ERF (SPMD or MPMD) steps.

erf_sets is a dictionary used to fill the ERF line representing these settings, e.g. {“host”: “1”, “cpu”: “{0:21}, {21:21}”, “gpu”: “*”} can be used to specify rank (or rank_count), hosts, cpus, gpus, and memory. The key rank is used to give specific ranks, as in {“rank”: “1, 2, 5”}, while the key rank_count is used to specify the count only, as in {“rank_count”: “3”}. If both are specified, only rank is used.

Parameters:: hosts – dictionary of resources
Return type:: None

set_excluded_hosts(host_list: str | List[str]) → None#

Specify a list of hosts to exclude for launching this job

Parameters:: host_list (Union[str, List[str]]) – hosts to exclude
Return type:: None

set_gpus_per_rs(gpus_per_rs: int) → None[source]#

Set the number of gpus to use per resource set

This sets --gpu_per_rs

Parameters:: gpus_per_rs (int) – number of gpus to use per resource set or ALL_GPUS
Return type:: None

set_hostlist(host_list: str | List[str]) → None#

Specify the hostlist for this job

Parameters:: host_list (Union[str, List[str]]) – hosts to launch on
Return type:: None

set_hostlist_from_file(file_path: str) → None#

Use the contents of a file to specify the hostlist for this job

Parameters:: file_path (str) – Path to the hostlist file
Return type:: None

set_individual_output(suffix: str | None = None) → None[source]#

Set individual std output.

This sets --stdio_mode individual and inserts the suffix into the output name. The resulting output name will be self.name + suffix + .out.

Parameters:: suffix (Optional[str], default: None) – Optional suffix to add to output file names, it can contain %j, %h, %p, or %t, as specified by jsrun options.
Return type:: None

set_memory_per_node(memory_per_node: int) → None[source]#

Specify the number of megabytes of memory to assign to a resource set

Alias for set_memory_per_rs.

Parameters:: memory_per_node (int) – Number of megabytes per rs
Return type:: None

set_memory_per_rs(memory_per_rs: int) → None[source]#

Specify the number of megabytes of memory to assign to a resource set

This sets --memory_per_rs

Parameters:: memory_per_rs (int) – Number of megabytes per rs
Return type:: None

set_mpmd_preamble(preamble_lines: List[str]) → None[source]#

Set preamble used in ERF file. Typical lines include oversubscribe-cpu : allow or overlapping-rs : allow. Can be used to set launch_distribution. If it is not present, it will be inferred from the settings, or set to packed by default.

Parameters:: preamble_lines (List[str]) – lines to put at the beginning of the ERF file.
Return type:: None

set_node_feature(feature_list: str | List[str]) → None#

Specify the node feature for this job

Parameters:: feature_list (Union[str, List[str]]) – node feature to launch on
Return type:: None

set_nodes(nodes: int) → None#

Set the number of nodes

Parameters:: nodes (int) – number of nodes to run with
Return type:: None

set_num_rs(num_rs: str | int) → None[source]#

Set the number of resource sets to use

This sets --nrs.

Parameters:: num_rs (Union[str, int]) – Number of resource sets or ALL_HOSTS
Return type:: None

set_quiet_launch(quiet: bool) → None#

Set the job to run in quiet mode

Parameters:: quiet (bool) – Whether the job should be run quietly
Return type:: None

set_rs_per_host(rs_per_host: int) → None[source]#

Set the number of resource sets to use per host

This sets --rs_per_host

Parameters:: rs_per_host (int) – number of resource sets to use per host
Return type:: None

set_task_map(task_mapping: str) → None#

Set a task mapping

Parameters:: task_mapping (str) – task mapping
Return type:: None

set_tasks(tasks: int) → None[source]#

Set the number of tasks for this job

This sets --np

Parameters:: tasks (int) – number of tasks
Return type:: None

set_tasks_per_node(tasks_per_node: int) → None[source]#

Set the number of tasks per resource set.

This function is an alias for set_tasks_per_rs.

Parameters:: tasks_per_node (int) – number of tasks per resource set
Return type:: None

set_tasks_per_rs(tasks_per_rs: int) → None[source]#

Set the number of tasks per resource set

This sets --tasks_per_rs

Parameters:: tasks_per_rs (int) – number of tasks per resource set
Return type:: None

set_time(hours: int = 0, minutes: int = 0, seconds: int = 0) → None#

Automatically format and set wall time

Parameters:

hours (int, default: 0) – number of hours to run job
minutes (int, default: 0) – number of minutes to run job
seconds (int, default: 0) – number of seconds to run job

Return type:

None

set_verbose_launch(verbose: bool) → None#

Set the job to run in verbose mode

Parameters:: verbose (bool) – Whether the job should be run verbosely
Return type:: None

set_walltime(walltime: str) → None#

Set the formatted walltime

Parameters:: walltime (str) – Time in format required by launcher``
Return type:: None

update_env(env_vars: Dict[str, str | int | float | bool]) → None#

Update the job environment variables

To fully inherit the current user environment, add the workload-manager-specific flag to the launch command through the add_exe_args() method. For example, --export=ALL for slurm, or -V for PBS/aprun.

Parameters:: env_vars (Dict[str, Union[str, int, float, bool]]) – environment variables to update or add
Raises:: TypeError – if env_vars values cannot be coerced to strings
Return type:: None

MpirunSettings#

MpirunSettings are for launching with OpenMPI. MpirunSettings are supported on Slurm and PBSpro.

`MpirunSettings.set_cpus_per_task`(cpus_per_task)	Set the number of tasks for this job
`MpirunSettings.set_hostlist`(host_list)	Set the hostlist for the `mpirun` command
`MpirunSettings.set_tasks`(tasks)	Set the number of tasks for this job
`MpirunSettings.set_task_map`(task_mapping)	Set `mpirun` task mapping
`MpirunSettings.make_mpmd`(settings)	Make a mpmd workload by combining two `mpirun` commands
`MpirunSettings.add_exe_args`(args)	Add executable arguments to executable
`MpirunSettings.format_run_args`()	Return a list of MPI-standard formatted run arguments
`MpirunSettings.format_env_vars`()	Format the environment variables for mpirun
`MpirunSettings.update_env`(env_vars)	Update the job environment variables

Settings to run job with mpirun command (MPI-standard)

Note that environment variables can be passed with a None value to signify that they should be exported from the current environment

Any arguments passed in the run_args dict will be converted into mpirun arguments and prefixed with --. Values of None can be provided for arguments that do not have values.

Parameters:

exe (str) – executable
exe_args (Union[str, List[str], None], default: None) – executable arguments
run_args (Optional[Dict[str, Union[int, str, float, None]]], default: None) – arguments for run command
env_vars (Optional[Dict[str, Optional[str]]], default: None) – environment vars to launch job with

add_exe_args(args: str | List[str]) → None#

Add executable arguments to executable

Parameters:: args (Union[str, List[str]]) – executable arguments
Return type:: None

colocated_db_settings: t.Optional[t.Dict[str, t.Union[bool, int, str, None, t.List[str], t.Iterable[t.Union[int, t.Iterable[int]]], t.List[DBModel], t.List[DBScript], t.Dict[str, t.Union[int, None]], t.Dict[str, str]]]]#

property env_vars: Dict[str, str | None]#

Return an immutable list of attached environment variables.

Returns:: attached environment variables

property exe_args: str | List[str]#

Return an immutable list of attached executable arguments.

Returns:: attached executable arguments

format_env_vars() → List[str]#

Format the environment variables for mpirun

Return type:: List[str]
Returns:: list of env vars

format_run_args() → List[str]#

Return a list of MPI-standard formatted run arguments

Return type:: List[str]
Returns:: list of MPI-standard arguments for these settings

make_mpmd(settings: smartsim.settings.base.RunSettings) → None#

Make a mpmd workload by combining two mpirun commands

This connects the two settings to be executed with a single Model instance

Parameters:: settings (RunSettings) – MpirunSettings instance
Return type:: None

reserved_run_args: set[str] = {'wd', 'wdir'}#

property run_args: Dict[str, int | str | float | None]#

Return an immutable list of attached run arguments.

Returns:: attached run arguments

property run_command: str | None#

Return the launch binary used to launch the executable

Attempt to expand the path to the executable if possible

Returns:: launch binary e.g. mpiexec

set(arg: str, value: str | None = None, condition: bool = True) → None#

Allows users to set individual run arguments.

A method that allows users to set run arguments after object instantiation. Does basic formatting such as stripping leading dashes. If the argument has been set previously, this method will log warning but ultimately comply.

Conditional expressions may be passed to the conditional parameter. If the expression evaluates to True, the argument will be set. In not an info message is logged and no further operation is performed.

Basic Usage

rs = RunSettings("python")
rs.set("an-arg", "a-val")
rs.set("a-flag")
rs.format_run_args()  # returns ["an-arg", "a-val", "a-flag", "None"]

Slurm Example with Conditional Setting

import socket

rs = SrunSettings("echo", "hello")
rs.set_tasks(1)
rs.set("exclusive")

# Only set this argument if condition param evals True
# Otherwise log and NOP
rs.set("partition", "debug",
       condition=socket.gethostname()=="testing-system")

rs.format_run_args()
# returns ["exclusive", "None", "partition", "debug"] iff
  socket.gethostname()=="testing-system"
# otherwise returns ["exclusive", "None"]

Parameters:

arg (str) – name of the argument
value (Optional[str], default: None) – value of the argument
conditon – set the argument if condition evaluates to True

Return type:

None

set_binding(binding: str) → None#

Set binding

Parameters:: binding (str) – Binding
Return type:: None

set_broadcast(dest_path: str | None = None) → None#

Copy the specified executable(s) to remote machines

This sets --preload-binary

Parameters:: dest_path (Optional[str], default: None) – Destination path (Ignored)
Return type:: None

set_cpu_binding_type(bind_type: str) → None#

Specifies the cores to which MPI processes are bound

This sets --bind-to for MPI compliant implementations

Parameters:: bind_type (str) – binding type
Return type:: None

set_cpu_bindings(bindings: int | List[int]) → None#

Set the cores to which MPI processes are bound

Parameters:: bindings (Union[int, List[int]]) – List specifing the cores to which MPI processes are bound
Return type:: None

set_cpus_per_task(cpus_per_task: int) → None#

Set the number of tasks for this job

This sets --cpus-per-proc for MPI compliant implementations

note: this option has been deprecated in openMPI 4.0+ and will soon be replaced.

Parameters:: cpus_per_task (int) – number of tasks
Return type:: None

set_excluded_hosts(host_list: str | List[str]) → None#

Specify a list of hosts to exclude for launching this job

Parameters:: host_list (Union[str, List[str]]) – hosts to exclude
Return type:: None

set_hostlist(host_list: str | List[str]) → None#

Set the hostlist for the mpirun command

This sets --host

Parameters:: host_list (Union[str, List[str]]) – list of host names
Raises:: TypeError – if not str or list of str
Return type:: None

set_hostlist_from_file(file_path: str) → None#

Use the contents of a file to set the hostlist

This sets --hostfile

Parameters:: file_path (str) – Path to the hostlist file
Return type:: None

set_memory_per_node(memory_per_node: int) → None#

Set the amount of memory required per node in megabytes

Parameters:: memory_per_node (int) – Number of megabytes per node
Return type:: None

set_mpmd_preamble(preamble_lines: List[str]) → None#

Set preamble to a file to make a job MPMD

Parameters:: preamble_lines (List[str]) – lines to put at the beginning of a file.
Return type:: None

set_node_feature(feature_list: str | List[str]) → None#

Specify the node feature for this job

Parameters:: feature_list (Union[str, List[str]]) – node feature to launch on
Return type:: None

set_nodes(nodes: int) → None#

Set the number of nodes

Parameters:: nodes (int) – number of nodes to run with
Return type:: None

set_quiet_launch(quiet: bool) → None#

Set the job to run in quiet mode

This sets --quiet

Parameters:: quiet (bool) – Whether the job should be run quietly
Return type:: None

set_task_map(task_mapping: str) → None#

Set mpirun task mapping

this sets --map-by <mapping>

For examples, see the man page for mpirun

Parameters:: task_mapping (str) – task mapping
Return type:: None

set_tasks(tasks: int) → None#

Set the number of tasks for this job

This sets -n for MPI compliant implementations

Parameters:: tasks (int) – number of tasks
Return type:: None

set_tasks_per_node(tasks_per_node: int) → None#

Set the number of tasks per node

Parameters:: tasks_per_node (int) – number of tasks to launch per node
Return type:: None

set_time(hours: int = 0, minutes: int = 0, seconds: int = 0) → None#

Automatically format and set wall time

Parameters:

hours (int, default: 0) – number of hours to run job
minutes (int, default: 0) – number of minutes to run job
seconds (int, default: 0) – number of seconds to run job

Return type:

None

set_verbose_launch(verbose: bool) → None#

Set the job to run in verbose mode

This sets --verbose

Parameters:: verbose (bool) – Whether the job should be run verbosely
Return type:: None

set_walltime(walltime: str) → None#

Set the maximum number of seconds that a job will run

This sets --timeout

Parameters:: walltime (str) – number like string of seconds that a job will run in secs
Return type:: None

update_env(env_vars: Dict[str, str | int | float | bool]) → None#

Update the job environment variables

To fully inherit the current user environment, add the workload-manager-specific flag to the launch command through the add_exe_args() method. For example, --export=ALL for slurm, or -V for PBS/aprun.

Parameters:: env_vars (Dict[str, Union[str, int, float, bool]]) – environment variables to update or add
Raises:: TypeError – if env_vars values cannot be coerced to strings
Return type:: None

MpiexecSettings#

MpiexecSettings are for launching with OpenMPI’s mpiexec. MpirunSettings are supported on Slurm and PBSpro.

`MpiexecSettings.set_cpus_per_task`(cpus_per_task)	Set the number of tasks for this job
`MpiexecSettings.set_hostlist`(host_list)	Set the hostlist for the `mpirun` command
`MpiexecSettings.set_tasks`(tasks)	Set the number of tasks for this job
`MpiexecSettings.set_task_map`(task_mapping)	Set `mpirun` task mapping
`MpiexecSettings.make_mpmd`(settings)	Make a mpmd workload by combining two `mpirun` commands
`MpiexecSettings.add_exe_args`(args)	Add executable arguments to executable
`MpiexecSettings.format_run_args`()	Return a list of MPI-standard formatted run arguments
`MpiexecSettings.format_env_vars`()	Format the environment variables for mpirun
`MpiexecSettings.update_env`(env_vars)	Update the job environment variables

Settings to run job with mpiexec command (MPI-standard)

Note that environment variables can be passed with a None value to signify that they should be exported from the current environment

Any arguments passed in the run_args dict will be converted into mpiexec arguments and prefixed with --. Values of None can be provided for arguments that do not have values.

Parameters:

exe (str) – executable
exe_args (Union[str, List[str], None], default: None) – executable arguments
run_args (Optional[Dict[str, Union[int, str, float, None]]], default: None) – arguments for run command
env_vars (Optional[Dict[str, Optional[str]]], default: None) – environment vars to launch job with

add_exe_args(args: str | List[str]) → None#

Add executable arguments to executable

Parameters:: args (Union[str, List[str]]) – executable arguments
Return type:: None

colocated_db_settings: t.Optional[t.Dict[str, t.Union[bool, int, str, None, t.List[str], t.Iterable[t.Union[int, t.Iterable[int]]], t.List[DBModel], t.List[DBScript], t.Dict[str, t.Union[int, None]], t.Dict[str, str]]]]#

property env_vars: Dict[str, str | None]#

Return an immutable list of attached environment variables.

Returns:: attached environment variables

property exe_args: str | List[str]#

Return an immutable list of attached executable arguments.

Returns:: attached executable arguments

format_env_vars() → List[str]#

Format the environment variables for mpirun

Return type:: List[str]
Returns:: list of env vars

format_run_args() → List[str]#

Return a list of MPI-standard formatted run arguments

Return type:: List[str]
Returns:: list of MPI-standard arguments for these settings

make_mpmd(settings: smartsim.settings.base.RunSettings) → None#

Make a mpmd workload by combining two mpirun commands

This connects the two settings to be executed with a single Model instance

Parameters:: settings (RunSettings) – MpirunSettings instance
Return type:: None

reserved_run_args: set[str] = {'wd', 'wdir'}#

property run_args: Dict[str, int | str | float | None]#

Return an immutable list of attached run arguments.

Returns:: attached run arguments

property run_command: str | None#

Return the launch binary used to launch the executable

Attempt to expand the path to the executable if possible

Returns:: launch binary e.g. mpiexec

set(arg: str, value: str | None = None, condition: bool = True) → None#

Allows users to set individual run arguments.

A method that allows users to set run arguments after object instantiation. Does basic formatting such as stripping leading dashes. If the argument has been set previously, this method will log warning but ultimately comply.

Conditional expressions may be passed to the conditional parameter. If the expression evaluates to True, the argument will be set. In not an info message is logged and no further operation is performed.

Basic Usage

rs = RunSettings("python")
rs.set("an-arg", "a-val")
rs.set("a-flag")
rs.format_run_args()  # returns ["an-arg", "a-val", "a-flag", "None"]

Slurm Example with Conditional Setting

import socket

rs = SrunSettings("echo", "hello")
rs.set_tasks(1)
rs.set("exclusive")

# Only set this argument if condition param evals True
# Otherwise log and NOP
rs.set("partition", "debug",
       condition=socket.gethostname()=="testing-system")

rs.format_run_args()
# returns ["exclusive", "None", "partition", "debug"] iff
  socket.gethostname()=="testing-system"
# otherwise returns ["exclusive", "None"]

Parameters:

arg (str) – name of the argument
value (Optional[str], default: None) – value of the argument
conditon – set the argument if condition evaluates to True

Return type:

None

set_binding(binding: str) → None#

Set binding

Parameters:: binding (str) – Binding
Return type:: None

set_broadcast(dest_path: str | None = None) → None#

Copy the specified executable(s) to remote machines

This sets --preload-binary

Parameters:: dest_path (Optional[str], default: None) – Destination path (Ignored)
Return type:: None

set_cpu_binding_type(bind_type: str) → None#

Specifies the cores to which MPI processes are bound

This sets --bind-to for MPI compliant implementations

Parameters:: bind_type (str) – binding type
Return type:: None

set_cpu_bindings(bindings: int | List[int]) → None#

Set the cores to which MPI processes are bound

Parameters:: bindings (Union[int, List[int]]) – List specifing the cores to which MPI processes are bound
Return type:: None

set_cpus_per_task(cpus_per_task: int) → None#

Set the number of tasks for this job

This sets --cpus-per-proc for MPI compliant implementations

note: this option has been deprecated in openMPI 4.0+ and will soon be replaced.

Parameters:: cpus_per_task (int) – number of tasks
Return type:: None

set_excluded_hosts(host_list: str | List[str]) → None#

Specify a list of hosts to exclude for launching this job

Parameters:: host_list (Union[str, List[str]]) – hosts to exclude
Return type:: None

set_hostlist(host_list: str | List[str]) → None#

Set the hostlist for the mpirun command

This sets --host

Parameters:: host_list (Union[str, List[str]]) – list of host names
Raises:: TypeError – if not str or list of str
Return type:: None

set_hostlist_from_file(file_path: str) → None#

Use the contents of a file to set the hostlist

This sets --hostfile

Parameters:: file_path (str) – Path to the hostlist file
Return type:: None

set_memory_per_node(memory_per_node: int) → None#

Set the amount of memory required per node in megabytes

Parameters:: memory_per_node (int) – Number of megabytes per node
Return type:: None

set_mpmd_preamble(preamble_lines: List[str]) → None#

Set preamble to a file to make a job MPMD

Parameters:: preamble_lines (List[str]) – lines to put at the beginning of a file.
Return type:: None

set_node_feature(feature_list: str | List[str]) → None#

Specify the node feature for this job

Parameters:: feature_list (Union[str, List[str]]) – node feature to launch on
Return type:: None

set_nodes(nodes: int) → None#

Set the number of nodes

Parameters:: nodes (int) – number of nodes to run with
Return type:: None

set_quiet_launch(quiet: bool) → None#

Set the job to run in quiet mode

This sets --quiet

Parameters:: quiet (bool) – Whether the job should be run quietly
Return type:: None

set_task_map(task_mapping: str) → None#

Set mpirun task mapping

this sets --map-by <mapping>

For examples, see the man page for mpirun

Parameters:: task_mapping (str) – task mapping
Return type:: None

set_tasks(tasks: int) → None#

Set the number of tasks for this job

This sets -n for MPI compliant implementations

Parameters:: tasks (int) – number of tasks
Return type:: None

set_tasks_per_node(tasks_per_node: int) → None#

Set the number of tasks per node

Parameters:: tasks_per_node (int) – number of tasks to launch per node
Return type:: None

set_time(hours: int = 0, minutes: int = 0, seconds: int = 0) → None#

Automatically format and set wall time

Parameters:

hours (int, default: 0) – number of hours to run job
minutes (int, default: 0) – number of minutes to run job
seconds (int, default: 0) – number of seconds to run job

Return type:

None

set_verbose_launch(verbose: bool) → None#

Set the job to run in verbose mode

This sets --verbose

Parameters:: verbose (bool) – Whether the job should be run verbosely
Return type:: None

set_walltime(walltime: str) → None#

Set the maximum number of seconds that a job will run

This sets --timeout

Parameters:: walltime (str) – number like string of seconds that a job will run in secs
Return type:: None

update_env(env_vars: Dict[str, str | int | float | bool]) → None#

Update the job environment variables

To fully inherit the current user environment, add the workload-manager-specific flag to the launch command through the add_exe_args() method. For example, --export=ALL for slurm, or -V for PBS/aprun.

Parameters:: env_vars (Dict[str, Union[str, int, float, bool]]) – environment variables to update or add
Raises:: TypeError – if env_vars values cannot be coerced to strings
Return type:: None

OrterunSettings#

OrterunSettings are for launching with OpenMPI’s orterun. OrterunSettings are supported on Slurm and PBSpro.

`OrterunSettings.set_cpus_per_task`(cpus_per_task)	Set the number of tasks for this job
`OrterunSettings.set_hostlist`(host_list)	Set the hostlist for the `mpirun` command
`OrterunSettings.set_tasks`(tasks)	Set the number of tasks for this job
`OrterunSettings.set_task_map`(task_mapping)	Set `mpirun` task mapping
`OrterunSettings.make_mpmd`(settings)	Make a mpmd workload by combining two `mpirun` commands
`OrterunSettings.add_exe_args`(args)	Add executable arguments to executable
`OrterunSettings.format_run_args`()	Return a list of MPI-standard formatted run arguments
`OrterunSettings.format_env_vars`()	Format the environment variables for mpirun
`OrterunSettings.update_env`(env_vars)	Update the job environment variables

Settings to run job with orterun command (MPI-standard)

Note that environment variables can be passed with a None value to signify that they should be exported from the current environment

Any arguments passed in the run_args dict will be converted into orterun arguments and prefixed with --. Values of None can be provided for arguments that do not have values.

Parameters:

exe (str) – executable
exe_args (Union[str, List[str], None], default: None) – executable arguments
run_args (Optional[Dict[str, Union[int, str, float, None]]], default: None) – arguments for run command
env_vars (Optional[Dict[str, Optional[str]]], default: None) – environment vars to launch job with

add_exe_args(args: str | List[str]) → None#

Add executable arguments to executable

Parameters:: args (Union[str, List[str]]) – executable arguments
Return type:: None

colocated_db_settings: t.Optional[t.Dict[str, t.Union[bool, int, str, None, t.List[str], t.Iterable[t.Union[int, t.Iterable[int]]], t.List[DBModel], t.List[DBScript], t.Dict[str, t.Union[int, None]], t.Dict[str, str]]]]#

property env_vars: Dict[str, str | None]#

Return an immutable list of attached environment variables.

Returns:: attached environment variables

property exe_args: str | List[str]#

Return an immutable list of attached executable arguments.

Returns:: attached executable arguments

format_env_vars() → List[str]#

Format the environment variables for mpirun

Return type:: List[str]
Returns:: list of env vars

format_run_args() → List[str]#

Return a list of MPI-standard formatted run arguments

Return type:: List[str]
Returns:: list of MPI-standard arguments for these settings

make_mpmd(settings: smartsim.settings.base.RunSettings) → None#

Make a mpmd workload by combining two mpirun commands

This connects the two settings to be executed with a single Model instance

Parameters:: settings (RunSettings) – MpirunSettings instance
Return type:: None

reserved_run_args: set[str] = {'wd', 'wdir'}#

property run_args: Dict[str, int | str | float | None]#

Return an immutable list of attached run arguments.

Returns:: attached run arguments

property run_command: str | None#

Return the launch binary used to launch the executable

Attempt to expand the path to the executable if possible

Returns:: launch binary e.g. mpiexec

set(arg: str, value: str | None = None, condition: bool = True) → None#

Allows users to set individual run arguments.

A method that allows users to set run arguments after object instantiation. Does basic formatting such as stripping leading dashes. If the argument has been set previously, this method will log warning but ultimately comply.

Conditional expressions may be passed to the conditional parameter. If the expression evaluates to True, the argument will be set. In not an info message is logged and no further operation is performed.

Basic Usage

rs = RunSettings("python")
rs.set("an-arg", "a-val")
rs.set("a-flag")
rs.format_run_args()  # returns ["an-arg", "a-val", "a-flag", "None"]

Slurm Example with Conditional Setting

import socket

rs = SrunSettings("echo", "hello")
rs.set_tasks(1)
rs.set("exclusive")

# Only set this argument if condition param evals True
# Otherwise log and NOP
rs.set("partition", "debug",
       condition=socket.gethostname()=="testing-system")

rs.format_run_args()
# returns ["exclusive", "None", "partition", "debug"] iff
  socket.gethostname()=="testing-system"
# otherwise returns ["exclusive", "None"]

Parameters:

arg (str) – name of the argument
value (Optional[str], default: None) – value of the argument
conditon – set the argument if condition evaluates to True

Return type:

None

set_binding(binding: str) → None#

Set binding

Parameters:: binding (str) – Binding
Return type:: None

set_broadcast(dest_path: str | None = None) → None#

Copy the specified executable(s) to remote machines

This sets --preload-binary

Parameters:: dest_path (Optional[str], default: None) – Destination path (Ignored)
Return type:: None

set_cpu_binding_type(bind_type: str) → None#

Specifies the cores to which MPI processes are bound

This sets --bind-to for MPI compliant implementations

Parameters:: bind_type (str) – binding type
Return type:: None

set_cpu_bindings(bindings: int | List[int]) → None#

Set the cores to which MPI processes are bound

Parameters:: bindings (Union[int, List[int]]) – List specifing the cores to which MPI processes are bound
Return type:: None

set_cpus_per_task(cpus_per_task: int) → None#

Set the number of tasks for this job

This sets --cpus-per-proc for MPI compliant implementations

note: this option has been deprecated in openMPI 4.0+ and will soon be replaced.

Parameters:: cpus_per_task (int) – number of tasks
Return type:: None

set_excluded_hosts(host_list: str | List[str]) → None#

Specify a list of hosts to exclude for launching this job

Parameters:: host_list (Union[str, List[str]]) – hosts to exclude
Return type:: None

set_hostlist(host_list: str | List[str]) → None#

Set the hostlist for the mpirun command

This sets --host

Parameters:: host_list (Union[str, List[str]]) – list of host names
Raises:: TypeError – if not str or list of str
Return type:: None

set_hostlist_from_file(file_path: str) → None#

Use the contents of a file to set the hostlist

This sets --hostfile

Parameters:: file_path (str) – Path to the hostlist file
Return type:: None

set_memory_per_node(memory_per_node: int) → None#

Set the amount of memory required per node in megabytes

Parameters:: memory_per_node (int) – Number of megabytes per node
Return type:: None

set_mpmd_preamble(preamble_lines: List[str]) → None#

Set preamble to a file to make a job MPMD

Parameters:: preamble_lines (List[str]) – lines to put at the beginning of a file.
Return type:: None

set_node_feature(feature_list: str | List[str]) → None#

Specify the node feature for this job

Parameters:: feature_list (Union[str, List[str]]) – node feature to launch on
Return type:: None

set_nodes(nodes: int) → None#

Set the number of nodes

Parameters:: nodes (int) – number of nodes to run with
Return type:: None

set_quiet_launch(quiet: bool) → None#

Set the job to run in quiet mode

This sets --quiet

Parameters:: quiet (bool) – Whether the job should be run quietly
Return type:: None

set_task_map(task_mapping: str) → None#

Set mpirun task mapping

this sets --map-by <mapping>

For examples, see the man page for mpirun

Parameters:: task_mapping (str) – task mapping
Return type:: None

set_tasks(tasks: int) → None#

Set the number of tasks for this job

This sets -n for MPI compliant implementations

Parameters:: tasks (int) – number of tasks
Return type:: None

set_tasks_per_node(tasks_per_node: int) → None#

Set the number of tasks per node

Parameters:: tasks_per_node (int) – number of tasks to launch per node
Return type:: None

set_time(hours: int = 0, minutes: int = 0, seconds: int = 0) → None#

Automatically format and set wall time

Parameters:

hours (int, default: 0) – number of hours to run job
minutes (int, default: 0) – number of minutes to run job
seconds (int, default: 0) – number of seconds to run job

Return type:

None

set_verbose_launch(verbose: bool) → None#

Set the job to run in verbose mode

This sets --verbose

Parameters:: verbose (bool) – Whether the job should be run verbosely
Return type:: None

set_walltime(walltime: str) → None#

Set the maximum number of seconds that a job will run

This sets --timeout

Parameters:: walltime (str) – number like string of seconds that a job will run in secs
Return type:: None

update_env(env_vars: Dict[str, str | int | float | bool]) → None#

Update the job environment variables

To fully inherit the current user environment, add the workload-manager-specific flag to the launch command through the add_exe_args() method. For example, --export=ALL for slurm, or -V for PBS/aprun.

Parameters:: env_vars (Dict[str, Union[str, int, float, bool]]) – environment variables to update or add
Raises:: TypeError – if env_vars values cannot be coerced to strings
Return type:: None

SbatchSettings#

SbatchSettings are used for launching batches onto Slurm WLM systems.

`SbatchSettings.set_account`(account)	Set the account for this batch job
`SbatchSettings.set_batch_command`(command)	Set the command used to launch the batch e.g.
`SbatchSettings.set_nodes`(num_nodes)	Set the number of nodes for this batch job
`SbatchSettings.set_hostlist`(host_list)	Specify the hostlist for this job
`SbatchSettings.set_partition`(partition)	Set the partition for the batch job
`SbatchSettings.set_queue`(queue)	alias for set_partition
`SbatchSettings.set_walltime`(walltime)	Set the walltime of the job
`SbatchSettings.format_batch_args`()	Get the formatted batch arguments for a preview

class SbatchSettings(nodes: int | None = None, time: str = '', account: str | None = None, batch_args: Dict[str, str | None] | None = None, **kwargs: Any) → None[source]#

Specify run parameters for a Slurm batch job

Slurm sbatch arguments can be written into batch_args as a dictionary. e.g. {‘ntasks’: 1}

If the argument doesn’t have a parameter, put None as the value. e.g. {‘exclusive’: None}

Initialization values provided (nodes, time, account) will overwrite the same arguments in batch_args if present

Parameters:

nodes (Optional[int], default: None) – number of nodes
time (str, default: '') – walltime for job, e.g. “10:00:00” for 10 hours
account (Optional[str], default: None) – account for job
batch_args (Optional[Dict[str, Optional[str]]], default: None) – extra batch arguments

add_preamble(lines: List[str]) → None#

Add lines to the batch file preamble. The lines are just written (unmodified) at the beginning of the batch file (after the WLM directives) and can be used to e.g. start virtual environments before running the executables.

Parameters:: line – lines to add to preamble.
Return type:: None

property batch_args: Dict[str, str | None]#

Retrieve attached batch arguments

Returns:: attached batch arguments

property batch_cmd: str#

Return the batch command

Tests to see if we can expand the batch command path. If we can, then returns the expanded batch command. If we cannot, returns the batch command as is.

Returns:: batch command

format_batch_args() → List[str][source]#

Get the formatted batch arguments for a preview

Return type:: List[str]
Returns:: batch arguments for Sbatch

property preamble: Iterable[str]#

Return an iterable of preamble clauses to be prepended to the batch file

Returns:: attached preamble clauses

set_account(account: str) → None[source]#

Set the account for this batch job

Parameters:: account (str) – account id
Return type:: None

set_batch_command(command: str) → None#

Set the command used to launch the batch e.g. sbatch

Parameters:: command (str) – batch command
Return type:: None

set_cpus_per_task(cpus_per_task: int) → None[source]#

Set the number of cpus to use per task

This sets --cpus-per-task

Parameters:: num_cpus – number of cpus to use per task
Return type:: None

set_hostlist(host_list: str | List[str]) → None[source]#

Specify the hostlist for this job

Parameters:: host_list (Union[str, List[str]]) – hosts to launch on
Raises:: TypeError – if not str or list of str
Return type:: None

set_nodes(num_nodes: int) → None[source]#

Set the number of nodes for this batch job

Parameters:: num_nodes (int) – number of nodes
Return type:: None

set_partition(partition: str) → None[source]#

Set the partition for the batch job

Parameters:: partition (str) – partition name
Return type:: None

set_queue(queue: str) → None[source]#

alias for set_partition

Sets the partition for the slurm batch job

Parameters:: queue (str) – the partition to run the batch job on
Return type:: None

set_walltime(walltime: str) → None[source]#

Set the walltime of the job

format = “HH:MM:SS”

Parameters:: walltime (str) – wall time
Return type:: None

QsubBatchSettings#

QsubBatchSettings are used to configure jobs that should be launched as a batch on PBSPro systems.

`QsubBatchSettings.set_account`(account)	Set the account for this batch job
`QsubBatchSettings.set_batch_command`(command)	Set the command used to launch the batch e.g.
`QsubBatchSettings.set_nodes`(num_nodes)	Set the number of nodes for this batch job
`QsubBatchSettings.set_ncpus`(num_cpus)	Set the number of cpus obtained in each node.
`QsubBatchSettings.set_queue`(queue)	Set the queue for the batch job
`QsubBatchSettings.set_resource`(...)	Set a resource value for the Qsub batch
`QsubBatchSettings.set_walltime`(walltime)	Set the walltime of the job
`QsubBatchSettings.format_batch_args`()	Get the formatted batch arguments for a preview

Specify qsub batch parameters for a job

nodes, and ncpus are used to create the select statement for PBS if a select statement is not included in the resources. If both are supplied the value for select statement supplied in resources will override.

Parameters:

nodes (Optional[int], default: None) – number of nodes for batch
ncpus (Optional[int], default: None) – number of cpus per node
time (Optional[str], default: None) – walltime for batch job
queue (Optional[str], default: None) – queue to run batch in
account (Optional[str], default: None) – account for batch launch
resources (Optional[Dict[str, Union[str, int]]], default: None) – overrides for resource arguments
batch_args (Optional[Dict[str, Optional[str]]], default: None) – overrides for PBS batch arguments

add_preamble(lines: List[str]) → None#

Add lines to the batch file preamble. The lines are just written (unmodified) at the beginning of the batch file (after the WLM directives) and can be used to e.g. start virtual environments before running the executables.

Parameters:: line – lines to add to preamble.
Return type:: None

property batch_args: Dict[str, str | None]#

Retrieve attached batch arguments

Returns:: attached batch arguments

property batch_cmd: str#

Return the batch command

Tests to see if we can expand the batch command path. If we can, then returns the expanded batch command. If we cannot, returns the batch command as is.

Returns:: batch command

format_batch_args() → List[str][source]#

Get the formatted batch arguments for a preview

Return type:: List[str]
Returns:: batch arguments for Qsub
Raises:: ValueError – if options are supplied without values

property preamble: Iterable[str]#

Return an iterable of preamble clauses to be prepended to the batch file

Returns:: attached preamble clauses

property resources: Dict[str, str | int]#

set_account(account: str) → None[source]#

Set the account for this batch job

Parameters:: acct – account id
Return type:: None

set_batch_command(command: str) → None#

Set the command used to launch the batch e.g. sbatch

Parameters:: command (str) – batch command
Return type:: None

set_hostlist(host_list: str | List[str]) → None[source]#

Specify the hostlist for this job

Parameters:: host_list (Union[str, List[str]]) – hosts to launch on
Raises:: TypeError – if not str or list of str
Return type:: None

set_ncpus(num_cpus: int | str) → None[source]#

Set the number of cpus obtained in each node.

If a select argument is provided in QsubBatchSettings.resources, then this value will be overridden

Parameters:: num_cpus (Union[int, str]) – number of cpus per node in select
Return type:: None

set_nodes(num_nodes: int) → None[source]#

Set the number of nodes for this batch job

In PBS, ‘select’ is the more primitive way of describing how many nodes to allocate for the job. ‘nodes’ is equivalent to ‘select’ with a ‘place’ statement. Assuming that only advanced users would use ‘set_resource’ instead, defining the number of nodes here is sets the ‘nodes’ resource.

Parameters:: num_nodes (int) – number of nodes
Return type:: None

set_queue(queue: str) → None[source]#

Set the queue for the batch job

Parameters:: queue (str) – queue name
Return type:: None

set_resource(resource_name: str, value: str | int) → None[source]#

Set a resource value for the Qsub batch

If a select statement is provided, the nodes and ncpus arguments will be overridden. Likewise for Walltime

Parameters:

resource_name (str) – name of resource, e.g. walltime
value (Union[str, int]) – value

Return type:

None

set_walltime(walltime: str) → None[source]#

Set the walltime of the job

format = “HH:MM:SS”

If a walltime argument is provided in QsubBatchSettings.resources, then this value will be overridden

Parameters:: walltime (str) – wall time
Return type:: None

BsubBatchSettings#

BsubBatchSettings are used to configure jobs that should be launched as a batch on LSF systems.

`BsubBatchSettings.set_walltime`(walltime)	Set the walltime
`BsubBatchSettings.set_smts`(smts)	Set SMTs
`BsubBatchSettings.set_project`(project)	Set the project
`BsubBatchSettings.set_nodes`(num_nodes)	Set the number of nodes for this batch job
`BsubBatchSettings.set_expert_mode_req`(...)	Set allocation for expert mode.
`BsubBatchSettings.set_hostlist`(host_list)	Specify the hostlist for this job
`BsubBatchSettings.set_tasks`(tasks)	Set the number of tasks for this job
`BsubBatchSettings.format_batch_args`()	Get the formatted batch arguments for a preview

Specify bsub batch parameters for a job

Parameters:

nodes (Optional[int], default: None) – number of nodes for batch
time (Optional[str], default: None) – walltime for batch job in format hh:mm
project (Optional[str], default: None) – project for batch launch
batch_args (Optional[Dict[str, Optional[str]]], default: None) – overrides for LSF batch arguments
smts (int, default: 0) – SMTs

add_preamble(lines: List[str]) → None#

Add lines to the batch file preamble. The lines are just written (unmodified) at the beginning of the batch file (after the WLM directives) and can be used to e.g. start virtual environments before running the executables.

Parameters:: line – lines to add to preamble.
Return type:: None

property batch_args: Dict[str, str | None]#

Retrieve attached batch arguments

Returns:: attached batch arguments

property batch_cmd: str#

Return the batch command

Tests to see if we can expand the batch command path. If we can, then returns the expanded batch command. If we cannot, returns the batch command as is.

Returns:: batch command

format_batch_args() → List[str][source]#

Get the formatted batch arguments for a preview

Return type:: List[str]
Returns:: list of batch arguments for Qsub

property preamble: Iterable[str]#

Return an iterable of preamble clauses to be prepended to the batch file

Returns:: attached preamble clauses

set_account(account: str) → None[source]#

Set the project

this function is an alias for set_project.

Parameters:: account (str) – project name
Return type:: None

set_batch_command(command: str) → None#

Set the command used to launch the batch e.g. sbatch

Parameters:: command (str) – batch command
Return type:: None

set_expert_mode_req(res_req: str, slots: int) → None[source]#

Set allocation for expert mode. This will activate expert mode (-csm) and disregard all other allocation options.

This sets -csm -n slots -R res_req

Parameters:

res_req (str) – specific resource requirements
slots (int) – number of resources to allocate

Return type:

None

set_hostlist(host_list: str | List[str]) → None[source]#

Specify the hostlist for this job

Parameters:: host_list (Union[str, List[str]]) – hosts to launch on
Raises:: TypeError – if not str or list of str
Return type:: None

set_nodes(num_nodes: int) → None[source]#

Set the number of nodes for this batch job

This sets -nnodes.

Parameters:: nodes – number of nodes
Return type:: None

set_project(project: str) → None[source]#

Set the project

This sets -P.

Parameters:: time – project name
Return type:: None

set_queue(queue: str) → None[source]#

Set the queue for this job

Parameters:: queue (str) – The queue to submit the job on
Return type:: None

set_smts(smts: int) → None[source]#

Set SMTs

This sets -alloc_flags. If the user sets SMT explicitly through -alloc_flags, then that takes precedence.

Parameters:: smts (int) – SMT (e.g on Summit: 1, 2, or 4)
Return type:: None

set_tasks(tasks: int) → None[source]#

Set the number of tasks for this job

This sets -n

Parameters:: tasks (int) – number of tasks
Return type:: None

set_walltime(walltime: str) → None[source]#

Set the walltime

This sets -W.

Parameters:: walltime (str) – Time in hh:mm format, e.g. “10:00” for 10 hours, if time is supplied in hh:mm:ss format, seconds will be ignored and walltime will be set as hh:mm
Return type:: None

Singularity#

Singularity is a type of Container that can be passed to a RunSettings class or child class to enable running the workload in a container.

class Singularity(*args: Any, **kwargs: Any) → None[source]#

Singularity (apptainer) container type. To be passed into a RunSettings class initializer or Experiment.create_run_settings.

Note

Singularity integration is currently tested with Apptainer 1.0 with slurm and PBS workload managers only.

Also, note that user-defined bind paths (mount argument) may be disabled by a system administrator

Parameters:

image – local or remote path to container image, e.g. docker://sylabsio/lolcow
args (Any) – arguments to ‘singularity exec’ command
mount – paths to mount (bind) from host machine into image.

Orchestrator#

`Orchestrator.__init__`([path, port, ...])	Initialize an `Orchestrator` reference for local launch
`Orchestrator.db_identifier`	Return the DB identifier, which is common to a DB and all of its nodes
`Orchestrator.num_shards`	Return the number of DB shards contained in the Orchestrator.
`Orchestrator.db_nodes`	Read only property for the number of nodes an `Orchestrator` is launched across.
`Orchestrator.hosts`	Return the hostnames of Orchestrator instance hosts
`Orchestrator.reset_hosts`()	Clear hosts or reset them to last user choice
`Orchestrator.remove_stale_files`()	Can be used to remove database files of a previous launch
`Orchestrator.get_address`()	Return database addresses
`Orchestrator.is_active`()	Check if the database is active
`Orchestrator.set_cpus`(num_cpus)	Set the number of CPUs available to each database shard
`Orchestrator.set_walltime`(walltime)	Set the batch walltime of the orchestrator
`Orchestrator.set_hosts`(host_list)	Specify the hosts for the `Orchestrator` to launch on
`Orchestrator.set_batch_arg`(arg[, value])	Set a batch argument the orchestrator should launch with
`Orchestrator.set_run_arg`(arg[, value])	Set a run argument the orchestrator should launch each node with (it will be passed to jrun)
`Orchestrator.enable_checkpoints`(frequency)	Sets the database's save configuration to save the DB every 'frequency' seconds given that at least one write operation against the DB occurred in that time.
`Orchestrator.set_max_memory`(mem)	Sets the max memory configuration.
`Orchestrator.set_eviction_strategy`(strategy)	Sets how the database will select what to remove when 'maxmemory' is reached.
`Orchestrator.set_max_clients`([clients])	Sets the max number of connected clients at the same time.
`Orchestrator.set_max_message_size`([size])	Sets the database's memory size limit for bulk requests, which are elements representing single strings.
`Orchestrator.set_db_conf`(key, value)	Set any valid configuration at runtime without the need to restart the database.
`Orchestrator.telemetry`	Return the telemetry configuration for this entity.
`Orchestrator.checkpoint_file`	Get the path to the checkpoint file for this Orchestrator
`Orchestrator.batch`	Property indicating whether or not the entity sequence should be launched as a batch job

Orchestrator#

class Orchestrator(path: str | None = '/usr/local/src/SmartSim/doc', port: int = 6379, interface: str | List[str] = 'lo', launcher: str = 'local', run_command: str = 'auto', db_nodes: int = 1, batch: bool = False, hosts: str | List[str] | None = None, account: str | None = None, time: str | None = None, alloc: str | None = None, single_cmd: bool = False, *, threads_per_queue: int | None = None, inter_op_threads: int | None = None, intra_op_threads: int | None = None, db_identifier: str = 'orchestrator', **kwargs: Any) → None[source]#

The Orchestrator is an in-memory database that can be launched alongside entities in SmartSim. Data can be transferred between entities by using one of the Python, C, C++ or Fortran clients within an entity.

Initialize an Orchestrator reference for local launch

Extra configurations for RedisAI

Parameters:

path (Optional[str], default: '/usr/local/src/SmartSim/doc') – path to location of Orchestrator directory
port (int, default: 6379) – TCP/IP port
interface (Union[str, List[str]], default: 'lo') – network interface(s)
launcher (str, default: 'local') – type of launcher being used, options are “slurm”, “pbs”, “lsf”, or “local”. If set to “auto”, an attempt will be made to find an available launcher on the system.
run_command (str, default: 'auto') – specify launch binary or detect automatically
db_nodes (int, default: 1) – number of database shards
batch (bool, default: False) – run as a batch workload
hosts (Union[str, List[str], None], default: None) – specify hosts to launch on
account (Optional[str], default: None) – account to run batch on
time (Optional[str], default: None) – walltime for batch ‘HH:MM:SS’ format
alloc (Optional[str], default: None) – allocation to launch database on
single_cmd (bool, default: False) – run all shards with one (MPMD) command
threads_per_queue (Optional[int], default: None) – threads per GPU device
inter_op_threads (Optional[int], default: None) – threads across CPU operations
intra_op_threads (Optional[int], default: None) – threads per CPU operation
db_identifier (str, default: 'orchestrator') – an identifier to distinguish this orchestrator in multiple-database experiments

property batch: bool#

Property indicating whether or not the entity sequence should be launched as a batch job

Returns:: True if entity sequence should be launched as a batch job, False if the members will be launched individually.

property checkpoint_file: str#

Get the path to the checkpoint file for this Orchestrator

Returns:: Path to the checkpoint file if it exists, otherwise a None

property db_identifier: str#

Return the DB identifier, which is common to a DB and all of its nodes

Returns:: DB identifier

property db_models: Iterable[smartsim.entity.DBModel]#: Return an immutable collection of attached models

property db_nodes: int#

Read only property for the number of nodes an Orchestrator is launched across. Notice that SmartSim currently assumes that each shard will be launched on its own node. Therefore this property is currently an alias to the num_shards attribute.

Returns:: Number of database nodes

property db_scripts: Iterable[smartsim.entity.DBScript]#: Return an immutable collection of attached scripts

enable_checkpoints(frequency: int) → None[source]#

Sets the database’s save configuration to save the DB every ‘frequency’ seconds given that at least one write operation against the DB occurred in that time. E.g., if frequency is 900, then the database will save to disk after 900 seconds if there is at least 1 change to the dataset.

Parameters:: frequency (int) – the given number of seconds before the DB saves
Return type:: None

get_address() → List[str][source]#

Return database addresses

Return type:: List[str]
Returns:: addresses
Raises:: SmartSimError – If database address cannot be found or is not active

property hosts: List[str]#

Return the hostnames of Orchestrator instance hosts

Note that this will only be populated after the orchestrator has been launched by SmartSim.

Returns:: the hostnames of Orchestrator instance hosts

is_active() → bool[source]#

Check if the database is active

Return type:: bool
Returns:: True if database is active, False otherwise

property num_shards: int#

Return the number of DB shards contained in the Orchestrator. This might differ from the number of DBNode objects, as each DBNode may start more than one shard (e.g. with MPMD).

Returns:: the number of DB shards contained in the Orchestrator

remove_stale_files() → None[source]#

Can be used to remove database files of a previous launch

Return type:: None

reset_hosts() → None[source]#

Clear hosts or reset them to last user choice

Return type:: None

set_batch_arg(arg: str, value: str | None = None) → None[source]#

Set a batch argument the orchestrator should launch with

Some commonly used arguments such as –job-name are used by SmartSim and will not be allowed to be set.

Parameters:

arg (str) – batch argument to set e.g. “exclusive”
value (Optional[str], default: None) – batch param - set to None if no param value

Raises:

SmartSimError – if orchestrator not launching as batch

Return type:

None

set_cpus(num_cpus: int) → None[source]#

Set the number of CPUs available to each database shard

This effectively will determine how many cpus can be used for compute threads, background threads, and network I/O.

Parameters:: num_cpus (int) – number of cpus to set
Return type:: None

set_db_conf(key: str, value: str) → None[source]#

Set any valid configuration at runtime without the need to restart the database. All configuration parameters that are set are immediately loaded by the database and will take effect starting with the next command executed.

Parameters:

key (str) – the configuration parameter
value (str) – the database configuration parameter’s new value

Return type:

None

set_eviction_strategy(strategy: str) → None[source]#

Sets how the database will select what to remove when ‘maxmemory’ is reached. The default is noeviction.

Parameters:

strategy (str) – The max memory policy to use e.g. “volatile-lru”, “allkeys-lru”, etc.

Raises:

SmartSimError – If ‘strategy’ is an invalid maxmemory policy
SmartSimError – If database is not active

Return type:

None

set_hosts(host_list: List[str] | str) → None[source]#

Specify the hosts for the Orchestrator to launch on

Parameters:: host_list (Union[List[str], str]) – list of host (compute node names)
Raises:: TypeError – if wrong type
Return type:: None

set_max_clients(clients: int = 50000) → None[source]#

Sets the max number of connected clients at the same time. When the number of DB shards contained in the orchestrator is more than two, then every node will use two connections, one incoming and another outgoing.

Parameters:: clients (int, default: 50000) – the maximum number of connected clients
Return type:: None

set_max_memory(mem: str) → None[source]#

Sets the max memory configuration. By default there is no memory limit. Setting max memory to zero also results in no memory limit. Once a limit is surpassed, keys will be removed according to the eviction strategy. The specified memory size is case insensitive and supports the typical forms of:

1k => 1000 bytes

1kb => 1024 bytes

1m => 1000000 bytes

1mb => 1024*1024 bytes

1g => 1000000000 bytes

1gb => 1024*1024*1024 bytes

Parameters:

mem (str) – the desired max memory size e.g. 3gb

Raises:

SmartSimError – If ‘mem’ is an invalid memory value
SmartSimError – If database is not active

Return type:

None

set_max_message_size(size: int = 1073741824) → None[source]#

Sets the database’s memory size limit for bulk requests, which are elements representing single strings. The default is 1 gigabyte. Message size must be greater than or equal to 1mb. The specified memory size should be an integer that represents the number of bytes. For example, to set the max message size to 1gb, use 1024*1024*1024.

Parameters:: size (int, default: 1073741824) – maximum message size in bytes
Return type:: None

set_path(new_path: str) → None#

Return type:: None

set_run_arg(arg: str, value: str | None = None) → None[source]#

Set a run argument the orchestrator should launch each node with (it will be passed to jrun)

Some commonly used arguments are used by SmartSim and will not be allowed to be set. For example, “n”, “N”, etc.

Parameters:

arg (str) – run argument to set
value (Optional[str], default: None) – run parameter - set to None if no parameter value

Return type:

None

set_walltime(walltime: str) → None[source]#

Set the batch walltime of the orchestrator

Note: This will only effect orchestrators launched as a batch

Parameters:: walltime (str) – amount of time e.g. 10 hours is 10:00:00
Raises:: SmartSimError – if orchestrator isn’t launching as batch
Return type:: None

property telemetry: TelemetryConfiguration#

Return the telemetry configuration for this entity.

Returns:: configuration of telemetry for this entity

property type: str#: Return the name of the class

Model#

`Model.__init__`(name, params, run_settings[, ...])	Initialize a `Model`
`Model.attach_generator_files`([to_copy, ...])	Attach files to an entity for generation
`Model.colocate_db`(args, *kwargs)	An alias for `Model.colocate_db_tcp`
`Model.colocate_db_tcp`([port, ifname, ...])	Colocate an Orchestrator instance with this Model over TCP/IP.
`Model.colocate_db_uds`([unix_socket, ...])	Colocate an Orchestrator instance with this Model over UDS.
`Model.colocated`	Return True if this Model will run with a colocated Orchestrator
`Model.add_ml_model`(name, backend[, model, ...])	A TF, TF-lite, PT, or ONNX model to load into the DB at runtime
`Model.add_script`(name[, script, ...])	TorchScript to launch with this Model instance
`Model.add_function`(name[, function, device, ...])	TorchScript function to launch with this Model instance
`Model.params_to_args`()	Convert parameters to command line arguments and update run settings.
`Model.register_incoming_entity`(incoming_entity)	Register future communication between entities.
`Model.enable_key_prefixing`()	If called, the entity will prefix its keys with its own model name
`Model.disable_key_prefixing`()	If called, the entity will not prefix its keys with its own model name
`Model.query_key_prefixing`()	Inquire as to whether this entity will prefix its keys with its name

Model#

class Model(name: str, params: Dict[str, str], run_settings: smartsim.settings.base.RunSettings, path: str | None = '/usr/local/src/SmartSim/doc', params_as_args: List[str] | None = None, batch_settings: smartsim.settings.base.BatchSettings | None = None)[source]#

Bases: SmartSimEntity

Initialize a Model

Parameters:

name (str) – name of the model
params (Dict[str, str]) – model parameters for writing into configuration files or to be passed as command line arguments to executable.
path (Optional[str], default: '/usr/local/src/SmartSim/doc') – path to output, error, and configuration files
run_settings (RunSettings) – launcher settings specified in the experiment
params_as_args (Optional[List[str]], default: None) – list of parameters which have to be interpreted as command line arguments to be added to run_settings
batch_settings (Optional[BatchSettings], default: None) – Launcher settings for running the individual model as a batch job

add_function(name: str, function: str | None = None, device: str = 'CPU', devices_per_node: int = 1, first_device: int = 0) → None[source]#

TorchScript function to launch with this Model instance

Each script function to the model will be loaded into a non-converged orchestrator prior to the execution of this Model instance.

For converged orchestrators, the add_script() method should be used.

Device selection is either “GPU” or “CPU”. If many devices are present, a number can be passed for specification e.g. “GPU:1”.

Setting devices_per_node=N, with N greater than one will result in the model being stored in the first N devices of type device.

Parameters:

name (str) – key to store function under
function (Optional[str], default: None) – TorchScript function code
device (str, default: 'CPU') – device for script execution
devices_per_node (int, default: 1) – The number of GPU devices available on the host. This parameter only applies to GPU devices and will be ignored if device is specified as CPU.
first_device (int, default: 0) – The first GPU device to use on the host. This parameter only applies to GPU devices and will be ignored if device is specified as CPU.

Return type:

None

add_ml_model(name: str, backend: str, model: bytes | None = None, model_path: str | None = None, device: str = 'CPU', devices_per_node: int = 1, first_device: int = 0, batch_size: int = 0, min_batch_size: int = 0, min_batch_timeout: int = 0, tag: str = '', inputs: List[str] | None = None, outputs: List[str] | None = None) → None[source]#

A TF, TF-lite, PT, or ONNX model to load into the DB at runtime

Each ML Model added will be loaded into an orchestrator (converged or not) prior to the execution of this Model instance

One of either model (in memory representation) or model_path (file) must be provided

Parameters:

name (str) – key to store model under
backend (str) – name of the backend (TORCH, TF, TFLITE, ONNX)
model (Optional[bytes], default: None) – A model in memory (only supported for non-colocated orchestrators)
model_path (Optional[str], default: None) – serialized model
device (str, default: 'CPU') – name of device for execution
devices_per_node (int, default: 1) – The number of GPU devices available on the host. This parameter only applies to GPU devices and will be ignored if device is specified as CPU.
first_device (int, default: 0) – The first GPU device to use on the host. This parameter only applies to GPU devices and will be ignored if device is specified as CPU.
batch_size (int, default: 0) – batch size for execution
min_batch_size (int, default: 0) – minimum batch size for model execution
min_batch_timeout (int, default: 0) – time to wait for minimum batch size
tag (str, default: '') – additional tag for model information
inputs (Optional[List[str]], default: None) – model inputs (TF only)
outputs (Optional[List[str]], default: None) – model outupts (TF only)

Return type:

None

add_script(name: str, script: str | None = None, script_path: str | None = None, device: str = 'CPU', devices_per_node: int = 1, first_device: int = 0) → None[source]#

TorchScript to launch with this Model instance

Each script added to the model will be loaded into an orchestrator (converged or not) prior to the execution of this Model instance

Device selection is either “GPU” or “CPU”. If many devices are present, a number can be passed for specification e.g. “GPU:1”.

Setting devices_per_node=N, with N greater than one will result in the script being stored in the first N devices of type device; alternatively, setting first_device=M will result in the script being stored on nodes M through M + N - 1.

One of either script (in memory string representation) or script_path (file) must be provided

Parameters:

name (str) – key to store script under
script (Optional[str], default: None) – TorchScript code (only supported for non-colocated orchestrators)
script_path (Optional[str], default: None) – path to TorchScript code
device (str, default: 'CPU') – device for script execution
devices_per_node (int, default: 1) – The number of GPU devices available on the host. This parameter only applies to GPU devices and will be ignored if device is specified as CPU.
first_device (int, default: 0) – The first GPU device to use on the host. This parameter only applies to GPU devices and will be ignored if device is specified as CPU.

Return type:

None

attach_generator_files(to_copy: List[str] | None = None, to_symlink: List[str] | None = None, to_configure: List[str] | None = None) → None[source]#

Attach files to an entity for generation

Attach files needed for the entity that, upon generation, will be located in the path of the entity. Invoking this method after files have already been attached will overwrite the previous list of entity files.

During generation, files “to_copy” are copied into the path of the entity, and files “to_symlink” are symlinked into the path of the entity.

Files “to_configure” are text based model input files where parameters for the model are set. Note that only models support the “to_configure” field. These files must have fields tagged that correspond to the values the user would like to change. The tag is settable but defaults to a semicolon e.g. THERMO = ;10;

Parameters:

to_copy (Optional[List[str]], default: None) – files to copy
to_symlink (Optional[List[str]], default: None) – files to symlink
to_configure (Optional[List[str]], default: None) – input files with tagged parameters

Return type:

None

property attached_files_table: str#

Return a list of attached files as a plain text table

Returns:: String version of table

colocate_db(*args: Any, **kwargs: Any) → None[source]#

An alias for Model.colocate_db_tcp

Return type:: None

colocate_db_tcp(port: int = 6379, ifname: str | list[str] = 'lo', db_cpus: int = 1, custom_pinning: Iterable[int | Iterable[int]] | None = None, debug: bool = False, db_identifier: str = '', **kwargs: Any) → None[source]#

Colocate an Orchestrator instance with this Model over TCP/IP.

This method will initialize settings which add an unsharded database to this Model instance. Only this Model will be able to communicate with this colocated database by using the loopback TCP interface.

Extra parameters for the db can be passed through kwargs. This includes many performance, caching and inference settings.

ex. kwargs = {
    maxclients: 100000,
    threads_per_queue: 1,
    inter_op_threads: 1,
    intra_op_threads: 1,
    server_threads: 2 # keydb only
}

Generally these don’t need to be changed.

Parameters:

port (int, default: 6379) – port to use for orchestrator database
ifname (Union[str, list[str]], default: 'lo') – interface to use for orchestrator
db_cpus (int, default: 1) – number of cpus to use for orchestrator
custom_pinning (Optional[Iterable[Union[int, Iterable[int]]]], default: None) – CPUs to pin the orchestrator to. Passing an empty iterable disables pinning
debug (bool, default: False) – launch Model with extra debug information about the colocated db
kwargs (Any) – additional keyword arguments to pass to the orchestrator database

Return type:

None

colocate_db_uds(unix_socket: str = '/tmp/redis.socket', socket_permissions: int = 755, db_cpus: int = 1, custom_pinning: Iterable[int | Iterable[int]] | None = None, debug: bool = False, db_identifier: str = '', **kwargs: Any) → None[source]#

Colocate an Orchestrator instance with this Model over UDS.

This method will initialize settings which add an unsharded database to this Model instance. Only this Model will be able to communicate with this colocated database by using Unix Domain sockets.

Extra parameters for the db can be passed through kwargs. This includes many performance, caching and inference settings.

example_kwargs = {
    "maxclients": 100000,
    "threads_per_queue": 1,
    "inter_op_threads": 1,
    "intra_op_threads": 1,
    "server_threads": 2 # keydb only
}

Generally these don’t need to be changed.

Parameters:

unix_socket (str, default: '/tmp/redis.socket') – path to where the socket file will be created
socket_permissions (int, default: 755) – permissions for the socketfile
db_cpus (int, default: 1) – number of cpus to use for orchestrator
custom_pinning (Optional[Iterable[Union[int, Iterable[int]]]], default: None) – CPUs to pin the orchestrator to. Passing an empty iterable disables pinning
debug (bool, default: False) – launch Model with extra debug information about the colocated db
kwargs (Any) – additional keyword arguments to pass to the orchestrator database

Return type:

None

property colocated: bool#

Return True if this Model will run with a colocated Orchestrator

Returns:: Return True of the Model will run with a colocated Orchestrator

property db_models: Iterable[DBModel]#

Retrieve an immutable collection of attached models

Returns:: Return an immutable collection of attached models

property db_scripts: Iterable[DBScript]#

Retrieve an immutable collection attached of scripts

Returns:: Return an immutable collection of attached scripts

disable_key_prefixing() → None[source]#

If called, the entity will not prefix its keys with its own model name

Return type:: None

enable_key_prefixing() → None[source]#

If called, the entity will prefix its keys with its own model name

Return type:: None

params_to_args() → None[source]#

Convert parameters to command line arguments and update run settings.

Return type:: None

print_attached_files() → None[source]#

Print a table of the attached files on std out

Return type:: None

query_key_prefixing() → bool[source]#

Inquire as to whether this entity will prefix its keys with its name

Return type:: bool
Returns:: Return True if entity will prefix its keys with its name

register_incoming_entity(incoming_entity: smartsim.entity.entity.SmartSimEntity) → None[source]#

Register future communication between entities.

Registers the named data sources that this entity has access to by storing the key_prefix associated with that entity

Parameters:: incoming_entity (SmartSimEntity) – The entity that data will be received from
Raises:: SmartSimError – if incoming entity has already been registered
Return type:: None

property type: str#: Return the name of the class

Ensemble#

`Ensemble.__init__`(name, params[, path, ...])	Initialize an Ensemble of Model instances.
`Ensemble.add_model`(model)	Add a model to this ensemble
`Ensemble.add_ml_model`(name, backend[, ...])	A TF, TF-lite, PT, or ONNX model to load into the DB at runtime
`Ensemble.add_script`(name[, script, ...])	TorchScript to launch with every entity belonging to this ensemble
`Ensemble.add_function`(name[, function, ...])	TorchScript function to launch with every entity belonging to this ensemble
`Ensemble.attach_generator_files`([to_copy, ...])	Attach files to each model within the ensemble for generation
`Ensemble.enable_key_prefixing`()	If called, each model within this ensemble will prefix its key with its own model name.
`Ensemble.models`	An alias for a shallow copy of the `entities` attribute
`Ensemble.query_key_prefixing`()	Inquire as to whether each model within the ensemble will prefix their keys
`Ensemble.register_incoming_entity`(...)	Register future communication between entities.

Ensemble#

class Ensemble(name: str, params: Dict[str, Any], path: str | None = '/usr/local/src/SmartSim/doc', params_as_args: List[str] | None = None, batch_settings: smartsim.settings.base.BatchSettings | None = None, run_settings: smartsim.settings.base.RunSettings | None = None, perm_strat: str = 'all_perm', **kwargs: Any) → None[source]#

Bases: EntityList[Model]

Ensemble is a group of Model instances that can be treated as a reference to a single instance.

Initialize an Ensemble of Model instances.

The kwargs argument can be used to pass custom input parameters to the permutation strategy.

Parameters:

name (str) – name of the ensemble
params (Dict[str, Any]) – parameters to expand into Model members
params_as_args (Optional[List[str]], default: None) – list of params that should be used as command line arguments to the Model member executables and not written to generator files
batch_settings (Optional[BatchSettings], default: None) – describes settings for Ensemble as batch workload
run_settings (Optional[RunSettings], default: None) – describes how each Model should be executed
replicas – number of Model replicas to create - a keyword argument of kwargs
perm_strategy – strategy for expanding params into Model instances from params argument options are “all_perm”, “step”, “random” or a callable function.

Returns:

Ensemble instance

add_function(name: str, function: str | None = None, device: str = 'CPU', devices_per_node: int = 1, first_device: int = 0) → None[source]#

TorchScript function to launch with every entity belonging to this ensemble

Each script function to the model will be loaded into a non-converged orchestrator prior to the execution of every entity belonging to this ensemble.

For converged orchestrators, the add_script() method should be used.

Device selection is either “GPU” or “CPU”. If many devices are present, a number can be passed for specification e.g. “GPU:1”.

Setting devices_per_node=N, with N greater than one will result in the script being stored in the first N devices of type device; alternatively, setting first_device=M will result in the script being stored on nodes M through M + N - 1.

Parameters:

name (str) – key to store function under
function (Optional[str], default: None) – TorchScript code
device (str, default: 'CPU') – device for script execution
devices_per_node (int, default: 1) – number of devices on each host
first_device (int, default: 0) – first device to use on each host

Return type:

None

add_ml_model(name: str, backend: str, model: bytes | None = None, model_path: str | None = None, device: str = 'CPU', devices_per_node: int = 1, first_device: int = 0, batch_size: int = 0, min_batch_size: int = 0, min_batch_timeout: int = 0, tag: str = '', inputs: List[str] | None = None, outputs: List[str] | None = None) → None[source]#

A TF, TF-lite, PT, or ONNX model to load into the DB at runtime

Each ML Model added will be loaded into an orchestrator (converged or not) prior to the execution of every entity belonging to this ensemble

One of either model (in memory representation) or model_path (file) must be provided

Parameters:

name (str) – key to store model under
model (Optional[bytes], default: None) – model in memory
model_path (Optional[str], default: None) – serialized model
backend (str) – name of the backend (TORCH, TF, TFLITE, ONNX)
device (str, default: 'CPU') – name of device for execution
devices_per_node (int, default: 1) – number of GPUs per node in multiGPU nodes
first_device (int, default: 0) – first device in multi-GPU nodes to use for execution, defaults to 0; ignored if devices_per_node is 1
batch_size (int, default: 0) – batch size for execution
min_batch_size (int, default: 0) – minimum batch size for model execution
min_batch_timeout (int, default: 0) – time to wait for minimum batch size
tag (str, default: '') – additional tag for model information
inputs (Optional[List[str]], default: None) – model inputs (TF only)
outputs (Optional[List[str]], default: None) – model outupts (TF only)

Return type:

None

add_model(model: smartsim.entity.model.Model) → None[source]#

Add a model to this ensemble

Parameters:

model (Model) – model instance to be added

Raises:

TypeError – if model is not an instance of Model
EntityExistsError – if model already exists in this ensemble

Return type:

None

add_script(name: str, script: str | None = None, script_path: str | None = None, device: str = 'CPU', devices_per_node: int = 1, first_device: int = 0) → None[source]#

TorchScript to launch with every entity belonging to this ensemble

Each script added to the model will be loaded into an orchestrator (converged or not) prior to the execution of every entity belonging to this ensemble

Device selection is either “GPU” or “CPU”. If many devices are present, a number can be passed for specification e.g. “GPU:1”.

Setting devices_per_node=N, with N greater than one will result in the model being stored in the first N devices of type device.

One of either script (in memory string representation) or script_path (file) must be provided

Parameters:

name (str) – key to store script under
script (Optional[str], default: None) – TorchScript code
script_path (Optional[str], default: None) – path to TorchScript code
device (str, default: 'CPU') – device for script execution
devices_per_node (int, default: 1) – number of devices on each host
first_device (int, default: 0) – first device to use on each host

Return type:

None

attach_generator_files(to_copy: List[str] | None = None, to_symlink: List[str] | None = None, to_configure: List[str] | None = None) → None[source]#

Attach files to each model within the ensemble for generation

Attach files needed for the entity that, upon generation, will be located in the path of the entity.

During generation, files “to_copy” are copied into the path of the entity, and files “to_symlink” are symlinked into the path of the entity.

Files “to_configure” are text based model input files where parameters for the model are set. Note that only models support the “to_configure” field. These files must have fields tagged that correspond to the values the user would like to change. The tag is settable but defaults to a semicolon e.g. THERMO = ;10;

Parameters:

to_copy (Optional[List[str]], default: None) – files to copy
to_symlink (Optional[List[str]], default: None) – files to symlink
to_configure (Optional[List[str]], default: None) – input files with tagged parameters

Return type:

None

property attached_files_table: str#

Return a plain-text table with information about files attached to models belonging to this ensemble.

Returns:: A table of all files attached to all models

property batch: bool#

Property indicating whether or not the entity sequence should be launched as a batch job

Returns:: True if entity sequence should be launched as a batch job, False if the members will be launched individually.

property db_models: Iterable[smartsim.entity.DBModel]#: Return an immutable collection of attached models

property db_scripts: Iterable[smartsim.entity.DBScript]#: Return an immutable collection of attached scripts

enable_key_prefixing() → None[source]#

If called, each model within this ensemble will prefix its key with its own model name.

Return type:: None

property models: Collection[Model]#: An alias for a shallow copy of the entities attribute

print_attached_files() → None[source]#

Print table of attached files to std out

Return type:: None

query_key_prefixing() → bool[source]#

Inquire as to whether each model within the ensemble will prefix their keys

Return type:: bool
Returns:: True if all models have key prefixing enabled, False otherwise

register_incoming_entity(incoming_entity: smartsim.entity.entity.SmartSimEntity) → None[source]#

Register future communication between entities.

Registers the named data sources that this entity has access to by storing the key_prefix associated with that entity

Only python clients can have multiple incoming connections

Parameters:: incoming_entity (SmartSimEntity) – The entity that data will be received from
Return type:: None

property type: str#: Return the name of the class

Machine Learning#

SmartSim includes built-in utilities for supporting TensorFlow, Keras, and Pytorch.

TensorFlow#

SmartSim includes built-in utilities for supporting TensorFlow and Keras in training and inference.

freeze_model(model: keras.src.models.model.Model, output_dir: str, file_name: str) → Tuple[str, List[str], List[str]][source]#

Freeze a Keras or TensorFlow Graph

to use a Keras or TensorFlow model in SmartSim, the model must be frozen and the inputs and outputs provided to the smartredis.client.set_model_from_file() method.

This utiliy function provides everything users need to take a trained model and put it inside an orchestrator instance

Parameters:

model (Model) – TensorFlow or Keras model
output_dir (str) – output dir to save model file to
file_name (str) – name of model file to create

Return type:

Tuple[str, List[str], List[str]]

Returns:

path to model file, model input layer names, model output layer names

serialize_model(model: keras.src.models.model.Model) → Tuple[str, List[str], List[str]][source]#

Serialize a Keras or TensorFlow Graph

to use a Keras or TensorFlow model in SmartSim, the model must be frozen and the inputs and outputs provided to the smartredis.client.set_model() method.

This utiliy function provides everything users need to take a trained model and put it inside an orchestrator instance.

Parameters:: model (Model) – TensorFlow or Keras model
Return type:: Tuple[str, List[str], List[str]]
Returns:: serialized model, model input layer names, model output layer names

class StaticDataGenerator(**kwargs: Any) → None[source]#

Bases: _TFDataGenerationCommon

A class to download a dataset from the DB.

Details about parameters and features of this class can be found in the documentation of DataDownloader, of which it is just a TensorFlow-specialized sub-class with dynamic=False.

init_samples(init_trials: int = -1, wait_interval: float = 10.0) → None#

Initialize samples (and targets, if needed).

A new attempt to download samples will be made every ten seconds, for init_trials times.

Parameters:: init_trials (int, default: -1) – maximum number of attempts to fetch data
Return type:: None

property need_targets: bool#

Compute if targets have to be downloaded.

Returns:: Whether targets (or labels) should be downloaded

property num_batches#

Number of batches in the PyDataset.

Returns:: The number of batches in the PyDataset or None to indicate that the dataset is infinite.

on_epoch_begin()#: Method called at the beginning of every epoch.

on_epoch_end() → None#

Callback called at the end of each training epoch

If self.shuffle is set to True, data is shuffled.

Return type:: None

class DynamicDataGenerator(**kwargs: Any) → None[source]#

Bases: _TFDataGenerationCommon

A class to download batches from the DB.

Details about parameters and features of this class can be found in the documentation of DataDownloader, of which it is just a TensorFlow-specialized sub-class with dynamic=True.

init_samples(init_trials: int = -1, wait_interval: float = 10.0) → None#

Initialize samples (and targets, if needed).

A new attempt to download samples will be made every ten seconds, for init_trials times.

Parameters:: init_trials (int, default: -1) – maximum number of attempts to fetch data
Return type:: None

property need_targets: bool#

Compute if targets have to be downloaded.

Returns:: Whether targets (or labels) should be downloaded

property num_batches#

Number of batches in the PyDataset.

Returns:: The number of batches in the PyDataset or None to indicate that the dataset is infinite.

on_epoch_begin()#: Method called at the beginning of every epoch.

on_epoch_end() → None[source]#

Callback called at the end of each training epoch

Update data (the DB is queried for new batches) and if self.shuffle is set to True, data is also shuffled.

Return type:: None

PyTorch#

SmartSim includes built-in utilities for supporting PyTorch in training and inference.

class StaticDataGenerator(**kwargs: Any) → None[source]#

Bases: _TorchDataGenerationCommon

A class to download a dataset from the DB.

Details about parameters and features of this class can be found in the documentation of DataDownloader, of which it is just a PyTorch-specialized sub-class with dynamic=False and init_samples=False.

When used in the DataLoader defined in this class, samples are initialized automatically before training. Other data loaders using this generator should implement the same behavior.

init_samples(init_trials: int = -1, wait_interval: float = 10.0) → None#

Initialize samples (and targets, if needed).

A new attempt to download samples will be made every ten seconds, for init_trials times.

Parameters:: init_trials (int, default: -1) – maximum number of attempts to fetch data
Return type:: None

property need_targets: bool#

Compute if targets have to be downloaded.

Returns:: Whether targets (or labels) should be downloaded

class DynamicDataGenerator(**kwargs: Any) → None[source]#

Bases: _TorchDataGenerationCommon

A class to download batches from the DB.

Details about parameters and features of this class can be found in the documentation of DataDownloader, of which it is just a PyTorch-specialized sub-class with dynamic=True and init_samples=False.

When used in the DataLoader defined in this class, samples are initialized automatically before training. Other data loaders using this generator should implement the same behavior.

init_samples(init_trials: int = -1, wait_interval: float = 10.0) → None#

Initialize samples (and targets, if needed).

A new attempt to download samples will be made every ten seconds, for init_trials times.

Parameters:: init_trials (int, default: -1) – maximum number of attempts to fetch data
Return type:: None

property need_targets: bool#

Compute if targets have to be downloaded.

Returns:: Whether targets (or labels) should be downloaded

class DataLoader(dataset: smartsim.ml.torch.data._TorchDataGenerationCommon, **kwargs: Any) → None[source]#

Bases: DataLoader

DataLoader to be used as a wrapper of StaticDataGenerator or DynamicDataGenerator

This is just a sub-class of torch.utils.data.DataLoader which sets up sources of a data generator correctly. DataLoader parameters such as num_workers can be passed at initialization. batch_size should always be set to None.

Slurm#

`get_allocation`([nodes, time, account, options])	Request an allocation
`release_allocation`(alloc_id)	Free an allocation's resources
`validate`([nodes, ppn, partition])	Check that there are sufficient resources in the provided Slurm partitions.
`get_default_partition`()	Returns the default partition from Slurm
`get_hosts`()	Get the name of the nodes used in a slurm allocation.
`get_queue`()	Get the name of queue in a slurm allocation.
`get_tasks`()	Get the number of tasks in a slurm allocation.
`get_tasks_per_node`()	Get the number of tasks per each node in a slurm allocation.

get_allocation(nodes: int = 1, time: str | None = None, account: str | None = None, options: Dict[str, str] | None = None) → str[source]#

Request an allocation

This function requests an allocation with the specified arguments. Anything passed to the options will be processed as a Slurm argument and appended to the salloc command with the appropriate prefix (e.g. “-” or “–“).

The options can be used to pass extra settings to the workload manager such as the following for Slurm:

nodelist=”nid00004”

For arguments without a value, pass None or and empty string as the value. For Slurm:

exclusive=None

Parameters:

nodes (int, default: 1) – number of nodes for the allocation
time (Optional[str], default: None) – wall time of the allocation, HH:MM:SS format
account (Optional[str], default: None) – account id for allocation
options (Optional[Dict[str, str]], default: None) – additional options for the slurm wlm

Raises:

LauncherError – if the allocation is not successful

Return type:

str

Returns:

the id of the allocation

get_default_partition() → str[source]#

Returns the default partition from Slurm

This default partition is assumed to be the partition with a star following its partition name in sinfo output

Return type:: str
Returns:: the name of the default partition

get_hosts() → List[str][source]#

Get the name of the nodes used in a slurm allocation.

Note

This method requires access to scontrol from the node on which it is run

Return type:

List[str]

Returns:

Names of the host nodes

Raises:

LauncherError – Could not access scontrol
SmartSimError – SLURM_JOB_NODELIST is not set

get_queue() → str[source]#

Get the name of queue in a slurm allocation.

Return type:: str
Returns:: The name of the queue
Raises:: SmartSimError – SLURM_JOB_PARTITION is not set

get_tasks() → int[source]#

Get the number of tasks in a slurm allocation.

Return type:: int
Returns:: Then number of tasks in the allocation
Raises:: SmartSimError – SLURM_NTASKS is not set

get_tasks_per_node() → Dict[str, int][source]#

Get the number of tasks per each node in a slurm allocation.

Note

This method requires access to scontrol from the node on which it is run

Return type:: Dict[str, int]
Returns:: Map of nodes to number of tasks on that node
Raises:: SmartSimError – SLURM_TASKS_PER_NODE is not set

release_allocation(alloc_id: str) → None[source]#

Free an allocation’s resources

Parameters:: alloc_id (str) – allocation id
Raises:: LauncherError – if allocation could not be freed
Return type:: None

validate(nodes: int = 1, ppn: int = 1, partition: str | None = None) → bool[source]#

Check that there are sufficient resources in the provided Slurm partitions.

if no partition is provided, the default partition is found and used.

Parameters:

nodes (int, default: 1) – Override the default node count to validate
ppn (int, default: 1) – Override the default processes per node to validate
partition (Optional[str], default: None) – partition to validate

Raises:

LauncherError

Return type:

bool

Returns:

True if resources are available, False otherwise

SmartSim API

Contents

SmartSim API#

Experiment#

Settings#

RunSettings#

SrunSettings#

AprunSettings#

DragonRunSettings#

JsrunSettings#

MpirunSettings#

MpiexecSettings#

OrterunSettings#

SbatchSettings#

QsubBatchSettings#

BsubBatchSettings#

Singularity#

Orchestrator#

Orchestrator#

Model#

Model#

Ensemble#

Ensemble#

Machine Learning#

TensorFlow#

PyTorch#

Slurm#