SmartSim API

Experiment

Experiment.__init__(name[, exp_path, launcher])

Example initialization

Experiment.start(*args[, block, summary])

Launch instances passed as arguments

Experiment.stop(*args)

Stop specific instances launched by this Experiment

Experiment.create_ensemble(name[, params, …])

Create an Ensemble of Model instances

Experiment.create_model(name, run_settings)

Create a Model. By default, all Model instances start with the cwd as their path unless otherwise specified.

Experiment.generate(*args[, tag, overwrite])

Generate the file structure for an Experiment

Experiment.poll([interval, verbose])

Monitor jobs through logging to stdout.

Experiment.finished(entity)

Query if a job has completed

Experiment.get_status(*args)

Query the status of the specific job(s)

Experiment.reconnect_orchestrator(checkpoint)

Reconnect to a running Orchestrator

Experiment.summary()

Return a summary of the Experiment

class Experiment(name, exp_path=None, launcher='local')[source]

Bases: object

Experiments are the main user interface in SmartSim.

Experiments can create instances to launch called Model and Ensemble. Through the Experiment interface, users can programmatically create, configure, start, stop, poll and query the instances they create.

Example initialization

exp = Experiment(name="my_exp", launcher="local")
Parameters
  • name (str) – name for the Experiment

  • exp_path (str, optional) – path to location of Experiment directory if generated

  • launcher (str, optional) – type of launcher being used, options are “slurm”, “pbs”, “cobalt”, “lsf”, or “local”. Defaults to “local”

create_ensemble(name, params=None, batch_settings=None, run_settings=None, replicas=None, perm_strategy='all_perm', **kwargs)[source]

Create an Ensemble of Model instances

Ensembles can be launched sequentially or as a batch if using a non-local launcher, e.g. Slurm.

Ensembles require one of the following combinations of arguments

  • run_settings and params

  • run_settings and replicas

  • batch_settings

  • batch_settings, run_settings, and params

  • batch_settings, run_settings, and replicas

If given solely batch settings, an empty ensemble will be created to which models can be added manually through Ensemble.add_model(). The entire ensemble will launch as one batch.

If given both batch and run settings, either params or replicas must be passed, and the entire ensemble will launch as a single batch.

If given solely run settings, either params or replicas must be passed, and the ensemble members will each launch sequentially.

The kwargs argument can be used to pass custom input parameters to the permutation strategy.

Parameters
  • name (str) – name of the ensemble

  • params (dict[str, Any]) – parameters to expand into Model members

  • batch_settings (BatchSettings) – describes settings for Ensemble as batch workload

  • run_settings (RunSettings) – describes how each Model should be executed

  • replicas (int) – number of replicas to create

  • perm_strategy (str, optional) – strategy for expanding params into Model instances; options are “all_perm”, “stepped”, “random”, or a callable function. Default is “all_perm”.

Raises

SmartSimError – if initialization fails

Returns

Ensemble instance

Return type

Ensemble
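
As a minimal sketch (assuming a local launcher; the executable name is hypothetical and import paths may vary by SmartSim version), an ensemble can be created from a parameter space like so:

from smartsim import Experiment
from smartsim.settings import RunSettings   # assumed import path

exp = Experiment("ensemble_example", launcher="local")

# settings shared by every ensemble member; "simulation.py" is a placeholder
rs = RunSettings(exe="python", exe_args="simulation.py")

# "all_perm" expands params into one Model per permutation (2 x 2 = 4 members)
params = {"STEPS": [10, 20], "THERMO": [0.5, 1.0]}
ensemble = exp.create_ensemble(
    "sim_sweep", params=params, run_settings=rs, perm_strategy="all_perm"
)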

create_model(name, run_settings, params=None, path=None, enable_key_prefixing=False)[source]

Create a Model. By default, all Model instances start with the cwd as their path unless otherwise specified. Regardless of whether path is specified, when the instance is passed to Experiment.generate(), the Model path will be overwritten and replaced with the created directory for the Model.

Parameters
  • name (str) – name of the model

  • run_settings (RunSettings) – defines how the Model should be run

  • params (dict, optional) – model parameters for writing into configuration files

  • path (str, optional) – path to where the model should be executed at runtime

  • enable_key_prefixing (bool, optional) – If True, data sent to the Orchestrator using SmartRedis from this Model will be prefixed with the Model name. Defaults to False.

Raises

SmartSimError – if initialization fails

Returns

the created Model

Return type

Model
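
A short sketch of creating and launching a single Model with the local launcher (the executable and arguments are only placeholders; import paths may vary by SmartSim version):

from smartsim import Experiment
from smartsim.settings import RunSettings   # assumed import path

exp = Experiment("model_example", launcher="local")
rs = RunSettings(exe="echo", exe_args="hello")

# params are only consumed when tagged configuration files are attached and generated
model = exp.create_model("hello_model", rs)
exp.start(model, block=True)
print(exp.get_status(model))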

finished(entity)[source]

Query if a job has completed

An instance of Model or Ensemble can be passed as an argument.

Parameters

entity (Model | Ensemble) – object launched by this Experiment

Returns

True if job has completed, False otherwise

Return type

bool

Raises

SmartSimError – if entity has not been launched by this Experiment

generate(*args, tag=None, overwrite=False)[source]

Generate the file structure for an Experiment

Experiment.generate creates directories for each instance passed to organize Experiments that launch many instances

If files or directories are attached to Model objects using Model.attach_generator_files(), those files or directories will be symlinked, copied, or configured and written into the created directory for that instance.

Instances of Model, Ensemble and Orchestrator can all be passed as arguments to the generate method.

Parameters
  • tag (str, optional) – tag used in to_configure generator files

  • overwrite (bool, optional) – overwrite existing folders and contents, defaults to False
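
Continuing from the create_model sketch above, a hedged example of generating the directory structure for a Model with a tagged input file attached (the file name is hypothetical):

# attach a tagged input file, then create the Model's directory under the Experiment path
model.attach_generator_files(to_configure=["in.params"])   # "in.params" is a placeholder
exp.generate(model, overwrite=True)
# after generation, model.path points at the created directory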

get_status(*args)[source]

Query the status of the specific job(s)

Instances of Model, Ensemble and Orchestrator can all be passed as arguments to Experiment.get_status()

Returns

status of the specific job(s)

Return type

list[str]

Raises
  • SmartSimError – if status retrieval fails

  • TypeError – if an argument is not a Model, Ensemble, or Orchestrator

poll(interval=10, verbose=True)[source]

Monitor jobs through logging to stdout.

This method should only be used if jobs were launched with Experiment.start(block=False)

Parameters
  • interval (int, optional) – frequency (in seconds) of logging to stdout, defaults to 10 seconds

  • verbose (bool, optional) – set verbosity, defaults to True

Raises

SmartSimError – if the poll request fails

reconnect_orchestrator(checkpoint)[source]

Reconnect to a running Orchestrator

This method can be used to connect to a Redis deployment that was launched by a previous Experiment. This way users can run many experiments utilizing the same Redis deployment

Parameters

checkpoint (str) – the smartsim_db.dat file created when an Orchestrator is launched
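
A sketch of reconnecting to a previously launched database, continuing from an existing Experiment; the checkpoint path is hypothetical and the method is assumed to return the reconnected Orchestrator:

# reconnect to a database started by an earlier Experiment
db = exp.reconnect_orchestrator("/path/to/previous_exp/database/smartsim_db.dat")
print(exp.get_status(db))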

start(*args, block=True, summary=False)[source]

Launch instances passed as arguments

Start the Experiment by turning specified instances into jobs for the underlying launcher and launching them.

Instances of Model, Ensemble and Orchestrator can all be passed as arguments to the start method. Passing more than one Orchestrator as arguments is forbidden.

Parameters
  • block (bool, optional) – block execution until all non-database jobs are finished, defaults to True

  • summary (bool, optional) – print a launch summary prior to launch, defaults to False
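
Continuing from the create_model sketch above, a non-blocking launch can be combined with poll, finished, and summary roughly as follows:

exp.start(model, block=False, summary=True)   # return immediately after launch
exp.poll(interval=5, verbose=True)            # log job status until completion
if exp.finished(model):
    print(exp.summary())                      # pandas DataFrame of Experiment history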

stop(*args)[source]

Stop specific instances launched by this Experiment

Instances of Model, Ensemble and Orchestrator can all be passed as arguments to the stop method.

Raises
  • TypeError – if wrong type

  • SmartSimError – if stop request fails

summary()[source]

Return a summary of the Experiment

The summary will show each instance that has been launched and completed in this Experiment

Returns

pandas Dataframe of Experiment history

Return type

pd.DataFrame

Settings

Settings are provided to Model and Ensemble objects to specify how a job should be executed. Some are specific to certain launchers; for example, SbatchSettings is solely meant for systems using Slurm as a workload manager, while MpirunSettings, for OpenMPI-based jobs, is supported on Slurm, PBSPro, and Cobalt.

Types of Settings:

RunSettings(exe[, exe_args, run_command, …])

Run parameters for a Model

SrunSettings(exe[, exe_args, run_args, …])

Initialize run parameters for a Slurm job with srun

SbatchSettings([nodes, time, account, …])

Specify run parameters for a Slurm batch job

AprunSettings(exe[, exe_args, run_args, …])

Settings to run job with aprun command

QsubBatchSettings([nodes, ncpus, time, …])

Specify qsub batch parameters for a job

CobaltBatchSettings([nodes, time, queue, …])

Specify settings for a Cobalt qsub batch launch

MpirunSettings(exe[, exe_args, run_args, …])

Settings to run job with mpirun command (OpenMPI)

JsrunSettings(exe[, exe_args, run_args, …])

Settings to run job with jsrun command

BsubBatchSettings([nodes, time, project, …])

Specify bsub batch parameters for a job

Local

When running SmartSim on laptops and single-node workstations, the base RunSettings object is used to parameterize jobs. RunSettings includes a run_command parameter for local launches that utilize a parallel launch binary such as mpirun or mpiexec.

RunSettings.add_exe_args(args)

Add executable arguments to executable

RunSettings.update_env(env_vars)

Update the job environment variables

class RunSettings(exe, exe_args=None, run_command='', run_args=None, env_vars=None)[source]

Run parameters for a Model

The base RunSettings class should only be used with the local launcher on single-node workstations or laptops.

If no run_command is specified, the executable will be launched locally.

run_args passed as a dict will be interpreted literally for local RunSettings and added directly to the run_command, e.g. run_args = {“-np”: 2} becomes “-np 2”.

Example initialization

rs = RunSettings("echo", "hello", "mpirun", run_args={"-np": "2"})
Parameters
  • exe (str) – executable to run

  • exe_args (str | list[str], optional) – executable arguments, defaults to None

  • run_command (str, optional) – launch binary (e.g. “srun”), defaults to empty str

  • run_args (dict[str, str], optional) – arguments for run command (e.g. -np for mpiexec), defaults to None

  • env_vars (dict[str, str], optional) – environment vars to launch job with, defaults to None

add_exe_args(args)[source]

Add executable arguments to executable

Parameters

args (str | list[str]) – executable arguments

Raises

TypeError – if exe args are not strings

format_run_args()[source]

Return formatted run arguments

For RunSettings, the run arguments are passed literally with no formatting.

Returns

list run arguments for these settings

Return type

list[str]

property run_command

Return the launch binary used to launch the executable

Returns

launch binary e.g. mpiexec

Type

str

update_env(env_vars)[source]

Update the job environment variables

Parameters

env_vars (dict[str, str]) – environment variables to update or add


SrunSettings

SrunSettings can be used for running on existing allocations, running jobs in interactive allocations, and for adding srun steps to a batch.

SrunSettings.set_cpus_per_task(num_cpus)

Set the number of cpus to use per task

SrunSettings.set_hostlist(host_list)

Specify the hostlist for this job

SrunSettings.set_nodes(num_nodes)

Set the number of nodes

SrunSettings.set_tasks(num_tasks)

Set the number of tasks for this job

SrunSettings.set_tasks_per_node(num_tpn)

Set the number of tasks for this job

SrunSettings.add_exe_args(args)

Add executable arguments to executable

SrunSettings.format_run_args()

Return a list of Slurm formatted run arguments

SrunSettings.format_env_vars()

Build environment variable string for Slurm

SrunSettings.update_env(env_vars)

Update the job environment variables

class SrunSettings(exe, exe_args=None, run_args=None, env_vars=None, alloc=None)[source]

Initialize run parameters for a Slurm job with srun

SrunSettings should only be used on Slurm based systems.

If an allocation is specified, the instance receiving these run parameters will launch on that allocation.

Parameters
  • exe (str) – executable to run

  • exe_args (list[str] | str, optional) – executable arguments, defaults to None

  • run_args (dict[str, str | None], optional) – srun arguments without dashes, defaults to None

  • env_vars (dict[str, str], optional) – environment variables for job, defaults to None

  • alloc (str, optional) – allocation ID if running on existing alloc, defaults to None
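
A minimal sketch of configuring an srun-based step on an existing allocation (the executable name and allocation ID are placeholders; the import path is assumed):

from smartsim.settings import SrunSettings   # assumed import path

srun = SrunSettings(exe="my_app", alloc="123456")
srun.set_nodes(2)
srun.set_tasks(32)
srun.set_tasks_per_node(16)
srun.set_cpus_per_task(4)
srun.update_env({"OMP_NUM_THREADS": "4"})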

add_exe_args(args)

Add executable arguments to executable

Parameters

args (str | list[str]) – executable arguments

Raises

TypeError – if exe args are not strings

format_env_vars()[source]

Build environment variable string for Slurm

Slurm takes exports in comma-separated lists. The list starts with all so as not to disturb the rest of the environment. For more information, see the Slurm documentation for srun.

Returns

the formatted string of environment variables

Return type

str

format_run_args()[source]

Return a list of Slurm formatted run arguments

Returns

list of slurm arguments for these settings

Return type

list[str]

property run_command

Return the launch binary used to launch the executable

Returns

launch binary e.g. mpiexec

Type

str

set_cpus_per_task(num_cpus)[source]

Set the number of cpus to use per task

This sets --cpus-per-task

Parameters

num_cpus (int) – number of cpus to use per task

set_hostlist(host_list)[source]

Specify the hostlist for this job

Parameters

host_list (str | list[str]) – hosts to launch on

Raises

TypeError – if not str or list of str

set_nodes(num_nodes)[source]

Set the number of nodes

Effectively this is setting: srun --nodes <num_nodes>

Parameters

num_nodes (int) – number of nodes to run with

set_tasks(num_tasks)[source]

Set the number of tasks for this job

This sets --ntasks

Parameters

num_tasks (int) – number of tasks

set_tasks_per_node(num_tpn)[source]

Set the number of tasks for this job

This sets --ntasks-per-node

Parameters

num_tpn (int) – number of tasks per node

update_env(env_vars)

Update the job environment variables

Parameters

env_vars (dict[str, str]) – environment variables to update or add

SbatchSettings

SbatchSettings are used for launching batches onto Slurm WLM systems.

SbatchSettings.set_account(acct)

Set the account for this batch job

SbatchSettings.set_batch_command(command)

Set the command used to launch the batch e.g.

SbatchSettings.set_nodes(num_nodes)

Set the number of nodes for this batch job

SbatchSettings.set_hostlist(host_list)

Specify the hostlist for this job

SbatchSettings.set_partition(partition)

Set the partition for the batch job

SbatchSettings.set_walltime(walltime)

Set the walltime of the job

SbatchSettings.format_batch_args()

Get the formatted batch arguments for a preview

class SbatchSettings(nodes=None, time='', account=None, batch_args=None)[source]

Specify run parameters for a Slurm batch job

Slurm sbatch arguments can be written into batch_args as a dictionary, e.g. {‘ntasks’: 1}

If an argument doesn’t take a value, put None as the value, e.g. {‘exclusive’: None}

Initialization values provided (nodes, time, account) will overwrite the same arguments in batch_args if present

Parameters
  • nodes (int, optional) – number of nodes, defaults to None

  • time (str, optional) – walltime for job, e.g. “10:00:00” for 10 hours

  • account (str, optional) – account for job, defaults to None

  • batch_args (dict[str, str], optional) – extra batch arguments, defaults to None
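
A sketch of batch settings for a Slurm job; the account and partition names are hypothetical and the import path is assumed:

from smartsim.settings import SbatchSettings   # assumed import path

sbatch = SbatchSettings(nodes=4, time="10:00:00", account="A123",
                        batch_args={"exclusive": None})   # flag-only argument
sbatch.set_partition("compute")
# e.g. pass to Experiment.create_ensemble(..., batch_settings=sbatch)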

add_preamble(lines)

Add lines to the batch file preamble. The lines are just written (unmodified) at the beginning of the batch file (after the WLM directives) and can be used to e.g. start virtual environments before running the executables.

Parameters

lines (str | list[str]) – lines to add to the preamble

property batch_cmd

Return the batch command

Tests to see if we can expand the batch command path. If we can, then returns the expanded batch command. If we cannot, returns the batch command as is.

Returns

batch command

Type

str

format_batch_args()[source]

Get the formatted batch arguments for a preview

Returns

batch arguments for Sbatch

Return type

list[str]

set_account(acct)[source]

Set the account for this batch job

Parameters

acct (str) – account id

set_batch_command(command)

Set the command used to launch the batch e.g. sbatch

Parameters

command (str) – batch command

set_hostlist(host_list)[source]

Specify the hostlist for this job

Parameters

host_list (str | list[str]) – hosts to launch on

Raises

TypeError – if not str or list of str

set_nodes(num_nodes)[source]

Set the number of nodes for this batch job

Parameters

num_nodes (int) – number of nodes

set_partition(partition)[source]

Set the partition for the batch job

Parameters

partition (str) – partition name

set_walltime(walltime)[source]

Set the walltime of the job

format = “HH:MM:SS”

Parameters

walltime (str) – wall time


AprunSettings

AprunSettings can be used on any system that supports the Cray ALPS layer. SmartSim supports using AprunSettings on PBSPro and Cobalt WLM systems.

AprunSettings can be used in interactive sessions (on allocation) and within batch launches (e.g., QsubBatchSettings).

AprunSettings.set_cpus_per_task(num_cpus)

Set the number of cpus to use per task

AprunSettings.set_hostlist(host_list)

Specify the hostlist for this job

AprunSettings.set_tasks(num_tasks)

Set the number of tasks for this job

AprunSettings.set_tasks_per_node(num_tpn)

Set the number of tasks for this job

AprunSettings.make_mpmd(aprun_settings)

Make job an MPMD job

AprunSettings.add_exe_args(args)

Add executable arguments to executable

AprunSettings.format_run_args()

Return a list of ALPS formatted run arguments

AprunSettings.format_env_vars()

Format the environment variables for aprun

AprunSettings.update_env(env_vars)

Update the job environment variables

class AprunSettings(exe, exe_args=None, run_args=None, env_vars=None)[source]

Settings to run job with aprun command

AprunSettings can be used with both the PBS and Cobalt launchers.

Parameters
  • exe (str) – executable

  • exe_args (str | list[str], optional) – executable arguments, defaults to None

  • run_args (dict[str, str], optional) – arguments for run command, defaults to None

  • env_vars (dict[str, str], optional) – environment vars to launch job with, defaults to None
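
A sketch of an aprun-based step, including an MPMD combination of two settings objects (the executables are placeholders; the import path is assumed):

from smartsim.settings import AprunSettings   # assumed import path

aprun_a = AprunSettings(exe="sim_a", exe_args="input_a")
aprun_a.set_tasks(64)
aprun_a.set_tasks_per_node(32)

aprun_b = AprunSettings(exe="sim_b", exe_args="input_b")
aprun_b.set_tasks(32)

# join the two commands into a single MPMD launch separated by ':'
aprun_a.make_mpmd(aprun_b)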

add_exe_args(args)

Add executable arguments to executable

Parameters

args (str | list[str]) – executable arguments

Raises

TypeError – if exe args are not strings

format_env_vars()[source]

Format the environment variables for aprun

Returns

list of env vars

Return type

list[str]

format_run_args()[source]

Return a list of ALPS formatted run arguments

Returns

list of ALPS arguments for these settings

Return type

list[str]

make_mpmd(aprun_settings)[source]

Make job an MPMD job

This method combines two AprunSettings into a single MPMD command joined with ‘:’

Parameters

aprun_settings (AprunSettings) – AprunSettings instance

property run_command

Return the launch binary used to launch the executable

Returns

launch binary e.g. mpiexec

Type

str

set_cpus_per_task(num_cpus)[source]

Set the number of cpus to use per task

This sets --cpus-per-pe

Parameters

num_cpus (int) – number of cpus to use per task

set_hostlist(host_list)[source]

Specify the hostlist for this job

Parameters

host_list (str | list[str]) – hosts to launch on

Raises

TypeError – if not str or list of str

set_tasks(num_tasks)[source]

Set the number of tasks for this job

This sets --pes

Parameters

num_tasks (int) – number of tasks

set_tasks_per_node(num_tpn)[source]

Set the number of tasks for this job

This sets --pes-per-node

Parameters

num_tpn (int) – number of tasks per node

update_env(env_vars)

Update the job environment variables

Parameters

env_vars (dict[str, str]) – environment variables to update or add

QsubBatchSettings

QsubBatchSettings are used to configure jobs that should be launched as a batch on PBSPro systems.

QsubBatchSettings.set_account(acct)

Set the account for this batch job

QsubBatchSettings.set_batch_command(command)

Set the command used to launch the batch e.g.

QsubBatchSettings.set_nodes(num_nodes)

Set the number of nodes for this batch job

QsubBatchSettings.set_ncpus(num_cpus)

Set the number of cpus obtained in each node.

QsubBatchSettings.set_queue(queue)

Set the queue for the batch job

QsubBatchSettings.set_resource(…)

Set a resource value for the Qsub batch

QsubBatchSettings.set_walltime(walltime)

Set the walltime of the job

QsubBatchSettings.format_batch_args()

Get the formatted batch arguments for a preview

class QsubBatchSettings(nodes=None, ncpus=None, time=None, queue=None, account=None, resources=None, batch_args=None, **kwargs)[source]

Specify qsub batch parameters for a job

nodes and ncpus are used to create the select statement for PBS if a select statement is not included in resources. If both are supplied, the select statement supplied in resources takes precedence.

Parameters
  • nodes (int, optional) – number of nodes for batch, defaults to None

  • ncpus (int, optional) – number of cpus per node, defaults to None

  • time (str, optional) – walltime for batch job, defaults to None

  • queue (str, optional) – queue to run batch in, defaults to None

  • account (str, optional) – account for batch launch, defaults to None

  • resources (dict[str, str], optional) – overrides for resource arguments, defaults to None

  • batch_args (dict[str, str], optional) – overrides for PBS batch arguments, defaults to None
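
A sketch of PBSPro batch settings; account, queue, and resource values are hypothetical and the import path is assumed:

from smartsim.settings import QsubBatchSettings   # assumed import path

qsub = QsubBatchSettings(nodes=2, ncpus=36, time="01:00:00", queue="workq")
qsub.set_account("A123")
qsub.set_resource("place", "scatter")   # extra resources go into the select/resource list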

add_preamble(lines)

Add lines to the batch file preamble. The lines are just written (unmodified) at the beginning of the batch file (after the WLM directives) and can be used to e.g. start virtual environments before running the executables.

Parameters

lines (str | list[str]) – lines to add to the preamble

property batch_cmd

Return the batch command

Tests to see if we can expand the batch command path. If we can, then returns the expanded batch command. If we cannot, returns the batch command as is.

Returns

batch command

Type

str

format_batch_args()[source]

Get the formatted batch arguments for a preview

Returns

batch arguments for Qsub

Return type

list[str]

set_account(acct)[source]

Set the account for this batch job

Parameters

acct (str) – account id

set_batch_command(command)

Set the command used to launch the batch e.g. sbatch

Parameters

command (str) – batch command

set_hostlist(host_list)[source]

Specify the hostlist for this job

Parameters

host_list (str | list[str]) – hosts to launch on

Raises

TypeError – if not str or list of str

set_ncpus(num_cpus)[source]

Set the number of cpus obtained in each node.

If a select argument is provided in QsubBatchSettings.resources, then this value will be overridden

Parameters

num_cpus (int) – number of cpus per node in select

set_nodes(num_nodes)[source]

Set the number of nodes for this batch job

If a select argument is provided in QsubBatchSettings.resources this value will be overridden

Parameters

num_nodes (int) – number of nodes

set_queue(queue)[source]

Set the queue for the batch job

Parameters

queue (str) – queue name

set_resource(resource_name, value)[source]

Set a resource value for the Qsub batch

If a select statement is provided, the nodes and ncpus arguments will be overridden. Likewise for walltime.

Parameters
  • resource_name (str) – name of resource, e.g. walltime

  • value (str) – value

set_walltime(walltime)[source]

Set the walltime of the job

format = “HH:MM:SS”

If a walltime argument is provided in QsubBatchSettings.resources, then this value will be overridden

Parameters

walltime (str) – wall time


CobaltBatchSettings

CobaltBatchSettings are used to configure jobs that should be launched as a batch on Cobalt systems. They closely mimic QsubBatchSettings for PBSPro.

CobaltBatchSettings.set_account(acct)

Set the account for this batch job

CobaltBatchSettings.set_batch_command(command)

Set the command used to launch the batch e.g.

CobaltBatchSettings.set_nodes(num_nodes)

Set the number of nodes for this batch job

CobaltBatchSettings.set_queue(queue)

Set the queue for the batch job

CobaltBatchSettings.set_walltime(walltime)

Set the walltime of the job

CobaltBatchSettings.format_batch_args()

Get the formatted batch arguments for a preview

class CobaltBatchSettings(nodes=None, time='', queue=None, account=None, batch_args=None)[source]

Specify settings for a Cobalt qsub batch launch

If an argument doesn’t take a value, put None as the value, e.g. {‘exclusive’: None}

Initialization values provided (nodes, time, account) will overwrite the same arguments in batch_args if present

Parameters
  • nodes (int, optional) – number of nodes, defaults to None

  • time (str, optional) – walltime for job, e.g. “10:00:00” for 10 hours, defaults to empty str

  • queue (str, optional) – queue to launch job in, defaults to None

  • account (str, optional) – account for job, defaults to None

  • batch_args (dict[str, str], optional) – extra batch arguments, defaults to None

add_preamble(lines)

Add lines to the batch file preamble. The lines are just written (unmodified) at the beginning of the batch file (after the WLM directives) and can be used to e.g. start virtual environments before running the executables.

Parameters

lines (str | list[str]) – lines to add to the preamble

property batch_cmd

Return the batch command

Tests to see if we can expand the batch command path. If we can, then returns the expanded batch command. If we cannot, returns the batch command as is.

Returns

batch command

Type

str

format_batch_args()[source]

Get the formatted batch arguments for a preview

Returns

list of batch arguments for Cobalt qsub

Return type

list[str]

set_account(acct)[source]

Set the account for this batch job

Parameters

acct (str) – account id

set_batch_command(command)

Set the command used to launch the batch e.g. sbatch

Parameters

command (str) – batch command

set_hostlist(host_list)[source]

Specify the hostlist for this job

Parameters

host_list (str | list[str]) – hosts to launch on

Raises

TypeError – if not str or list of str

set_nodes(num_nodes)[source]

Set the number of nodes for this batch job

Parameters

num_nodes (int) – number of nodes

set_queue(queue)[source]

Set the queue for the batch job

Parameters

queue (str) – queue name

set_tasks(num_tasks)[source]

Set total number of processes to start

Parameters

num_tasks (int) – number of processes

set_walltime(walltime)[source]

Set the walltime of the job

format = “HH:MM:SS”

Cobalt walltime can also be specified as a number of minutes.

Parameters

walltime (str) – wall time


JsrunSettings

JsrunSettings can be used on any system that supports the IBM LSF launcher.

JsrunSettings can be used in interactive sessions (on allocation) and within batch launches (e.g., BsubBatchSettings).

JsrunSettings.set_num_rs(num_rs)

Set the number of resource sets to use

JsrunSettings.set_cpus_per_rs(num_cpus)

Set the number of cpus to use per resource set

JsrunSettings.set_gpus_per_rs(num_gpus)

Set the number of gpus to use per resource set

JsrunSettings.set_rs_per_host(num_rs)

Set the number of resource sets to use per host

JsrunSettings.set_tasks(num_tasks)

Set the number of tasks for this job

JsrunSettings.set_tasks_per_rs(num_tprs)

Set the number of tasks per resource set

JsrunSettings.set_binding(binding)

Set binding

JsrunSettings.make_mpmd([jsrun_settings])

Make step an MPMD (or SPMD) job.

JsrunSettings.set_mpmd_preamble(preamble_lines)

Set preamble used in ERF file.

JsrunSettings.update_env(env_vars)

Update the job environment variables

JsrunSettings.set_erf_sets(erf_sets)

Set resource sets used for ERF (SPMD or MPMD) steps.

JsrunSettings.format_env_vars()

Format environment variables.

JsrunSettings.format_run_args()

Return a list of LSF formatted run arguments

class JsrunSettings(exe, exe_args=None, run_args=None, env_vars=None)[source]

Settings to run job with jsrun command

JsrunSettings can only be used with the LSF launcher.

Parameters
  • exe (str) – executable

  • exe_args (str | list[str], optional) – executable arguments, defaults to None

  • run_args (dict[str, str], optional) – arguments for run command, defaults to None

  • env_vars (dict[str, str], optional) – environment vars to launch job with, defaults to None
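
A sketch of jsrun resource-set configuration (the executable is a placeholder; the import path is assumed), mirroring a one-resource-set-per-GPU layout:

from smartsim.settings import JsrunSettings   # assumed import path

jsrun = JsrunSettings(exe="my_app")
jsrun.set_num_rs(6)          # --nrs
jsrun.set_cpus_per_rs(7)     # --cpu_per_rs
jsrun.set_gpus_per_rs(1)     # --gpu_per_rs
jsrun.set_rs_per_host(6)     # --rs_per_host
jsrun.set_tasks_per_rs(7)    # --tasks_per_rs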

add_exe_args(args)

Add executable arguments to executable

Parameters

args (str | list[str]) – executable arguments

Raises

TypeError – if exe args are not strings

format_env_vars()[source]

Format environment variables. Each variable needs to be passed with --env. If a variable is set to None, its value is propagated from the current environment.

Returns

formatted string to export variables

Return type

str

format_run_args()[source]

Return a list of LSF formatted run arguments

Returns

list of LSF arguments for these settings

Return type

list[str]

make_mpmd(jsrun_settings=None)[source]

Make step an MPMD (or SPMD) job.

This method will activate job execution through an ERF file.

Optionally, this method adds an instance of JsrunSettings to the list of settings to be launched in the same ERF file.

Parameters

jsrun_settings (JsrunSettings, optional) – JsrunSettings instance, defaults to None

property run_command

Return the launch binary used to launch the executable

Returns

launch binary e.g. mpiexec

Type

str

set_binding(binding)[source]

Set binding

This sets --bind

Parameters

binding (str) – Binding, e.g. packed:21

set_cpus_per_rs(num_cpus)[source]

Set the number of cpus to use per resource set

This sets --cpu_per_rs

Parameters

num_cpus (int or str) – number of cpus to use per resource set or ALL_CPUS

set_erf_sets(erf_sets)[source]

Set resource sets used for ERF (SPMD or MPMD) steps.

erf_sets is a dictionary used to fill the ERF line representing these settings, e.g. {“host”: “1”, “cpu”: “{0:21}, {21:21}”, “gpu”: “*”} can be used to specify rank (or rank_count), hosts, cpus, gpus, and memory. The key rank is used to give specific ranks, as in {“rank”: “1, 2, 5”}, while the key rank_count is used to specify the count only, as in {“rank_count”: “3”}. If both are specified, only rank is used.

Parameters

erf_sets (dict[str, str]) – dictionary of resources
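
For instance, the dictionary from the description above could be passed as follows (values are illustrative only, continuing from the JsrunSettings sketch above):

# 3 ranks on host 1, two 21-cpu blocks, all GPUs on the host
jsrun.set_erf_sets({"rank_count": "3",
                    "host": "1",
                    "cpu": "{0:21}, {21:21}",
                    "gpu": "*"})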

set_gpus_per_rs(num_gpus)[source]

Set the number of gpus to use per resource set

This sets --gpu_per_rs

Parameters

num_gpus (int or str) – number of gpus to use per resource set or ALL_GPUS

set_individual_output(suffix=None)[source]

Set individual std output.

This sets --stdio_mode individual and inserts the suffix into the output name. The resulting output name will be self.name + suffix + .out.

Parameters

suffix (str, optional) – Optional suffix to add to output file names, it can contain %j, %h, %p, or %t, as specified by jsrun options.

set_mpmd_preamble(preamble_lines)[source]

Set preamble used in ERF file. Typical lines include oversubscribe-cpu : allow or overlapping-rs : allow. Can be used to set launch_distribution. If it is not present, it will be inferred from the settings, or set to packed by default.

Parameters

preamble_lines (list[str]) – lines to put at the beginning of the ERF file.

set_num_rs(num_rs)[source]

Set the number of resource sets to use

This sets --nrs.

Parameters

num_rs (int or str) – Number of resource sets or ALL_HOSTS

set_rs_per_host(num_rs)[source]

Set the number of resource sets to use per host

This sets --rs_per_host

Parameters

num_rs (int) – number of resource sets to use per host

set_tasks(num_tasks)[source]

Set the number of tasks for this job

This sets --np

Parameters

num_tasks (int) – number of tasks

set_tasks_per_rs(num_tprs)[source]

Set the number of tasks per resource set

This sets --tasks_per_rs

Parameters

num_tprs (int) – number of tasks per resource set

update_env(env_vars)

Update the job environment variables

Parameters

env_vars (dict[str, str]) – environment variables to update or add

BsubBatchSettings

BsubBatchSettings are used to configure jobs that should be launched as a batch on LSF systems.

BsubBatchSettings.set_walltime(time)

Set the walltime

BsubBatchSettings.set_smts(smts)

Set SMTs

BsubBatchSettings.set_project(project)

Set the project

BsubBatchSettings.set_nodes(num_nodes)

Set the number of nodes for this batch job

BsubBatchSettings.set_expert_mode_req(…)

Set allocation for expert mode.

BsubBatchSettings.set_hostlist(host_list)

Specify the hostlist for this job

BsubBatchSettings.set_tasks(num_tasks)

Set the number of tasks for this job

BsubBatchSettings.format_batch_args()

Get the formatted batch arguments for a preview

class BsubBatchSettings(nodes=None, time=None, project=None, batch_args=None, smts=None, **kwargs)[source]

Specify bsub batch parameters for a job

Parameters
  • nodes (int, optional) – number of nodes for batch, defaults to None

  • time (str, optional) – walltime for batch job in format hh:mm, defaults to None

  • project (str, optional) – project for batch launch, defaults to None

  • batch_args (dict[str, str], optional) – overrides for LSF batch arguments, defaults to None

  • smts (int, optional) – SMTs, defaults to None
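
A sketch of LSF batch settings; the project name and host names are hypothetical and the import path is assumed:

from smartsim.settings import BsubBatchSettings   # assumed import path

bsub = BsubBatchSettings(nodes=4, time="02:00", project="P123", smts=4)
bsub.set_tasks(128)                       # -n
bsub.set_hostlist(["batch1", "batch2"])   # host names are placeholders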

add_preamble(lines)

Add lines to the batch file preamble. The lines are just written (unmodified) at the beginning of the batch file (after the WLM directives) and can be used to e.g. start virtual environments before running the executables.

Parameters

lines (str | list[str]) – lines to add to the preamble

property batch_cmd

Return the batch command

Tests to see if we can expand the batch command path. If we can, then returns the expanded batch command. If we cannot, returns the batch command as is.

Returns

batch command

Type

str

format_batch_args()[source]

Get the formatted batch arguments for a preview

Returns

list of batch arguments for Bsub

Return type

list[str]

set_account(acct)
set_batch_command(command)

Set the command used to launch the batch e.g. sbatch

Parameters

command (str) – batch command

set_expert_mode_req(res_req, slots)[source]

Set allocation for expert mode. This will activate expert mode (-csm) and disregard all other allocation options.

This sets -csm -n slots -R res_req

set_hostlist(host_list)[source]

Specify the hostlist for this job

Parameters

host_list (str | list[str]) – hosts to launch on

Raises

TypeError – if not str or list of str

set_nodes(num_nodes)[source]

Set the number of nodes for this batch job

This sets -nnodes.

Parameters

num_nodes (int) – number of nodes

set_project(project)[source]

Set the project

This sets -P.

Parameters

project (str) – project name

set_smts(smts)[source]

Set SMTs

This sets -alloc_flags. If the user sets SMT explicitly through -alloc_flags, then that takes precedence.

Parameters

smts (int) – SMT (e.g. on Summit: 1, 2, or 4)

set_tasks(num_tasks)[source]

Set the number of tasks for this job

This sets -n

Parameters

num_tasks (int) – number of tasks

set_walltime(time)[source]

Set the walltime

This sets -W.

Parameters

time (str) – Time in hh:mm format, e.g. “10:00” for 10 hours


The following are RunSettings types that are supported on multiple launchers

MpirunSettings

MpirunSettings are for launching with OpenMPI. MpirunSettings are supported on Slurm, PBSPro, and Cobalt.

MpirunSettings.set_cpus_per_task(num_cpus)

Set the number of cpus per task

MpirunSettings.set_hostlist(host_list)

Set the hostlist for the mpirun command

MpirunSettings.set_tasks(num_tasks)

Set the number of tasks for this job

MpirunSettings.set_task_map(task_mapping)

Set mpirun task mapping

MpirunSettings.make_mpmd(mpirun_settings)

Make a mpmd workload by combining two mpirun commands

MpirunSettings.add_exe_args(args)

Add executable arguments to executable

MpirunSettings.format_run_args()

Return a list of OpenMPI formatted run arguments

MpirunSettings.format_env_vars()

Format the environment variables for mpirun

MpirunSettings.update_env(env_vars)

Update the job environment variables

class MpirunSettings(exe, exe_args=None, run_args=None, env_vars=None)[source]

Settings to run job with mpirun command (OpenMPI)

Note that environment variables can be passed with a None value to signify that they should be exported from the current environment

Any arguments passed in the run_args dict will be converted into mpirun arguments and prefixed with --. Values of None can be provided for arguments that do not have values.

Parameters
  • exe (str) – executable

  • exe_args (str | list[str], optional) – executable arguments, defaults to None

  • run_args (dict[str, str], optional) – arguments for run command, defaults to None

  • env_vars (dict[str, str], optional) – environment vars to launch job with, defaults to None
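
A sketch of OpenMPI run settings; the executable is a placeholder, the import path is assumed, and the run_args keys are converted to "--"-prefixed mpirun flags as described above:

from smartsim.settings import MpirunSettings   # assumed import path

mpirun = MpirunSettings(exe="my_app",
                        run_args={"oversubscribe": None},   # becomes --oversubscribe
                        env_vars={"OMP_NUM_THREADS": "1"})
mpirun.set_tasks(8)
mpirun.set_task_map("ppr:2:node")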

add_exe_args(args)

Add executable arguments to executable

Parameters

args (str | list[str]) – executable arguments

Raises

TypeError – if exe args are not strings

format_env_vars()[source]

Format the environment variables for mpirun

Automatically exports PYTHONPATH, LD_LIBRARY_PATH and PATH

Returns

list of env vars

Return type

list[str]

format_run_args()[source]

Return a list of OpenMPI formatted run arguments

Returns

list of OpenMPI arguments for these settings

Return type

list[str]

make_mpmd(mpirun_settings)[source]

Make a mpmd workload by combining two mpirun commands

This connects the two settings to be executed with a single Model instance

Parameters

mpirun_settings (MpirunSettings) – MpirunSettings instance

property run_command

Return the launch binary used to launch the executable

Returns

launch binary e.g. mpiexec

Type

str

set_cpus_per_task(num_cpus)[source]

Set the number of cpus per task

This sets --cpus-per-proc

Note: this option has been deprecated in OpenMPI 4.0+ and will soon be replaced.

Parameters

num_cpus (int) – number of cpus per task

set_hostlist(host_list)[source]

Set the hostlist for the mpirun command

Parameters

host_list (str | list[str]) – list of host names

Raises

TypeError – if not str or list of str

set_task_map(task_mapping)[source]

Set mpirun task mapping

This sets --map-by <mapping>

For examples, see the man page for mpirun

Parameters

task_mapping (str) – task mapping

set_tasks(num_tasks)[source]

Set the number of tasks for this job

This sets --n

Parameters

num_tasks (int) – number of tasks

update_env(env_vars)

Update the job environment variables

Parameters

env_vars (dict[str, str]) – environment variables to update or add

Orchestrator

The Orchestrator API is implemented for each launcher that SmartSim supports.

  • Slurm

  • Cobalt

  • PBSPro

  • LSF

The base Orchestrator class can be used for launching Redis locally on single node workstations or laptops.

Local Orchestrator

The Orchestrator base class can be launched through the local launcher and does not support cluster instances

class Orchestrator(port=6379, interface='lo', **kwargs)[source]

The Orchestrator is an in-memory database that can be launched alongside entities in SmartSim. Data can be transferred between entities by using one of the Python, C, C++ or Fortran clients within an entity.

Initialize an Orchestrator reference for local launch

Parameters
  • port (int, optional) – TCP/IP port, defaults to 6379

  • interface (str, optional) – network interface, defaults to “lo”

Extra configurations for RedisAI

See https://oss.redislabs.com/redisai/configuration/

Parameters
  • threads_per_queue (int, optional) – threads per GPU device

  • inter_op_threads (int, optional) – threads across CPU operations

  • intra_op_threads (int, optional) – threads per CPU operation
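
A minimal local database sketch (import paths are assumed; the workflow mirrors the Experiment methods documented above):

from smartsim import Experiment
from smartsim.database import Orchestrator   # assumed import path

exp = Experiment("db_example", launcher="local")
db = Orchestrator(port=6379, interface="lo")

exp.start(db)
print(db.get_address())   # e.g. ["127.0.0.1:6379"]
exp.stop(db)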

property batch
check_cluster_status(trials=10)[source]

Check that a cluster is up and running

Parameters

trials (int, optional) – number of attempts to verify cluster status

Raises

SmartSimError – if cluster status cannot be verified

get_address()[source]

Return database addresses

Returns

addresses

Return type

list[str]

Raises

SmartSimError – If database address cannot be found or is not active

property hosts

Return the hostnames of orchestrator instance hosts

Note that this will only be populated after the orchestrator has been launched by SmartSim.

Returns

hostnames

Return type

list[str]

is_active()[source]

Check if the database is active

Returns

True if database is active, False otherwise

Return type

bool

property num_shards

Return the number of DB shards contained in the orchestrator. This might differ from the number of DBNode objects, as each DBNode may start more than one shard (e.g. with MPMD).

Returns

num_shards

Return type

int

remove_stale_files()[source]

Can be used to remove database files of a previous launch

set_path(new_path)
property type

Return the name of the class

PBSPro Orchestrator

The PBSPro Orchestrator can be launched as a batch or in an interactive allocation.

class PBSOrchestrator(port=6379, db_nodes=1, batch=True, hosts=None, run_command='aprun', interface='ipogif0', account=None, time=None, queue=None, **kwargs)[source]

Bases: smartsim.database.orchestrator.Orchestrator

Initialize an Orchestrator reference for PBSPro based systems

The PBSOrchestrator launches as a batch by default. If batch=False, at launch, the PBSOrchestrator will look for an interactive allocation to launch on.

The PBS orchestrator does not support multiple databases per node.

If mpirun is specified as the run_command, then the hosts argument is required.

Parameters
  • port (int) – TCP/IP port

  • db_nodes (int, optional) – number of compute nodes to span across, defaults to 1

  • batch (bool, optional) – run as a batch workload, defaults to True

  • hosts (list[str]) – specify hosts to launch on, defaults to None

  • run_command (str, optional) – specify launch binary. Options are mpirun and aprun, defaults to “aprun”

  • interface (str, optional) – network interface to use, defaults to “ipogif0”

  • account (str, optional) – account to run batch on

  • time (str, optional) – walltime for batch ‘HH:MM:SS’ format

  • queue (str, optional) – queue to launch batch in

property batch
check_cluster_status(trials=10)

Check that a cluster is up and running

Parameters

trials (int, optional) – number of attempts to verify cluster status

Raises

SmartSimError – if cluster status cannot be verified

get_address()

Return database addresses

Returns

addresses

Return type

list[str]

Raises

SmartSimError – If database address cannot be found or is not active

property hosts

Return the hostnames of orchestrator instance hosts

Note that this will only be populated after the orchestrator has been launched by SmartSim.

Returns

hostnames

Return type

list[str]

is_active()

Check if the database is active

Returns

True if database is active, False otherwise

Return type

bool

property num_shards

Return the number of DB shards contained in the orchestrator. This might differ from the number of DBNode objects, as each DBNode may start more than one shard (e.g. with MPMD).

Returns

num_shards

Return type

int

remove_stale_files()

Can be used to remove database files of a previous launch

set_batch_arg(arg, value)[source]

Set a qsub argument the PBSOrchestrator should launch with

Some commonly used arguments such as -e are used by SmartSim and will not be allowed to be set.

Parameters
  • arg (str) – batch argument to set e.g. “A” for account

  • value (str | None) – batch param - set to None if no param value

Raises

SmartSimError – if orchestrator not launching as batch

set_cpus(num_cpus)[source]

Set the number of CPUs available to each database shard

This effectively will determine how many cpus can be used for compute threads, background threads, and network I/O.

Parameters

num_cpus (int) – number of cpus to set

set_hosts(host_list)[source]

Specify the hosts for the PBSOrchestrator to launch on

Parameters

host_list (str | list[str]) – list of hosts (compute node names)

Raises

TypeError – if host_list is wrong type

set_path(new_path)
set_run_arg(arg, value)[source]

Set a run argument the orchestrator should launch each node with (it will be passed to aprun)

Some commonly used arguments are used by SmartSim and will not be allowed to be set.

Parameters
  • arg (str) – run argument to set

  • value (str | None) – run parameter - set to None if no parameter value

set_walltime(walltime)[source]

Set the batch walltime of the orchestrator

Note: This will only affect orchestrators launched as a batch

Parameters

walltime (str) – amount of time e.g. 10 hours is 10:00:00

Raises

SmartSimError – if orchestrator isn’t launching as batch

property type

Return the name of the class

Slurm Orchestrator

The SlurmOrchestrator is used to launch Redis on to Slurm WLM systems and can be launched as a batch, on existing allocations, or in an interactive allocation.

class SlurmOrchestrator(port=6379, db_nodes=1, batch=True, hosts=None, run_command='srun', account=None, time=None, alloc=None, db_per_host=1, interface='ipogif0', **kwargs)[source]

Bases: smartsim.database.orchestrator.Orchestrator

Initialize an Orchestrator reference for Slurm based systems

The orchestrator launches as a batch by default. The Slurm orchestrator can also be given an allocation to run on. If no allocation is provided, and batch=False, at launch, the orchestrator will look for an interactive allocation to launch on.

The SlurmOrchestrator port provided will be incremented if multiple databases per node are launched.

SlurmOrchestrator supports launching with both srun and mpirun as launch binaries. If mpirun is used, the hosts parameter should be populated with length equal to that of the db_nodes argument.

Parameters
  • port (int) – TCP/IP port

  • db_nodes (int, optional) – number of database shards, defaults to 1

  • batch (bool, optional) – Run as a batch workload, defaults to True

  • hosts (list[str]) – specify hosts to launch on

  • run_command (str, optional) – specify launch binary. Options are “mpirun” and “srun”, defaults to “srun”

  • account (str, optional) – account to run batch on

  • time (str, optional) – walltime for batch ‘HH:MM:SS’ format

  • alloc (str, optional) – allocation to launch on, defaults to None

  • db_per_host (int, optional) – number of database shards per system host (MPMD), defaults to 1
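
A sketch of a clustered database on a Slurm system, launched as its own batch job (the account name is hypothetical and import paths are assumed):

from smartsim import Experiment
from smartsim.database import SlurmOrchestrator   # assumed import path

exp = Experiment("db_on_slurm", launcher="slurm")
db = SlurmOrchestrator(port=6379, db_nodes=3, batch=True,
                       time="01:00:00", account="A123", interface="ipogif0")
db.set_batch_arg("exclusive", None)   # flag-only sbatch argument
exp.start(db)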

property batch
check_cluster_status(trials=10)

Check that a cluster is up and running

Parameters

trials (int, optional) – number of attempts to verify cluster status

Raises

SmartSimError – if cluster status cannot be verified

get_address()

Return database addresses

Returns

addresses

Return type

list[str]

Raises

SmartSimError – If database address cannot be found or is not active

property hosts

Return the hostnames of orchestrator instance hosts

Note that this will only be populated after the orchestrator has been launched by SmartSim.

Returns

hostnames

Return type

list[str]

is_active()

Check if the database is active

Returns

True if database is active, False otherwise

Return type

bool

property num_shards

Return the number of DB shards contained in the orchestrator. This might differ from the number of DBNode objects, as each DBNode may start more than one shard (e.g. with MPMD).

Returns

num_shards

Return type

int

remove_stale_files()

Can be used to remove database files of a previous launch

set_batch_arg(arg, value)[source]

Set a Sbatch argument the orchestrator should launch with

Some commonly used arguments such as --job-name are used by SmartSim and will not be allowed to be set.

Parameters
  • arg (str) – batch argument to set e.g. “exclusive”

  • value (str | None) – batch param - set to None if no param value

Raises

SmartSimError – if orchestrator not launching as batch

set_cpus(num_cpus)[source]

Set the number of CPUs available to each database shard

This effectively will determine how many cpus can be used for compute threads, background threads, and network I/O.

Parameters

num_cpus (int) – number of cpus to set

set_hosts(host_list)[source]

Specify the hosts for the SlurmOrchestrator to launch on

Parameters

host_list (str | list[str]) – list of hosts (compute node names)

Raises

TypeError – if wrong type

set_path(new_path)
set_run_arg(arg, value)[source]

Set a run argument the orchestrator should launch each node with (it will be passed to srun)

Some commonly used arguments are used by SmartSim and will not be allowed to be set. For example, “n”, “N”, etc.

Parameters
  • arg (str) – run argument to set

  • value (str | None) – run parameter - set to None if no parameter value

set_walltime(walltime)[source]

Set the batch walltime of the orchestrator

Note: This will only affect orchestrators launched as a batch

Parameters

walltime (str) – amount of time e.g. 10 hours is 10:00:00

Raises

SmartSimError – if orchestrator isn’t launching as batch

property type

Return the name of the class

Cobalt Orchestrator

The CobaltOrchestrator can be launched as a batch or in an interactive allocation.

class CobaltOrchestrator(port=6379, db_nodes=1, batch=True, hosts=None, run_command='aprun', interface='ipogif0', account=None, queue=None, time=None, **kwargs)[source]

Bases: smartsim.database.orchestrator.Orchestrator

Initialize an Orchestrator reference for Cobalt based systems

The orchestrator launches as a batch by default. If batch=False, at launch, the orchestrator will look for an interactive allocation to launch on.

The Cobalt orchestrator does not support multiple databases per node.

Parameters
  • port (int) – TCP/IP port, defaults to 6379

  • db_nodes (int, optional) – number of database shards, defaults to 1

  • batch (bool, optional) – Run as a batch workload, defaults to True

  • hosts (list[str]) – specify hosts to launch on, defaults to None. Optional if not launching with OpenMPI

  • run_command (str, optional) – specify launch binary. Options are mpirun and aprun, defaults to “aprun”.

  • interface (str, optional) – network interface to use, defaults to “ipogif0”

  • account (str, optional) – account to run batch on

  • queue (str, optional) – queue to launch batch in

  • time (str, optional) – walltime for batch ‘HH:MM:SS’ format

property batch
check_cluster_status(trials=10)

Check that a cluster is up and running

Parameters

trials (int, optional) – number of attempts to verify cluster status

Raises

SmartSimError – if cluster status cannot be verified

get_address()

Return database addresses

Returns

addresses

Return type

list[str]

Raises

SmartSimError – If database address cannot be found or is not active

property hosts

Return the hostnames of orchestrator instance hosts

Note that this will only be populated after the orchestrator has been launched by SmartSim.

Returns

hostnames

Return type

list[str]

is_active()

Check if the database is active

Returns

True if database is active, False otherwise

Return type

bool

property num_shards

Return the number of DB shards contained in the orchestrator. This might differ from the number of DBNode objects, as each DBNode may start more than one shard (e.g. with MPMD).

Returns

num_shards

Return type

int

remove_stale_files()

Can be used to remove database files of a previous launch

set_batch_arg(arg, value)[source]

Set a cobalt qsub argument

Some commonly used arguments are used by SmartSim and will not be allowed to be set. For example, “cwd”, “jobname”, etc.

Parameters
  • arg (str) – batch argument to set e.g. “exclusive”

  • value (str | None) – batch param - set to None if no param value

Raises

SmartSimError – if orchestrator not launching as batch

set_cpus(num_cpus)[source]

Set the number of CPUs available to each database shard

This effectively will determine how many cpus can be used for compute threads, background threads, and network I/O.

Parameters

num_cpus (int) – number of cpus to set

set_hosts(host_list)[source]

Specify the hosts for the CobaltOrchestrator to launch on

Parameters

host_list (str | list[str]) – list of hosts (compute node names)

Raises

TypeError – if wrong type

set_path(new_path)
set_run_arg(arg, value)[source]

Set a run argument the orchestrator should launch each node with (it will be passed to aprun)

Some commonly used arguments are used by SmartSim and will not be allowed to be set. For example, “wdir”, “n”, etc.

Parameters
  • arg (str) – run argument to set

  • value (str | None) – run parameter - set to None if no parameter value

set_walltime(walltime)[source]

Set the batch walltime of the orchestrator

Note: This will only affect orchestrators launched as a batch

Parameters

walltime (str) – amount of time e.g. 10 hours is 10:00:00

Raises

SmartSimError – if orchestrator isn’t launching as batch

property type

Return the name of the class

LSF Orchestrator

The LSFOrchestrator can be launched as a batch or in an interactive allocation.

class LSFOrchestrator(port=6379, db_nodes=1, cpus_per_shard=4, gpus_per_shard=0, batch=True, hosts=None, project=None, time=None, db_per_host=1, interface='ib0', **kwargs)[source]

Bases: smartsim.database.orchestrator.Orchestrator

Initialize an Orchestrator reference for LSF based systems

The orchestrator launches as a batch by default. If batch=False, at launch, the orchestrator will look for an interactive allocation to launch on.

The LSFOrchestrator port provided will be incremented if multiple databases per host are launched (db_per_host>1).

Each database shard is assigned a resource set with cpus and gpus allocated contiguously on the host: it is the user’s responsibility to check if enough resources are available on each host.

A list of hosts to launch the database on can be specified; these addresses must correspond to those of the first db_nodes//db_per_host compute nodes in the allocation. For example, for 8 db_nodes and 2 db_per_host, the host_list must contain the addresses of hosts 1, 2, 3, and 4.

LSFOrchestrator is launched with only one jsrun command as launch binary, and an Explicit Resource File (ERF) which is automatically generated. The orchestrator is always launched on the first db_nodes//db_per_host compute nodes in the allocation.

Parameters
  • port (int) – TCP/IP port

  • db_nodes (int, optional) – number of database shards, defaults to 1

  • cpus_per_shard (int, optional) – cpus to allocate per shard, defaults to 4

  • gpus_per_shard (int, optional) – gpus to allocate per shard, defaults to 0

  • batch (bool, optional) – Run as a batch workload, defaults to True

  • hosts (list[str], optional) – specify hosts to launch on

  • project (str, optional) – project to run batch on

  • time (str, optional) – walltime for batch ‘HH:MM’ format

  • db_per_host (int, optional) – number of database shards per system host (MPMD), defaults to 1

  • interface (str) – network interface to use

property batch
check_cluster_status(trials=10)

Check that a cluster is up and running

Parameters

trials (int, optional) – number of attempts to verify cluster status

Raises

SmartSimError – if cluster status cannot be verified

get_address()

Return database addresses

Returns

addresses

Return type

list[str]

Raises

SmartSimError – If database address cannot be found or is not active

property hosts

Return the hostnames of orchestrator instance hosts

Note that this will only be populated after the orchestrator has been launched by SmartSim.

Returns

hostnames

Return type

list[str]

is_active()

Check if the database is active

Returns

True if database is active, False otherwise

Return type

bool

property num_shards

Return the number of DB shards contained in the orchestrator. This might differ from the number of DBNode objects, as each DBNode may start more than one shard (e.g. with MPMD).

Returns

num_shards

Return type

int

remove_stale_files()

Can be used to remove database files of a previous launch

set_batch_arg(arg, value)[source]

Set a bsub argument the LSFOrchestrator should launch with

Some commonly used arguments are used by SmartSim and will not be allowed to be set. For example, “m”, “n”, etc.

Parameters
  • arg (str) – batch argument to set e.g. “exclusive”

  • value (str | None) – batch param - set to None if no param value

Raises

SmartSimError – if orchestrator not launching as batch

set_hosts(host_list)[source]

Specify the hosts for the LSFOrchestrator to launch on

Parameters

host_list (str | list[str]) – list of hosts (compute node names)

Raises

TypeError – if wrong type

set_path(new_path)
set_run_arg(arg, value)[source]

Set a run argument the orchestrator should launch each node with (it will be passed to jsrun)

Some commonly used arguments are used by SmartSim and will not be allowed to be set. For example, “chdir”, “np”

Parameters
  • arg (str) – run argument to set

  • value (str | None) – run parameter - set to None if no parameter value

set_walltime(walltime)[source]

Set the batch walltime of the orchestrator

Note: This will only affect orchestrators launched as a batch

Parameters

walltime (str) – amount of time e.g. 10 hours is 10:00

Raises

SmartSimError – if orchestrator isn’t launching as batch
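
A short sketch of tailoring the launch before calling Experiment.start(); the host names and argument values are illustrative and depend on the site’s LSF configuration:

orc.set_hosts(["batch1", "batch2", "batch3", "batch4"])  # hypothetical node names
orc.set_walltime("10:00")                                # batch launches only
orc.set_batch_arg("q", "batch_queue")                    # hypothetical bsub queue
orc.set_run_arg("latency_priority", "gpu-cpu")           # hypothetical jsrun option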

property type

Return the name of the class

Entity

Ensemble

Ensemble.__init__(name, params[, …])

Initialize an Ensemble of Model instances.

Ensemble.add_model(model)

Add a model to this ensemble

Ensemble.attach_generator_files([to_copy, …])

Attach files to each model within the ensemble for generation

Ensemble.register_incoming_entity(…)

Register future communication between entities.

Ensemble.enable_key_prefixing()

If called, all models within this ensemble will prefix their keys with their own model names.

Ensemble.query_key_prefixing()

Inquire as to whether each model within the ensemble will prefix its keys

class Ensemble(name, params, batch_settings=None, run_settings=None, perm_strat='all_perm', **kwargs)[source]

Bases: smartsim.entity.entityList.EntityList

Ensemble is a group of Model instances that can be treated as a reference to a single instance.

Initialize an Ensemble of Model instances.

The kwargs argument can be used to pass custom input parameters to the permutation strategy.

Parameters
  • name (str) – name of the ensemble

  • params (dict[str, Any]) – parameters to expand into Model members

  • batch_settings (BatchSettings, optional) – describes settings for Ensemble as batch workload

  • run_settings (RunSettings, optional) – describes how each Model should be executed

  • replicas (int, optional) – number of Model replicas to create, passed as a keyword argument through kwargs

  • perm_strat (str, optional) – strategy for expanding params into Model instances; options are “all_perm”, “stepped”, “random”, or a callable function. Defaults to “all_perm”.

Returns

Ensemble instance

Return type

Ensemble
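
For example, a short sketch of building an ensemble directly from parameters and run settings; the RunSettings import path, executable, and parameter values are placeholders:

from smartsim.settings import RunSettings  # assumed import path

rs = RunSettings(exe="python", exe_args="simulation.py")
params = {"STEPS": [10, 20], "THERMO": [1, 2]}

# "all_perm" expands params into one Model per combination: 2 x 2 = 4 members
ensemble = Ensemble("training_ensemble", params, run_settings=rs, perm_strat="all_perm")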

add_model(model)[source]

Add a model to this ensemble

Parameters

model (Model) – model instance to be added

Raises
  • TypeError – if model is not an instance of Model

  • EntityExistsError – if model already exists in this ensemble
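
A minimal sketch of adding a member by hand, reusing the rs run settings from the sketch above and a hypothetical output path:

from os import getcwd

extra = Model("extra_member", params={}, path=getcwd(), run_settings=rs)
ensemble.add_model(extra)  # raises EntityExistsError if the model already exists in this ensemble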

attach_generator_files(to_copy=None, to_symlink=None, to_configure=None)[source]

Attach files to each model within the ensemble for generation

Attach files needed for the entity that, upon generation, will be located in the path of the entity.

During generation, files “to_copy” are copied into the path of the entity, and files “to_symlink” are symlinked into the path of the entity.

Files “to_configure” are text-based model input files where parameters for the model are set. Note that only models support the “to_configure” field. These files must have tagged fields that correspond to the values the user would like to change. The tag is settable but defaults to a semicolon, e.g. THERMO = ;10;

Parameters
  • to_copy (list, optional) – files to copy, defaults to []

  • to_symlink (list, optional) – files to symlink, defaults to []

  • to_configure (list, optional) – input files with tagged parameters, defaults to []
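
As an illustration, if a hypothetical input file in.thermo contains the tagged line THERMO = ;10;, attaching it with to_configure lets each member’s params rewrite that value during Experiment.generate():

# every member of the ensemble receives its own configured copy of in.thermo
ensemble.attach_generator_files(to_configure=["./in.thermo"])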

enable_key_prefixing()[source]

If called, all models within this ensemble will prefix their keys with their own model names.

query_key_prefixing()[source]

Inquire as to whether each model within the ensemble will prefix its keys

Returns

True if all models have key prefixing enabled, False otherwise

Return type

bool

register_incoming_entity(incoming_entity)[source]

Register future communication between entities.

Registers the named data sources that this entity has access to by storing the key_prefix associated with that entity

Only Python clients can have multiple incoming connections

Parameters

incoming_entity (SmartSimEntity) – The entity that data will be received from

property type

Return the name of the class

Model

Model.__init__(name, params, path, run_settings)

Initialize a model entity within SmartSim

Model.attach_generator_files([to_copy, …])

Attach files to an entity for generation

Model.register_incoming_entity(incoming_entity)

Register future communication between entities.

Model.enable_key_prefixing()

If called, the entity will prefix its keys with its own model name

Model.disable_key_prefixing()

If called, the entity will not prefix its keys with its own model name

Model.query_key_prefixing()

Inquire as to whether this entity will prefix its keys with its name

class Model(name, params, path, run_settings)[source]

Bases: smartsim.entity.entity.SmartSimEntity

Initialize a model entity within SmartSim

Parameters
  • name (str) – name of the model

  • params (dict) – model parameters for writing into configuration files.

  • path (str) – path to output, error, and configuration files

  • run_settings (RunSettings) – launcher settings specified in the experiment

attach_generator_files(to_copy=None, to_symlink=None, to_configure=None)[source]

Attach files to an entity for generation

Attach files needed for the entity that, upon generation, will be located in the path of the entity.

During generation, files “to_copy” are copied into the path of the entity, and files “to_symlink” are symlinked into the path of the entity.

Files “to_configure” are text-based model input files where parameters for the model are set. Note that only models support the “to_configure” field. These files must have tagged fields that correspond to the values the user would like to change. The tag is settable but defaults to a semicolon, e.g. THERMO = ;10;

Parameters
  • to_copy (list, optional) – files to copy, defaults to []

  • to_symlink (list, optional) – files to symlink, defaults to []

  • to_configure (list, optional) – input files with tagged parameters, defaults to []
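
For example (paths hypothetical, assuming model is a Model instance), small configuration files can be copied while large shared datasets are symlinked to avoid duplication:

model.attach_generator_files(to_copy=["./config.yaml"],
                             to_symlink=["/lustre/shared/dataset"])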

disable_key_prefixing()[source]

If called, the entity will not prefix its keys with its own model name

enable_key_prefixing()[source]

If called, the entity will prefix its keys with its own model name

query_key_prefixing()[source]

Inquire as to whether this entity will prefix its keys with its name

register_incoming_entity(incoming_entity)[source]

Register future communication between entities.

Registers the named data sources that this entity has access to by storing the key_prefix associated with that entity

Parameters

incoming_entity (SmartSimEntity) – The entity that data will be received from

Raises

SmartSimError – if incoming entity has already been registered
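
A sketch of wiring two models to exchange data through the database, assuming producer and consumer are Model instances defined elsewhere:

producer.enable_key_prefixing()               # producer prefixes its keys with its model name
consumer.register_incoming_entity(producer)   # consumer may now look up the producer's keys
print(consumer.query_key_prefixing())         # False until enabled on the consumer as well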

property type

Return the name of the class

TensorFlow

SmartSim includes built-in utilities for supporting TensorFlow and Keras.

freeze_model(model, output_dir, file_name)

Freeze a Keras or TensorFlow Graph

freeze_model(model, output_dir, file_name)[source]

Freeze a Keras or TensorFlow Graph

To use a Keras or TensorFlow model in SmartSim, the model must be frozen and the inputs and outputs provided to the smartredis.client.set_model_from_file() method.

This utility function provides everything users need to take a trained model and put it inside an orchestrator instance.

Parameters
  • model (tf.Module) – TensorFlow or Keras model

  • output_dir (str) – output dir to save model file to

  • file_name (str) – name of model file to create

Returns

path to model file, model input layer names, model output layer names

Return type

str, list[str], list[str]
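
A usage sketch, assuming freeze_model is importable from smartsim.tf, net is a trained Keras model defined elsewhere, and a SmartRedis Client is connected to a running orchestrator:

from smartsim.tf import freeze_model   # assumed import path
from smartredis import Client

model_path, inputs, outputs = freeze_model(net, "./frozen", "model.pb")

client = Client(cluster=False)
client.set_model_from_file("my_model", model_path, "TF",
                           device="CPU", inputs=inputs, outputs=outputs)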

Slurm

Note

This module is importable through smartsim e.g., from smartsim import slurm

slurm.get_allocation([nodes, time, account, …])

Request an allocation

slurm.release_allocation(alloc_id)

Free an allocation’s resources

get_allocation(nodes=1, time=None, account=None, options=None)[source]

Request an allocation

This function requests an allocation with the specified arguments. Anything passed to the options will be processed as a Slurm argument and appended to the salloc command with the appropriate prefix (e.g. “-” or “--”).

The options can be used to pass extra settings to the workload manager such as the following for Slurm:

  • nodelist=”nid00004”

For arguments without a value, pass None or an empty string as the value. For Slurm:

  • exclusive=None

Parameters
  • nodes (int, optional) – number of nodes for the allocation, defaults to 1

  • time (str, optional) – wall time of the allocation, HH:MM:SS format, defaults to None

  • account (str, optional) – account id for allocation, defaults to None

  • options (dict[str, str], optional) – additional options for the slurm wlm, defaults to None

Raises

LauncherError – if the allocation is not successful

Returns

the id of the allocation

Return type

str
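
A usage sketch of the allocation lifecycle; the account name and node list are placeholders:

from smartsim import slurm

alloc = slurm.get_allocation(nodes=2,
                             time="01:00:00",
                             account="my_account",
                             options={"nodelist": "nid00004", "exclusive": None})
# ... launch work on the allocation ...
slurm.release_allocation(alloc)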

get_default_partition()[source]

Returns the default partition from Slurm

This default partition is assumed to be the partition with a star following its partition name in sinfo output

Returns

the name of the default partition

Return type

str

release_allocation(alloc_id)[source]

Free an allocation’s resources

Parameters

alloc_id (str) – allocation id

Raises

LauncherError – if allocation could not be freed

validate(nodes=1, ppn=1, partition=None)[source]

Check that there are sufficient resources in the provided Slurm partitions.

If no partition is provided, the default partition is found and used.

Parameters
  • nodes (int, optional) – Override the default node count to validate, defaults to 1

  • ppn (int, optional) – Override the default processes per node to validate, defaults to 1

  • partition (str, optional) – partition to validate, defaults to None

Raises

LauncherError

Returns

True if resources are available, False otherwise

Return type

bool
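
For example, to check a partition before requesting an allocation (partition name hypothetical):

if slurm.validate(nodes=4, ppn=16, partition="gpu"):
    alloc = slurm.get_allocation(nodes=4)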