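SmartSim API
Experiment
- Experiment.__init__: Example initialization
- Experiment.start: Launch instances passed as arguments
- Experiment.stop: Stop specific instances launched by this Experiment
- Experiment.create_ensemble: Create an Ensemble of Model instances
- Experiment.create_model: Create a Model
- Experiment.generate: Generate the file structure for an Experiment
- Experiment.poll: Monitor jobs through logging to stdout.
- Experiment.finished: Query if a job has completed
- Experiment.get_status: Query the status of the specific job(s)
- Experiment.reconnect_orchestrator: Reconnect to a running Orchestrator
- Experiment.summary: Return a summary of the Experiment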
class Experiment(name, exp_path=None, launcher='local')
Bases: object
Experiments are the main user interface in SmartSim.
Experiments can create two types of instances to launch: Model and Ensemble. Through the Experiment interface, users can programmatically create, configure, start, stop, poll and query the instances they create.
Example initialization
exp = Experiment(name="my_exp", launcher="local")
- Parameters
name (str) – name for the Experiment
exp_path (str, optional) – path to location of Experiment directory if generated
launcher (str, optional) – type of launcher being used, options are “slurm”, “pbs”, “cobalt”, “lsf”, or “local”. Defaults to “local”
create_ensemble(name, params=None, batch_settings=None, run_settings=None, replicas=None, perm_strategy='all_perm', **kwargs)
Create an Ensemble of Model instances
Ensembles can be launched sequentially or, when using a non-local launcher such as slurm, as a batch.
Ensembles require one of the following combinations of arguments:
- run_settings and params
- run_settings and replicas
- batch_settings
- batch_settings, run_settings, and params
- batch_settings, run_settings, and replicas
If given solely batch settings, an empty ensemble will be created that models can be added to manually through Ensemble.add_model(). The entire ensemble will launch as one batch.
Provided batch and run settings, either params or replicas must be passed and the entire ensemble will launch as a single batch.
Provided solely run settings, either params or replicas must be passed and the ensemble members will each launch sequentially.
The kwargs argument can be used to pass custom input parameters to the permutation strategy.
- Parameters
name (str) – name of the ensemble
params (dict[str, Any]) – parameters to expand into Model members
batch_settings (BatchSettings) – describes settings for Ensemble as batch workload
run_settings (RunSettings) – describes how each Model should be executed
replicas (int) – number of replicas to create
perm_strategy (str, optional) – strategy for expanding params into Model instances; options are “all_perm”, “stepped”, “random” or a callable function. Default is “all_perm”.
- Raises
SmartSimError – if initialization fails
- Returns
Ensemble instance
- Return type
Ensemble
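As a rough illustration of what the “all_perm” and “stepped” strategies do with params, here is a minimal sketch, assuming “all_perm” means one Model per combination of values (a Cartesian product) and “stepped” means pairing the i-th value of every list. The functions all_perm and stepped below are hypothetical stand-ins written with the standard library, not SmartSim's own strategy code, and “random” or custom callables are not shown.

```python
from itertools import product
from typing import Any, Dict, List


def all_perm(params: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """One parameter set per combination of values (Cartesian product)."""
    names = list(params)
    return [dict(zip(names, values)) for values in product(*params.values())]


def stepped(params: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """One parameter set per index, pairing the i-th value of every list.

    zip() stops at the shortest list, so lists should have equal length.
    """
    names = list(params)
    return [dict(zip(names, values)) for values in zip(*params.values())]


params = {"STEPS": [10, 20], "THREADS": [1, 2]}
print(len(all_perm(params)))  # 4 members: (10, 1), (10, 2), (20, 1), (20, 2)
print(len(stepped(params)))   # 2 members: (10, 1), (20, 2)
```

With the same params, the Cartesian strategy grows multiplicatively with each added parameter, while the stepped strategy stays linear in the list length.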
create_model(name, run_settings, params=None, path=None, enable_key_prefixing=False)
Create a Model
By default, all Model instances start with the cwd as their path unless specified. Regardless of whether path is specified, upon the user passing the instance to Experiment.generate(), the Model path will be overwritten and replaced with the created directory for the Model.
- Parameters
name (str) – name of the model
run_settings (RunSettings) – defines how Model should be run
params (dict, optional) – model parameters for writing into configuration files
path (str, optional) – path to where the model should be executed at runtime
enable_key_prefixing (bool, optional) – If True, data sent to the Orchestrator using SmartRedis from this Model will be prefixed with the Model name. Default is False.
- Raises
SmartSimError – if initialization fails
- Returns
the created Model
- Return type
Model
finished(entity)
Query if a job has completed
An instance of Model or Ensemble can be passed as an argument.
- Parameters
entity (Model | Ensemble) – object launched by this Experiment
- Returns
True if job has completed, False otherwise
- Return type
bool
- Raises
SmartSimError – if entity has not been launched by this Experiment
generate(*args, tag=None, overwrite=False)
Generate the file structure for an Experiment
Experiment.generate creates directories for each instance passed to organize Experiments that launch many instances.
If files or directories are attached to Model objects using Model.attach_generator_files(), those files or directories will be symlinked, copied, or configured and written into the created directory for that instance.
Instances of Model, Ensemble and Orchestrator can all be passed as arguments to the generate method.
- Parameters
tag (str, optional) – tag used in to_configure generator files
overwrite (bool, optional) – overwrite existing folders and contents, defaults to False
get_status(*args)
Query the status of the specific job(s)
Instances of Model, Ensemble and Orchestrator can all be passed as arguments to Experiment.get_status()
- Returns
status of the specific job(s)
- Return type
list[str]
- Raises
SmartSimError – if status retrieval fails
TypeError –
poll(interval=10, verbose=True)
Monitor jobs through logging to stdout.
This method should only be used if jobs were launched with Experiment.start(block=False)
- Parameters
interval (int, optional) – frequency (in seconds) of logging to stdout, defaults to 10 seconds
verbose (bool, optional) – set verbosity, defaults to True
- Raises
SmartSimError –
reconnect_orchestrator(checkpoint)
Reconnect to a running Orchestrator
This method can be used to connect to a Redis deployment that was launched by a previous Experiment. This way users can run many experiments utilizing the same Redis deployment.
- Parameters
checkpoint (str) – the smartsim_db.dat file created when an Orchestrator is launched
start(*args, block=True, summary=False)
Launch instances passed as arguments
Start the Experiment by turning specified instances into jobs for the underlying launcher and launching them.
Instances of Model, Ensemble and Orchestrator can all be passed as arguments to the start method. Passing more than one Orchestrator as arguments is forbidden.
- Parameters
block (bool, optional) – block execution until all non-database jobs are finished, defaults to True
summary (bool, optional) – print a launch summary prior to launch, defaults to False
Settings
Settings are provided to Model and Ensemble objects
to provide parameters for how a job should be executed. Some
are specifically meant for certain launchers; for example, SbatchSettings
is solely meant for systems using Slurm as a workload manager.
MpirunSettings for OpenMPI based jobs is supported by Slurm,
PBSPro, and Cobalt.
Types of Settings:
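- RunSettings: Run parameters for a Model
- SrunSettings: Initialize run parameters for a slurm job with srun
- SbatchSettings: Specify run parameters for a Slurm batch job
- AprunSettings: Settings to run job with aprun command
- QsubBatchSettings: Specify qsub batch parameters for a job
- CobaltBatchSettings: Specify settings for a Cobalt qsub batch launch
- JsrunSettings: Settings to run job with jsrun command
- MpirunSettings: Settings to run job with mpirun command (OpenMPI)
- BsubBatchSettings: Specify bsub batch parameters for a job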
Local
When running SmartSim on laptops and single node workstations,
the base RunSettings object is used to parameterize jobs.
RunSettings include a run_command parameter for local
launches that utilize a parallel launch binary like
mpirun, mpiexec, and others.
- RunSettings.add_exe_args: Add executable arguments to executable
- RunSettings.update_env: Update the job environment variables
class RunSettings(exe, exe_args=None, run_command='', run_args=None, env_vars=None)
Run parameters for a Model
The base RunSettings class should only be used with the local launcher on single node workstations or laptops.
If no run_command is specified, the executable will be launched locally. run_args passed as a dict will be interpreted literally for local RunSettings and added directly to the run_command, e.g. run_args = {"-np": 2} will be "-np 2".
Example initialization
rs = RunSettings("echo", "hello", "mpirun", run_args={"-np": "2"})
- Parameters
exe (str) – executable to run
exe_args (str | list[str], optional) – executable arguments, defaults to None
run_command (str, optional) – launch binary (e.g. “srun”), defaults to empty str
run_args (dict[str, str], optional) – arguments for run command (e.g. -np for mpiexec), defaults to None
env_vars (dict[str, str], optional) – environment vars to launch job with, defaults to None
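To make the "interpreted literally" rule concrete, here is a minimal sketch of how a local command line could be assembled from these parameters. build_local_command is a hypothetical helper, not part of SmartSim; it assumes run_args are only used when a run_command is given and that a string of executable arguments is split on whitespace.

```python
from typing import Dict, List, Optional, Union


def build_local_command(
    exe: str,
    exe_args: Optional[Union[str, List[str]]] = None,
    run_command: str = "",
    run_args: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Assemble a local launch command as the RunSettings text describes."""
    cmd: List[str] = []
    if run_command:
        cmd.append(run_command)
        # Keys and values are used literally: no dashes are added
        for key, value in (run_args or {}).items():
            cmd += [key, str(value)]
    cmd.append(exe)
    # Assumption: a string of executable arguments is split on whitespace
    if isinstance(exe_args, str):
        cmd += exe_args.split()
    elif exe_args:
        cmd += list(exe_args)
    return cmd


print(" ".join(build_local_command("echo", "hello", "mpirun", {"-np": "2"})))
# mpirun -np 2 echo hello
```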
add_exe_args(args)
Add executable arguments to executable
- Parameters
args (str | list[str]) – executable arguments
- Raises
TypeError – if exe args are not strings
format_run_args()
Return formatted run arguments
For RunSettings, the run arguments are passed literally with no formatting.
- Returns
list of run arguments for these settings
- Return type
list[str]
property run_command
Return the launch binary used to launch the executable
- Returns
launch binary e.g. mpiexec
- Type
str
SrunSettings
SrunSettings
can be used for running on existing allocations,
running jobs in interactive allocations, and for adding srun
steps to a batch.
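- SrunSettings.set_cpus_per_task: Set the number of cpus to use per task
- SrunSettings.set_hostlist: Specify the hostlist for this job
- SrunSettings.set_nodes: Set the number of nodes
- SrunSettings.set_tasks: Set the number of tasks for this job
- SrunSettings.set_tasks_per_node: Set the number of tasks per node for this job
- SrunSettings.add_exe_args: Add executable arguments to executable
- SrunSettings.format_run_args: Return a list of Slurm formatted run arguments
- SrunSettings.format_env_vars: Build environment variable string for Slurm
- SrunSettings.update_env: Update the job environment variables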
class SrunSettings(exe, exe_args=None, run_args=None, env_vars=None, alloc=None)
Initialize run parameters for a slurm job with srun
SrunSettings should only be used on Slurm based systems.
If an allocation is specified, the instance receiving these run parameters will launch on that allocation.
- Parameters
exe (str) – executable to run
exe_args (list[str] | str, optional) – executable arguments, defaults to None
run_args (dict[str, str | None], optional) – srun arguments without dashes, defaults to None
env_vars (dict[str, str], optional) – environment variables for job, defaults to None
alloc (str, optional) – allocation ID if running on existing alloc, defaults to None
add_exe_args(args)
Add executable arguments to executable
- Parameters
args (str | list[str]) – executable arguments
- Raises
TypeError – if exe args are not strings
format_env_vars()
Build environment variable string for Slurm
Slurm takes exports as a comma-separated list. The list starts with ALL so as not to disturb the rest of the environment. For more information, see the Slurm documentation for srun.
- Returns
the formatted string of environment variables
- Return type
str
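The export format described above can be sketched as follows. format_slurm_exports is a hypothetical helper, not SmartSim's implementation; it only assumes what the text states (an ALL entry first, then comma-separated KEY=VALUE pairs), which is the form srun's --export option accepts. The variable MY_FLAG and the program ./my_app are placeholders.

```python
from typing import Dict


def format_slurm_exports(env_vars: Dict[str, str]) -> str:
    """Build a value for srun --export.

    The list starts with ALL so the rest of the caller's environment is
    kept, followed by comma-separated KEY=VALUE pairs. Values containing
    commas are not handled by this simple sketch.
    """
    pairs = [f"{key}={value}" for key, value in env_vars.items()]
    return ",".join(["ALL", *pairs])


exports = format_slurm_exports({"OMP_NUM_THREADS": "4", "MY_FLAG": "on"})
print(f"srun --export={exports} ./my_app")
# srun --export=ALL,OMP_NUM_THREADS=4,MY_FLAG=on ./my_app
```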
format_run_args()
Return a list of Slurm formatted run arguments
- Returns
list of slurm arguments for these settings
- Return type
list[str]
property run_command
Return the launch binary used to launch the executable
- Returns
launch binary e.g. mpiexec
- Type
str
set_cpus_per_task(num_cpus)
Set the number of cpus to use per task
This sets --cpus-per-task
- Parameters
num_cpus (int) – number of cpus to use per task
set_hostlist(host_list)
Specify the hostlist for this job
- Parameters
host_list (str | list[str]) – hosts to launch on
- Raises
TypeError – if not str or list of str
set_nodes(num_nodes)
Set the number of nodes
Effectively this is setting: srun --nodes <num_nodes>
- Parameters
num_nodes (int) – number of nodes to run with
set_tasks(num_tasks)
Set the number of tasks for this job
This sets --ntasks
- Parameters
num_tasks (int) – number of tasks
set_tasks_per_node(num_tpn)
Set the number of tasks per node for this job
This sets --ntasks-per-node
- Parameters
num_tpn (int) – number of tasks per node
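Since run_args are given without dashes, something has to turn them into srun flags. The sketch below shows one plausible convention: single-letter keys become "-k value", longer keys become "--key=value", and None yields a bare flag. This convention is an assumption for illustration, format_srun_args is a hypothetical name, and the dictionary keys are only assumed to correspond to the flags the setters above document.

```python
from typing import Dict, List, Optional


def format_srun_args(run_args: Dict[str, Optional[str]]) -> List[str]:
    """Turn dash-less srun options into command-line flags.

    Assumed convention: single-letter keys become "-k value", longer keys
    become "--key=value", and a None value yields a bare flag.
    """
    formatted: List[str] = []
    for key, value in run_args.items():
        prefix = "-" if len(key) == 1 else "--"
        if value is None:
            formatted.append(prefix + key)
        elif len(key) == 1:
            formatted += [prefix + key, str(value)]
        else:
            formatted.append(f"{prefix}{key}={value}")
    return formatted


# Keys matching the flags the setters document: set_nodes -> nodes,
# set_tasks_per_node -> ntasks-per-node, set_cpus_per_task -> cpus-per-task
run_args = {"nodes": "2", "ntasks-per-node": "4", "cpus-per-task": "2", "exclusive": None}
print(format_srun_args(run_args))
# ['--nodes=2', '--ntasks-per-node=4', '--cpus-per-task=2', '--exclusive']
```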
update_env(env_vars)
Update the job environment variables
- Parameters
env_vars (dict[str, str]) – environment variables to update or add
SbatchSettings
SbatchSettings
are used for launching batches onto Slurm
WLM systems.
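- SbatchSettings.set_account: Set the account for this batch job
- SbatchSettings.set_batch_command: Set the command used to launch the batch, e.g. sbatch
- SbatchSettings.set_nodes: Set the number of nodes for this batch job
- SbatchSettings.set_hostlist: Specify the hostlist for this job
- SbatchSettings.set_partition: Set the partition for the batch job
- SbatchSettings.set_walltime: Set the walltime of the job
- SbatchSettings.format_batch_args: Get the formatted batch arguments for a preview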
class SbatchSettings(nodes=None, time='', account=None, batch_args=None)
Specify run parameters for a Slurm batch job
Slurm sbatch arguments can be written into batch_args as a dictionary, e.g. {'ntasks': 1}. If the argument doesn't have a parameter, put None as the value, e.g. {'exclusive': None}.
Initialization values provided (nodes, time, account) will overwrite the same arguments in batch_args if present.
- Parameters
nodes (int, optional) – number of nodes, defaults to None
time (str, optional) – walltime for job, e.g. “10:00:00” for 10 hours
account (str, optional) – account for job, defaults to None
batch_args (dict[str, str], optional) – extra batch arguments, defaults to None
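The two batch_args rules above (None for flag-only arguments, and initialization values overriding batch_args) can be sketched as a small renderer of #SBATCH lines. sbatch_directives is a hypothetical helper, and the "--key=value" rendering is an assumption about how the directives are written, not a description of SmartSim's generated file.

```python
from typing import Dict, Optional


def sbatch_directives(
    nodes: Optional[int] = None,
    time: str = "",
    account: Optional[str] = None,
    batch_args: Optional[Dict[str, Optional[str]]] = None,
) -> str:
    """Render #SBATCH lines; explicit nodes/time/account win over batch_args."""
    args: Dict[str, Optional[str]] = dict(batch_args or {})
    # Initialization values overwrite the same keys in batch_args
    if nodes is not None:
        args["nodes"] = str(nodes)
    if time:
        args["time"] = time
    if account is not None:
        args["account"] = account
    lines = []
    for key, value in args.items():
        # None marks an argument without a parameter, e.g. --exclusive
        flag = f"--{key}" if value is None else f"--{key}={value}"
        lines.append(f"#SBATCH {flag}")
    return "\n".join(lines)


print(sbatch_directives(nodes=2, time="10:00:00",
                        batch_args={"ntasks": "1", "nodes": "8", "exclusive": None}))
```

Here the nodes=8 entry in batch_args is replaced by the explicit nodes=2, while ntasks and exclusive pass through unchanged.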
add_preamble(lines)
Add lines to the batch file preamble. The lines are written unmodified at the beginning of the batch file (after the WLM directives) and can be used to, e.g., start virtual environments before running the executables.
- Parameters
lines (str or list[str]) – lines to add to preamble.
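The ordering described for add_preamble (directives first, then the preamble lines unmodified, then the launch commands) can be pictured with this minimal sketch. render_batch_script is hypothetical, and the bash shebang, virtual environment path and script name are placeholder assumptions.

```python
from typing import List, Union


def render_batch_script(
    directives: List[str],
    preamble: Union[str, List[str]],
    commands: List[str],
) -> str:
    """Lay out a batch script: WLM directives, then preamble lines
    (written unmodified), then the launch commands."""
    if isinstance(preamble, str):
        preamble = [preamble]
    lines = ["#!/bin/bash", *directives, *preamble, *commands]
    return "\n".join(lines) + "\n"


script = render_batch_script(
    ["#SBATCH --nodes=1", "#SBATCH --time=00:10:00"],
    ["source ~/venvs/smartsim/bin/activate"],  # hypothetical venv path
    ["srun python my_script.py"],              # hypothetical script
)
print(script)
```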
property batch_cmd
Return the batch command
Tests whether the batch command path can be expanded. If it can, returns the expanded batch command; otherwise, returns the batch command as is.
- Returns
batch command
- Type
str
format_batch_args()
Get the formatted batch arguments for a preview
- Returns
batch arguments for Sbatch
- Return type
list[str]
set_batch_command(command)
Set the command used to launch the batch, e.g. sbatch
- Parameters
command (str) – batch command
set_hostlist(host_list)
Specify the hostlist for this job
- Parameters
host_list (str | list[str]) – hosts to launch on
- Raises
TypeError – if not str or list of str
set_nodes(num_nodes)
Set the number of nodes for this batch job
- Parameters
num_nodes (int) – number of nodes
AprunSettings
AprunSettings can be used on any system that supports the
Cray ALPS layer. SmartSim supports using AprunSettings
on PBSPro and Cobalt WLM systems.
AprunSettings can be used in interactive sessions (on an allocation)
and within batch launches (e.g., QsubBatchSettings).
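- AprunSettings.set_cpus_per_task: Set the number of cpus to use per task
- AprunSettings.set_hostlist: Specify the hostlist for this job
- AprunSettings.set_tasks: Set the number of tasks for this job
- AprunSettings.set_tasks_per_node: Set the number of tasks per node for this job
- AprunSettings.make_mpmd: Make job an MPMD job
- AprunSettings.add_exe_args: Add executable arguments to executable
- AprunSettings.format_run_args: Return a list of ALPS formatted run arguments
- AprunSettings.format_env_vars: Format the environment variables for aprun
- AprunSettings.update_env: Update the job environment variables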
class AprunSettings(exe, exe_args=None, run_args=None, env_vars=None)
Settings to run job with aprun command
AprunSettings can be used for both the pbs and cobalt launchers.
- Parameters
exe (str) – executable
exe_args (str | list[str], optional) – executable arguments, defaults to None
run_args (dict[str, str], optional) – arguments for run command, defaults to None
env_vars (dict[str, str], optional) – environment vars to launch job with, defaults to None
add_exe_args(args)
Add executable arguments to executable
- Parameters
args (str | list[str]) – executable arguments
- Raises
TypeError – if exe args are not strings
format_env_vars()
Format the environment variables for aprun
- Returns
list of env vars
- Return type
list[str]
format_run_args()
Return a list of ALPS formatted run arguments
- Returns
list of ALPS arguments for these settings
- Return type
list[str]
make_mpmd(aprun_settings)
Make job an MPMD job
This method combines two AprunSettings into a single MPMD command joined with ':'
- Parameters
aprun_settings (AprunSettings) – AprunSettings instance
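The ':' joining that make_mpmd describes follows the usual aprun MPMD form, where program blocks share one aprun invocation and are separated by ':'. The sketch below builds such a command from two hypothetical blocks. aprun_segment and make_mpmd_command are illustrative names, not SmartSim functions, and the executables ./sim and analysis.py are placeholders.

```python
from typing import Dict, List


def aprun_segment(exe: str, exe_args: List[str], run_args: Dict[str, str]) -> List[str]:
    """One aprun program block: long options, then the executable and its args."""
    segment: List[str] = []
    for key, value in run_args.items():
        segment += [f"--{key}", value]
    return segment + [exe] + exe_args


def make_mpmd_command(*segments: List[str]) -> List[str]:
    """Join program blocks with ':' after a single leading 'aprun'."""
    cmd: List[str] = ["aprun"]
    for i, segment in enumerate(segments):
        if i > 0:
            cmd.append(":")
        cmd.extend(segment)
    return cmd


first = aprun_segment("./sim", [], {"pes": "8", "pes-per-node": "4"})
second = aprun_segment("python", ["analysis.py"], {"pes": "1"})
print(" ".join(make_mpmd_command(first, second)))
# aprun --pes 8 --pes-per-node 4 ./sim : --pes 1 python analysis.py
```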
property run_command
Return the launch binary used to launch the executable
- Returns
launch binary e.g. mpiexec
- Type
str
set_cpus_per_task(num_cpus)
Set the number of cpus to use per task
This sets --cpus-per-pe
- Parameters
num_cpus (int) – number of cpus to use per task
set_hostlist(host_list)
Specify the hostlist for this job
- Parameters
host_list (str | list[str]) – hosts to launch on
- Raises
TypeError – if not str or list of str
set_tasks(num_tasks)
Set the number of tasks for this job
This sets --pes
- Parameters
num_tasks (int) – number of tasks
set_tasks_per_node(num_tpn)
Set the number of tasks per node for this job
This sets --pes-per-node
- Parameters
num_tpn (int) – number of tasks per node
update_env(env_vars)
Update the job environment variables
- Parameters
env_vars (dict[str, str]) – environment variables to update or add
QsubBatchSettings
QsubBatchSettings
are used to configure jobs that should
be launched as a batch on PBSPro systems.
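- QsubBatchSettings.set_account: Set the account for this batch job
- QsubBatchSettings.set_batch_command: Set the command used to launch the batch, e.g. qsub
- QsubBatchSettings.set_nodes: Set the number of nodes for this batch job
- QsubBatchSettings.set_ncpus: Set the number of cpus obtained in each node.
- QsubBatchSettings.set_queue: Set the queue for the batch job
- QsubBatchSettings.set_resource: Set a resource value for the Qsub batch
- QsubBatchSettings.set_walltime: Set the walltime of the job
- QsubBatchSettings.format_batch_args: Get the formatted batch arguments for a preview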
class QsubBatchSettings(nodes=None, ncpus=None, time=None, queue=None, account=None, resources=None, batch_args=None, **kwargs)
Specify qsub batch parameters for a job
nodes and ncpus are used to create the select statement for PBS if a select statement is not included in the resources. If both are supplied, the value for the select statement supplied in resources will override.
- Parameters
nodes (int, optional) – number of nodes for batch, defaults to None
ncpus (int, optional) – number of cpus per node, defaults to None
time (str, optional) – walltime for batch job, defaults to None
queue (str, optional) – queue to run batch in, defaults to None
account (str, optional) – account for batch launch, defaults to None
resources (dict[str, str], optional) – overrides for resource arguments, defaults to None
batch_args (dict[str, str], optional) – overrides for PBS batch arguments, defaults to None
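The select-statement rule above can be sketched with PBS's "-l select=N:ncpus=M" resource syntax. pbs_resources and qsub_lines are hypothetical helpers assuming that nodes and ncpus only produce a select when resources lacks one; they are not SmartSim's implementation.

```python
from typing import Dict, List, Optional


def pbs_resources(
    nodes: Optional[int] = None,
    ncpus: Optional[int] = None,
    resources: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Derive a PBS select statement from nodes/ncpus unless resources
    already contains one, in which case the user's select wins."""
    res = dict(resources or {})
    if "select" not in res and nodes is not None:
        select = str(nodes)
        if ncpus is not None:
            select += f":ncpus={ncpus}"
        res["select"] = select
    return res


def qsub_lines(res: Dict[str, str]) -> List[str]:
    """Render each resource as a '#PBS -l key=value' directive."""
    return [f"#PBS -l {key}={value}" for key, value in res.items()]


print(qsub_lines(pbs_resources(nodes=2, ncpus=36, resources={"walltime": "01:00:00"})))
# ['#PBS -l walltime=01:00:00', '#PBS -l select=2:ncpus=36']
```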
add_preamble(lines)
Add lines to the batch file preamble. The lines are written unmodified at the beginning of the batch file (after the WLM directives) and can be used to, e.g., start virtual environments before running the executables.
- Parameters
lines (str or list[str]) – lines to add to preamble.
property batch_cmd
Return the batch command
Tests whether the batch command path can be expanded. If it can, returns the expanded batch command; otherwise, returns the batch command as is.
- Returns
batch command
- Type
str
format_batch_args()
Get the formatted batch arguments for a preview
- Returns
batch arguments for Qsub
- Return type
list[str]
set_batch_command(command)
Set the command used to launch the batch, e.g. qsub
- Parameters
command (str) – batch command
set_hostlist(host_list)
Specify the hostlist for this job
- Parameters
host_list (str | list[str]) – hosts to launch on
- Raises
TypeError – if not str or list of str
set_ncpus(num_cpus)
Set the number of cpus obtained in each node.
If a select argument is provided in QsubBatchSettings.resources, then this value will be overridden.
- Parameters
num_cpus (int) – number of cpus per node in select
set_nodes(num_nodes)
Set the number of nodes for this batch job
If a select argument is provided in QsubBatchSettings.resources, this value will be overridden.
- Parameters
num_nodes (int) – number of nodes
CobaltBatchSettings
CobaltBatchSettings are used to configure jobs that should
be launched as a batch on Cobalt systems. They closely mimic
the QsubBatchSettings for PBSPro.
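- CobaltBatchSettings.set_account: Set the account for this batch job
- CobaltBatchSettings.set_batch_command: Set the command used to launch the batch, e.g. qsub
- CobaltBatchSettings.set_nodes: Set the number of nodes for this batch job
- CobaltBatchSettings.set_queue: Set the queue for the batch job
- CobaltBatchSettings.set_walltime: Set the walltime of the job
- CobaltBatchSettings.format_batch_args: Get the formatted batch arguments for a preview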
class CobaltBatchSettings(nodes=None, time='', queue=None, account=None, batch_args=None)
Specify settings for a Cobalt qsub batch launch
If the argument doesn't have a parameter, put None as the value, e.g. {'exclusive': None}.
Initialization values provided (nodes, time, account) will overwrite the same arguments in batch_args if present.
- Parameters
nodes (int, optional) – number of nodes, defaults to None
time (str, optional) – walltime for job, e.g. “10:00:00” for 10 hours, defaults to empty str
queue (str, optional) – queue to launch job in, defaults to None
account (str, optional) – account for job, defaults to None
batch_args (dict[str, str], optional) – extra batch arguments, defaults to None
add_preamble(lines)
Add lines to the batch file preamble. The lines are written unmodified at the beginning of the batch file (after the WLM directives) and can be used to, e.g., start virtual environments before running the executables.
- Parameters
lines (str or list[str]) – lines to add to preamble.
property batch_cmd
Return the batch command
Tests whether the batch command path can be expanded. If it can, returns the expanded batch command; otherwise, returns the batch command as is.
- Returns
batch command
- Type
str
format_batch_args()
Get the formatted batch arguments for a preview
- Returns
list of batch arguments for Cobalt qsub
- Return type
list[str]
set_batch_command(command)
Set the command used to launch the batch, e.g. qsub
- Parameters
command (str) – batch command
set_hostlist(host_list)
Specify the hostlist for this job
- Parameters
host_list (str | list[str]) – hosts to launch on
- Raises
TypeError – if not str or list of str
set_nodes(num_nodes)
Set the number of nodes for this batch job
- Parameters
num_nodes (int) – number of nodes
JsrunSettings
JsrunSettings can be used on any system that supports the
IBM LSF launcher.
JsrunSettings can be used in interactive sessions (on an allocation)
and within batch launches (i.e. BsubBatchSettings).
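- JsrunSettings.set_num_rs: Set the number of resource sets to use
- JsrunSettings.set_cpus_per_rs: Set the number of cpus to use per resource set
- JsrunSettings.set_gpus_per_rs: Set the number of gpus to use per resource set
- JsrunSettings.set_rs_per_host: Set the number of resource sets to use per host
- JsrunSettings.set_tasks: Set the number of tasks for this job
- JsrunSettings.set_tasks_per_rs: Set the number of tasks per resource set
- JsrunSettings.set_binding: Set binding
- JsrunSettings.make_mpmd: Make step an MPMD (or SPMD) job.
- JsrunSettings.set_mpmd_preamble: Set preamble used in ERF file.
- JsrunSettings.update_env: Update the job environment variables
- JsrunSettings.set_erf_sets: Set resource sets used for ERF (SPMD or MPMD) steps.
- JsrunSettings.format_env_vars: Format environment variables.
- JsrunSettings.format_run_args: Return a list of LSF formatted run arguments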
class JsrunSettings(exe, exe_args=None, run_args=None, env_vars=None)
Settings to run job with jsrun command
JsrunSettings can be used with the lsf launcher.
- Parameters
exe (str) – executable
exe_args (str | list[str], optional) – executable arguments, defaults to None
run_args (dict[str, str], optional) – arguments for run command, defaults to None
env_vars (dict[str, str], optional) – environment vars to launch job with, defaults to None
add_exe_args(args)
Add executable arguments to executable
- Parameters
args (str | list[str]) – executable arguments
- Raises
TypeError – if exe args are not strings
format_env_vars()
Format environment variables. Each variable needs to be passed with --env. If a variable is set to None, its value is propagated from the current environment.
- Returns
formatted string to export variables
- Return type
str
format_run_args()
Return a list of LSF formatted run arguments
- Returns
list of LSF arguments for these settings
- Return type
list[str]
make_mpmd(jsrun_settings=None)
Make step an MPMD (or SPMD) job.
This method will activate job execution through an ERF file.
Optionally, this method adds an instance of JsrunSettings to the list of settings to be launched in the same ERF file.
- Parameters
jsrun_settings (JsrunSettings, optional) – JsrunSettings instance, defaults to None
property run_command
Return the launch binary used to launch the executable
- Returns
launch binary e.g. mpiexec
- Type
str
set_binding(binding)
Set binding
This sets --bind
- Parameters
binding (str) – Binding, e.g. packed:21
set_cpus_per_rs(num_cpus)
Set the number of cpus to use per resource set
This sets --cpu_per_rs
- Parameters
num_cpus (int or str) – number of cpus to use per resource set or ALL_CPUS
set_erf_sets(erf_sets)
Set resource sets used for ERF (SPMD or MPMD) steps.
erf_sets is a dictionary used to fill the ERF line representing these settings, e.g. {"host": "1", "cpu": "{0:21}, {21:21}", "gpu": "*"} can be used to specify rank (or rank_count), hosts, cpus, gpus, and memory. The key rank is used to give specific ranks, as in {"rank": "1, 2, 5"}, while the key rank_count is used to specify the count only, as in {"rank_count": "3"}. If both are specified, only rank is used.
- Parameters
erf_sets (dict[str, str]) – dictionary of resources
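The rank/rank_count precedence rule can be illustrated with a small sketch that turns an erf_sets-style dictionary into one text line. erf_line is a hypothetical helper: the precedence (rank wins over rank_count) comes from the description above, while the exact line layout and the default count of "1" are assumptions made for illustration. Consult IBM's jsrun ERF documentation for the authoritative syntax.

```python
from typing import Dict


def erf_line(erf_sets: Dict[str, str]) -> str:
    """Render one ERF-style resource-set line from an erf_sets dict.

    If both "rank" and "rank_count" are present, only "rank" is used.
    The line layout and the default count of "1" are assumptions.
    """
    if "rank" in erf_sets:
        head = f"rank: {erf_sets['rank']}"
    else:
        head = erf_sets.get("rank_count", "1")
    resources = "; ".join(
        f"{key}: {erf_sets[key]}"
        for key in ("host", "cpu", "gpu", "memory")
        if key in erf_sets
    )
    return f"{head}: {{{resources}}}"


print(erf_line({"rank": "1, 2, 5", "rank_count": "3", "host": "1",
                "cpu": "{0:21}, {21:21}", "gpu": "*"}))
# rank: 1, 2, 5: {host: 1; cpu: {0:21}, {21:21}; gpu: *}
```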
set_gpus_per_rs(num_gpus)
Set the number of gpus to use per resource set
This sets --gpu_per_rs
- Parameters
num_gpus – number of gpus to use per resource set or ALL_GPUS
set_individual_output(suffix=None)
Set individual std output.
This sets --stdio_mode individual and inserts the suffix into the output name. The resulting output name will be self.name + suffix + .out.
- Parameters
suffix (str, optional) – Optional suffix to add to output file names; it can contain %j, %h, %p, or %t, as specified by jsrun options.
set_mpmd_preamble(preamble_lines)
Set preamble used in ERF file. Typical lines include oversubscribe-cpu : allow or overlapping-rs : allow. Can be used to set launch_distribution. If it is not present, it will be inferred from the settings, or set to packed by default.
- Parameters
preamble_lines (list[str]) – lines to put at the beginning of the ERF file.
set_num_rs(num_rs)
Set the number of resource sets to use
This sets --nrs.
- Parameters
num_rs (int or str) – Number of resource sets or ALL_HOSTS
set_rs_per_host(num_rs)
Set the number of resource sets to use per host
This sets --rs_per_host
- Parameters
num_rs (int) – number of resource sets to use per host
set_tasks(num_tasks)
Set the number of tasks for this job
This sets --np
- Parameters
num_tasks (int) – number of tasks
set_tasks_per_rs(num_tprs)
Set the number of tasks per resource set
This sets --tasks_per_rs
- Parameters
num_tprs (int) – number of tasks per resource set
update_env(env_vars)
Update the job environment variables
- Parameters
env_vars (dict[str, str]) – environment variables to update or add
BsubBatchSettings
BsubBatchSettings
are used to configure jobs that should
be launched as a batch on LSF systems.
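- BsubBatchSettings.set_walltime: Set the walltime
- BsubBatchSettings.set_smts: Set SMTs
- BsubBatchSettings.set_project: Set the project
- BsubBatchSettings.set_nodes: Set the number of nodes for this batch job
- BsubBatchSettings.set_expert_mode_req: Set allocation for expert mode.
- BsubBatchSettings.set_hostlist: Specify the hostlist for this job
- BsubBatchSettings.set_tasks: Set the number of tasks for this job
- BsubBatchSettings.format_batch_args: Get the formatted batch arguments for a preview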
class BsubBatchSettings(nodes=None, time=None, project=None, batch_args=None, smts=None, **kwargs)
Specify bsub batch parameters for a job
- Parameters
nodes (int, optional) – number of nodes for batch, defaults to None
time (str, optional) – walltime for batch job in format hh:mm, defaults to None
project (str, optional) – project for batch launch, defaults to None
batch_args (dict[str, str], optional) – overrides for LSF batch arguments, defaults to None
smts (int, optional) – SMTs, defaults to None
add_preamble(lines)
Add lines to the batch file preamble. The lines are written unmodified at the beginning of the batch file (after the WLM directives) and can be used to, e.g., start virtual environments before running the executables.
- Parameters
lines (str or list[str]) – lines to add to preamble.
property batch_cmd
Return the batch command
Tests whether the batch command path can be expanded. If it can, returns the expanded batch command; otherwise, returns the batch command as is.
- Returns
batch command
- Type
str
format_batch_args()
Get the formatted batch arguments for a preview
- Returns
list of batch arguments for Bsub
- Return type
list[str]
set_account(acct)
set_batch_command(command)
Set the command used to launch the batch, e.g. bsub
- Parameters
command (str) – batch command
set_expert_mode_req(res_req, slots)
Set allocation for expert mode. This will activate expert mode (-csm) and disregard all other allocation options.
This sets -csm -n slots -R res_req
set_hostlist(host_list)
Specify the hostlist for this job
- Parameters
host_list (str | list[str]) – hosts to launch on
- Raises
TypeError – if not str or list of str
set_nodes(num_nodes)
Set the number of nodes for this batch job
This sets -nnodes.
- Parameters
num_nodes (int) – number of nodes
set_smts(smts)
Set SMTs
This sets -alloc_flags. If the user sets SMT explicitly through -alloc_flags, then that takes precedence.
- Parameters
smts (int) – SMT (e.g. on Summit: 1, 2, or 4)
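The precedence rule for SMTs can be sketched as a small renderer of #BSUB directives, assuming that SMT level n is expressed as the alloc flag "smt" followed by n (as on Summit) and that explicit nodes, time and project values override batch_args. bsub_directives is a hypothetical helper, and the project name ABC123 is a placeholder; -nnodes, -W, -P and -alloc_flags are the options this section refers to.

```python
from typing import Dict, List, Optional


def bsub_directives(
    nodes: Optional[int] = None,
    time: Optional[str] = None,
    project: Optional[str] = None,
    smts: Optional[int] = None,
    batch_args: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Render #BSUB lines; a user-supplied alloc_flags beats smts."""
    args: Dict[str, str] = dict(batch_args or {})
    if nodes is not None:
        args["nnodes"] = str(nodes)
    if time is not None:
        args["W"] = time  # walltime in hh:mm
    if project is not None:
        args["P"] = project
    # Assumed Summit-style flag such as "smt4"; explicit alloc_flags wins
    if smts is not None and "alloc_flags" not in args:
        args["alloc_flags"] = f"smt{smts}"
    return [f"#BSUB -{key} {value}" for key, value in args.items()]


print(bsub_directives(nodes=2, time="01:30", project="ABC123", smts=4))
print(bsub_directives(smts=4, batch_args={"alloc_flags": "smt1"}))
```

In the second call, the explicit alloc_flags of smt1 is kept and smts=4 is ignored, matching the precedence stated for set_smts.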
The following are RunSettings types that are supported on multiple
launchers:
MpirunSettings
MpirunSettings are for launching with OpenMPI. MpirunSettings are
supported on Slurm, PBSPro, and Cobalt.
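- MpirunSettings.set_cpus_per_task: Set the number of cpus to use per task
- MpirunSettings.set_hostlist: Set the hostlist for the mpirun command
- MpirunSettings.set_tasks: Set the number of tasks for this job
- MpirunSettings.set_task_map: Set mpirun task mapping
- MpirunSettings.make_mpmd: Make a mpmd workload by combining two mpirun commands
- MpirunSettings.add_exe_args: Add executable arguments to executable
- MpirunSettings.format_run_args: Return a list of OpenMPI formatted run arguments
- MpirunSettings.format_env_vars: Format the environment variables for mpirun
- MpirunSettings.update_env: Update the job environment variables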
class MpirunSettings(exe, exe_args=None, run_args=None, env_vars=None)
Settings to run job with mpirun command (OpenMPI)
Note that environment variables can be passed with a None value to signify that they should be exported from the current environment.
Any arguments passed in the run_args dict will be converted into mpirun arguments and prefixed with --. Values of None can be provided for arguments that do not have values.
- Parameters
exe (str) – executable
exe_args (str | list[str], optional) – executable arguments, defaults to None
run_args (dict[str, str], optional) – arguments for run command, defaults to None
env_vars (dict[str, str], optional) – environment vars to launch job with, defaults to None
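Both rules above (a "--" prefix for every run argument, and None meaning "take the value from the current environment") can be sketched in plain Python. OpenMPI's mpirun forwards environment variables with -x, either as -x NAME=value or as a bare -x NAME to copy the launching shell's value. format_mpirun_args and format_mpirun_env are hypothetical helpers, not SmartSim's methods, and ./a.out is a placeholder executable.

```python
from typing import Dict, List, Optional


def format_mpirun_args(run_args: Dict[str, Optional[str]]) -> List[str]:
    """Prefix every key with '--'; None means the option takes no value."""
    args: List[str] = []
    for key, value in run_args.items():
        args.append(f"--{key}")
        if value is not None:
            args.append(str(value))
    return args


def format_mpirun_env(env_vars: Dict[str, Optional[str]]) -> List[str]:
    """Export variables with OpenMPI's -x; a bare name (None value) asks
    mpirun to forward the variable's value from the launching environment."""
    args: List[str] = []
    for key, value in env_vars.items():
        args += ["-x", key if value is None else f"{key}={value}"]
    return args


cmd = ["mpirun",
       *format_mpirun_args({"n": "4", "oversubscribe": None}),
       *format_mpirun_env({"OMP_NUM_THREADS": "2", "HOME": None}),
       "./a.out"]
print(" ".join(cmd))
# mpirun --n 4 --oversubscribe -x OMP_NUM_THREADS=2 -x HOME ./a.out
```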
add_exe_args(args)
Add executable arguments to executable
- Parameters
args (str | list[str]) – executable arguments
- Raises
TypeError – if exe args are not strings
format_env_vars()
Format the environment variables for mpirun
Automatically exports PYTHONPATH, LD_LIBRARY_PATH and PATH
- Returns
list of env vars
- Return type
list[str]
format_run_args()
Return a list of OpenMPI formatted run arguments
- Returns
list of OpenMPI arguments for these settings
- Return type
list[str]
make_mpmd(mpirun_settings)
Make a mpmd workload by combining two mpirun commands
This connects the two settings to be executed with a single Model instance.
- Parameters
mpirun_settings (MpirunSettings) – MpirunSettings instance
property run_command
Return the launch binary used to launch the executable
- Returns
launch binary e.g. mpiexec
- Type
str
set_cpus_per_task(num_cpus)
Set the number of cpus to use per task
This sets --cpus-per-proc
Note: this option has been deprecated in OpenMPI 4.0+ and will soon be replaced.
- Parameters
num_cpus (int) – number of cpus to use per task
set_hostlist(host_list)
Set the hostlist for the mpirun command
- Parameters
host_list (str | list[str]) – list of host names
- Raises
TypeError – if not str or list of str
set_task_map(task_mapping)
Set mpirun task mapping
This sets --map-by <mapping>
For examples, see the man page for mpirun
- Parameters
task_mapping (str) – task mapping
set_tasks(num_tasks)
Set the number of tasks for this job
This sets --n
- Parameters
num_tasks (int) – number of tasks
update_env(env_vars)
Update the job environment variables
- Parameters
env_vars (dict[str, str]) – environment variables to update or add
Orchestrator
The Orchestrator API is implemented for each launcher that
SmartSim supports:
- Slurm
- Cobalt
- PBSPro
- LSF
The base Orchestrator class can be used for launching Redis
locally on single node workstations or laptops.
Local Orchestrator¶
The Orchestrator base class can be launched through the local launcher and does not support cluster instances.
-
class
Orchestrator
(port=6379, interface='lo', **kwargs)[source]¶ The Orchestrator is an in-memory database that can be launched alongside entities in SmartSim. Data can be transferred between entities by using one of the Python, C, C++ or Fortran clients within an entity.
Initialize an Orchestrator reference for local launch
- Parameters
port (int, optional) – TCP/IP port, defaults to 6379
interface (str, optional) – network interface, defaults to “lo”
Extra configurations for RedisAI
See https://oss.redislabs.com/redisai/configuration/
- Parameters
threads_per_queue (int, optional) – threads per GPU device
inter_op_threads (int, optional) – threads across CPU operations
intra_op_threads (int, optional) – threads per CPU operation
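RedisAI reads configuration directives such as `THREADS_PER_QUEUE`, `INTER_OP_PARALLELISM` and `INTRA_OP_PARALLELISM` when the module is loaded (see the configuration page linked above). The sketch below assumes the three keyword arguments map onto those directives and shows how they could be appended to a `redis-server --loadmodule` argument list. `redisai_load_args` and the module path are hypothetical.

```python
from typing import List, Optional


def redisai_load_args(
    module_path: str,
    threads_per_queue: Optional[int] = None,
    inter_op_threads: Optional[int] = None,
    intra_op_threads: Optional[int] = None,
) -> List[str]:
    """Hypothetical helper: build a redis-server --loadmodule argument list."""
    args = ["--loadmodule", module_path]
    # Assumed mapping of the Orchestrator kwargs onto RedisAI directives.
    directives = {
        "THREADS_PER_QUEUE": threads_per_queue,
        "INTER_OP_PARALLELISM": inter_op_threads,
        "INTRA_OP_PARALLELISM": intra_op_threads,
    }
    for name, value in directives.items():
        if value is not None:  # only pass directives the user set
            args += [name, str(value)]
    return args


print(redisai_load_args("/path/to/redisai.so", threads_per_queue=2))
# ['--loadmodule', '/path/to/redisai.so', 'THREADS_PER_QUEUE', '2']
```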
-
property
batch
¶
-
check_cluster_status
(trials=10)[source]¶ Check that a cluster is up and running
- Parameters
trials (int, optional) – number of attempts to verify cluster status, defaults to 10
- Raises
SmartSimError – If cluster status cannot be verified
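A polling loop like the one below captures the documented behavior of `check_cluster_status`: try up to `trials` times, then fail. The `is_up` probe, the delay between attempts, and `ClusterNotReadyError` (standing in for `SmartSimError`) are assumptions; how SmartSim actually checks cluster health is not described here.

```python
import time
from typing import Callable


class ClusterNotReadyError(RuntimeError):
    """Hypothetical stand-in for the SmartSimError raised on failure."""


def wait_for_cluster(
    is_up: Callable[[], bool], trials: int = 10, delay: float = 1.0
) -> None:
    """Poll is_up up to `trials` times before giving up."""
    for attempt in range(trials):
        if is_up():
            return
        if attempt < trials - 1:
            time.sleep(delay)  # wait before the next attempt
    raise ClusterNotReadyError(f"cluster not verified after {trials} trials")


# Succeeds on the third probe.
calls = iter([False, False, True])
wait_for_cluster(lambda: next(calls), trials=5, delay=0.0)
```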
-
get_address
()[source]¶ Return database addresses
- Returns
addresses
- Return type
list[str]
- Raises
SmartSimError – If database address cannot be found or is not active
-
property
hosts
¶ Return the hostnames of orchestrator instance hosts
Note that this will only be populated after the orchestrator has been launched by SmartSim.
- Returns
hostnames
- Return type
list[str]
-
is_active
()[source]¶ Check if the database is active
- Returns
True if database is active, False otherwise
- Return type
bool
-
property
num_shards
¶ Return the number of DB shards contained in the orchestrator. This might differ from the number of DBNode objects, as each DBNode may start more than one shard (e.g. with MPMD).
- Returns
num_shards
- Return type
int
-
set_path
(new_path)¶
-
property
type
¶ Return the name of the class
PBSPro Orchestrator¶
The PBSPro Orchestrator can be launched as a batch or in an interactive allocation.
-
class
PBSOrchestrator
(port=6379, db_nodes=1, batch=True, hosts=None, run_command='aprun', interface='ipogif0', account=None, time=None, queue=None, **kwargs)[source]¶ Bases:
smartsim.database.orchestrator.Orchestrator
Initialize an Orchestrator reference for PBSPro based systems
The PBSOrchestrator launches as a batch by default. If batch=False, at launch, the PBSOrchestrator will look for an interactive allocation to launch on.
The PBS orchestrator does not support multiple databases per node.
If mpirun is specified as the run_command, then the hosts argument is required.
- Parameters
port (int) – TCP/IP port
db_nodes (int, optional) – number of compute nodes to span across, defaults to 1
batch (bool, optional) – run as a batch workload, defaults to True
hosts (list[str]) – specify hosts to launch on, defaults to None
run_command (str, optional) – specify launch binary. Options are mpirun and aprun, defaults to “aprun”
interface (str, optional) – network interface to use, defaults to “ipogif0”
account (str, optional) – account to run batch on
time (str, optional) – walltime for batch ‘HH:MM:SS’ format
queue (str, optional) – queue to launch batch in
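The argument rules stated for `PBSOrchestrator` can be expressed as a small validation function. `check_pbs_launch_args` is hypothetical, `ValueError` stands in for whatever error SmartSim raises, and the check that there are at least `db_nodes` hosts is an inference from the note that multiple databases per node are not supported.

```python
from typing import List, Optional


def check_pbs_launch_args(
    run_command: str, hosts: Optional[List[str]], db_nodes: int
) -> None:
    """Hypothetical validation mirroring the documented PBSOrchestrator rules."""
    if run_command not in ("aprun", "mpirun"):
        raise ValueError(f"unsupported run_command: {run_command!r}")
    # With mpirun, the hosts to launch on must be given explicitly.
    if run_command == "mpirun" and not hosts:
        raise ValueError("hosts is required when run_command='mpirun'")
    # Assumed: one shard per host, since multiple databases per node are unsupported.
    if hosts is not None and len(hosts) < db_nodes:
        raise ValueError(f"{db_nodes} db_nodes requested but only {len(hosts)} hosts given")


check_pbs_launch_args("aprun", None, db_nodes=3)  # fine: aprun does not need hosts
check_pbs_launch_args("mpirun", ["n1", "n2", "n3"], db_nodes=3)  # fine
```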
-
property
batch
¶
-
check_cluster_status
(trials=10)¶ Check that a cluster is up and running
- Parameters
trials (int, optional) – number of attempts to verify cluster status, defaults to 10
- Raises
SmartSimError – If cluster status cannot be verified
-
get_address
()¶ Return database addresses
- Returns
addresses
- Return type
list[str]
- Raises
SmartSimError – If database address cannot be found or is not active
-
property
hosts
¶ Return the hostnames of orchestrator instance hosts
Note that this will only be populated after the orchestrator has been launched by SmartSim.
- Returns
hostnames
- Return type
list[str]
-
is_active
()¶ Check if the database is active
- Returns
True if database is active, False otherwise
- Return type
bool
-
property
num_shards
¶ Return the number of DB shards contained in the orchestrator. This might differ from the number of DBNode objects, as each DBNode may start more than one shard (e.g. with MPMD).
- Returns
num_shards
- Return type
int
-
remove_stale_files
()¶ Can be used to remove database files of a previous launch
-
set_batch_arg
(arg, value)[source]¶ Set a qsub argument the PBSOrchestrator should launch with
Some commonly used arguments, such as -e, are used by SmartSim and will not be allowed to be set.
- Parameters
arg (str) – batch argument to set e.g. “A” for account
value (str | None) – batch param - set to None if no param value
- Raises
SmartSimError – if orchestrator not launching as batch
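Batch arguments end up as `qsub` flags. The sketch below assumes each `(arg, value)` pair is rendered as `-arg value`, with `None` producing a bare flag, and that reserved arguments are rejected. `format_qsub_args` is hypothetical, and `RESERVED_QSUB_ARGS` only contains `e` because that is the one example the documentation names.

```python
from typing import Dict, List, Optional

# Assumed reserved set: the documentation only names "e" as an example.
RESERVED_QSUB_ARGS = {"e"}


def format_qsub_args(batch_args: Dict[str, Optional[str]]) -> List[str]:
    """Hypothetical helper: render batch arguments as qsub command-line flags."""
    formatted: List[str] = []
    for arg, value in batch_args.items():
        if arg in RESERVED_QSUB_ARGS:
            raise ValueError(f"-{arg} is set by SmartSim and cannot be overridden")
        formatted.append(f"-{arg}")
        if value is not None:  # None means a flag without a value
            formatted.append(value)
    return formatted


print(format_qsub_args({"A": "my_account", "V": None}))
# ['-A', 'my_account', '-V']
```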
-
set_cpus
(num_cpus)[source]¶ Set the number of CPUs available to each database shard
This effectively determines how many CPUs can be used for compute threads, background threads, and network I/O.
- Parameters
num_cpus (int) – number of cpus to set
-
set_hosts
(host_list)[source]¶ Specify the hosts for the PBSOrchestrator to launch on
- Parameters
host_list (str | list[str]) – list of hosts (compute node names)
- Raises
TypeError – if host_list is wrong type
-
set_path
(new_path)¶
-
set_run_arg
(arg, value)[source]¶ Set a run argument the orchestrator should launch each node with (it will be passed to aprun)
Some commonly used arguments are used by SmartSim and will not be allowed to be set.
- Parameters
arg (str) – run argument to set
value (str | None) – run parameter - set to None if no parameter value
-
set_walltime
(walltime)[source]¶ Set the batch walltime of the orchestrator
Note: This will only affect orchestrators launched as a batch
- Parameters
walltime (str) – amount of time e.g. 10 hours is 10:00:00
- Raises
SmartSimError – if orchestrator isn’t launching as batch
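The walltime is a plain `HH:MM:SS` string. If it is computed from a duration, a helper along these lines (hypothetical, not part of SmartSim) avoids formatting mistakes such as `10:0:0`:

```python
def format_walltime(hours: int = 0, minutes: int = 0, seconds: int = 0) -> str:
    """Hypothetical helper: build an 'HH:MM:SS' walltime string."""
    total = hours * 3600 + minutes * 60 + seconds
    if total <= 0:
        raise ValueError("walltime must be positive")
    hh, rem = divmod(total, 3600)
    mm, ss = divmod(rem, 60)
    return f"{hh:02d}:{mm:02d}:{ss:02d}"


print(format_walltime(hours=10))  # 10:00:00
print(format_walltime(minutes=90))  # 01:30:00
```

For `LSFOrchestrator`, whose walltime format is `HH:MM`, the seconds field would be dropped.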
-
property
type
¶ Return the name of the class
Slurm Orchestrator¶
The SlurmOrchestrator
is used to launch Redis onto Slurm WLM
systems and can be launched as a batch, on existing allocations,
or in an interactive allocation.
-
class
SlurmOrchestrator
(port=6379, db_nodes=1, batch=True, hosts=None, run_command='srun', account=None, time=None, alloc=None, db_per_host=1, interface='ipogif0', **kwargs)[source]¶ Bases:
smartsim.database.orchestrator.Orchestrator
Initialize an Orchestrator reference for Slurm based systems
The orchestrator launches as a batch by default. The Slurm orchestrator can also be given an allocation to run on. If no allocation is provided, and batch=False, at launch, the orchestrator will look for an interactive allocation to launch on.
The SlurmOrchestrator port provided will be incremented if multiple databases per node are launched.
SlurmOrchestrator supports launching with both srun and mpirun as launch binaries. If mpirun is used, the hosts parameter should be populated with length equal to that of the db_nodes argument.
- Parameters
port (int) – TCP/IP port
db_nodes (int, optional) – number of database shards, defaults to 1
batch (bool, optional) – Run as a batch workload, defaults to True
hosts (list[str]) – specify hosts to launch on
run_command (str, optional) – specify launch binary. Options are “mpirun” and “srun”, defaults to “srun”
account (str, optional) – account to run batch on
time (str, optional) – walltime for batch ‘HH:MM:SS’ format
alloc (str, optional) – allocation to launch on, defaults to None
db_per_host (int, optional) – number of database shards per system host (MPMD), defaults to 1
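The address list returned by `get_address` can be pictured as one `host:port` entry per shard. The sketch assumes the shards on a host listen on `port`, `port + 1`, and so on, following the note above that the port is incremented when several databases are launched per node. `shard_addresses` is a hypothetical helper, and the host names are made up.

```python
from typing import List


def shard_addresses(hosts: List[str], port: int = 6379, db_per_host: int = 1) -> List[str]:
    """Hypothetical sketch: one "host:port" address per database shard.

    Assumes shard i on a host listens on port + i.
    """
    if db_per_host < 1:
        raise ValueError("db_per_host must be at least 1")
    return [f"{host}:{port + i}" for host in hosts for i in range(db_per_host)]


print(shard_addresses(["nid00001", "nid00002"], db_per_host=2))
# ['nid00001:6379', 'nid00001:6380', 'nid00002:6379', 'nid00002:6380']
```

Under this assumption, the number of shards is simply `len(hosts) * db_per_host`.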
-
property
batch
¶
-
check_cluster_status
(trials=10)¶ Check that a cluster is up and running
- Parameters
trials (int, optional) – number of attempts to verify cluster status, defaults to 10
- Raises
SmartSimError – If cluster status cannot be verified
-
get_address
()¶ Return database addresses
- Returns
addresses
- Return type
list[str]
- Raises
SmartSimError – If database address cannot be found or is not active
-
property
hosts
¶ Return the hostnames of orchestrator instance hosts
Note that this will only be populated after the orchestrator has been launched by SmartSim.
- Returns
hostnames
- Return type
list[str]
-
is_active
()¶ Check if the database is active
- Returns
True if database is active, False otherwise
- Return type
bool
-
property
num_shards
¶ Return the number of DB shards contained in the orchestrator. This might differ from the number of DBNode objects, as each DBNode may start more than one shard (e.g. with MPMD).
- Returns
num_shards
- Return type
int
-
remove_stale_files
()¶ Can be used to remove database files of a previous launch
-
set_batch_arg
(arg, value)[source]¶ Set a Sbatch argument the orchestrator should launch with
Some commonly used arguments, such as --job-name, are used by SmartSim and will not be allowed to be set.
- Parameters
arg (str) – batch argument to set e.g. “exclusive”
value (str | None) – batch param - set to None if no param value
- Raises
SmartSimError – if orchestrator not launching as batch
-
set_cpus
(num_cpus)[source]¶ Set the number of CPUs available to each database shard
This effectively determines how many CPUs can be used for compute threads, background threads, and network I/O.
- Parameters
num_cpus (int) – number of cpus to set
-
set_hosts
(host_list)[source]¶ Specify the hosts for the SlurmOrchestrator to launch on
- Parameters
host_list (str | list[str]) – list of hosts (compute node names)
- Raises
TypeError – if wrong type
-
set_path
(new_path)¶
-
set_run_arg
(arg, value)[source]¶ Set a run argument the orchestrator should launch each node with (it will be passed to srun)
Some commonly used arguments are used by SmartSim and will not be allowed to be set. For example, “n”, “N”, etc.
- Parameters
arg (str) – run argument to set
value (str | None) – run parameter - set to None if no parameter value
-
set_walltime
(walltime)[source]¶ Set the batch walltime of the orchestrator
Note: This will only affect orchestrators launched as a batch
- Parameters
walltime (str) – amount of time e.g. 10 hours is 10:00:00
- Raises
SmartSimError – if orchestrator isn’t launching as batch
-
property
type
¶ Return the name of the class
Cobalt Orchestrator¶
The CobaltOrchestrator
can be launched as a batch or
in an interactive allocation.
-
class
CobaltOrchestrator
(port=6379, db_nodes=1, batch=True, hosts=None, run_command='aprun', interface='ipogif0', account=None, queue=None, time=None, **kwargs)[source]¶ Bases:
smartsim.database.orchestrator.Orchestrator
Initialize an Orchestrator reference for Cobalt based systems
The orchestrator launches as a batch by default. If batch=False, at launch, the orchestrator will look for an interactive allocation to launch on.
The Cobalt orchestrator does not support multiple databases per node.
- Parameters
port (int) – TCP/IP port, defaults to 6379
db_nodes (int, optional) – number of database shards, defaults to 1
batch (bool, optional) – Run as a batch workload, defaults to True
hosts (list[str]) – specify hosts to launch on, defaults to None. Optional if not launching with OpenMPI
run_command (str, optional) – specify launch binary. Options are mpirun and aprun, defaults to “aprun”.
interface (str, optional) – network interface to use, defaults to “ipogif0”
account (str, optional) – account to run batch on
queue (str, optional) – queue to launch batch in
time (str, optional) – walltime for batch ‘HH:MM:SS’ format
-
property
batch
¶
-
check_cluster_status
(trials=10)¶ Check that a cluster is up and running
- Parameters
trials (int, optional) – number of attempts to verify cluster status, defaults to 10
- Raises
SmartSimError – If cluster status cannot be verified
-
get_address
()¶ Return database addresses
- Returns
addresses
- Return type
list[str]
- Raises
SmartSimError – If database address cannot be found or is not active
-
property
hosts
¶ Return the hostnames of orchestrator instance hosts
Note that this will only be populated after the orchestrator has been launched by SmartSim.
- Returns
hostnames
- Return type
list[str]
-
is_active
()¶ Check if the database is active
- Returns
True if database is active, False otherwise
- Return type
bool
-
property
num_shards
¶ Return the number of DB shards contained in the orchestrator. This might differ from the number of DBNode objects, as each DBNode may start more than one shard (e.g. with MPMD).
- Returns
num_shards
- Return type
int
-
remove_stale_files
()¶ Can be used to remove database files of a previous launch
-
set_batch_arg
(arg, value)[source]¶ Set a Cobalt qsub argument
Some commonly used arguments are used by SmartSim and will not be allowed to be set. For example, “cwd”, “jobname”, etc.
- Parameters
arg (str) – batch argument to set e.g. “exclusive”
value (str | None) – batch param - set to None if no param value
- Raises
SmartSimError – if orchestrator not launching as batch
-
set_cpus
(num_cpus)[source]¶ Set the number of CPUs available to each database shard
This effectively determines how many CPUs can be used for compute threads, background threads, and network I/O.
- Parameters
num_cpus (int) – number of cpus to set
-
set_hosts
(host_list)[source]¶ Specify the hosts for the CobaltOrchestrator to launch on
- Parameters
host_list (str | list[str]) – list of hosts (compute node names)
- Raises
TypeError – if wrong type
-
set_path
(new_path)¶
-
set_run_arg
(arg, value)[source]¶ Set a run argument the orchestrator should launch each node with (it will be passed to aprun)
Some commonly used arguments are used by SmartSim and will not be allowed to be set. For example, “wdir”, “n”, etc.
- Parameters
arg (str) – run argument to set
value (str | None) – run parameter - set to None if no parameter value
-
set_walltime
(walltime)[source]¶ Set the batch walltime of the orchestrator
Note: This will only affect orchestrators launched as a batch
- Parameters
walltime (str) – amount of time e.g. 10 hours is 10:00:00
- Raises
SmartSimError – if orchestrator isn’t launching as batch
-
property
type
¶ Return the name of the class
LSF Orchestrator¶
The LSFOrchestrator
can be launched as a batch or
in an interactive allocation.
-
class
LSFOrchestrator
(port=6379, db_nodes=1, cpus_per_shard=4, gpus_per_shard=0, batch=True, hosts=None, project=None, time=None, db_per_host=1, interface='ib0', **kwargs)[source]¶ Bases:
smartsim.database.orchestrator.Orchestrator
Initialize an Orchestrator reference for LSF based systems
The orchestrator launches as a batch by default. If batch=False, at launch, the orchestrator will look for an interactive allocation to launch on.
The LSFOrchestrator port provided will be incremented if multiple databases per host are launched (db_per_host>1).
Each database shard is assigned a resource set with CPUs and GPUs allocated contiguously on the host: it is the user’s responsibility to check if enough resources are available on each host.
A list of hosts to launch the database on can be specified. These addresses must correspond to those of the first db_nodes//db_per_host compute nodes in the allocation: for example, for 8 db_nodes and 2 db_per_host, the host_list must contain the addresses of hosts 1, 2, 3, and 4.
LSFOrchestrator is launched with only one jsrun command as launch binary, and an Explicit Resource File (ERF) which is automatically generated. The orchestrator is always launched on the first db_nodes//db_per_host compute nodes in the allocation.
- Parameters
port (int) – TCP/IP port
db_nodes (int, optional) – number of database shards, defaults to 1
cpus_per_shard (int, optional) – cpus to allocate per shard, defaults to 4
gpus_per_shard (int, optional) – gpus to allocate per shard, defaults to 0
batch (bool, optional) – Run as a batch workload, defaults to True
hosts (list[str], optional) – specify hosts to launch on
project (str, optional) – project to run batch on
time (str, optional) – walltime for batch ‘HH:MM’ format
db_per_host (int, optional) – number of database shards per system host (MPMD), defaults to 1
interface (str) – network interface to use
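The placement rules above (shards on the first `db_nodes//db_per_host` hosts, CPUs assigned contiguously per shard) can be sketched as follows. `lsf_layout` is hypothetical and does not generate an actual ERF; the assumptions that CPU numbering starts at 0 on each host and that `db_nodes` is a multiple of `db_per_host` are made only for illustration.

```python
from typing import List, Tuple


def lsf_layout(
    allocation_hosts: List[str], db_nodes: int, db_per_host: int, cpus_per_shard: int
) -> List[Tuple[str, range]]:
    """Hypothetical sketch of the documented LSF placement rules."""
    if db_nodes % db_per_host != 0:
        raise ValueError("db_nodes is assumed to be a multiple of db_per_host")
    num_hosts = db_nodes // db_per_host
    if len(allocation_hosts) < num_hosts:
        raise ValueError(f"need {num_hosts} hosts, got {len(allocation_hosts)}")
    layout = []
    # Shards go on the first db_nodes // db_per_host hosts of the allocation.
    for host in allocation_hosts[:num_hosts]:
        for shard in range(db_per_host):
            start = shard * cpus_per_shard  # CPUs are assigned contiguously
            layout.append((host, range(start, start + cpus_per_shard)))
    return layout


layout = lsf_layout(["h1", "h2", "h3", "h4", "h5"], db_nodes=8, db_per_host=2, cpus_per_shard=4)
for host, cpus in layout:
    print(host, list(cpus))
```

With 8 shards and `db_per_host=2`, only `h1` through `h4` are used, matching the example in the description.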
-
property
batch
¶
-
check_cluster_status
(trials=10)¶ Check that a cluster is up and running
- Parameters
trials (int, optional) – number of attempts to verify cluster status, defaults to 10
- Raises
SmartSimError – If cluster status cannot be verified
-
get_address
()¶ Return database addresses
- Returns
addresses
- Return type
list[str]
- Raises
SmartSimError – If database address cannot be found or is not active
-
property
hosts
¶ Return the hostnames of orchestrator instance hosts
Note that this will only be populated after the orchestrator has been launched by SmartSim.
- Returns
hostnames
- Return type
list[str]
-
is_active
()¶ Check if the database is active
- Returns
True if database is active, False otherwise
- Return type
bool
-
property
num_shards
¶ Return the number of DB shards contained in the orchestrator. This might differ from the number of DBNode objects, as each DBNode may start more than one shard (e.g. with MPMD).
- Returns
num_shards
- Return type
int
-
remove_stale_files
()¶ Can be used to remove database files of a previous launch
-
set_batch_arg
(arg, value)[source]¶ Set a bsub argument
Some commonly used arguments are used by SmartSim and will not be allowed to be set. For example, “m”, “n”, etc.
- Parameters
arg (str) – batch argument to set e.g. “exclusive”
value (str | None) – batch param - set to None if no param value
- Raises
SmartSimError – if orchestrator not launching as batch
-
set_hosts
(host_list)[source]¶ Specify the hosts for the LSFOrchestrator to launch on
- Parameters
host_list (str | list[str]) – list of hosts (compute node names)
- Raises
TypeError – if wrong type
-
set_path
(new_path)¶
-
set_run_arg
(arg, value)[source]¶ Set a run argument the orchestrator should launch each node with (it will be passed to jsrun)
Some commonly used arguments are used by SmartSim and will not be allowed to be set. For example, “chdir”, “np”, etc.
- Parameters
arg (str) – run argument to set
value (str | None) – run parameter - set to None if no parameter value
-
set_walltime
(walltime)[source]¶ Set the batch walltime of the orchestrator
Note: This will only affect orchestrators launched as a batch
- Parameters
walltime (str) – amount of time e.g. 10 hours is 10:00
- Raises
SmartSimError – if orchestrator isn’t launching as batch
-
property
type
¶ Return the name of the class
Entity¶
Ensemble¶
|
Initialize an Ensemble of Model instances. |
|
Add a model to this ensemble |
|
Attach files to each model within the ensemble for generation |
Register future communication between entities. |
|
If called, all models within this ensemble will prefix their keys with their own model names. |
|
Inquire as to whether each model within the ensemble will prefix its keys |
-
class
Ensemble
(name, params, batch_settings=None, run_settings=None, perm_strat='all_perm', **kwargs)[source]¶ Bases:
smartsim.entity.entityList.EntityList
Ensemble is a group of Model instances that can be treated as a reference to a single instance.
Initialize an Ensemble of Model instances.
The kwargs argument can be used to pass custom input parameters to the permutation strategy.
- Parameters
name (str) – name of the ensemble
params (dict[str, Any]) – parameters to expand into Model members
batch_settings (BatchSettings, optional) – describes settings for Ensemble as batch workload
run_settings (RunSettings, optional) – describes how each Model should be executed
replicas (int, optional) – number of Model replicas to create (passed as a keyword argument through kwargs)
perm_strat (str | callable) – strategy for expanding params into Model instances. Options are “all_perm”, “stepped”, “random”, or a callable function. Defaults to “all_perm”.
- Returns
Ensemble instance
- Return type
Ensemble
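The names of the permutation strategies suggest how `params` is expanded. The sketch below assumes `all_perm` takes the Cartesian product of all value lists, `stepped` pairs the i-th values of each list, and `random` samples from the full product. These are illustrations of those techniques, not SmartSim's own functions, and the `n_models` argument of `random_perm` is hypothetical.

```python
import itertools
import random
from typing import Any, Dict, List


def all_perm(params: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Every combination of parameter values (Cartesian product)."""
    names = list(params)
    return [dict(zip(names, combo)) for combo in itertools.product(*params.values())]


def stepped(params: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """The i-th value of every parameter goes into the i-th model."""
    names = list(params)
    return [dict(zip(names, combo)) for combo in zip(*params.values())]


def random_perm(params: Dict[str, List[Any]], n_models: int, seed: int = 0) -> List[Dict[str, Any]]:
    """A random sample (without repeats) of the full set of combinations."""
    combos = all_perm(params)
    return random.Random(seed).sample(combos, min(n_models, len(combos)))


params = {"STEPS": [10, 20], "THERMO": [1, 5]}
print(all_perm(params))
# [{'STEPS': 10, 'THERMO': 1}, {'STEPS': 10, 'THERMO': 5}, {'STEPS': 20, 'THERMO': 1}, {'STEPS': 20, 'THERMO': 5}]
print(stepped(params))
# [{'STEPS': 10, 'THERMO': 1}, {'STEPS': 20, 'THERMO': 5}]
```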
-
add_model
(model)[source]¶ Add a model to this ensemble
- Parameters
model (Model) – model instance to be added
- Raises
TypeError – if model is not an instance of Model
EntityExistsError – if model already exists in this ensemble
-
attach_generator_files
(to_copy=None, to_symlink=None, to_configure=None)[source]¶ Attach files to each model within the ensemble for generation
Attach files needed for the entity that, upon generation, will be located in the path of the entity.
During generation, files “to_copy” are copied into the path of the entity, and files “to_symlink” are symlinked into the path of the entity.
Files “to_configure” are text-based model input files where parameters for the model are set. Note that only models support the “to_configure” field. These files must contain tagged fields that correspond to the values the user would like to change. The tag is settable but defaults to a semicolon, e.g. THERMO = ;10;
- Parameters
to_copy (list, optional) – files to copy, defaults to []
to_symlink (list, optional) – files to symlink, defaults to []
to_configure (list, optional) – input files with tagged parameters, defaults to []
-
enable_key_prefixing
()[source]¶ If called, all models within this ensemble will prefix their keys with their own model names.
-
query_key_prefixing
()[source]¶ Inquire as to whether each model within the ensemble will prefix its keys
- Returns
True if all models have key prefixing enabled, False otherwise
- Return type
bool
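Key prefixing can be pictured with a stand-in model type. `ModelStub`, its `prefix_keys` flag, and the `name.key` layout are assumptions for illustration; the point is that `query_key_prefixing` returns True only when every member has prefixing enabled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModelStub:
    """Hypothetical stand-in for a Model's key-prefixing state."""

    name: str
    prefix_keys: bool = False

    def prefixed(self, key: str) -> str:
        # Assumed "<model name>.<key>" layout when prefixing is on.
        return f"{self.name}.{key}" if self.prefix_keys else key


def enable_key_prefixing(models: List[ModelStub]) -> None:
    for model in models:
        model.prefix_keys = True


def query_key_prefixing(models: List[ModelStub]) -> bool:
    # True only if every member has prefixing enabled.
    return all(model.prefix_keys for model in models)


members = [ModelStub("sim_0"), ModelStub("sim_1")]
print(query_key_prefixing(members))  # False
enable_key_prefixing(members)
print(query_key_prefixing(members), members[1].prefixed("pressure"))  # True sim_1.pressure
```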
-
register_incoming_entity
(incoming_entity)[source]¶ Register future communication between entities.
Registers the named data sources that this entity has access to by storing the key_prefix associated with that entity
Only Python clients can have multiple incoming connections.
- Parameters
incoming_entity (SmartSimEntity) – The entity that data will be received from
-
property
type
¶ Return the name of the class
Model¶
|
Initialize a model entity within SmartSim |
|
Attach files to an entity for generation |
|
Register future communication between entities. |
If called, the entity will prefix its keys with its own model name |
|
If called, the entity will not prefix its keys with its own model name |
|
Inquire as to whether this entity will prefix its keys with its name |
-
class
Model
(name, params, path, run_settings)[source]¶ Bases:
smartsim.entity.entity.SmartSimEntity
Initialize a model entity within SmartSim
- Parameters
name (str) – name of the model
params (dict) – model parameters for writing into configuration files.
path (str) – path to output, error, and configuration files
run_settings (RunSettings) – launcher settings specified in the experiment
-
attach_generator_files
(to_copy=None, to_symlink=None, to_configure=None)[source]¶ Attach files to an entity for generation
Attach files needed for the entity that, upon generation, will be located in the path of the entity.
During generation, files “to_copy” are copied into the path of the entity, and files “to_symlink” are symlinked into the path of the entity.
Files “to_configure” are text-based model input files where parameters for the model are set. Note that only models support the “to_configure” field. These files must contain tagged fields that correspond to the values the user would like to change. The tag is settable but defaults to a semicolon, e.g. THERMO = ;10;
- Parameters
to_copy (list, optional) – files to copy, defaults to []
to_symlink (list, optional) – files to symlink, defaults to []
to_configure (list, optional) – input files with tagged parameters, defaults to []
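The tagged-file mechanism can be sketched with a regular expression. This assumes each tagged field wraps a parameter name between two tags (for example `THERMO = ;THERMO;`) that is replaced by the value from `params`, and that unknown tags are left alone. `configure_text` is a hypothetical helper, not SmartSim's generator.

```python
import re
from typing import Dict


def configure_text(text: str, params: Dict[str, object], tag: str = ";") -> str:
    """Hypothetical sketch: replace tagged fields such as ;THERMO; with values."""
    pattern = re.compile(re.escape(tag) + r"(\w+)" + re.escape(tag))

    def substitute(match: "re.Match") -> str:
        name = match.group(1)
        # Leave the tagged field untouched if no parameter matches it.
        return str(params[name]) if name in params else match.group(0)

    return pattern.sub(substitute, text)


template = "THERMO = ;THERMO;\nSTEPS = ;STEPS;\n"
print(configure_text(template, {"THERMO": 10, "STEPS": 500}), end="")
# THERMO = 10
# STEPS = 500
```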
-
disable_key_prefixing
()[source]¶ If called, the entity will not prefix its keys with its own model name
-
register_incoming_entity
(incoming_entity)[source]¶ Register future communication between entities.
Registers the named data sources that this entity has access to by storing the key_prefix associated with that entity
- Parameters
incoming_entity (SmartSimEntity) – The entity that data will be received from
- Raises
SmartSimError – if incoming entity has already been registered
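A hypothetical registry shows the duplicate check documented for `Model.register_incoming_entity`. Here `ValueError` stands in for `SmartSimError`, and entities are represented only by their names, which this sketch uses as key prefixes.

```python
from typing import Set


class IncomingRegistry:
    """Hypothetical sketch of tracking incoming entities by key prefix."""

    def __init__(self) -> None:
        self.incoming_prefixes: Set[str] = set()

    def register_incoming_entity(self, entity_name: str) -> None:
        # The entity's name serves as its key prefix in this sketch.
        if entity_name in self.incoming_prefixes:
            raise ValueError(f"'{entity_name}' has already been registered")
        self.incoming_prefixes.add(entity_name)


consumer = IncomingRegistry()
consumer.register_incoming_entity("producer_0")
consumer.register_incoming_entity("producer_1")
print(sorted(consumer.incoming_prefixes))  # ['producer_0', 'producer_1']
```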
-
property
type
¶ Return the name of the class
TensorFlow¶
SmartSim includes built-in utilities for supporting TensorFlow and Keras.
|
Freeze a Keras or TensorFlow Graph |
-
freeze_model
(model, output_dir, file_name)[source]¶ Freeze a Keras or TensorFlow Graph
To use a Keras or TensorFlow model in SmartSim, the model must be frozen and the inputs and outputs provided to the smartredis.client.set_model_from_file() method.
This utility function provides everything users need to take a trained model and put it inside an orchestrator instance.
- Parameters
model (tf.Module) – TensorFlow or Keras model
output_dir (str) – output dir to save model file to
file_name (str) – name of model file to create
- Returns
path to model file, model input layer names, model output layer names
- Return type
str, list[str], list[str]
Slurm¶
Note
This module is importable through smartsim, e.g., from smartsim import slurm
|
Request an allocation |
|
Free an allocation’s resources |
-
get_allocation
(nodes=1, time=None, account=None, options=None)[source]¶ Request an allocation
This function requests an allocation with the specified arguments. Anything passed to the options will be processed as a Slurm argument and appended to the salloc command with the appropriate prefix (e.g. “-” or “--”).
The options can be used to pass extra settings to the workload manager such as the following for Slurm:
nodelist=”nid00004”
For arguments without a value, pass None or an empty string as the value. For Slurm:
exclusive=None
- Parameters
nodes (int, optional) – number of nodes for the allocation, defaults to 1
time (str, optional) – wall time of the allocation, HH:MM:SS format, defaults to None
account (str, optional) – account id for allocation, defaults to None
options (dict[str, str], optional) – additional options for the Slurm WLM, defaults to None
- Raises
LauncherError – if the allocation is not successful
- Returns
the id of the allocation
- Return type
str
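One way to apply the prefix rule described above is to treat single-letter option names as short options (`-p debug`) and longer names as long options (`--nodelist=nid00004`). That rule and the helper name `format_salloc_options` are assumptions; the option names used in the example are standard `salloc` options.

```python
from typing import Dict, List, Optional


def format_salloc_options(options: Dict[str, Optional[str]]) -> List[str]:
    """Hypothetical sketch: prefix options with "-" or "--" for salloc."""
    formatted: List[str] = []
    for name, value in options.items():
        is_short = len(name) == 1  # single letters use short-option syntax
        prefix = "-" if is_short else "--"
        if value is None or value == "":
            formatted.append(prefix + name)  # flag without a value
        elif is_short:
            formatted += [prefix + name, value]
        else:
            formatted.append(f"{prefix}{name}={value}")
    return formatted


print(format_salloc_options({"nodelist": "nid00004", "exclusive": None, "p": "debug"}))
# ['--nodelist=nid00004', '--exclusive', '-p', 'debug']
```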
-
get_default_partition
()[source]¶ Returns the default partition from Slurm
This default partition is assumed to be the partition with a star following its partition name in sinfo output
- Returns
the name of the default partition
- Return type
str
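The star convention can be parsed from `sinfo` output directly. The sample output below is illustrative rather than captured from a real system, and `default_partition` is a hypothetical helper.

```python
from typing import Optional

# Illustrative sinfo output; the default partition carries a trailing "*".
SAMPLE_SINFO = """PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug*       up   infinite      4   idle nid[00001-00004]
gpu          up 1-00:00:00      2   idle nid[00005-00006]
"""


def default_partition(sinfo_output: str) -> Optional[str]:
    """Return the partition marked with '*' in sinfo output, if any."""
    for line in sinfo_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if fields and fields[0].endswith("*"):
            return fields[0].rstrip("*")
    return None


print(default_partition(SAMPLE_SINFO))  # debug
```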
-
release_allocation
(alloc_id)[source]¶ Free an allocation’s resources
- Parameters
alloc_id (str) – allocation id
- Raises
LauncherError – if allocation could not be freed
-
validate
(nodes=1, ppn=1, partition=None)[source]¶ Check that there are sufficient resources in the provided Slurm partition.
If no partition is provided, the default partition is found and used.
- Parameters
nodes (int, optional) – Override the default node count to validate, defaults to 1
ppn (int, optional) – Override the default processes per node to validate, defaults to 1
partition (str, optional) – partition to validate, defaults to None
- Raises
LauncherError
- Returns
True if resources are available, False otherwise
- Return type
bool
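The resource check can be sketched against a mapping of partition names to per-node CPU counts, which stands in for information that would be queried from Slurm. `validate_partition` is hypothetical, and it raises `ValueError` where SmartSim raises `LauncherError`.

```python
from typing import Dict, List


def validate_partition(
    node_cpus: Dict[str, List[int]], partition: str, nodes: int = 1, ppn: int = 1
) -> bool:
    """Hypothetical sketch: are there `nodes` nodes with at least `ppn` CPUs?

    node_cpus maps a partition name to the CPU count of each of its nodes.
    """
    if partition not in node_cpus:
        raise ValueError(f"partition {partition!r} not found")
    suitable = [cpus for cpus in node_cpus[partition] if cpus >= ppn]
    return len(suitable) >= nodes


cluster = {"debug": [36, 36, 36, 36], "gpu": [48, 48]}
print(validate_partition(cluster, "debug", nodes=4, ppn=32))  # True
print(validate_partition(cluster, "gpu", nodes=3, ppn=8))  # False
```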