SmartSim API#
Experiment#
Settings#
Settings are provided to Model and Ensemble objects to specify how a job should be executed. Some settings are meant for specific launchers: SbatchSettings, for example, is meant solely for systems that use Slurm as a workload manager. MpirunSettings, for OpenMPI-based jobs, is supported by Slurm and PBSPro.
Types of Settings:
|
Run parameters for a |
|
Initialize run parameters for a slurm job with |
|
Settings to run job with |
|
Settings to run job with |
|
Settings to run job with |
|
Settings to run job with |
|
Initialize run parameters for a Dragon process |
|
Specify run parameters for a Slurm batch job |
|
Specify |
Settings objects can accept a container object that defines a container runtime, image, and arguments to use for the workload. Below is a list of supported container runtimes.
Types of Containers:
|
Singularity (apptainer) container type. |
RunSettings#
When running SmartSim on laptops and single-node workstations, the base RunSettings object is used to parameterize jobs. RunSettings include a run_command parameter for local launches that utilize a parallel launch binary like mpirun, mpiexec, and others.
| RunSettings.add_exe_args | Add executable arguments to executable |
| RunSettings.update_env | Update the job environment variables |
- class RunSettings(exe: str, exe_args: str | List[str] | None = None, run_command: str = '', run_args: Dict[str, int | str | float | None] | None = None, env_vars: Dict[str, str | None] | None = None, container: smartsim.settings.containers.Container | None = None, **_kwargs: Any) None [source]#
Run parameters for a Model
The base RunSettings class should only be used with the local launcher on single-node workstations or laptops.
If no run_command is specified, the executable will be launched locally.
run_args passed as a dict will be interpreted literally for local RunSettings and added directly to the run_command, e.g. run_args = {"-np": 2} will be "-np 2".
Example initialization
    rs = RunSettings("echo", "hello", "mpirun", run_args={"-np": "2"})
- Parameters:
exe (str) – executable to run
exe_args (Union[str, List[str], None], default: None) – executable arguments
run_command (str, default: '') – launch binary (e.g. "srun")
run_args (Optional[Dict[str, Union[int, str, float, None]]], default: None) – arguments for run command (e.g. -np for mpiexec)
env_vars (Optional[Dict[str, Optional[str]]], default: None) – environment vars to launch job with
container (Optional[Container], default: None) – container type for workload (e.g. "singularity")
- add_exe_args(args: str | List[str]) None [source]#
Add executable arguments to executable
- Parameters:
args (Union[str, List[str]]) – executable arguments
- Return type:
None
- property env_vars: Dict[str, str | None]#
Return an immutable list of attached environment variables.
- Returns:
attached environment variables
- property exe_args: str | List[str]#
Return an immutable list of attached executable arguments.
- Returns:
attached executable arguments
- format_env_vars() List[str] [source]#
Build environment variable string
- Return type:
List[str]
- Returns:
formatted list of strings to export variables
- format_run_args() List[str] [source]#
Return formatted run arguments
For RunSettings, the run arguments are passed literally with no formatting.
- Return type:
List[str]
- Returns:
list run arguments for these settings
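As a rough illustration of what "passed literally" means, the sketch below flattens a run_args dict into command-line tokens by emitting each key followed by the string form of its value. The helper name `format_literal_run_args` is hypothetical and this is not SmartSim's implementation; rendering a missing value as "None" is an assumption taken from the Basic Usage example shown under set().

```python
from typing import Dict, List, Union

RunArgValue = Union[int, str, float, None]


def format_literal_run_args(run_args: Dict[str, RunArgValue]) -> List[str]:
    """Flatten run arguments into tokens without adding or stripping dashes."""
    tokens: List[str] = []
    for name, value in run_args.items():
        tokens.append(name)        # key is used exactly as given
        tokens.append(str(value))  # assumption: None is rendered as "None"
    return tokens


print(format_literal_run_args({"-np": 2}))
```

Joining the tokens with spaces gives the "-np 2" form described in the class documentation.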
- make_mpmd(settings: smartsim.settings.base.RunSettings) None [source]#
Make job an MPMD job
- Parameters:
settings (RunSettings) – RunSettings instance
- Return type:
None
- reserved_run_args: ClassVar[frozenset[str]] = frozenset({})#
- property run_args: Dict[str, int | str | float | None]#
Return an immutable list of attached run arguments.
- Returns:
attached run arguments
- property run_command: str | None#
Return the launch binary used to launch the executable
Attempt to expand the path to the executable if possible
- Returns:
launch binary e.g. mpiexec
- set(arg: str, value: str | None = None, condition: bool = True) None [source]#
Allows users to set individual run arguments.
A method that allows users to set run arguments after object instantiation. Does basic formatting such as stripping leading dashes. If the argument has been set previously, this method will log a warning but ultimately comply.
Conditional expressions may be passed to the condition parameter. If the expression evaluates to True, the argument will be set. If not, an info message is logged and no further operation is performed.
Basic Usage
    rs = RunSettings("python")
    rs.set("an-arg", "a-val")
    rs.set("a-flag")
    rs.format_run_args()  # returns ["an-arg", "a-val", "a-flag", "None"]
Slurm Example with Conditional Setting
    import socket
    rs = SrunSettings("echo", "hello")
    rs.set_tasks(1)
    rs.set("exclusive")
    # Only set this argument if condition param evals True
    # Otherwise log and NOP
    rs.set("partition", "debug", condition=socket.gethostname()=="testing-system")
    rs.format_run_args()
    # returns ["exclusive", "None", "partition", "debug"] iff socket.gethostname()=="testing-system"
    # otherwise returns ["exclusive", "None"]
- Parameters:
arg (str) – name of the argument
value (Optional[str], default: None) – value of the argument
condition (bool, default: True) – set the argument if condition evaluates to True
- Return type:
None
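The behaviour described for set() (strip leading dashes, warn on overwrite, skip when the condition is false) can be sketched with a small stand-in class. `RunArgsSketch` and its logger name are hypothetical and only mirror the description above; they are not SmartSim code.

```python
import logging
from typing import Dict, Optional

logger = logging.getLogger("run_args_sketch")  # hypothetical logger name


class RunArgsSketch:
    """Minimal stand-in that stores run arguments in a dict."""

    def __init__(self) -> None:
        self.run_args: Dict[str, Optional[str]] = {}

    def set(self, arg: str, value: Optional[str] = None, condition: bool = True) -> None:
        if not condition:
            # Condition not met: log at info level and do nothing else.
            logger.info("Skipping argument '%s': condition evaluated to False", arg)
            return
        arg = arg.lstrip("-")  # basic formatting: drop leading dashes
        if arg in self.run_args and self.run_args[arg] != value:
            # Previously set: warn, then comply with the new value.
            logger.warning("Overwriting argument '%s' with value '%s'", arg, value)
        self.run_args[arg] = value


settings = RunArgsSketch()
settings.set("--exclusive")
settings.set("partition", "debug", condition=False)  # skipped
settings.set("partition", "batch")
print(settings.run_args)
```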
- set_binding(binding: str) None [source]#
Set binding
- Parameters:
binding (str) – Binding
- Return type:
None
- set_broadcast(dest_path: str | None = None) None [source]#
Copy executable file to allocated compute nodes
- Parameters:
dest_path (Optional[str], default: None) – Path to copy an executable file
- Return type:
None
- set_cpu_bindings(bindings: int | List[int]) None [source]#
Set the cores to which MPI processes are bound
- Parameters:
bindings (Union[int, List[int]]) – List specifying the cores to which MPI processes are bound
- Return type:
None
- set_cpus_per_task(cpus_per_task: int) None [source]#
Set the number of cpus per task
- Parameters:
cpus_per_task (int) – number of cpus per task
- Return type:
None
- set_excluded_hosts(host_list: str | List[str]) None [source]#
Specify a list of hosts to exclude for launching this job
- Parameters:
host_list (Union[str, List[str]]) – hosts to exclude
- Return type:
None
- set_hostlist(host_list: str | List[str]) None [source]#
Specify the hostlist for this job
- Parameters:
host_list (Union[str, List[str]]) – hosts to launch on
- Return type:
None
- set_hostlist_from_file(file_path: str) None [source]#
Use the contents of a file to specify the hostlist for this job
- Parameters:
file_path (str) – Path to the hostlist file
- Return type:
None
- set_memory_per_node(memory_per_node: int) None [source]#
Set the amount of memory required per node in megabytes
- Parameters:
memory_per_node (int) – Number of megabytes per node
- Return type:
None
- set_mpmd_preamble(preamble_lines: List[str]) None [source]#
Set preamble to a file to make a job MPMD
- Parameters:
preamble_lines (List[str]) – lines to put at the beginning of a file.
- Return type:
None
- set_node_feature(feature_list: str | List[str]) None [source]#
Specify the node feature for this job
- Parameters:
feature_list (Union[str, List[str]]) – node feature to launch on
- Return type:
None
- set_nodes(nodes: int) None [source]#
Set the number of nodes
- Parameters:
nodes (int) – number of nodes to run with
- Return type:
None
- set_quiet_launch(quiet: bool) None [source]#
Set the job to run in quiet mode
- Parameters:
quiet (bool) – Whether the job should be run quietly
- Return type:
None
- set_task_map(task_mapping: str) None [source]#
Set a task mapping
- Parameters:
task_mapping (str) – task mapping
- Return type:
None
- set_tasks(tasks: int) None [source]#
Set the number of tasks to launch
- Parameters:
tasks (int) – number of tasks to launch
- Return type:
None
- set_tasks_per_node(tasks_per_node: int) None [source]#
Set the number of tasks per node
- Parameters:
tasks_per_node (int) – number of tasks to launch per node
- Return type:
None
- set_time(hours: int = 0, minutes: int = 0, seconds: int = 0) None [source]#
Automatically format and set wall time
- Parameters:
hours (int, default: 0) – number of hours to run job
minutes (int, default: 0) – number of minutes to run job
seconds (int, default: 0) – number of seconds to run job
- Return type:
None
- set_verbose_launch(verbose: bool) None [source]#
Set the job to run in verbose mode
- Parameters:
verbose (bool) – Whether the job should be run verbosely
- Return type:
None
- set_walltime(walltime: str) None [source]#
Set the formatted walltime
- Parameters:
walltime (str) – Time in format required by launcher
- Return type:
None
- update_env(env_vars: Dict[str, str | int | float | bool]) None [source]#
Update the job environment variables
To fully inherit the current user environment, add the workload-manager-specific flag to the launch command through the add_exe_args() method. For example, --export=ALL for slurm, or -V for PBS/aprun.
- Parameters:
env_vars (Dict[str, Union[str, int, float, bool]]) – environment variables to update or add
- Raises:
TypeError – if env_vars values cannot be coerced to strings
- Return type:
None
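The coercion rule stated for update_env (values of type str, int, float, or bool are accepted and stored as strings, anything else raises TypeError) might look roughly like the sketch below. The function name `update_env_sketch` and the error message are hypothetical; this is an illustration of the documented contract, not SmartSim's code.

```python
from typing import Dict, Optional, Union

EnvValue = Union[str, int, float, bool]


def update_env_sketch(
    env: Dict[str, Optional[str]], updates: Dict[str, EnvValue]
) -> None:
    """Merge updates into env in place, coercing each value to a string."""
    for name, value in updates.items():
        if not isinstance(value, (str, int, float, bool)):
            raise TypeError(
                f"Cannot coerce value of type {type(value).__name__} for '{name}' to str"
            )
        env[name] = str(value)


env = {"OMP_NUM_THREADS": "1"}
update_env_sketch(env, {"OMP_NUM_THREADS": 4, "DEBUG": True})
print(env)
```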
SrunSettings#
SrunSettings can be used for running on existing allocations, running jobs in interactive allocations, and for adding srun steps to a batch.
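| SrunSettings.set_nodes | Set the number of nodes |
| SrunSettings.set_node_feature | Specify the node feature for this job |
| SrunSettings.set_tasks | Set the number of tasks for this job |
| SrunSettings.set_tasks_per_node | Set the number of tasks per node |
| SrunSettings.set_walltime | Set the walltime of the job |
| SrunSettings.set_hostlist | Specify the hostlist for this job |
| SrunSettings.set_excluded_hosts | Specify a list of hosts to exclude for launching this job |
| SrunSettings.set_cpus_per_task | Set the number of cpus to use per task |
| SrunSettings.add_exe_args | Add executable arguments to executable |
| SrunSettings.format_run_args | Return a list of slurm formatted run arguments |
| SrunSettings.format_env_vars | Build bash compatible environment variable string for Slurm |
| SrunSettings.update_env | Update the job environment variables |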
- class SrunSettings(exe: str, exe_args: str | List[str] | None = None, run_args: Dict[str, int | str | float | None] | None = None, env_vars: Dict[str, str | None] | None = None, alloc: str | None = None, **kwargs: Any) None [source]#
Initialize run parameters for a slurm job with srun
SrunSettings should only be used on Slurm based systems.
If an allocation is specified, the instance receiving these run parameters will launch on that allocation.
- Parameters:
exe (str) – executable to run
exe_args (Union[str, List[str], None], default: None) – executable arguments
run_args (Optional[Dict[str, Union[int, str, float, None]]], default: None) – srun arguments without dashes
env_vars (Optional[Dict[str, Optional[str]]], default: None) – environment variables for job
alloc (Optional[str], default: None) – allocation ID if running on existing alloc
- add_exe_args(args: str | List[str]) None #
Add executable arguments to executable
- Parameters:
args (Union[str, List[str]]) – executable arguments
- Return type:
None
- check_env_vars() None [source]#
Warn a user trying to set a variable which is set in the environment
Given Slurm’s env var precedence, trying to export a variable which is already present in the environment will not work.
- Return type:
None
- colocated_db_settings: t.Optional[t.Dict[str, t.Union[bool, int, str, None, t.List[str], t.Iterable[t.Union[int, t.Iterable[int]]], t.List[DBModel], t.List[DBScript], t.Dict[str, t.Union[int, None]], t.Dict[str, str]]]]#
- property env_vars: Dict[str, str | None]#
Return an immutable list of attached environment variables.
- Returns:
attached environment variables
- property exe_args: str | List[str]#
Return an immutable list of attached executable arguments.
- Returns:
attached executable arguments
- format_comma_sep_env_vars() Tuple[str, List[str]] [source]#
Build environment variable string for Slurm
Slurm takes exports in comma-separated lists. The list starts with all so as not to disturb the rest of the environment. For more information, see the Slurm documentation for srun.
- Return type:
Tuple[str, List[str]]
- Returns:
the formatted string of environment variables
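A comma-separated export list of this kind can be sketched as below. The helper name `comma_sep_exports` is hypothetical, it models only the string element of the returned tuple, and two points are assumptions rather than documented behaviour: that the list begins with the literal ALL (in line with the --export=ALL flag mentioned under update_env), and that a variable whose value is None is listed by name only.

```python
from typing import Dict, Optional


def comma_sep_exports(env_vars: Dict[str, Optional[str]]) -> str:
    """Build a value suitable for an --export style option."""
    exports = ["ALL"]  # assumption: keep the rest of the environment intact
    for name, value in env_vars.items():
        if value is None:
            exports.append(name)  # assumption: name only, no explicit value
        else:
            exports.append(f"{name}={value}")
    return ",".join(exports)


print(comma_sep_exports({"OMP_NUM_THREADS": "4", "LOGGING": "verbose"}))
```

Values that themselves contain commas cannot be represented this way and are outside the scope of this sketch.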
- format_env_vars() List[str] [source]#
Build bash compatible environment variable string for Slurm
- Return type:
List[str]
- Returns:
the formatted string of environment variables
- format_run_args() List[str] [source]#
Return a list of slurm formatted run arguments
- Return type:
List[str]
- Returns:
list of slurm arguments for these settings
- make_mpmd(settings: smartsim.settings.base.RunSettings) None [source]#
Make an MPMD workload by combining two srun commands
This connects the two settings to be executed with a single Model instance
- Parameters:
settings (RunSettings) – SrunSettings instance
- Return type:
None
- reserved_run_args: t.ClassVar[frozenset[str]] = frozenset({'D', 'chdir'})#
- property run_args: Dict[str, int | str | float | None]#
Return an immutable list of attached run arguments.
- Returns:
attached run arguments
- property run_command: str | None#
Return the launch binary used to launch the executable
Attempt to expand the path to the executable if possible
- Returns:
launch binary e.g. mpiexec
- set(arg: str, value: str | None = None, condition: bool = True) None #
Allows users to set individual run arguments.
A method that allows users to set run arguments after object instantiation. Does basic formatting such as stripping leading dashes. If the argument has been set previously, this method will log a warning but ultimately comply.
Conditional expressions may be passed to the condition parameter. If the expression evaluates to True, the argument will be set. If not, an info message is logged and no further operation is performed.
Basic Usage
    rs = RunSettings("python")
    rs.set("an-arg", "a-val")
    rs.set("a-flag")
    rs.format_run_args()  # returns ["an-arg", "a-val", "a-flag", "None"]
Slurm Example with Conditional Setting
    import socket
    rs = SrunSettings("echo", "hello")
    rs.set_tasks(1)
    rs.set("exclusive")
    # Only set this argument if condition param evals True
    # Otherwise log and NOP
    rs.set("partition", "debug", condition=socket.gethostname()=="testing-system")
    rs.format_run_args()
    # returns ["exclusive", "None", "partition", "debug"] iff socket.gethostname()=="testing-system"
    # otherwise returns ["exclusive", "None"]
- Parameters:
arg (str) – name of the argument
value (Optional[str], default: None) – value of the argument
condition (bool, default: True) – set the argument if condition evaluates to True
- Return type:
None
- set_binding(binding: str) None #
Set binding
- Parameters:
binding (str) – Binding
- Return type:
None
- set_broadcast(dest_path: str | None = None) None [source]#
Copy executable file to allocated compute nodes
This sets --bcast
- Parameters:
dest_path (Optional[str], default: None) – Path to copy an executable file
- Return type:
None
- set_cpu_bindings(bindings: int | List[int]) None [source]#
Bind by setting CPU masks on tasks
This sets --cpu-bind using the map_cpu:<list> option
- Parameters:
bindings (Union[int, List[int]]) – List specifying the cores to which MPI processes are bound
- Return type:
None
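To make the map_cpu:<list> form concrete, the sketch below turns an int or a list of ints into the option value. The helper name `format_cpu_bind` is hypothetical and the code only illustrates the string shape named above, not how SmartSim builds it.

```python
from typing import List, Union


def format_cpu_bind(bindings: Union[int, List[int]]) -> str:
    """Return a map_cpu:<list> value for the given core ids."""
    if isinstance(bindings, int):
        bindings = [bindings]  # a single core is treated as a one-element list
    return "map_cpu:" + ",".join(str(core) for core in bindings)


print(format_cpu_bind([0, 2, 4]))
```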
- set_cpus_per_task(cpus_per_task: int) None [source]#
Set the number of cpus to use per task
This sets --cpus-per-task
- Parameters:
cpus_per_task (int) – number of cpus to use per task
- Return type:
None
- set_excluded_hosts(host_list: str | List[str]) None [source]#
Specify a list of hosts to exclude for launching this job
- Parameters:
host_list (Union[str, List[str]]) – hosts to exclude
- Raises:
TypeError –
- Return type:
None
- set_het_group(het_group: Iterable[int]) None [source]#
Set the heterogeneous group for this job
This sets --het-group
- Parameters:
het_group (Iterable[int]) – list of heterogeneous groups
- Return type:
None
- set_hostlist(host_list: str | List[str]) None [source]#
Specify the hostlist for this job
This sets --nodelist
- Parameters:
host_list (Union[str, List[str]]) – hosts to launch on
- Raises:
TypeError – if not str or list of str
- Return type:
None
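The accepted input types and the TypeError condition for set_hostlist can be illustrated with a small validator. The name `join_hostlist` is hypothetical, and joining the hosts with commas is an assumption about how a node list is typically written on the command line; this is a sketch, not SmartSim's implementation.

```python
from typing import List, Union


def join_hostlist(host_list: Union[str, List[str]]) -> str:
    """Validate a host list and return it as a comma-separated string."""
    if isinstance(host_list, str):
        host_list = [host_list.strip()]
    if not isinstance(host_list, list) or not all(
        isinstance(host, str) for host in host_list
    ):
        raise TypeError("host_list argument must be a str or a list of str")
    return ",".join(host_list)


print(join_hostlist(["nid00001", "nid00002"]))
```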
- set_hostlist_from_file(file_path: str) None [source]#
Use the contents of a file to set the node list
This sets --nodefile
- Parameters:
file_path (str) – Path to the hostlist file
- Return type:
None
- set_memory_per_node(memory_per_node: int) None [source]#
Specify the real memory required per node
This sets --mem in megabytes
- Parameters:
memory_per_node (int) – Amount of memory per node in megabytes
- Return type:
None
- set_mpmd_preamble(preamble_lines: List[str]) None #
Set preamble to a file to make a job MPMD
- Parameters:
preamble_lines (List[str]) – lines to put at the beginning of a file.
- Return type:
None
- set_node_feature(feature_list: str | List[str]) None [source]#
Specify the node feature for this job
This sets -C
- Parameters:
feature_list (Union[str, List[str]]) – node feature to launch on
- Raises:
TypeError – if not str or list of str
- Return type:
None
- set_nodes(nodes: int) None [source]#
Set the number of nodes
Effectively this is setting: srun --nodes <num_nodes>
- Parameters:
nodes (int) – number of nodes to run with
- Return type:
None
- set_quiet_launch(quiet: bool) None [source]#
Set the job to run in quiet mode
This sets --quiet
- Parameters:
quiet (bool) – Whether the job should be run quietly
- Return type:
None
- set_task_map(task_mapping: str) None #
Set a task mapping
- Parameters:
task_mapping (str) – task mapping
- Return type:
None
- set_tasks(tasks: int) None [source]#
Set the number of tasks for this job
This sets --ntasks
- Parameters:
tasks (int) – number of tasks
- Return type:
None
- set_tasks_per_node(tasks_per_node: int) None [source]#
Set the number of tasks per node
This sets --ntasks-per-node
- Parameters:
tasks_per_node (int) – number of tasks per node
- Return type:
None
- set_time(hours: int = 0, minutes: int = 0, seconds: int = 0) None #
Automatically format and set wall time
- Parameters:
hours (int, default: 0) – number of hours to run job
minutes (int, default: 0) – number of minutes to run job
seconds (int, default: 0) – number of seconds to run job
- Return type:
None
- set_verbose_launch(verbose: bool) None [source]#
Set the job to run in verbose mode
This sets --verbose
- Parameters:
verbose (bool) – Whether the job should be run verbosely
- Return type:
None
- set_walltime(walltime: str) None [source]#
Set the walltime of the job
format = "HH:MM:SS"
- Parameters:
walltime (str) – wall time
- Return type:
None
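One way to get from the hours, minutes, and seconds accepted by set_time to the "HH:MM:SS" string expected by set_walltime is sketched below. The helper name `format_walltime` is hypothetical, and normalising overflowing minutes or seconds is an assumption made for the illustration.

```python
def format_walltime(hours: int = 0, minutes: int = 0, seconds: int = 0) -> str:
    """Convert a duration to a zero-padded HH:MM:SS string."""
    total_seconds = hours * 3600 + minutes * 60 + seconds
    hh, remainder = divmod(total_seconds, 3600)
    mm, ss = divmod(remainder, 60)
    return f"{hh:02d}:{mm:02d}:{ss:02d}"


print(format_walltime(hours=1, minutes=90, seconds=5))
```

Working from the total number of seconds means that inputs such as 90 minutes roll over into the hours field instead of producing an invalid string.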
- update_env(env_vars: Dict[str, str | int | float | bool]) None #
Update the job environment variables
To fully inherit the current user environment, add the workload-manager-specific flag to the launch command through the add_exe_args() method. For example, --export=ALL for slurm, or -V for PBS/aprun.
- Parameters:
env_vars (Dict[str, Union[str, int, float, bool]]) – environment variables to update or add
- Raises:
TypeError – if env_vars values cannot be coerced to strings
- Return type:
None
AprunSettings#
AprunSettings can be used on any system that supports the Cray ALPS layer. SmartSim supports using AprunSettings on PBSPro WLM systems. AprunSettings can be used in interactive sessions (on an allocation) and within batch launches (e.g., QsubBatchSettings).
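| AprunSettings.set_cpus_per_task | Set the number of cpus to use per task |
| AprunSettings.set_hostlist | Specify the hostlist for this job |
| AprunSettings.set_tasks | Set the number of tasks for this job |
| AprunSettings.set_tasks_per_node | Set the number of tasks per node |
| AprunSettings.make_mpmd | Make job an MPMD job |
| AprunSettings.add_exe_args | Add executable arguments to executable |
| AprunSettings.format_run_args | Return a list of ALPS formatted run arguments |
| AprunSettings.format_env_vars | Format the environment variables for aprun |
| AprunSettings.update_env | Update the job environment variables |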
- class AprunSettings(exe: str, exe_args: str | List[str] | None = None, run_args: Dict[str, int | str | float | None] | None = None, env_vars: Dict[str, str | None] | None = None, **kwargs: Any)[source]#
Settings to run job with aprun command
AprunSettings can be used for the pbs launcher.
- Parameters:
exe (str) – executable
exe_args (Union[str, List[str], None], default: None) – executable arguments
run_args (Optional[Dict[str, Union[int, str, float, None]]], default: None) – arguments for run command
env_vars (Optional[Dict[str, Optional[str]]], default: None) – environment vars to launch job with
- add_exe_args(args: str | List[str]) None #
Add executable arguments to executable
- Parameters:
args (Union[str, List[str]]) – executable arguments
- Return type:
None
- colocated_db_settings: t.Optional[t.Dict[str, t.Union[bool, int, str, None, t.List[str], t.Iterable[t.Union[int, t.Iterable[int]]], t.List[DBModel], t.List[DBScript], t.Dict[str, t.Union[int, None]], t.Dict[str, str]]]]#
- property env_vars: Dict[str, str | None]#
Return an immutable list of attached environment variables.
- Returns:
attached environment variables
- property exe_args: str | List[str]#
Return an immutable list of attached executable arguments.
- Returns:
attached executable arguments
- format_env_vars() List[str] [source]#
Format the environment variables for aprun
- Return type:
List[str]
- Returns:
list of env vars
- format_run_args() List[str] [source]#
Return a list of ALPS formatted run arguments
- Return type:
List[str]
- Returns:
list of ALPS arguments for these settings
- make_mpmd(settings: smartsim.settings.base.RunSettings) None [source]#
Make job an MPMD job
This method combines two AprunSettings into a single MPMD command joined with ':'
- Parameters:
settings (RunSettings) – AprunSettings instance
- Return type:
None
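The ':'-joined MPMD command described for make_mpmd can be pictured with the sketch below. The helper `build_mpmd_command` and the executables in the example are hypothetical; the code only shows how per-program segments might be concatenated behind a single launcher, not how SmartSim assembles the command.

```python
from typing import List, Sequence, Tuple

# Each segment is (run arguments, executable plus its arguments).
Segment = Tuple[Sequence[str], Sequence[str]]


def build_mpmd_command(launcher: str, segments: Sequence[Segment]) -> List[str]:
    """Join several program segments into one launch command using ':'."""
    command: List[str] = [launcher]
    for index, (run_args, exe_with_args) in enumerate(segments):
        if index > 0:
            command.append(":")  # separator between MPMD programs
        command.extend(run_args)
        command.extend(exe_with_args)
    return command


cmd = build_mpmd_command(
    "aprun",
    [(["--pes", "2"], ["./producer"]), (["--pes", "4"], ["./consumer", "--fast"])],
)
print(" ".join(cmd))
```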
- reserved_run_args: t.ClassVar[frozenset[str]] = frozenset({})#
- property run_args: Dict[str, int | str | float | None]#
Return an immutable list of attached run arguments.
- Returns:
attached run arguments
- property run_command: str | None#
Return the launch binary used to launch the executable
Attempt to expand the path to the executable if possible
- Returns:
launch binary e.g. mpiexec
- set(arg: str, value: str | None = None, condition: bool = True) None #
Allows users to set individual run arguments.
A method that allows users to set run arguments after object instantiation. Does basic formatting such as stripping leading dashes. If the argument has been set previously, this method will log a warning but ultimately comply.
Conditional expressions may be passed to the condition parameter. If the expression evaluates to True, the argument will be set. If not, an info message is logged and no further operation is performed.
Basic Usage
    rs = RunSettings("python")
    rs.set("an-arg", "a-val")
    rs.set("a-flag")
    rs.format_run_args()  # returns ["an-arg", "a-val", "a-flag", "None"]
Slurm Example with Conditional Setting
    import socket
    rs = SrunSettings("echo", "hello")
    rs.set_tasks(1)
    rs.set("exclusive")
    # Only set this argument if condition param evals True
    # Otherwise log and NOP
    rs.set("partition", "debug", condition=socket.gethostname()=="testing-system")
    rs.format_run_args()
    # returns ["exclusive", "None", "partition", "debug"] iff socket.gethostname()=="testing-system"
    # otherwise returns ["exclusive", "None"]
- Parameters:
arg (str) – name of the argument
value (Optional[str], default: None) – value of the argument
condition (bool, default: True) – set the argument if condition evaluates to True
- Return type:
None
- set_binding(binding: str) None #
Set binding
- Parameters:
binding (str) – Binding
- Return type:
None
- set_broadcast(dest_path: str | None = None) None #
Copy executable file to allocated compute nodes
- Parameters:
dest_path (Optional[str], default: None) – Path to copy an executable file
- Return type:
None
- set_cpu_bindings(bindings: int | List[int]) None [source]#
Specifies the cores to which MPI processes are bound
This sets --cpu-binding
- Parameters:
bindings (Union[int, List[int]]) – List of cpu numbers
- Return type:
None
- set_cpus_per_task(cpus_per_task: int) None [source]#
Set the number of cpus to use per task
This sets --cpus-per-pe
- Parameters:
cpus_per_task (int) – number of cpus to use per task
- Return type:
None
- set_excluded_hosts(host_list: str | List[str]) None [source]#
Specify a list of hosts to exclude for launching this job
- Parameters:
host_list (Union[str, List[str]]) – hosts to exclude
- Raises:
TypeError – if not str or list of str
- Return type:
None
- set_hostlist(host_list: str | List[str]) None [source]#
Specify the hostlist for this job
- Parameters:
host_list (Union[str, List[str]]) – hosts to launch on
- Raises:
TypeError – if not str or list of str
- Return type:
None
- set_hostlist_from_file(file_path: str) None [source]#
Use the contents of a file to set the node list
This sets --node-list-file
- Parameters:
file_path (str) – Path to the hostlist file
- Return type:
None
- set_memory_per_node(memory_per_node: int) None [source]#
Specify the real memory required per node
This sets --memory-per-pe in megabytes
- Parameters:
memory_per_node (int) – Per PE memory limit in megabytes
- Return type:
None
- set_mpmd_preamble(preamble_lines: List[str]) None #
Set preamble to a file to make a job MPMD
- Parameters:
preamble_lines (List[str]) – lines to put at the beginning of a file.
- Return type:
None
- set_node_feature(feature_list: str | List[str]) None #
Specify the node feature for this job
- Parameters:
feature_list (Union[str, List[str]]) – node feature to launch on
- Return type:
None
- set_nodes(nodes: int) None #
Set the number of nodes
- Parameters:
nodes (int) – number of nodes to run with
- Return type:
None
- set_quiet_launch(quiet: bool) None [source]#
Set the job to run in quiet mode
This sets --quiet
- Parameters:
quiet (bool) – Whether the job should be run quietly
- Return type:
None
- set_task_map(task_mapping: str) None #
Set a task mapping
- Parameters:
task_mapping (str) – task mapping
- Return type:
None
- set_tasks(tasks: int) None [source]#
Set the number of tasks for this job
This sets --pes
- Parameters:
tasks (int) – number of tasks
- Return type:
None
- set_tasks_per_node(tasks_per_node: int) None [source]#
Set the number of tasks per node
This sets --pes-per-node
- Parameters:
tasks_per_node (int) – number of tasks per node
- Return type:
None
- set_time(hours: int = 0, minutes: int = 0, seconds: int = 0) None #
Automatically format and set wall time
- Parameters:
hours (int, default: 0) – number of hours to run job
minutes (int, default: 0) – number of minutes to run job
seconds (int, default: 0) – number of seconds to run job
- Return type:
None
- set_verbose_launch(verbose: bool) None [source]#
Set the job to run in verbose mode
This sets --debug arg to the highest level
- Parameters:
verbose (bool) – Whether the job should be run verbosely
- Return type:
None
- set_walltime(walltime: str) None [source]#
Set the walltime of the job
Walltime is given in total number of seconds
- Parameters:
walltime (str) – wall time
- Return type:
None
- update_env(env_vars: Dict[str, str | int | float | bool]) None #
Update the job environment variables
To fully inherit the current user environment, add the workload-manager-specific flag to the launch command through the add_exe_args() method. For example, --export=ALL for slurm, or -V for PBS/aprun.
- Parameters:
env_vars (Dict[str, Union[str, int, float, bool]]) – environment variables to update or add
- Raises:
TypeError – if env_vars values cannot be coerced to strings
- Return type:
None
DragonRunSettings#
DragonRunSettings can be used on systems that support Slurm or PBS, if Dragon is available in the Python environment (see _dragon_install for instructions on how to install it through smart).
DragonRunSettings can be used in interactive sessions (on an allocation) and within batch launches (i.e. SbatchSettings or QsubBatchSettings, for Slurm and PBS sessions, respectively).
|
Set the number of nodes |
Set the number of tasks for this job |
- class DragonRunSettings(exe: str, exe_args: str | List[str] | None = None, env_vars: Dict[str, str | None] | None = None, **kwargs: Any) None [source]#
Initialize run parameters for a Dragon process
DragonRunSettings should only be used on systems where Dragon is available and installed in the current environment.
If an allocation is specified, the instance receiving these run parameters will launch on that allocation.
- Parameters:
exe (str) – executable to run
exe_args (Union[str, List[str], None], default: None) – executable arguments, defaults to None
env_vars (Optional[Dict[str, Optional[str]]], default: None) – environment variables for job, defaults to None
alloc – allocation ID if running on existing alloc, defaults to None
- add_exe_args(args: str | List[str]) None #
Add executable arguments to executable
- Parameters:
args (Union[str, List[str]]) – executable arguments
- Return type:
None
- colocated_db_settings: t.Optional[t.Dict[str, t.Union[bool, int, str, None, t.List[str], t.Iterable[t.Union[int, t.Iterable[int]]], t.List[DBModel], t.List[DBScript], t.Dict[str, t.Union[int, None]], t.Dict[str, str]]]]#
- property env_vars: Dict[str, str | None]#
Return an immutable list of attached environment variables.
- Returns:
attached environment variables
- property exe_args: str | List[str]#
Return an immutable list of attached executable arguments.
- Returns:
attached executable arguments
- format_env_vars() List[str] #
Build environment variable string
- Return type:
List[str]
- Returns:
formatted list of strings to export variables
- format_run_args() List[str] #
Return formatted run arguments
For RunSettings, the run arguments are passed literally with no formatting.
- Return type:
List[str]
- Returns:
list run arguments for these settings
- make_mpmd(settings: smartsim.settings.base.RunSettings) None #
Make job an MPMD job
- Parameters:
settings (RunSettings) – RunSettings instance
- Return type:
None
- reserved_run_args: t.ClassVar[frozenset[str]] = frozenset({})#
- property run_args: Dict[str, int | str | float | None]#
Return an immutable list of attached run arguments.
- Returns:
attached run arguments
- property run_command: str | None#
Return the launch binary used to launch the executable
Attempt to expand the path to the executable if possible
- Returns:
launch binary e.g. mpiexec
- set(arg: str, value: str | None = None, condition: bool = True) None #
Allows users to set individual run arguments.
A method that allows users to set run arguments after object instantiation. Does basic formatting such as stripping leading dashes. If the argument has been set previously, this method will log a warning but ultimately comply.
Conditional expressions may be passed to the condition parameter. If the expression evaluates to True, the argument will be set. If not, an info message is logged and no further operation is performed.
Basic Usage
    rs = RunSettings("python")
    rs.set("an-arg", "a-val")
    rs.set("a-flag")
    rs.format_run_args()  # returns ["an-arg", "a-val", "a-flag", "None"]
Slurm Example with Conditional Setting
    import socket
    rs = SrunSettings("echo", "hello")
    rs.set_tasks(1)
    rs.set("exclusive")
    # Only set this argument if condition param evals True
    # Otherwise log and NOP
    rs.set("partition", "debug", condition=socket.gethostname()=="testing-system")
    rs.format_run_args()
    # returns ["exclusive", "None", "partition", "debug"] iff socket.gethostname()=="testing-system"
    # otherwise returns ["exclusive", "None"]
- Parameters:
arg (str) – name of the argument
value (Optional[str], default: None) – value of the argument
condition (bool, default: True) – set the argument if condition evaluates to True
- Return type:
None
- set_binding(binding: str) None #
Set binding
- Parameters:
binding (str) – Binding
- Return type:
None
- set_broadcast(dest_path: str | None = None) None #
Copy executable file to allocated compute nodes
- Parameters:
dest_path (Optional[str], default: None) – Path to copy an executable file
- Return type:
None
- set_cpu_affinity(devices: List[int]) None [source]#
Set the CPU affinity for this job
- Parameters:
devices (List[int]) – list of CPU indices to execute on
- Return type:
None
- set_cpu_bindings(bindings: int | List[int]) None #
Set the cores to which MPI processes are bound
- Parameters:
bindings (Union[int, List[int]]) – List specifying the cores to which MPI processes are bound
- Return type:
None
- set_cpus_per_task(cpus_per_task: int) None #
Set the number of cpus per task
- Parameters:
cpus_per_task (int) – number of cpus per task
- Return type:
None
- set_excluded_hosts(host_list: str | List[str]) None #
Specify a list of hosts to exclude for launching this job
- Parameters:
host_list (Union[str, List[str]]) – hosts to exclude
- Return type:
None
- set_gpu_affinity(devices: List[int]) None [source]#
Set the GPU affinity for this job
- Parameters:
devices (List[int]) – list of GPU indices to execute on.
- Return type:
None
- set_hostlist(host_list: str | List[str]) None #
Specify the hostlist for this job
- Parameters:
host_list (
Union
[str
,List
[str
]]) – hosts to launch on- Return type:
None
- set_hostlist_from_file(file_path: str) None #
Use the contents of a file to specify the hostlist for this job
- Parameters:
file_path (
str
) – Path to the hostlist file- Return type:
None
- set_memory_per_node(memory_per_node: int) None #
Set the amount of memory required per node in megabytes
- Parameters:
memory_per_node (
int
) – Number of megabytes per node- Return type:
None
- set_mpmd_preamble(preamble_lines: List[str]) None #
Set preamble to a file to make a job MPMD
- Parameters:
preamble_lines (
List
[str
]) – lines to put at the beginning of a file.- Return type:
None
- set_node_feature(feature_list: str | List[str]) None [source]#
Specify the node feature for this job
- Parameters:
feature_list (Union[str, List[str]]) – a collection of strings representing the required node features. Currently supported node features are: "gpu"
- Return type:
None
- set_nodes(nodes: int) None [source]#
Set the number of nodes
- Parameters:
nodes (
int
) – number of nodes to run with- Return type:
None
- set_quiet_launch(quiet: bool) None #
Set the job to run in quiet mode
- Parameters:
quiet (
bool
) – Whether the job should be run quietly- Return type:
None
- set_task_map(task_mapping: str) None #
Set a task mapping
- Parameters:
task_mapping (
str
) – task mapping- Return type:
None
- set_tasks(tasks: int) None #
Set the number of tasks to launch
- Parameters:
tasks (
int
) – number of tasks to launch- Return type:
None
- set_tasks_per_node(tasks_per_node: int) None [source]#
Set the number of tasks for this job
- Parameters:
tasks_per_node (
int
) – number of tasks per node- Return type:
None
- set_time(hours: int = 0, minutes: int = 0, seconds: int = 0) None #
Automatically format and set wall time
- Parameters:
hours (int, default: 0) – number of hours to run job
minutes (int, default: 0) – number of minutes to run job
seconds (int, default: 0) – number of seconds to run job
- Return type:
None
- set_verbose_launch(verbose: bool) None #
Set the job to run in verbose mode
- Parameters:
verbose (
bool
) – Whether the job should be run verbosely- Return type:
None
- set_walltime(walltime: str) None #
Set the formatted walltime
- Parameters:
walltime (str) – Time in format required by launcher
- Return type:
None
- update_env(env_vars: Dict[str, str | int | float | bool]) None #
Update the job environment variables
To fully inherit the current user environment, add the workload-manager-specific flag to the launch command through the add_exe_args() method. For example, --export=ALL for Slurm, or -V for PBS/aprun.
- Parameters:
env_vars (Dict[str, Union[str, int, float, bool]]) – environment variables to update or add
- Raises:
TypeError – if env_vars values cannot be coerced to strings
- Return type:
None
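The coercion rule stated for update_env() (values of type str, int, float, or bool become strings, anything else raises TypeError) can be sketched as a pure function. The helper name coerce_env_vars is hypothetical and the code is an illustration of the documented contract, not the library's source.

```python
from typing import Dict, Optional, Union

EnvValue = Union[str, int, float, bool]


def coerce_env_vars(
    current: Dict[str, Optional[str]], updates: Dict[str, EnvValue]
) -> Dict[str, Optional[str]]:
    """Return a copy of `current` updated with string-coerced `updates`."""
    merged = dict(current)
    for name, value in updates.items():
        # Only the documented value types are accepted; anything else raises.
        if not isinstance(value, (str, int, float, bool)):
            raise TypeError(f"Cannot coerce value for {name!r} to a string")
        merged[name] = str(value)
    return merged


env = coerce_env_vars({"OMP_NUM_THREADS": "1"}, {"OMP_NUM_THREADS": 4, "DEBUG": True})
print(env)  # {'OMP_NUM_THREADS': '4', 'DEBUG': 'True'}
```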
MpirunSettings#
MpirunSettings are for launching with OpenMPI. MpirunSettings are supported on Slurm and PBSPro.
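| set_cpus_per_task | Set the number of CPUs per task |
| set_hostlist | Set the hostlist for the mpirun command |
| set_tasks | Set the number of tasks for this job |
| set_task_map | Set mpirun task mapping |
| make_mpmd | Make a mpmd workload by combining two mpirun commands |
| add_exe_args | Add executable arguments to executable |
| format_run_args | Return a list of MPI-standard formatted run arguments |
| format_env_vars | Format the environment variables for mpirun |
| update_env | Update the job environment variables |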
- class MpirunSettings(exe: str, exe_args: str | List[str] | None = None, run_args: Dict[str, int | str | float | None] | None = None, env_vars: Dict[str, str | None] | None = None, **kwargs: Any) None [source]#
Settings to run job with mpirun command (MPI-standard)
Note that environment variables can be passed with a None value to signify that they should be exported from the current environment.
Any arguments passed in the run_args dict will be converted into mpirun arguments and prefixed with --. Values of None can be provided for arguments that do not have values.
- Parameters:
exe (str) – executable
exe_args (Union[str, List[str], None], default: None) – executable arguments
run_args (Optional[Dict[str, Union[int, str, float, None]]], default: None) – arguments for run command
env_vars (Optional[Dict[str, Optional[str]]], default: None) – environment vars to launch job with
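The run_args conversion described above (each key prefixed with --, None meaning a flag without a value) can be sketched as follows. The function format_mpi_run_args is a hypothetical helper written for illustration; it is not the library's format_run_args() and ignores details such as reserved arguments.

```python
from typing import Dict, List, Union

RunArgValue = Union[int, str, float, None]


def format_mpi_run_args(run_args: Dict[str, RunArgValue]) -> List[str]:
    """Turn a run_args dict into a flat list of `--name [value]` tokens."""
    formatted: List[str] = []
    for name, value in run_args.items():
        formatted.append("--" + name)
        if value is not None:
            # Arguments that carry a value contribute a second token.
            formatted.append(str(value))
    return formatted


print(format_mpi_run_args({"map-by": "core", "oversubscribe": None}))
# ['--map-by', 'core', '--oversubscribe']
```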
- add_exe_args(args: str | List[str]) None #
Add executable arguments to executable
- Parameters:
args (
Union
[str
,List
[str
]]) – executable arguments- Return type:
None
- colocated_db_settings: t.Optional[t.Dict[str, t.Union[bool, int, str, None, t.List[str], t.Iterable[t.Union[int, t.Iterable[int]]], t.List[DBModel], t.List[DBScript], t.Dict[str, t.Union[int, None]], t.Dict[str, str]]]]#
- property env_vars: Dict[str, str | None]#
Return an immutable list of attached environment variables.
- Returns:
attached environment variables
- property exe_args: str | List[str]#
Return an immutable list of attached executable arguments.
- Returns:
attached executable arguments
- format_env_vars() List[str] #
Format the environment variables for mpirun
- Return type:
List
[str
]- Returns:
list of env vars
- format_run_args() List[str] #
Return a list of MPI-standard formatted run arguments
- Return type:
List
[str
]- Returns:
list of MPI-standard arguments for these settings
- make_mpmd(settings: smartsim.settings.base.RunSettings) None #
Make an MPMD workload by combining two mpirun commands
This connects the two settings to be executed with a single Model instance
- Parameters:
settings (RunSettings) – MpirunSettings instance
- Return type:
None
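On the command line, mpirun expresses an MPMD launch by separating the per-program segments with a colon. The sketch below shows that shape only; build_mpmd_command and the executable names are hypothetical, and how the library assembles the final command internally is not documented here.

```python
from typing import List, Sequence


def build_mpmd_command(launcher: str, programs: Sequence[Sequence[str]]) -> List[str]:
    """Join per-program argument lists with ':' after a single launcher."""
    if not programs:
        raise ValueError("at least one program is required")
    command: List[str] = [launcher]
    for index, program in enumerate(programs):
        if index > 0:
            command.append(":")  # MPMD separator between program segments
        command.extend(program)
    return command


cmd = build_mpmd_command(
    "mpirun",
    [["-n", "2", "./ocean"], ["-n", "4", "./atmosphere"]],
)
print(" ".join(cmd))  # mpirun -n 2 ./ocean : -n 4 ./atmosphere
```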
- reserved_run_args: t.ClassVar[frozenset[str]] = frozenset({'wd', 'wdir'})#
- property run_args: Dict[str, int | str | float | None]#
Return an immutable list of attached run arguments.
- Returns:
attached run arguments
- property run_command: str | None#
Return the launch binary used to launch the executable
Attempt to expand the path to the executable if possible
- Returns:
launch binary e.g. mpiexec
- set(arg: str, value: str | None = None, condition: bool = True) None #
Allows users to set individual run arguments.
Sets a run argument after object instantiation. Performs basic formatting, such as stripping leading dashes. If the argument has been set previously, this method logs a warning but still applies the new value.
Conditional expressions may be passed to the condition parameter. If the expression evaluates to True, the argument is set. If not, an info message is logged and no further operation is performed.
Basic Usage
rs = RunSettings("python")
rs.set("an-arg", "a-val")
rs.set("a-flag")
rs.format_run_args()  # returns ["an-arg", "a-val", "a-flag", "None"]
Slurm Example with Conditional Setting
import socket
rs = SrunSettings("echo", "hello")
rs.set_tasks(1)
rs.set("exclusive")
# Only set this argument if condition param evals True
# Otherwise log and NOP
rs.set("partition", "debug", condition=socket.gethostname()=="testing-system")
rs.format_run_args()
# returns ["exclusive", "None", "partition", "debug"] iff socket.gethostname()=="testing-system"
# otherwise returns ["exclusive", "None"]
- Parameters:
arg (str) – name of the argument
value (Optional[str], default: None) – value of the argument
condition (bool, default: True) – set the argument if condition evaluates to True
- Return type:
None
- set_binding(binding: str) None #
Set binding
- Parameters:
binding (
str
) – Binding- Return type:
None
- set_broadcast(dest_path: str | None = None) None #
Copy the specified executable(s) to remote machines
This sets --preload-binary
- Parameters:
dest_path (Optional[str], default: None) – Destination path (Ignored)
- Return type:
None
- set_cpu_binding_type(bind_type: str) None #
Specifies the cores to which MPI processes are bound
This sets --bind-to for MPI compliant implementations
- Parameters:
bind_type (str) – binding type
- Return type:
None
- set_cpu_bindings(bindings: int | List[int]) None #
Set the cores to which MPI processes are bound
- Parameters:
bindings (Union[int, List[int]]) – List specifying the cores to which MPI processes are bound
- Return type:
None
- set_cpus_per_task(cpus_per_task: int) None #
Set the number of CPUs per task
This sets --cpus-per-proc for MPI compliant implementations
Note: this option has been deprecated in OpenMPI 4.0+ and will soon be replaced.
- Parameters:
cpus_per_task (int) – number of CPUs per task
- Return type:
None
- set_excluded_hosts(host_list: str | List[str]) None #
Specify a list of hosts to exclude for launching this job
- Parameters:
host_list (
Union
[str
,List
[str
]]) – hosts to exclude- Return type:
None
- set_hostlist(host_list: str | List[str]) None #
Set the hostlist for the mpirun command
This sets --host
- Parameters:
host_list (Union[str, List[str]]) – list of host names
- Raises:
TypeError – if not str or list of str
- Return type:
None
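The input contract for set_hostlist() (a single string or a list of strings, TypeError otherwise) can be sketched as below. The helper normalize_hostlist is hypothetical; joining hosts with commas reflects the form mpirun accepts for --host and is an assumption about how the value is ultimately rendered.

```python
from typing import List, Union


def normalize_hostlist(host_list: Union[str, List[str]]) -> str:
    """Validate a host list and join it into one comma-separated string."""
    if isinstance(host_list, str):
        host_list = [host_list.strip()]
    if not isinstance(host_list, list):
        raise TypeError("host_list argument must be a list of strings")
    if not all(isinstance(host, str) for host in host_list):
        raise TypeError("host_list argument must be a list of strings")
    return ",".join(host_list)


print(normalize_hostlist(["node001", "node002"]))  # node001,node002
print(normalize_hostlist("node003"))               # node003
```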
- set_hostlist_from_file(file_path: str) None #
Use the contents of a file to set the hostlist
This sets --hostfile
- Parameters:
file_path (str) – Path to the hostlist file
- Return type:
None
- set_memory_per_node(memory_per_node: int) None #
Set the amount of memory required per node in megabytes
- Parameters:
memory_per_node (
int
) – Number of megabytes per node- Return type:
None
- set_mpmd_preamble(preamble_lines: List[str]) None #
Set preamble to a file to make a job MPMD
- Parameters:
preamble_lines (
List
[str
]) – lines to put at the beginning of a file.- Return type:
None
- set_node_feature(feature_list: str | List[str]) None #
Specify the node feature for this job
- Parameters:
feature_list (
Union
[str
,List
[str
]]) – node feature to launch on- Return type:
None
- set_nodes(nodes: int) None #
Set the number of nodes
- Parameters:
nodes (
int
) – number of nodes to run with- Return type:
None
- set_quiet_launch(quiet: bool) None #
Set the job to run in quiet mode
This sets --quiet
- Parameters:
quiet (bool) – Whether the job should be run quietly
- Return type:
None
- set_task_map(task_mapping: str) None #
Set mpirun task mapping
This sets --map-by <mapping>
For examples, see the man page for mpirun
- Parameters:
task_mapping (str) – task mapping
- Return type:
None
- set_tasks(tasks: int) None #
Set the number of tasks for this job
This sets -n for MPI compliant implementations
- Parameters:
tasks (int) – number of tasks
- Return type:
None
- set_tasks_per_node(tasks_per_node: int) None #
Set the number of tasks per node
- Parameters:
tasks_per_node (
int
) – number of tasks to launch per node- Return type:
None
- set_time(hours: int = 0, minutes: int = 0, seconds: int = 0) None #
Automatically format and set wall time
- Parameters:
hours (int, default: 0) – number of hours to run job
minutes (int, default: 0) – number of minutes to run job
seconds (int, default: 0) – number of seconds to run job
- Return type:
None
- set_verbose_launch(verbose: bool) None #
Set the job to run in verbose mode
This sets --verbose
- Parameters:
verbose (bool) – Whether the job should be run verbosely
- Return type:
None
- set_walltime(walltime: str) None #
Set the maximum number of seconds that a job will run
This sets --timeout
- Parameters:
walltime (str) – numeric string giving the maximum number of seconds the job will run
- Return type:
None
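Because this set_walltime() expects a numeric string of seconds, a caller holding hours, minutes, and seconds needs to collapse them first. The helper below is a hypothetical sketch of that arithmetic; whether set_time() performs the same conversion internally is not stated in this reference.

```python
def walltime_in_seconds(hours: int = 0, minutes: int = 0, seconds: int = 0) -> str:
    """Collapse hours/minutes/seconds into a string holding total seconds."""
    if min(hours, minutes, seconds) < 0:
        raise ValueError("time components must be non-negative")
    return str(hours * 3600 + minutes * 60 + seconds)


print(walltime_in_seconds(hours=1, minutes=30))  # 5400
```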
- update_env(env_vars: Dict[str, str | int | float | bool]) None #
Update the job environment variables
To fully inherit the current user environment, add the workload-manager-specific flag to the launch command through the add_exe_args() method. For example, --export=ALL for Slurm, or -V for PBS/aprun.
- Parameters:
env_vars (Dict[str, Union[str, int, float, bool]]) – environment variables to update or add
- Raises:
TypeError – if env_vars values cannot be coerced to strings
- Return type:
None
MpiexecSettings#
MpiexecSettings are for launching with OpenMPI's mpiexec. MpiexecSettings are supported on Slurm and PBSPro.
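| set_cpus_per_task | Set the number of CPUs per task |
| set_hostlist | Set the hostlist for the mpirun command |
| set_tasks | Set the number of tasks for this job |
| set_task_map | Set mpirun task mapping |
| make_mpmd | Make a mpmd workload by combining two mpirun commands |
| add_exe_args | Add executable arguments to executable |
| format_run_args | Return a list of MPI-standard formatted run arguments |
| format_env_vars | Format the environment variables for mpirun |
| update_env | Update the job environment variables |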
- class MpiexecSettings(exe: str, exe_args: str | List[str] | None = None, run_args: Dict[str, int | str | float | None] | None = None, env_vars: Dict[str, str | None] | None = None, **kwargs: Any) None [source]#
Settings to run job with mpiexec command (MPI-standard)
Note that environment variables can be passed with a None value to signify that they should be exported from the current environment.
Any arguments passed in the run_args dict will be converted into mpiexec arguments and prefixed with --. Values of None can be provided for arguments that do not have values.
- Parameters:
exe (str) – executable
exe_args (Union[str, List[str], None], default: None) – executable arguments
run_args (Optional[Dict[str, Union[int, str, float, None]]], default: None) – arguments for run command
env_vars (Optional[Dict[str, Optional[str]]], default: None) – environment vars to launch job with
- add_exe_args(args: str | List[str]) None #
Add executable arguments to executable
- Parameters:
args (
Union
[str
,List
[str
]]) – executable arguments- Return type:
None
- colocated_db_settings: t.Optional[t.Dict[str, t.Union[bool, int, str, None, t.List[str], t.Iterable[t.Union[int, t.Iterable[int]]], t.List[DBModel], t.List[DBScript], t.Dict[str, t.Union[int, None]], t.Dict[str, str]]]]#
- property env_vars: Dict[str, str | None]#
Return an immutable list of attached environment variables.
- Returns:
attached environment variables
- property exe_args: str | List[str]#
Return an immutable list of attached executable arguments.
- Returns:
attached executable arguments
- format_env_vars() List[str] #
Format the environment variables for mpirun
- Return type:
List
[str
]- Returns:
list of env vars
- format_run_args() List[str] #
Return a list of MPI-standard formatted run arguments
- Return type:
List
[str
]- Returns:
list of MPI-standard arguments for these settings
- make_mpmd(settings: smartsim.settings.base.RunSettings) None #
Make an MPMD workload by combining two mpirun commands
This connects the two settings to be executed with a single Model instance
- Parameters:
settings (RunSettings) – MpirunSettings instance
- Return type:
None
- reserved_run_args: t.ClassVar[frozenset[str]] = frozenset({'wd', 'wdir'})#
- property run_args: Dict[str, int | str | float | None]#
Return an immutable list of attached run arguments.
- Returns:
attached run arguments
- property run_command: str | None#
Return the launch binary used to launch the executable
Attempt to expand the path to the executable if possible
- Returns:
launch binary e.g. mpiexec
- set(arg: str, value: str | None = None, condition: bool = True) None #
Allows users to set individual run arguments.
Sets a run argument after object instantiation. Performs basic formatting, such as stripping leading dashes. If the argument has been set previously, this method logs a warning but still applies the new value.
Conditional expressions may be passed to the condition parameter. If the expression evaluates to True, the argument is set. If not, an info message is logged and no further operation is performed.
Basic Usage
rs = RunSettings("python")
rs.set("an-arg", "a-val")
rs.set("a-flag")
rs.format_run_args()  # returns ["an-arg", "a-val", "a-flag", "None"]
Slurm Example with Conditional Setting
import socket
rs = SrunSettings("echo", "hello")
rs.set_tasks(1)
rs.set("exclusive")
# Only set this argument if condition param evals True
# Otherwise log and NOP
rs.set("partition", "debug", condition=socket.gethostname()=="testing-system")
rs.format_run_args()
# returns ["exclusive", "None", "partition", "debug"] iff socket.gethostname()=="testing-system"
# otherwise returns ["exclusive", "None"]
- Parameters:
arg (str) – name of the argument
value (Optional[str], default: None) – value of the argument
condition (bool, default: True) – set the argument if condition evaluates to True
- Return type:
None
- set_binding(binding: str) None #
Set binding
- Parameters:
binding (
str
) – Binding- Return type:
None
- set_broadcast(dest_path: str | None = None) None #
Copy the specified executable(s) to remote machines
This sets
--preload-binary
- Parameters:
dest_path (
Optional
[str
], default:None
) – Destination path (Ignored)- Return type:
None
- set_cpu_binding_type(bind_type: str) None #
Specifies the cores to which MPI processes are bound
This sets
--bind-to
for MPI compliant implementations- Parameters:
bind_type (
str
) – binding type- Return type:
None
- set_cpu_bindings(bindings: int | List[int]) None #
Set the cores to which MPI processes are bound
- Parameters:
bindings (Union[int, List[int]]) – List specifying the cores to which MPI processes are bound
- Return type:
None
- set_cpus_per_task(cpus_per_task: int) None #
Set the number of CPUs per task
This sets --cpus-per-proc for MPI compliant implementations
Note: this option has been deprecated in OpenMPI 4.0+ and will soon be replaced.
- Parameters:
cpus_per_task (int) – number of CPUs per task
- Return type:
None
- set_excluded_hosts(host_list: str | List[str]) None #
Specify a list of hosts to exclude for launching this job
- Parameters:
host_list (
Union
[str
,List
[str
]]) – hosts to exclude- Return type:
None
- set_hostlist(host_list: str | List[str]) None #
Set the hostlist for the mpirun command
This sets --host
- Parameters:
host_list (Union[str, List[str]]) – list of host names
- Raises:
TypeError – if not str or list of str
- Return type:
None
- set_hostlist_from_file(file_path: str) None #
Use the contents of a file to set the hostlist
This sets
--hostfile
- Parameters:
file_path (
str
) – Path to the hostlist file- Return type:
None
- set_memory_per_node(memory_per_node: int) None #
Set the amount of memory required per node in megabytes
- Parameters:
memory_per_node (
int
) – Number of megabytes per node- Return type:
None
- set_mpmd_preamble(preamble_lines: List[str]) None #
Set preamble to a file to make a job MPMD
- Parameters:
preamble_lines (
List
[str
]) – lines to put at the beginning of a file.- Return type:
None
- set_node_feature(feature_list: str | List[str]) None #
Specify the node feature for this job
- Parameters:
feature_list (
Union
[str
,List
[str
]]) – node feature to launch on- Return type:
None
- set_nodes(nodes: int) None #
Set the number of nodes
- Parameters:
nodes (
int
) – number of nodes to run with- Return type:
None
- set_quiet_launch(quiet: bool) None #
Set the job to run in quiet mode
This sets
--quiet
- Parameters:
quiet (
bool
) – Whether the job should be run quietly- Return type:
None
- set_task_map(task_mapping: str) None #
Set mpirun task mapping
This sets --map-by <mapping>
For examples, see the man page for mpirun
- Parameters:
task_mapping (str) – task mapping
- Return type:
None
- set_tasks(tasks: int) None #
Set the number of tasks for this job
This sets
-n
for MPI compliant implementations- Parameters:
tasks (
int
) – number of tasks- Return type:
None
- set_tasks_per_node(tasks_per_node: int) None #
Set the number of tasks per node
- Parameters:
tasks_per_node (
int
) – number of tasks to launch per node- Return type:
None
- set_time(hours: int = 0, minutes: int = 0, seconds: int = 0) None #
Automatically format and set wall time
- Parameters:
hours (int, default: 0) – number of hours to run job
minutes (int, default: 0) – number of minutes to run job
seconds (int, default: 0) – number of seconds to run job
- Return type:
None
- set_verbose_launch(verbose: bool) None #
Set the job to run in verbose mode
This sets
--verbose
- Parameters:
verbose (
bool
) – Whether the job should be run verbosely- Return type:
None
- set_walltime(walltime: str) None #
Set the maximum number of seconds that a job will run
This sets --timeout
- Parameters:
walltime (str) – numeric string giving the maximum number of seconds the job will run
- Return type:
None
- update_env(env_vars: Dict[str, str | int | float | bool]) None #
Update the job environment variables
To fully inherit the current user environment, add the workload-manager-specific flag to the launch command through the add_exe_args() method. For example, --export=ALL for Slurm, or -V for PBS/aprun.
- Parameters:
env_vars (Dict[str, Union[str, int, float, bool]]) – environment variables to update or add
- Raises:
TypeError – if env_vars values cannot be coerced to strings
- Return type:
None
OrterunSettings#
OrterunSettings are for launching with OpenMPI's orterun. OrterunSettings are supported on Slurm and PBSPro.
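| set_cpus_per_task | Set the number of CPUs per task |
| set_hostlist | Set the hostlist for the mpirun command |
| set_tasks | Set the number of tasks for this job |
| set_task_map | Set mpirun task mapping |
| make_mpmd | Make a mpmd workload by combining two mpirun commands |
| add_exe_args | Add executable arguments to executable |
| format_run_args | Return a list of MPI-standard formatted run arguments |
| format_env_vars | Format the environment variables for mpirun |
| update_env | Update the job environment variables |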
- class OrterunSettings(exe: str, exe_args: str | List[str] | None = None, run_args: Dict[str, int | str | float | None] | None = None, env_vars: Dict[str, str | None] | None = None, **kwargs: Any) None [source]#
Settings to run job with orterun command (MPI-standard)
Note that environment variables can be passed with a None value to signify that they should be exported from the current environment.
Any arguments passed in the run_args dict will be converted into orterun arguments and prefixed with --. Values of None can be provided for arguments that do not have values.
- Parameters:
exe (str) – executable
exe_args (Union[str, List[str], None], default: None) – executable arguments
run_args (Optional[Dict[str, Union[int, str, float, None]]], default: None) – arguments for run command
env_vars (Optional[Dict[str, Optional[str]]], default: None) – environment vars to launch job with
- add_exe_args(args: str | List[str]) None #
Add executable arguments to executable
- Parameters:
args (
Union
[str
,List
[str
]]) – executable arguments- Return type:
None
- colocated_db_settings: t.Optional[t.Dict[str, t.Union[bool, int, str, None, t.List[str], t.Iterable[t.Union[int, t.Iterable[int]]], t.List[DBModel], t.List[DBScript], t.Dict[str, t.Union[int, None]], t.Dict[str, str]]]]#
- property env_vars: Dict[str, str | None]#
Return an immutable list of attached environment variables.
- Returns:
attached environment variables
- property exe_args: str | List[str]#
Return an immutable list of attached executable arguments.
- Returns:
attached executable arguments
- format_env_vars() List[str] #
Format the environment variables for mpirun
- Return type:
List
[str
]- Returns:
list of env vars
- format_run_args() List[str] #
Return a list of MPI-standard formatted run arguments
- Return type:
List
[str
]- Returns:
list of MPI-standard arguments for these settings
- make_mpmd(settings: smartsim.settings.base.RunSettings) None #
Make an MPMD workload by combining two mpirun commands
This connects the two settings to be executed with a single Model instance
- Parameters:
settings (RunSettings) – MpirunSettings instance
- Return type:
None
- reserved_run_args: t.ClassVar[frozenset[str]] = frozenset({'wd', 'wdir'})#
- property run_args: Dict[str, int | str | float | None]#
Return an immutable list of attached run arguments.
- Returns:
attached run arguments
- property run_command: str | None#
Return the launch binary used to launch the executable
Attempt to expand the path to the executable if possible
- Returns:
launch binary e.g. mpiexec
- set(arg: str, value: str | None = None, condition: bool = True) None #
Allows users to set individual run arguments.
Sets a run argument after object instantiation. Performs basic formatting, such as stripping leading dashes. If the argument has been set previously, this method logs a warning but still applies the new value.
Conditional expressions may be passed to the condition parameter. If the expression evaluates to True, the argument is set. If not, an info message is logged and no further operation is performed.
Basic Usage
rs = RunSettings("python")
rs.set("an-arg", "a-val")
rs.set("a-flag")
rs.format_run_args()  # returns ["an-arg", "a-val", "a-flag", "None"]
Slurm Example with Conditional Setting
import socket
rs = SrunSettings("echo", "hello")
rs.set_tasks(1)
rs.set("exclusive")
# Only set this argument if condition param evals True
# Otherwise log and NOP
rs.set("partition", "debug", condition=socket.gethostname()=="testing-system")
rs.format_run_args()
# returns ["exclusive", "None", "partition", "debug"] iff socket.gethostname()=="testing-system"
# otherwise returns ["exclusive", "None"]
- Parameters:
arg (str) – name of the argument
value (Optional[str], default: None) – value of the argument
condition (bool, default: True) – set the argument if condition evaluates to True
- Return type:
None
- set_binding(binding: str) None #
Set binding
- Parameters:
binding (
str
) – Binding- Return type:
None
- set_broadcast(dest_path: str | None = None) None #
Copy the specified executable(s) to remote machines
This sets
--preload-binary
- Parameters:
dest_path (
Optional
[str
], default:None
) – Destination path (Ignored)- Return type:
None
- set_cpu_binding_type(bind_type: str) None #
Specifies the cores to which MPI processes are bound
This sets
--bind-to
for MPI compliant implementations- Parameters:
bind_type (
str
) – binding type- Return type:
None
- set_cpu_bindings(bindings: int | List[int]) None #
Set the cores to which MPI processes are bound
- Parameters:
bindings (Union[int, List[int]]) – List specifying the cores to which MPI processes are bound
- Return type:
None
- set_cpus_per_task(cpus_per_task: int) None #
Set the number of CPUs per task
This sets --cpus-per-proc for MPI compliant implementations
Note: this option has been deprecated in OpenMPI 4.0+ and will soon be replaced.
- Parameters:
cpus_per_task (int) – number of CPUs per task
- Return type:
None
- set_excluded_hosts(host_list: str | List[str]) None #
Specify a list of hosts to exclude for launching this job
- Parameters:
host_list (
Union
[str
,List
[str
]]) – hosts to exclude- Return type:
None
- set_hostlist(host_list: str | List[str]) None #
Set the hostlist for the mpirun command
This sets --host
- Parameters:
host_list (Union[str, List[str]]) – list of host names
- Raises:
TypeError – if not str or list of str
- Return type:
None
- set_hostlist_from_file(file_path: str) None #
Use the contents of a file to set the hostlist
This sets
--hostfile
- Parameters:
file_path (
str
) – Path to the hostlist file- Return type:
None
- set_memory_per_node(memory_per_node: int) None #
Set the amount of memory required per node in megabytes
- Parameters:
memory_per_node (
int
) – Number of megabytes per node- Return type:
None
- set_mpmd_preamble(preamble_lines: List[str]) None #
Set preamble to a file to make a job MPMD
- Parameters:
preamble_lines (
List
[str
]) – lines to put at the beginning of a file.- Return type:
None
- set_node_feature(feature_list: str | List[str]) None #
Specify the node feature for this job
- Parameters:
feature_list (
Union
[str
,List
[str
]]) – node feature to launch on- Return type:
None
- set_nodes(nodes: int) None #
Set the number of nodes
- Parameters:
nodes (
int
) – number of nodes to run with- Return type:
None
- set_quiet_launch(quiet: bool) None #
Set the job to run in quiet mode
This sets
--quiet
- Parameters:
quiet (
bool
) – Whether the job should be run quietly- Return type:
None
- set_task_map(task_mapping: str) None #
Set mpirun task mapping
This sets --map-by <mapping>
For examples, see the man page for mpirun
- Parameters:
task_mapping (str) – task mapping
- Return type:
None
- set_tasks(tasks: int) None #
Set the number of tasks for this job
This sets
-n
for MPI compliant implementations- Parameters:
tasks (
int
) – number of tasks- Return type:
None
- set_tasks_per_node(tasks_per_node: int) None #
Set the number of tasks per node
- Parameters:
tasks_per_node (
int
) – number of tasks to launch per node- Return type:
None
- set_time(hours: int = 0, minutes: int = 0, seconds: int = 0) None #
Automatically format and set wall time
- Parameters:
hours (int, default: 0) – number of hours to run job
minutes (int, default: 0) – number of minutes to run job
seconds (int, default: 0) – number of seconds to run job
- Return type:
None
- set_verbose_launch(verbose: bool) None #
Set the job to run in verbose mode
This sets
--verbose
- Parameters:
verbose (
bool
) – Whether the job should be run verbosely- Return type:
None
- set_walltime(walltime: str) None #
Set the maximum number of seconds that a job will run
This sets --timeout
- Parameters:
walltime (str) – numeric string giving the maximum number of seconds the job will run
- Return type:
None
- update_env(env_vars: Dict[str, str | int | float | bool]) None #
Update the job environment variables
To fully inherit the current user environment, add the workload-manager-specific flag to the launch command through the add_exe_args() method. For example, --export=ALL for Slurm, or -V for PBS/aprun.
- Parameters:
env_vars (Dict[str, Union[str, int, float, bool]]) – environment variables to update or add
- Raises:
TypeError – if env_vars values cannot be coerced to strings
- Return type:
None
SbatchSettings#
SbatchSettings are used for launching batches onto Slurm WLM systems.
|
Set the account for this batch job |
|
Set the command used to launch the batch e.g. |
|
Set the number of nodes for this batch job |
|
Specify the hostlist for this job |
|
Set the partition for the batch job |
|
alias for set_partition |
|
Set the walltime of the job |
Get the formatted batch arguments for a preview |
- class SbatchSettings(nodes: int | None = None, time: str = '', account: str | None = None, batch_args: Dict[str, str | None] | None = None, **kwargs: Any) None [source]#
Specify run parameters for a Slurm batch job
Slurm sbatch arguments can be written into batch_args as a dictionary, e.g. {'ntasks': 1}
If the argument doesn't have a parameter, put None as the value, e.g. {'exclusive': None}
Initialization values provided (nodes, time, account) will overwrite the same arguments in batch_args if present
- Parameters:
nodes (Optional[int], default: None) – number of nodes
time (str, default: '') – walltime for job, e.g. "10:00:00" for 10 hours
account (Optional[str], default: None) – account for job
batch_args (Optional[Dict[str, Optional[str]]], default: None) – extra batch arguments
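The precedence rule above (explicit nodes, time, and account override the same keys in batch_args) can be sketched with two small helpers. Both merge_batch_args and to_sbatch_flags are hypothetical names used for illustration, and the --name=value rendering is an assumption about the output format rather than a description of format_batch_args().

```python
from typing import Dict, List, Optional


def merge_batch_args(
    nodes: Optional[int] = None,
    time: str = "",
    account: Optional[str] = None,
    batch_args: Optional[Dict[str, Optional[str]]] = None,
) -> Dict[str, Optional[str]]:
    """Merge explicit values over batch_args; explicit values win."""
    merged: Dict[str, Optional[str]] = dict(batch_args or {})
    if nodes is not None:
        merged["nodes"] = str(nodes)
    if time:
        merged["time"] = time
    if account is not None:
        merged["account"] = account
    return merged


def to_sbatch_flags(args: Dict[str, Optional[str]]) -> List[str]:
    """Render each entry as --name=value, or --name when the value is None."""
    return [f"--{k}" if v is None else f"--{k}={v}" for k, v in args.items()]


merged = merge_batch_args(
    nodes=4, time="10:00:00", batch_args={"nodes": "1", "exclusive": None}
)
print(to_sbatch_flags(merged))
# ['--nodes=4', '--exclusive', '--time=10:00:00']
```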
- add_preamble(lines: List[str]) None #
Add lines to the batch file preamble. The lines are just written (unmodified) at the beginning of the batch file (after the WLM directives) and can be used to e.g. start virtual environments before running the executables.
- Parameters:
lines (List[str]) – lines to add to preamble.
- Return type:
None
- property batch_args: Dict[str, str | None]#
Retrieve attached batch arguments
- Returns:
attached batch arguments
- property batch_cmd: str#
Return the batch command
Tests to see if we can expand the batch command path. If we can, then returns the expanded batch command. If we cannot, returns the batch command as is.
- Returns:
batch command
- format_batch_args() List[str] [source]#
Get the formatted batch arguments for a preview
- Return type:
List
[str
]- Returns:
batch arguments for Sbatch
- property preamble: Iterable[str]#
Return an iterable of preamble clauses to be prepended to the batch file
- Returns:
attached preamble clauses
- set_account(account: str) None [source]#
Set the account for this batch job
- Parameters:
account (str) – account id
- Return type:
None
- set_batch_command(command: str) None #
Set the command used to launch the batch e.g. sbatch
- Parameters:
command (str) – batch command
- Return type:
None
- set_cpus_per_task(cpus_per_task: int) None [source]#
Set the number of cpus to use per task
This sets --cpus-per-task
- Parameters:
cpus_per_task – number of cpus to use per task
- Return type:
None
- set_hostlist(host_list: str | List[str]) None [source]#
Specify the hostlist for this job
- Parameters:
host_list (Union[str, List[str]]) – hosts to launch on
- Raises:
TypeError – if not str or list of str
- Return type:
None
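A minimal sketch of the type check implied by the signature and the `TypeError` above. The helper `normalize_hostlist` is hypothetical and only illustrates accepting either a single host name or a list of host names.

```python
from typing import List, Union


def normalize_hostlist(host_list: Union[str, List[str]]) -> List[str]:
    """Hypothetical helper: accept one host name or a list of host names."""
    if isinstance(host_list, str):
        return [host_list.strip()]
    if not isinstance(host_list, list) or not all(
        isinstance(host, str) for host in host_list
    ):
        raise TypeError("host_list argument must be a str or a list of str")
    return host_list


print(normalize_hostlist("node001"))
print(normalize_hostlist(["node001", "node002"]))
```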
- set_nodes(num_nodes: int) None [source]#
Set the number of nodes for this batch job
- Parameters:
num_nodes (int) – number of nodes
- Return type:
None
- set_partition(partition: str) None [source]#
Set the partition for the batch job
- Parameters:
partition (str) – partition name
- Return type:
None
QsubBatchSettings#
QsubBatchSettings
are used to configure jobs that should
be launched as a batch on PBSPro systems.
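| QsubBatchSettings.set_account | Set the account for this batch job |
| QsubBatchSettings.set_batch_command | Set the command used to launch the batch e.g. sbatch |
| QsubBatchSettings.set_nodes | Set the number of nodes for this batch job |
| QsubBatchSettings.set_ncpus | Set the number of cpus obtained in each node. |
| QsubBatchSettings.set_queue | Set the queue for the batch job |
| QsubBatchSettings.set_resource | Set a resource value for the Qsub batch |
| QsubBatchSettings.set_walltime | Set the walltime of the job |
| QsubBatchSettings.format_batch_args | Get the formatted batch arguments for a preview |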
- class QsubBatchSettings(nodes: int | None = None, ncpus: int | None = None, time: str | None = None, queue: str | None = None, account: str | None = None, resources: Dict[str, str | int] | None = None, batch_args: Dict[str, str | None] | None = None, **kwargs: Any)[source]#
Specify qsub batch parameters for a job
nodes and ncpus are used to create the select statement for PBS if a select statement is not included in resources. If both are supplied, the select statement supplied in resources takes precedence.
- Parameters:
nodes (Optional[int], default: None) – number of nodes for batch
ncpus (Optional[int], default: None) – number of cpus per node
time (Optional[str], default: None) – walltime for batch job
queue (Optional[str], default: None) – queue to run batch in
account (Optional[str], default: None) – account for batch launch
resources (Optional[Dict[str, Union[str, int]]], default: None) – overrides for resource arguments
batch_args (Optional[Dict[str, Optional[str]]], default: None) – overrides for PBS batch arguments
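The precedence between nodes/ncpus and an explicit select statement can be sketched as follows. `build_select` is a hypothetical helper written for this illustration, not SmartSim's code; the `select=N:ncpus=M` form is standard PBS resource syntax.

```python
from typing import Dict, Optional, Union


def build_select(
    nodes: Optional[int],
    ncpus: Optional[int],
    resources: Optional[Dict[str, Union[str, int]]] = None,
) -> Optional[str]:
    """Hypothetical helper: choose the PBS select statement for a job."""
    resources = resources or {}
    if "select" in resources:
        # An explicit select statement takes precedence over nodes/ncpus.
        return f"select={resources['select']}"
    if nodes is None:
        return None
    select = f"select={nodes}"
    if ncpus is not None:
        select += f":ncpus={ncpus}"
    return select


from_counts = build_select(nodes=2, ncpus=36)
from_resources = build_select(nodes=2, ncpus=36, resources={"select": "4:ncpus=8"})
print(from_counts)
print(from_resources)
```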
- add_preamble(lines: List[str]) None #
Add lines to the batch file preamble. The lines are just written (unmodified) at the beginning of the batch file (after the WLM directives) and can be used to e.g. start virtual environments before running the executables.
- Parameters:
lines – lines to add to preamble.
- Return type:
None
- property batch_args: Dict[str, str | None]#
Retrieve attached batch arguments
- Returns:
attached batch arguments
- property batch_cmd: str#
Return the batch command
Tests to see if we can expand the batch command path. If we can, then returns the expanded batch command. If we cannot, returns the batch command as is.
- Returns:
batch command
- format_batch_args() List[str] [source]#
Get the formatted batch arguments for a preview
- Return type:
List[str]
- Returns:
batch arguments for Qsub
- Raises:
ValueError – if options are supplied without values
- property preamble: Iterable[str]#
Return an iterable of preamble clauses to be prepended to the batch file
- Returns:
attached preamble clauses
- property resources: Dict[str, str | int]#
- set_account(account: str) None [source]#
Set the account for this batch job
- Parameters:
account (str) – account id
- Return type:
None
- set_batch_command(command: str) None #
Set the command used to launch the batch e.g. sbatch
- Parameters:
command (str) – batch command
- Return type:
None
- set_hostlist(host_list: str | List[str]) None [source]#
Specify the hostlist for this job
- Parameters:
host_list (Union[str, List[str]]) – hosts to launch on
- Raises:
TypeError – if not str or list of str
- Return type:
None
- set_ncpus(num_cpus: int | str) None [source]#
Set the number of cpus obtained in each node.
If a select argument is provided in QsubBatchSettings.resources, then this value will be overridden
- Parameters:
num_cpus (Union[int, str]) – number of cpus per node in select
- Return type:
None
- set_nodes(num_nodes: int) None [source]#
Set the number of nodes for this batch job
In PBS, 'select' is the more primitive way of describing how many nodes to allocate for the job. 'nodes' is equivalent to 'select' with a 'place' statement. Assuming that only advanced users would use 'set_resource' instead, defining the number of nodes here sets the 'nodes' resource.
- Parameters:
num_nodes (int) – number of nodes
- Return type:
None
- set_queue(queue: str) None [source]#
Set the queue for the batch job
- Parameters:
queue (str) – queue name
- Return type:
None
- set_resource(resource_name: str, value: str | int) None [source]#
Set a resource value for the Qsub batch
If a select statement is provided, the nodes and ncpus arguments will be overridden. The same applies to walltime.
- Parameters:
resource_name (str) – name of resource, e.g. walltime
value (Union[str, int]) – value
- Return type:
None
Singularity#
Singularity
is a type of Container
that can be passed to a
RunSettings
class or child class to enable running the workload in a
container.
- class Singularity(*args: Any, **kwargs: Any) None [source]#
Singularity (apptainer) container type. To be passed into a RunSettings class initializer or Experiment.create_run_settings.
Note
Singularity integration is currently tested with Apptainer 1.0 with Slurm and PBS workload managers only.
Also, note that user-defined bind paths (mount argument) may be disabled by a system administrator
- Parameters:
image – local or remote path to container image, e.g. docker://sylabsio/lolcow
args (Any) – arguments to 'singularity exec' command
mount – paths to mount (bind) from host machine into image.
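As a rough picture of how the three parameters relate to a `singularity exec` invocation, the sketch below assembles a command line. The function `singularity_exec_cmd` is hypothetical, `mount` is simplified to a list of host paths, and the example paths are placeholders; `--bind` and `--contain` are existing Singularity/Apptainer options.

```python
import shlex
from typing import List, Optional


def singularity_exec_cmd(
    image: str,
    exe: List[str],
    args: str = "",
    mount: Optional[List[str]] = None,
) -> List[str]:
    """Hypothetical helper: build a 'singularity exec' command line."""
    cmd = ["singularity", "exec"]
    cmd.extend(shlex.split(args))  # extra arguments to 'singularity exec'
    if mount:
        # Host paths to bind into the image, comma separated.
        cmd.extend(["--bind", ",".join(mount)])
    cmd.append(image)
    cmd.extend(exe)  # the workload to run inside the container
    return cmd


cmd = singularity_exec_cmd(
    "docker://sylabsio/lolcow",
    ["cowsay", "hello"],
    args="--contain",
    mount=["/lus/scratch"],  # placeholder host path
)
print(" ".join(cmd))
```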
Orchestrator#
Orchestrator#
Model#
| Model.__init__ | Initialize a Model |
| Model.attach_generator_files | Attach files to an entity for generation |
| Model.colocate_db | An alias for Model.colocate_db_tcp |
| Model.colocate_db_tcp | Colocate an Orchestrator instance with this Model over TCP/IP. |
| Model.colocate_db_uds | Colocate an Orchestrator instance with this Model over UDS. |
| Model.colocated | Return True if this Model will run with a colocated Orchestrator |
| Model.add_ml_model | A TF, TF-lite, PT, or ONNX model to load into the DB at runtime |
| Model.add_script | TorchScript to launch with this Model instance |
| Model.add_function | TorchScript function to launch with this Model instance |
| Model.params_to_args | Convert parameters to command line arguments and update run settings. |
| Model.register_incoming_entity | Register future communication between entities. |
| Model.enable_key_prefixing | If called, the entity will prefix its keys with its own model name |
| Model.disable_key_prefixing | If called, the entity will not prefix its keys with its own model name |
| Model.query_key_prefixing | Inquire as to whether this entity will prefix its keys with its name |
Model#
- class Model(name: str, params: Dict[str, str], run_settings: smartsim.settings.base.RunSettings, path: str | None = '/usr/local/src/SmartSim/doc', params_as_args: List[str] | None = None, batch_settings: smartsim.settings.base.BatchSettings | None = None)[source]#
Bases: SmartSimEntity
Initialize a Model
- Parameters:
name (str) – name of the model
params (Dict[str, str]) – model parameters for writing into configuration files or to be passed as command line arguments to executable.
path (Optional[str], default: '/usr/local/src/SmartSim/doc') – path to output, error, and configuration files
run_settings (RunSettings) – launcher settings specified in the experiment
params_as_args (Optional[List[str]], default: None) – list of parameters which have to be interpreted as command line arguments to be added to run_settings
batch_settings (Optional[BatchSettings], default: None) – Launcher settings for running the individual model as a batch job
- add_function(name: str, function: str | None = None, device: str = 'CPU', devices_per_node: int = 1, first_device: int = 0) None [source]#
TorchScript function to launch with this Model instance
Each script function added to the model will be loaded into a non-converged orchestrator prior to the execution of this Model instance.
For converged orchestrators, the add_script() method should be used.
Device selection is either "GPU" or "CPU". If many devices are present, a number can be passed for specification e.g. "GPU:1".
Setting devices_per_node=N, with N greater than one will result in the function being stored in the first N devices of type device.
- Parameters:
name (str) – key to store function under
function (Optional[str], default: None) – TorchScript function code
device (str, default: 'CPU') – device for script execution
devices_per_node (int, default: 1) – The number of GPU devices available on the host. This parameter only applies to GPU devices and will be ignored if device is specified as CPU.
first_device (int, default: 0) – The first GPU device to use on the host. This parameter only applies to GPU devices and will be ignored if device is specified as CPU.
- Return type:
None
- add_ml_model(name: str, backend: str, model: bytes | None = None, model_path: str | None = None, device: str = 'CPU', devices_per_node: int = 1, first_device: int = 0, batch_size: int = 0, min_batch_size: int = 0, min_batch_timeout: int = 0, tag: str = '', inputs: List[str] | None = None, outputs: List[str] | None = None) None [source]#
A TF, TF-lite, PT, or ONNX model to load into the DB at runtime
Each ML Model added will be loaded into an orchestrator (converged or not) prior to the execution of this Model instance
One of either model (in memory representation) or model_path (file) must be provided
- Parameters:
name (str) – key to store model under
backend (str) – name of the backend (TORCH, TF, TFLITE, ONNX)
model (Optional[bytes], default: None) – A model in memory (only supported for non-colocated orchestrators)
model_path (Optional[str], default: None) – serialized model
device (str, default: 'CPU') – name of device for execution
devices_per_node (int, default: 1) – The number of GPU devices available on the host. This parameter only applies to GPU devices and will be ignored if device is specified as CPU.
first_device (int, default: 0) – The first GPU device to use on the host. This parameter only applies to GPU devices and will be ignored if device is specified as CPU.
batch_size (int, default: 0) – batch size for execution
min_batch_size (int, default: 0) – minimum batch size for model execution
min_batch_timeout (int, default: 0) – time to wait for minimum batch size
tag (str, default: '') – additional tag for model information
inputs (Optional[List[str]], default: None) – model inputs (TF only)
outputs (Optional[List[str]], default: None) – model outputs (TF only)
- Return type:
None
- add_script(name: str, script: str | None = None, script_path: str | None = None, device: str = 'CPU', devices_per_node: int = 1, first_device: int = 0) None [source]#
TorchScript to launch with this Model instance
Each script added to the model will be loaded into an orchestrator (converged or not) prior to the execution of this Model instance
Device selection is either "GPU" or "CPU". If many devices are present, a number can be passed for specification e.g. "GPU:1".
Setting devices_per_node=N, with N greater than one will result in the script being stored in the first N devices of type device; alternatively, setting first_device=M will result in the script being stored on devices M through M + N - 1.
One of either script (in memory string representation) or script_path (file) must be provided
- Parameters:
name (str) – key to store script under
script (Optional[str], default: None) – TorchScript code (only supported for non-colocated orchestrators)
script_path (Optional[str], default: None) – path to TorchScript code
device (str, default: 'CPU') – device for script execution
devices_per_node (int, default: 1) – The number of GPU devices available on the host. This parameter only applies to GPU devices and will be ignored if device is specified as CPU.
first_device (int, default: 0) – The first GPU device to use on the host. This parameter only applies to GPU devices and will be ignored if device is specified as CPU.
- Return type:
None
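The interaction of device, devices_per_node and first_device can be pictured with a short sketch. `target_devices` is a hypothetical helper, and the "GPU:i" naming of individual devices is an assumption extrapolated from the "GPU:1" example above; it is not taken from SmartSim's source.

```python
from typing import List


def target_devices(
    device: str = "CPU", devices_per_node: int = 1, first_device: int = 0
) -> List[str]:
    """Hypothetical helper: list the devices a script or model would be stored on."""
    if device.upper().startswith("CPU"):
        # devices_per_node and first_device only apply to GPU devices.
        return ["CPU"]
    if ":" in device:
        # An explicit device such as "GPU:1" is used as given.
        return [device]
    # Devices M through M + N - 1, with M=first_device and N=devices_per_node.
    return [f"{device}:{i}" for i in range(first_device, first_device + devices_per_node)]


print(target_devices("GPU", devices_per_node=3, first_device=1))
print(target_devices("CPU", devices_per_node=3, first_device=1))
```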
- attach_generator_files(to_copy: List[str] | None = None, to_symlink: List[str] | None = None, to_configure: List[str] | None = None) None [source]#
Attach files to an entity for generation
Attach files needed for the entity that, upon generation, will be located in the path of the entity. Invoking this method after files have already been attached will overwrite the previous list of entity files.
During generation, files “to_copy” are copied into the path of the entity, and files “to_symlink” are symlinked into the path of the entity.
Files "to_configure" are text-based model input files where parameters for the model are set. Note that only models support the "to_configure" field. These files must have fields tagged that correspond to the values the user would like to change. The tag is settable but defaults to a semicolon e.g. THERMO = ;10;
- Parameters:
to_copy (Optional[List[str]], default: None) – files to copy
to_symlink (Optional[List[str]], default: None) – files to symlink
to_configure (Optional[List[str]], default: None) – input files with tagged parameters
- Return type:
None
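The difference between "to_copy" and "to_symlink" files can be demonstrated with the standard library alone. The helper `stage_files` and the file names are hypothetical; the sketch assumes a POSIX system where symlinks are permitted and does not cover "to_configure" files.

```python
import os
import shutil
import tempfile
from typing import Iterable


def stage_files(
    entity_path: str, to_copy: Iterable[str] = (), to_symlink: Iterable[str] = ()
) -> None:
    """Hypothetical helper: place attached files in the entity's directory."""
    os.makedirs(entity_path, exist_ok=True)
    for src in to_copy:
        shutil.copy(src, entity_path)  # independent copy of the file
    for src in to_symlink:
        link = os.path.join(entity_path, os.path.basename(src))
        os.symlink(os.path.abspath(src), link)  # link back to the original


with tempfile.TemporaryDirectory() as tmp:
    inputs = os.path.join(tmp, "input.dat")
    mesh = os.path.join(tmp, "mesh.bin")
    for path in (inputs, mesh):
        with open(path, "w") as handle:
            handle.write("data\n")
    entity_dir = os.path.join(tmp, "model_0")
    stage_files(entity_dir, to_copy=[inputs], to_symlink=[mesh])
    copied_is_link = os.path.islink(os.path.join(entity_dir, "input.dat"))
    linked_is_link = os.path.islink(os.path.join(entity_dir, "mesh.bin"))
    print(copied_is_link, linked_is_link)
```

Symlinking is the cheaper choice for large, read-only inputs, while copying gives each entity a private file it can modify.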
- property attached_files_table: str#
Return a list of attached files as a plain text table
- Returns:
String version of table
- colocate_db(*args: Any, **kwargs: Any) None [source]#
An alias for Model.colocate_db_tcp
- Return type:
None
- colocate_db_tcp(port: int = 6379, ifname: str | list[str] = 'lo', db_cpus: int = 1, custom_pinning: Iterable[int | Iterable[int]] | None = None, debug: bool = False, db_identifier: str = '', **kwargs: Any) None [source]#
Colocate an Orchestrator instance with this Model over TCP/IP.
This method will initialize settings which add an unsharded database to this Model instance. Only this Model will be able to communicate with this colocated database by using the loopback TCP interface.
Extra parameters for the db can be passed through kwargs. This includes many performance, caching and inference settings.
ex. kwargs = {
    "maxclients": 100000,
    "threads_per_queue": 1,
    "inter_op_threads": 1,
    "intra_op_threads": 1,
    "server_threads": 2  # keydb only
}
Generally these don’t need to be changed.
- Parameters:
port (int, default: 6379) – port to use for orchestrator database
ifname (Union[str, list[str]], default: 'lo') – interface to use for orchestrator
db_cpus (int, default: 1) – number of cpus to use for orchestrator
custom_pinning (Optional[Iterable[Union[int, Iterable[int]]]], default: None) – CPUs to pin the orchestrator to. Passing an empty iterable disables pinning
debug (bool, default: False) – launch Model with extra debug information about the colocated db
kwargs (Any) – additional keyword arguments to pass to the orchestrator database
- Return type:
None
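The custom_pinning type, Iterable[int | Iterable[int]], allows single CPU ids to be mixed with ranges. The sketch below shows one way such a value could be flattened into a CPU list; `pinning_string` and the comma-separated output format are assumptions for illustration, not SmartSim's implementation.

```python
from typing import Iterable, Optional, Union


def pinning_string(custom_pinning: Iterable[Union[int, Iterable[int]]]) -> Optional[str]:
    """Hypothetical helper: flatten CPU ids into a comma-separated string."""
    cpu_ids = set()
    for item in custom_pinning:
        if isinstance(item, int):
            cpu_ids.add(item)
        else:
            cpu_ids.update(item)
    if not cpu_ids:
        return None  # an empty iterable disables pinning
    return ",".join(str(cpu) for cpu in sorted(cpu_ids))


print(pinning_string([0, 1, range(4, 6)]))
print(pinning_string([]))
```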
- colocate_db_uds(unix_socket: str = '/tmp/redis.socket', socket_permissions: int = 755, db_cpus: int = 1, custom_pinning: Iterable[int | Iterable[int]] | None = None, debug: bool = False, db_identifier: str = '', **kwargs: Any) None [source]#
Colocate an Orchestrator instance with this Model over UDS.
This method will initialize settings which add an unsharded database to this Model instance. Only this Model will be able to communicate with this colocated database by using Unix Domain sockets.
Extra parameters for the db can be passed through kwargs. This includes many performance, caching and inference settings.
example_kwargs = {
    "maxclients": 100000,
    "threads_per_queue": 1,
    "inter_op_threads": 1,
    "intra_op_threads": 1,
    "server_threads": 2  # keydb only
}
Generally these don’t need to be changed.
- Parameters:
unix_socket (str, default: '/tmp/redis.socket') – path to where the socket file will be created
socket_permissions (int, default: 755) – permissions for the socket file
db_cpus (int, default: 1) – number of cpus to use for orchestrator
custom_pinning (Optional[Iterable[Union[int, Iterable[int]]]], default: None) – CPUs to pin the orchestrator to. Passing an empty iterable disables pinning
debug (bool, default: False) – launch Model with extra debug information about the colocated db
kwargs (Any) – additional keyword arguments to pass to the orchestrator database
- Return type:
None
- property colocated: bool#
Return True if this Model will run with a colocated Orchestrator
- Returns:
True if the Model will run with a colocated Orchestrator
- property db_models: Iterable[DBModel]#
Retrieve an immutable collection of attached models
- Returns:
Return an immutable collection of attached models
- property db_scripts: Iterable[DBScript]#
Retrieve an immutable collection of attached scripts
- Returns:
Return an immutable collection of attached scripts
- disable_key_prefixing() None [source]#
If called, the entity will not prefix its keys with its own model name
- Return type:
None
- enable_key_prefixing() None [source]#
If called, the entity will prefix its keys with its own model name
- Return type:
None
- params_to_args() None [source]#
Convert parameters to command line arguments and update run settings.
- Return type:
None
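A minimal sketch of the conversion, under the assumption that each name listed in params_as_args becomes a `--name=value` executable argument. The helper `params_to_exe_args` and that flag format are illustrative only; SmartSim's actual formatting may differ.

```python
from typing import Dict, List


def params_to_exe_args(params: Dict[str, str], params_as_args: List[str]) -> List[str]:
    """Hypothetical helper: turn selected parameters into command line arguments."""
    exe_args = []
    for name in params_as_args:
        if name not in params:
            raise ValueError(f"{name} is not a parameter of this model")
        # Assumed format: prepend "--" unless the name already looks like a flag.
        flag = name if name.startswith("-") else f"--{name}"
        exe_args.append(f"{flag}={params[name]}")
    return exe_args


exe_args = params_to_exe_args({"steps": "100", "dt": "0.01"}, ["steps"])
print(exe_args)
```

Parameters not listed in `params_as_args` (here `dt`) are left for configuration files rather than the command line.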
- print_attached_files() None [source]#
Print a table of the attached files on stdout
- Return type:
None
- query_key_prefixing() bool [source]#
Inquire as to whether this entity will prefix its keys with its name
- Return type:
bool
- Returns:
Return True if entity will prefix its keys with its name
- register_incoming_entity(incoming_entity: smartsim.entity.entity.SmartSimEntity) None [source]#
Register future communication between entities.
Registers the named data sources that this entity has access to by storing the key_prefix associated with that entity
- Parameters:
incoming_entity (SmartSimEntity) – The entity that data will be received from
- Raises:
SmartSimError – if incoming entity has already been registered
- Return type:
None
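The registration behaviour, including the error on a repeated registration, can be sketched with a toy class. `SketchEntity` and `DuplicateEntityError` are stand-ins invented for this example, and treating an entity's name as its key prefix is an assumption.

```python
from typing import List


class DuplicateEntityError(Exception):
    """Stand-in for SmartSimError in this sketch."""


class SketchEntity:
    """Hypothetical entity that records which entities it reads data from."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.incoming_entities: List["SketchEntity"] = []

    def register_incoming_entity(self, incoming_entity: "SketchEntity") -> None:
        registered = [entity.name for entity in self.incoming_entities]
        if incoming_entity.name in registered:
            raise DuplicateEntityError(
                f"'{incoming_entity.name}' has already been registered"
            )
        self.incoming_entities.append(incoming_entity)

    def incoming_prefixes(self) -> List[str]:
        # Assumption: the key prefix of an entity is its name.
        return [entity.name for entity in self.incoming_entities]


producer = SketchEntity("producer")
consumer = SketchEntity("consumer")
consumer.register_incoming_entity(producer)
print(consumer.incoming_prefixes())
```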
- property type: str#
Return the name of the class
Ensemble#
| Ensemble.__init__ | Initialize an Ensemble of Model instances. |
| Ensemble.add_model | Add a model to this ensemble |
| Ensemble.add_ml_model | A TF, TF-lite, PT, or ONNX model to load into the DB at runtime |
| Ensemble.add_script | TorchScript to launch with every entity belonging to this ensemble |
| Ensemble.add_function | TorchScript function to launch with every entity belonging to this ensemble |
| Ensemble.attach_generator_files | Attach files to each model within the ensemble for generation |
| Ensemble.enable_key_prefixing | If called, each model within this ensemble will prefix its key with its own model name. |
| Ensemble.models | An alias for a shallow copy of the entities attribute |
| Ensemble.query_key_prefixing | Inquire as to whether each model within the ensemble will prefix their keys |
| Ensemble.register_incoming_entity | Register future communication between entities. |
Ensemble#
- class Ensemble(name: str, params: Dict[str, Any], path: str | None = '/usr/local/src/SmartSim/doc', params_as_args: List[str] | None = None, batch_settings: smartsim.settings.base.BatchSettings | None = None, run_settings: smartsim.settings.base.RunSettings | None = None, perm_strat: str = 'all_perm', **kwargs: Any) None [source]#
Bases: EntityList[Model]
Ensemble is a group of Model instances that can be treated as a reference to a single instance.
Initialize an Ensemble of Model instances.
The kwargs argument can be used to pass custom input parameters to the permutation strategy.
- Parameters:
name (str) – name of the ensemble
params (Dict[str, Any]) – parameters to expand into Model members
params_as_args (Optional[List[str]], default: None) – list of params that should be used as command line arguments to the Model member executables and not written to generator files
batch_settings (Optional[BatchSettings], default: None) – describes settings for Ensemble as batch workload
run_settings (Optional[RunSettings], default: None) – describes how each Model should be executed
replicas – number of Model replicas to create - a keyword argument of kwargs
perm_strat – strategy for expanding params into Model instances. Options are "all_perm", "step", "random" or a callable function.
- Returns:
Ensemble instance
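To give a feel for how a permutation strategy expands params into members, the sketch below implements two plausible readings: "all_perm" as the Cartesian product of all parameter values, and "step" as pairing values position by position. Both functions are illustrative assumptions rather than SmartSim's code, and the "random" strategy is not shown.

```python
import itertools
from typing import Any, Dict, List


def all_perm(params: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Assumed behaviour: one member per combination of parameter values."""
    names = list(params)
    return [dict(zip(names, combo)) for combo in itertools.product(*params.values())]


def step(params: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Assumed behaviour: one member per position across the value lists."""
    names = list(params)
    return [dict(zip(names, combo)) for combo in zip(*params.values())]


params = {"STEPS": [10, 20], "THERMO": [1, 5]}
print(all_perm(params))
print(step(params))
```

With two values for each of two parameters, the product yields four members while the stepwise pairing yields two.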
- add_function(name: str, function: str | None = None, device: str = 'CPU', devices_per_node: int = 1, first_device: int = 0) None [source]#
TorchScript function to launch with every entity belonging to this ensemble
Each script function added to the model will be loaded into a non-converged orchestrator prior to the execution of every entity belonging to this ensemble.
For converged orchestrators, the add_script() method should be used.
Device selection is either "GPU" or "CPU". If many devices are present, a number can be passed for specification e.g. "GPU:1".
Setting devices_per_node=N, with N greater than one will result in the script being stored in the first N devices of type device; alternatively, setting first_device=M will result in the script being stored on devices M through M + N - 1.
- Parameters:
name (str) – key to store function under
function (Optional[str], default: None) – TorchScript code
device (str, default: 'CPU') – device for script execution
devices_per_node (int, default: 1) – number of devices on each host
first_device (int, default: 0) – first device to use on each host
- Return type:
None
- add_ml_model(name: str, backend: str, model: bytes | None = None, model_path: str | None = None, device: str = 'CPU', devices_per_node: int = 1, first_device: int = 0, batch_size: int = 0, min_batch_size: int = 0, min_batch_timeout: int = 0, tag: str = '', inputs: List[str] | None = None, outputs: List[str] | None = None) None [source]#
A TF, TF-lite, PT, or ONNX model to load into the DB at runtime
Each ML Model added will be loaded into an orchestrator (converged or not) prior to the execution of every entity belonging to this ensemble
One of either model (in memory representation) or model_path (file) must be provided
- Parameters:
name (str) – key to store model under
model (Optional[bytes], default: None) – model in memory
model_path (Optional[str], default: None) – serialized model
backend (str) – name of the backend (TORCH, TF, TFLITE, ONNX)
device (str, default: 'CPU') – name of device for execution
devices_per_node (int, default: 1) – number of GPUs per node in multi-GPU nodes
first_device (int, default: 0) – first device in multi-GPU nodes to use for execution, defaults to 0; ignored if devices_per_node is 1
batch_size (int, default: 0) – batch size for execution
min_batch_size (int, default: 0) – minimum batch size for model execution
min_batch_timeout (int, default: 0) – time to wait for minimum batch size
tag (str, default: '') – additional tag for model information
inputs (Optional[List[str]], default: None) – model inputs (TF only)
outputs (Optional[List[str]], default: None) – model outputs (TF only)
- Return type:
None
- add_model(model: smartsim.entity.model.Model) None [source]#
Add a model to this ensemble
- Parameters:
model (Model) – model instance to be added
- Raises:
TypeError – if model is not an instance of Model
EntityExistsError – if model already exists in this ensemble
- Return type:
None
- add_script(name: str, script: str | None = None, script_path: str | None = None, device: str = 'CPU', devices_per_node: int = 1, first_device: int = 0) None [source]#
TorchScript to launch with every entity belonging to this ensemble
Each script added to the model will be loaded into an orchestrator (converged or not) prior to the execution of every entity belonging to this ensemble
Device selection is either "GPU" or "CPU". If many devices are present, a number can be passed for specification e.g. "GPU:1".
Setting devices_per_node=N, with N greater than one will result in the script being stored in the first N devices of type device.
One of either script (in memory string representation) or script_path (file) must be provided
- Parameters:
name (str) – key to store script under
script (Optional[str], default: None) – TorchScript code
script_path (Optional[str], default: None) – path to TorchScript code
device (str, default: 'CPU') – device for script execution
devices_per_node (int, default: 1) – number of devices on each host
first_device (int, default: 0) – first device to use on each host
- Return type:
None
- attach_generator_files(to_copy: List[str] | None = None, to_symlink: List[str] | None = None, to_configure: List[str] | None = None) None [source]#
Attach files to each model within the ensemble for generation
Attach files needed for the entity that, upon generation, will be located in the path of the entity.
During generation, files “to_copy” are copied into the path of the entity, and files “to_symlink” are symlinked into the path of the entity.
Files "to_configure" are text-based model input files where parameters for the model are set. Note that only models support the "to_configure" field. These files must have fields tagged that correspond to the values the user would like to change. The tag is settable but defaults to a semicolon e.g. THERMO = ;10;
- Parameters:
to_copy (Optional[List[str]], default: None) – files to copy
to_symlink (Optional[List[str]], default: None) – files to symlink
to_configure (Optional[List[str]], default: None) – input files with tagged parameters
- Return type:
None
- property attached_files_table: str#
Return a plain-text table with information about files attached to models belonging to this ensemble.
- Returns:
A table of all files attached to all models
- property batch: bool#
Property indicating whether or not the entity sequence should be launched as a batch job
- Returns:
True if entity sequence should be launched as a batch job, False if the members will be launched individually.
- property db_models: Iterable[smartsim.entity.DBModel]#
Return an immutable collection of attached models
- property db_scripts: Iterable[smartsim.entity.DBScript]#
Return an immutable collection of attached scripts
- enable_key_prefixing() None [source]#
If called, each model within this ensemble will prefix its key with its own model name.
- Return type:
None
- query_key_prefixing() bool [source]#
Inquire as to whether each model within the ensemble will prefix their keys
- Return type:
bool
- Returns:
True if all models have key prefixing enabled, False otherwise
- register_incoming_entity(incoming_entity: smartsim.entity.entity.SmartSimEntity) None [source]#
Register future communication between entities.
Registers the named data sources that this entity has access to by storing the key_prefix associated with that entity
Only Python clients can have multiple incoming connections
- Parameters:
incoming_entity (SmartSimEntity) – The entity that data will be received from
- Return type:
None
- property type: str#
Return the name of the class
Machine Learning#
SmartSim includes built-in utilities for supporting TensorFlow, Keras, and PyTorch.
TensorFlow#
SmartSim includes built-in utilities for supporting TensorFlow and Keras in training and inference.
PyTorch#
SmartSim includes built-in utilities for supporting PyTorch in training and inference.