SmartSim API#

Experiment#

Settings#

Settings are passed to Model and Ensemble objects to specify how a job should be executed. Some settings are specific to a launcher: for example, SbatchSettings is meant solely for systems that use Slurm as a workload manager. MpirunSettings, for OpenMPI-based jobs, is supported on both Slurm and PBSPro systems.

Types of Settings:

RunSettings(exe[, exe_args, run_command, ...])

Run parameters for a Model

SrunSettings(exe[, exe_args, run_args, ...])

Initialize run parameters for a slurm job with srun

AprunSettings(exe[, exe_args, run_args, ...])

Settings to run job with aprun command

MpirunSettings(exe[, exe_args, run_args, ...])

Settings to run job with mpirun command (MPI-standard)

MpiexecSettings(exe[, exe_args, run_args, ...])

Settings to run job with mpiexec command (MPI-standard)

OrterunSettings(exe[, exe_args, run_args, ...])

Settings to run job with orterun command (MPI-standard)

DragonRunSettings(exe[, exe_args, env_vars])

Initialize run parameters for a Dragon process

SbatchSettings([nodes, time, account, ...])

Specify run parameters for a Slurm batch job

QsubBatchSettings([nodes, ncpus, time, ...])

Specify qsub batch parameters for a job

Settings objects can accept a container object that defines a container runtime, image, and arguments to use for the workload. Below is a list of supported container runtimes.

Types of Containers:

Singularity(*args, **kwargs)

Singularity (apptainer) container type.

RunSettings#

When running SmartSim on laptops and single-node workstations, the base RunSettings object is used to parameterize jobs. RunSettings includes a run_command parameter for local launches that use a parallel launch binary such as mpirun or mpiexec.

RunSettings.add_exe_args(args)

Add executable arguments to executable

RunSettings.update_env(env_vars)

Update the job environment variables

class RunSettings(exe: str, exe_args: str | List[str] | None = None, run_command: str = '', run_args: Dict[str, int | str | float | None] | None = None, env_vars: Dict[str, str | None] | None = None, container: smartsim.settings.containers.Container | None = None, **_kwargs: Any) None[source]#

Run parameters for a Model

The base RunSettings class should only be used with the local launcher on single-node workstations or laptops.

If no run_command is specified, the executable will be launched locally.

run_args passed as a dict are interpreted literally for local RunSettings and added directly to the run_command; e.g. run_args = {"-np": 2} becomes "-np 2".

Example initialization

rs = RunSettings("echo", "hello", "mpirun", run_args={"-np": "2"})
Parameters:
  • exe (str) – executable to run

  • exe_args (Union[str, List[str], None], default: None) – executable arguments

  • run_command (str, default: '') – launch binary (e.g. “srun”)

  • run_args (Optional[Dict[str, Union[int, str, float, None]]], default: None) – arguments for run command (e.g. -np for mpiexec)

  • env_vars (Optional[Dict[str, Optional[str]]], default: None) – environment vars to launch job with

  • container (Optional[Container], default: None) – container type for workload (e.g. “singularity”)

add_exe_args(args: str | List[str]) None[source]#

Add executable arguments to executable

Parameters:

args (Union[str, List[str]]) – executable arguments

Return type:

None

property env_vars: Dict[str, str | None]#

Return an immutable list of attached environment variables.

Returns:

attached environment variables

property exe_args: str | List[str]#

Return an immutable list of attached executable arguments.

Returns:

attached executable arguments

format_env_vars() List[str][source]#

Build environment variable string

Return type:

List[str]

Returns:

formatted list of strings to export variables

format_run_args() List[str][source]#

Return formatted run arguments

For RunSettings, the run arguments are passed literally with no formatting.

Return type:

List[str]

Returns:

list run arguments for these settings
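As a rough illustration of what "passed literally" means, the standalone sketch below flattens a run_args dictionary into an argument list without adding dashes or equals signs. It does not import SmartSim: format_literal_run_args is a hypothetical helper written for this example, and it assumes values are rendered with str(), which is consistent with the documented output in which a flag set without a value appears as "None".

```python
from typing import Dict, List, Optional, Union

RunArgValue = Optional[Union[int, str, float]]


def format_literal_run_args(run_args: Dict[str, RunArgValue]) -> List[str]:
    """Flatten run arguments into [key, value, key, value, ...].

    Hypothetical helper for illustration only; not part of SmartSim.
    Keys are emitted exactly as given and values are rendered with str().
    """
    formatted: List[str] = []
    for arg, value in run_args.items():
        formatted.append(arg)
        formatted.append(str(value))
    return formatted


# The dictionary from the class description: {"-np": 2} becomes "-np 2".
literal_args = format_literal_run_args({"-np": 2})
joined = " ".join(literal_args)
```

Because nothing is added to the keys, any dashes the launch binary expects have to be part of the key itself when using the base RunSettings.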

make_mpmd(settings: smartsim.settings.base.RunSettings) None[source]#

Make job an MPMD job

Parameters:

settings (RunSettings) – RunSettings instance

Return type:

None

reserved_run_args: ClassVar[frozenset[str]] = frozenset({})#
property run_args: Dict[str, int | str | float | None]#

Return an immutable list of attached run arguments.

Returns:

attached run arguments

property run_command: str | None#

Return the launch binary used to launch the executable

Attempt to expand the path to the executable if possible

Returns:

launch binary e.g. mpiexec

set(arg: str, value: str | None = None, condition: bool = True) None[source]#

Allows users to set individual run arguments.

A method that allows users to set run arguments after object instantiation. It does basic formatting, such as stripping leading dashes. If the argument has been set previously, this method logs a warning but still sets the new value.

A conditional expression may be passed to the condition parameter. If the expression evaluates to True, the argument is set. If not, an info message is logged and no further operation is performed.

Basic Usage

from smartsim.settings import RunSettings

rs = RunSettings("python")
rs.set("an-arg", "a-val")
rs.set("a-flag")
rs.format_run_args()  # returns ["an-arg", "a-val", "a-flag", "None"]

Slurm Example with Conditional Setting

import socket

from smartsim.settings import SrunSettings

rs = SrunSettings("echo", "hello")
rs.set_tasks(1)
rs.set("exclusive")

# Only set this argument if the condition parameter evaluates to True;
# otherwise log an info message and do nothing
rs.set("partition", "debug",
       condition=socket.gethostname()=="testing-system")

rs.format_run_args()
# returns ["exclusive", "None", "partition", "debug"] iff
# socket.gethostname()=="testing-system"
# otherwise returns ["exclusive", "None"]
Parameters:
  • arg (str) – name of the argument

  • value (Optional[str], default: None) – value of the argument

  • condition (bool, default: True) – set the argument if condition evaluates to True

Return type:

None
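The three behaviours described for set (dash stripping, overwrite with a warning, and the condition guard) can be modelled in a few lines. The sketch below is a toy stand-in, not SmartSim's implementation: RunArgsSketch and the logger name are hypothetical, and the exact log messages are assumptions.

```python
import logging
from typing import Dict, Optional

logger = logging.getLogger("settings_sketch")


class RunArgsSketch:
    """Toy stand-in that models the documented behaviour of ``set``.

    Hypothetical class for illustration only; not part of SmartSim.
    """

    def __init__(self) -> None:
        self.run_args: Dict[str, Optional[str]] = {}

    def set(self, arg: str, value: Optional[str] = None, condition: bool = True) -> None:
        if not condition:
            # Condition is False: log an info message and do nothing else.
            logger.info("Argument '%s' not set: condition not met", arg)
            return
        # Basic formatting: strip leading dashes from the argument name.
        arg = arg.lstrip("-")
        if arg in self.run_args:
            # Overwriting is allowed, but a warning is logged first.
            logger.warning("Overwriting argument '%s' with value '%s'", arg, value)
        self.run_args[arg] = value


sketch = RunArgsSketch()
sketch.set("--nodes", "2")                          # stored under "nodes"
sketch.set("exclusive")                             # flag without a value
sketch.set("partition", "debug", condition=False)   # skipped
sketch.set("nodes", "4")                            # warns, then overwrites
```

After these calls the stand-in holds "nodes" with the value "4" and "exclusive" with no value; "partition" was never stored because its condition was False.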

set_binding(binding: str) None[source]#

Set binding

Parameters:

binding (str) – Binding

Return type:

None

set_broadcast(dest_path: str | None = None) None[source]#

Copy executable file to allocated compute nodes

Parameters:

dest_path (Optional[str], default: None) – Path to copy an executable file

Return type:

None

set_cpu_bindings(bindings: int | List[int]) None[source]#

Set the cores to which MPI processes are bound

Parameters:

bindings (Union[int, List[int]]) – List specifying the cores to which MPI processes are bound

Return type:

None

set_cpus_per_task(cpus_per_task: int) None[source]#

Set the number of cpus per task

Parameters:

cpus_per_task (int) – number of cpus per task

Return type:

None

set_excluded_hosts(host_list: str | List[str]) None[source]#

Specify a list of hosts to exclude for launching this job

Parameters:

host_list (Union[str, List[str]]) – hosts to exclude

Return type:

None

set_hostlist(host_list: str | List[str]) None[source]#

Specify the hostlist for this job

Parameters:

host_list (Union[str, List[str]]) – hosts to launch on

Return type:

None

set_hostlist_from_file(file_path: str) None[source]#

Use the contents of a file to specify the hostlist for this job

Parameters:

file_path (str) – Path to the hostlist file

Return type:

None

set_memory_per_node(memory_per_node: int) None[source]#

Set the amount of memory required per node in megabytes

Parameters:

memory_per_node (int) – Number of megabytes per node

Return type:

None

set_mpmd_preamble(preamble_lines: List[str]) None[source]#

Set preamble to a file to make a job MPMD

Parameters:

preamble_lines (List[str]) – lines to put at the beginning of a file.

Return type:

None

set_node_feature(feature_list: str | List[str]) None[source]#

Specify the node feature for this job

Parameters:

feature_list (Union[str, List[str]]) – node feature to launch on

Return type:

None

set_nodes(nodes: int) None[source]#

Set the number of nodes

Parameters:

nodes (int) – number of nodes to run with

Return type:

None

set_quiet_launch(quiet: bool) None[source]#

Set the job to run in quiet mode

Parameters:

quiet (bool) – Whether the job should be run quietly

Return type:

None

set_task_map(task_mapping: str) None[source]#

Set a task mapping

Parameters:

task_mapping (str) – task mapping

Return type:

None

set_tasks(tasks: int) None[source]#

Set the number of tasks to launch

Parameters:

tasks (int) – number of tasks to launch

Return type:

None

set_tasks_per_node(tasks_per_node: int) None[source]#

Set the number of tasks per node

Parameters:

tasks_per_node (int) – number of tasks to launch per node

Return type:

None

set_time(hours: int = 0, minutes: int = 0, seconds: int = 0) None[source]#

Automatically format and set wall time

Parameters:
  • hours (int, default: 0) – number of hours to run job

  • minutes (int, default: 0) – number of minutes to run job

  • seconds (int, default: 0) – number of seconds to run job

Return type:

None

set_verbose_launch(verbose: bool) None[source]#

Set the job to run in verbose mode

Parameters:

verbose (bool) – Whether the job should be run verbosely

Return type:

None

set_walltime(walltime: str) None[source]#

Set the formatted walltime

Parameters:

walltime (str) – Time in format required by launcher

Return type:

None

update_env(env_vars: Dict[str, str | int | float | bool]) None[source]#

Update the job environment variables

To fully inherit the current user environment, add the workload-manager-specific flag to the launch command through the add_exe_args() method. For example, --export=ALL for slurm, or -V for PBS/aprun.

Parameters:

env_vars (Dict[str, Union[str, int, float, bool]]) – environment variables to update or add

Raises:

TypeError – if env_vars values cannot be coerced to strings

Return type:

None
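The coercion and the TypeError described above can be pictured with a small standalone sketch. It does not call SmartSim: updated_env is a hypothetical helper, and which values count as "cannot be coerced" is an assumption here (only the four types named in the signature are accepted).

```python
from typing import Dict, Optional, Union

EnvValue = Union[str, int, float, bool]


def updated_env(
    current: Dict[str, Optional[str]], new_vars: Dict[str, EnvValue]
) -> Dict[str, Optional[str]]:
    """Return a copy of ``current`` with ``new_vars`` merged in as strings.

    Hypothetical helper for illustration only; not part of SmartSim.
    """
    merged = dict(current)
    for name, value in new_vars.items():
        # Assumption: only the types listed in the signature are coercible.
        if not isinstance(value, (str, int, float, bool)):
            raise TypeError(
                f"Value for '{name}' must be str, int, float, or bool, "
                f"got {type(value).__name__}"
            )
        merged[name] = str(value)
    return merged


# Existing keys are updated, new keys are added, and every value ends up a str.
env = updated_env({"OMP_NUM_THREADS": "1"}, {"OMP_NUM_THREADS": 4, "DEBUG": True})
```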

SrunSettings#

SrunSettings can be used for running on existing allocations, running jobs in interactive allocations, and for adding srun steps to a batch.

SrunSettings.set_nodes(nodes)

Set the number of nodes

SrunSettings.set_node_feature(feature_list)

Specify the node feature for this job

SrunSettings.set_tasks(tasks)

Set the number of tasks for this job

SrunSettings.set_tasks_per_node(tasks_per_node)

Set the number of tasks per node for this job

SrunSettings.set_walltime(walltime)

Set the walltime of the job

SrunSettings.set_hostlist(host_list)

Specify the hostlist for this job

SrunSettings.set_excluded_hosts(host_list)

Specify a list of hosts to exclude for launching this job

SrunSettings.set_cpus_per_task(cpus_per_task)

Set the number of cpus to use per task

SrunSettings.add_exe_args(args)

Add executable arguments to executable

SrunSettings.format_run_args()

Return a list of slurm formatted run arguments

SrunSettings.format_env_vars()

Build bash compatible environment variable string for Slurm

SrunSettings.update_env(env_vars)

Update the job environment variables

class SrunSettings(exe: str, exe_args: str | List[str] | None = None, run_args: Dict[str, int | str | float | None] | None = None, env_vars: Dict[str, str | None] | None = None, alloc: str | None = None, **kwargs: Any) None[source]#

Initialize run parameters for a slurm job with srun

SrunSettings should only be used on Slurm based systems.

If an allocation is specified, the instance receiving these run parameters will launch on that allocation.

Parameters:
  • exe (str) – executable to run

  • exe_args (Union[str, List[str], None], default: None) – executable arguments

  • run_args (Optional[Dict[str, Union[int, str, float, None]]], default: None) – srun arguments without dashes

  • env_vars (Optional[Dict[str, Optional[str]]], default: None) – environment variables for job

  • alloc (Optional[str], default: None) – allocation ID if running on existing alloc

add_exe_args(args: str | List[str]) None#

Add executable arguments to executable

Parameters:

args (Union[str, List[str]]) – executable arguments

Return type:

None

check_env_vars() None[source]#

Warn a user trying to set a variable which is set in the environment

Given Slurm’s env var precedence, trying to export a variable which is already present in the environment will not work.

Return type:

None

colocated_db_settings: t.Optional[t.Dict[str, t.Union[bool, int, str, None, t.List[str], t.Iterable[t.Union[int, t.Iterable[int]]], t.List[DBModel], t.List[DBScript], t.Dict[str, t.Union[int, None]], t.Dict[str, str]]]]#
property env_vars: Dict[str, str | None]#

Return an immutable list of attached environment variables.

Returns:

attached environment variables

property exe_args: str | List[str]#

Return an immutable list of attached executable arguments.

Returns:

attached executable arguments

format_comma_sep_env_vars() Tuple[str, List[str]][source]#

Build environment variable string for Slurm

Slurm takes exports as a comma-separated list. The list starts with all so as not to disturb the rest of the environment. For more information, see the Slurm documentation for srun.

Return type:

Tuple[str, List[str]]

Returns:

the formatted string of environment variables
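To make the comma-separated form concrete, here is a minimal sketch of a value that could be passed to srun's --export option. It is not SmartSim's code: comma_separated_export is a hypothetical helper, it returns only a string (the second element of the documented tuple is not modelled), and values that themselves contain commas are assumed not to occur.

```python
from typing import Dict, Optional


def comma_separated_export(env_vars: Dict[str, Optional[str]]) -> str:
    """Build a value suitable for ``srun --export=...``.

    Hypothetical helper for illustration only; not part of SmartSim.
    The list starts with ALL so the rest of the caller's environment
    is still propagated to the job.
    """
    entries = ["ALL"]
    for name, value in env_vars.items():
        if value is None:
            # No explicit value: export the variable by name only.
            entries.append(name)
        else:
            entries.append(f"{name}={value}")
    return ",".join(entries)


export_value = comma_separated_export({"OMP_NUM_THREADS": "4", "LOG_LEVEL": "debug"})
```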

format_env_vars() List[str][source]#

Build bash compatible environment variable string for Slurm

Return type:

List[str]

Returns:

the formatted string of environment variables

format_run_args() List[str][source]#

Return a list of slurm formatted run arguments

Return type:

List[str]

Returns:

list of slurm arguments for these settings

make_mpmd(settings: smartsim.settings.base.RunSettings) None[source]#

Make an MPMD workload by combining two srun commands

This connects the two settings to be executed with a single Model instance

Parameters:

settings (RunSettings) – SrunSettings instance

Return type:

None

reserved_run_args: t.ClassVar[frozenset[str]] = frozenset({'D', 'chdir'})#
property run_args: Dict[str, int | str | float | None]#

Return an immutable list of attached run arguments.

Returns:

attached run arguments

property run_command: str | None#

Return the launch binary used to launch the executable

Attempt to expand the path to the executable if possible

Returns:

launch binary e.g. mpiexec

set(arg: str, value: str | None = None, condition: bool = True) None#

Allows users to set individual run arguments.

A method that allows users to set run arguments after object instantiation. It does basic formatting, such as stripping leading dashes. If the argument has been set previously, this method logs a warning but still sets the new value.

A conditional expression may be passed to the condition parameter. If the expression evaluates to True, the argument is set. If not, an info message is logged and no further operation is performed.

Basic Usage

from smartsim.settings import RunSettings

rs = RunSettings("python")
rs.set("an-arg", "a-val")
rs.set("a-flag")
rs.format_run_args()  # returns ["an-arg", "a-val", "a-flag", "None"]

Slurm Example with Conditional Setting

import socket

from smartsim.settings import SrunSettings

rs = SrunSettings("echo", "hello")
rs.set_tasks(1)
rs.set("exclusive")

# Only set this argument if the condition parameter evaluates to True;
# otherwise log an info message and do nothing
rs.set("partition", "debug",
       condition=socket.gethostname()=="testing-system")

rs.format_run_args()
# returns ["exclusive", "None", "partition", "debug"] iff
# socket.gethostname()=="testing-system"
# otherwise returns ["exclusive", "None"]
Parameters:
  • arg (str) – name of the argument

  • value (Optional[str], default: None) – value of the argument

  • condition (bool, default: True) – set the argument if condition evaluates to True

Return type:

None

set_binding(binding: str) None#

Set binding

Parameters:

binding (str) – Binding

Return type:

None

set_broadcast(dest_path: str | None = None) None[source]#

Copy executable file to allocated compute nodes

This sets --bcast

Parameters:

dest_path (Optional[str], default: None) – Path to copy an executable file

Return type:

None

set_cpu_bindings(bindings: int | List[int]) None[source]#

Bind by setting CPU masks on tasks

This sets --cpu-bind using the map_cpu:<list> option

Parameters:

bindings (Union[int, List[int]]) – List specifying the cores to which MPI processes are bound

Return type:

None
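The map_cpu:<list> value mentioned above is a comma-separated list of CPU ids. The sketch below shows one way such a value could be rendered from the bindings argument; map_cpu_value is a hypothetical helper, not SmartSim's implementation, and treating a single int as a one-element list is an assumption.

```python
from typing import List, Union


def map_cpu_value(bindings: Union[int, List[int]]) -> str:
    """Render CPU ids as a ``map_cpu:<list>`` value for ``--cpu-bind``.

    Hypothetical helper for illustration only; not part of SmartSim.
    """
    if isinstance(bindings, int):
        # Assumption: a single CPU id is treated as a one-element list.
        bindings = [bindings]
    return "map_cpu:" + ",".join(str(int(cpu)) for cpu in bindings)


single = map_cpu_value(3)
several = map_cpu_value([0, 2, 4, 6])
```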

set_cpus_per_task(cpus_per_task: int) None[source]#

Set the number of cpus to use per task

This sets --cpus-per-task

Parameters:

cpus_per_task (int) – number of cpus to use per task

Return type:

None

set_excluded_hosts(host_list: str | List[str]) None[source]#

Specify a list of hosts to exclude for launching this job

Parameters:

host_list (Union[str, List[str]]) – hosts to exclude

Raises:

TypeError – if not str or list of str

Return type:

None

set_het_group(het_group: Iterable[int]) None[source]#

Set the heterogeneous group for this job

This sets --het-group

Parameters:

het_group (Iterable[int]) – list of heterogeneous groups

Return type:

None

set_hostlist(host_list: str | List[str]) None[source]#

Specify the hostlist for this job

This sets --nodelist

Parameters:

host_list (Union[str, List[str]]) – hosts to launch on

Raises:

TypeError – if not str or list of str

Return type:

None
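The accepted input shapes and the TypeError can be illustrated with a small validation sketch. It is standalone and hypothetical: nodelist_value is not a SmartSim function, and joining hosts with commas for --nodelist is an assumption based on srun's usual comma-separated node list syntax.

```python
from typing import List, Union


def nodelist_value(host_list: Union[str, List[str]]) -> str:
    """Normalize a host or list of hosts into a comma-separated string.

    Hypothetical helper for illustration only; not part of SmartSim.
    """
    if isinstance(host_list, str):
        # A single host name is treated as a one-element list.
        host_list = [host_list.strip()]
    if not isinstance(host_list, list):
        raise TypeError("host_list must be a str or a list of str")
    if not all(isinstance(host, str) for host in host_list):
        raise TypeError("host_list must be a str or a list of str")
    return ",".join(host_list)


one_host = nodelist_value("node001")
two_hosts = nodelist_value(["node001", "node002"])
```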

set_hostlist_from_file(file_path: str) None[source]#

Use the contents of a file to set the node list

This sets --nodefile

Parameters:

file_path (str) – Path to the hostlist file

Return type:

None

set_memory_per_node(memory_per_node: int) None[source]#

Specify the real memory required per node

This sets --mem in megabytes

Parameters:

memory_per_node (int) – Amount of memory per node in megabytes

Return type:

None

set_mpmd_preamble(preamble_lines: List[str]) None#

Set preamble to a file to make a job MPMD

Parameters:

preamble_lines (List[str]) – lines to put at the beginning of a file.

Return type:

None

set_node_feature(feature_list: str | List[str]) None[source]#

Specify the node feature for this job

This sets -C

Parameters:

feature_list (Union[str, List[str]]) – node feature to launch on

Raises:

TypeError – if not str or list of str

Return type:

None

set_nodes(nodes: int) None[source]#

Set the number of nodes

Effectively this is setting: srun --nodes <num_nodes>

Parameters:

nodes (int) – number of nodes to run with

Return type:

None

set_quiet_launch(quiet: bool) None[source]#

Set the job to run in quiet mode

This sets --quiet

Parameters:

quiet (bool) – Whether the job should be run quietly

Return type:

None

set_task_map(task_mapping: str) None#

Set a task mapping

Parameters:

task_mapping (str) – task mapping

Return type:

None

set_tasks(tasks: int) None[source]#

Set the number of tasks for this job

This sets --ntasks

Parameters:

tasks (int) – number of tasks

Return type:

None

set_tasks_per_node(tasks_per_node: int) None[source]#

Set the number of tasks per node for this job

This sets --ntasks-per-node

Parameters:

tasks_per_node (int) – number of tasks per node

Return type:

None

set_time(hours: int = 0, minutes: int = 0, seconds: int = 0) None#

Automatically format and set wall time

Parameters:
  • hours (int, default: 0) – number of hours to run job

  • minutes (int, default: 0) – number of minutes to run job

  • seconds (int, default: 0) – number of seconds to run job

Return type:

None

set_verbose_launch(verbose: bool) None[source]#

Set the job to run in verbose mode

This sets --verbose

Parameters:

verbose (bool) – Whether the job should be run verbosely

Return type:

None

set_walltime(walltime: str) None[source]#

Set the walltime of the job

format = “HH:MM:SS”

Parameters:

walltime (str) – wall time

Return type:

None
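For reference, a walltime string in the "HH:MM:SS" format can be built from separate hour, minute, and second values as sketched below. format_walltime is a hypothetical helper, not part of SmartSim, and carrying overflowing minutes or seconds into the next unit is an assumption about the desired behaviour.

```python
from datetime import timedelta


def format_walltime(hours: int = 0, minutes: int = 0, seconds: int = 0) -> str:
    """Format a duration as HH:MM:SS, carrying overflow into larger units.

    Hypothetical helper for illustration only; not part of SmartSim.
    """
    total = int(timedelta(hours=hours, minutes=minutes, seconds=seconds).total_seconds())
    fmt_hours, remainder = divmod(total, 3600)
    fmt_minutes, fmt_seconds = divmod(remainder, 60)
    return f"{fmt_hours:02d}:{fmt_minutes:02d}:{fmt_seconds:02d}"


# 1 hour + 90 minutes + 5 seconds is 2 hours, 30 minutes, 5 seconds.
walltime = format_walltime(hours=1, minutes=90, seconds=5)
```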

update_env(env_vars: Dict[str, str | int | float | bool]) None#

Update the job environment variables

To fully inherit the current user environment, add the workload-manager-specific flag to the launch command through the add_exe_args() method. For example, --export=ALL for slurm, or -V for PBS/aprun.

Parameters:

env_vars (Dict[str, Union[str, int, float, bool]]) – environment variables to update or add

Raises:

TypeError – if env_vars values cannot be coerced to strings

Return type:

None

AprunSettings#

AprunSettings can be used on any system that supports the Cray ALPS layer. SmartSim supports using AprunSettings on PBSPro WLM systems.

AprunSettings can be used in interactive sessions (on an allocation) and within batch launches (e.g., QsubBatchSettings).

AprunSettings.set_cpus_per_task(cpus_per_task)

Set the number of cpus to use per task

AprunSettings.set_hostlist(host_list)

Specify the hostlist for this job

AprunSettings.set_tasks(tasks)

Set the number of tasks for this job

AprunSettings.set_tasks_per_node(tasks_per_node)

Set the number of tasks per node for this job

AprunSettings.make_mpmd(settings)

Make job an MPMD job

AprunSettings.add_exe_args(args)

Add executable arguments to executable

AprunSettings.format_run_args()

Return a list of ALPS formatted run arguments

AprunSettings.format_env_vars()

Format the environment variables for aprun

AprunSettings.update_env(env_vars)

Update the job environment variables

class AprunSettings(exe: str, exe_args: str | List[str] | None = None, run_args: Dict[str, int | str | float | None] | None = None, env_vars: Dict[str, str | None] | None = None, **kwargs: Any)[source]#

Settings to run job with aprun command

AprunSettings can be used for the pbs launcher.

Parameters:
  • exe (str) – executable

  • exe_args (Union[str, List[str], None], default: None) – executable arguments

  • run_args (Optional[Dict[str, Union[int, str, float, None]]], default: None) – arguments for run command

  • env_vars (Optional[Dict[str, Optional[str]]], default: None) – environment vars to launch job with

add_exe_args(args: str | List[str]) None#

Add executable arguments to executable

Parameters:

args (Union[str, List[str]]) – executable arguments

Return type:

None

colocated_db_settings: t.Optional[t.Dict[str, t.Union[bool, int, str, None, t.List[str], t.Iterable[t.Union[int, t.Iterable[int]]], t.List[DBModel], t.List[DBScript], t.Dict[str, t.Union[int, None]], t.Dict[str, str]]]]#
property env_vars: Dict[str, str | None]#

Return an immutable list of attached environment variables.

Returns:

attached environment variables

property exe_args: str | List[str]#

Return an immutable list of attached executable arguments.

Returns:

attached executable arguments

format_env_vars() List[str][source]#

Format the environment variables for aprun

Return type:

List[str]

Returns:

list of env vars

format_run_args() List[str][source]#

Return a list of ALPS formatted run arguments

Return type:

List[str]

Returns:

list of ALPS arguments for these settings

make_mpmd(settings: smartsim.settings.base.RunSettings) None[source]#

Make job an MPMD job

This method combines two AprunSettings into a single MPMD command joined with ‘:’

Parameters:

settings (RunSettings) – AprunSettings instance

Return type:

None
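The ':'-joined MPMD form can be pictured as one launcher followed by several per-executable segments. The sketch below is a standalone model, not SmartSim's implementation: command_segment and mpmd_command are hypothetical helpers, the executables ./ocean and ./atmos are made up, and rendering long-form arguments as --name=value is an assumption.

```python
from typing import Dict, List, Optional, Union

RunArgs = Dict[str, Optional[Union[int, str, float]]]


def command_segment(exe: str, exe_args: List[str], run_args: RunArgs) -> List[str]:
    """Render one executable with its run arguments as a token list.

    Hypothetical helper for illustration only; not part of SmartSim.
    """
    segment: List[str] = []
    for name, value in run_args.items():
        # Assumption: long-form arguments are written as --name=value.
        segment.append(f"--{name}" if value is None else f"--{name}={value}")
    segment.append(exe)
    segment.extend(exe_args)
    return segment


def mpmd_command(launcher: str, segments: List[List[str]]) -> List[str]:
    """Join per-executable segments with ':' behind a single launcher."""
    command = [launcher]
    for index, segment in enumerate(segments):
        if index > 0:
            command.append(":")
        command.extend(segment)
    return command


command = mpmd_command(
    "aprun",
    [
        command_segment("./ocean", ["--steps", "10"], {"pes": 4}),
        command_segment("./atmos", [], {"pes": 2}),
    ],
)
```

Each segment keeps its own run arguments, so the two executables can request different task counts within a single launch.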

reserved_run_args: t.ClassVar[frozenset[str]] = frozenset({})#
property run_args: Dict[str, int | str | float | None]#

Return an immutable list of attached run arguments.

Returns:

attached run arguments

property run_command: str | None#

Return the launch binary used to launch the executable

Attempt to expand the path to the executable if possible

Returns:

launch binary e.g. mpiexec

set(arg: str, value: str | None = None, condition: bool = True) None#

Allows users to set individual run arguments.

A method that allows users to set run arguments after object instantiation. It does basic formatting, such as stripping leading dashes. If the argument has been set previously, this method logs a warning but still sets the new value.

A conditional expression may be passed to the condition parameter. If the expression evaluates to True, the argument is set. If not, an info message is logged and no further operation is performed.

Basic Usage

from smartsim.settings import RunSettings

rs = RunSettings("python")
rs.set("an-arg", "a-val")
rs.set("a-flag")
rs.format_run_args()  # returns ["an-arg", "a-val", "a-flag", "None"]

Slurm Example with Conditional Setting

import socket

from smartsim.settings import SrunSettings

rs = SrunSettings("echo", "hello")
rs.set_tasks(1)
rs.set("exclusive")

# Only set this argument if the condition parameter evaluates to True;
# otherwise log an info message and do nothing
rs.set("partition", "debug",
       condition=socket.gethostname()=="testing-system")

rs.format_run_args()
# returns ["exclusive", "None", "partition", "debug"] iff
# socket.gethostname()=="testing-system"
# otherwise returns ["exclusive", "None"]
Parameters:
  • arg (str) – name of the argument

  • value (Optional[str], default: None) – value of the argument

  • condition (bool, default: True) – set the argument if condition evaluates to True

Return type:

None

set_binding(binding: str) None#

Set binding

Parameters:

binding (str) – Binding

Return type:

None

set_broadcast(dest_path: str | None = None) None#

Copy executable file to allocated compute nodes

Parameters:

dest_path (Optional[str], default: None) – Path to copy an executable file

Return type:

None

set_cpu_bindings(bindings: int | List[int]) None[source]#

Specifies the cores to which MPI processes are bound

This sets --cpu-binding

Parameters:

bindings (Union[int, List[int]]) – List of cpu numbers

Return type:

None

set_cpus_per_task(cpus_per_task: int) None[source]#

Set the number of cpus to use per task

This sets --cpus-per-pe

Parameters:

cpus_per_task (int) – number of cpus to use per task

Return type:

None

set_excluded_hosts(host_list: str | List[str]) None[source]#

Specify a list of hosts to exclude for launching this job

Parameters:

host_list (Union[str, List[str]]) – hosts to exclude

Raises:

TypeError – if not str or list of str

Return type:

None

set_hostlist(host_list: str | List[str]) None[source]#

Specify the hostlist for this job

Parameters:

host_list (Union[str, List[str]]) – hosts to launch on

Raises:

TypeError – if not str or list of str

Return type:

None

set_hostlist_from_file(file_path: str) None[source]#

Use the contents of a file to set the node list

This sets --node-list-file

Parameters:

file_path (str) – Path to the hostlist file

Return type:

None

set_memory_per_node(memory_per_node: int) None[source]#

Specify the real memory required per node

This sets --memory-per-pe in megabytes

Parameters:

memory_per_node (int) – Per PE memory limit in megabytes

Return type:

None

set_mpmd_preamble(preamble_lines: List[str]) None#

Set preamble to a file to make a job MPMD

Parameters:

preamble_lines (List[str]) – lines to put at the beginning of a file.

Return type:

None

set_node_feature(feature_list: str | List[str]) None#

Specify the node feature for this job

Parameters:

feature_list (Union[str, List[str]]) – node feature to launch on

Return type:

None

set_nodes(nodes: int) None#

Set the number of nodes

Parameters:

nodes (int) – number of nodes to run with

Return type:

None

set_quiet_launch(quiet: bool) None[source]#

Set the job to run in quiet mode

This sets --quiet

Parameters:

quiet (bool) – Whether the job should be run quietly

Return type:

None

set_task_map(task_mapping: str) None#

Set a task mapping

Parameters:

task_mapping (str) – task mapping

Return type:

None

set_tasks(tasks: int) None[source]#

Set the number of tasks for this job

This sets --pes

Parameters:

tasks (int) – number of tasks

Return type:

None

set_tasks_per_node(tasks_per_node: int) None[source]#

Set the number of tasks per node for this job

This sets --pes-per-node

Parameters:

tasks_per_node (int) – number of tasks per node

Return type:

None

set_time(hours: int = 0, minutes: int = 0, seconds: int = 0) None#

Automatically format and set wall time

Parameters:
  • hours (int, default: 0) – number of hours to run job

  • minutes (int, default: 0) – number of minutes to run job

  • seconds (int, default: 0) – number of seconds to run job

Return type:

None

set_verbose_launch(verbose: bool) None[source]#

Set the job to run in verbose mode

This sets --debug arg to the highest level

Parameters:

verbose (bool) – Whether the job should be run verbosely

Return type:

None

set_walltime(walltime: str) None[source]#

Set the walltime of the job

Walltime is given in total number of seconds

Parameters:

walltime (str) – wall time

Return type:

None

update_env(env_vars: Dict[str, str | int | float | bool]) None#

Update the job environment variables

To fully inherit the current user environment, add the workload-manager-specific flag to the launch command through the add_exe_args() method. For example, --export=ALL for slurm, or -V for PBS/aprun.

Parameters:

env_vars (Dict[str, Union[str, int, float, bool]]) – environment variables to update or add

Raises:

TypeError – if env_vars values cannot be coerced to strings

Return type:

None

DragonRunSettings#

DragonRunSettings can be used on systems that support Slurm or PBS, if Dragon is available in the Python environment (see _dragon_install for instructions on how to install it through smart).

DragonRunSettings can be used in interactive sessions (on an allocation) and within batch launches (i.e. SbatchSettings or QsubBatchSettings, for Slurm and PBS sessions, respectively).

DragonRunSettings.set_nodes(nodes)

Set the number of nodes

DragonRunSettings.set_tasks_per_node(...)

Set the number of tasks per node for this job

class DragonRunSettings(exe: str, exe_args: str | List[str] | None = None, env_vars: Dict[str, str | None] | None = None, **kwargs: Any) None[source]#

Initialize run parameters for a Dragon process

DragonRunSettings should only be used on systems where Dragon is available and installed in the current environment.

If an allocation is specified, the instance receiving these run parameters will launch on that allocation.

Parameters:
  • exe (str) – executable to run

  • exe_args (Union[str, List[str], None], default: None) – executable arguments, defaults to None

  • env_vars (Optional[Dict[str, Optional[str]]], default: None) – environment variables for job, defaults to None

  • alloc – allocation ID if running on existing alloc, defaults to None

add_exe_args(args: str | List[str]) None#

Add executable arguments to executable

Parameters:

args (Union[str, List[str]]) – executable arguments

Return type:

None

colocated_db_settings: t.Optional[t.Dict[str, t.Union[bool, int, str, None, t.List[str], t.Iterable[t.Union[int, t.Iterable[int]]], t.List[DBModel], t.List[DBScript], t.Dict[str, t.Union[int, None]], t.Dict[str, str]]]]#
property env_vars: Dict[str, str | None]#

Return an immutable list of attached environment variables.

Returns:

attached environment variables

property exe_args: str | List[str]#

Return an immutable list of attached executable arguments.

Returns:

attached executable arguments

format_env_vars() List[str]#

Build environment variable string

Return type:

List[str]

Returns:

formatted list of strings to export variables

format_run_args() List[str]#

Return formatted run arguments

For RunSettings, the run arguments are passed literally with no formatting.

Return type:

List[str]

Returns:

list run arguments for these settings

make_mpmd(settings: smartsim.settings.base.RunSettings) None#

Make job an MPMD job

Parameters:

settings (RunSettings) – RunSettings instance

Return type:

None

reserved_run_args: t.ClassVar[frozenset[str]] = frozenset({})#
property run_args: Dict[str, int | str | float | None]#

Return an immutable list of attached run arguments.

Returns:

attached run arguments

property run_command: str | None#

Return the launch binary used to launch the executable

Attempt to expand the path to the executable if possible

Returns:

launch binary e.g. mpiexec

set(arg: str, value: str | None = None, condition: bool = True) None#

Allows users to set individual run arguments.

A method that allows users to set run arguments after object instantiation. It does basic formatting, such as stripping leading dashes. If the argument has been set previously, this method logs a warning but still sets the new value.

A conditional expression may be passed to the condition parameter. If the expression evaluates to True, the argument is set. If not, an info message is logged and no further operation is performed.

Basic Usage

rs = RunSettings("python")
rs.set("an-arg", "a-val")
rs.set("a-flag")
rs.format_run_args()  # returns ["an-arg", "a-val", "a-flag", "None"]

Slurm Example with Conditional Setting

import socket

rs = SrunSettings("echo", "hello")
rs.set_tasks(1)
rs.set("exclusive")

# Only set this argument if condition param evals True
# Otherwise log and NOP
rs.set("partition", "debug",
       condition=socket.gethostname()=="testing-system")

rs.format_run_args()
# returns ["exclusive", "None", "partition", "debug"] iff
  socket.gethostname()=="testing-system"
# otherwise returns ["exclusive", "None"]
Parameters:
  • arg (str) – name of the argument

  • value (Optional[str], default: None) – value of the argument

  • conditon – set the argument if condition evaluates to True

Return type:

None
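
The behaviour described for set() can be pictured with a minimal sketch. It is not SmartSim's implementation: set_run_arg is a hypothetical helper, and a plain dict stands in for the run arguments held by a settings object.

```python
import logging
from typing import Dict, Optional

logger = logging.getLogger(__name__)


def set_run_arg(
    run_args: Dict[str, Optional[str]],
    arg: str,
    value: Optional[str] = None,
    condition: bool = True,
) -> None:
    """Hypothetical stand-in for set(); mutates run_args in place."""
    if not condition:
        # Condition is False: log at info level and do nothing else
        logger.info("Could not set argument '%s': condition not met", arg)
        return
    # Basic formatting: drop surrounding whitespace and leading dashes
    key = arg.strip().lstrip("-")
    if key in run_args and run_args[key] != value:
        # Overwriting is allowed, but the caller is warned
        logger.warning("Overwriting argument '%s' with value '%s'", key, value)
    run_args[key] = value


args: Dict[str, Optional[str]] = {}
set_run_arg(args, "--an-arg", "a-val")
set_run_arg(args, "a-flag")
set_run_arg(args, "partition", "debug", condition=False)
print(args)  # {'an-arg': 'a-val', 'a-flag': None}
```

A flag without a value is stored as None, which matches the "None" entries shown in the formatted output of the examples above.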

set_binding(binding: str) None#

Set binding

Parameters:

binding (str) – Binding

Return type:

None

set_broadcast(dest_path: str | None = None) None#

Copy executable file to allocated compute nodes

Parameters:

dest_path (Optional[str], default: None) – Path to copy an executable file

Return type:

None

set_cpu_affinity(devices: List[int]) None[source]#

Set the CPU affinity for this job

Parameters:

devices (List[int]) – list of CPU indices to execute on

Return type:

None

set_cpu_bindings(bindings: int | List[int]) None#

Set the cores to which MPI processes are bound

Parameters:

bindings (Union[int, List[int]]) – List specifying the cores to which MPI processes are bound

Return type:

None

set_cpus_per_task(cpus_per_task: int) None#

Set the number of cpus per task

Parameters:

cpus_per_task (int) – number of cpus per task

Return type:

None

set_excluded_hosts(host_list: str | List[str]) None#

Specify a list of hosts to exclude for launching this job

Parameters:

host_list (Union[str, List[str]]) – hosts to exclude

Return type:

None

set_gpu_affinity(devices: List[int]) None[source]#

Set the GPU affinity for this job

Parameters:

devices (List[int]) – list of GPU indices to execute on.

Return type:

None

set_hostlist(host_list: str | List[str]) None#

Specify the hostlist for this job

Parameters:

host_list (Union[str, List[str]]) – hosts to launch on

Return type:

None

set_hostlist_from_file(file_path: str) None#

Use the contents of a file to specify the hostlist for this job

Parameters:

file_path (str) – Path to the hostlist file

Return type:

None

set_memory_per_node(memory_per_node: int) None#

Set the amount of memory required per node in megabytes

Parameters:

memory_per_node (int) – Number of megabytes per node

Return type:

None

set_mpmd_preamble(preamble_lines: List[str]) None#

Set preamble to a file to make a job MPMD

Parameters:

preamble_lines (List[str]) – lines to put at the beginning of a file.

Return type:

None

set_node_feature(feature_list: str | List[str]) None[source]#

Specify the node feature for this job

Parameters:

feature_list (Union[str, List[str]]) – a collection of strings representing the required node features. Currently supported node features are: “gpu”

Return type:

None

set_nodes(nodes: int) None[source]#

Set the number of nodes

Parameters:

nodes (int) – number of nodes to run with

Return type:

None

set_quiet_launch(quiet: bool) None#

Set the job to run in quiet mode

Parameters:

quiet (bool) – Whether the job should be run quietly

Return type:

None

set_task_map(task_mapping: str) None#

Set a task mapping

Parameters:

task_mapping (str) – task mapping

Return type:

None

set_tasks(tasks: int) None#

Set the number of tasks to launch

Parameters:

tasks (int) – number of tasks to launch

Return type:

None

set_tasks_per_node(tasks_per_node: int) None[source]#

Set the number of tasks per node for this job

Parameters:

tasks_per_node (int) – number of tasks per node

Return type:

None

set_time(hours: int = 0, minutes: int = 0, seconds: int = 0) None#

Automatically format and set wall time

Parameters:
  • hours (int, default: 0) – number of hours to run job

  • minutes (int, default: 0) – number of minutes to run job

  • seconds (int, default: 0) – number of seconds to run job

Return type:

None
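
As an illustration of what "automatically format" can mean here, the sketch below turns the three integers into a single wall-time string. The HH:MM:SS layout and the helper name format_walltime are assumptions made for this example; the string a given launcher actually expects may differ.

```python
def format_walltime(hours: int = 0, minutes: int = 0, seconds: int = 0) -> str:
    """Hypothetical helper: build an HH:MM:SS string from separate fields."""
    total = hours * 3600 + minutes * 60 + seconds
    if total < 0:
        raise ValueError("wall time must not be negative")
    # Normalize overflow, e.g. 90 minutes becomes 1 hour 30 minutes
    hrs, remainder = divmod(total, 3600)
    mins, secs = divmod(remainder, 60)
    return f"{hrs:02d}:{mins:02d}:{secs:02d}"


print(format_walltime(hours=1, minutes=30))      # 01:30:00
print(format_walltime(minutes=90, seconds=75))   # 01:31:15
```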

set_verbose_launch(verbose: bool) None#

Set the job to run in verbose mode

Parameters:

verbose (bool) – Whether the job should be run verbosely

Return type:

None

set_walltime(walltime: str) None#

Set the formatted walltime

Parameters:

walltime (str) – Time in the format required by the launcher

Return type:

None

update_env(env_vars: Dict[str, str | int | float | bool]) None#

Update the job environment variables

To fully inherit the current user environment, add the workload-manager-specific flag to the launch command through the add_exe_args() method. For example, --export=ALL for slurm, or -V for PBS/aprun.

Parameters:

env_vars (Dict[str, Union[str, int, float, bool]]) – environment variables to update or add

Raises:

TypeError – if env_vars values cannot be coerced to strings

Return type:

None
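
The coercion and the TypeError mentioned above can be sketched as follows. This is an illustrative stand-in rather than SmartSim's code: coerce_env_vars is a hypothetical helper and job_env is a plain dict playing the role of the job environment.

```python
from typing import Dict, Union

EnvValue = Union[str, int, float, bool]


def coerce_env_vars(env_vars: Dict[str, EnvValue]) -> Dict[str, str]:
    """Hypothetical helper: validate values and convert them to strings."""
    coerced: Dict[str, str] = {}
    for name, value in env_vars.items():
        # Only scalar values have an unambiguous string form
        if not isinstance(value, (str, int, float, bool)):
            raise TypeError(
                f"Cannot coerce value of {name!r} ({type(value).__name__}) to str"
            )
        coerced[name] = str(value)
    return coerced


job_env = {"OMP_NUM_THREADS": "1"}
# Existing keys are updated, new keys are added
job_env.update(coerce_env_vars({"OMP_NUM_THREADS": 4, "DEBUG": True}))
print(job_env)  # {'OMP_NUM_THREADS': '4', 'DEBUG': 'True'}
```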

MpirunSettings#

MpirunSettings are for launching with OpenMPI’s mpirun. MpirunSettings are supported on Slurm and PBSPro.

MpirunSettings.set_cpus_per_task(cpus_per_task)

Set the number of CPUs per task for this job

MpirunSettings.set_hostlist(host_list)

Set the hostlist for the mpirun command

MpirunSettings.set_tasks(tasks)

Set the number of tasks for this job

MpirunSettings.set_task_map(task_mapping)

Set mpirun task mapping

MpirunSettings.make_mpmd(settings)

Make a mpmd workload by combining two mpirun commands

MpirunSettings.add_exe_args(args)

Add executable arguments to executable

MpirunSettings.format_run_args()

Return a list of MPI-standard formatted run arguments

MpirunSettings.format_env_vars()

Format the environment variables for mpirun

MpirunSettings.update_env(env_vars)

Update the job environment variables

class MpirunSettings(exe: str, exe_args: str | List[str] | None = None, run_args: Dict[str, int | str | float | None] | None = None, env_vars: Dict[str, str | None] | None = None, **kwargs: Any) None[source]#

Settings to run job with mpirun command (MPI-standard)

Note that environment variables can be passed with a None value to signify that they should be exported from the current environment

Any arguments passed in the run_args dict will be converted into mpirun arguments and prefixed with --. Values of None can be provided for arguments that do not have values.

Parameters:
  • exe (str) – executable

  • exe_args (Union[str, List[str], None], default: None) – executable arguments

  • run_args (Optional[Dict[str, Union[int, str, float, None]]], default: None) – arguments for run command

  • env_vars (Optional[Dict[str, Optional[str]]], default: None) – environment vars to launch job with
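
The run_args conversion described above can be sketched in a few lines. The helper format_run_args_sketch is hypothetical and simply applies the stated rule (prefix every key with --, omit the value when it is None); the real class may treat some arguments differently.

```python
from typing import Dict, List, Union

RunArgValue = Union[int, str, float, None]


def format_run_args_sketch(run_args: Dict[str, RunArgValue]) -> List[str]:
    """Hypothetical helper: turn a run_args dict into command-line tokens."""
    formatted: List[str] = []
    for name, value in run_args.items():
        # Every argument name is prefixed with two dashes
        formatted.append(f"--{name.lstrip('-')}")
        # A value of None marks a flag that takes no value
        if value is not None:
            formatted.append(str(value))
    return formatted


print(format_run_args_sketch({"map-by": "core", "report-bindings": None}))
# ['--map-by', 'core', '--report-bindings']
```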

add_exe_args(args: str | List[str]) None#

Add executable arguments to executable

Parameters:

args (Union[str, List[str]]) – executable arguments

Return type:

None

colocated_db_settings: t.Optional[t.Dict[str, t.Union[bool, int, str, None, t.List[str], t.Iterable[t.Union[int, t.Iterable[int]]], t.List[DBModel], t.List[DBScript], t.Dict[str, t.Union[int, None]], t.Dict[str, str]]]]#
property env_vars: Dict[str, str | None]#

Return an immutable list of attached environment variables.

Returns:

attached environment variables

property exe_args: str | List[str]#

Return an immutable list of attached executable arguments.

Returns:

attached executable arguments

format_env_vars() List[str]#

Format the environment variables for mpirun

Return type:

List[str]

Returns:

list of env vars
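
The class description notes that a variable given with a None value is exported from the current environment. One way to picture this is with Open MPI's -x option, which accepts either NAME=value or a bare NAME. The sketch below assumes that option is used; format_env_for_mpirun is a hypothetical helper, not the method documented here.

```python
from typing import Dict, List, Optional


def format_env_for_mpirun(env_vars: Dict[str, Optional[str]]) -> List[str]:
    """Hypothetical helper: build -x arguments for an mpirun command line."""
    formatted: List[str] = []
    for name, value in env_vars.items():
        formatted.append("-x")
        if value is None:
            # Bare name: forward whatever value the launching shell has
            formatted.append(name)
        else:
            formatted.append(f"{name}={value}")
    return formatted


print(format_env_for_mpirun({"OMP_NUM_THREADS": "4", "PATH": None}))
# ['-x', 'OMP_NUM_THREADS=4', '-x', 'PATH']
```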

format_run_args() List[str]#

Return a list of MPI-standard formatted run arguments

Return type:

List[str]

Returns:

list of MPI-standard arguments for these settings

make_mpmd(settings: smartsim.settings.base.RunSettings) None#

Make a mpmd workload by combining two mpirun commands

This connects the two settings to be executed with a single Model instance

Parameters:

settings (RunSettings) – MpirunSettings instance

Return type:

None
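
On the command line, an MPMD launch with mpirun separates the individual programs with a colon. The sketch below shows how two sets of arguments could be combined into one command; build_mpmd_command and the executable names are hypothetical and only illustrate the idea.

```python
from typing import List, Sequence, Tuple

# (run arguments, executable, executable arguments) for one program
Segment = Tuple[Sequence[str], str, Sequence[str]]


def build_mpmd_command(launcher: str, segments: Sequence[Segment]) -> List[str]:
    """Hypothetical helper: join several programs into one MPMD command."""
    command: List[str] = [launcher]
    for index, (run_args, exe, exe_args) in enumerate(segments):
        if index > 0:
            # A colon separates the programs of an MPMD launch
            command.append(":")
        command.extend(run_args)
        command.append(exe)
        command.extend(exe_args)
    return command


cmd = build_mpmd_command(
    "mpirun",
    [(["-n", "2"], "./sim", ["--in", "a.cfg"]), (["-n", "1"], "./analysis", [])],
)
print(" ".join(cmd))  # mpirun -n 2 ./sim --in a.cfg : -n 1 ./analysis
```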

reserved_run_args: t.ClassVar[frozenset[str]] = frozenset({'wd', 'wdir'})#
property run_args: Dict[str, int | str | float | None]#

Return an immutable list of attached run arguments.

Returns:

attached run arguments

property run_command: str | None#

Return the launch binary used to launch the executable

Attempt to expand the path to the executable if possible

Returns:

launch binary e.g. mpiexec

set(arg: str, value: str | None = None, condition: bool = True) None#

Allows users to set individual run arguments.

A method that allows users to set run arguments after object instantiation. Does basic formatting such as stripping leading dashes. If the argument has been set previously, this method will log a warning but ultimately comply.

Conditional expressions may be passed to the condition parameter. If the expression evaluates to True, the argument will be set. If not, an info message is logged and no further operation is performed.

Basic Usage

from smartsim.settings import RunSettings

rs = RunSettings("python")
rs.set("an-arg", "a-val")
rs.set("a-flag")
rs.format_run_args()  # returns ["an-arg", "a-val", "a-flag", "None"]

Slurm Example with Conditional Setting

import socket

from smartsim.settings import SrunSettings

rs = SrunSettings("echo", "hello")
rs.set_tasks(1)
rs.set("exclusive")

# Only set this argument if the condition parameter evaluates to True;
# otherwise log an info message and do nothing
rs.set("partition", "debug",
       condition=socket.gethostname()=="testing-system")

rs.format_run_args()
# returns ["exclusive", "None", "partition", "debug"] iff
# socket.gethostname()=="testing-system"
# otherwise returns ["exclusive", "None"]

Parameters:
  • arg (str) – name of the argument

  • value (Optional[str], default: None) – value of the argument

  • condition (bool, default: True) – set the argument if condition evaluates to True

Return type:

None

set_binding(binding: str) None#

Set binding

Parameters:

binding (str) – Binding

Return type:

None

set_broadcast(dest_path: str | None = None) None#

Copy the specified executable(s) to remote machines

This sets --preload-binary

Parameters:

dest_path (Optional[str], default: None) – Destination path (Ignored)

Return type:

None

set_cpu_binding_type(bind_type: str) None#

Specifies the cores to which MPI processes are bound

This sets --bind-to for MPI compliant implementations

Parameters:

bind_type (str) – binding type

Return type:

None

set_cpu_bindings(bindings: int | List[int]) None#

Set the cores to which MPI processes are bound

Parameters:

bindings (Union[int, List[int]]) – List specifying the cores to which MPI processes are bound

Return type:

None

set_cpus_per_task(cpus_per_task: int) None#

Set the number of CPUs per task for this job

This sets --cpus-per-proc for MPI compliant implementations

Note: this option has been deprecated in OpenMPI 4.0+ and will soon be replaced.

Parameters:

cpus_per_task (int) – number of CPUs per task

Return type:

None

set_excluded_hosts(host_list: str | List[str]) None#

Specify a list of hosts to exclude for launching this job

Parameters:

host_list (Union[str, List[str]]) – hosts to exclude

Return type:

None

set_hostlist(host_list: str | List[str]) None#

Set the hostlist for the mpirun command

This sets --host

Parameters:

host_list (Union[str, List[str]]) – list of host names

Raises:

TypeError – if not str or list of str

Return type:

None
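
A minimal sketch of the validation implied by the signature and the TypeError above: a single host name is wrapped in a list, anything that is not a list of strings is rejected, and the result is joined with commas as --host expects. The helper host_argument is hypothetical.

```python
from typing import List, Union


def host_argument(host_list: Union[str, List[str]]) -> str:
    """Hypothetical helper: validate a hostlist and format it for --host."""
    if isinstance(host_list, str):
        # A single host name is treated as a one-element list
        host_list = [host_list.strip()]
    if not isinstance(host_list, list):
        raise TypeError("host_list must be a str or a list of str")
    if not all(isinstance(host, str) for host in host_list):
        raise TypeError("host_list must contain only str entries")
    return ",".join(host_list)


print(host_argument("node001"))               # node001
print(host_argument(["node001", "node002"]))  # node001,node002
```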

set_hostlist_from_file(file_path: str) None#

Use the contents of a file to set the hostlist

This sets --hostfile

Parameters:

file_path (str) – Path to the hostlist file

Return type:

None

set_memory_per_node(memory_per_node: int) None#

Set the amount of memory required per node in megabytes

Parameters:

memory_per_node (int) – Number of megabytes per node

Return type:

None

set_mpmd_preamble(preamble_lines: List[str]) None#

Set preamble to a file to make a job MPMD

Parameters:

preamble_lines (List[str]) – lines to put at the beginning of a file.

Return type:

None

set_node_feature(feature_list: str | List[str]) None#

Specify the node feature for this job

Parameters:

feature_list (Union[str, List[str]]) – node feature to launch on

Return type:

None

set_nodes(nodes: int) None#

Set the number of nodes

Parameters:

nodes (int) – number of nodes to run with

Return type:

None

set_quiet_launch(quiet: bool) None#

Set the job to run in quiet mode

This sets --quiet

Parameters:

quiet (bool) – Whether the job should be run quietly

Return type:

None

set_task_map(task_mapping: str) None#

Set mpirun task mapping

This sets --map-by <mapping>

For examples, see the man page for mpirun

Parameters:

task_mapping (str) – task mapping

Return type:

None

set_tasks(tasks: int) None#

Set the number of tasks for this job

This sets -n for MPI compliant implementations

Parameters:

tasks (int) – number of tasks

Return type:

None

set_tasks_per_node(tasks_per_node: int) None#

Set the number of tasks per node

Parameters:

tasks_per_node (int) – number of tasks to launch per node

Return type:

None

set_time(hours: int = 0, minutes: int = 0, seconds: int = 0) None#

Automatically format and set wall time

Parameters:
  • hours (int, default: 0) – number of hours to run job

  • minutes (int, default: 0) – number of minutes to run job

  • seconds (int, default: 0) – number of seconds to run job

Return type:

None

set_verbose_launch(verbose: bool) None#

Set the job to run in verbose mode

This sets --verbose

Parameters:

verbose (bool) – Whether the job should be run verbosely

Return type:

None

set_walltime(walltime: str) None#

Set the maximum number of seconds that a job will run

This sets --timeout

Parameters:

walltime (str) – number-like string giving the maximum number of seconds the job may run

Return type:

None

update_env(env_vars: Dict[str, str | int | float | bool]) None#

Update the job environment variables

To fully inherit the current user environment, add the workload-manager-specific flag to the launch command through the add_exe_args() method. For example, --export=ALL for slurm, or -V for PBS/aprun.

Parameters:

env_vars (Dict[str, Union[str, int, float, bool]]) – environment variables to update or add

Raises:

TypeError – if env_vars values cannot be coerced to strings

Return type:

None

MpiexecSettings#

MpiexecSettings are for launching with OpenMPI’s mpiexec. MpiexecSettings are supported on Slurm and PBSPro.

MpiexecSettings.set_cpus_per_task(cpus_per_task)

Set the number of CPUs per task for this job

MpiexecSettings.set_hostlist(host_list)

Set the hostlist for the mpirun command

MpiexecSettings.set_tasks(tasks)

Set the number of tasks for this job

MpiexecSettings.set_task_map(task_mapping)

Set mpirun task mapping

MpiexecSettings.make_mpmd(settings)

Make a mpmd workload by combining two mpirun commands

MpiexecSettings.add_exe_args(args)

Add executable arguments to executable

MpiexecSettings.format_run_args()

Return a list of MPI-standard formatted run arguments

MpiexecSettings.format_env_vars()

Format the environment variables for mpirun

MpiexecSettings.update_env(env_vars)

Update the job environment variables

class MpiexecSettings(exe: str, exe_args: str | List[str] | None = None, run_args: Dict[str, int | str | float | None] | None = None, env_vars: Dict[str, str | None] | None = None, **kwargs: Any) None[source]#

Settings to run job with mpiexec command (MPI-standard)

Note that environment variables can be passed with a None value to signify that they should be exported from the current environment

Any arguments passed in the run_args dict will be converted into mpiexec arguments and prefixed with --. Values of None can be provided for arguments that do not have values.

Parameters:
  • exe (str) – executable

  • exe_args (Union[str, List[str], None], default: None) – executable arguments

  • run_args (Optional[Dict[str, Union[int, str, float, None]]], default: None) – arguments for run command

  • env_vars (Optional[Dict[str, Optional[str]]], default: None) – environment vars to launch job with

add_exe_args(args: str | List[str]) None#

Add executable arguments to executable

Parameters:

args (Union[str, List[str]]) – executable arguments

Return type:

None

colocated_db_settings: t.Optional[t.Dict[str, t.Union[bool, int, str, None, t.List[str], t.Iterable[t.Union[int, t.Iterable[int]]], t.List[DBModel], t.List[DBScript], t.Dict[str, t.Union[int, None]], t.Dict[str, str]]]]#
property env_vars: Dict[str, str | None]#

Return an immutable list of attached environment variables.

Returns:

attached environment variables

property exe_args: str | List[str]#

Return an immutable list of attached executable arguments.

Returns:

attached executable arguments

format_env_vars() List[str]#

Format the environment variables for mpirun

Return type:

List[str]

Returns:

list of env vars

format_run_args() List[str]#

Return a list of MPI-standard formatted run arguments

Return type:

List[str]

Returns:

list of MPI-standard arguments for these settings

make_mpmd(settings: smartsim.settings.base.RunSettings) None#

Make a mpmd workload by combining two mpirun commands

This connects the two settings to be executed with a single Model instance

Parameters:

settings (RunSettings) – MpirunSettings instance

Return type:

None

reserved_run_args: t.ClassVar[frozenset[str]] = frozenset({'wd', 'wdir'})#
property run_args: Dict[str, int | str | float | None]#

Return an immutable list of attached run arguments.

Returns:

attached run arguments

property run_command: str | None#

Return the launch binary used to launch the executable

Attempt to expand the path to the executable if possible

Returns:

launch binary e.g. mpiexec

set(arg: str, value: str | None = None, condition: bool = True) None#

Allows users to set individual run arguments.

A method that allows users to set run arguments after object instantiation. Does basic formatting such as stripping leading dashes. If the argument has been set previously, this method will log a warning but ultimately comply.

Conditional expressions may be passed to the condition parameter. If the expression evaluates to True, the argument will be set. If not, an info message is logged and no further operation is performed.

Basic Usage

from smartsim.settings import RunSettings

rs = RunSettings("python")
rs.set("an-arg", "a-val")
rs.set("a-flag")
rs.format_run_args()  # returns ["an-arg", "a-val", "a-flag", "None"]

Slurm Example with Conditional Setting

import socket

from smartsim.settings import SrunSettings

rs = SrunSettings("echo", "hello")
rs.set_tasks(1)
rs.set("exclusive")

# Only set this argument if the condition parameter evaluates to True;
# otherwise log an info message and do nothing
rs.set("partition", "debug",
       condition=socket.gethostname()=="testing-system")

rs.format_run_args()
# returns ["exclusive", "None", "partition", "debug"] iff
# socket.gethostname()=="testing-system"
# otherwise returns ["exclusive", "None"]

Parameters:
  • arg (str) – name of the argument

  • value (Optional[str], default: None) – value of the argument

  • condition (bool, default: True) – set the argument if condition evaluates to True

Return type:

None

set_binding(binding: str) None#

Set binding

Parameters:

binding (str) – Binding

Return type:

None

set_broadcast(dest_path: str | None = None) None#

Copy the specified executable(s) to remote machines

This sets --preload-binary

Parameters:

dest_path (Optional[str], default: None) – Destination path (Ignored)

Return type:

None

set_cpu_binding_type(bind_type: str) None#

Specifies the cores to which MPI processes are bound

This sets --bind-to for MPI compliant implementations

Parameters:

bind_type (str) – binding type

Return type:

None

set_cpu_bindings(bindings: int | List[int]) None#

Set the cores to which MPI processes are bound

Parameters:

bindings (Union[int, List[int]]) – List specifying the cores to which MPI processes are bound

Return type:

None

set_cpus_per_task(cpus_per_task: int) None#

Set the number of CPUs per task for this job

This sets --cpus-per-proc for MPI compliant implementations

Note: this option has been deprecated in OpenMPI 4.0+ and will soon be replaced.

Parameters:

cpus_per_task (int) – number of CPUs per task

Return type:

None

set_excluded_hosts(host_list: str | List[str]) None#

Specify a list of hosts to exclude for launching this job

Parameters:

host_list (Union[str, List[str]]) – hosts to exclude

Return type:

None

set_hostlist(host_list: str | List[str]) None#

Set the hostlist for the mpirun command

This sets --host

Parameters:

host_list (Union[str, List[str]]) – list of host names

Raises:

TypeError – if not str or list of str

Return type:

None

set_hostlist_from_file(file_path: str) None#

Use the contents of a file to set the hostlist

This sets --hostfile

Parameters:

file_path (str) – Path to the hostlist file

Return type:

None

set_memory_per_node(memory_per_node: int) None#

Set the amount of memory required per node in megabytes

Parameters:

memory_per_node (int) – Number of megabytes per node

Return type:

None

set_mpmd_preamble(preamble_lines: List[str]) None#

Set preamble to a file to make a job MPMD

Parameters:

preamble_lines (List[str]) – lines to put at the beginning of a file.

Return type:

None

set_node_feature(feature_list: str | List[str]) None#

Specify the node feature for this job

Parameters:

feature_list (Union[str, List[str]]) – node feature to launch on

Return type:

None

set_nodes(nodes: int) None#

Set the number of nodes

Parameters:

nodes (int) – number of nodes to run with

Return type:

None

set_quiet_launch(quiet: bool) None#

Set the job to run in quiet mode

This sets --quiet

Parameters:

quiet (bool) – Whether the job should be run quietly

Return type:

None

set_task_map(task_mapping: str) None#

Set mpirun task mapping

This sets --map-by <mapping>

For examples, see the man page for mpirun

Parameters:

task_mapping (str) – task mapping

Return type:

None

set_tasks(tasks: int) None#

Set the number of tasks for this job

This sets -n for MPI compliant implementations

Parameters:

tasks (int) – number of tasks

Return type:

None

set_tasks_per_node(tasks_per_node: int) None#

Set the number of tasks per node

Parameters:

tasks_per_node (int) – number of tasks to launch per node

Return type:

None

set_time(hours: int = 0, minutes: int = 0, seconds: int = 0) None#

Automatically format and set wall time

Parameters:
  • hours (int, default: 0) – number of hours to run job

  • minutes (int, default: 0) – number of minutes to run job

  • seconds (int, default: 0) – number of seconds to run job

Return type:

None

set_verbose_launch(verbose: bool) None#

Set the job to run in verbose mode

This sets --verbose

Parameters:

verbose (bool) – Whether the job should be run verbosely

Return type:

None

set_walltime(walltime: str) None#

Set the maximum number of seconds that a job will run

This sets --timeout

Parameters:

walltime (str) – number-like string giving the maximum number of seconds the job may run

Return type:

None
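
Because --timeout is expressed in seconds, a wall time given in hours, minutes, and seconds has to be flattened into one number before it can be passed as the walltime string. A minimal sketch, assuming a hypothetical helper walltime_to_seconds that is not part of SmartSim:

```python
def walltime_to_seconds(hours: int = 0, minutes: int = 0, seconds: int = 0) -> str:
    """Hypothetical helper: express a wall time as a string of seconds."""
    total = hours * 3600 + minutes * 60 + seconds
    if total < 0:
        raise ValueError("wall time must not be negative")
    # The result is a string because set_walltime takes a str argument
    return str(total)


print(walltime_to_seconds(hours=1, minutes=30))  # 5400
```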

update_env(env_vars: Dict[str, str | int | float | bool]) None#

Update the job environment variables

To fully inherit the current user environment, add the workload-manager-specific flag to the launch command through the add_exe_args() method. For example, --export=ALL for slurm, or -V for PBS/aprun.

Parameters:

env_vars (Dict[str, Union[str, int, float, bool]]) – environment variables to update or add

Raises:

TypeError – if env_vars values cannot be coerced to strings

Return type:

None

OrterunSettings#

OrterunSettings are for launching with OpenMPI’s orterun. OrterunSettings are supported on Slurm and PBSPro.

OrterunSettings.set_cpus_per_task(cpus_per_task)

Set the number of CPUs per task for this job

OrterunSettings.set_hostlist(host_list)

Set the hostlist for the mpirun command

OrterunSettings.set_tasks(tasks)

Set the number of tasks for this job

OrterunSettings.set_task_map(task_mapping)

Set mpirun task mapping

OrterunSettings.make_mpmd(settings)

Make a mpmd workload by combining two mpirun commands

OrterunSettings.add_exe_args(args)

Add executable arguments to executable

OrterunSettings.format_run_args()

Return a list of MPI-standard formatted run arguments

OrterunSettings.format_env_vars()

Format the environment variables for mpirun

OrterunSettings.update_env(env_vars)

Update the job environment variables

class OrterunSettings(exe: str, exe_args: str | List[str] | None = None, run_args: Dict[str, int | str | float | None] | None = None, env_vars: Dict[str, str | None] | None = None, **kwargs: Any) None[source]#

Settings to run job with orterun command (MPI-standard)

Note that environment variables can be passed with a None value to signify that they should be exported from the current environment

Any arguments passed in the run_args dict will be converted into orterun arguments and prefixed with --. Values of None can be provided for arguments that do not have values.

Parameters:
  • exe (str) – executable

  • exe_args (Union[str, List[str], None], default: None) – executable arguments

  • run_args (Optional[Dict[str, Union[int, str, float, None]]], default: None) – arguments for run command

  • env_vars (Optional[Dict[str, Optional[str]]], default: None) – environment vars to launch job with

add_exe_args(args: str | List[str]) None#

Add executable arguments to executable

Parameters:

args (Union[str, List[str]]) – executable arguments

Return type:

None

colocated_db_settings: t.Optional[t.Dict[str, t.Union[bool, int, str, None, t.List[str], t.Iterable[t.Union[int, t.Iterable[int]]], t.List[DBModel], t.List[DBScript], t.Dict[str, t.Union[int, None]], t.Dict[str, str]]]]#
property env_vars: Dict[str, str | None]#

Return an immutable list of attached environment variables.

Returns:

attached environment variables

property exe_args: str | List[str]#

Return an immutable list of attached executable arguments.

Returns:

attached executable arguments

format_env_vars() List[str]#

Format the environment variables for mpirun

Return type:

List[str]

Returns:

list of env vars

format_run_args() List[str]#

Return a list of MPI-standard formatted run arguments

Return type:

List[str]

Returns:

list of MPI-standard arguments for these settings

make_mpmd(settings: smartsim.settings.base.RunSettings) None#

Make a mpmd workload by combining two mpirun commands

This connects the two settings to be executed with a single Model instance

Parameters:

settings (RunSettings) – MpirunSettings instance

Return type:

None

reserved_run_args: t.ClassVar[frozenset[str]] = frozenset({'wd', 'wdir'})#
property run_args: Dict[str, int | str | float | None]#

Return an immutable list of attached run arguments.

Returns:

attached run arguments

property run_command: str | None#

Return the launch binary used to launch the executable

Attempt to expand the path to the executable if possible

Returns:

launch binary e.g. mpiexec

set(arg: str, value: str | None = None, condition: bool = True) None#

Allows users to set individual run arguments.

A method that allows users to set run arguments after object instantiation. Does basic formatting such as stripping leading dashes. If the argument has been set previously, this method will log a warning but ultimately comply.

Conditional expressions may be passed to the condition parameter. If the expression evaluates to True, the argument will be set. If not, an info message is logged and no further operation is performed.

Basic Usage

from smartsim.settings import RunSettings

rs = RunSettings("python")
rs.set("an-arg", "a-val")
rs.set("a-flag")
rs.format_run_args()  # returns ["an-arg", "a-val", "a-flag", "None"]

Slurm Example with Conditional Setting

import socket

from smartsim.settings import SrunSettings

rs = SrunSettings("echo", "hello")
rs.set_tasks(1)
rs.set("exclusive")

# Only set this argument if the condition parameter evaluates to True;
# otherwise log an info message and do nothing
rs.set("partition", "debug",
       condition=socket.gethostname()=="testing-system")

rs.format_run_args()
# returns ["exclusive", "None", "partition", "debug"] iff
# socket.gethostname()=="testing-system"
# otherwise returns ["exclusive", "None"]

Parameters:
  • arg (str) – name of the argument

  • value (Optional[str], default: None) – value of the argument

  • condition (bool, default: True) – set the argument if condition evaluates to True

Return type:

None

set_binding(binding: str) None#

Set binding

Parameters:

binding (str) – Binding

Return type:

None

set_broadcast(dest_path: str | None = None) None#

Copy the specified executable(s) to remote machines

This sets --preload-binary

Parameters:

dest_path (Optional[str], default: None) – Destination path (Ignored)

Return type:

None

set_cpu_binding_type(bind_type: str) None#

Specifies the cores to which MPI processes are bound

This sets --bind-to for MPI compliant implementations

Parameters:

bind_type (str) – binding type

Return type:

None

set_cpu_bindings(bindings: int | List[int]) None#

Set the cores to which MPI processes are bound

Parameters:

bindings (Union[int, List[int]]) – List specifying the cores to which MPI processes are bound

Return type:

None

set_cpus_per_task(cpus_per_task: int) None#

Set the number of CPUs per task for this job

This sets --cpus-per-proc for MPI compliant implementations

Note: this option has been deprecated in OpenMPI 4.0+ and will soon be replaced.

Parameters:

cpus_per_task (int) – number of CPUs per task

Return type:

None

set_excluded_hosts(host_list: str | List[str]) None#

Specify a list of hosts to exclude for launching this job

Parameters:

host_list (Union[str, List[str]]) – hosts to exclude

Return type:

None

set_hostlist(host_list: str | List[str]) None#

Set the hostlist for the mpirun command

This sets --host

Parameters:

host_list (Union[str, List[str]]) – list of host names

Raises:

TypeError – if not str or list of str

Return type:

None

set_hostlist_from_file(file_path: str) None#

Use the contents of a file to set the hostlist

This sets --hostfile

Parameters:

file_path (str) – Path to the hostlist file

Return type:

None

set_memory_per_node(memory_per_node: int) None#

Set the amount of memory required per node in megabytes

Parameters:

memory_per_node (int) – Number of megabytes per node

Return type:

None

set_mpmd_preamble(preamble_lines: List[str]) None#

Set preamble to a file to make a job MPMD

Parameters:

preamble_lines (List[str]) – lines to put at the beginning of a file.

Return type:

None

set_node_feature(feature_list: str | List[str]) None#

Specify the node feature for this job

Parameters:

feature_list (Union[str, List[str]]) – node feature to launch on

Return type:

None

set_nodes(nodes: int) None#

Set the number of nodes

Parameters:

nodes (int) – number of nodes to run with

Return type:

None

set_quiet_launch(quiet: bool) None#

Set the job to run in quiet mode

This sets --quiet

Parameters:

quiet (bool) – Whether the job should be run quietly

Return type:

None

set_task_map(task_mapping: str) None#

Set mpirun task mapping

This sets --map-by <mapping>

For examples, see the man page for mpirun

Parameters:

task_mapping (str) – task mapping

Return type:

None

set_tasks(tasks: int) None#

Set the number of tasks for this job

This sets -n for MPI compliant implementations

Parameters:

tasks (int) – number of tasks

Return type:

None

set_tasks_per_node(tasks_per_node: int) None#

Set the number of tasks per node

Parameters:

tasks_per_node (int) – number of tasks to launch per node

Return type:

None

set_time(hours: int = 0, minutes: int = 0, seconds: int = 0) None#

Automatically format and set wall time

Parameters:
  • hours (int, default: 0) – number of hours to run job

  • minutes (int, default: 0) – number of minutes to run job

  • seconds (int, default: 0) – number of seconds to run job

Return type:

None

set_verbose_launch(verbose: bool) None#

Set the job to run in verbose mode

This sets --verbose

Parameters:

verbose (bool) – Whether the job should be run verbosely

Return type:

None

set_walltime(walltime: str) None#

Set the maximum number of seconds that a job will run

This sets --timeout

Parameters:

walltime (str) – number-like string giving the maximum number of seconds the job may run

Return type:

None

update_env(env_vars: Dict[str, str | int | float | bool]) None#

Update the job environment variables

To fully inherit the current user environment, add the workload-manager-specific flag to the launch command through the add_exe_args() method. For example, --export=ALL for slurm, or -V for PBS/aprun.

Parameters:

env_vars (Dict[str, Union[str, int, float, bool]]) – environment variables to update or add

Raises:

TypeError – if env_vars values cannot be coerced to strings

Return type:

None
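The accepted value types and the TypeError described above can be pictured with the following stand-alone sketch. It does not use SmartSim; coerce_env is a hypothetical helper, and the exact check the library performs is an assumption.

```python
from typing import Dict, Union

EnvValue = Union[str, int, float, bool]


def coerce_env(env_vars: Dict[str, EnvValue]) -> Dict[str, str]:
    """Return a copy of env_vars with every value converted to a string."""
    coerced: Dict[str, str] = {}
    for name, value in env_vars.items():
        # Only scalar values with an obvious string form are accepted.
        if not isinstance(value, (str, int, float, bool)):
            raise TypeError(
                f"Cannot coerce value of {name!r} ({type(value).__name__}) to str"
            )
        coerced[name] = str(value)
    return coerced
```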


SbatchSettings#

SbatchSettings are used for launching batches onto Slurm WLM systems.

SbatchSettings.set_account(account)

Set the account for this batch job

SbatchSettings.set_batch_command(command)

Set the command used to launch the batch e.g. sbatch

SbatchSettings.set_nodes(num_nodes)

Set the number of nodes for this batch job

SbatchSettings.set_hostlist(host_list)

Specify the hostlist for this job

SbatchSettings.set_partition(partition)

Set the partition for the batch job

SbatchSettings.set_queue(queue)

alias for set_partition

SbatchSettings.set_walltime(walltime)

Set the walltime of the job

SbatchSettings.format_batch_args()

Get the formatted batch arguments for a preview

class SbatchSettings(nodes: int | None = None, time: str = '', account: str | None = None, batch_args: Dict[str, str | None] | None = None, **kwargs: Any) None[source]#

Specify run parameters for a Slurm batch job

Slurm sbatch arguments can be written into batch_args as a dictionary, e.g. {"ntasks": 1}.

If the argument doesn’t take a value, use None as the value, e.g. {"exclusive": None}.

Initialization values provided (nodes, time, account) will overwrite the same arguments in batch_args if present.

Parameters:
  • nodes (Optional[int], default: None) – number of nodes

  • time (str, default: '') – walltime for job, e.g. “10:00:00” for 10 hours

  • account (Optional[str], default: None) – account for job

  • batch_args (Optional[Dict[str, Optional[str]]], default: None) – extra batch arguments
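The precedence rule above (initialization values win over batch_args) and the handling of valueless arguments can be sketched as follows. This is a stand-alone illustration that does not import SmartSim. The helpers merge_batch_args and render_flags are hypothetical, and the sketch assumes long option names only.

```python
from typing import Dict, List, Optional


def merge_batch_args(
    nodes: Optional[int] = None,
    time: str = "",
    account: Optional[str] = None,
    batch_args: Optional[Dict[str, Optional[str]]] = None,
) -> Dict[str, Optional[str]]:
    """Combine batch_args with explicit values; explicit values take precedence."""
    merged: Dict[str, Optional[str]] = dict(batch_args or {})
    if nodes is not None:
        merged["nodes"] = str(nodes)
    if time:
        merged["time"] = time
    if account is not None:
        merged["account"] = account
    return merged


def render_flags(batch_args: Dict[str, Optional[str]]) -> List[str]:
    """Render --name=value, or a bare --name when the value is None."""
    return [
        f"--{name}" if value is None else f"--{name}={value}"
        for name, value in batch_args.items()
    ]
```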

add_preamble(lines: List[str]) None#

Add lines to the batch file preamble. The lines are just written (unmodified) at the beginning of the batch file (after the WLM directives) and can be used to e.g. start virtual environments before running the executables.

Parameters:

lines (List[str]) – lines to add to the preamble

Return type:

None
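To make the placement of preamble lines concrete, the sketch below assembles a batch file in the order described above: WLM directives first, then the preamble, then the commands. It is not SmartSim's writer. build_batch_script is a hypothetical helper, and the shebang line and overall layout are assumptions; only the #SBATCH directive prefix is standard Slurm.

```python
from typing import Iterable, List


def build_batch_script(
    directives: Iterable[str],
    preamble: Iterable[str],
    commands: Iterable[str],
) -> str:
    """Return batch file text: directives, then preamble, then commands."""
    lines: List[str] = ["#!/bin/bash", ""]
    lines += [f"#SBATCH {directive}" for directive in directives]
    lines += list(preamble)   # written unmodified, after the directives
    lines += list(commands)   # the executables run last
    return "\n".join(lines) + "\n"
```

For example, passing ["source venv/bin/activate"] as the preamble would activate a virtual environment before any command runs.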

property batch_args: Dict[str, str | None]#

Retrieve attached batch arguments

Returns:

attached batch arguments

property batch_cmd: str#

Return the batch command

Tests to see if we can expand the batch command path. If we can, then returns the expanded batch command. If we cannot, returns the batch command as is.

Returns:

batch command

format_batch_args() List[str][source]#

Get the formatted batch arguments for a preview

Return type:

List[str]

Returns:

batch arguments for Sbatch

property preamble: Iterable[str]#

Return an iterable of preamble clauses to be prepended to the batch file

Returns:

attached preamble clauses

set_account(account: str) None[source]#

Set the account for this batch job

Parameters:

account (str) – account id

Return type:

None

set_batch_command(command: str) None#

Set the command used to launch the batch e.g. sbatch

Parameters:

command (str) – batch command

Return type:

None

set_cpus_per_task(cpus_per_task: int) None[source]#

Set the number of cpus to use per task

This sets --cpus-per-task

Parameters:

cpus_per_task (int) – number of cpus to use per task

Return type:

None

set_hostlist(host_list: str | List[str]) None[source]#

Specify the hostlist for this job

Parameters:

host_list (Union[str, List[str]]) – hosts to launch on

Raises:

TypeError – if not str or list of str

Return type:

None
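The type check described above can be pictured with this stand-alone sketch, which accepts either a single host name or a list of host names and rejects anything else. normalize_hostlist is a hypothetical helper, and joining hosts with commas is an assumption about the final format.

```python
from typing import List, Union


def normalize_hostlist(host_list: Union[str, List[str]]) -> str:
    """Validate host_list and return a comma-separated string of hosts."""
    if isinstance(host_list, str):
        hosts = [host_list.strip()]
    elif isinstance(host_list, list) and all(isinstance(h, str) for h in host_list):
        hosts = [h.strip() for h in host_list]
    else:
        raise TypeError("host_list must be a str or a list of str")
    return ",".join(hosts)
```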

set_nodes(num_nodes: int) None[source]#

Set the number of nodes for this batch job

Parameters:

num_nodes (int) – number of nodes

Return type:

None

set_partition(partition: str) None[source]#

Set the partition for the batch job

Parameters:

partition (str) – partition name

Return type:

None

set_queue(queue: str) None[source]#

alias for set_partition

Sets the partition for the slurm batch job

Parameters:

queue (str) – the partition to run the batch job on

Return type:

None

set_walltime(walltime: str) None[source]#

Set the walltime of the job

format = “HH:MM:SS”

Parameters:

walltime (str) – wall time

Return type:

None

QsubBatchSettings#

QsubBatchSettings are used to configure jobs that should be launched as a batch on PBSPro systems.

QsubBatchSettings.set_account(account)

Set the account for this batch job

QsubBatchSettings.set_batch_command(command)

Set the command used to launch the batch

QsubBatchSettings.set_nodes(num_nodes)

Set the number of nodes for this batch job

QsubBatchSettings.set_ncpus(num_cpus)

Set the number of cpus obtained in each node.

QsubBatchSettings.set_queue(queue)

Set the queue for the batch job

QsubBatchSettings.set_resource(...)

Set a resource value for the Qsub batch

QsubBatchSettings.set_walltime(walltime)

Set the walltime of the job

QsubBatchSettings.format_batch_args()

Get the formatted batch arguments for a preview

class QsubBatchSettings(nodes: int | None = None, ncpus: int | None = None, time: str | None = None, queue: str | None = None, account: str | None = None, resources: Dict[str, str | int] | None = None, batch_args: Dict[str, str | None] | None = None, **kwargs: Any)[source]#

Specify qsub batch parameters for a job

nodes and ncpus are used to create the select statement for PBS if a select statement is not included in resources. If both are supplied, the select statement given in resources takes precedence.

Parameters:
  • nodes (Optional[int], default: None) – number of nodes for batch

  • ncpus (Optional[int], default: None) – number of cpus per node

  • time (Optional[str], default: None) – walltime for batch job

  • queue (Optional[str], default: None) – queue to run batch in

  • account (Optional[str], default: None) – account for batch launch

  • resources (Optional[Dict[str, Union[str, int]]], default: None) – overrides for resource arguments

  • batch_args (Optional[Dict[str, Optional[str]]], default: None) – overrides for PBS batch arguments
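The interaction between nodes, ncpus, and an explicit select entry in resources can be sketched as below. This is a stand-alone illustration, not the library's code: select_statement is a hypothetical helper, and the N:ncpus=M form follows the usual PBS select syntax.

```python
from typing import Dict, Optional, Union


def select_statement(
    nodes: Optional[int] = None,
    ncpus: Optional[int] = None,
    resources: Optional[Dict[str, Union[str, int]]] = None,
) -> Optional[str]:
    """Return the value of the PBS select resource, or None if undetermined."""
    resources = resources or {}
    if "select" in resources:
        # An explicit select statement overrides nodes and ncpus.
        return str(resources["select"])
    if nodes is None:
        return None
    statement = str(nodes)
    if ncpus is not None:
        statement += f":ncpus={ncpus}"
    return statement
```

The returned value would typically appear on the qsub command line or in a directive as -l select=<value>.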

add_preamble(lines: List[str]) None#

Add lines to the batch file preamble. The lines are just written (unmodified) at the beginning of the batch file (after the WLM directives) and can be used to e.g. start virtual environments before running the executables.

Parameters:

lines (List[str]) – lines to add to the preamble

Return type:

None

property batch_args: Dict[str, str | None]#

Retrieve attached batch arguments

Returns:

attached batch arguments

property batch_cmd: str#

Return the batch command

Tests to see if we can expand the batch command path. If we can, then returns the expanded batch command. If we cannot, returns the batch command as is.

Returns:

batch command

format_batch_args() List[str][source]#

Get the formatted batch arguments for a preview

Return type:

List[str]

Returns:

batch arguments for Qsub

Raises:

ValueError – if options are supplied without values

property preamble: Iterable[str]#

Return an iterable of preamble clauses to be prepended to the batch file

Returns:

attached preamble clauses

property resources: Dict[str, str | int]#

set_account(account: str) None[source]#

Set the account for this batch job

Parameters:

account (str) – account id

Return type:

None

set_batch_command(command: str) None#

Set the command used to launch the batch e.g. sbatch

Parameters:

command (str) – batch command

Return type:

None

set_hostlist(host_list: str | List[str]) None[source]#

Specify the hostlist for this job

Parameters:

host_list (Union[str, List[str]]) – hosts to launch on

Raises:

TypeError – if not str or list of str

Return type:

None

set_ncpus(num_cpus: int | str) None[source]#

Set the number of cpus obtained in each node.

If a select argument is provided in QsubBatchSettings.resources, then this value will be overridden

Parameters:

num_cpus (Union[int, str]) – number of cpus per node in select

Return type:

None

set_nodes(num_nodes: int) None[source]#

Set the number of nodes for this batch job

In PBS, ‘select’ is the more primitive way of describing how many nodes to allocate for the job. ‘nodes’ is equivalent to ‘select’ with a ‘place’ statement. Assuming that only advanced users would use ‘set_resource’ instead, defining the number of nodes here sets the ‘nodes’ resource.

Parameters:

num_nodes (int) – number of nodes

Return type:

None

set_queue(queue: str) None[source]#

Set the queue for the batch job

Parameters:

queue (str) – queue name

Return type:

None

set_resource(resource_name: str, value: str | int) None[source]#

Set a resource value for the Qsub batch

If a select statement is provided, the nodes and ncpus arguments will be overridden. The same applies to walltime.

Parameters:
  • resource_name (str) – name of resource, e.g. walltime

  • value (Union[str, int]) – value

Return type:

None

set_walltime(walltime: str) None[source]#

Set the walltime of the job

format = “HH:MM:SS”

If a walltime argument is provided in QsubBatchSettings.resources, then this value will be overridden

Parameters:

walltime (str) – wall time

Return type:

None

Singularity#

Singularity is a type of Container that can be passed to a RunSettings class or child class to enable running the workload in a container.

class Singularity(*args: Any, **kwargs: Any) None[source]#

Singularity (apptainer) container type. To be passed into a RunSettings class initializer or Experiment.create_run_settings.

Note

Singularity integration is currently tested with Apptainer 1.0 with Slurm and PBS workload managers only.

Also, note that user-defined bind paths (mount argument) may be disabled by a system administrator.

Parameters:
  • image – local or remote path to container image, e.g. docker://sylabsio/lolcow

  • args (Any) – arguments to ‘singularity exec’ command

  • mount – paths to mount (bind) from host machine into image.
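To show how the three parameters relate to a singularity exec invocation, the sketch below builds such a command line by hand. It is a stand-alone illustration and not SmartSim's implementation. singularity_exec_command is a hypothetical helper, and accepting mount as a string, a list, or a source-to-destination dictionary is an assumption.

```python
from typing import Dict, List, Sequence, Union

Mount = Union[None, str, Sequence[str], Dict[str, str]]


def singularity_exec_command(
    image: str,
    command: Sequence[str],
    args: Sequence[str] = (),
    mount: Mount = None,
) -> List[str]:
    """Build: singularity exec [args] [--bind paths] image command..."""
    cmd = ["singularity", "exec", *args]
    if mount:
        if isinstance(mount, str):
            binds = [mount]
        elif isinstance(mount, dict):
            # Map each host path to its location inside the container.
            binds = [f"{src}:{dest}" for src, dest in mount.items()]
        else:
            binds = list(mount)
        cmd += ["--bind", ",".join(binds)]
    cmd.append(image)
    cmd += list(command)
    return cmd
```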

Orchestrator#

Orchestrator#

Model#

Model.__init__(name, params, run_settings[, ...])

Initialize a Model

Model.attach_generator_files([to_copy, ...])

Attach files to an entity for generation

Model.colocate_db(*args, **kwargs)

An alias for Model.colocate_db_tcp

Model.colocate_db_tcp([port, ifname, ...])

Colocate an Orchestrator instance with this Model over TCP/IP.

Model.colocate_db_uds([unix_socket, ...])

Colocate an Orchestrator instance with this Model over UDS.

Model.colocated

Return True if this Model will run with a colocated Orchestrator

Model.add_ml_model(name, backend[, model, ...])

A TF, TF-lite, PT, or ONNX model to load into the DB at runtime

Model.add_script(name[, script, ...])

TorchScript to launch with this Model instance

Model.add_function(name[, function, device, ...])

TorchScript function to launch with this Model instance

Model.params_to_args()

Convert parameters to command line arguments and update run settings.

Model.register_incoming_entity(incoming_entity)

Register future communication between entities.

Model.enable_key_prefixing()

If called, the entity will prefix its keys with its own model name

Model.disable_key_prefixing()

If called, the entity will not prefix its keys with its own model name

Model.query_key_prefixing()

Inquire as to whether this entity will prefix its keys with its name

Model#

class Model(name: str, params: Dict[str, str], run_settings: smartsim.settings.base.RunSettings, path: str | None = '/usr/local/src/SmartSim/doc', params_as_args: List[str] | None = None, batch_settings: smartsim.settings.base.BatchSettings | None = None)[source]#

Bases: SmartSimEntity

Initialize a Model

Parameters:
  • name (str) – name of the model

  • params (Dict[str, str]) – model parameters for writing into configuration files or to be passed as command line arguments to executable.

  • path (Optional[str], default: '/usr/local/src/SmartSim/doc') – path to output, error, and configuration files

  • run_settings (RunSettings) – launcher settings specified in the experiment

  • params_as_args (Optional[List[str]], default: None) – list of parameters which have to be interpreted as command line arguments to be added to run_settings

  • batch_settings (Optional[BatchSettings], default: None) – Launcher settings for running the individual model as a batch job

add_function(name: str, function: str | None = None, device: str = 'CPU', devices_per_node: int = 1, first_device: int = 0) None[source]#

TorchScript function to launch with this Model instance

Each script function added to the model will be loaded into a non-converged orchestrator prior to the execution of this Model instance.

For converged orchestrators, the add_script() method should be used.

Device selection is either “GPU” or “CPU”. If many devices are present, a number can be passed for specification e.g. “GPU:1”.

Setting devices_per_node=N, with N greater than one, will result in the function being stored in the first N devices of type device.

Parameters:
  • name (str) – key to store function under

  • function (Optional[str], default: None) – TorchScript function code

  • device (str, default: 'CPU') – device for script execution

  • devices_per_node (int, default: 1) – The number of GPU devices available on the host. This parameter only applies to GPU devices and will be ignored if device is specified as CPU.

  • first_device (int, default: 0) – The first GPU device to use on the host. This parameter only applies to GPU devices and will be ignored if device is specified as CPU.

Return type:

None

add_ml_model(name: str, backend: str, model: bytes | None = None, model_path: str | None = None, device: str = 'CPU', devices_per_node: int = 1, first_device: int = 0, batch_size: int = 0, min_batch_size: int = 0, min_batch_timeout: int = 0, tag: str = '', inputs: List[str] | None = None, outputs: List[str] | None = None) None[source]#

A TF, TF-lite, PT, or ONNX model to load into the DB at runtime

Each ML Model added will be loaded into an orchestrator (converged or not) prior to the execution of this Model instance

One of either model (in memory representation) or model_path (file) must be provided

Parameters:
  • name (str) – key to store model under

  • backend (str) – name of the backend (TORCH, TF, TFLITE, ONNX)

  • model (Optional[bytes], default: None) – A model in memory (only supported for non-colocated orchestrators)

  • model_path (Optional[str], default: None) – serialized model

  • device (str, default: 'CPU') – name of device for execution

  • devices_per_node (int, default: 1) – The number of GPU devices available on the host. This parameter only applies to GPU devices and will be ignored if device is specified as CPU.

  • first_device (int, default: 0) – The first GPU device to use on the host. This parameter only applies to GPU devices and will be ignored if device is specified as CPU.

  • batch_size (int, default: 0) – batch size for execution

  • min_batch_size (int, default: 0) – minimum batch size for model execution

  • min_batch_timeout (int, default: 0) – time to wait for minimum batch size

  • tag (str, default: '') – additional tag for model information

  • inputs (Optional[List[str]], default: None) – model inputs (TF only)

  • outputs (Optional[List[str]], default: None) – model outputs (TF only)

Return type:

None

add_script(name: str, script: str | None = None, script_path: str | None = None, device: str = 'CPU', devices_per_node: int = 1, first_device: int = 0) None[source]#

TorchScript to launch with this Model instance

Each script added to the model will be loaded into an orchestrator (converged or not) prior to the execution of this Model instance

Device selection is either “GPU” or “CPU”. If many devices are present, a number can be passed for specification e.g. “GPU:1”.

Setting devices_per_node=N, with N greater than one, will result in the script being stored in the first N devices of type device; alternatively, setting first_device=M will result in the script being stored on devices M through M + N - 1.

One of either script (in memory string representation) or script_path (file) must be provided

Parameters:
  • name (str) – key to store script under

  • script (Optional[str], default: None) – TorchScript code (only supported for non-colocated orchestrators)

  • script_path (Optional[str], default: None) – path to TorchScript code

  • device (str, default: 'CPU') – device for script execution

  • devices_per_node (int, default: 1) – The number of GPU devices available on the host. This parameter only applies to GPU devices and will be ignored if device is specified as CPU.

  • first_device (int, default: 0) – The first GPU device to use on the host. This parameter only applies to GPU devices and will be ignored if device is specified as CPU.

Return type:

None
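The combined effect of device, devices_per_node, and first_device can be illustrated with a small stand-alone sketch that lists the device names a script would target. target_devices is a hypothetical helper; treating an explicitly indexed device such as "GPU:1" as a single target is an assumption.

```python
from typing import List


def target_devices(
    device: str = "CPU", devices_per_node: int = 1, first_device: int = 0
) -> List[str]:
    """List device names M through M + N - 1 for GPUs; pass others through."""
    if device.upper() != "GPU":
        # CPU, or an explicitly indexed device such as "GPU:1":
        # devices_per_node and first_device are ignored.
        return [device]
    return [
        f"GPU:{index}"
        for index in range(first_device, first_device + devices_per_node)
    ]
```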

attach_generator_files(to_copy: List[str] | None = None, to_symlink: List[str] | None = None, to_configure: List[str] | None = None) None[source]#

Attach files to an entity for generation

Attach files needed for the entity that, upon generation, will be located in the path of the entity. Invoking this method after files have already been attached will overwrite the previous list of entity files.

During generation, files “to_copy” are copied into the path of the entity, and files “to_symlink” are symlinked into the path of the entity.

Files “to_configure” are text based model input files where parameters for the model are set. Note that only models support the “to_configure” field. These files must have fields tagged that correspond to the values the user would like to change. The tag is settable but defaults to a semicolon e.g. THERMO = ;10;

Parameters:
  • to_copy (Optional[List[str]], default: None) – files to copy

  • to_symlink (Optional[List[str]], default: None) – files to symlink

  • to_configure (Optional[List[str]], default: None) – input files with tagged parameters

Return type:

None
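The difference between to_copy and to_symlink during generation can be pictured with the stand-alone sketch below, which stages files into an entity directory using only the standard library. stage_files is a hypothetical helper and does not reproduce SmartSim's generator; handling of to_configure files is deliberately left out.

```python
import os
import shutil
from typing import Iterable


def stage_files(
    entity_path: str,
    to_copy: Iterable[str] = (),
    to_symlink: Iterable[str] = (),
) -> None:
    """Copy or symlink each source file into entity_path."""
    os.makedirs(entity_path, exist_ok=True)
    for src in to_copy:
        # An independent copy: later edits to src do not affect it.
        shutil.copy(src, os.path.join(entity_path, os.path.basename(src)))
    for src in to_symlink:
        # A link back to the original: cheap for large, read-only inputs.
        os.symlink(
            os.path.abspath(src),
            os.path.join(entity_path, os.path.basename(src)),
        )
```

Symlinking suits large inputs shared by many models, while copying suits files that each model may modify.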

property attached_files_table: str#

Return a list of attached files as a plain text table

Returns:

String version of table

colocate_db(*args: Any, **kwargs: Any) None[source]#

An alias for Model.colocate_db_tcp

Return type:

None

colocate_db_tcp(port: int = 6379, ifname: str | list[str] = 'lo', db_cpus: int = 1, custom_pinning: Iterable[int | Iterable[int]] | None = None, debug: bool = False, db_identifier: str = '', **kwargs: Any) None[source]#

Colocate an Orchestrator instance with this Model over TCP/IP.

This method will initialize settings which add an unsharded database to this Model instance. Only this Model will be able to communicate with this colocated database by using the loopback TCP interface.

Extra parameters for the db can be passed through kwargs. This includes many performance, caching and inference settings.

example_kwargs = {
    "maxclients": 100000,
    "threads_per_queue": 1,
    "inter_op_threads": 1,
    "intra_op_threads": 1,
    "server_threads": 2 # keydb only
}

Generally these don’t need to be changed.

Parameters:
  • port (int, default: 6379) – port to use for orchestrator database

  • ifname (Union[str, list[str]], default: 'lo') – interface to use for orchestrator

  • db_cpus (int, default: 1) – number of cpus to use for orchestrator

  • custom_pinning (Optional[Iterable[Union[int, Iterable[int]]]], default: None) – CPUs to pin the orchestrator to. Passing an empty iterable disables pinning

  • debug (bool, default: False) – launch Model with extra debug information about the colocated db

  • kwargs (Any) – additional keyword arguments to pass to the orchestrator database

Return type:

None
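Because custom_pinning accepts a mix of integers and nested iterables, it may help to see how such a value can be flattened into a plain list of CPU ids. The sketch below is stand-alone and hypothetical (flatten_pinning and pinning_string are not SmartSim functions), and the comma-separated output format is an assumption.

```python
from typing import Iterable, List, Union


def flatten_pinning(custom_pinning: Iterable[Union[int, Iterable[int]]]) -> List[int]:
    """Flatten ints and iterables of ints into a sorted list of unique CPU ids."""
    cpus = set()
    for item in custom_pinning:
        if isinstance(item, int):
            cpus.add(item)
        else:
            cpus.update(int(cpu) for cpu in item)
    return sorted(cpus)


def pinning_string(custom_pinning: Iterable[Union[int, Iterable[int]]]) -> str:
    """Join the flattened CPU ids with commas; empty input gives an empty string."""
    return ",".join(str(cpu) for cpu in flatten_pinning(custom_pinning))
```

A value such as [0, range(2, 4), [6, 7]] therefore describes CPUs 0, 2, 3, 6, and 7.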

colocate_db_uds(unix_socket: str = '/tmp/redis.socket', socket_permissions: int = 755, db_cpus: int = 1, custom_pinning: Iterable[int | Iterable[int]] | None = None, debug: bool = False, db_identifier: str = '', **kwargs: Any) None[source]#

Colocate an Orchestrator instance with this Model over UDS.

This method will initialize settings which add an unsharded database to this Model instance. Only this Model will be able to communicate with this colocated database by using Unix Domain sockets.

Extra parameters for the db can be passed through kwargs. This includes many performance, caching and inference settings.

example_kwargs = {
    "maxclients": 100000,
    "threads_per_queue": 1,
    "inter_op_threads": 1,
    "intra_op_threads": 1,
    "server_threads": 2 # keydb only
}

Generally these don’t need to be changed.

Parameters:
  • unix_socket (str, default: '/tmp/redis.socket') – path to where the socket file will be created

  • socket_permissions (int, default: 755) – permissions for the socket file

  • db_cpus (int, default: 1) – number of cpus to use for orchestrator

  • custom_pinning (Optional[Iterable[Union[int, Iterable[int]]]], default: None) – CPUs to pin the orchestrator to. Passing an empty iterable disables pinning

  • debug (bool, default: False) – launch Model with extra debug information about the colocated db

  • kwargs (Any) – additional keyword arguments to pass to the orchestrator database

Return type:

None

property colocated: bool#

Return True if this Model will run with a colocated Orchestrator

Returns:

True if the Model will run with a colocated Orchestrator

property db_models: Iterable[DBModel]#

Retrieve an immutable collection of attached models

Returns:

Return an immutable collection of attached models

property db_scripts: Iterable[DBScript]#

Retrieve an immutable collection of attached scripts

Returns:

Return an immutable collection of attached scripts

disable_key_prefixing() None[source]#

If called, the entity will not prefix its keys with its own model name

Return type:

None

enable_key_prefixing() None[source]#

If called, the entity will prefix its keys with its own model name

Return type:

None

params_to_args() None[source]#

Convert parameters to command line arguments and update run settings.

Return type:

None

print_attached_files() None[source]#

Print a table of the attached files on std out

Return type:

None

query_key_prefixing() bool[source]#

Inquire as to whether this entity will prefix its keys with its name

Return type:

bool

Returns:

Return True if entity will prefix its keys with its name

register_incoming_entity(incoming_entity: smartsim.entity.entity.SmartSimEntity) None[source]#

Register future communication between entities.

Registers the named data sources that this entity has access to by storing the key_prefix associated with that entity

Parameters:

incoming_entity (SmartSimEntity) – The entity that data will be received from

Raises:

SmartSimError – if incoming entity has already been registered

Return type:

None
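The key-prefixing and registration methods above can be pictured with a toy class. This is a stand-alone sketch and not how SmartSim implements entities: PrefixingEntity, key_for, and DuplicateEntityError are hypothetical names (the last standing in for SmartSimError), and the "." separator between name and key is an assumption.

```python
from typing import List


class DuplicateEntityError(Exception):
    """Stand-in for the error raised when an entity is registered twice."""


class PrefixingEntity:
    def __init__(self, name: str) -> None:
        self.name = name
        self._prefixing = False
        self.incoming: List[str] = []  # key prefixes of registered data sources

    def enable_key_prefixing(self) -> None:
        self._prefixing = True

    def disable_key_prefixing(self) -> None:
        self._prefixing = False

    def query_key_prefixing(self) -> bool:
        return self._prefixing

    def register_incoming_entity(self, incoming: "PrefixingEntity") -> None:
        if incoming.name in self.incoming:
            raise DuplicateEntityError(f"{incoming.name} is already registered")
        self.incoming.append(incoming.name)

    def key_for(self, key: str) -> str:
        """Return the key this entity would write, prefixed if enabled."""
        return f"{self.name}.{key}" if self._prefixing else key
```

Prefixing lets two entities write the same logical key without overwriting each other, and registration tells a consumer which prefixes to read from.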

property type: str#

Return the name of the class

Ensemble#

Ensemble.__init__(name, params[, path, ...])

Initialize an Ensemble of Model instances.

Ensemble.add_model(model)

Add a model to this ensemble

Ensemble.add_ml_model(name, backend[, ...])

A TF, TF-lite, PT, or ONNX model to load into the DB at runtime

Ensemble.add_script(name[, script, ...])

TorchScript to launch with every entity belonging to this ensemble

Ensemble.add_function(name[, function, ...])

TorchScript function to launch with every entity belonging to this ensemble

Ensemble.attach_generator_files([to_copy, ...])

Attach files to each model within the ensemble for generation

Ensemble.enable_key_prefixing()

If called, each model within this ensemble will prefix its key with its own model name.

Ensemble.models

An alias for a shallow copy of the entities attribute

Ensemble.query_key_prefixing()

Inquire as to whether each model within the ensemble will prefix its keys

Ensemble.register_incoming_entity(...)

Register future communication between entities.

Ensemble#

class Ensemble(name: str, params: Dict[str, Any], path: str | None = '/usr/local/src/SmartSim/doc', params_as_args: List[str] | None = None, batch_settings: smartsim.settings.base.BatchSettings | None = None, run_settings: smartsim.settings.base.RunSettings | None = None, perm_strat: str = 'all_perm', **kwargs: Any) None[source]#

Bases: EntityList[Model]

Ensemble is a group of Model instances that can be treated as a reference to a single instance.

Initialize an Ensemble of Model instances.

The kwargs argument can be used to pass custom input parameters to the permutation strategy.

Parameters:
  • name (str) – name of the ensemble

  • params (Dict[str, Any]) – parameters to expand into Model members

  • params_as_args (Optional[List[str]], default: None) – list of params that should be used as command line arguments to the Model member executables and not written to generator files

  • batch_settings (Optional[BatchSettings], default: None) – describes settings for Ensemble as batch workload

  • run_settings (Optional[RunSettings], default: None) – describes how each Model should be executed

  • replicas – number of Model replicas to create, passed as a keyword argument through kwargs

  • perm_strat (str, default: 'all_perm') – strategy for expanding params into Model instances; options are “all_perm”, “step”, “random”, or a callable function.

Returns:

Ensemble instance
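How a params dictionary might expand into per-model parameter sets can be sketched with the standard library. The interpretation below is an assumption rather than a statement of SmartSim's implementation: "all_perm" is taken to mean every combination of the listed values, and "step" to mean stepping through the value lists in lockstep. The functions all_perm and step are hypothetical.

```python
import itertools
from typing import Any, Dict, List


def all_perm(params: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """One parameter set per combination of values (Cartesian product)."""
    names = list(params)
    return [
        dict(zip(names, combo))
        for combo in itertools.product(*params.values())
    ]


def step(params: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """One parameter set per position, taking the i-th value of each list."""
    names = list(params)
    return [dict(zip(names, combo)) for combo in zip(*params.values())]
```

With two parameters of two values each, the first strategy yields four parameter sets and the second yields two.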

add_function(name: str, function: str | None = None, device: str = 'CPU', devices_per_node: int = 1, first_device: int = 0) None[source]#

TorchScript function to launch with every entity belonging to this ensemble

Each script function added to the ensemble will be loaded into a non-converged orchestrator prior to the execution of every entity belonging to this ensemble.

For converged orchestrators, the add_script() method should be used.

Device selection is either “GPU” or “CPU”. If many devices are present, a number can be passed for specification e.g. “GPU:1”.

Setting devices_per_node=N, with N greater than one, will result in the function being stored in the first N devices of type device; alternatively, setting first_device=M will result in the function being stored on devices M through M + N - 1.

Parameters:
  • name (str) – key to store function under

  • function (Optional[str], default: None) – TorchScript code

  • device (str, default: 'CPU') – device for script execution

  • devices_per_node (int, default: 1) – number of devices on each host

  • first_device (int, default: 0) – first device to use on each host

Return type:

None

add_ml_model(name: str, backend: str, model: bytes | None = None, model_path: str | None = None, device: str = 'CPU', devices_per_node: int = 1, first_device: int = 0, batch_size: int = 0, min_batch_size: int = 0, min_batch_timeout: int = 0, tag: str = '', inputs: List[str] | None = None, outputs: List[str] | None = None) None[source]#

A TF, TF-lite, PT, or ONNX model to load into the DB at runtime

Each ML Model added will be loaded into an orchestrator (converged or not) prior to the execution of every entity belonging to this ensemble

One of either model (in memory representation) or model_path (file) must be provided

Parameters:
  • name (str) – key to store model under

  • model (Optional[bytes], default: None) – model in memory

  • model_path (Optional[str], default: None) – serialized model

  • backend (str) – name of the backend (TORCH, TF, TFLITE, ONNX)

  • device (str, default: 'CPU') – name of device for execution

  • devices_per_node (int, default: 1) – number of GPUs per node in multiGPU nodes

  • first_device (int, default: 0) – first device in multi-GPU nodes to use for execution, defaults to 0; ignored if devices_per_node is 1

  • batch_size (int, default: 0) – batch size for execution

  • min_batch_size (int, default: 0) – minimum batch size for model execution

  • min_batch_timeout (int, default: 0) – time to wait for minimum batch size

  • tag (str, default: '') – additional tag for model information

  • inputs (Optional[List[str]], default: None) – model inputs (TF only)

  • outputs (Optional[List[str]], default: None) – model outputs (TF only)

Return type:

None

add_model(model: smartsim.entity.model.Model) None[source]#

Add a model to this ensemble

Parameters:

model (Model) – model instance to be added

Raises:
  • TypeError – if model is not an instance of Model

  • EntityExistsError – if model already exists in this ensemble

Return type:

None

add_script(name: str, script: str | None = None, script_path: str | None = None, device: str = 'CPU', devices_per_node: int = 1, first_device: int = 0) None[source]#

TorchScript to launch with every entity belonging to this ensemble

Each script added to the model will be loaded into an orchestrator (converged or not) prior to the execution of every entity belonging to this ensemble

Device selection is either “GPU” or “CPU”. If many devices are present, a number can be passed for specification e.g. “GPU:1”.

Setting devices_per_node=N, with N greater than one, will result in the script being stored in the first N devices of type device.

One of either script (in memory string representation) or script_path (file) must be provided

Parameters:
  • name (str) – key to store script under

  • script (Optional[str], default: None) – TorchScript code

  • script_path (Optional[str], default: None) – path to TorchScript code

  • device (str, default: 'CPU') – device for script execution

  • devices_per_node (int, default: 1) – number of devices on each host

  • first_device (int, default: 0) – first device to use on each host

Return type:

None

attach_generator_files(to_copy: List[str] | None = None, to_symlink: List[str] | None = None, to_configure: List[str] | None = None) None[source]#

Attach files to each model within the ensemble for generation

Attach files needed for the entity that, upon generation, will be located in the path of the entity.

During generation, files “to_copy” are copied into the path of the entity, and files “to_symlink” are symlinked into the path of the entity.

Files “to_configure” are text based model input files where parameters for the model are set. Note that only models support the “to_configure” field. These files must have fields tagged that correspond to the values the user would like to change. The tag is settable but defaults to a semicolon e.g. THERMO = ;10;

Parameters:
  • to_copy (Optional[List[str]], default: None) – files to copy

  • to_symlink (Optional[List[str]], default: None) – files to symlink

  • to_configure (Optional[List[str]], default: None) – input files with tagged parameters

Return type:

None
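To check which fields of a to_configure file are tagged, a short regular expression is enough. The sketch below is stand-alone and only locates tagged fields; it does not perform any substitution and is not SmartSim's writer. find_tagged_fields is a hypothetical helper, with the tag defaulting to a semicolon as described above.

```python
import re
from typing import List


def find_tagged_fields(text: str, tag: str = ";") -> List[str]:
    """Return the contents of every field enclosed in a pair of tags."""
    escaped = re.escape(tag)
    # Non-greedy match so that adjacent tagged fields are found separately.
    return re.findall(f"{escaped}(.+?){escaped}", text)
```

Running it over a candidate input file before attaching it is a quick way to confirm that the expected fields are tagged.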

property attached_files_table: str#

Return a plain-text table with information about files attached to models belonging to this ensemble.

Returns:

A table of all files attached to all models

property batch: bool#

Property indicating whether or not the entity sequence should be launched as a batch job

Returns:

True if entity sequence should be launched as a batch job, False if the members will be launched individually.

property db_models: Iterable[smartsim.entity.DBModel]#

Return an immutable collection of attached models

property db_scripts: Iterable[smartsim.entity.DBScript]#

Return an immutable collection of attached scripts

enable_key_prefixing() None[source]#

If called, each model within this ensemble will prefix its key with its own model name.

Return type:

None

property models: Collection[Model]#

An alias for a shallow copy of the entities attribute

print_attached_files() None[source]#

Print table of attached files to std out

Return type:

None

query_key_prefixing() bool[source]#

Inquire as to whether each model within the ensemble will prefix its keys

Return type:

bool

Returns:

True if all models have key prefixing enabled, False otherwise

register_incoming_entity(incoming_entity: smartsim.entity.entity.SmartSimEntity) None[source]#

Register future communication between entities.

Registers the named data sources that this entity has access to by storing the key_prefix associated with that entity

Only python clients can have multiple incoming connections

Parameters:

incoming_entity (SmartSimEntity) – The entity that data will be received from

Return type:

None

property type: str#

Return the name of the class

Machine Learning#

SmartSim includes built-in utilities for supporting TensorFlow, Keras, and PyTorch.

TensorFlow#

SmartSim includes built-in utilities for supporting TensorFlow and Keras in training and inference.

PyTorch#

SmartSim includes built-in utilities for supporting PyTorch in training and inference.

Slurm#