SlurmConnector
The Slurm connector allows offloading execution to High-Performance Computing (HPC) facilities orchestrated by the Slurm queue manager. It extends the QueueManagerConnector, which inherits from the ConnectorWrapper interface, allowing users to offload jobs to local or remote Slurm controllers using the stacked locations mechanism. The HPC facility is assumed to be always active, so the deployment phase reduces to deploying the inner connector (e.g., creating an SSHConnection pointing to an HPC login node).
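For example, with the stacked locations mechanism a Slurm deployment can wrap a standalone SSH deployment pointing to the HPC login node. A minimal sketch (the hostname, username, and key path below are illustrative):

deployments:
  ssh-login:
    type: ssh
    config:
      nodes:
        - login.hpc.example.com   # illustrative login-node hostname
      username: user
      sshKey: /path/to/ssh/key
  slurm-example:
    type: slurm
    config:
      services:
        example:
          partition: queue_name
    wraps: ssh-login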
Warning
Note that in StreamFlow v0.1, the QueueManagerConnector directly inherited from the SSHConnector at the implementation level. Consequently, all the properties needed to open an SSH connection to the HPC login node (e.g., hostname, username, and sshKey) were defined directly in the QueueManagerConnector. This path is still supported by StreamFlow v0.2, but it is deprecated and will be removed in StreamFlow v0.3.
Interaction with the Slurm scheduler happens through a Bash script with #SBATCH directives. Users can pass the path of a custom script to the connector using the file attribute of the SlurmService configuration. This file is interpreted as a Jinja2 template and populated at runtime by the connector. Alternatively, users can pass Slurm options directly from YAML using the other options of a SlurmService object.
As an example, suppose you have a Slurm template script called sbatch.sh, with the following content:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --partition=queue_name
#SBATCH --mem=1gb
{{streamflow_command}}
A Slurm deployment configuration which uses the sbatch.sh file to spawn jobs can be written as follows:
deployments:
  slurm-example:
    type: slurm
    config:
      services:
        example:
          file: sbatch.sh
Alternatively, the same behaviour can be recreated by directly passing options through the YAML configuration, as follows:
deployments:
  slurm-example:
    type: slurm
    config:
      services:
        example:
          nodes: 1
          partition: queue_name
          mem: 1gb
Being passed directly to the sbatch command line, the YAML options have higher priority than the file-based ones.
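For instance, if a service references the sbatch.sh template above and also sets partition in YAML, the YAML value is the one used, since it is appended to the sbatch command line (the debug partition name below is illustrative):

deployments:
  slurm-example:
    type: slurm
    config:
      services:
        example:
          file: sbatch.sh   # template contains '#SBATCH --partition=queue_name'
          partition: debug  # passed on the sbatch command line, so it takes precedence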
Warning
Note that the file property at the upper configuration level, i.e., outside a service definition, is still supported in StreamFlow v0.2, but it is deprecated and will be removed in StreamFlow v0.3.
The unit of binding is the entire HPC facility. In contrast, the scheduling unit is a single job placement in the Slurm queue. Users can limit the maximum number of concurrently placed jobs by setting the maxConcurrentJobs parameter.
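As a sketch, the following configuration caps the connector at ten simultaneous Slurm submissions and binds a workflow step to the example service (the workflow, step, and file names are illustrative):

deployments:
  slurm-example:
    type: slurm
    config:
      maxConcurrentJobs: 10   # at most ten jobs placed in the queue at once
      services:
        example:
          file: sbatch.sh
workflows:
  example-workflow:
    type: cwl
    config:
      file: main.cwl
      settings: config.yml
    bindings:
      - step: /compile        # illustrative step name
        target:
          deployment: slurm-example
          service: example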
| Property | Type | Default | Description |
|----------|------|---------|-------------|
| checkHostKey | boolean | True | (Deprecated. Use the wraps directive to wrap a standalone SSH connector.) Perform a strict validation of the host SSH keys (and raise an exception if the key is not recognized as valid) |
| dataTransferConnection | string | | (Deprecated. Use the wraps directive to wrap a standalone SSH connector.) Sometimes HPC clusters provide dedicated hostnames for large data transfers, which guarantee a higher efficiency for data movements |
| file | string | | (Deprecated. Use services.) Path to a file containing a Jinja2 template, describing how the StreamFlow command should be executed in the remote environment |
| hostname | string | | (Deprecated. Use the wraps directive to wrap a standalone SSH connector.) Hostname of the HPC facility |
| maxConcurrentJobs | integer | 1 | Maximum number of jobs concurrently scheduled for execution on the Queue Manager |
| maxConcurrentSessions | integer | 10 | (Deprecated. Use the wraps directive to wrap a standalone SSH connector.) Maximum number of concurrent sessions to open for a single SSH client connection |
| maxConnections | integer | 1 | (Deprecated. Use the wraps directive to wrap a standalone SSH connector.) Maximum number of concurrent connections to open for a single SSH node |
| passwordFile | string | | (Deprecated. Use the wraps directive to wrap a standalone SSH connector.) Path to a file containing the password to use for authentication |
| pollingInterval | integer | 5 | Time interval (in seconds) between consecutive termination checks |
| services | object (property names matching `^[a-z][a-zA-Z0-9._-]*$` map to SlurmService objects) | | Map containing named configurations of Slurm submissions. Parameters can be specified either as #SBATCH directives in a file or directly in YAML format |
| sshKey | string | | (Deprecated. Use the wraps directive to wrap a standalone SSH connector.) Path to the SSH key needed to connect with the Slurm environment |
| sshKeyPassphraseFile | string | | (Deprecated. Use the wraps directive to wrap a standalone SSH connector.) Path to a file containing the passphrase protecting the SSH key |
| transferBufferSize | integer | 64kiB | (Deprecated. Use the wraps directive to wrap a standalone SSH connector.) Buffer size allocated for local and remote data transfers |
| tunnel | object | | (Deprecated. Use the wraps directive to wrap a standalone SSH connector.) External SSH connection parameters for tunneling |
| username | string | | (Deprecated. Use the wraps directive to wrap a standalone SSH connector.) Username needed to connect with the SSH environment |
SlurmService
This complex type represents a submission to the Slurm queue manager.
| Property | Type | Description |
|----------|------|-------------|
| account | string | Charge resources used by this job to the specified account |
| acctgFreq | string | Define the job accounting and profiling sampling intervals in seconds |
| array | string | Submit a job array, multiple jobs to be executed with identical parameters |
| batch | string | Nodes can have features assigned to them by the Slurm administrator. Users can specify which of these features are required by their batch script using this option. The batch argument must be a subset of the job's constraint argument |
| bb | string | Burst buffer specification. The form of the specification is system dependent |
| bbf | string | Path of a file containing a burst buffer specification. The form of the specification is system dependent |
| begin | string | Submit the batch script to the Slurm controller immediately, like normal, but tell the controller to defer the allocation of the job until the specified time |
| clusterConstraint | string | Specifies features that a federated cluster must have to have a sibling job submitted to it. Slurm will attempt to submit a sibling job to a cluster if it has at least one of the specified features. If the ! option is included, Slurm will attempt to submit a sibling job to a cluster that has none of the specified features |
| clusters | string | Clusters to issue commands to. Multiple cluster names may be comma separated. The job will be submitted to the one cluster providing the earliest expected job initiation time. The default value is the current cluster |
| constraint | string | Nodes can have features assigned to them by the Slurm administrator. Users can specify which of these features are required by their job using the constraint option |
| container | string | Absolute path to OCI container bundle |
| containerId | string | Unique name for OCI container |
| contiguous | boolean | If set, then the allocated nodes must form a contiguous set |
| coreSpec | integer | Count of specialized cores per node reserved by the job for system operations and not used by the application. The application will not use these cores, but will be charged for their allocation |
| coresPerSocket | integer | Restrict node selection to nodes with at least the specified number of cores per socket |
| cpuFreq | string | Request that job steps initiated by srun commands inside this sbatch script be run at some requested frequency if possible, on the CPUs selected for the step on the compute node(s) |
| cpusPerGpu | integer | Advise Slurm that ensuing job steps will require ncpus processors per allocated GPU. Not compatible with the cpusPerTask option |
| cpusPerTask | integer | Advise the Slurm controller that ensuing job steps will require ncpus processors per task. Without this option, the controller will just try to allocate one processor per task |
| deadline | string | Remove the job if no ending is possible before this deadline. Default is no deadline |
| delayBoot | integer | Do not reboot nodes in order to satisfy this job's feature specification if the job has been eligible to run for less than this time period. If the job has waited for less than the specified period, it will use only nodes which already have the specified features. The argument is in units of minutes |
| distribution | string | Specify alternate distribution methods for remote processes. For job allocation, this sets environment variables that will be used by subsequent srun requests and also affects which cores will be selected for job allocation |
| exclude | string | Explicitly exclude certain nodes from the resources granted to the job |
| exclusive | boolean or string | The job allocation can not share nodes with other running jobs (or just other users with the user option or with the mcs option). If user/mcs are not specified (i.e. the job allocation can not share nodes with other running jobs), the job is allocated all CPUs and GRES on all nodes in the allocation, but is only allocated as much memory as it requested |
| export | string | Identify which environment variables from the submission environment are propagated to the launched application |
| exportFile | integer or string | If a number between 3 and OPEN_MAX is specified as the argument to this option, a readable file descriptor will be assumed (STDIN and STDOUT are not supported as valid arguments). Otherwise a filename is assumed. Export environment variables defined in filename or read from fd to the job's execution environment |
| extraNodeInfo | string | Restrict node selection to nodes with at least the specified number of sockets, cores per socket and/or threads per core |
| file | string | Path to a file containing a Jinja2 template, describing how the StreamFlow command should be executed in the remote environment |
| getUserEnv | boolean or string | This option will tell sbatch to retrieve the login environment variables for the user specified in the uid option. Be aware that any environment variables already set in sbatch's environment will take precedence over any environment variables in the user's login environment. The optional timeout value is in seconds (default: 8) |
| gid | integer or string | Submit the job with the specified group's access permissions. The gid option may be the group name or the numerical group ID |
| gpuBind | string | Bind tasks to specific GPUs. By default every spawned task can access every GPU allocated to the step |
| gpuFreq | string | Request that GPUs allocated to the job are configured with specific frequency values. This option can be used to independently configure the GPU and its memory frequencies |
| gpus | string | Specify the total number of GPUs required for the job. An optional GPU type specification can be supplied (e.g., volta:3) |
| gpusPerNode | string | Specify the number of GPUs required for the job on each node included in the job's resource allocation. An optional GPU type specification can be supplied (e.g., volta:3) |
| gpusPerSocket | string | Specify the number of GPUs required for the job on each socket included in the job's resource allocation. An optional GPU type specification can be supplied (e.g., volta:3) |
| gpusPerTask | string | Specify the number of GPUs required for the job on each task to be spawned in the job's resource allocation. An optional GPU type specification can be supplied (e.g., volta:3) |
| gres | string | Specifies a comma-delimited list of generic consumable resources |
| gresFlags | string | Specify generic resource task binding options |
| hint | string | Bind tasks according to application hints. This option cannot be used in conjunction with ntasksPerCore, threadsPerCore, or extraNodeInfo |
| ignorePBS | boolean | Ignore all #PBS and #BSUB options specified in the batch script |
| jobName | string | Specify a name for the job allocation. The specified name will appear along with the job id number when querying running jobs on the system |
| licenses | string | Specification of licenses (or other resources available on all nodes of the cluster) which must be allocated to this job |
| mailType | string | Notify user by email when certain event types occur |
| mailUser | string | User to receive email notification of state changes as defined by mailType. The default value is the submitting user |
| mcsLabel | string | Used only when the mcs/group plugin is enabled. This parameter is a group among the groups of the user |
| mem | string | Specify the real memory required per node. Default units are megabytes |
| memBind | string | Bind tasks to memory. Used only when the task/affinity plugin is enabled and the NUMA memory functions are available |
| memPerCpu | string | Minimum memory required per usable allocated CPU. Default units are megabytes |
| memPerGpu | string | Minimum memory required per allocated GPU. Default units are megabytes |
| mincpus | integer | Specify a minimum number of logical cpus/processors per node |
| network | string | Specify information pertaining to the switch or network |
| nice | integer | Run the job with an adjusted scheduling priority within Slurm. With no adjustment value the scheduling priority is decreased by 100. A negative nice value increases the priority, otherwise it is decreased |
| noKill | boolean | Do not automatically terminate a job if one of the nodes it has been allocated fails. The user will assume the responsibilities for fault tolerance should a node fail. The job allocation will not be revoked, so the user may launch new job steps on the remaining nodes in their allocation |
| noRequeue | boolean | Specifies that the batch job should never be requeued under any circumstances |
| nodefile | string | Much like nodelist, but the list is contained in a file with the given name |
| nodelist | string | Request a specific list of hosts. The job will contain all of these hosts and possibly additional hosts as needed to satisfy resource requirements |
| nodes | string | Request that a minimum of minnodes nodes be allocated to this job. A maximum node count may also be specified with maxnodes. If only one number is specified, this is used as both the minimum and maximum node count. Node count can also be specified as size_string, which identifies what node values should be used |
| ntasks | integer | This option advises the Slurm controller that job steps run within the allocation will launch a maximum of number tasks and to provide for sufficient resources |
| ntasksPerCore | integer | Request the maximum ntasks be invoked on each core |
| ntasksPerGpu | integer | Request that there are ntasks tasks invoked for every GPU |
| ntasksPerNode | integer | Request that ntasks be invoked on each node |
| ntasksPerSocket | integer | Request the maximum ntasks be invoked on each socket |
| openMode | string | Open the output and error files using append or truncate mode as specified |
| overcommit | boolean | Overcommit resources. When applied to a job allocation (not including jobs requesting exclusive access to the nodes) the resources are allocated as if only one task per node is requested |
| oversubscribe | boolean | The job allocation can over-subscribe resources with other running jobs. The resources to be over-subscribed can be nodes, sockets, cores, and/or hyperthreads depending upon configuration |
| partition | string | Request a specific partition for the resource allocation. If not specified, the default behavior is to allow the Slurm controller to select the default partition as designated by the system administrator |
| power | string | Comma-separated list of power management plugin options |
| prefer | string | Nodes can have features assigned to them by the Slurm administrator. Users can specify which of these features are desired but not required by their job using the prefer option. This option operates independently from constraint and will override whatever is set there if possible |
| priority | string | Request a specific job priority. May be subject to configuration specific constraints |
| profile | string | Enables detailed data collection by the acct_gather_profile plugin |
| propagate | string | Allows users to specify which of the modifiable (soft) resource limits to propagate to the compute nodes and apply to their jobs |
| qos | string | Request a quality of service for the job |
| reboot | boolean | Force the allocated nodes to reboot before starting the job. This is only supported with some system configurations and will otherwise be silently ignored |
| requeue | boolean | Specifies that the batch job should be eligible for requeuing. The job may be requeued explicitly by a system administrator, after node failure, or upon preemption by a higher priority job |
| reservation | string | Allocate resources for the job from the named reservation. If the job can use more than one reservation, specify their names in a comma-separated list and the one offering the earliest initiation will be used |
| signal | string | When a job is within sig_time seconds of its end time, send it the signal sig_num. Due to the resolution of event handling by Slurm, the signal may be sent up to 60 seconds earlier than specified |
| socketsPerNode | integer | Restrict node selection to nodes with at least the specified number of sockets |
| spreadJob | boolean | Spread the job allocation over as many nodes as possible and attempt to evenly distribute tasks across the allocated nodes |
| switches | string | When a tree topology is used, this defines the maximum count of leaf switches desired for the job allocation and, optionally, the maximum time to wait for that number of switches. If Slurm finds an allocation containing more switches than the count specified, the job remains pending until it either finds an allocation with the desired switch count or the time limit expires |
| threadSpec | integer | Count of specialized threads per node reserved by the job for system operations and not used by the application. The application will not use these threads, but will be charged for their allocation |
| threadsPerCore | integer | Restrict node selection to nodes with at least the specified number of threads per core. In task layout, use the specified maximum number of threads per core |
| time | string | Set a limit on the total run time of the job allocation. If a timeout value is defined directly in the workflow specification, it will override this value |
| timeMin | string | Set a minimum time limit on the job allocation. If specified, the job may have its time limit lowered to a value no lower than timeMin if doing so permits the job to begin execution earlier than otherwise possible |
| tmp | integer | Specify a minimum amount of temporary disk space per node. Default units are megabytes |
| tresPerTask | string | Specifies a comma-delimited list of trackable resources required for the job on each task to be spawned in the job's resource allocation |
| uid | integer or string | Attempt to submit and/or run a job as the specified user instead of the invoking user id. The user may be the user name or numerical user ID |
| useMinNodes | boolean | If a range of node counts is given, prefer the smaller count |
| waitAllNodes | boolean | Controls when the execution of the command begins. By default the job will begin execution as soon as the allocation is made |
| wckey | string | Specify wckey to be used with job |
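Putting several of these options together, a hypothetical GPU service could request nodes, tasks, memory, and a wall-time limit as follows (all names and values are illustrative):

deployments:
  slurm-example:
    type: slurm
    config:
      services:
        gpu-job:
          partition: gpu_queue     # illustrative partition name
          nodes: "1"
          ntasksPerNode: 4
          cpusPerTask: 8
          gpusPerNode: "volta:2"   # optional GPU type specification
          mem: 32gb
          time: "02:00:00"         # quoted so YAML does not parse it as a sexagesimal number
          mailType: END,FAIL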