SlurmConnector

The Slurm connector offloads execution to High-Performance Computing (HPC) facilities orchestrated by the Slurm queue manager. It extends the QueueManagerConnector, which inherits from the ConnectorWrapper interface, so users can offload jobs to local or remote Slurm controllers through the stacked locations mechanism. The HPC facility is assumed to be constantly active, which reduces the deployment phase to deploying the inner connector (e.g., creating an SSHConnection pointing to an HPC login node).
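As a rough illustration of the stacked locations mechanism, the sketch below wraps a standalone SSH deployment with a Slurm deployment through the wraps directive. The deployment names, hostname, username, and key path are hypothetical placeholders, and both the placement of wraps next to config and the SSH property names (nodes, username, sshKey) are assumptions to be checked against the SSH connector reference.

```yaml
deployments:
  # Inner connector: opens the SSH connection to the HPC login node.
  # All values below are hypothetical placeholders.
  ssh-login:
    type: ssh
    config:
      nodes:
        - login.hpc.example.org
      username: hpc_user
      sshKey: /home/hpc_user/.ssh/id_rsa
  # Outer connector: submits jobs to Slurm through the wrapped SSH deployment.
  slurm-example:
    type: slurm
    config:
      services:
        example:
          partition: queue_name
    # Assumed placement: a sibling of `type` and `config`.
    wraps: ssh-login
```

With a layout like this, the SSH-related settings stay in the inner deployment, and the Slurm deployment only carries queue-related options.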

Warning

Note that in StreamFlow v0.1, the QueueManagerConnector directly inherited from the SSHConnector at the implementation level. Consequently, all the properties needed to open an SSH connection to the HPC login node (e.g., hostname, username, and sshKey) were defined directly in the QueueManagerConnector. This path is still supported by StreamFlow v0.2, but it is deprecated and will be removed in StreamFlow v0.3.

Interaction with the Slurm scheduler happens through a Bash script with #SBATCH directives. Users can pass the path of a custom script to the connector using the file attribute of the SlurmService configuration. This file is interpreted as a Jinja2 template and populated at runtime by the connector. Alternatively, users can pass Slurm options directly from YAML using the other options of a SlurmService object.

As an example, suppose you have a Slurm template script called sbatch.sh with the following content:

#!/bin/bash

#SBATCH --nodes=1
#SBATCH --partition=queue_name
#SBATCH --mem=1gb

{{streamflow_command}}

A Slurm deployment configuration which uses the sbatch.sh file to spawn jobs can be written as follows:

deployments:
  slurm-example:
    type: slurm
    config:
      services:
        example:
          file: sbatch.sh

Alternatively, the same behaviour can be recreated by directly passing options through the YAML configuration, as follows:

deployments:
  slurm-example:
    type: slurm
    config:
      services:
        example:
          nodes: 1
          partition: queue_name
          mem: 1gb

Because they are passed directly on the sbatch command line, the YAML options take priority over the file-based ones.
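To make the priority rule concrete, here is a minimal sketch that assumes a single service may combine the file attribute with inline options. It reuses the sbatch.sh template from the example above, which declares --mem=1gb; the 2gb value is a hypothetical override.

```yaml
deployments:
  slurm-example:
    type: slurm
    config:
      services:
        example:
          # Template containing `#SBATCH --mem=1gb`
          file: sbatch.sh
          # Inline option (hypothetical value): passed on the sbatch
          # command line, so it is expected to win over the template.
          mem: 2gb
```

Under that assumption, the job would be submitted with 2gb of memory, while the nodes and partition values would still come from the template.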

Warning

Note that the file property in the upper configuration level, i.e., outside a service definition, is still supported in StreamFlow v0.2, but it is deprecated and will be removed in StreamFlow v0.3.

The unit of binding is the entire HPC facility. In contrast, the scheduling unit is a single job placement in the Slurm queue. Users can limit the maximum number of concurrently placed jobs by setting the maxConcurrentJobs parameter.
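The following sketch shows where such a limit could be set. The maxConcurrentJobs and pollingInterval properties belong to the connector configuration, not to a service; the values 4 and 10 are arbitrary examples.

```yaml
deployments:
  slurm-example:
    type: slurm
    config:
      # Allow at most 4 jobs in the Slurm queue at the same time
      # (the documented default is 1).
      maxConcurrentJobs: 4
      # Check for job termination every 10 seconds
      # (the documented default is 5).
      pollingInterval: 10
      services:
        example:
          partition: queue_name
```

A higher maxConcurrentJobs value trades more parallelism for more queue pressure on the facility, so the right number depends on the site's scheduling policy.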

properties

checkHostKey

(Deprecated. Use the wraps directive to wrap a standalone SSH connector.) Perform a strict validation of the host SSH keys (and raise an exception if a key is not recognized as valid)

type

boolean

default

True

dataTransferConnection

(Deprecated. Use the wraps directive to wrap a standalone SSH connector.) Sometimes HPC clusters provide dedicated hostnames for large data transfers, which guarantee a higher efficiency for data movements

type

string

SSHConnection

file

(Deprecated. Use services.) Path to a file containing a Jinja2 template, describing how the StreamFlow command should be executed in the remote environment

type

string

hostname

(Deprecated. Use the wraps directive to wrap a standalone SSH connector.) Hostname of the HPC facility

type

string

maxConcurrentJobs

Maximum number of jobs concurrently scheduled for execution on the Queue Manager

type

integer

default

1

maxConcurrentSessions

(Deprecated. Use the wraps directive to wrap a standalone SSH connector.) Maximum number of concurrent sessions to open for a single SSH client connection

type

integer

default

10

maxConnections

(Deprecated. Use the wraps directive to wrap a standalone SSH connector.) Maximum number of concurrent connections to open for a single SSH node

type

integer

default

1

passwordFile

(Deprecated. Use the wraps directive to wrap a standalone SSH connector.) Path to a file containing the password to use for authentication

type

string

pollingInterval

Time interval (in seconds) between consecutive termination checks

type

integer

default

5

services

Map containing named configurations of Slurm submissions. Parameters can be either specified as #SBATCH directives in a file or directly in YAML format.

type

object

patternProperties

^[a-z][a-zA-Z0-9._-]*$

SlurmService

sshKey

(Deprecated. Use the wraps directive to wrap a standalone SSH connector.) Path to the SSH key needed to connect to the Slurm environment

type

string

sshKeyPassphraseFile

(Deprecated. Use the wraps directive to wrap a standalone SSH connector.) Path to a file containing the passphrase protecting the SSH key

type

string

transferBufferSize

(Deprecated. Use the wraps directive to wrap a standalone SSH connector.) Buffer size allocated for local and remote data transfers

type

integer

default

64kiB

tunnel

(Deprecated. Use the wraps directive to wrap a standalone SSH connector.) External SSH connection parameters for tunneling

type

SSHConnection

username

(Deprecated. Use the wraps directive to wrap a standalone SSH connector.) Username needed to connect with the SSH environment

type

string

SlurmService

This complex type represents a submission to the Slurm queue manager.

properties

account

Charge resources used by this job to specified account

type

string

acctgFreq

Define the job accounting and profiling sampling intervals in seconds

type

string

array

Submit a job array, multiple jobs to be executed with identical parameters

type

string

batch

Nodes can have features assigned to them by the Slurm administrator. Users can specify which of these features are required by their batch script using this option. The batch argument must be a subset of the job’s constraint argument

type

string

bb

Burst buffer specification. The form of the specification is system dependent

type

string

bbf

Path of file containing burst buffer specification. The form of the specification is system dependent

type

string

begin

Submit the batch script to the Slurm controller immediately, like normal, but tell the controller to defer the allocation of the job until the specified time

type

string

clusterConstraint

Specifies features that a federated cluster must have to have a sibling job submitted to it. Slurm will attempt to submit a sibling job to a cluster if it has at least one of the specified features. If the ! option is included, Slurm will attempt to submit a sibling job to a cluster that has none of the specified features

type

string

clusters

Clusters to issue commands to. Multiple cluster names may be comma separated. The job will be submitted to the one cluster providing the earliest expected job initiation time. The default value is the current cluster

type

string

constraint

Nodes can have features assigned to them by the Slurm administrator. Users can specify which of these features are required by their job using the constraint option

type

string

container

Absolute path to OCI container bundle

type

string

containerId

Unique name for OCI container

type

string

contiguous

If set, then the allocated nodes must form a contiguous set

type

boolean

coreSpec

Count of specialized cores per node reserved by the job for system operations and not used by the application. The application will not use these cores, but will be charged for their allocation

type

integer

coresPerSocket

Restrict node selection to nodes with at least the specified number of cores per socket

type

integer

cpuFreq

Request that job steps initiated by srun commands inside this sbatch script be run at some requested frequency if possible, on the CPUs selected for the step on the compute node(s)

type

string

cpusPerGpu

Advise Slurm that ensuing job steps will require ncpus processors per allocated GPU. Not compatible with the cpusPerTask option

type

integer

cpusPerTask

Advise the Slurm controller that ensuing job steps will require ncpus number of processors per task. Without this option, the controller will just try to allocate one processor per task

type

integer

deadline

Remove the job if no ending is possible before this deadline. Default is no deadline

type

string

delayBoot

Do not reboot nodes in order to satisfy this job’s feature specification if the job has been eligible to run for less than this time period. If the job has waited for less than the specified period, it will use only nodes which already have the specified features. The argument is in units of minutes

type

integer

distribution

Specify alternate distribution methods for remote processes. For job allocation, this sets environment variables that will be used by subsequent srun requests and also affects which cores will be selected for job allocation

type

string

exclude

Explicitly exclude certain nodes from the resources granted to the job

type

string

exclusive

The job allocation can not share nodes with other running jobs (or just other users with the user option or with the mcs option). If user/mcs are not specified (i.e. the job allocation can not share nodes with other running jobs), the job is allocated all CPUs and GRES on all nodes in the allocation, but is only allocated as much memory as it requested

anyOf

type

boolean

type

string

export

Identify which environment variables from the submission environment are propagated to the launched application

type

string

exportFile

If a number between 3 and OPEN_MAX is specified as the argument to this option, a readable file descriptor will be assumed (STDIN and STDOUT are not supported as valid arguments). Otherwise a filename is assumed. Export environment variables defined in filename or read from fd to the job’s execution environment

anyOf

type

integer

type

string

extraNodeInfo

Restrict node selection to nodes with at least the specified number of sockets, cores per socket and/or threads per core

type

string

file

Path to a file containing a Jinja2 template, describing how the StreamFlow command should be executed in the remote environment

type

string

getUserEnv

This option will tell sbatch to retrieve the login environment variables for the user specified in the uid option. Be aware that any environment variables already set in sbatch’s environment will take precedence over any environment variables in the user’s login environment. The optional timeout value is in seconds (default: 8)

anyOf

type

boolean

type

string

gid

Submit the job with group’s group access permissions. The gid option may be the group name or the numerical group ID

anyOf

type

integer

type

string

gpuBind

Bind tasks to specific GPUs. By default every spawned task can access every GPU allocated to the step

type

string

gpuFreq

Request that GPUs allocated to the job are configured with specific frequency values. This option can be used to independently configure the GPU and its memory frequencies

type

string

gpus

Specify the total number of GPUs required for the job. An optional GPU type specification can be supplied (e.g., volta:3)

type

string

gpusPerNode

Specify the number of GPUs required for the job on each node included in the job’s resource allocation. An optional GPU type specification can be supplied (e.g., volta:3)

type

string

gpusPerSocket

Specify the number of GPUs required for the job on each socket included in the job’s resource allocation. An optional GPU type specification can be supplied (e.g., volta:3)

type

string

gpusPerTask

Specify the number of GPUs required for the job on each task to be spawned in the job’s resource allocation. An optional GPU type specification can be supplied (e.g., volta:3)

type

string

gres

Specifies a comma-delimited list of generic consumable resources

type

string

gresFlags

Specify generic resource task binding options

type

string

hint

Bind tasks according to application hints. This option cannot be used in conjunction with ntasksPerCore, threadsPerCore, or extraNodeInfo

type

string

ignorePBS

Ignore all #PBS and #BSUB options specified in the batch script

type

boolean

jobName

Specify a name for the job allocation. The specified name will appear along with the job id number when querying running jobs on the system

type

string

licenses

Specification of licenses (or other resources available on all nodes of the cluster) which must be allocated to this job

type

string

mailType

Notify user by email when certain event types occur

type

string

mailUser

User to receive email notification of state changes as defined by mailType. The default value is the submitting user

type

string

mcsLabel

Used only when the mcs/group plugin is enabled. This parameter is a group among the groups of the user

type

string

mem

Specify the real memory required per node. Default units are megabytes

type

string

memBind

Bind tasks to memory. Used only when the task/affinity plugin is enabled and the NUMA memory functions are available

type

string

memPerCpu

Minimum memory required per usable allocated CPU. Default units are megabytes

type

string

memPerGpu

Minimum memory required per allocated GPU. Default units are megabytes

type

string

mincpus

Specify a minimum number of logical cpus/processors per node

type

integer

network

Specify information pertaining to the switch or network

type

string

nice

Run the job with an adjusted scheduling priority within Slurm. With no adjustment value the scheduling priority is decreased by 100. A negative nice value increases the priority, otherwise decreases it

type

integer

noKill

Do not automatically terminate a job if one of the nodes it has been allocated fails. The user will assume the responsibilities for fault-tolerance should a node fail. The job allocation will not be revoked so the user may launch new job steps on the remaining nodes in their allocation

type

boolean

noRequeue

Specifies that the batch job should never be requeued under any circumstances

type

boolean

nodefile

Much like nodelist, but the list of hosts is read from the named file

type

string

nodelist

Request a specific list of hosts. The job will contain all of these hosts and possibly additional hosts as needed to satisfy resource requirements

type

string

nodes

Request that a minimum of minnodes nodes be allocated to this job. A maximum node count may also be specified with maxnodes. If only one number is specified, this is used as both the minimum and maximum node count. Node count can be also specified as size_string. The size_string specification identifies what nodes values should be used

type

string

ntasks

This option advises the Slurm controller that job steps run within the allocation will launch a maximum of number tasks and to provide for sufficient resources

type

integer

ntasksPerCore

Request the maximum ntasks be invoked on each core

type

integer

ntasksPerGpu

Request that there are ntasks tasks invoked for every GPU

type

integer

ntasksPerNode

Request that ntasks be invoked on each node

type

integer

ntasksPerSocket

Request the maximum ntasks be invoked on each socket

type

integer

openMode

Open the output and error files using append or truncate mode as specified

type

string

overcommit

Overcommit resources. When applied to a job allocation (not including jobs requesting exclusive access to the nodes) the resources are allocated as if only one task per node is requested.

type

boolean

oversubscribe

The job allocation can over-subscribe resources with other running jobs. The resources to be over-subscribed can be nodes, sockets, cores, and/or hyperthreads depending upon configuration

type

boolean

partition

Request a specific partition for the resource allocation. If not specified, the default behavior is to allow the Slurm controller to select the default partition as designated by the system administrator

type

string

power

Comma separated list of power management plugin options

type

string

prefer

Nodes can have features assigned to them by the Slurm administrator. Users can specify which of these features are desired but not required by their job using the prefer option. This option operates independently from constraint and will override whatever is set there if possible

type

string

priority

Request a specific job priority. May be subject to configuration specific constraints

type

string

profile

Enables detailed data collection by the acct_gather_profile plugin

type

string

propagate

Allows users to specify which of the modifiable (soft) resource limits to propagate to the compute nodes and apply to their jobs

type

string

qos

Request a quality of service for the job

type

string

reboot

Force the allocated nodes to reboot before starting the job. This is only supported with some system configurations and will otherwise be silently ignored

type

boolean

requeue

Specifies that the batch job should be eligible for requeuing. The job may be requeued explicitly by a system administrator, after node failure, or upon preemption by a higher priority job

type

boolean

reservation

Allocate resources for the job from the named reservation. If the job can use more than one reservation, specify their names in a comma-separated list, and the one offering the earliest initiation will be used

type

string

signal

When a job is within sig_time seconds of its end time, send it the signal sig_num. Due to the resolution of event handling by Slurm, the signal may be sent up to 60 seconds earlier than specified

type

string

socketsPerNode

Restrict node selection to nodes with at least the specified number of sockets

type

integer

spreadJob

Spread the job allocation over as many nodes as possible and attempt to evenly distribute tasks across the allocated nodes

type

boolean

switches

When a tree topology is used, this defines the maximum count of leaf switches desired for the job allocation and optionally the maximum time to wait for that number of switches. If Slurm finds an allocation containing more switches than the count specified, the job remains pending until it either finds an allocation with desired switch count or the time limit expires

type

string

threadSpec

Count of specialized threads per node reserved by the job for system operations and not used by the application. The application will not use these threads, but will be charged for their allocation

type

integer

threadsPerCore

Restrict node selection to nodes with at least the specified number of threads per core. In task layout, use the specified maximum number of threads per core

type

integer

time

Set a limit on the total run time of the job allocation. If a timeout value is defined directly in the workflow specification, it will override this value

type

string

timeMin

Set a minimum time limit on the job allocation. If specified, the job may have its time limit lowered to a value no lower than timeMin if doing so permits the job to begin execution earlier than otherwise possible

type

string

tmp

Specify a minimum amount of temporary disk space per node. Default units are megabytes

type

integer

tresPerTask

Specifies a comma-delimited list of trackable resources required for the job on each task to be spawned in the job’s resource allocation

type

string

uid

Attempt to submit and/or run a job as user instead of the invoking user id. user may be the user name or numerical user ID

anyOf

type

integer

type

string

useMinNodes

If a range of node counts is given, prefer the smaller count

type

boolean

waitAllNodes

Controls when the execution of the command begins. By default the job will begin execution as soon as the allocation is made

type

boolean

wckey

Specify wckey to be used with job

type

string
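
As a closing illustration of how several SlurmService properties might be combined, the sketch below describes a single-node GPU submission. The service name, partition, and every value are hypothetical; only the property names come from the reference above.

```yaml
deployments:
  slurm-example:
    type: slurm
    config:
      services:
        # The service name must match ^[a-z][a-zA-Z0-9._-]*$
        gpu-job:
          jobName: train
          partition: gpu_queue
          # `nodes` is declared as a string, so it is quoted here.
          nodes: "1"
          ntasks: 1
          cpusPerTask: 8
          # Optional GPU type followed by the count.
          gpus: "volta:2"
          mem: 16gb
          # Quoted so the YAML parser does not reinterpret the colons.
          time: "02:00:00"
          mailType: END
```

Each key is expected to map to the corresponding sbatch option (for instance, cpusPerTask to --cpus-per-task), so the sbatch manual remains the reference for accepted value formats.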