FluxConnector
The Flux Framework connector allows running jobs on a cluster managed by Flux Framework in a High Performance Computing context. Although Flux can also run in a local testing container or a cloud environment and provides a Python SDK, to match the design here, we follow suit and inherit from the QueueManagerConnector. In this way, users can offload jobs to local or remote Flux instances using the stacked locations mechanism. The HPC facility is supposed to be always active, reducing the deployment phase to deploying the inner connector (e.g., creating an SSHConnection pointing to an HPC login node).
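As a sketch of the stacked locations mechanism, the following configuration wraps a Flux deployment around a standalone SSH deployment pointing to an HPC login node. The hostname, username, key path, and deployment names are placeholders:

```yaml
deployments:
  # Inner connector: an SSH connection to the HPC login node
  hpc-ssh:
    type: ssh
    config:
      nodes:
        - login.hpc.example.com   # placeholder hostname
      username: myuser            # placeholder username
      sshKey: /home/myuser/.ssh/id_rsa
  # Outer connector: Flux jobs offloaded through the SSH connection
  flux-example:
    type: flux
    config:
      services:
        example:
          nodes: 1
    wraps: hpc-ssh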
Warning
Note that in StreamFlow v0.1, the QueueManagerConnector directly inherited from the SSHConnector at the implementation level. Consequently, all the properties needed to open an SSH connection to the HPC login node (e.g., hostname, username, and sshKey) were defined directly in the QueueManagerConnector. This path is still supported in StreamFlow v0.2, but it is deprecated and will be removed in StreamFlow v0.3.
Interaction with the Flux scheduler happens through a Bash script with #flux: directives. Users can pass the path of a custom script to the connector using the file attribute of the FluxService configuration. This file is interpreted as a Jinja2 template and populated at runtime by the connector. Alternatively, users can pass Flux options directly from YAML using the other attributes of a FluxService object.
As an example, suppose you have a Flux template script called batch.sh with the following content:
#!/bin/bash
#flux: --nodes=1
#flux: --queue=queue_name
{{streamflow_command}}
A Flux deployment configuration which uses the batch.sh file to spawn jobs can be written as follows:
deployments:
  flux-example:
    type: flux
    config:
      services:
        example:
          file: batch.sh
Alternatively, the same behaviour can be obtained by passing options directly through the YAML configuration, as follows:
deployments:
  flux-example:
    type: flux
    config:
      services:
        example:
          nodes: 1
          queue: queue_name
Since they are passed directly to the flux batch command line, the YAML options take priority over the file-based ones.
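For instance, both mechanisms can be combined in a single service: the queue set in YAML below would override the --queue directive inside batch.sh. The queue name is a placeholder:

```yaml
deployments:
  flux-example:
    type: flux
    config:
      services:
        example:
          file: batch.sh
          queue: debug   # placeholder; overrides the #flux: --queue directive in batch.sh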
Warning
Note that the file property at the top configuration level, i.e., outside a service definition, is still supported in StreamFlow v0.2, but it is deprecated and will be removed in StreamFlow v0.3.
For a quick demo or tutorial, see our example workflow.
| Property | Type | Default | Description |
|----------|------|---------|-------------|
| checkHostKey | boolean | True | (Deprecated. Use the wraps directive to wrap a standalone SSH connector.) Perform a strict validation of the host SSH keys (and raise an exception if a key is not recognized as valid) |
| dataTransferConnection | string | | (Deprecated. Use the wraps directive to wrap a standalone SSH connector.) Sometimes HPC clusters provide dedicated hostnames for large data transfers, which guarantee a higher efficiency for data movements |
| file | string | | (Deprecated. Use services.) Path to a file containing a Jinja2 template, describing how the StreamFlow command should be executed in the remote environment |
| hostname | string | | (Deprecated. Use the wraps directive to wrap a standalone SSH connector.) Hostname of the HPC facility |
| maxConcurrentJobs | integer | 1 | Maximum number of jobs concurrently scheduled for execution on the Queue Manager |
| maxConcurrentSessions | integer | 10 | (Deprecated. Use the wraps directive to wrap a standalone SSH connector.) Maximum number of concurrent sessions to open for a single SSH client connection |
| maxConnections | integer | 1 | (Deprecated. Use the wraps directive to wrap a standalone SSH connector.) Maximum number of concurrent connections to open for a single SSH node |
| passwordFile | string | | (Deprecated. Use the wraps directive to wrap a standalone SSH connector.) Path to a file containing the password to use for authentication |
| pollingInterval | integer | 5 | Time interval (in seconds) between consecutive termination checks |
| services | object | | Map containing named configurations of Flux submissions. Parameters can be either specified as #flux: directives in a file or directly in YAML format. Keys must match the pattern ^[a-z][a-zA-Z0-9._-]*$, and each value is a FluxService object |
| sshKey | string | | (Deprecated. Use the wraps directive to wrap a standalone SSH connector.) Path to the SSH key needed to connect with the SSH environment |
| sshKeyPassphraseFile | string | | (Deprecated. Use the wraps directive to wrap a standalone SSH connector.) Path to a file containing the passphrase protecting the SSH key |
| transferBufferSize | integer | 64kiB | (Deprecated. Use the wraps directive to wrap a standalone SSH connector.) Buffer size allocated for local and remote data transfers |
| tunnel | object | | (Deprecated. Use the wraps directive to wrap a standalone SSH connector.) External SSH connection parameters for tunneling |
| username | string | | (Deprecated. Use the wraps directive to wrap a standalone SSH connector.) Username needed to connect with the SSH environment |
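To illustrate the non-deprecated connector-level options, a deployment might tune job throughput and polling as follows. The values and the service contents are illustrative placeholders:

```yaml
deployments:
  flux-example:
    type: flux
    config:
      maxConcurrentJobs: 10   # allow up to 10 jobs in the queue at once
      pollingInterval: 30     # check for job termination every 30 seconds
      services:
        example:
          nodes: 1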
FluxService
This complex type represents a submission to the Flux queue manager.
| Property | Type | Description |
|----------|------|-------------|
| beginTime | string | Convenience option for setting a begin-time dependency for a job. The job is guaranteed to start after the specified date and time |
| brokerOpts | string[] | For batch jobs, pass the specified options to the Flux brokers of the new instance |
| cores | integer | Set the total number of cores |
| coresPerSlot | integer | Set the number of cores to assign to each slot |
| coresPerTask | integer | Set the number of cores to assign to each task |
| env | string[] | Control how environment variables are exported |
| envFile | string[] | Read a set of environment rules from a file |
| envRemove | string[] | Remove all environment variables matching the pattern from the current generated environment |
| exclusive | boolean | Indicate to the scheduler that nodes should be exclusively allocated to this job |
| file | string | Path to a file containing a Jinja2 template, describing how the StreamFlow command should be executed in the remote environment |
| flags | string | Set a comma-separated list of job submission flags |
| gpusPerNode | integer | Request a specific number of GPUs per node |
| gpusPerSlot | integer | Set the number of GPU devices to assign to each slot |
| gpusPerTask | integer | Set the number of GPU devices to assign to each task |
| jobName | string | Set an alternate job name for the job |
| labelIO | boolean | Add task rank prefixes to each line of output |
| nodes | integer | Set the number of nodes to assign to the job |
| nslots | integer | Set the number of slots requested |
| ntasks | integer | Set the number of tasks to launch |
| queue | string | Submit a job to a specific named queue |
| requires | string | Specify a set of allowable properties and other attributes to consider when matching resources for a job |
| rlimit | string[] | Control how process resource limits are propagated |
| setattr | object | Set a jobspec attribute. Keys may include periods to denote hierarchy |
| setopt | object | Set a shell option. Keys may include periods to denote hierarchy |
| taskmap | string | Choose an alternate method for mapping job task IDs to nodes of the job |
| tasksPerCore | integer | Force a number of tasks per core |
| tasksPerNode | integer | Set the number of tasks per node to run |
| timeLimit | string | Time limit in minutes when no units are provided, otherwise in Flux standard duration (e.g., 30s, 2d, 1.5h). If a timeout value is defined directly in the workflow specification, it overrides this value |
| unbuffered | boolean | Disable buffering of standard input and output as much as practical |
| urgency | integer | Specify job urgency, which affects queue order. Jobs with numerically higher urgency are considered by the scheduler first |
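Putting several of these options together, a hypothetical service for a multi-node GPU job might look like the following. The service name, queue name, and resource sizes are placeholders, not recommendations:

```yaml
deployments:
  flux-example:
    type: flux
    config:
      services:
        gpu-job:
          nodes: 2            # span two nodes
          ntasks: 8           # launch eight tasks in total
          coresPerTask: 4     # four cores for each task
          gpusPerNode: 2      # request two GPUs on each node
          queue: gpu_queue    # placeholder queue name
          timeLimit: 2h       # Flux standard duration
          exclusive: true     # do not share the allocated nodes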