FluxConnector

The Flux Framework connector allows running jobs on a cluster managed by Flux Framework in a High Performance Computing (HPC) context. Although Flux can also run in a local testing container or a cloud environment and offers a Python SDK, to match the design of the other HPC connectors we follow suit and inherit from the QueueManagerConnector. In this way, users can offload jobs to local or remote Flux controllers using the stacked locations mechanism. The HPC facility is assumed to be always active, reducing the deployment phase to deploying the inner connector (e.g., creating an SSHConnection pointing to an HPC login node).

Warning

Note that in StreamFlow v0.1, the QueueManagerConnector directly inherited from the SSHConnector at the implementation level. Consequently, all the properties needed to open an SSH connection to the HPC login node (e.g., hostname, username, and sshKey) were defined directly in the QueueManagerConnector. This path is still supported by StreamFlow v0.2, but it is deprecated and will be removed in StreamFlow v0.3.
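For instance, the recommended v0.2 pattern wraps a standalone SSH deployment through the wraps directive. The sketch below is illustrative: hostnames, paths, and deployment names are placeholders, and the exact SSH connector properties may differ in your setup.

```yaml
deployments:
  hpc-ssh:
    type: ssh
    config:
      nodes:
        - login.hpc.example.com
      username: user
      sshKey: /path/to/ssh/key
  flux-example:
    type: flux
    config:
      services:
        example:
          nodes: 1
    wraps: hpc-ssh
```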

Interaction with the Flux scheduler happens through a Bash script with #flux directives. Users can pass the path of a custom script to the connector using the file attribute of the FluxService configuration. This file is interpreted as a Jinja2 template and populated at runtime by the connector. Alternatively, users can pass Flux options directly from YAML using the other options of a FluxService object.

As an example, suppose you have a Flux template script called batch.sh, with the following content:

#!/bin/bash

#flux --nodes=1
#flux --queue=queue_name

{{streamflow_command}}

A Flux deployment configuration which uses the batch.sh file to spawn jobs can be written as follows:

deployments:
  flux-example:
    type: flux
    config:
      services:
        example:
          file: batch.sh

Alternatively, the same behaviour can be recreated by directly passing options through the YAML configuration, as follows:

deployments:
  flux-example:
    type: flux
    config:
      services:
        example:
          nodes: 1
          queue: queue_name

Since they are passed directly on the flux batch command line, the YAML options take priority over the file-based ones.
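The two mechanisms can also be combined. In the hypothetical configuration below, the batch.sh template provides the queue through its #flux directives, while the number of nodes comes from YAML; if both defined the same option, the YAML value would win:

```yaml
deployments:
  flux-example:
    type: flux
    config:
      services:
        example:
          file: batch.sh
          nodes: 2
```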

Warning

Note that the file property at the top configuration level, i.e., outside a service definition, is still supported in StreamFlow v0.2, but it is deprecated and will be removed in StreamFlow v0.3.

For a quick demo or tutorial, see our example workflow.

properties

checkHostKey

(Deprecated. Use the wraps directive to wrap a standalone SSH connector.) Perform a strict validation of the host SSH keys (and raise an exception if a key is not recognized as valid)

type

boolean

default

True

dataTransferConnection

(Deprecated. Use the wraps directive to wrap a standalone SSH connector.) Some HPC clusters provide dedicated hostnames for large data transfers, which guarantee higher efficiency for data movements

type

string

SSHConnection

file

(Deprecated. Use services.) Path to a file containing a Jinja2 template, describing how the StreamFlow command should be executed in the remote environment

type

string

hostname

(Deprecated. Use the wraps directive to wrap a standalone SSH connector.) Hostname of the HPC facility

type

string

maxConcurrentJobs

Maximum number of jobs concurrently scheduled for execution on the Queue Manager

type

integer

default

1

maxConcurrentSessions

(Deprecated. Use the wraps directive to wrap a standalone SSH connector.) Maximum number of concurrent sessions to open for a single SSH client connection

type

integer

default

10

maxConnections

(Deprecated. Use the wraps directive to wrap a standalone SSH connector.) Maximum number of concurrent connections to open for a single SSH node

type

integer

default

1

passwordFile

(Deprecated. Use the wraps directive to wrap a standalone SSH connector.) Path to a file containing the password to use for authentication

type

string

pollingInterval

Time interval (in seconds) between consecutive termination checks

type

integer

default

5

services

Map containing named configurations of Flux submissions. Parameters can be specified either as #flux directives in a file or directly in YAML format.

type

object

patternProperties

^[a-z][a-zA-Z0-9._-]*$

FluxService
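Since services is a map, a single deployment can expose multiple named submission profiles. The sketch below is illustrative (service names, queues, and node counts are placeholders); each workflow step can then be bound to the appropriate service in the binding configuration.

```yaml
deployments:
  flux-example:
    type: flux
    config:
      services:
        small:
          nodes: 1
          queue: debug
        large:
          nodes: 16
          queue: batch
```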

sshKey

(Deprecated. Use the wraps directive to wrap a standalone SSH connector.) Path to the SSH key needed to connect to the Flux environment

type

string

sshKeyPassphraseFile

(Deprecated. Use the wraps directive to wrap a standalone SSH connector.) Path to a file containing the passphrase protecting the SSH key

type

string

transferBufferSize

(Deprecated. Use the wraps directive to wrap a standalone SSH connector.) Buffer size allocated for local and remote data transfers

type

integer

default

64kiB

tunnel

(Deprecated. Use the wraps directive to wrap a standalone SSH connector.) External SSH connection parameters for tunneling

type

SSHConnection

username

(Deprecated. Use the wraps directive to wrap a standalone SSH connector.) Username needed to connect to the SSH environment

type

string

FluxService

This complex type represents a submission to the Flux queue manager.
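As an illustration of how these options could map onto the flux batch command line, here is a hypothetical sketch. The kebab-case flag names and the camelCase-to-flag translation are assumptions based on the flux batch CLI conventions, not the connector's actual implementation.

```python
# Hypothetical sketch: translating FluxService options into flux batch
# command-line flags. The mapping below is an assumption for illustration;
# the real connector's translation logic may differ.
import re


def to_flag(name: str) -> str:
    # Convert a camelCase option name (e.g. jobName) to a kebab-case flag
    return "--" + re.sub(r"(?<!^)(?=[A-Z])", "-", name).lower()


def build_flags(service: dict) -> list:
    flags = []
    for key, value in service.items():
        if key == "file":
            continue  # the template path is not a CLI flag
        flag = to_flag(key)
        if isinstance(value, bool):
            if value:
                flags.append(flag)  # boolean options become bare flags
        else:
            flags.append(f"{flag}={value}")
    return flags


print(build_flags({"nodes": 1, "queue": "queue_name", "exclusive": True}))
# → ['--nodes=1', '--queue=queue_name', '--exclusive']
```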

properties

beginTime

Convenience option for setting a begin-time dependency for a job. The job is guaranteed to start after the specified date and time

type

string

brokerOpts

For batch jobs, pass specified options to the Flux brokers of the new instance

type

string[]

cores

Set the total number of cores

type

integer

coresPerSlot

Set the number of cores to assign to each slot

type

integer

coresPerTask

Set the number of cores to assign to each task

type

integer

env

Control how environment variables are exported

type

string[]

envFile

Read a set of environment rules from a file

type

string[]

envRemove

Remove all environment variables matching the pattern from the current generated environment

type

string[]

exclusive

Indicate to the scheduler that nodes should be exclusively allocated to this job

type

boolean

file

Path to a file containing a Jinja2 template, describing how the StreamFlow command should be executed in the remote environment

type

string

flags

Set comma separated list of job submission flags

type

string

gpusPerNode

Request a specific number of GPUs per node

type

integer

gpusPerSlot

Set the number of GPU devices to assign to each slot

type

integer

gpusPerTask

Set the number of GPU devices to assign to each task

type

integer

jobName

Set an alternate job name for the job

type

string

labelIO

Add task rank prefixes to each line of output

type

boolean

nodes

Set the number of nodes to assign to the job

type

integer

nslots

Set the number of slots requested

type

integer

ntasks

Set the number of tasks to launch

type

integer

queue

Submit a job to a specific named queue

type

string

requires

Specify a set of allowable properties and other attributes to consider when matching resources for a job

type

string

rlimit

Control how process resource limits are propagated

type

string[]

setattr

Set jobspec attribute. Keys may include periods to denote hierarchy

type

object

setopt

Set shell option. Keys may include periods to denote hierarchy

type

object

taskmap

Choose an alternate method for mapping job task IDs to nodes of the job

type

string

tasksPerCore

Force a number of tasks per core

type

integer

tasksPerNode

Set the number of tasks per node to run

type

integer

timeLimit

Time limit in minutes when no units provided, otherwise in Flux standard duration (e.g., 30s, 2d, 1.5h). If a timeout value is defined directly in the workflow specification, it will override this value

type

string

unbuffered

Disable buffering of standard input and output as much as practical

type

boolean

urgency

Specify job urgency, which affects queue order. Numerically higher urgency jobs are considered by the scheduler first

type

integer