Operations

As shown in the architecture section, you need three different components to run a hybrid workflow with StreamFlow:

  • A workflow description, i.e. a representation of your application as a graph.

  • One or more deployment descriptions, i.e. infrastructure-as-code representations of your execution environments.

  • A StreamFlow file to bind each step of your workflow with the most suitable execution environment.

StreamFlow automatically takes care of all the secondary aspects, such as checkpointing, fault tolerance, and data movements.

Write your workflow

StreamFlow relies on the Common Workflow Language (CWL) standard to describe workflows. In particular, it supports version v1.2 of the standard, which introduces conditional execution of workflow steps.

The reader is referred to the official CWL documentation to learn how the workflow description language works, as StreamFlow does not introduce any modification to the original specification.

Note

StreamFlow supports all the features required for CWL standard conformance and nearly all optional features. For a complete overview of the CWL conformance status, look here.

The following snippet contains a simple example of a CWL workflow, which extracts a Java source file from a tar archive and compiles it.

cwlVersion: v1.2
class: Workflow
inputs:
  tarball: File
  name_of_file_to_extract: string

outputs:
  compiled_class:
    type: File
    outputSource: compile/classfile

steps:
  untar:
    run:
      class: CommandLineTool
      baseCommand: [tar, --extract]
      inputs:
        tarfile:
          type: File
          inputBinding:
            prefix: --file
        extractfile: string
      outputs:
        extracted_file:
          type: File
          outputBinding:
            glob: $(inputs.extractfile)
    in:
      tarfile: tarball
      extractfile: name_of_file_to_extract
    out: [extracted_file]

  compile:
    run:
      class: CommandLineTool
      baseCommand: javac
      arguments: ["-d", $(runtime.outdir)]
      inputs:
        src:
          type: File
          inputBinding:
            position: 1
      outputs:
        classfile:
          type: File
          outputBinding:
            glob: "*.class"
    in:
      src: untar/extracted_file
    out: [classfile]
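Before binding this workflow to remote environments, it can be useful to sanity-check it locally with any standalone CWL runner, passing the workflow description together with an inputs file (the file names below match the example StreamFlow configuration reported later in this section):

cwl-runner main.cwl config.yml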

Import your environment

StreamFlow relies on external specifications and tools to describe and orchestrate remote execution environments. As an example, a Kubernetes-based deployment can be described in Helm, while a resource reservation request on an HPC facility can be specified with either a Slurm or a PBS file.

This approach allows users to stick with the technologies they already know, or at least with production-grade tools that are solid, maintained, and well documented. Moreover, it adheres to the infrastructure-as-code principle, making execution environments easily portable and self-documenting.
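As an example, a Slurm deployment description is just an ordinary batch script, like the generic sketch below; how such a script is referenced from StreamFlow depends on the SlurmConnector configuration:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=01:00:00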

The lifecycle management of each StreamFlow model is delegated to a specific implementation of the Connector interface. The connectors provided by default in the StreamFlow codebase are reported in the table below, but users can extend this list by simply creating their own implementation of the Connector interface.

Name                  Class
docker                streamflow.deployment.connector.docker.DockerConnector
docker-compose        streamflow.deployment.connector.docker.DockerComposeConnector
helm                  streamflow.deployment.connector.kubernetes.Helm3Connector
helm2 (Deprecated)    streamflow.deployment.connector.kubernetes.Helm2Connector
helm3                 streamflow.deployment.connector.kubernetes.Helm3Connector
occam                 streamflow.deployment.connector.occam.OccamConnector
pbs                   streamflow.deployment.connector.queue_manager.PBSConnector
singularity           streamflow.deployment.connector.singularity.SingularityConnector
slurm                 streamflow.deployment.connector.queue_manager.SlurmConnector
ssh                   streamflow.deployment.connector.ssh.SSHConnector

Put it all together

The entrypoint of each StreamFlow execution is a YAML file, conventionally called streamflow.yml. The role of this file is to link each step of a workflow with the service that should execute it.

A valid StreamFlow file contains the version number (currently v1.0) and two main sections: workflows and models. The workflows section consists of a dictionary with uniquely named workflows to be executed in the current run, while the models section contains a dictionary of uniquely named model specifications.
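At the top level, a streamflow.yml file therefore looks like the following skeleton (entry names are placeholders):

version: v1.0
workflows:
  my-workflow:      # each key uniquely names a workflow to execute
    # type, config and bindings entries go here
models:
  my-model:         # each key uniquely names a model specification
    # type and config entries go here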

Describing models

Each model entry contains two main sections. The type field identifies which Connector implementation should be used for its creation, destruction and management. It should refer to one of the StreamFlow connectors described above. The config field instead contains a dictionary of configuration parameters which are specific to each Connector class.
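For example, the following model entry (anticipating the full example reported below) describes a single Docker container managed through the docker connector, whose config consists of the container image to run:

models:
  docker-openjdk:
    type: docker
    config:
      image: openjdk:9.0.1-11-slim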

Describing workflows

Each workflow entry contains three main sections. The type field identifies which language has been used to describe it (currently the only supported value is cwl), the config field includes the paths to the files containing such description, and the bindings section is a list of step-model associations that specifies where the execution of a specific step should be offloaded.

In particular, the config section of CWL workflows contains a mandatory file entry, pointing to the workflow description file (usually a *.cwl file similar to the example reported above), and an optional settings entry, pointing to a secondary file that contains the initial inputs of the workflow.
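For instance, a minimal settings file for the example workflow could provide the two initial inputs as a standard CWL input object (the archive and file names below are purely illustrative):

# Hypothetical initial inputs for the extract-and-compile workflow
tarball:
  class: File
  path: hello.tar                       # assumed archive name
name_of_file_to_extract: Hello.java     # assumed source file inside the archive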

Binding steps and models

Each entry in the bindings list contains a step directive, referring to a specific step in the workflow, and a target directive, referring to a model entry in the models section of the StreamFlow file.

Each step can refer to either a single command or a nested sub-workflow. Steps are uniquely identified by means of a POSIX-like path, where each simple task is mapped to a file and each sub-workflow is mapped to a folder. In particular, the outermost workflow description is always mapped to the root folder /. Considering the example reported above, you should specify /compile in the step directive to identify the compile step, or / to identify the entire workflow.

The target directive binds the step with a specific service in a StreamFlow model. As discussed in the architecture section, complex models can contain multiple services, which represent the unit of binding in StreamFlow. The best way to identify services in a model strictly depends on the model specification itself. For example, in Docker Compose it is quite straightforward to uniquely identify each service by using its key in the services dictionary. Conversely, in Kubernetes we explicitly require users to label containers in a Pod with a unique identifier through the name attribute, in order to unambiguously identify them at deploy time.

Simpler models like single Docker or Singularity containers do not need a service layer, since the model contains a single service that is automatically uniquely identified.
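For multi-service models, the target directive can also name the desired service explicitly. The following sketch assumes a Helm model called helm-openjdk whose target container has been labeled openjdk (both names are hypothetical):

bindings:
  - step: /compile
    target:
      model: helm-openjdk
      service: openjdk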

Example

The following snippet contains an example of a minimal streamflow.yml file, connecting the compile step of the previous workflow with an openjdk Docker container.

version: v1.0
workflows:
  extract-and-compile:
    type: cwl
    config:
      file: main.cwl
      settings: config.yml
    bindings:
      - step: /compile
        target:
          model: docker-openjdk

models:
  docker-openjdk:
    type: docker
    config:
      image: openjdk:9.0.1-11-slim

Run your workflow

To run a workflow with the StreamFlow CLI, simply use the following command:

streamflow run /path/to/streamflow.yml

Note

For CWL workflows, StreamFlow also supports the cwl-runner interface (more details here).

The --outdir option specifies where StreamFlow must store the workflow results and the execution metadata, collected in a .streamflow folder. By default, StreamFlow uses the current directory as its output folder.
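For example, to collect results and metadata in a dedicated folder:

streamflow run --outdir /path/to/results /path/to/streamflow.yml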

Generate a report

The streamflow report subcommand generates a timeline report of a workflow execution. It must be executed in the parent directory of the .streamflow folder described above. By default, an interactive HTML report is generated, but users can select a different format through the --format option.
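For example (the pdf value below is an assumption; check streamflow report --help for the formats actually supported by your installation):

streamflow report                # interactive HTML report (default)
streamflow report --format pdf   # hypothetical alternative format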