Put it all together
The entrypoint of each StreamFlow execution is a YAML file, conventionally called streamflow.yml
. The role of such file is to link each task in a workflow with the service that should execute it.
A valid StreamFlow file contains the version
number (currently v1.0
) and two main sections: workflows
and deployments
. The workflows
section consists of a dictionary with uniquely named workflows to be executed in the current run, while the deployments
section contains a dictionary of uniquely named deployment specifications.
Describing deployments
Each deployment entry contains two main sections. The type
field identifies which Connector
implementation should be used for its creation, destruction and management. It should refer to one of the StreamFlow connectors described here. The config
field instead contains a dictionary of configuration parameters which are specific to each Connector
class.
Describing workflows
Each workflow entry contains three main sections. The type
field identifies which language has been used to describe it (currently the only supported value is cwl
), the config
field includes the paths to the files containing such description, and the bindings
section is a list of step-deployment associations that specifies where the execution of a specific step should be offloaded.
In particular, CWL workflows config
contain a mandatory file
entry that points to the workflow description file (usually a *.cwl
file similar to the example reported here) and an optional settings
entry that points to a secondary file, containing the initial inputs of the workflow.
Binding steps and deployments
Each entry in the bindings
contains a step
directive referring to a specific step in the workflow, and a target
directive referring to a deployment entry in the deployments
section of the StreamFlow file.
Each step can refer to either a single command or a nested sub-workflow. Steps are uniquely identified by means of a Posix-like path, where each simple task is mapped to a file and each sub-workflow is mapped to a folder. In partiuclar, the most external workflow description is always mapped to the root folder /
. Considering the example reported here, you should specify /compile
in the step
directive to identify the compile
step, or /
to identify the entire workflow.
The target
directive binds the step with a specific service in a StreamFlow deployment. As discussed in the architecture section, complex deployments can contain multiple services, which represent the unit of binding in StreamFlow. The best way to identify services in a deployment strictly depends on the deployment specification itself.
For example, in DockerCompose it is quite straightforward to uniquely identify each service by using its key in the services
dictionary. Conversely, in Kubernetes we explicitly require users to label containers in a Pod with a unique identifier through the name
attribute, in order to unambiguously identify them at deploy time.
Simpler deployments like single Docker or Singularity containers do not need a service layer, since the deployment contains a single service that is automatically uniquely identified.
Example
The following snippet contains an example of a minimal streamflow.yml
file, connecting the compile
step of this workflow with an openjdk
Docker container.
version: v1.0
workflows:
extract-and-compile:
type: cwl
config:
file: main.cwl
settings: config.yml
bindings:
- step: /compile
target:
deployment: docker-openjdk
deployments:
docker-openjdk:
type: docker
config:
image: openjdk:9.0.1-11-slim