
Steps

Overview

An Orquestra workflow executes a collection of steps. A step is a unit of execution in a workflow and supports the following fields:

Field     Description
name      Name of the step
config    The configuration and execution plan for this step. This is where a runtime is declared and where components are invoked.
inputs    The list of input parameters to this step
outputs   The list of output parameters from this step

The following example is a step that uses the python3 runtime to invoke two components, hellov1 and hellov2, listed in the imports field. These components can access the parameters specified in the inputs field, and the result expected from the component is declared in the outputs field. The step also specifies the compute resources it requests for execution.

steps:
- name: get-bio-data # the name of the step
  config:
    runtime:
      language: python3 # the language used to run this step
      imports: [hellov1, hellov2] # the names of the components used by this step
      parameters:
        file: hellov1/src/python/orquestra/main.py # the file containing the entrypoint
        function: main # the entry function to invoke
    resources: # the resources requested to run this step (optional)
      cpu: "1000m" # default cpu value
      memory: "1Gi" # default memory value
      disk: "10Gi" # default disk value
  inputs: # the arguments to pass into the entrypoint
  - name: "John Doe" # set var called name to string value "John Doe"
    type: string
  - year: 1992 # set var called year to int value 1992
    type: int
  outputs: # one or more outputs from the step
  - name: bio-data # the name of an output generated by the entrypoint used
    type: json
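For orientation, here is what the entrypoint this step points at might look like. This is a hypothetical sketch: the page only says that the inputs are the arguments passed to the entrypoint and that bio-data is an output it generates. The body of main and the convention of returning a dict keyed by output name are assumptions made for illustration.

```python
# Hypothetical contents of hellov1/src/python/orquestra/main.py.
# Returning a dict keyed by output name is an ASSUMPTION; the page does not
# describe how Orquestra collects a step's outputs.
import json


def main(name: str, year: int) -> dict:
    """Entrypoint named by `file` and `function` in the step's parameters.

    The step's inputs (name, year) arrive as arguments with the same names.
    """
    bio_data = {"name": name, "year": year}
    # The step declares one output, bio-data, of type json.
    return {"bio-data": json.dumps(bio_data)}


result = main(name="John Doe", year=1992)
print(result["bio-data"])  # {"name": "John Doe", "year": 1992}
```

The argument names mirror the input names exactly, which is what lets the runtime match each input to a parameter.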

Name

The name of the step. A step name must:

  • be unique within the workflow
  • start with an alphanumeric character
  • end with an alphanumeric character
  • contain only lowercase alphanumeric characters or ‘-’
  • be no longer than 30 characters
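The rules above can be checked mechanically. A minimal sketch, not Orquestra's own validator: the regular expression encodes the character, first-character, and last-character rules, and a set catches names repeated within one workflow. is_valid_step_name and step_name_problems are hypothetical names.

```python
import re

# Hypothetical helper encoding the step-name rules listed above;
# it is not part of any Orquestra API.
MAX_STEP_NAME_LENGTH = 30
# Lowercase alphanumerics and '-', starting and ending with an alphanumeric.
_STEP_NAME = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def is_valid_step_name(name: str) -> bool:
    """Check the per-name rules: allowed characters, first/last character, length."""
    return len(name) <= MAX_STEP_NAME_LENGTH and _STEP_NAME.fullmatch(name) is not None


def step_name_problems(names: list[str]) -> list[str]:
    """Describe every invalid name and every repeated name in one workflow."""
    problems = []
    seen = set()
    for name in names:
        if not is_valid_step_name(name):
            problems.append(f"invalid step name: {name!r}")
        if name in seen:
            problems.append(f"duplicate step name: {name!r}")
        seen.add(name)
    return problems


print(step_name_problems(["greeting", "transform-welcome", "Greeting", "greeting"]))
# ["invalid step name: 'Greeting'", "duplicate step name: 'greeting'"]
```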

Config

The config field specifies the components to invoke, their runtime context, and the amount of compute resources for the Quantum Engine to allocate. config supports the following fields:

Field       Description
runtime     The runtime and imports to use for this step's execution
resources   Optional. The compute resources requested for this step

Runtime

runtime supports the following fields:

Field         Description
language      The language used to run the step: python3
customImage   Optional. The container image in which the step runs. Defaults to zapatacomputing/z-quantum-default:latest
imports       A list of the names of the components this step uses
parameters    The entrypoint for this step's execution, given by two subfields:
                file: the path to the file to execute. It must exist in one of the components listed in imports.
                function: the function in file to invoke.
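The file/function pair works like a path plus a function name. Below is a minimal local sketch of the check the table describes, assuming (from the examples on this page) that file begins with the directory name of the component that contains it. find_owning_component and load_entrypoint are hypothetical names, and this is not how the Quantum Engine is implemented.

```python
import importlib.util
from pathlib import Path, PurePosixPath
from typing import Callable, Optional


def find_owning_component(imports: list[str], file: str) -> Optional[str]:
    """Return the imported component whose directory contains `file`, or None.

    ASSUMPTION: `file` is written as "<component>/<path inside component>".
    """
    parts = PurePosixPath(file).parts
    if parts and parts[0] in imports:
        return parts[0]
    return None


def load_entrypoint(root: str, imports: list[str], file: str, function: str) -> Callable:
    """Import `file` from components checked out under `root`; return `function`."""
    if find_owning_component(imports, file) is None:
        raise ValueError(f"{file!r} is not inside any imported component {imports}")
    path = Path(root, file)
    spec = importlib.util.spec_from_file_location(path.stem, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # run the file so its functions are defined
    return getattr(module, function)
```

With imports [hellov1, hellov2], the path hellov1/src/python/orquestra/main.py resolves to hellov1, while a path under a component that is not imported is rejected.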

Resources

The resources field is an optional field that allows a workflow author to request specific compute resources for their step.

NOTE: The Quantum Engine does not guarantee the requested resources; the amount a step actually receives depends on the current runtime environment.

Field    Description
cpu      CPU requested, in millicores (1000m is one CPU core). Default: 1000m
memory   Memory requested (Gi denotes gibibytes). Default: 1Gi
disk     Disk space requested. Default: 10Gi
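The quantities use the suffix notation familiar from Kubernetes: m for thousandths of a CPU and Gi for gibibytes. Assuming that notation (the page does not spell it out), the sketch below fills in the defaults for any field a step omits and converts the values to cores and bytes. The helper names are hypothetical.

```python
from typing import Optional

# Defaults from the table above.
DEFAULT_RESOURCES = {"cpu": "1000m", "memory": "1Gi", "disk": "10Gi"}

# ASSUMPTION: Kubernetes-style binary (power-of-two) size suffixes.
_BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def cpu_cores(quantity: str) -> float:
    """Convert a CPU quantity such as "1000m" (millicores) or "2" to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def size_in_bytes(quantity: str) -> int:
    """Convert a size such as "1Gi" or "512Mi" to bytes."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # a plain number of bytes


def effective_resources(requested: Optional[dict] = None) -> dict:
    """Fill in defaults for any resource the step does not request."""
    merged = {**DEFAULT_RESOURCES, **(requested or {})}
    return {
        "cpu_cores": cpu_cores(merged["cpu"]),
        "memory_bytes": size_in_bytes(merged["memory"]),
        "disk_bytes": size_in_bytes(merged["disk"]),
    }


print(effective_resources({"disk": "15Gi"}))
# {'cpu_cores': 1.0, 'memory_bytes': 1073741824, 'disk_bytes': 16106127360}
```

A request such as the 15Gi disk used by the greeting step later on this page overrides only that field; the other two keep their defaults.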

Inputs

The inputs field is a list of arguments that will be made available to the step.

Field    Description
<name>   The name of the argument exposed to the component; the field's value is the argument's value
type     The type of the value specified in <name>. Can be one of string, int, json, or <custom>.

Example

  inputs: # the arguments to pass into the entrypoint
  - name: "John Doe" # set var called name to string value "John Doe"
    type: string
  - year: 1992 # set var called year to int value 1992
    type: int
  - foo: "bar" # set var called foo to value bar with a custom type `foobar`
    type: foobar
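Each input entry therefore has exactly one key besides type, and that key is the argument name. A minimal sketch, assuming the entries arrive as parsed YAML dictionaries, that turns the list into keyword arguments for the entrypoint and checks the two built-in scalar types. inputs_to_kwargs is a hypothetical helper; json and custom types are passed through unchecked because the page does not say how they are validated.

```python
# ASSUMPTION: mapping of the built-in scalar types to Python types.
# json and custom types (e.g. foobar) are deliberately left unchecked.
_BUILTIN_TYPES = {"string": str, "int": int}


def inputs_to_kwargs(inputs: list[dict]) -> dict:
    """Turn an inputs list into keyword arguments for the entrypoint."""
    kwargs = {}
    for entry in inputs:
        # The only key other than `type` is the argument name.
        names = [key for key in entry if key != "type"]
        if len(names) != 1:
            raise ValueError(f"expected exactly one argument name in {entry}")
        name = names[0]
        value = entry[name]
        expected = _BUILTIN_TYPES.get(entry.get("type"))
        if expected is not None and not isinstance(value, expected):
            raise TypeError(f"{name}={value!r} is not of type {entry['type']}")
        kwargs[name] = value
    return kwargs


kwargs = inputs_to_kwargs([
    {"name": "John Doe", "type": "string"},
    {"year": 1992, "type": "int"},
    {"foo": "bar", "type": "foobar"},
])
print(kwargs)  # {'name': 'John Doe', 'year': 1992, 'foo': 'bar'}
```

The resulting dictionary can then be passed to the entrypoint as main(**kwargs).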

Outputs

The outputs field is a list of the values that the step produces when it completes.

Field   Description
name    The name of an output produced by the step's entrypoint
type    The type of the output. Can be one of string, int, json, or <custom>.

Example

  outputs: # the outputs generated by the step's entrypoint
  - name: summary # an output called summary of type string
    type: string
  - name: bio-data # an output called bio-data of type json
    type: json
  - name: welcome # an output called welcome with a custom type `message`
    type: message

Additional Functionality

Referencing Step Outputs

In workflows, we often want to pass the output of one step as the input to the next. This lets us build complex operations out of simpler steps while keeping each intermediate result available for analysis.

The example below consists of two steps. greeting produces an output named welcome of type message, which is then used as the input to transform-welcome.

- name: greeting
  config:
    runtime:
      language: python3
      imports: [welcome-to-orquestra]
      parameters:
        file: welcome-to-orquestra/src/python/orquestra/welcome.py
        function: welcome
    resources:
      cpu: "1000m"
      memory: "1Gi"
      disk: "15Gi"
  outputs:
  - name: welcome
    type: message
- name: transform-welcome
  passed: [greeting]
  config:
    runtime:
      language: python3
      imports: [ztransform]
      parameters:
        file: ztransform/tasks/ztransformation.py
        function: z_transformation
  inputs:
    - message: ((greeting.welcome))
      type: message
  outputs:
    - name: zelcome
      type: zessage

As shown above, we use the syntax ((stepName.outputName)) to reference an output of an earlier step: ((greeting.welcome)) passes the welcome output of greeting to the input of transform-welcome. Note that the input's type, message, matches the type declared for the welcome output.
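Conceptually, resolving a reference is a lookup into the outputs of steps that have already finished. A minimal sketch of that lookup, assuming references appear as whole input values of the form ((stepName.outputName)) as in the example. resolve_inputs and the placeholder value are hypothetical, and the Quantum Engine's actual mechanism for passing outputs between steps is not shown.

```python
import re

# A whole input value of the form ((stepName.outputName)).
_REFERENCE = re.compile(r"\(\(([a-z0-9-]+)\.([A-Za-z0-9_-]+)\)\)")


def resolve_inputs(inputs: list[dict], produced: dict[str, dict]) -> dict:
    """Build a step's arguments, replacing each ((step.output)) reference with
    the value that `step` produced for `output`.

    `produced` maps step name -> {output name: value} for finished steps.
    """
    arguments = {}
    for entry in inputs:
        # Each input has exactly one key besides `type`: the argument name.
        (name,) = [key for key in entry if key != "type"]
        value = entry[name]
        match = _REFERENCE.fullmatch(value) if isinstance(value, str) else None
        if match:
            step, output = match.groups()
            value = produced[step][output]  # KeyError if the step or output is unknown
        arguments[name] = value
    return arguments


# "placeholder welcome message" stands in for whatever greeting actually returns.
produced = {"greeting": {"welcome": "placeholder welcome message"}}
resolved = resolve_inputs([{"message": "((greeting.welcome))", "type": "message"}], produced)
print(resolved)  # {'message': 'placeholder welcome message'}
```

Because the lookup needs greeting's outputs to exist, transform-welcome can only run after greeting has finished.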

Serial Steps vs Parallel Steps

Orquestra also lets you control which steps run in parallel and which run serially. Running independent steps in parallel takes advantage of Orquestra's distributed execution, so you get results sooner. The following examples show how to control step scheduling.

Example 1: Serial Scheduling

To run steps serially, start each step with two dashes (- -) before the name field.

  - name: serial-example
    steps:
    - - name: step-1
        component: example-component
        arguments:
          parameters:
          - resources: [example-resource]
    - - name: step-2
        component: example-component
        arguments:
          parameters:
          - resources: [example-resource]

Because step-1 and step-2 each start with two dashes, step-2 does not begin executing until step-1 has finished.

Example 2: Parallel Scheduling

To run a set of steps in parallel, start the first step in the set with two dashes and every other step in the set with a single dash.

  - name: parallel-example
    steps:
    - - name: step-1
        component: example-component
        arguments:
          parameters:
          - resources: [example-resource]
      - name: step-2
        component: example-component
        arguments:
          parameters:
          - resources: [example-resource]
      - name: step-3
        component: example-component
        arguments:
          parameters:
          - resources: [example-resource]

In the above example, step-1, step-2, and step-3 are all performed in parallel with each other.
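The dash syntax is ordinary YAML nesting: "- -" starts a new item of the outer steps list whose value is itself a list, and a single "-" at the deeper indentation adds another step to that same inner list. Parsed into Python, the two examples above look like the literals below (step bodies trimmed to name and component). Reading each inner list as one stage, run after the previous stage and with its own steps in parallel, follows the text above; stage_names is a hypothetical helper.

```python
# Python equivalent of the parsed `steps` lists from the two examples above,
# with each step trimmed to its name and component.
serial_steps = [
    [{"name": "step-1", "component": "example-component"}],
    [{"name": "step-2", "component": "example-component"}],
]

parallel_steps = [
    [
        {"name": "step-1", "component": "example-component"},
        {"name": "step-2", "component": "example-component"},
        {"name": "step-3", "component": "example-component"},
    ],
]


def stage_names(steps: list[list[dict]]) -> list[list[str]]:
    """Group step names by stage: stages run one after another,
    and the steps inside one stage run in parallel."""
    return [[step["name"] for step in stage] for stage in steps]


print(stage_names(serial_steps))    # [['step-1'], ['step-2']]
print(stage_names(parallel_steps))  # [['step-1', 'step-2', 'step-3']]
```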

Example 3: Mixed Scheduling

Often we want a step to run only after every step in a set of parallel steps has finished. The example below shows how to define this.

  - name: mixed-example
    steps:
    - - name: serial-step-1
        component: example-component
        arguments:
          parameters:
          - resources: [example-resource]
    - - name: parallel-step-1
        component: example-component
        arguments:
          parameters:
          - resources: [example-resource]
      - name: parallel-step-2
        component: example-component
        arguments:
          parameters:
          - resources: [example-resource]
      - name: parallel-step-3
        component: example-component
        arguments:
          parameters:
          - resources: [example-resource]
    - - name: serial-step-2
        component: example-component
        arguments:
          parameters:
          - resources: [example-resource]

In the above example, the steps execute in the following order:

  1. serial-step-1 begins
  2. serial-step-1 finishes
  3. parallel-step-1, parallel-step-2, and parallel-step-3 all begin executing
  4. parallel-step-1, parallel-step-2, and parallel-step-3 all finish executing (they run independently, so one of them may finish before another begins)
  5. serial-step-2 begins
  6. serial-step-2 finishes
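To make this ordering concrete, here is a local simulation of the mixed example. It is a minimal sketch, not how Orquestra schedules steps: run_stages and run_step are hypothetical names, each stage is one inner list of the steps YAML, and threads on one machine stand in for Orquestra's distributed execution.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_stages(stages: list[list[str]], run_step: Callable[[str], None]) -> list[tuple[str, str]]:
    """Run the steps of each stage concurrently, and start a stage only after
    every step of the previous stage has finished.

    Returns the ("begin" | "finish", step name) events in the order they occurred.
    """
    events: list[tuple[str, str]] = []
    lock = threading.Lock()  # worker threads append to `events` concurrently

    def record(event: str, name: str) -> None:
        with lock:
            events.append((event, name))

    def run_and_record(name: str) -> None:
        record("begin", name)
        run_step(name)
        record("finish", name)

    for stage in stages:
        # Leaving the `with` block waits for every step in the stage to finish.
        with ThreadPoolExecutor(max_workers=max(1, len(stage))) as pool:
            list(pool.map(run_and_record, stage))  # re-raises any step's exception
    return events


# The mixed-example schedule, with each step simulated by a short sleep.
mixed_stages = [
    ["serial-step-1"],
    ["parallel-step-1", "parallel-step-2", "parallel-step-3"],
    ["serial-step-2"],
]
log = run_stages(mixed_stages, run_step=lambda name: time.sleep(0.01))
for event in log:
    print(event)
```

The order of the six events in the middle can differ from run to run, which matches step 4 above; the first two and last two events are always the serial steps.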