1.3. Basic Concepts
This section describes the basic concepts users need to get started with Common Workflow Language (CWL) workflows. Readers are expected to be familiar with workflow managers and YAML, and to be comfortable following command-line instructions. The other sections of the user guide cover the same concepts in more detail. If you are already familiar with CWL or are looking for more advanced content, you may want to skip this section.
1.3.1. The CWL Specification
CWL is a way to describe command-line tools and connect them together to create workflows. Because CWL is a specification and not a specific piece of software, tools and workflows described using CWL are portable across a variety of platforms that support the CWL standard.
The CWL specification is a document written and maintained by the CWL community.
The specification has different versions. The version covered in this user guide
is v1.2. A specification version can have up to three numbers separated by dots (.).
The first number is the major release, used for backward-incompatible changes like
the removal of deprecated features. The second number is the minor release,
used for new features or smaller changes that are backward-compatible. The last number
is used for bug fixes, like typos and other corrections to the specification.
Note
The model used for the specification version is called Semantic Versioning. See the end of this section to learn more about it.
1.3.2. Implementations
An implementation of the CWL specification is any software written to follow what is defined in a version of the specification document. However, an implementation may not cover every aspect of the specification. CWL implementations are available under both open source and commercial licenses.
CWL is well suited for describing large-scale workflows in cluster, cloud and high performance computing environments where tasks are scheduled in parallel across many nodes.
1.3.3. Processes and Requirements
A process is a computing unit that takes inputs and produces outputs. The
behavior of a process can be affected by the inputs, requirements, and hints.
There are four types of processes defined in the CWL specification v1.2:
A command-line tool.
An expression tool.
An operation.
A workflow.
A command-line tool is a wrapper for a command-line utility such as echo, ls, or tar. A command-line tool can be called from a workflow.
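As an illustration, a command-line tool that wraps echo might be described as in the minimal sketch below. The input name message is an assumption chosen for this example and is not prescribed by the specification.

```yaml
# Minimal sketch of a command-line tool wrapping `echo`.
# The input name `message` is hypothetical.
cwlVersion: v1.2
class: CommandLineTool
baseCommand: echo
inputs:
  message:
    type: string
    inputBinding:
      position: 1   # place the value as the first argument after `echo`
outputs: []         # this sketch does not capture any output
```

Assuming the description is saved in a file named echo.cwl (a hypothetical name), a runner such as cwltool could execute it with a command like cwltool echo.cwl --message "Hello".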
An expression tool is a wrapper for a JavaScript expression. It can be used to simplify workflows and command-line tools, moving common parts of a workflow execution into reusable JavaScript code that takes inputs and produces outputs like a command-line tool.
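A minimal sketch of an expression tool is shown below. It adds two integers in JavaScript and returns the result as an output. The names first, second, and total are assumptions made for this example.

```yaml
# Minimal sketch of an expression tool; all input and output names are hypothetical.
cwlVersion: v1.2
class: ExpressionTool
requirements:
  InlineJavascriptRequirement: {}   # needed to evaluate the JavaScript below
inputs:
  first: int
  second: int
outputs:
  total: int
# The expression returns an object whose keys match the declared outputs.
expression: '${ return {"total": inputs.first + inputs.second}; }'
```

No command-line program is run here; the runner evaluates the expression directly and exposes total like the output of any other process.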
An operation is an abstract process that also takes inputs and produces outputs, and it can be used in a workflow. It is a special type of process that is not commonly used. It is discussed in the Operations section of this user guide.
The workflow is a process that contains steps. Steps can be other workflows (nested workflows), command-line tools, or expression tools. The inputs of a workflow can be passed to any of its steps, while the outputs produced by its steps can be used in the final output of the workflow.
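The sketch below illustrates how a workflow input can be passed to a step and how a step output can become a workflow output. The step is embedded inline so that the example is self-contained; the names message, say, text, out, echoed, and echoed.txt are assumptions made for this example.

```yaml
# Minimal sketch of a one-step workflow; all names are hypothetical.
cwlVersion: v1.2
class: Workflow
inputs:
  message: string
outputs:
  echoed:
    type: File
    outputSource: say/out      # step output used as the workflow output
steps:
  say:
    run:                       # the step's process, embedded inline
      class: CommandLineTool
      baseCommand: echo
      inputs:
        text:
          type: string
          inputBinding:
            position: 1
      outputs:
        out:
          type: stdout         # capture standard output as a File
      stdout: echoed.txt
    in:
      text: message            # workflow input passed to the step
    out: [out]
```

In practice, run usually points to a separate file containing the tool or nested workflow, which lets the same description be reused by several workflows.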
The CWL specification allows for implementations to provide extra functionality and specify prerequisites to workflows through requirements. There are many requirements defined in the CWL specification, for instance:
InlineJavascriptRequirement - enables JavaScript in expressions.
SubworkflowFeatureRequirement - enables nested workflows.
InitialWorkDirRequirement - controls staging files in the input directory.
Some CWL runners may provide requirements that are not in the specification.
For example, GPU requirements are supported in cwltool through the
cwltool:CUDARequirement requirement, but this requirement is not part of the
v1.2 specification and may not be supported by other CWL runners.
Hints are similar to requirements, but while requirements list features that are required, hints list optional features. Requirements are explained in detail in the Requirements section.
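The difference between the two can be sketched as follows. In this example the tool requires JavaScript support because its valueFrom expression calls a method, while the container image is only a hint. The input name message and the image debian:stable-slim are assumptions made for this example.

```yaml
# Minimal sketch contrasting a requirement with a hint; names are hypothetical.
cwlVersion: v1.2
class: CommandLineTool
baseCommand: echo
requirements:
  InlineJavascriptRequirement: {}    # required: the expression below needs JavaScript
hints:
  DockerRequirement:                 # optional: use this container if the runner can
    dockerPull: debian:stable-slim
inputs:
  message:
    type: string
    inputBinding:
      position: 1
      valueFrom: $(self.toUpperCase())   # JavaScript method call on the input value
outputs: []
```

A runner that cannot satisfy a requirement is expected to refuse to run the process, whereas a hint it cannot satisfy can be ignored.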
1.3.4. FAIR Workflows
The FAIR principles have laid a foundation for sharing and publishing digital assets, and in particular, data. The FAIR principles emphasize machine accessibility and that all digital assets should be Findable, Accessible, Interoperable, and Reusable. Workflows encode the methods by which the scientific process is conducted and via which data are created. It is thus important that workflows both support the creation of FAIR data and themselves adhere to the FAIR principles. — FAIR Computational Workflows, Workflows Community Initiative.
CWL has roots in “make” and many similar tools that determine order of execution, based on dependencies between tasks. However, unlike “make”, CWL tasks are isolated, and you must be explicit about your inputs and outputs.
The benefits of explicitness and isolation are flexibility, portability, and scalability; tools and workflows described with CWL can transparently leverage technologies such as Docker and be used with CWL implementations from different vendors.
cwltool also uses the PROV-O standard ontology for data provenance.
1.3.5. Learn More
Semantic Versioning: https://semver.org/
The CWL Specification page on the CWL website: https://www.commonwl.org/specification/
The Command Line Tool Description Standard: https://w3id.org/cwl/CommandLineTool.html
The current CWL specification on GitHub: common-workflow-language/cwl-v1.2
The list of Implementations on the CWL website: https://www.commonwl.org/implementations/
PROV-O: The PROV Ontology: https://www.w3.org/TR/prov-o/
CWL Operations are covered in the Operations section of this user guide.