US Food and Drug Administration (FDA) §
BioCompute Object / IEEE Std 2791™-2020
FDA approved the IEEE Std 2791™-2020 (also known as "BioCompute Object") standard for use in regulatory submissions of High-throughput Sequencing Data. IEEE 2791™-2020 recommends that "[i]f execution portability is desired, then the included script should be in the Common Workflow Language v1.0 or later format."
RAPT §
NCBI Insights: Read assembly and Annotation Pipeline Tool (RAPT) is available for use and testing
RAPT is an NCBI pipeline designed for assembling and annotating short genomic sequencing reads obtained from bacterial or archaeal isolates. RAPT is written using Docker for maximal portability. A RAPT Docker container includes SKESA, a high-accuracy assembler for short reads; PGAP, the annotation pipeline written in the Common Workflow Language (CWL) and used by RefSeq; and cwltool, the reference implementation for CWL.
NCI Cancer Genomics Cloud §
NCI Cancer Genomics Cloud
The Seven Bridges Cancer Genomics Cloud (CGC) by the US National Cancer Institute (NCI) enables researchers to rapidly access and collaborate on massive public cancer genomic datasets, including The Cancer Genome Atlas. Within the first 15 months, over 1,900 researchers registered on the platform, representing 150 institutions across 30 countries. CGC users have deployed more than 5,000 tools or workflows and performed 80,000 executions, representing over 97 years of total computation. All tools on CGC are packaged within Docker containers, with execution instructions described using the Common Workflow Language.
NIH BioData Catalyst §
NHLBI BioData Catalyst
The BioData Catalyst powered by Seven Bridges offers researchers collaborative workspaces for analyzing genomics data at scale. Researchers can find and analyze the hosted TOPMed studies by using hundreds of optimized analysis tools and workflows (pipelines), by creating their own workflows, or by running interactive analyses.
NCI Genomic Data Commons §
https://gdc.cancer.gov/
The National Cancer Institute’s (NCI’s) Genomic Data Commons (GDC) provides the cancer research community with a data repository of uniformly processed genomic and associated clinical data that enables data sharing and collaborative analysis in support of precision medicine. All major GDC data production pipelines are written in the Common Workflow Language.
Gabriella Miller Kids First Data Resource Center §
Gabriella Miller Kids First Data Resource Center
The NIH Common Fund-supported Gabriella Miller Kids First Data Resource Center enables researchers, clinicians, and patients to work together to accelerate research and promote new discoveries for children affected by cancer and structural birth defects. The group collects patient data and processes it with CWL workflows to create a dataset that is ready for further research, with the aim of better understanding these diseases and ultimately improving outcomes for children with similar conditions in the future.
Children's Hospital of Philadelphia §
CAVATICA
CAVATICA is a cloud-based platform for collaboratively accessing, sharing, and analyzing pediatric cancer data. Cavatica was created as part of the White House’s launch of the Precision Medicine Initiative. Cavatica enables researchers to analyze data through portable, shareable, and reproducible CWL workflows to discover the shared mutations or characteristics across many different diseases. Cavatica was created by the Center for Data-Driven Discovery in Biomedicine in collaboration with Seven Bridges and interoperates with the Cancer Genomics Cloud. Cavatica was awarded Best-in-Show at the 2017 Bio-IT World Conference and was a highlighted partner of the Biden Cancer Moonshot.
European Open Science Cloud - Life (EOSC-Life) §
European Open Science Cloud - Life (EOSC-Life)
EOSC-Life brings together the 13 Life Science ‘ESFRI’ research infrastructures (LS RIs) to create an open, digital and collaborative space for biological and medical research. The EOSC-Life Workflow Hub is a workflow registry designed around FAIR principles. Beta-released in September 2020, the Hub now holds nearly 100 workflows, including 36 curated COVID-19 workflows. Use of the Common Workflow Language is encouraged in the Workflow Hub since CWL provides a canonical description of the workflow.
BioExcel, Centre of Excellence for Computational Biomolecular Research §
BioExcel, Centre of Excellence for Computational Biomolecular Research
The BioExcel Center of Excellence supports academia and industry in the use of advanced techniques for high-end computing. The CWL Viewer was created as a third-year project at The University of Manchester and further developed as part of the BioExcel project. The CWL Viewer can visualize any CWL workflow definition and show its annotations and composition.
ELIXIR §
Interoperability Platform
ELIXIR coordinates and develops life science resources across Europe so that researchers can more easily find, analyse and share data, exchange expertise, and implement best practices. This makes it possible for them to gain greater insights into how living organisms work. ELIXIR's activities are divided into five areas called 'Platforms': Data, Tools, Compute, Interoperability, and Training. The Interoperability Platform focuses on developing and encouraging the adoption of standards. One of the tasks of this platform is to develop a FAIR service infrastructure that incorporates tools that are fit-for-purpose. CWL is supported in this infrastructure to allow for workflow interoperability.
Memorial Sloan Kettering Cancer Center §
MSK-ACCESS team
Memorial Sloan Kettering Cancer Center has devoted more than 135 years to exceptional patient care, innovative research, and outstanding educational programs. It is one of 51 National Cancer Institute–designated Comprehensive Cancer Centers, with state-of-the-art science flourishing side by side with clinical studies and treatment. Genome scientists, bioinformaticians, and molecular pathologists at Memorial Sloan Kettering have developed MSK-ACCESS. The MSK-ACCESS assay is a comprehensive liquid biopsy test that offers noninvasive cancer genomic profiling and disease monitoring. It is designed to detect genetic alterations in cfDNA (cell-free DNA) specimens, such as blood and other body fluids. CWL was used to create the MSK-ACCESS analysis workflows.
MGnify §
https://www.ebi.ac.uk/metagenomics/
Microbiome research involves the study of all genomes present within a specific environment. The approach can provide unique insights into the complex processes performed by environmental micro-organisms and their relationship to their surroundings, to each other, and, in some cases, to their host. MGnify offers an automated pipeline for the analysis and archiving of microbiome data to help determine the taxonomic diversity and functional & metabolic potential of environmental samples. Users can submit their own data for analysis or freely browse all of the analysed public datasets held within the repository. In addition, users can request analysis of any appropriate dataset within the European Nucleotide Archive (ENA). User-submitted or ENA-derived datasets can also be assembled on request, prior to analysis. In an effort to enable transparency of the analysis pipelines, particularly in terms of the tools (and their versions), parameters and reference databases that we employ, we have implemented our pipelines using the CWL standards.
UNLOCK §
UNLOCK-Home
UNLOCK is an open infrastructure for exploring new horizons for research on microbial communities. It is composed of three complementary experimental platforms for high-throughput discovery and characterization of microbial communities and a FAIR-data platform for large-scale data storage, data extraction and analysis of high-throughput data in a cloud-based infrastructure.
PubSeq §
COVID-19 PubSeq: Public SARS-CoV-2 Sequence Resource
PubSeq is a public data and workflow initiative targeting COVID-19 (initially). COVID-19 PubSeq is a free and open online bioinformatics public sequence resource with federated data using unique identifiers and with unique metadata, such as disambiguated geolocalisation. On COVID-19 PubSeq the data, metadata, and analysis tools live together, publicly and freely. PubSeq comes with on-the-fly analysis of sequenced SARS-CoV-2 samples that allows for a quick turnaround in identification of new virus strains. PubSeq allows anyone to upload sequence material in the form of FASTA or FASTQ files with accompanying metadata through a web interface or REST API. COVID-19 PubSeq uses CWL workflows and is backed by Arvados.
DataPLANT §
DataPLANT
Together with other disciplines, plant research increasingly relies on effective research data management services and infrastructures that facilitate the acquisition, archival, exchange, and processing of research data sets, to enable the exchange of interdisciplinary expertise. While various suggestions on best practices for FAIR data have been made, it is nevertheless always up to individual researchers' initiative and additional effort to adhere to them. Focused on its core mission to minimize the additional work of research data management, DataPLANT wants to support plant researchers in practice, providing technical services and infrastructure and personal support. Therefore, DataPLANT, as the NFDI (Nationale Forschungsdateninfrastruktur) for plant research, works in a data-centric way and builds on existing structures. A central element for achieving the goal is the Annotated Research Context (ARC), which acts as a single entry point and will define the structure of a future data publication. ARCs are FAIR Digital Objects covering the entire research cycle, from the experiment to the computational aspects to the actual data and metadata, as well as the resulting data publications using existing repositories. The computations that are stored inside an ARC are powered by the Common Workflow Language standards, which enable researchers to describe standalone or delegation workflows. This allows the integration and combination of workflows that were created using various workflow systems and languages while maintaining a uniform metadata description format on the top layer. Several tools and services are already available to ensure working with ARCs without friction in a collaborative research environment. Accordingly, DataPLANT represents the central point of contact for plant researchers to set up appropriate research data management.
The TogoImputation §
https://sc.ddbj.nig.ac.jp/en/advanced_guides/imputation_server/
The TogoImputation enables accurate genotype imputation by leveraging controlled access datasets of the Japanese population. The server is made up of three modules: a workflow execution system, computational workflows specifying the inputs, steps, and outputs of genotype imputation, and a secure web user interface that allows users to specify parameters. Additionally, the server offers reference panels that are ready to use, saving users from having to perform laborious computations on whole-genome sequencing data. With its fully containerized implementation in CWL, the entire system is portable.
MGX 2.0 §
https://mgx-metagenomics.github.io/
MGX uses an in-house developed workflow engine for read-level analysis
of metagenomic sequence datasets. That workflow engine (Conveyor) keeps
data in system memory and only writes it to files when e.g. external
programs are invoked to process the data. Since this is rather
impractical for metagenomic assembly, we added support for CWL and
implemented corresponding analysis workflows.
SUS-MIRRI.IT §
Strengthening the MIRRI Italian Research Infrastructure for Sustainable Bioscience and Bioeconomy
SUS-MIRRI.IT is a research project targeting Italian microbial Culture Collections (CCs), financed by European
funds of the National Recovery and Resilience Plan (PNRR) for about 17 million euros. The project is coordinated
by the University of Turin and involves 24 Operative Units, belonging to 15 different Research Institutions,
spread across the whole Italian territory. SUS-MIRRI.IT objectives include: the harmonization of the procedures
for the certification of Italian CCs to meet international quality standards; the establishment of an efficient
system for their management; the creation of a unique on-line platform to access Italian microbial resources
along with their associated metadata, cutting-edge technologies, services, and expertise offered by the collections
to National and International stakeholders. The microbial analysis workflows developed in the project
will be made available as a service, leveraging CWL as a canonical description of the workflow.
Open Geospatial Consortium §
Open Geospatial Consortium
The Open Geospatial Consortium (OGC) is "an international consortium of more than 500 businesses, government agencies, research organizations, and universities driven to make geospatial (location) information and services FAIR - Findable, Accessible, Interoperable, and Reusable."
The European Open Science Cloud for Research Pilot Project §
Science Demonstrator: Earth Sciences – Hydrology: Switching on the EOSC for Reproducible Computational Hydrology by FAIRifying eWaterCycle and SWITCH-ON
The European Open Science Cloud will offer 1.7 million European researchers and 70 million professionals in science and technology a virtual environment with open and seamless services for storage, management, analysis and re-use of research data, across borders and scientific disciplines by federating existing scientific data infrastructures, today scattered across disciplines and Member States. The EOSCpilot project has been funded to support the first phase in the development of the European Open Science Cloud (EOSC). Within this pilot, science demonstrators show the relevance and usefulness of EOSC Services and how they enable data reuse, and will drive EOSC development.
The European Open Science Cloud for Research Pilot Project §
Science Demonstrator: LOFAR data
The European Open Science Cloud will offer 1.7 million European researchers and 70 million professionals in science and technology a virtual environment with open and seamless services for storage, management, analysis and re-use of research data, across borders and scientific disciplines by federating existing scientific data infrastructures, today scattered across disciplines and Member States. The EOSCpilot project has been funded to support the first phase in the development of the European Open Science Cloud (EOSC). Within this pilot, science demonstrators show the relevance and usefulness of EOSC Services and how they enable data reuse, and will drive EOSC development.
The Cherenkov Telescope Array Observatory §
https://www.cta-observatory.org/
The Cherenkov Telescope Array Observatory (CTAO) is the next generation observatory for gamma-ray astronomy. It will consist of two arrays of Cherenkov telescopes, spread between two sites: one in the Northern hemisphere in La Palma (Spain), and one in the Southern hemisphere in Paranal (Chile). Currently under construction, CTAO is expected to begin scientific operations in the coming years and to operate for about 30 years. To process observation data and run Monte Carlo simulations to derive the instrument response functions of the telescopes, CTAO has developed a production system prototype. It is based on the DIRAC framework and manages the workload on a distributed infrastructure. The interface to this production system allows for the configuration and submission of simulation and data processing workflows described in CWL.
The [Netherlands] National Plan Open Science §
National Plan Open Science - Feb. 2019
The National Programme Open Science (NPOS) was established to join and coordinate efforts towards open science and promote its importance. The National Plan Open Science serves to implement a transition towards an open science system in the Netherlands. The National Plan Open Science contains an inventory of existing initiatives. The eScience Center is recognized for the research activity support it provides. The eScience Center ensures that the entire scientific community has public access, on an open-source fully documented basis, to research data related to all research projects involving Dutch universities, through the eScience Technology Platform and open platforms such as GitHub and Zenodo. The eScience Center also makes an active contribution to the continued development and Dutch use of the most promising software and tools in the field of open science. Examples include the Common Workflow Language, for the description of analyses and the use of software, and Docker, which can be used to create a durable record of software for reuse. Both assist with the reproducible performance of analyses irrespective of the computer or environment.
University of Manchester, eScience Lab §
eScience Lab - Common Workflow Language
The eScience Lab team is focused on research and development around a set of tools designed for data-driven and computational research. The eScience Lab has participated in the CWL project since its early days, contributing our workflow and reproducibility expertise from Apache Taverna, Research Objects and myExperiment. The CWL Viewer, developed by the eScience Lab with support from BioExcel, provides a graphical visualization of CWL workflows.
WorkflowHub §
https://www.workflowhub.org
WorkflowHub is a registry for describing, sharing and publishing scientific computational workflows.
The registry supports any workflow in its native repository.
WorkflowHub aims to facilitate discovery and re-use of workflows in an accessible and interoperable way. This is achieved through extensive use of open standards and tools, including Common Workflow Language (CWL), RO-Crate, BioSchemas and TRS, in accordance with the FAIR principles.
The ACROSS European Project §
HPC Big Data Artificial Intelligence Cross Stack Platform Towards Exascale
The ACROSS project (G.A. 955648) is a European High-Performance Computing Joint Undertaking project that will co-design and develop an HPC, Big Data (BD), and Artificial Intelligence (AI) convergent platform, supporting applications in the Aeronautics, Climate and Weather, and Energy domains. The project will leverage effective mechanisms to describe and manage complex workflows for the next generation of heterogeneous pre-exascale infrastructures.
The EUPEX European Project §
European Pilot for Exascale
The EUPEX pilot (G.A. 101033975) brings together academic and commercial stakeholders to co-design a European modular Exascale-ready pilot system. Together, they will deploy a pilot hardware and software platform integrating the full spectrum of European technologies, and will demonstrate the readiness and scalability of these technologies, and particularly of the Modular Supercomputing Architecture (MSA), towards Exascale. EUPEX’s ambition is to actively support the European industrial ecosystem around HPC, as well as to prepare applications and users to efficiently exploit future European exascale supercomputers.