nf-core/scge Pipeline
Cloud-native tumor-normal sequencing pipeline for the NIH Common Fund's Somatic Cell Genome Editing program. Built with Nextflow DSL2 and DRAGEN hardware acceleration on AWS.
Nextflow, AWS Batch, DRAGEN, nf-core, Genomics
The Problem
The NIH Common Fund’s Somatic Cell Genome Editing (SCGE) program required a reproducible, scalable tumor-normal variant calling pipeline capable of handling petabyte-scale sequencing datasets. Existing workflows lacked the cloud elasticity and hardware acceleration needed for high-depth, clinical-grade analysis, and did not meet strict nf-core reproducibility standards.
The Solution
Co-developed a cloud-native Nextflow DSL2 pipeline that adheres fully to nf-core community conventions, enabling automated, reproducible somatic variant calling at NIH scale.
Architecture
- Nextflow DSL2 for modular, composable workflow components with strict versioning
- Illumina DRAGEN hardware acceleration on AWS Batch for high-depth tumor-normal variant calling, reducing compute time relative to software-only approaches
- nf-core conventions for reproducibility, automated testing, CI/CD, and community contribution standards
- Docker/Singularity dual containerization for portability across HPC and cloud environments
- AWS Batch elastic compute scaling: resources are provisioned and released automatically based on input queue depth
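The queue-driven scaling in the last bullet can be illustrated with a toy model. AWS Batch managed compute environments make this decision internally, so the sketch below is neither AWS's algorithm nor code from the pipeline. The function name `desired_vcpus`, the per-job vCPU count, and the minimum and maximum bounds are all hypothetical values chosen for illustration.

```python
def desired_vcpus(queued_jobs: int, vcpus_per_job: int = 16,
                  min_vcpus: int = 0, max_vcpus: int = 256) -> int:
    """Toy scaling rule: request enough vCPUs for every queued job,
    clamped to the compute environment's bounds.

    All defaults are illustrative assumptions, not pipeline settings.
    """
    if queued_jobs < 0:
        raise ValueError("queued_jobs must be non-negative")
    wanted = queued_jobs * vcpus_per_job
    # Clamp between the floor (scale to zero when idle) and the ceiling.
    return max(min_vcpus, min(wanted, max_vcpus))


# An empty queue scales down to the floor; a deep queue is capped at the ceiling.
for depth in (0, 3, 100):
    print(depth, desired_vcpus(depth))
```

With a floor of zero, an idle queue incurs no compute cost, and the ceiling bounds spend when many samples arrive at once.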
Key Results
- Reduced runtime for high-depth tumor-normal variant calling through DRAGEN acceleration
- Full compliance with nf-core community standards, enabling peer review and external validation
- Handles petabyte-scale sequencing datasets with automated ingestion and provenance tracking
- Production-deployed for active NIH SCGE program research cohorts
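As one way to picture the provenance tracking mentioned above, the sketch below builds a checksum record for a single ingested file. It is an assumption about what such a record might contain, not the pipeline's actual implementation. The function name `provenance_record` and the field names are hypothetical.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def provenance_record(path: Path, pipeline_version: str) -> dict:
    """Build a hypothetical provenance entry for one ingested file."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in 1 MiB chunks so large FASTQ/BAM files are never loaded whole.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return {
        "file": path.name,
        "bytes": path.stat().st_size,
        "sha256": digest.hexdigest(),
        "pipeline_version": pipeline_version,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }
```

Calling `provenance_record(Path("sample.fastq.gz"), "1.0.0")` on an existing file (both arguments here are placeholders) returns a dictionary that could be appended to a manifest, so a later run can confirm it is analyzing byte-identical inputs.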