NGS makes the step from raw reads to a clinical variant call look like a button click. It is not. The pipeline between FASTQ and "pathogenic" holds at least eight places where a bad default quietly produces a false positive that then gets reported to a patient. This piece walks through the QC gates that stop that from happening.
The stages and their failure modes
1. FASTQ QC
FastQC + MultiQC are the baseline but the thresholds matter. For clinical germline:
- At least 80% of bases at Q30 or above at every read position, not just averaged over the read.
- Adapter contamination < 1%.
- Per-tile sequence quality flat. Tile hot-spots point to a flow-cell problem such as bubbles, smudges, or debris in the lane.
- Overrepresented sequences investigated individually (often index hopping or primer dimer).
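The Q30 gate above can be checked directly from the quality lines of a FASTQ file. The following is a minimal sketch, assuming Phred+33 encoding and reading the threshold as "at every read position, at least 80% of bases are Q30 or better". The function names are illustrative and not part of FastQC or MultiQC.

```python
from collections import defaultdict

def per_position_q30(quality_strings, phred_offset=33):
    """Return, for each read position, the fraction of bases with Phred >= 30."""
    totals = defaultdict(int)
    passing = defaultdict(int)
    for qual in quality_strings:
        for pos, char in enumerate(qual):
            totals[pos] += 1
            if ord(char) - phred_offset >= 30:
                passing[pos] += 1
    return [passing[pos] / totals[pos] for pos in sorted(totals)]

def failing_positions(fractions, min_fraction=0.80):
    """Return the 0-based read positions that fall below the Q30 gate."""
    return [pos for pos, frac in enumerate(fractions) if frac < min_fraction]

# Three toy reads: 'I' is Q40, '5' is Q20, '#' is Q2 under Phred+33.
toy_quals = ["IIII", "III#", "II5#"]
print(failing_positions(per_position_q30(toy_quals)))
```

Checking per position rather than per read is what catches a run whose mean looks fine while the last cycles have collapsed.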
2. Trimming and alignment
Aggressive quality trimming (e.g., sliding-window Q20) removes real reads in low-coverage regions and can skew variant allele fractions. Trim adapters by sequence match only and let downstream callers handle base quality. Align short reads with BWA-MEM2 and long reads with minimap2. Report duplicate rate, insert-size distribution, and per-chromosome coverage.
3. Coverage QC
The number most clinical reports hide behind: mean coverage. A mean of 150x can still leave 3% of the target panel below 20x, and that 3% is where your reportable variant might be. Report:
- Mean and median coverage.
- Fraction of target at ≥ 20x, ≥ 30x, ≥ 50x.
- Per-exon coverage table with dropouts flagged.
- Uniformity (fraction of target bases covered at ≥ 0.2x the mean).
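The coverage metrics listed above are cheap to compute once per-base depths over the target are available. Here is a rough sketch, assuming depths arrive as plain lists of integers (for example parsed from a per-base depth file) and taking uniformity as the fraction of bases at or above 0.2x the mean, which is one common reading. The function names and the 20x dropout cut-off are illustrative.

```python
from statistics import mean, median

def coverage_summary(depths, thresholds=(20, 30, 50), uniformity_factor=0.2):
    """Summarize per-base depths over the target region."""
    if not depths:
        raise ValueError("no target bases supplied")
    n = len(depths)
    mean_depth = mean(depths)
    summary = {
        "mean": mean_depth,
        "median": median(depths),
        # Fraction of bases at or above uniformity_factor * mean depth.
        "uniformity": sum(d >= uniformity_factor * mean_depth for d in depths) / n,
    }
    for t in thresholds:
        summary[f"frac_ge_{t}x"] = sum(d >= t for d in depths) / n
    return summary

def flag_dropouts(exon_depths, min_depth=20):
    """Return names of exons whose minimum depth is below min_depth.

    Assumes every exon has at least one depth value.
    """
    return sorted(name for name, d in exon_depths.items() if min(d) < min_depth)

# A toy target: 97% of bases at 150x, 3% at 10x.
toy = [150] * 97 + [10] * 3
print(coverage_summary(toy))
print(flag_dropouts({"EX1": [150, 140], "EX2": [30, 12]}))
```

In the toy target the mean stays near 146x while 3% of bases sit below 20x, which is the situation a mean-only report hides.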
4. Variant calling
DeepVariant, GATK HaplotypeCaller, or Strelka2 — each has known blind spots. The practical answer is ensemble calling: run two callers, intersect calls, and flag disagreements for manual review. For somatic: Mutect2 + Strelka2 with tumor-normal pairing; panel-of-normals is mandatory.
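The intersect-and-flag step of ensemble calling can be sketched with plain set operations. This assumes both call sets have already been normalized the same way (left-aligned, multiallelic sites split) and reduced to (chrom, pos, ref, alt) tuples. The function name and the tuple layout are illustrative, not any caller's API.

```python
def ensemble(calls_a, calls_b):
    """Split two normalized call sets into concordant calls and review items.

    Each call is a (chrom, pos, ref, alt) tuple.
    """
    concordant = calls_a & calls_b
    only_a = calls_a - calls_b
    only_b = calls_b - calls_a
    return {
        "concordant": sorted(concordant),
        # Calls seen by only one caller go to manual review, not to the bin.
        "review": sorted(only_a | only_b),
    }

caller_a = {("chr1", 1000, "A", "G"), ("chr2", 500, "C", "T")}
caller_b = {("chr1", 1000, "A", "G"), ("chr3", 42, "G", "GA")}
result = ensemble(caller_a, caller_b)
print(result["concordant"])
print(result["review"])
```

Keeping the single-caller calls in a review list rather than dropping them preserves sensitivity in the regions where one caller has a blind spot.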
5. Variant QC filters
Defaults from the caller are rarely clinically appropriate. Minimum useful filter set:
- Depth ≥ 20 at the variant position.
- Allele fraction ≥ 0.3 for germline het (flag 0.2–0.3 for mosaic investigation); somatic thresholds are panel-specific.
- Strand bias (SOR / FS) within caller-recommended range.
- Base quality and mapping quality of supporting reads.
- Position within read (end-of-read variants are error-prone).
- Repetitive-region flag from a masked reference.
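A few of the filters above (depth, allele fraction, repeat flag) can be expressed as a small flagging function. This is a sketch under the thresholds stated in the list for germline heterozygous calls. The `Call` class, its field names, and the flag labels are hypothetical, and the strand-bias, quality, and read-position checks are left out because their cut-offs depend on the caller.

```python
from dataclasses import dataclass

@dataclass
class Call:
    depth: int
    allele_fraction: float
    in_repeat: bool = False

def germline_het_flags(call, min_depth=20, min_af=0.3, mosaic_floor=0.2):
    """Return the QC flags raised by a germline heterozygous call."""
    flags = []
    if call.depth < min_depth:
        flags.append("LOW_DEPTH")
    if call.allele_fraction < mosaic_floor:
        flags.append("LOW_AF")
    elif call.allele_fraction < min_af:
        # 0.2 to 0.3 is kept, but routed to mosaic investigation.
        flags.append("MOSAIC_REVIEW")
    if call.in_repeat:
        flags.append("REPEAT_REGION")
    return flags

print(germline_het_flags(Call(depth=50, allele_fraction=0.48)))
print(germline_het_flags(Call(depth=15, allele_fraction=0.25)))
```

Returning a list of flags instead of a pass/fail boolean keeps the reason for each rejection available to the reviewer.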
6. Annotation
Annotate with VEP or snpEff against a pinned cache version. Take population frequencies from a pinned gnomAD release, known pathogenicity from ClinVar + HGMD, and gene-disease linkage from OMIM. Every annotation source version must be in the report: "gnomAD frequency 0.001" is meaningless without the release version or dump date.
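One way to enforce the version rule is to make it impossible to format a frequency without its source release. The helper below is a hypothetical sketch, not part of VEP, snpEff, or gnomAD tooling, and the version string in the example is a placeholder.

```python
def stamp_frequency(value, source, version):
    """Format a population frequency together with the release it came from."""
    if not version or not str(version).strip():
        raise ValueError(f"{source}: frequency reported without a pinned version")
    return f"{source} {version}: AF={value:g}"

# "vX" stands in for whatever release the pipeline has pinned.
print(stamp_frequency(0.001, "gnomAD", "vX"))  # gnomAD vX: AF=0.001
```

Failing loudly at formatting time means an unversioned value never reaches the report in the first place.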
7. ACMG classification
For germline, the ACMG/AMP 2015 guidelines plus the 2018 refinements are the reference. Automated classifiers (InterVar, Franklin) agree with expert review roughly 80% of the time, and the remaining 20% is where clinical judgment matters. Use the automated call as a starting point, never as the final call, and document every override.
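The "document the override" rule can be enforced in the data model itself. This is a minimal sketch, assuming a five-tier classification and a record that carries both the automated and the final call. The `Classification` class and its field names are hypothetical and do not reflect InterVar or Franklin output.

```python
from dataclasses import dataclass

CLASSES = ("benign", "likely benign", "VUS", "likely pathogenic", "pathogenic")

@dataclass(frozen=True)
class Classification:
    variant: str
    automated_call: str
    final_call: str
    reviewer: str
    rationale: str = ""

    def __post_init__(self):
        for call in (self.automated_call, self.final_call):
            if call not in CLASSES:
                raise ValueError(f"unknown class: {call!r}")
        if not self.reviewer.strip():
            raise ValueError("a named reviewer must sign off the final call")
        if self.is_override and not self.rationale.strip():
            raise ValueError("override requires a documented rationale")

    @property
    def is_override(self):
        """True when the reviewer's final call differs from the automated one."""
        return self.automated_call != self.final_call

record = Classification(
    variant="example-variant",
    automated_call="VUS",
    final_call="likely pathogenic",
    reviewer="reviewer A",
    rationale="segregation data reviewed by the clinical team",
)
print(record.is_override)  # True
```

Because the check runs at construction, an undocumented override cannot be stored, which is easier to audit than a convention.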
8. Reporting
A clinical NGS report needs: specimen ID, panel version, coverage summary, reportable variants with ACMG classification and evidence, VUSes with evidence, technical limitations (regions not covered), signatory. Everything that a physician might need to question the call has to be retrievable from the report.
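A completeness gate before sign-off can be as simple as the sketch below. It assumes the report is assembled as a dictionary first, and the key names are hypothetical labels for the items listed above.

```python
REQUIRED_SECTIONS = (
    "specimen_id",
    "panel_version",
    "coverage_summary",
    "reportable_variants",
    "vus",
    "technical_limitations",
    "signatory",
)

def missing_sections(report):
    """Return required sections that are absent, None, or an empty string.

    An empty list is accepted: a negative report has no reportable variants.
    """
    return [
        key for key in REQUIRED_SECTIONS
        if key not in report or report[key] in (None, "")
    ]

draft = {
    "specimen_id": "SPEC-0001",
    "panel_version": "panel-example",
    "coverage_summary": {"frac_ge_20x": 0.97},
    "reportable_variants": [],
    "vus": [],
    "signatory": "",
}
print(missing_sections(draft))
```

Here the draft is blocked because the technical limitations are missing and the signatory is blank, while the empty variant lists are allowed through.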
ISO 15189 specifics for NGS
- Method validation per panel: analytical sensitivity (≥ 99% at ≥ 30x), specificity (< 1 false positive per Mb), reproducibility across runs / operators / reagent lots.
- Limit of detection for mosaic and somatic variants with explicit allele-fraction cut-offs.
- Proficiency testing — participation in a recognized EQA (GenQA, NEQAS) is not optional.
- Change control — reference genome, caller version, annotation cache version all need impact assessment before update.
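The sensitivity and false-positive targets in the validation bullet can be computed by comparing a run against a truth set such as GIAB. The following is a rough sketch, assuming both sets are normalized (chrom, pos, ref, alt) tuples already restricted to the validated regions at ≥ 30x. The function names are illustrative and the acceptance limits are taken from the bullet above.

```python
def validation_metrics(truth, called, target_bp):
    """Compare called variants with a truth set over a target of target_bp bases."""
    if target_bp <= 0:
        raise ValueError("target_bp must be positive")
    tp = len(truth & called)
    fn = len(truth - called)
    fp = len(called - truth)
    return {
        "true_positives": tp,
        "false_negatives": fn,
        "false_positives": fp,
        "sensitivity": tp / (tp + fn) if truth else float("nan"),
        "fp_per_mb": fp / (target_bp / 1_000_000),
    }

def passes_validation(metrics, min_sensitivity=0.99, max_fp_per_mb=1.0):
    """Apply the acceptance limits: sensitivity >= 99%, < 1 false positive per Mb."""
    return (
        metrics["sensitivity"] >= min_sensitivity
        and metrics["fp_per_mb"] < max_fp_per_mb
    )

# Toy run: 200 truth variants, one missed, one extra call, 2 Mb target.
truth = {("chr1", pos, "A", "G") for pos in range(200)}
called = (truth - {("chr1", 0, "A", "G")}) | {("chr2", 7, "C", "T")}
m = validation_metrics(truth, called, target_bp=2_000_000)
print(m["sensitivity"], m["fp_per_mb"], passes_validation(m))
```

Running the same comparison per run, operator, and reagent lot gives the reproducibility evidence from identical inputs.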
How AiLabrix fits
Drop in FASTQ (or BAM, or VCF if you only want post-call QC). The pipeline runs FastQC → alignment → coverage QC → ensemble variant calling → filters → VEP+gnomAD annotation → InterVar scaffolding → signed PDF with coverage heatmap, variant tables, ACMG evidence bullets and the full pipeline version lock. Reference material and truth sets (GIAB) are baked in for per-run sanity. Contact AiLabrix for a pilot.