FOR RESEARCH USE ONLY — AI-ASSISTED — NOT FOR CLINICAL DECISION MAKING

NGS variant calling QC — from FASTQ to defendable variants

NGS makes the step from raw reads to a clinical variant call look like a button click. It is not. The pipeline between FASTQ and "pathogenic" holds at least eight places where a bad default quietly produces a false positive that then gets reported to a patient. This piece walks through the QC gates that stop that from happening.

The stages and their failure modes

1. FASTQ QC

FastQC + MultiQC are the baseline but the thresholds matter. For clinical germline:

2. Trimming and alignment

Aggressive quality trimming (e.g., sliding-window Q20) removes usable bases and reads in low-coverage regions and can distort variant allele fractions. Limit trimming to adapter removal and let downstream callers handle base quality. Align short reads with BWA-MEM2 and long reads with minimap2. Report duplicate rate, insert-size distribution, and per-chromosome coverage.
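
Of the metrics named here, duplicate rate is the simplest to recompute as a cross-check. The following is a minimal sketch, not the pipeline's own implementation: it assumes duplicates were already marked upstream (so the SAM duplicate flag 0x400 is set) and that records arrive as plain SAM text lines. The function name `duplicate_rate` and the toy records are hypothetical.

```python
# Sketch: duplicate rate from SAM text records.
# Assumption: an upstream tool has already marked duplicates in the FLAG field.
UNMAPPED = 0x4         # SAM flag: segment unmapped
SECONDARY = 0x100      # SAM flag: secondary alignment
DUPLICATE = 0x400      # SAM flag: PCR or optical duplicate
SUPPLEMENTARY = 0x800  # SAM flag: supplementary alignment


def duplicate_rate(sam_lines):
    """Return duplicates / mapped primary records, or 0.0 if there are none."""
    primary = 0
    duplicates = 0
    for line in sam_lines:
        if line.startswith("@"):  # header line, not an alignment record
            continue
        flag = int(line.split("\t")[1])
        # Count each read once: skip unmapped, secondary and supplementary records.
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue
        primary += 1
        if flag & DUPLICATE:
            duplicates += 1
    return duplicates / primary if primary else 0.0


# Hypothetical toy input: one clean pair, one duplicate pair, one secondary hit.
toy_sam = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t99\tchr1\t100\t60\t50M\t=\t200\t150\t*\t*",
    "r1\t147\tchr1\t200\t60\t50M\t=\t100\t-150\t*\t*",
    "r2\t1123\tchr1\t100\t60\t50M\t=\t200\t150\t*\t*",
    "r2\t1171\tchr1\t200\t60\t50M\t=\t100\t-150\t*\t*",
    "r3\t256\tchr2\t500\t0\t50M\t*\t0\t0\t*\t*",
]
print(duplicate_rate(toy_sam))  # 0.5
```

Which records belong in the denominator is a design choice; this sketch counts mapped primary records only, so its numbers may differ slightly from tools that use another convention.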

3. Coverage QC

The number most clinical reports hide behind is mean coverage. A mean of 150x can still leave 3% of the target panel below 20x, and that 3% is where your reportable variant might be. Report:
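
To illustrate why the mean alone hides gaps, here is a minimal sketch that computes both mean depth and the fraction of target bases below a depth threshold. It assumes per-base depths are already available as a list of integers (for example, exported from a depth tool); the function name `coverage_summary` and the toy panel are hypothetical, and only the 20x threshold comes from the text.

```python
# Sketch: mean depth versus breadth of coverage on a toy panel.
def coverage_summary(depths, threshold=20):
    """Return (mean depth, fraction of bases with depth below threshold)."""
    if not depths:
        raise ValueError("no target bases supplied")
    mean_depth = sum(depths) / len(depths)
    below = sum(1 for d in depths if d < threshold) / len(depths)
    return mean_depth, below


# Hypothetical panel of 1000 bases: 97% well covered, 3% poorly covered.
toy_depths = [154] * 970 + [10] * 30
mean_depth, frac_below_20 = coverage_summary(toy_depths, threshold=20)
print(f"mean depth: {mean_depth:.1f}x")   # mean depth: 149.7x
print(f"below 20x: {frac_below_20:.1%}")  # below 20x: 3.0%
```

The mean looks comfortable even though 30 of the 1000 toy bases sit at 10x, which is why a breadth figure needs to sit next to it.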

4. Variant calling

DeepVariant, GATK HaplotypeCaller, or Strelka2 — each has known blind spots. The practical answer is ensemble calling: run two callers, intersect calls, and flag disagreements for manual review. For somatic: Mutect2 + Strelka2 with tumor-normal pairing; panel-of-normals is mandatory.
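
The intersect-and-flag step can be sketched with plain set operations. This is an illustration under stated assumptions, not a description of any caller's API: it assumes both call sets have already been parsed from VCF and normalized (left-aligned, multiallelic sites split) so that the same variant gets the same key. The names `Variant` and `compare_callsets` are hypothetical.

```python
# Sketch: concordance between two callers' outputs.
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def compare_callsets(calls_a, calls_b):
    """Split two call sets into concordant calls and caller-specific calls."""
    a, b = set(calls_a), set(calls_b)
    return {
        "concordant": a & b,  # called by both: candidates to carry forward
        "only_a": a - b,      # disagreement: flag for manual review
        "only_b": b - a,      # disagreement: flag for manual review
    }


# Hypothetical toy call sets.
caller_a = [Variant("chr1", 100, "A", "G"), Variant("chr2", 200, "C", "T")]
caller_b = [Variant("chr1", 100, "A", "G"), Variant("chr3", 300, "G", "GA")]
result = compare_callsets(caller_a, caller_b)
print(len(result["concordant"]), len(result["only_a"]), len(result["only_b"]))  # 1 1 1
```

Without the normalization step, two callers can describe the same indel differently and the comparison reports a disagreement that is not real.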

5. Variant QC filters

Defaults from the caller are rarely clinically appropriate. Minimum useful filter set:

6. Annotation

VEP or snpEff with a pinned cache version. Population frequencies from gnomAD (pinned version). ClinVar + HGMD for known pathogenicity. OMIM for gene-disease linkage. Every annotation source version must be in the report — "gnomAD frequency 0.001" is meaningless without the dump date.
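
One way to make source versions impossible to omit is to attach the source to every annotation value and refuse to create one without a version. The sketch below assumes nothing about VEP, snpEff or gnomAD internals; the class and function names are hypothetical and the version string is a placeholder, not a real release.

```python
# Sketch: annotation values that always carry their source version.
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotationSource:
    name: str
    version: str


@dataclass(frozen=True)
class Annotation:
    field: str
    value: object
    source: AnnotationSource

    def render(self):
        """Format the value together with the source it came from."""
        return f"{self.field} = {self.value} ({self.source.name} {self.source.version})"


def make_annotation(field, value, source):
    """Create an annotation, rejecting sources with no pinned version."""
    if not source.version.strip():
        raise ValueError(f"{source.name}: annotation source has no pinned version")
    return Annotation(field, value, source)


# Placeholder version string for illustration only.
gnomad = AnnotationSource(name="gnomAD", version="vX.Y-placeholder")
af = make_annotation("allele_frequency", 0.001, gnomad)
print(af.render())  # allele_frequency = 0.001 (gnomAD vX.Y-placeholder)
```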

7. ACMG classification

For germline, the ACMG/AMP 2015 guidelines plus the 2018 refinements are the reference. Automated classifiers (InterVar, Franklin) reach the same call as expert review ~80% of the time; the remaining 20% is where clinical judgment matters. Use the automated call as a starting point, never as the final call, and document every override.
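
The "starting point, never final, document the override" rule can be enforced in the data model itself. The following is a minimal sketch under assumptions: the record layout, field names and the variant identifier are hypothetical, and the only domain content used is the standard five-tier classification vocabulary.

```python
# Sketch: a classification record that cannot hold an undocumented override.
from dataclasses import dataclass
from typing import Optional

ACMG_CLASSES = (
    "benign",
    "likely benign",
    "uncertain significance",
    "likely pathogenic",
    "pathogenic",
)


@dataclass(frozen=True)
class Classification:
    variant_id: str
    automated_call: str
    final_call: str
    reviewer: str
    override_rationale: Optional[str] = None

    def __post_init__(self):
        for call in (self.automated_call, self.final_call):
            if call not in ACMG_CLASSES:
                raise ValueError(f"unknown class: {call!r}")
        # The automated call is never final on its own: a reviewer must sign.
        if not self.reviewer.strip():
            raise ValueError("a reviewer is required for every final call")
        # A changed call must carry a written rationale.
        if self.is_override and not (self.override_rationale or "").strip():
            raise ValueError("override requires a documented rationale")

    @property
    def is_override(self):
        return self.automated_call != self.final_call


# Hypothetical example record.
record = Classification(
    variant_id="example-variant-1",
    automated_call="uncertain significance",
    final_call="likely pathogenic",
    reviewer="reviewer-01",
    override_rationale="Segregation data from the family study supports upgrade.",
)
print(record.is_override)  # True
```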

8. Reporting

A clinical NGS report needs: specimen ID, panel version, coverage summary, reportable variants with ACMG classification and evidence, VUSes with evidence, technical limitations (regions not covered), signatory. Everything that a physician might need to question the call has to be retrievable from the report.
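
A completeness check over those required elements is simple to automate before sign-off. This sketch assumes the report is held as a dictionary; the key names are hypothetical mappings of the items listed above, not a published schema.

```python
# Sketch: list required report elements that are missing or empty.
REQUIRED_FIELDS = (
    "specimen_id",
    "panel_version",
    "coverage_summary",
    "reportable_variants",
    "vus",
    "technical_limitations",
    "signatory",
)


def missing_report_fields(report):
    """Return required keys that are absent, None, or an empty string.

    An empty list is accepted: a report with no reportable variants is
    still a valid report, as long as the field is present.
    """
    return [
        name
        for name in REQUIRED_FIELDS
        if name not in report or report[name] is None or report[name] == ""
    ]


# Hypothetical draft report missing its signatory and limitations section.
draft = {
    "specimen_id": "SPEC-0001",
    "panel_version": "panel-example-v1",
    "coverage_summary": {"mean_depth": 149.7, "fraction_below_20x": 0.03},
    "reportable_variants": [],
    "vus": [],
    "signatory": "",
}
print(missing_report_fields(draft))  # ['technical_limitations', 'signatory']
```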

ISO 15189 specifics for NGS

How AiLabrix fits

Drop FASTQ (or BAM, or VCF if you only want post-call QC). The pipeline runs FastQC → alignment → coverage QC → ensemble variant calling → filters → VEP+gnomAD annotation → InterVar scaffolding → signed PDF with coverage heatmap, variant tables, ACMG evidence bullets and the full pipeline version lock. Reference material and truth sets (GIAB) are baked in for per-run sanity. [email protected] for a pilot.
