
LC-MS metabolomics — a reproducible pipeline from raw spectra to pathway enrichment

Metabolomics is the omics where batch effects win more papers than biology. Between peak picking, alignment, normalization, identification, and enrichment, a typical untargeted LC-MS run accumulates so many defaults that two labs analyzing the same data rarely converge. This piece walks through a pipeline that makes the run reproducible and the biology defensible.

Reproducibility starts at the bench

No pipeline saves a run with poor acquisition. The bench-side discipline that actually matters:

Without these, no downstream correction recovers the lost signal-to-noise.

The pipeline

1. Raw data conversion

Vendor format → mzML via ProteoWizard msconvert with centroiding applied at conversion (not re-applied downstream). Keep the raw files and the mzML — they will be asked for.
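As a minimal sketch of the conversion step, the helper below builds an msconvert command line that centroids at conversion using the vendor peak picker on all MS levels. The file name run_001.raw, the output folder mzml, and the helper name msconvert_command are assumptions for illustration; the command is only assembled here, not executed.

```python
import shlex
from pathlib import Path


def msconvert_command(raw_file, out_dir):
    """Build an msconvert argument list that centroids at conversion.

    The peakPicking filter is the only filter passed, so it is applied
    first, which vendor centroiding requires.
    """
    return [
        "msconvert",
        str(raw_file),
        "--mzML",
        "--filter", "peakPicking vendor msLevel=1-",
        "-o", str(out_dir),
    ]


# Hypothetical input and output paths, for illustration only.
cmd = msconvert_command(Path("run_001.raw"), Path("mzml"))
print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) would run the conversion on a machine where msconvert is installed. Logging the exact command string next to the output keeps the conversion step auditable.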

2. Peak picking + alignment

XCMS (R/Bioconductor) remains the reference. For LC-MS DDA:

Lock every parameter in the SOP. "We tuned peak picking until it looked right" is not a method.
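One way to make "locked" checkable is to fingerprint the parameter set and record the hash in the SOP and in every report. The sketch below assumes the parameters are kept as a plain dictionary; the keys mirror common centWave settings, but the values are placeholders, not recommendations, and the function name params_fingerprint is hypothetical.

```python
import hashlib
import json

# Placeholder values for illustration only, not recommended settings.
PEAK_PICKING_PARAMS = {
    "method": "centWave",
    "ppm": 15,
    "peakwidth": [5, 30],
    "snthresh": 10,
}


def params_fingerprint(params):
    """Return a SHA-256 hex digest of the parameters in canonical JSON form.

    Sorting keys and fixing separators means the same settings always
    produce the same digest, regardless of dictionary order.
    """
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


fingerprint = params_fingerprint(PEAK_PICKING_PARAMS)
print(fingerprint)
```

If anyone changes a single value, the digest changes, so a report whose fingerprint does not match the SOP was not produced with the locked method.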

3. QC-based normalization

Raw intensities drift ~10–30% across a 200-sample batch. QC-RSC (quality control-based robust spline correction, fitted to the pooled QC injections) corrects most of it without overfitting. Report pre- and post-normalization RSD on QC samples per feature; RSD < 20% is the working threshold for keeping a feature.
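The per-feature QC filter is simple enough to sketch directly. The example below computes the relative standard deviation of each feature across pooled QC injections and keeps features under the 20% threshold. The feature IDs and intensities are synthetic toy values, and the function names are assumptions; the spline correction itself is not shown.

```python
import statistics


def qc_rsd_percent(intensities):
    """Relative standard deviation (%) of one feature across QC injections."""
    mean = statistics.mean(intensities)
    if mean == 0:
        return float("inf")  # undefined RSD; treat as failing the filter
    return 100.0 * statistics.stdev(intensities) / mean


def features_passing(qc_table, threshold=20.0):
    """Return feature IDs whose QC RSD is below the threshold."""
    return [fid for fid, vals in qc_table.items()
            if qc_rsd_percent(vals) < threshold]


# Synthetic QC intensities: one stable feature, one noisy feature.
qc_table = {
    "F001": [1000.0, 1050.0, 980.0, 1020.0],
    "F002": [500.0, 900.0, 300.0, 700.0],
}
kept = features_passing(qc_table)
print(kept)
```

Running the same function on the table before and after normalization gives the pre- and post-normalization RSD that belongs in the report.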

4. Batch effect detection

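PCA is the fastest check. The pooled QC injections are technically identical, so across batches they should collapse into one tight cluster; the biological samples are where you expect spread. If the QC injections scatter as widely as the biological samples, or samples separate by batch rather than by group, a batch effect is masking your signal. ComBat or RUV-random can correct it, but correction is better avoided: fix it upstream where possible.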

5. Identification

Metabolomics Standards Initiative (MSI) levels:

Every identified metabolite must carry its MSI level in the report. A paper claiming "120 metabolites" where 115 are Level 3 is a paper about unknowns.
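A quick tally makes the headline number honest. The sketch below assumes each annotation is stored as a (feature ID, MSI level) pair with levels 1 to 4; the feature IDs are hypothetical and the function name msi_level_summary is an assumption.

```python
from collections import Counter


def msi_level_summary(annotations):
    """Count annotations per MSI level, always reporting levels 1 to 4."""
    counts = Counter(level for _, level in annotations)
    return {level: counts.get(level, 0) for level in (1, 2, 3, 4)}


# Hypothetical annotation table: (feature ID, MSI level).
annotations = [
    ("F001", 1),
    ("F002", 2),
    ("F003", 3),
    ("F004", 3),
    ("F005", 3),
]
summary = msi_level_summary(annotations)
print(summary)
```

Reporting the full breakdown instead of a single "metabolites identified" count lets a reader see immediately how much of the list rests on confident identifications.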

6. Statistical analysis

Log-transform (log2 with offset) before testing. Multiple-testing correction with Benjamini-Hochberg FDR, not Bonferroni, and not raw p-values. Effect size (fold change) reported alongside q-values. PLS-DA is a visualization, not a hypothesis test — validate with permutation testing.
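The arithmetic behind this step can be sketched in plain Python. The example below shows a log2 transform with an offset, a log2 fold change between two groups, and Benjamini-Hochberg adjustment of a list of p-values. The offset of 1.0, the function names, and the toy p-values are assumptions for illustration; the p-values themselves would come from whatever test the SOP specifies.

```python
import math


def log2_offset(x, offset=1.0):
    """log2 transform with an offset so zero intensities stay finite."""
    return math.log2(x + offset)


def log2_fold_change(case, control, offset=1.0):
    """Difference of mean log2 intensities (case minus control)."""
    case_mean = sum(log2_offset(v, offset) for v in case) / len(case)
    ctrl_mean = sum(log2_offset(v, offset) for v in control) / len(control)
    return case_mean - ctrl_mean


def benjamini_hochberg(pvalues):
    """Return BH-adjusted q-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Toy p-values, for illustration only.
pvals = [0.001, 0.04, 0.03, 0.5]
qvals = benjamini_hochberg(pvals)
print(qvals)
```

Reporting log2_fold_change next to each q-value keeps effect size and significance side by side, so a tiny but "significant" shift is visible as such.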

7. Pathway enrichment

MetaboAnalyst or Mummichog for untargeted, with KEGG or Reactome pathway sets. Report pathway hits with the feature count, the p-value, and the FDR. Interpret hits with suspicion when feature count < 3 — too few to carry meaning.
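To show what a pathway p-value means, here is a generic over-representation test based on the hypergeometric distribution, with a flag for hits supported by fewer than three features. This is a textbook sketch, not the algorithm MetaboAnalyst or Mummichog runs internally; the function names and the toy counts are assumptions.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n significant features are drawn from N measured,
    K of which belong to the pathway, and k of the n fall in it."""
    total = comb(N, n)
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def pathway_hit(name, k, K, n, N, min_features=3):
    """Summarize one pathway: feature count, p-value, low-support flag."""
    return {
        "pathway": name,
        "features": k,
        "p_value": hypergeom_upper_tail(k, K, n, N),
        "low_support": k < min_features,
    }


# Toy counts: 2 of 2 significant features fall in a 3-member pathway
# out of 10 measured features.
hit = pathway_hit("example_pathway", k=2, K=3, n=2, N=10)
print(hit)
```

The p-values from all tested pathways would then go through the same FDR adjustment as the feature-level tests, and any hit with low_support set deserves the suspicion described above.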

The audit trail that survives peer review

How AiLabrix fits

Drop the mzML batch plus the sample metadata CSV. The pipeline runs XCMS with SOP-locked parameters, QC-RSC normalization, batch-effect detection, library-based identification with MSI leveling, statistical tests with FDR, and pathway enrichment via MetaboAnalyst. Output is a signed PDF plus machine-readable feature tables that Methods sections can link to directly.
