High-content imaging (HCI) generates some of the largest per-experiment datasets in cell biology (thousands of cells per well, hundreds of features per cell, hundreds of wells per plate), and much of it is analyzed with tools that would embarrass a clinical chemist. Manual threshold-based selection, unstandardized illumination correction, and ad-hoc feature selection are the norm. This piece covers the pipeline that turns HCI data into reproducible, mechanistically interpretable results.
Imaging protocol — the upstream decisions that fix the analysis
Most image-quality problems cannot be fully corrected downstream. The acquisition decisions that matter most:
- Illumination uniformity: flat-field correction is non-negotiable. Illumination intensity typically varies 15–40% across the field of view. Acquire at least 200 flat-field images per channel per plate run (or use a fluorescent slide) and apply the correction before any other processing. Post-hoc correction with CellProfiler's CorrectIlluminationApply module is standard, but the flat-field reference must come from the same plate, the same session, and the same objective position.
- Focus strategy: autofocus based on the nuclear channel (Hoechst, DAPI) produces consistent results across wells; cell-based autofocus is more robust than image-based at low cell density. Document the focus algorithm and offset settings.
- Magnification choice: 10× covers more cells per well (better statistics) at lower resolution; 20× resolves subcellular structures but requires tiling for sufficient n. Match the magnification to the feature of interest: 10× is sufficient for nuclear morphology, whereas mitochondrial network topology needs 40×.
- Exposure time and dynamic range: set exposure to avoid saturation in the brightest expected positive control. Saturated pixels cannot be recovered and corrupt texture features. Verify the dynamic range with a dilution series of the staining reagent.
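The flat-field and saturation points above can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic data, not CellProfiler's implementation: the function names, the per-pixel median estimator, and the 16-bit default are assumptions made for the example.

```python
import numpy as np

def estimate_flatfield(flat_stack):
    """Per-pixel median of a (n_images, height, width) flat-field stack, scaled to mean 1.

    The median is an assumed estimator; it is robust to dust present in a few frames.
    """
    illum = np.median(flat_stack.astype(np.float64), axis=0)
    return illum / illum.mean()

def apply_flatfield(image, illum, eps=1e-6):
    """Divide the raw image by the illumination function, guarding against zeros."""
    return image.astype(np.float64) / np.maximum(illum, eps)

def saturated_fraction(image, bit_depth=16):
    """Fraction of pixels at the detector maximum for the given bit depth."""
    return float(np.mean(image >= 2**bit_depth - 1))

def cv(a):
    """Coefficient of variation, used here as a simple uniformity metric."""
    return float(a.std() / a.mean())

# Synthetic demo: illumination falls off linearly by 30% from left to right.
rng = np.random.default_rng(0)
true_illum = np.tile(np.linspace(1.0, 0.7, 64), (64, 1))
flat_stack = 1000.0 * true_illum + rng.normal(0.0, 10.0, size=(200, 64, 64))
raw = 500.0 * true_illum  # a uniformly fluorescent sample seen through the same optics

illum = estimate_flatfield(flat_stack)
corrected = apply_flatfield(raw, illum)
print(f"CV before correction: {cv(raw):.3f}, after: {cv(corrected):.4f}")

tiny = np.array([[65535, 100], [200, 300]], dtype=np.uint16)
print("saturated fraction:", saturated_fraction(tiny))
```

Running `saturated_fraction` on the brightest positive-control wells before a full plate run is one cheap way to confirm the exposure setting leaves headroom.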
Image segmentation — the step where most projects fail
Segmentation quality determines every downstream feature. Poor segmentation cannot be compensated for by feature selection.
Nuclear segmentation (primary objects)
Watershed-based segmentation on the nuclear channel is robust for non-touching nuclei. Dense cultures require declumping: adaptive thresholding (Otsu per tile) + shape-guided watershed with concavity detection. Validate segmentation on ≥ 3 representative wells from the density extremes expected in the assay. Acceptable over-segmentation rate: < 5% of objects; acceptable under-segmentation (merged nuclei): < 3%.
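One way to put numbers on those acceptance thresholds is to compare the pipeline's label image against a manually annotated one. The sketch below is an assumed validation scheme, not a CellProfiler feature: an object counts as a "child" of another when more than half of its pixels fall inside it, a truth object with two or more predicted children is scored as over-segmented, and a predicted object with two or more truth children is scored as merged.

```python
import numpy as np

def count_children(parent, child, min_frac=0.5):
    """For each parent label, count child objects with > min_frac of their area inside it."""
    counts = {int(p): 0 for p in np.unique(parent) if p != 0}
    for c in np.unique(child):
        if c == 0:  # 0 is background
            continue
        inside = parent[child == c]
        labels, n = np.unique(inside, return_counts=True)
        for lab, k in zip(labels, n):
            if lab != 0 and k / inside.size > min_frac:
                counts[int(lab)] += 1
    return counts

def segmentation_error_rates(truth, pred):
    """Return (over-segmentation rate, under-segmentation rate) as fractions of objects."""
    split = count_children(truth, pred)    # truth nuclei split into several predictions
    merged = count_children(pred, truth)   # predictions that swallow several truth nuclei
    over = sum(v >= 2 for v in split.values()) / len(split)
    under = sum(v >= 2 for v in merged.values()) / len(merged)
    return over, under

# Toy example: four annotated nuclei; nucleus 1 is split, nuclei 2 and 3 are merged.
truth = np.zeros((20, 20), dtype=int)
truth[2:8, 2:8] = 1
truth[2:8, 10:14] = 2
truth[2:8, 14:18] = 3
truth[12:18, 2:8] = 4

pred = np.zeros((20, 20), dtype=int)
pred[2:8, 2:5] = 1
pred[2:8, 5:8] = 2
pred[2:8, 10:18] = 3
pred[12:18, 2:8] = 4

over_rate, under_rate = segmentation_error_rates(truth, pred)
passes = over_rate < 0.05 and under_rate < 0.03  # thresholds from the text
print(over_rate, under_rate, passes)
```

In practice the annotated set would span the density extremes of the assay, since declumping errors concentrate in the densest wells.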
Cell body segmentation (secondary objects)
Propagate from the nuclear seed outward using a cell-body stain (CellMask, plasma membrane marker, or cytoplasmic fluorescence). Background thresholding parameters must be validated at every staining concentration — changing the stain concentration invalidates the segmentation threshold.
Subcellular compartment segmentation
Mitochondria (MitoTracker), ER (ER-Tracker), puncta (lysosomes, endosomes), and cytoskeleton each require their own segmentation strategy, typically intensity-based detection within the secondary object mask.
Feature extraction — what to measure
CellProfiler extracts ~1,000–3,000 features per cell per staining panel if run without filtering. This is both the power and the problem of HCI: high dimensionality enables nuanced phenotyping but also overfitting, batch effects that dominate biological signal, and analysis pipelines that are impossible to reproduce from a paragraph in a Methods section.
Standard feature categories:
- Morphology: area, perimeter, form factor, eccentricity, solidity, compactness — computed per object (nucleus, cell, each organelle compartment).
- Intensity: mean, standard deviation, median, minimum/maximum, integrated intensity — per channel per object. Cross-channel correlations are particularly informative.
- Texture: Haralick features, Gabor, Zernike moments — sensitive to subcellular pattern but also highly sensitive to focus and illumination variation. Use with caution if illumination correction is imperfect.
- Radial distribution: fractional intensity at defined radii from nucleus center — captures nuclear-to-cytoplasmic translocation (NF-κB, FOXO3 assays).
- Neighbors: distance to nearest neighbor, number of neighbors within a defined radius — quantifies cell clustering and multicellular organization.
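To make two of these categories concrete, the sketch below computes a form factor and a simplified radial intensity distribution. It is an illustration under stated assumptions: the function names are hypothetical, and the radius is normalised to the farthest pixel of the cell mask, which is simpler than the centre-to-edge scaling CellProfiler uses.

```python
import math
import numpy as np

def form_factor(area, perimeter):
    """4*pi*area / perimeter^2: 1.0 for a perfect circle, lower for irregular shapes."""
    return 4.0 * math.pi * area / perimeter**2

def radial_fractions(intensity, cell_mask, center, n_bins=4):
    """Fraction of the cell's total intensity in each concentric ring around `center`."""
    yy, xx = np.indices(intensity.shape)
    r = np.hypot(yy - center[0], xx - center[1])
    r_norm = r / r[cell_mask].max()  # 0 at the centre, 1 at the farthest cell pixel
    bins = np.minimum((r_norm * n_bins).astype(int), n_bins - 1)
    total = intensity[cell_mask].sum()
    return np.array([intensity[cell_mask & (bins == b)].sum() / total
                     for b in range(n_bins)])

# Toy cell: a disc of radius 20 px with a "nucleus" of radius 8 px at its centre.
yy, xx = np.indices((41, 41))
r = np.hypot(yy - 20, xx - 20)
cell = r <= 20
nuclear_signal = np.where(r < 8, 10.0, 1.0)   # reporter concentrated in the nucleus
cyto_signal = np.where(r < 8, 1.0, 10.0)      # reporter excluded from the nucleus

nuc_profile = radial_fractions(nuclear_signal, cell, (20, 20))
cyto_profile = radial_fractions(cyto_signal, cell, (20, 20))
print("nuclear-localised:", np.round(nuc_profile, 3))
print("cytoplasmic:      ", np.round(cyto_profile, 3))
print("form factor of a unit square:", round(form_factor(1.0, 4.0), 3))
```

A shift of intensity from the outer rings to the innermost ring between conditions is the signature a translocation assay looks for.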
Quality control at the well and plate level
- Cell count per well: flag wells with < 50% or > 200% of the plate median cell count. Too few cells points to uneven seeding or dead cells washed away; too many points to over-seeding or segmentation failure (objects split by over-segmentation).
- Focus score per well: a global texture metric (e.g., Brenner gradient on the raw nuclear image) quantifies sharpness; flag wells below the 5th percentile of the plate distribution.
- Contamination detection: wells with a large fraction of objects with unusual circularity, abnormal nuclear area, or non-cellular texture are flagged for visual inspection.
- Batch effect detection: run a positive control (known phenotypic compound) and a vehicle control on every plate; track the separation between them with a standardized effect size (e.g., SSMD) per plate. SSMD ≥ 2 indicates a usable plate; SSMD < 1 means the plate cannot support phenotypic conclusions.
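The count, focus, and SSMD checks above are simple enough to sketch directly. The helper names and toy numbers below are illustrative assumptions; the SSMD formula is the standard one for independent groups, (mean_pos - mean_neg) / sqrt(var_pos + var_neg).

```python
import numpy as np

def flag_cell_counts(counts, low=0.5, high=2.0):
    """True for wells below `low` or above `high` times the plate median cell count."""
    counts = np.asarray(counts, dtype=float)
    med = np.median(counts)
    return (counts < low * med) | (counts > high * med)

def brenner_gradient(image):
    """Mean squared difference between pixels two columns apart; higher means sharper."""
    img = image.astype(np.float64)
    diff = img[:, 2:] - img[:, :-2]
    return float(np.mean(diff**2))

def flag_low_focus(scores, pct=5):
    """True for wells whose focus score is below the plate's `pct`-th percentile."""
    scores = np.asarray(scores, dtype=float)
    return scores < np.percentile(scores, pct)

def ssmd(pos, neg):
    """Strictly standardized mean difference between two independent groups."""
    pos, neg = np.asarray(pos, dtype=float), np.asarray(neg, dtype=float)
    return float((pos.mean() - neg.mean()) / np.sqrt(pos.var(ddof=1) + neg.var(ddof=1)))

# Toy plate: six wells, one under-populated and one suspiciously over-populated.
count_flags = flag_cell_counts([1000, 950, 1020, 400, 2500, 980])
print("count flags:", count_flags.tolist())

# A random texture versus the same texture after a 3x3 box blur.
rng = np.random.default_rng(1)
sharp = rng.random((64, 64))
blurred = sum(np.roll(np.roll(sharp, dy, 0), dx, 1)
              for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0
print("Brenner sharp vs blurred:", brenner_gradient(sharp), brenner_gradient(blurred))

# One feature measured in positive-control and vehicle wells on a single plate.
plate_ssmd = ssmd([5.0, 5.5, 4.5, 5.2], [1.0, 1.4, 0.8, 1.2])
print("SSMD:", round(plate_ssmd, 2), "usable plate:", plate_ssmd >= 2)
```

Tracking the per-plate SSMD over time also makes slow drift (reagent lots, lamp ageing) visible before it invalidates a campaign.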
Dimensionality reduction and phenotypic profiling
With ~1,500 features per cell aggregated to per-well means, dimensionality reduction is a required step, not an optional visualization.
- Feature normalization: for each plate, subtract the median of the vehicle control wells and divide by the MAD of those same wells (a robust Z-score). This removes plate-to-plate systematic variation from the feature space before dimensionality reduction.
- Feature selection: remove near-zero-variance features, then for each pair of features with Pearson correlation > 0.95, remove one of the two. Typical reduction: 1,500 → 400–600 features.
- PCA: first reduction pass; check the scree plot. If PC1 alone explains > 80% of the variance, a dominant technical artifact is likely present (often illumination or focus).
- UMAP or t-SNE: visualization of phenotypic clusters; do not use for distance-based analysis — PCA scores are better for machine learning inputs.
- Phenotypic clustering: k-means or HDBSCAN on PCA scores; validate cluster assignment against known mechanism-of-action compounds. A good HCI platform separates cytoskeletal disruptors, kinase inhibitors, and DNA-damage agents into distinct clusters from the morphological profile alone.
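The normalization, selection, and PCA steps above can be sketched with NumPy alone. This is a minimal illustration on random data, assuming one plate and a boolean control mask; the function names are hypothetical, and filtering on the absolute value of the correlation is an assumption that goes slightly beyond the text.

```python
import numpy as np

def robust_z(X, is_control):
    """Subtract the control-well median and divide by the control-well MAD, per feature."""
    ctrl = X[is_control]
    med = np.median(ctrl, axis=0)
    mad = np.median(np.abs(ctrl - med), axis=0)
    mad = np.where(mad > 0, mad, np.nan)  # zero-MAD features become NaN and are dropped later
    return (X - med) / mad

def select_features(Z, min_var=1e-8, max_corr=0.95):
    """Indices of features that are finite, non-constant, and not redundant."""
    usable = [j for j in range(Z.shape[1])
              if np.isfinite(Z[:, j]).all() and Z[:, j].var() > min_var]
    corr = np.abs(np.corrcoef(Z[:, usable], rowvar=False))  # assumption: filter on |r|
    kept = []  # positions within `usable`; greedy, keeps the first of each correlated group
    for i in range(len(usable)):
        if all(corr[i, k] <= max_corr for k in kept):
            kept.append(i)
    return [usable[i] for i in kept]

def pca(X, n_components):
    """PCA via SVD of the centred matrix; returns scores and explained-variance ratios."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    return U[:, :n_components] * S[:n_components], explained[:n_components]

# Toy plate: 48 wells x 6 features; feature 2 duplicates feature 0, feature 4 is constant.
rng = np.random.default_rng(0)
n_wells = 48
X = rng.normal(size=(n_wells, 6))
X[:, 2] = 2.0 * X[:, 0] + 0.01 * rng.normal(size=n_wells)
X[:, 4] = 7.0
is_control = np.arange(n_wells) < 16  # first 16 wells are vehicle controls

Z = robust_z(X, is_control)
kept = select_features(Z)
scores, explained = pca(Z[:, kept], n_components=2)
print("kept features:", kept)
print("explained variance ratio:", np.round(explained, 3))
if explained[0] > 0.8:
    print("PC1 dominates: check for an illumination or focus artifact")
```

The PCA scores, rather than UMAP coordinates, are what would feed k-means or HDBSCAN in the clustering step.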
Mechanism-of-action profiling with Cell Painting
Cell Painting (Bray et al., Nature Protocols 2016) is a standardized morphological profiling assay that uses six fluorescent dyes imaged in five channels to label eight cellular components: nucleus, ER, nucleoli, cytoplasmic RNA, actin, Golgi, plasma membrane, and mitochondria. It produces ~3,000 features and enables mechanism-of-action prediction by nearest-neighbor matching to a reference compound library (RxRx, JUMP-CP). For this to work:
- Protocol must be followed exactly — any substitution invalidates the cross-study comparison.
- Image acquisition parameters must match the reference dataset (magnification, bit depth, channel order).
- Feature normalization must use the same pipeline version as the reference library.
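Once query and reference profiles share the same protocol and normalization, the nearest-neighbor step itself is short. The sketch below assumes cosine similarity and a k-nearest majority vote; the source does not specify the distance metric, and the class names and synthetic profiles are purely illustrative.

```python
from collections import Counter
import numpy as np

def cosine_similarity_matrix(A, B):
    """Cosine similarity between every row of A and every row of B."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A @ B.T

def annotate_moa(query, reference, reference_moa, k=3):
    """Label each query profile with the majority MoA among its k most similar references."""
    sims = cosine_similarity_matrix(query, reference)
    labels = []
    for row in sims:
        top = np.argsort(row)[::-1][:k]  # indices of the k highest similarities
        votes = Counter(reference_moa[i] for i in top)
        labels.append(votes.most_common(1)[0][0])
    return labels

# Synthetic reference library: three MoA classes, four compounds each, 10 features.
rng = np.random.default_rng(0)
classes = ["cytoskeletal disruptor", "kinase inhibitor", "DNA-damage agent"]
prototypes = 3.0 * np.eye(3, 10)  # one idealised profile per class
reference = np.vstack([p + rng.normal(0.0, 0.3, size=(4, 10)) for p in prototypes])
reference_moa = [c for c in classes for _ in range(4)]

# Three query compounds, one drawn from each class.
query = prototypes + rng.normal(0.0, 0.3, size=(3, 10))
predicted = annotate_moa(query, reference, reference_moa, k=3)
print(predicted)
```

Reporting the similarity to the best match alongside the label helps distinguish a confident annotation from a compound that resembles nothing in the library.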
How AiLabrix fits
Drop the CellProfiler output CSV (or raw image folder for in-pipeline segmentation) plus the plate metadata. The pipeline applies flatfield-corrected segmentation QC, per-well cell count and focus filtering, SSMD-based plate QC, robust Z-score normalization, feature selection, PCA + UMAP dimensionality reduction, phenotypic clustering with silhouette scoring, and nearest-neighbor mechanism-of-action annotation from a reference compound library. For Cell Painting runs, the Bray 2016 feature normalization pipeline is applied automatically. The output is a signed PDF with segmentation QC figures, batch-effect checks, phenotypic UMAP, compound ranking by cluster assignment and morphological distance, and a full feature-selection audit trail. [email protected] for a demo on your imaging data.