Agent Skill
2/7/2026

clustermarkersofallcells

Finds marker genes for clusters of ALL cells before T/B cell selection. This process identifies differentially expressed genes across unsupervised clusters to help identify broad cell types (T cells, B cells, Myeloid cells, NK cells, etc.) in mixed immune cell populations.

pwwang
npx skills add pwwang/immunopipe

SKILL.md


ClusterMarkersOfAllCells Process Configuration

Purpose

Finds marker genes for clusters of ALL cells before T/B cell selection. This process identifies differentially expressed genes across unsupervised clusters to help identify broad cell types (T cells, B cells, Myeloid cells, NK cells, etc.) in mixed immune cell populations.

When to Use

  • After SeuratClusteringOfAllCells: Runs on all cells before T/B selection
  • Before TOrBCellSelection: Provides markers to identify which clusters are T/B cells
  • Broad cell type identification: Distinguish major immune cell types from mixed populations
  • Mixed cell populations: When your data contains T, B, Myeloid, NK, and other cell types
  • Initial cell typing: First-pass identification before detailed annotation
  • Data quality check: Verify expected cell types are present in your data

Configuration Structure

Process Enablement

[ClusterMarkersOfAllCells]
cache = true

Input Specification

[ClusterMarkersOfAllCells.in]
srtobj = ["SeuratClusteringOfAllCells"]
# Accepts output from SeuratClusteringOfAllCells process

Environment Variables

All parameters are inherited from ClusterMarkers and MarkersFinder:

[ClusterMarkersOfAllCells.envs]
# Parallel computing
ncores = 1

# Grouping (uses seurat_clusters by default)
# group_by = "seurat_clusters"  # Leave unset (TOML has no null) to use Seurat::Idents()

# Statistical test parameters (passed to Seurat::FindMarkers(); write . as - in names)
test-use = "wilcox"           # wilcox (Wilcoxon), bimod, roc, t, negbinom, poisson, ...
min-pct = 0.1                  # Only test genes detected in >=10% of cells in either group
logfc-threshold = 0.25         # Minimum log2 fold change

# Marker filtering
sigmarkers = "p_val_adj < 0.05 & avg_log2FC > 0"  # Filter for significant markers

# Enrichment analysis
dbs = ["KEGG_2021_Human", "MSigDB_Hallmark_2020"]
enrich_style = "enrichr"       # enrichr or clusterprofiler

# Error handling
error = false                  # Don't error out if no markers found

# Visualization
marker_plots_defaults = {order_by = "desc(avg_log2FC)"}
allmarker_plots = {"Top 10 markers of all clusters" = {plot_type = "heatmap"}}
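As a rough illustration of what the sigmarkers filter does, the Python sketch below applies p_val_adj < 0.05 & avg_log2FC > 0 to a small, made-up marker table. It assumes, as the Seurat documentation states for FindMarkers(), that p_val_adj is a Bonferroni correction over all genes in the dataset. The gene rows, the 20,000-gene count, and the helper names are hypothetical; the real process evaluates the expression with dplyr in R.

```python
# Sketch only: mimics sigmarkers = "p_val_adj < 0.05 & avg_log2FC > 0"
# on a FindMarkers-style table. The real filter is a dplyr expression in R.


def bonferroni(p_values, n_genes):
    """Bonferroni adjustment: multiply by the number of genes, cap at 1."""
    return [min(1.0, p * n_genes) for p in p_values]


def filter_sigmarkers(rows, max_padj=0.05, min_log2fc=0.0):
    """Keep rows with p_val_adj < max_padj and avg_log2FC > min_log2fc."""
    return [
        r for r in rows
        if r["p_val_adj"] < max_padj and r["avg_log2FC"] > min_log2fc
    ]


# Hypothetical marker rows for one cluster (values are invented).
rows = [
    {"gene": "CD3E", "p_val": 1e-30, "avg_log2FC": 2.1},
    {"gene": "MS4A1", "p_val": 1e-25, "avg_log2FC": -3.0},  # lower in this cluster
    {"gene": "GENE_X", "p_val": 1e-4, "avg_log2FC": 0.4},   # fails after correction
]
N_GENES = 20000  # hypothetical number of genes in the dataset

adjusted = bonferroni([r["p_val"] for r in rows], N_GENES)
for row, padj in zip(rows, adjusted):
    row["p_val_adj"] = padj

sig = filter_sigmarkers(rows)
print([r["gene"] for r in sig])  # ['CD3E']
```

The avg_log2FC > 0 term is what restricts the output to genes that are higher in the cluster; without it, strongly down-regulated genes such as MS4A1 here would also pass.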

External References

Seurat FindMarkers Parameters

  • Full reference: https://satijalab.org/seurat/reference/findmarkers
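  • Statistical tests: test.use parameter (written test-use in TOML)
    • "wilcox": Wilcoxon Rank Sum test (default, recommended)
    • "bimod": Likelihood-ratio test for single-cell gene expression
    • "roc": Receiver Operating Characteristic analysis (reports AUC and power instead of p-values, so p_val_adj-based sigmarkers filters do not apply)
    • "t": Student's t-test
    • "negbinom": Negative binomial generalized linear model (UMI-based data only)
    • "poisson": Poisson generalized linear model (UMI-based data only)
    • Also available: "LR", "MAST", "DESeq2" (MAST and DESeq2 require the corresponding R packages)
  • Common arguments (use - instead of . in TOML):
    • min-pct: Minimum fraction of cells expressing the gene in either group
    • logfc-threshold: Minimum average log2 fold change between the two groups
    • only-pos: Only return positive markers
    • min-diff-pct: Minimum difference in detection fraction between the two groups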
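To make min-pct and logfc-threshold concrete, here is a simplified Python sketch of the gene prefilter FindMarkers() applies before testing. It assumes log-normalized (log1p) input and a pseudocount of 1 when computing avg_log2FC; Seurat's exact formula depends on the version and the assay layer used, so treat the numbers as approximate. The function names are hypothetical.

```python
import math


def pct_expressed(values):
    """Fraction of cells with non-zero expression (pct.1 / pct.2)."""
    return sum(1 for v in values if v > 0) / len(values)


def avg_log2fc(group1, group2, pseudocount=1.0):
    """log2 ratio of mean expression, assuming log1p-normalized input."""
    mean1 = sum(math.expm1(v) for v in group1) / len(group1)
    mean2 = sum(math.expm1(v) for v in group2) / len(group2)
    return math.log2(mean1 + pseudocount) - math.log2(mean2 + pseudocount)


def passes_prefilter(group1, group2, min_pct=0.1, logfc_threshold=0.25,
                     only_pos=False):
    """Return True if a gene would be tested under min-pct/logfc-threshold."""
    if max(pct_expressed(group1), pct_expressed(group2)) < min_pct:
        return False  # too rarely detected in both groups
    fc = avg_log2fc(group1, group2)
    if only_pos:
        return fc >= logfc_threshold  # only genes higher in group1
    return abs(fc) >= logfc_threshold


cluster = [math.log1p(3.0)] * 4   # every cell in the cluster expresses the gene
others = [0.0] * 4                # no other cell does
print(round(avg_log2fc(cluster, others), 6))             # 2.0
print(passes_prefilter(cluster, others))                 # True
print(passes_prefilter(others, cluster, only_pos=True))  # False: lower, not higher
print(passes_prefilter([0.0] * 4, [0.0] * 4))            # False: never detected
```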

Enrichment Databases

Configuration Examples

Minimal Configuration

[SeuratClusteringOfAllCells]
[ClusterMarkersOfAllCells]

[ClusterMarkersOfAllCells.in]
srtobj = ["SeuratClusteringOfAllCells"]

Standard Marker Finding

[SeuratClusteringOfAllCells]
[ClusterMarkersOfAllCells]

[ClusterMarkersOfAllCells.in]
srtobj = ["SeuratClusteringOfAllCells"]

[ClusterMarkersOfAllCells.envs]
# Find markers for broad cell type identification
dbs = ["MSigDB_Hallmark_2020", "KEGG_2021_Human"]
sigmarkers = "p_val_adj < 0.05 & avg_log2FC > 0.25"

# Generate key visualizations
[ClusterMarkersOfAllCells.envs.marker_plots."Volcano Plot (log2FC)"]
plot_type = "volcano_log2fc"

[ClusterMarkersOfAllCells.envs.allmarker_plots."Top 10 markers of all clusters"]
plot_type = "heatmap"

[ClusterMarkersOfAllCells.envs.enrich_plots."Bar Plot"]
plot_type = "bar"
top_term = 10

Common Patterns

Pattern 1: Broad Cell Type Markers

[ClusterMarkersOfAllCells.envs]
# Optimized for distinguishing T/B/Myeloid/NK cells
min-pct = 0.1              # Require detection in >=10% of cells
logfc-threshold = 0.25     # Minimum log2 fold change
test-use = "wilcox"        # Fast and robust
sigmarkers = "p_val_adj < 0.05 & avg_log2FC > 0"

# Visualize markers to identify cell types
[ClusterMarkersOfAllCells.envs.allmarker_plots."Top 20 markers per cluster"]
plot_type = "heatmap"

# Check for expected markers in outputs
# T cells: CD3D, CD3E, CD3G, CD4, CD8A
# B cells: CD19, MS4A1 (CD20), CD79A, CD79B
# Myeloid: CD14, LYZ, FCGR3A, CD68
# NK cells: NCAM1 (CD56), KLRD1 (CD94), NKG7

Pattern 2: Quick Wilcoxon for Large Datasets

[ClusterMarkersOfAllCells.envs]
# Fast analysis for large datasets (>50k cells)
ncores = 8                  # Use multiple cores
test-use = "wilcox"
min-pct = 0.15              # More stringent to reduce noise
logfc-threshold = 0.3
sigmarkers = "p_val_adj < 0.01 & avg_log2FC > 0.5"

# Skip enrichment to save time
dbs = []

# Generate only essential plots
[ClusterMarkersOfAllCells.envs.allmarker_plots."Top markers heatmap"]
plot_type = "heatmap"
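For intuition about the per-gene comparison the Wilcoxon test makes (one cluster versus all other cells), below is a stdlib-only Python sketch of the two-sided rank-sum test using the normal approximation with a tie correction and no continuity correction. Seurat's own implementation differs in details (it may delegate to the presto package for speed), so p-values will not match exactly; the expression values are hypothetical.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_rank_sum(x, y):
    """Two-sided rank-sum test (normal approximation, tie-corrected).

    Returns (U statistic for x, p-value).
    """
    if not x or not y:
        raise ValueError("both groups need at least one cell")
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    var_u = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if var_u <= 0:
        return u1, 1.0  # all values identical: no evidence of a difference
    z = (u1 - mean_u) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value


# Hypothetical expression of one gene: cluster cells vs all other cells.
u, p = wilcoxon_rank_sum([5, 6, 7, 8], [1, 2, 3, 4])
print(u, p)  # U = 16.0 (the maximum, n1 * n2); p is roughly 0.02
```

Because the test only uses ranks, it is robust to the skewed, zero-heavy distributions typical of single-cell data, which is one reason it is a reasonable default for large datasets.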

Pattern 3: Identify T/B Cell Clusters

[ClusterMarkersOfAllCells.envs]
# Focus on finding T and B cell markers for selection
sigmarkers = "p_val_adj < 0.05 & avg_log2FC > 1"

# Will help identify which clusters express:
# T cell markers: CD3D, CD3E, CD3G
# B cell markers: CD19, MS4A1, CD79A

[ClusterMarkersOfAllCells.envs.allmarker_plots."All markers heatmap"]
plot_type = "heatmap"

Difference from ClusterMarkers

| Aspect | ClusterMarkersOfAllCells | ClusterMarkers |
| --- | --- | --- |
| Timing | BEFORE TOrBCellSelection | AFTER TOrBCellSelection |
| Data Scope | ALL cells (mixed population) | SELECTED T/B cells only |
| Purpose | Identify broad cell types | Fine-grained sub-clusters |
| Typical markers | CD3, CD19, CD14, NK markers | Activation, differentiation markers |
| Use case | "Which clusters are T/B/Myeloid?" | "What subtypes exist within T cells?" |
| Upstream | SeuratClusteringOfAllCells | SeuratClustering (post-selection) |
| Downstream | TOrBCellSelection | Cell type annotation, downstream analysis |

Key insight: Use ClusterMarkersOfAllCells when you need to separate T/B cells from other cell types. Use ClusterMarkers when you want to analyze sub-clusters within already-purified T or B cell populations.

Dependencies

Upstream Processes

  • SeuratClusteringOfAllCells: Required - provides clustered object with seurat_clusters metadata
  • SeuratPreparing: Indirect - provides normalized Seurat object
  • SampleInfo or LoadingRNAFromSeurat: Entry point for data

Downstream Processes

  • TOrBCellSelection: Primary consumer - uses marker results to select T/B cells
  • TopExpressingGenesOfAllCells: Optional complementary analysis

Validation Rules

Required Inputs

[ClusterMarkersOfAllCells.in]
srtobj = ["SeuratClusteringOfAllCells"]  # Must be specified

Process Enablement

  • Process automatically enabled when SeuratClusteringOfAllCells is in config
  • No need to explicitly set [ClusterMarkersOfAllCells] if SeuratClusteringOfAllCells is enabled

Parameter Constraints

  • test-use: Must be a test supported by Seurat::FindMarkers(), e.g. "wilcox", "bimod", "roc", "t", "negbinom", "poisson"
  • min-pct: Should be between 0 and 1 (e.g., 0.1 = 10%)
  • logfc-threshold: Numeric value (log2 scale)
  • sigmarkers: Valid dplyr filter expression

Common Errors

  • Missing clustering: Ensure SeuratClusteringOfAllCells runs first
  • No markers found: Adjust sigmarkers or logfc-threshold if too stringent
  • Memory issues: Reduce ncores or subset data with large datasets

Troubleshooting

Issue: No significant markers found

Symptoms: Empty output directory or warning about no markers

Solutions:

[ClusterMarkersOfAllCells.envs]
# Less stringent thresholds
logfc-threshold = 0.1           # Lower fold change requirement
min-pct = 0.05                 # Lower detection percentage
sigmarkers = "p_val_adj < 0.1 & avg_log2FC > 0"  # More relaxed p-value, positive markers only

# Or check data quality
# - Are cells properly clustered?
# - Is expression matrix normalized?
# - Are there enough cells per cluster (>30 recommended)?
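The cells-per-cluster guideline is easy to check once cluster labels are exported, for example the seurat_clusters column of the object's metadata. A minimal Python sketch, using hypothetical labels and a hypothetical helper name:

```python
from collections import Counter


def small_clusters(cluster_labels, min_cells=30):
    """Return {cluster: n_cells} for clusters with fewer than min_cells cells."""
    sizes = Counter(cluster_labels)
    return {c: n for c, n in sorted(sizes.items()) if n < min_cells}


# Hypothetical seurat_clusters labels, one per cell.
labels = ["0"] * 120 + ["1"] * 45 + ["2"] * 12
print(small_clusters(labels))  # {'2': 12}
```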

Issue: Too many markers (slow enrichment)

Symptoms: Process takes very long, memory issues

Solutions:

[ClusterMarkersOfAllCells.envs]
# More stringent filtering
logfc-threshold = 0.5
min-pct = 0.2
sigmarkers = "p_val_adj < 0.01 & avg_log2FC > 1"

# Reduce enrichment databases
dbs = ["MSigDB_Hallmark_2020"]

# Or skip enrichment entirely (replace the dbs line above; TOML forbids duplicate keys)
# dbs = []

Issue: Can't identify T/B cell clusters

Symptoms: Markers don't show clear T/B cell signatures

Solutions:

  1. Check marker gene presence:

    # Verify expected markers are in your data
    # Use SeuratClusterStats to visualize:
    [SeuratClusterStats.envs.features_defaults]
    features = ["CD3D", "CD3E", "CD19", "MS4A1", "CD14", "LYZ"]
    
  2. Adjust clustering parameters:

    [SeuratClusteringOfAllCells.envs.FindClusters]
    resolution = 0.5  # Try different resolutions (0.2-1.5)
    
  3. Check data quality:

    • Are genes properly normalized?
    • Are there enough cells per cluster?
    • Is species correct (human vs mouse gene symbols)?
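Two of these checks can be approximated with simple heuristics. The Python sketch below assumes you have extracted a sample of expression values and the gene symbols. Raw UMI counts are non-negative whole numbers, whereas Seurat's LogNormalize output (log1p of counts scaled to 10,000 per cell) is mostly non-integer; human symbols are conventionally upper-case (CD3E) while mouse symbols are capitalized only at the start (Cd3e). Both helpers are hypothetical and only rough indicators.

```python
def looks_like_raw_counts(values):
    """True if every value is a non-negative whole number, as in raw UMI counts."""
    return all(v >= 0 and float(v).is_integer() for v in values)


def guess_species(gene_symbols):
    """Crude guess from symbol case: CD3E-style (human) vs Cd3e-style (mouse)."""
    human_like = sum(1 for g in gene_symbols if g.isupper())
    mouse_like = sum(
        1 for g in gene_symbols
        if g[:1].isupper() and any(ch.islower() for ch in g[1:])
    )
    if human_like > mouse_like:
        return "human"
    if mouse_like > human_like:
        return "mouse"
    return "unclear"


print(looks_like_raw_counts([0, 3, 1, 12]))        # True: probably not normalized
print(looks_like_raw_counts([0.0, 1.386, 0.693]))  # False
print(guess_species(["CD3E", "MS4A1", "LYZ"]))     # human
print(guess_species(["Cd3e", "Ms4a1", "Lyz2"]))    # mouse
```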

Issue: Process not running

Symptoms: Process skipped in workflow

Solutions:

  • Verify SeuratClusteringOfAllCells is in config
  • Check dependencies are running correctly
  • Ensure your data actually need T/B cell selection: if all cells are already T cells, the all-cells branch (including this process) is not needed

Typical Marker Genes for Identification

| Cell Type | Positive Markers | Negative Markers |
| --- | --- | --- |
| T cells | CD3D, CD3E, CD3G, CD4, CD8A | CD19, MS4A1, CD14 |
| B cells | CD19, MS4A1 (CD20), CD79A, CD79B | CD3E, CD3D, CD14 |
| Monocytes | CD14, LYZ, FCGR3A, S100A8 | CD3E, CD19 |
| NK cells | NCAM1 (CD56), KLRD1 (CD94), NKG7 | CD3E, CD19, CD14 |
| Dendritic cells | FCER1A, CST3 | CD3E, CD19, CD14 |
| Megakaryocytes | PPBP, PF4 | CD3E, CD19, CD14 |

Use these marker lists to identify which clusters correspond to which cell types in your allmarker_plots heatmaps.
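One simple way to apply the table is to count, for each cluster, how many positive markers of each cell type appear among its significant markers and subtract any negative markers that appear. The Python sketch below does this with the table's genes; the cluster marker lists are made up, and the scoring rule is an illustrative heuristic, not what TOrBCellSelection does.

```python
# Positive and negative markers copied from the table above.
MARKERS = {
    "T cells": (["CD3D", "CD3E", "CD3G", "CD4", "CD8A"], ["CD19", "MS4A1", "CD14"]),
    "B cells": (["CD19", "MS4A1", "CD79A", "CD79B"], ["CD3E", "CD3D", "CD14"]),
    "Monocytes": (["CD14", "LYZ", "FCGR3A", "S100A8"], ["CD3E", "CD19"]),
    "NK cells": (["NCAM1", "KLRD1", "NKG7"], ["CD3E", "CD19", "CD14"]),
    "Dendritic cells": (["FCER1A", "CST3"], ["CD3E", "CD19", "CD14"]),
    "Megakaryocytes": (["PPBP", "PF4"], ["CD3E", "CD19", "CD14"]),
}


def label_cluster(sig_genes):
    """Score each type as (#positive markers) - (#negative markers) in sig_genes."""
    genes = set(sig_genes)
    scores = {
        cell_type: len(genes & set(pos)) - len(genes & set(neg))
        for cell_type, (pos, neg) in MARKERS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unassigned"


# Hypothetical significant (upregulated) markers per cluster.
cluster_markers = {
    "0": ["CD3D", "CD3E", "IL7R"],
    "1": ["MS4A1", "CD79A", "CD74"],
    "2": ["GNLY"],
}
for cluster, genes in cluster_markers.items():
    print(cluster, label_cluster(genes))
# 0 T cells
# 1 B cells
# 2 unassigned
```

Clusters that come out "unassigned" or with near-tied scores are the ones worth inspecting manually in the heatmaps.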

Skills Info
Original Name: clustermarkersofallcells
Author: pwwang