topexpressinggenesofallcells
Identifies and visualizes the top expressing genes per cluster across ALL cells (before T/B cell selection), followed by pathway enrichment analysis. Provides initial overview of all cell populations by highlighting the most highly expressed genes and their biological functions.
TopExpressingGenesOfAllCells Process Configuration
Purpose
Identifies and visualizes the top expressing genes per cluster across ALL cells (before T/B cell selection), followed by pathway enrichment analysis. Provides initial overview of all cell populations by highlighting the most highly expressed genes and their biological functions.
When to Use
- After: SeuratClusteringOfAllCells process
- Before: TOrBCellSelection (this is a pre-selection analysis)
- Use cases:
  - Quick overview of ALL cell populations before separation
  - Initial assessment of broad cell type signatures
  - Understanding overall cell composition before T/B selection
  - Pathway enrichment on cell type markers before detailed analysis
  - Quality check for unexpected cell types
  - Complementary to ClusterMarkersOfAllCells for complete pre-selection profiling
- Optional process: Enable only when pre-selection analysis is needed
Configuration Structure
Process Enablement
[TopExpressingGenesOfAllCells]
cache = true
Input Specification
[TopExpressingGenesOfAllCells.in]
srtobj = ["SeuratClusteringOfAllCells"]
Note: srtobj accepts the output from SeuratClusteringOfAllCells.
Environment Variables
Core Parameters
[TopExpressingGenesOfAllCells.envs]
# Number of top expressing genes to identify per cluster
n = 250
# Enrichment style
enrich_style = "enrichr" # Options: "enrichr", "clusterprofiler"
# Enrichment databases
dbs = ["KEGG_2021_Human", "MSigDB_Hallmark_2020"]
Enrichment Plot Settings
[TopExpressingGenesOfAllCells.envs.enrich_plots_defaults]
# Plot type for enrichment results
plot_type = "bar" # Options: "bar", "dot", "lollipop", "network", "enrichmap", "wordcloud"
# Device parameters
devpars = {res = 100, width = 800, height = 600}
# Additional output formats
more_formats = []
# Save R code to reproduce plots
save_code = false
# Top terms to display
top_term = 10 # Number of top enriched pathways to show
ncol = 1 # Number of columns in multi-panel plots
Cell Subsetting
[TopExpressingGenesOfAllCells.envs]
# Subset cells before analysis (optional)
subset = ""
Cache Control
[TopExpressingGenesOfAllCells.envs]
# Cache intermediate results
cache = "/tmp" # true, false, or directory path
Configuration Examples
Minimal Configuration
[TopExpressingGenesOfAllCells]
[TopExpressingGenesOfAllCells.in]
srtobj = ["SeuratClusteringOfAllCells"]
Top 10 Genes for Broad Cell Type ID
[TopExpressingGenesOfAllCells]
[TopExpressingGenesOfAllCells.in]
srtobj = ["SeuratClusteringOfAllCells"]
[TopExpressingGenesOfAllCells.envs]
n = 10
dbs = ["MSigDB_Hallmark_2020"]
Multiple Databases for Comprehensive Overview
[TopExpressingGenesOfAllCells]
[TopExpressingGenesOfAllCells.in]
srtobj = ["SeuratClusteringOfAllCells"]
[TopExpressingGenesOfAllCells.envs]
n = 100
dbs = [
"KEGG_2021_Human",
"MSigDB_Hallmark_2020",
"GO_Biological_Process_2025"
]
Common Patterns
Pattern 1: Quick All-Cell Overview (Pre-Selection)
[TopExpressingGenesOfAllCells]
[TopExpressingGenesOfAllCells.in]
srtobj = ["SeuratClusteringOfAllCells"]
[TopExpressingGenesOfAllCells.envs]
n = 10
dbs = ["MSigDB_Hallmark_2020"]
[TopExpressingGenesOfAllCells.envs.enrich_plots_defaults]
plot_type = "bar"
top_term = 10
What to expect: Top 10 genes per cluster showing broad cell type markers (CD3 for T cells, CD19 for B cells, CD14 for monocytes, etc.)
Pattern 2: Broad Cell Type Signature Identification
[TopExpressingGenesOfAllCells]
[TopExpressingGenesOfAllCells.in]
srtobj = ["SeuratClusteringOfAllCells"]
[TopExpressingGenesOfAllCells.envs]
n = 50
[TopExpressingGenesOfAllCells.envs.enrich_plots]
"T Cell Pathways" = {plot_type = "bar", dbs = ["KEGG_2021_Human"]}
"B Cell Pathways" = {plot_type = "bar", dbs = ["KEGG_2021_Human"]}
"Myeloid Pathways" = {plot_type = "bar", dbs = ["KEGG_2021_Human"]}
What to expect: Identification of T cell (CD3E, CD3D), B cell (CD19, MS4A1), and myeloid (CD14, LYZ) signatures across clusters
Pattern 3: Quality Check for Unexpected Cell Types
[TopExpressingGenesOfAllCells]
[TopExpressingGenesOfAllCells.in]
srtobj = ["SeuratClusteringOfAllCells"]
[TopExpressingGenesOfAllCells.envs]
n = 20
dbs = [
"GO_Biological_Process_2025",
"GO_Cellular_Component_2025"
]
[TopExpressingGenesOfAllCells.envs.enrich_plots_defaults]
plot_type = "dot"
top_term = 15
What to expect: Detection of contamination (e.g., EPCAM for epithelial, COL1A1 for fibroblasts, RBC markers)
Difference from TopExpressingGenes
TopExpressingGenesOfAllCells vs TopExpressingGenes:
| Aspect | TopExpressingGenesOfAllCells | TopExpressingGenes |
|---|---|---|
| When it runs | BEFORE TOrBCellSelection | AFTER TOrBCellSelection |
| Input data | All cells (unfiltered) | Only selected T or B cells |
| Upstream process | SeuratClusteringOfAllCells | SeuratClustering + TOrBCellSelection |
| Use case | Initial assessment, quality check | Detailed T/B cell analysis |
| Cell types | ALL cell types present | Only T OR B cells |
| Typical markers | CD3, CD19, CD14, etc. | Specific T/B cell subtypes |
| Position in workflow | Pre-selection overview | Post-selection deep dive |
Workflow context:
RNA Input → SeuratPreparing → SeuratClusteringOfAllCells
↓
TopExpressingGenesOfAllCells ← Runs here
↓
TOrBCellSelection (separates T/B)
↓
SeuratClustering (on selected cells)
↓
TopExpressingGenes ← Runs here
Recommendation:
- Use TopExpressingGenesOfAllCells to assess overall data quality and cell type composition
- Use TopExpressingGenes for detailed analysis of T or B cell subtypes
- Enable both for comprehensive analysis: pre-selection overview + post-selection deep dive
Dependencies
- Upstream: SeuratClusteringOfAllCells
- Downstream: TOrBCellSelection (optional - this process provides pre-selection context)
- Data: Seurat object with cluster assignments for ALL cells
Validation Rules
- n parameter: Must be positive integer (typically 10-500)
- dbs: Must be valid enrichit/Enrichr database names or local GMT file paths
- enrich_style: Must be "enrichr" or "clusterprofiler"
- plot_type: Must be valid scplotter plot type
- Workflow requirement: Only runs when SeuratClusteringOfAllCells is enabled
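The rules above can be checked before launching the pipeline. The following is a minimal sketch, not part of immunopipe: `check_envs` is a hypothetical helper that takes an already parsed config as a plain dict, and the fallback defaults (`n = 250`, `"enrichr"`, `"bar"`) are assumed from the values shown under Core Parameters. It does not verify that database names actually exist.

```python
from __future__ import annotations

# Hypothetical helper mirroring the validation rules listed above;
# this is an illustration, not immunopipe's own validator.
VALID_ENRICH_STYLES = {"enrichr", "clusterprofiler"}
VALID_PLOT_TYPES = {"bar", "dot", "lollipop", "network", "enrichmap", "wordcloud"}


def check_envs(config: dict) -> list[str]:
    """Return human-readable problems found in a parsed config dict."""
    problems = []

    # Workflow requirement: the upstream process must be present.
    if "SeuratClusteringOfAllCells" not in config:
        problems.append("SeuratClusteringOfAllCells is not enabled")

    envs = config.get("TopExpressingGenesOfAllCells", {}).get("envs", {})

    # n must be a positive integer (bool is a subclass of int, so exclude it).
    n = envs.get("n", 250)
    if isinstance(n, bool) or not isinstance(n, int) or n <= 0:
        problems.append(f"n must be a positive integer, got {n!r}")

    style = envs.get("enrich_style", "enrichr")
    if style not in VALID_ENRICH_STYLES:
        problems.append(f"enrich_style must be one of {sorted(VALID_ENRICH_STYLES)}")

    # Only the shape of dbs is checked here, not whether each name is valid.
    dbs = envs.get("dbs", [])
    if not isinstance(dbs, list) or not all(isinstance(d, str) and d for d in dbs):
        problems.append("dbs must be a list of non-empty strings")

    plot_type = envs.get("enrich_plots_defaults", {}).get("plot_type", "bar")
    if plot_type not in VALID_PLOT_TYPES:
        problems.append(f"plot_type must be one of {sorted(VALID_PLOT_TYPES)}")

    return problems
```

On Python 3.11 or later, the dict can be obtained with `tomllib.load` on the opened config file (binary mode) and passed to `check_envs` directly.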
Troubleshooting
Process Not Running
Issue: TopExpressingGenesOfAllCells not executed despite being in config
Causes:
- SeuratClusteringOfAllCells not enabled
- Missing dependency in workflow
- Process disabled via validation warning
Solutions:
- Ensure SeuratClusteringOfAllCells is enabled in config
- Check validation warnings:
  python -m immunopipe.validate_config config.toml
- Verify both processes in config:
  [SeuratClusteringOfAllCells]
  [TopExpressingGenesOfAllCells]
Mixed Cell Types in Results
Issue: Clusters show multiple cell type markers (CD3 + CD19)
Causes:
- Overlapping clusters (resolution too low)
- Doublets/multiplets not filtered
- Contamination in data
Solutions:
- Adjust clustering resolution in SeuratClusteringOfAllCells
- Filter doublets in SeuratPreparing step
- Use TOrBCellSelection after assessment to clean data
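To spot such clusters quickly, the top-gene lists can be screened for markers of more than one lineage. This is a rough sketch under stated assumptions: the marker sets reuse only the genes named in this document (CD3E/CD3D, CD19/MS4A1, CD14/LYZ), the function names are hypothetical, and the input is assumed to be a mapping from cluster name to its list of top gene symbols.

```python
# Marker sets taken from the genes mentioned in this document; they are
# illustrative and far from a complete lineage panel.
LINEAGE_MARKERS = {
    "T cell": {"CD3E", "CD3D"},
    "B cell": {"CD19", "MS4A1"},
    "Myeloid": {"CD14", "LYZ"},
}


def lineages_in_cluster(top_genes):
    """Return the sorted lineage names whose markers appear in top_genes."""
    genes = set(top_genes)
    return sorted(
        name for name, markers in LINEAGE_MARKERS.items() if genes & markers
    )


def flag_mixed_clusters(top_genes_by_cluster):
    """Return {cluster: [lineages]} for clusters matching more than one lineage."""
    flagged = {}
    for cluster, genes in top_genes_by_cluster.items():
        found = lineages_in_cluster(genes)
        if len(found) > 1:
            flagged[cluster] = found
    return flagged
```

A flagged cluster is only a hint to inspect; a single shared marker among the top genes does not prove doublets or contamination.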
No Clear Cell Type Signatures
Issue: Top genes list lacks expected markers (CD3, CD19, CD14)
Causes:
- Data quality issues (low counts, high mitochondrial)
- Wrong organism (human vs mouse gene symbols)
- Incomplete clustering
Solutions:
- Check QC metrics in SeuratClusterStatsOfAllCells
- Verify organism from gene symbol case (all uppercase, e.g. CD3E, suggests human; title case, e.g. Cd3e, suggests mouse)
- Review clustering results from SeuratClusteringOfAllCells
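The gene symbol case check can be automated with a small heuristic. The sketch below is an assumption-laden illustration, not a pipeline feature: `guess_organism` and its `threshold` parameter are hypothetical, and symbol case is only a convention, so the result should be treated as a hint.

```python
def guess_organism(symbols, threshold=0.8):
    """Guess 'human' or 'mouse' from gene symbol capitalization.

    Returns 'unknown' when neither convention covers at least
    `threshold` of the symbols that contain a letter.
    """
    named = [s for s in symbols if any(c.isalpha() for c in s)]
    if not named:
        return "unknown"

    # Human convention: all letters uppercase (CD3E, MS4A1).
    upper = sum(1 for s in named if s == s.upper())
    # Mouse convention: first letter uppercase, the rest lowercase (Cd3e, Ms4a1).
    title = sum(
        1
        for s in named
        if s[0].isupper() and s[1:] == s[1:].lower() and s != s.upper()
    )

    if upper / len(named) >= threshold:
        return "human"
    if title / len(named) >= threshold:
        return "mouse"
    return "unknown"
```

Running it on the gene column of a cluster's top_genes.tsv gives a quick indication of whether the chosen databases (for example KEGG_2021_Human) match the data.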
Ribosomal/Mitochondrial Gene Dominance
Issue: Top genes list dominated by housekeeping genes (RPS, RPL, MT-)
Solutions:
- Increase n parameter to see beyond housekeeping genes
- Filter out ribosomal/mitochondrial genes in SeuratPreparing step
- Use ClusterMarkersOfAllCells for differential expression
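For a quick look at how much of a top-gene list is housekeeping, the list can be filtered after the fact by symbol prefix. This is a minimal sketch with hypothetical function names; the prefix pattern (RPS, RPL, MT-) follows the prefixes named above and is a heuristic that can over-match (for example RPS6KA1, a kinase, also starts with RPS).

```python
import re

# Prefix heuristic for ribosomal (RPS*/RPL*) and mitochondrial (MT-*) genes.
# Case-insensitive so that mouse-style symbols (Rps6, mt-Co1) are caught too.
HOUSEKEEPING = re.compile(r"^(RP[SL]|MT-)", re.IGNORECASE)


def drop_housekeeping(genes):
    """Return genes with ribosomal/mitochondrial-looking symbols removed."""
    return [g for g in genes if not HOUSEKEEPING.match(g)]


def housekeeping_fraction(genes):
    """Return the fraction of genes matching the housekeeping pattern."""
    if not genes:
        return 0.0
    return sum(1 for g in genes if HOUSEKEEPING.match(g)) / len(genes)
```

A high fraction across most clusters suggests raising n or filtering upstream, as listed in the solutions above.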
Empty Enrichment Results
Issue: No pathways enriched despite top genes identified
Causes:
- Gene identifiers don't match database
- n too small for meaningful enrichment
- Database not appropriate for cell type
Solutions:
- Increase n to 100-500 genes
- Verify species match (check gene symbols)
- Try different databases (e.g., GO_Biological_Process_2025)
Plot Rendering Errors
Issue: Enrichment plots fail to render
Causes:
- Network plots with too many terms
- Missing dependencies in R environment
Solutions:
- Reduce top_term parameter
- Use simpler plot types (bar, dot)
- Verify R packages installed: enrichit, scplotter
Output Structure
<srtobj_stem>.top_expressing_genes/
├── <cluster_name>/ # One subdirectory per cluster (ALL cells)
│ ├── top_genes.tsv # Top N genes with expression metrics
│ └── enrich/ # Enrichment results
│ ├── <db_name>/ # One subdirectory per database
│ │ ├── *.Bar-Plot.png # Enrichment plots
│ │ ├── *.enrich.tsv # Enrichment tables
│ │ └── ...
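Given the layout above, the per-cluster gene lists can be collected for downstream checks. The sketch below assumes, without confirmation from the pipeline, that top_genes.tsv has a header row and that the gene symbol sits in the first column; `load_top_genes` and `gene_column` are hypothetical names, so adjust them to the actual file contents.

```python
import csv
from pathlib import Path


def load_top_genes(outdir, gene_column=0):
    """Map each cluster subdirectory name to the genes in its top_genes.tsv.

    Assumes a header row and the gene symbol in column `gene_column`.
    """
    result = {}
    for tsv in sorted(Path(outdir).glob("*/top_genes.tsv")):
        with tsv.open(newline="") as fh:
            rows = list(csv.reader(fh, delimiter="\t"))
        # Skip the assumed header row and any blank lines.
        result[tsv.parent.name] = [row[gene_column] for row in rows[1:] if row]
    return result
```

The returned mapping has one entry per cluster directory, keyed by directory name.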
External References
Enrichment Databases (enrichit)
Built-in databases:
- KEGG_2021_Human - KEGG pathways (human)
- MSigDB_Hallmark_2020 - MSigDB Hallmark gene sets
- GO_Biological_Process_2025 - GO Biological Process terms
- GO_Cellular_Component_2025 - GO Cellular Component terms
- GO_Molecular_Function_2025 - GO Molecular Function terms
- Reactome_Pathways_2024 - Reactome pathways
- WikiPathways_2024_Human - WikiPathways (human)
Enrichr libraries: See https://maayanlab.cloud/Enrichr/#libraries
Enrichment Plot Types (scplotter)
- bar - Bar chart of enriched terms
- dot - Dot plot (bubble chart)
- lollipop - Lollipop plot
- network - Network visualization of term relationships
- enrichmap - Enrichment map (similar to Cytoscape)
- wordcloud - Word cloud visualization
Enrichment Styles
- enrichr - Fisher's exact test (Enrichr-style)
- clusterprofiler - Hypergeometric test (clusterProfiler-style)
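Both styles rest on the same over-representation idea. The one-sided Fisher's exact test on the 2x2 overlap table and the hypergeometric upper-tail test give the same p-value, which the sketch below computes with the standard library only. It is an illustration of the statistic, not the implementation used by enrichit; the styles may still differ in background set, term ranking, or p-value adjustment, none of which is modelled here. The function and parameter names are hypothetical.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """Upper-tail hypergeometric p-value P(X >= k).

    k: top genes that fall in the term
    n: number of top genes tested (the n setting)
    K: genes in the term
    N: genes in the background
    """
    total = comb(N, n)
    # Sum the probabilities of seeing k, k+1, ... overlapping genes.
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible configurations contribute nothing.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total
```

For example, with a background of 10 genes, a term of 3 genes, and 2 top genes that both fall in the term, `hypergeom_pvalue(2, 2, 3, 10)` evaluates to 3/45.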
See Also
- TopExpressingGenes - Top genes for selected T/B cells after selection
- ClusterMarkersOfAllCells - Differential expression for all cells before selection
- SeuratClusteringOfAllCells - Clustering on all cells before T/B selection
- TOrBCellSelection - T/B cell separation process