name: modulescorecalculator description: Configuration skill for immunopipe process

ModuleScoreCalculator Process Configuration

Purpose: Calculate module/pathway/gene signature scores per cell using Seurat's AddModuleScore or CellCycleScoring functions.

When to Use

To score cells for specific gene programs (exhaustion, cytotoxicity, proliferation)
For pathway activity analysis using curated gene sets
To quantify functional states (activation, differentiation, memory)
To add diffusion map components for trajectory analysis
For cell cycle scoring to identify S and G2M phase cells

Configuration Structure

Process Enablement

[ModuleScoreCalculator]
cache = true

Input Specification

[ModuleScoreCalculator.in]
# Input: Seurat object from SeuratClustering
srtobj = ["SeuratClustering"]

Environment Variables

[ModuleScoreCalculator.envs]
# Default parameters inherited by all modules
defaults = { nbin = 24, ctrl = 100, seed = 8525, agg = "mean" }

# Module definitions (key = module name, value = gene set parameters)
modules = {}

# Post-scoring metadata transformations
post_mutaters = {}

External References

Seurat AddModuleScore Parameters

Parameter	Type	Default	Description
`features`	string/list	Required	Gene names or `cc.genes`/`cc.genes.updated.2019` for cell cycle
`nbin`	int	24	Number of bins for aggregate expression levels of all analyzed features
`ctrl`	int	100	Number of control features selected from same bin per analyzed feature
`k`	boolean	false	Use feature clusters from `DoKMeans` instead of random selection
`assay`	string	NULL	The assay to use (defaults to active assay)
`seed`	int	8525	Random seed for reproducibility
`search`	boolean	false	Search for symbol synonyms if features don't match
`keep`	boolean	false	Keep individual feature scores (non-cell cycle only)
`agg`	string	"mean"	Aggregation function: `mean`, `median`, `sum`, `max`, `min`, `var`, `sd`

Reference: https://satijalab.org/seurat/reference/addmodulescore

CellCycleScoring Parameters

When using features = "cc.genes" or "cc.genes.updated.2019", adds:

S.Score - S phase score per cell
G2M.Score - G2M phase score per cell
Phase - Cell cycle phase assignment (G1, S, G2M)

Reference: https://satijalab.org/seurat/reference/cellcyclescoring

Diffusion Map Parameters

{"DC": {"features": 2, "kind": "diffmap"}}

Adds first N diffusion components as metadata columns (DC_1, DC_2, ...).

Reference: https://www.rdocumentation.org/packages/destiny/versions/2.0.4/topics/DiffusionMap

Configuration Examples

Minimal Configuration

[ModuleScoreCalculator]
[ModuleScoreCalculator.in]
srtobj = ["SeuratClustering"]

Cell Cycle Scoring

[ModuleScoreCalculator.envs.modules]
CellCycle = { features = "cc.genes.updated.2019" }

Output columns: S.Score, G2M.Score, Phase

Exhaustion Score (T Cells)

[ModuleScoreCalculator.envs.modules.Exhaustion]
features = "HAVCR2,ENTPD1,LAYN,LAG3,TIGIT,PDCD1,TOX"

Cytotoxicity Score (CD8+ T Cells, NK Cells)

[ModuleScoreCalculator.envs.modules.Cytotoxicity]
features = "GZMB,PRF1,NKG7,GNLY,CTSW"

Proliferation Score

[ModuleScoreCalculator.envs.modules.Proliferation]
features = "MKI67,STMN1,TUBB,PCNA,TOP2A"

Activation Score

[ModuleScoreCalculator.envs.modules.Activation]
features = "IFNG,TNF,CD69,CD25"

Multiple Gene Sets (Functional States)

[ModuleScoreCalculator.envs.modules]
[ModuleScoreCalculator.envs.modules.CellCycle]
features = "cc.genes.updated.2019"

[ModuleScoreCalculator.envs.modules.Exhaustion]
features = "HAVCR2,ENTPD1,LAYN,LAG3,TIGIT,PDCD1"

[ModuleScoreCalculator.envs.modules.Activation]
features = "IFNG,TNF,CD69,CD25"

[ModuleScoreCalculator.envs.modules.Proliferation]
features = "MKI67,STMN1,TUBB,PCNA"

Diffusion Map Components

[ModuleScoreCalculator.envs.modules]
DC = { features = 2, kind = "diffmap" }

Use with: env.dimplots in SeuratClusterStats with reduction = "DC"

Post-Metadata Transformation

[ModuleScoreCalculator.envs.post_mutaters]
# Calculate combined exhaustion-activation ratio
Exh_Act_Ratio = "Exhaustion1 / Activation1"

# Classify high vs low exhaustion
Exhaustion_Level = "ifelse(Exhaustion1 > median(Exhaustion1, na.rm = TRUE), 'High', 'Low')"

Common Patterns

Pattern 1: T Cell Functional States

[ModuleScoreCalculator.envs.modules]
# Exhaustion markers (checkpoint genes)
Exhaustion = {
    features = "HAVCR2,ENTPD1,LAYN,LAG3,TIGIT,PDCD1,TOX,CTLA4"
}

# Activation markers
Activation = {
    features = "IFNG,TNF,CD69,CD25,IL2RA"
}

# Memory markers
Memory = {
    features = "IL7R,CCR7,SELL,S100A4"
}

# Terminal differentiation
Terminal_Diff = {
    features = "TIGIT,PDCD1,CD274,CD244,CD160"
}

Pattern 2: NK Cell Functional States

[ModuleScoreCalculator.envs.modules]
# Cytotoxicity
Cytotoxicity = {
    features = "GZMB,PRF1,NKG7,GNLY,CTSW"
}

# Activation
NK_Activation = {
    features = "NCAM1,KLRD1,FCGR3A"
}

# Exhaustion
NK_Exhaustion = {
    features = "HAVCR2,LAG3,PDCD1,TIGIT"
}

Pattern 3: Cell Cycle with Custom Parameters

[ModuleScoreCalculator.envs.defaults]
nbin = 24
ctrl = 100
seed = 8525

[ModuleScoreCalculator.envs.modules]
CellCycle = {
    features = "cc.genes.updated.2019"
}

Pattern 4: Metabolic Pathway Scores

[ModuleScoreCalculator.envs.modules]
# Glycolysis (Warburg effect)
Glycolysis = {
    features = "HK2,PKM,LDHA,PFKL,ENO1"
}

# Oxidative phosphorylation
OXPHOS = {
    features = "ND1,ND2,ND3,COX1,COX2,ATP5A1"
}

# Fatty acid oxidation
FAO = {
    features = "CPT1A,ACOX1,HADHA"
}

Pattern 5: B Cell Functionality

[ModuleScoreCalculator.envs.modules]
# Plasma cell differentiation
Plasma = {
    features = "MZB1,SSR4,SDC1,XBP1,PRDM1"
}

# Germinal center
Germinal_Center = {
    features = "BCL6,AICDA,MEF2B"
}

# Naive vs memory
Naive = {
    features = "IL7R,CCR7,IGHD"
}
Memory = {
    features = "CD27,IGG1,IGHG1"
}

Gene Set Resources

MSigDB (Molecular Signatures Database)

URL: https://www.gsea-msigdb.org/
Hallmark Collection: 50 curated gene sets for biological processes
- HALLMARK_INTERFERON_GAMMA_RESPONSE
- HALLMARK_TNFA_SIGNALING_VIA_NFKB
- HALLMARK_INFLAMMATORY_RESPONSE
- HALLMARK_HYPOXIA
- HALLMARK_APOPTOSIS
Immunologic Signatures (C7): Gene sets from immunology studies
Download: Available in GMT format for direct use

CellMarker Database

URL: http://bioinfo.life.hust.edu.cn/CellMarker/
Cell type-specific markers for human and mouse

Literature-Derived Signatures

T Cell Exhaustion Markers:

Primary: HAVCR2 (TIM-3), PDCD1 (PD-1), LAG3, TIGIT, CTLA4
Transcription factors: TOX, NR4A1, EOMES

T Cell Activation Markers:

Cytokines: IFNG, TNF, IL2
Surface: CD69, CD25 (IL2RA), CD38

Cytotoxicity Markers:

Granzymes: GZMB, GZMA, GZMH
Perforin: PRF1
NK receptors: NKG7, GNLY, CTSW

Proliferation Markers:

Ki-67: MKI67
Tubulin: STMN1, TUBB
PCNA: PCNA, TOP2A

Cell Cycle Genes (Seurat built-in):

cc.genes - Original Tirosh et al. 2016 gene set
cc.genes.updated.2019 - Updated with 2019 gene symbols

Dependencies

Upstream Processes

Required: SeuratClustering - Provides the Seurat object
Optional: TOrBCellSelection - If working with T/B cell subsets

Downstream Processes

SeuratClusterStats - Visualize module scores across clusters
CellCellCommunication - Correlate scores with cell interactions
ScFGSEA - Validate module activity with enrichment analysis

Validation Rules

Gene Set Format Validation

Comma-separated strings: "GENE1,GENE2,GENE3" ✓
Cell cycle keywords: "cc.genes" or "cc.genes.updated.2019" ✓
Diffusion map: {"features": N, "kind": "diffmap"} ✓

Gene Name Matching

Human genes: Uppercase (MKI67, IFNG) ✓
Mouse genes: Title case (Mki67, Ifng) ✓
Search mode: Set search = true to automatically find synonyms
Keep mode: Set keep = true to retain unmatched features

Parameter Constraints

nbin: Typically 10-50 (default 24)
ctrl: Typically 10-500 (default 100)
Minimum genes: ≥5 genes recommended for robust scoring
Maximum genes: No hard limit, but performance may degrade >1000 genes

Troubleshooting

Issue: Genes Not Found in Object

Symptom: Warning "XX% of features not found in object"

Solutions:

Check gene name format (uppercase for human, title case for mouse)
Enable search = true to find symbol synonyms
Verify gene symbols match your Seurat object's row names
Use search = true + keep = true to debug missing genes

Issue: Too Few Genes in Set

Symptom: Module score is NA or unreliable

Solutions:

Ensure ≥5 genes in gene set for robust scoring
Add alternative markers to expand gene set
Check if genes are expressed in your dataset
Use keep = true to see how many genes matched

Issue: Cell Cycle Score All G1

Symptom: Most cells classified as G1 phase

Solutions:

Check if cells are truly non-proliferating (e.g., memory T cells)
Verify data quality (low UMI counts may obscure cell cycle)
Consider using cc.genes instead of cc.genes.updated.2019
Check S.Score and G2M.Score values directly

Issue: Module Scores All Similar

Symptom: No variation in scores across cells

Solutions:

Genes may be uniformly expressed or not detected
Try adjusting nbin and ctrl parameters
Verify assay selection (assay = "RNA" vs "SCT")
Check if cells express the expected markers

Issue: Diffusion Map Components Not Added

Symptom: DC_1, DC_2 columns missing

Solutions:

Ensure destiny R package is installed
Verify SingleCellExperiment package is available
Use correct format: {"DC": {"features": 2, "kind": "diffmap"}}
Requires R packages: SingleCellExperiment, destiny

Best Practices

Gene Set Selection

Use literature-validated signatures when possible
Combine complementary markers (e.g., exhaustion: HAVCR2 + PDCD1 + LAG3)
Consider species-specific marker expression patterns
Test gene sets on a subset before full pipeline run

Parameter Tuning

nbin = 24: Default works well for most datasets
ctrl = 100: Increase if many genes have similar expression levels
seed = 8525: Keep fixed for reproducibility across runs
agg = "mean": Use median for outlier-resistant aggregation

Visualization

Use SeuratClusterStats.envs.dimplots to visualize scores
Add to SeuratClusterStats.envs.violins for distribution plots
Correlate scores with clusters or annotations
Consider post_mutaters for custom score transformations

Performance

Module scoring is computationally cheap (<5 min for typical datasets)
Larger gene sets (>1000 genes) may take longer
Diffusion map computation scales with cell number (O(n²))

Integration Example

# Complete workflow with multiple scores
[ModuleScoreCalculator.envs.defaults]
nbin = 24
ctrl = 100
seed = 8525

[ModuleScoreCalculator.envs.modules]
# Cell cycle
CellCycle = { features = "cc.genes.updated.2019" }

# T cell function
Exhaustion = { features = "HAVCR2,ENTPD1,LAYN,LAG3,TIGIT,PDCD1,TOX,CTLA4" }
Activation = { features = "IFNG,TNF,CD69,CD25" }
Memory = { features = "IL7R,CCR7,SELL,S100A4" }

# Cytotoxicity
Cytotoxicity = { features = "GZMB,PRF1,NKG7,GNLY" }

# Metabolism
Glycolysis = { features = "HK2,PKM,LDHA,PFKL,ENO1" }

[ModuleScoreCalculator.envs.post_mutaters]
# Classify T cell states
Tcell_State = """
case_when(
    Exhaustion1 > median(Exhaustion1, na.rm = TRUE) ~ 'Exhausted',
    Activation1 > median(Activation1, na.rm = TRUE) ~ 'Activated',
    Memory1 > median(Memory1, na.rm = TRUE) ~ 'Memory',
    TRUE ~ 'Naive'
)
"""

# Combined functional score
Functionality = "(Activation1 + Cytotoxicity1) / (Exhaustion1 + 1)"

Notes

Process is optional: Only runs when [ModuleScoreCalculator] section exists in config
Multiple modules: Define any number of modules in modules dictionary
Column naming: Scores stored as ModuleName1, ModuleName2, etc.
Cell cycle special case: Uses CellCycleScoring() which adds S.Score, G2M.Score, Phase
Diffusion map: Special module type for trajectory analysis
Post-processing: Use post_mutaters for custom metadata calculations
Visualization: Scores available in SeuratClusterStats for plotting

Quick Reference

Gene Set Formats:

# Comma-separated
features = "GENE1,GENE2,GENE3"

# Cell cycle (built-in)
features = "cc.genes.updated.2019"

# Diffusion map (special)
features = 2
kind = "diffmap"

Common Gene Sets:

Exhaustion = "HAVCR2,PDCD1,LAG3,TIGIT,CTLA4,TOX"
Cytotoxicity = "GZMB,PRF1,NKG7,GNLY"
Proliferation = "MKI67,STMN1,PCNA,TOP2A"
Activation = "IFNG,TNF,CD69,CD25"
Memory = "IL7R,CCR7,SELL"

Process Location: /immunopipe/processes.py (line 455) Documentation: /docs/processes/ModuleScoreCalculator.md Function: Seurat::AddModuleScore(), Seurat::CellCycleScoring()

Name	modulescorecalculator
Description	Configuration skill for immunopipe process

modulescorecalculator

SKILL.md

name: modulescorecalculator description: Configuration skill for immunopipe process

ModuleScoreCalculator Process Configuration

When to Use

Configuration Structure

Process Enablement

Input Specification

Environment Variables

External References

Seurat AddModuleScore Parameters

CellCycleScoring Parameters

Diffusion Map Parameters

Configuration Examples

Minimal Configuration

Cell Cycle Scoring

Exhaustion Score (T Cells)

Cytotoxicity Score (CD8+ T Cells, NK Cells)

Proliferation Score

Activation Score

Multiple Gene Sets (Functional States)

Diffusion Map Components

Post-Metadata Transformation

Common Patterns

Pattern 1: T Cell Functional States

Pattern 2: NK Cell Functional States

Pattern 3: Cell Cycle with Custom Parameters

Pattern 4: Metabolic Pathway Scores

Pattern 5: B Cell Functionality

Gene Set Resources

MSigDB (Molecular Signatures Database)

CellMarker Database

Literature-Derived Signatures

Dependencies

Upstream Processes

Downstream Processes

Validation Rules

Gene Set Format Validation

Gene Name Matching

Parameter Constraints

Troubleshooting

Issue: Genes Not Found in Object

Issue: Too Few Genes in Set

Issue: Cell Cycle Score All G1

Issue: Module Scores All Similar

Issue: Diffusion Map Components Not Added

Best Practices

Gene Set Selection

Parameter Tuning

Visualization

Performance

Integration Example

Notes

Quick Reference