bio-causal-genomics-mendelian-randomization
Estimate causal effects between exposures and outcomes using genetic variants as instrumental variables with TwoSampleMR. Implements IVW, MR-Egger, weighted median, and MR-PRESSO methods for robust causal inference from GWAS summary statistics. Use when testing whether an exposure causally affects an outcome using genetic instruments.
SKILL.md
| Name | bio-causal-genomics-mendelian-randomization |
| Description | Estimate causal effects between exposures and outcomes using genetic variants as instrumental variables with TwoSampleMR. Implements IVW, MR-Egger, weighted median, and MR-PRESSO methods for robust causal inference from GWAS summary statistics. Use when testing whether an exposure causally affects an outcome using genetic instruments. |
name: bio-causal-genomics-mendelian-randomization description: Estimate causal effects of an exposure on an outcome from GWAS summary statistics using genetic instruments. Implements IVW (fixed/random), MR-Egger, weighted median/mode, MR-RAPS, CAUSE, GSMR-HEIDI, MR-PRESSO, MVMR, MR-Clust, LCV, and LHC-MR via TwoSampleMR, MendelianRandomization, MR-PRESSO, cause, and lhcMR. Use when testing causal direction between traits, evaluating drug-target effects via cis-pQTL/cis-eQTL, performing multivariable mediation MR, distinguishing causation from correlated horizontal pleiotropy, or producing STROBE-MR-compliant sensitivity batteries. tool_type: mixed primary_tool: TwoSampleMR
Version Compatibility
Reference examples tested with: TwoSampleMR 0.6.0+, MendelianRandomization 0.10+, MR-PRESSO 1.0+, cause 1.2+, MVMR 0.4+, ieugwasr 1.0+, MRlap 0.0.3.2+, coloc 5.2+, mrclust 0.1+, lhcMR 0.0.1+, R 4.4+. Both TwoSampleMR 0.6.0 and ieugwasr 1.0 are the JWT-transition versions; older versions still expect deprecated OAuth.
Before using code patterns, verify installed versions match. If versions differ:
- R:
packageVersion('<pkg>')then?function_nameto verify parameters - CLI (plink, GCTA-GSMR):
<tool> --versionthen<tool> --help
If code throws an error referencing a function that has moved (e.g. ieugwasr::ld_clump vs TwoSampleMR::clump_data) or an OAuth token failure, introspect the installed API and adapt the example rather than retrying.
Mendelian Randomization
"Test whether trait X causally affects trait Y from GWAS summary statistics" -> Use genetic variants as instrumental variables (IVs) that satisfy three assumptions (relevance, independence, exclusion restriction) to estimate beta_causal = beta_outcome / beta_exposure under the IV framework (Davey Smith & Ebrahim 2003 IJE 32:1; Burgess Thompson 2017 SAGE textbook). Tool choice is a decision about the regime (one-sample vs two-sample, sparse vs polygenic, drug-target vs polygenic exposure) and the pleiotropy model (balanced, directional InSIDE, correlated horizontal). Wrong tool inflates Type-I error or attenuates true effects in a direction predictable from the bias structure.
- R:
TwoSampleMR::mr()orchestrates IVW + Egger + weighted median + weighted mode in one call - R:
MendelianRandomization::mr_ivw / mr_egger / mr_median / mr_mbe / mr_conmixper-method API (S4 objects; MR-RAPS is NOT in this package -- useTwoSampleMR::mr_raps()which wraps the GitHubmr.raps) - R:
MRPRESSO::mr_presso()global / outlier / distortion tests - R:
cause::cause()correlated horizontal pleiotropy mixture - R:
MVMR::strength_mvmr() + MVMR::ivw_mvmr()multivariable conditional-F + IVW
Statistical Model Taxonomy
| Method | Pleiotropy assumption | Min instruments | Strength | Fails when |
|---|---|---|---|---|
| IVW (fixed) | All IVs valid | 2 | Most efficient under no pleiotropy | Any directional or balanced pleiotropy inflates Type-I |
| IVW (random effects) | Balanced + InSIDE | 3 | Standard primary; absorbs heterogeneity into wider SE | Directional pleiotropy biases the point estimate |
| MR-Egger | Directional pleiotropy + InSIDE | 10+ for power | Detects + corrects directional pleiotropy via intercept (Bowden 2015 IJE 44:512) | NOME violated (I^2_GX < 0.9); SIMEX correction required; underpowered <10 SNPs |
| Weighted median | Up to 50% invalid IVs | 3 | Robust to a minority of bad instruments (Bowden 2016 Genet Epidemiol 40:304) | >50% invalid IVs |
| Weighted mode | Zero modal pleiotropy (ZEMPA) | 3 | Robust if the modal estimate is unbiased (Hartwig 2017 IJE 46:1985) | Bimodal pleiotropy; small numbers |
| MR-RAPS | Balanced pleiotropy + weak instruments | 10+ | Profile-score robust to weak-IV + balanced horizontal pleiotropy (Zhao 2020 Ann Stat 48:1742) | Strong directional pleiotropy; CRAN-archived 2025-03-01 |
| CAUSE | Correlated horizontal pleiotropy (CHP) | 100+ sig SNPs | Explicit shared-factor mixture; protects against CHP-driven false positives (Morrison 2020 Nat Genet 52:740) | Sparse polygenic exposures; <100 sig SNPs |
| GSMR + HEIDI-outlier | Outlier removal under InSIDE | 10+ | Alternative outlier detection; integrates with LD reference (Zhu 2018 Nat Commun 9:224) | Requires individual-level LD; HEIDI conservative |
| MR-PRESSO | Outlier-driven horizontal pleiotropy | 4+ | Global / outlier / distortion three-step (Verbanck 2018 Nat Genet 50:693) | Blind to CHP; computationally heavy at large NbDistribution |
| MVMR (IVW) | Conditional independence after measured pleiotropy | 1+ per exposure | Accounts for measured horizontal pleiotropy via multivariable regression (Sanderson 2019 IJE 48:713) | Conditional F < 10 on any exposure |
| MR-Clust | Heterogeneous causal effects (multiple mechanisms) | 30+ | Clusters SNPs by their causal-effect estimate (Foley 2020 Bioinformatics 37:531) | Single causal mechanism; small instrument sets |
| Contamination mixture | Mixture of valid + invalid IVs | 10+ | Profile-likelihood mixture (Burgess 2020 Nat Commun 11:376) | Sparse signal |
| LCV | Genome-wide; distinguishes causation vs genetic correlation | All SNPs | Tests gcp parameter using LDSC-style block jackknife (O'Connor & Price 2018 Nat Genet 50:1728) | Two-trait covariance dominated by a third confounder |
| LHC-MR | Bidirectional + heritable confounder | All SNPs | Joint likelihood over genome-wide markers; estimates both directions + confounder (Darrous 2021 Nat Commun 12:7274) | Computationally heavy; rare-variant trait |
| MRlap | Sample overlap + winner's curse + weak-IV jointly | Genome-wide sumstats | LDSC-scaffolded joint correction (Mounier & Kutalik 2023 Genet Epidemiol 47:314) | LDSC intercept poorly estimated (h^2 < 0.05); non-EUR without matched LD scores |
| Doubly-Ranked MR (DRMR) | Non-linear, non-parametric | 5+ strata | Non-parametric stratification (Tian 2023 PLoS Genet 19:e1010823); replaces residual stratification when linearity fails | Continuous exposures only; needs individual-level data; Hamilton 2023 medRxiv 23293658 shows stratum-specific bias from age/sex |
Methodology evolves; benchmark consensus shifts every 2-3 years. Verify against the current Slob & Burgess 2020 Genet Epidemiol, Burgess 2023 Wellcome Open Res "Guidelines for performing Mendelian randomization" (v3+), and STROBE-MR 2021 reporting standards before locking a method as primary.
Decision Tree by Experimental Scenario
| Scenario | Primary method | Sensitivity battery | Why |
|---|---|---|---|
| Standard two-sample, independent cohorts, polygenic exposure | IVW (random) | Egger + weighted median + MR-PRESSO + MR-RAPS + Steiger | Default; covers balanced, directional, outlier, weak-IV regimes |
| One-sample (e.g. UK Biobank both ends) | IVW with weak-IV-aware (MR-RAPS) | Egger + LCV + jackknife SE | One-sample bias formulas: Bowden 2019 IJE 48:728; F-stat floor shifts to F >= 20; jackknife SE preferred over analytic at one-sample scale; do NOT run exposure GWAS and outcome GWAS on the same individuals then claim two-sample (Hartwig 2021 collider bias); within-stratum MR (e.g. "MR among smokers") risks collider bias from the stratification variable |
| Partial sample overlap (UKB exposure + UKB outcome) | MR-RAPS with overlap correction | Sample-overlap-adjusted IVW (Burgess 2016 Genet Epidemiol 40:597) | Bias is intermediate, proportional to z-score correlation |
| Drug-target / cis-MR (cis-pQTL, cis-eQTL) | IVW restricted to cis window | Colocalization PP.H4 + LD-prune within window | Exclusion restriction relaxed because the protein/transcript directly mediates effect (Schmidt 2020 Nat Commun 11:3255) |
| MVMR for measured pleiotropy (e.g. LDL adjusted for HDL/TG) | MVMR::ivw_mvmr | Conditional F + Q_A heterogeneity | Required when exposures correlate via shared SNPs |
| Mediation MR (X -> M -> Y) | MVMR difference of total vs direct | Two-step MR + product-of-coefficients (Carter 2021 Eur J Epidemiol 36:465) | Network MR; quantifies indirect effect |
| Polygenic exposure with potential CHP (e.g. BMI -> CHD) | CAUSE (primary) + IVW (secondary) | Egger + MR-PRESSO + LCV | CAUSE explicitly models CHP via shared-factor; needs >=100 sig SNPs |
| Binary outcome (e.g. T2D) on linear scale | IVW on log-OR with log-additive coding | All sensitivity on log-OR; report exp(beta) | Linearity of MR estimating equation holds on log-OR not OR |
| Time-to-event (Cox) outcome | IVW on log-HR | Burgess & Labrecque 2018 Eur J Epidemiol 33:947 framework | Non-collapsibility caveats apply |
| Non-linear MR (e.g. alcohol J-curve) | DRMR (Tian 2023) + residual stratification side-by-side | Negative-control outcomes (genotype-vs-sex within strata); Hamilton 2023 limitation cited | Both methods produce stratum-specific bias from age/sex effects (Hamilton 2023 medRxiv 23293658); pre-specify the non-linear hypothesis, do not data-snoop the J-curve, report negative-control sanity checks |
| Single-patient rare disease | Not MR -- use FRASER/DROP outlier framework | See alternative-splicing/outlier-splicing-detection | MR requires summary stats; n=1 is wrong regime |
One-Sample vs Two-Sample Bias Direction
| Design | Weak-IV bias direction | Reason |
|---|---|---|
| One-sample, F<10 | Toward confounded observational estimate (overestimates causal effect if confounding is in same direction) | Sample correlation between IV-X and IV-Y residuals |
| Two-sample non-overlapping, F<10 | Toward null | Independent samples decouple residuals (Burgess 2011 IJE 40:755) |
| Two-sample with partial overlap | Intermediate; proportional to overlap fraction and z-score correlation | Burgess 2016 Genet Epidemiol 40:597; correction available |
Operational rule: Whenever both GWAS came from UK Biobank (or any single biobank), treat the analysis as one-sample-equivalent and prefer MR-RAPS as primary. Treating it as "two-sample because separate GWAS files" is a common error and produces overestimates.
MRlap: unified correction for sample overlap + winner's curse + weak instruments
MRlap (Mounier & Kutalik 2023 Genet Epidemiol 47:314) jointly corrects three biases that previously required three separate tools: sample overlap, winner's curse, and weak-instrument bias. It builds on an LDSC scaffold (cross-trait LD-score regression intercept estimates the overlap-induced covariance) and reweights the IVW estimate against the analytical bias-correction formula.
remotes::install_github('n-mounier/MRlap') # never on CRAN; bioconductor unsuitable
library(MRlap)
fit <- MRlap(
exposure = gwas_X_df, exposure_name = 'BMI',
outcome = gwas_Y_df, outcome_name = 'T2D',
ld = 'eur_w_ld_chr/', hm3 = 'w_hm3.snplist', # LDSC reference files
MR_threshold = 5e-8, MR_pruning_dist = 500, MR_pruning_LD = 0.05
)
fit$MRcorrection$corrected_effect # overlap + winner's curse + weak-IV corrected
fit$MRcorrection$corrected_effect_se
fit$LDSC$h2_LDSC # heritability sanity check
fit$LDSC$lambda # sample-overlap-induced lambda; ~0 means no overlap
Decision rule -- prefer MRlap when: (a) any sample overlap is suspected, (b) only sumstats are available (no individual-level data for re-running GWAS on disjoint samples), (c) exposure discovery and outcome were both run inside the same biobank (UKB-on-UKB, FinnGen-on-FinnGen). MRlap returns NA / unstable estimates when h^2 < 0.05; in that regime, fall back to Burgess 2016 overlap-corrected IVW plus MR-RAPS for the weak-IV component.
Drug-Target / cis-MR Framework
cis-MR restricts instruments to the cis-regulatory window of the gene encoding the protein/transcript exposure (Schmidt 2020 Nat Commun 11:3255), relaxing the exclusion-restriction assumption because the protein product directly mediates the SNP's effect on the outcome. Operational core: extract cis-pQTL/cis-eQTL within +/-500 kb of the gene; clump at r2 < 0.1 (looser than polygenic MR to retain power within a narrow window); require colocalization PP.H4 >= 0.7; flag protein-altering variants (PAV) which can break SomaScan/Olink aptamer/antibody binding rather than reflect biology.
Full drug-target cis-MR workflow including UKB-PPP / deCODE / Fenland pQTL panels, PAV flagging, Olink vs SomaScan replication (~15-30% cross-platform disagreement), and the operational claim ladder lives in causal-genomics/proteome-mr-drug-target. Use that skill for any drug-target nomination.
Binary outcomes and non-collapsibility
MR with logistic-GWAS sumstats returns per-allele log-OR on the population-averaged (marginal) scale, NOT the conditional log-OR (Burgess & Labrecque 2018 Eur J Epidemiol 33:947; Burgess 2017 Stat Methods Med Res 26:2333). For rare disease (prevalence < 10%), OR ~= RR ~= HR and the distinction is harmless. For common disease, OR diverges from RR/HR and the MR estimate cannot be back-converted to a conditional effect without strong assumptions; report as "per 1-SD increase in genetically-predicted X, OR for Y = ..." rather than implying an individual-level intervention effect.
Collider bias in case-only or disease-progression cohorts (Hu 2022 IJE 51:1289): conditioning the outcome on disease status (e.g. studying progression among diabetics) opens a collider path between any cause of disease and any cause of progression. MR within affected subsets without explicit adjustment for selection probability is fragile; weight by inverse probability of selection or restrict claims to the unconditioned population.
Per-Method Failure Modes
IVW under directional pleiotropy
Trigger: Several SNPs affect the outcome through pathways not via the exposure, in a consistent direction.
Mechanism: IVW is a weighted regression through origin; non-zero mean pleiotropy shifts the slope.
Symptom: Egger intercept p < 0.05 with non-zero estimate; IVW differs from weighted median; MR-PRESSO global test p < 0.05.
Fix: Use Egger (if I^2_GX >= 0.9 -- otherwise SIMEX-correct via the simex package applied to the Egger fit, treating se.exposure as measurement error in beta.exposure); cross-check with weighted median, MR-PRESSO, and CAUSE; report IVW only as one of a panel, never alone. The MendelianRandomization::mr_egger() function accepts distribution='normal' and reports NOME-corrected estimates internally but does NOT expose a SIMEX wrapper.
Weak-instrument bias direction
Trigger: Mean per-instrument F-statistic < 10, or several individual F < 10.
Mechanism: Weak IVs amplify finite-sample correlation between IV-X and IV-Y errors; bias direction depends on overlap regime (see table above).
Symptom: Estimates shift markedly when removing the weakest instruments; one-sample MR estimates much larger than two-sample.
Fix: Compute F per instrument from the EXPOSURE GWAS, not the outcome; exclude F < 10; use MR-RAPS (handles weak IVs by design); for two-sample, also report unweighted IVW (less weak-IV-bias-inflated than weighted in some regimes).
Winner's curse at P~5e-8
Trigger: Discovery GWAS is the source of both instrument selection and effect-size estimates.
Mechanism: SNPs that just cross 5e-8 in discovery have over-estimated effect sizes (regression toward the mean in independent replication); MR uses inflated beta_X, biasing causal estimate.
Symptom: MR effect shrinks substantially when using effect sizes from an independent replication GWAS.
Fix: (1) Three-sample design (discovery / replication-for-instrument-effect / outcome) where feasible. (2) When sumstats-only: MRlap (Mounier 2023), MR-SimSS (sample-splitting from sumstats), or RIVW (Ma 2023 JASA -- rerandomized IVW) jointly correct winner's curse + weak IVs + overlap. (3) Sadreev 2024 IJE 52:1209 empirical magnitude: 5-15% bias inflation at p < 5e-8; larger at relaxed thresholds.
NOME violation invalidating Egger
Trigger: Running MR-Egger with I^2_GX < 0.9.
Mechanism: Egger assumes NO Measurement Error in exposure effect sizes (NOME); when violated, Egger slope is attenuated toward null with reciprocal bias on the intercept.
Symptom: mr_pleiotropy_test() Egger estimate disagrees with weighted median in magnitude but agrees in direction; Isq() function returns <0.9.
Fix: Compute Isq(beta_X, se_X) (Bowden 2016 IJE 45:1961); if <0.9, apply SIMEX correction via simex package or report Egger as exploratory only. The MendelianRandomization package's mr_egger() returns a NOME-aware corrected estimate.
Steiger filter false flag under unmeasured confounding
Trigger: Applying steiger_filtering() on traits with unmeasured shared confounders (e.g. SES).
Mechanism: Steiger compares variance explained in exposure vs outcome per SNP; an unmeasured confounder upstream of both produces SNPs that explain more variance in the outcome than the exposure, falsely flagging "reverse causation" (Hemani Tilling 2022 Wellcome Open Res 7:14).
Symptom: Many SNPs flagged as wrong-direction yet biology and prior MR support forward causation.
Fix: Treat Steiger as a heuristic, not gospel; cross-validate direction with bidirectional MR (forward + reverse with independent instrument sets); for known-confounder-rich domains (psychiatric traits, SES proxies) use LCV or LHC-MR instead, which jointly model confounders.
Palindromic SNP harmonization
Trigger: SNPs with alleles A/T or C/G near MAF 0.5.
Mechanism: Strand orientation is ambiguous for palindromic SNPs when allele frequencies are intermediate; flipping introduces sign errors that look like pleiotropy.
Symptom: harmonise_data() reports many palindromic SNPs dropped; remaining SNPs show heterogeneity from a handful.
Fix: Default action = 2 (infer from allele frequencies) drops MAF~0.5 palindromes; action = 3 drops ALL palindromes (most conservative); never use action = 1 (assumes forward strand) unless both GWAS are guaranteed to use the same strand convention. Document choice in methods.
Quantitative Thresholds
| Threshold | Source | Rationale |
|---|---|---|
| F-statistic > 10 per instrument | Staiger & Stock 1997 (linear IV) | Heuristic; debated (Burgess 2011 IJE; Zhao 2020 argues 10 is too low for one-sample) |
| Conditional F > 10 per exposure (MVMR) | Sanderson 2019 IJE 48:713 | Total F can be high while conditional F low; per-exposure F is what matters |
I^2_GX >= 0.9 for Egger | Bowden 2016 IJE 45:1961 | NOME assumption; below this, SIMEX correction required |
| CAUSE >= 100 significant SNPs | Morrison 2020 Nat Genet 52:740 | Mixture model needs signal density for shared-factor estimation |
| Egger >= 10 instruments | Bowden 2015 IJE 44:512 | Power for slope test in weighted regression |
| Sample-overlap z-score correlation | Burgess 2016 Genet Epidemiol 40:597 | Use LDSC bivariate intercept as proxy; correct IVW SE accordingly |
| Clumping r2 < 0.001, 10 Mb window | TwoSampleMR default; matches GWAS LD norms | Polygenic MR; cis-MR uses r2 < 0.1 within window |
| Steiger p < 0.05 | Hemani 2017 PLoS Genet 13:e1007081 | Heuristic; subject to confounder caveat (Hemani Tilling 2022) |
| MR-PRESSO NbDistribution | 1000 exploratory; >= 5000 publication; >= 10000 stringent | Verbanck 2018 Nat Genet 50:693; precision of global p-value scales with NbDistribution |
| STROBE-MR all 20 items | Skrivankova 2021 JAMA 326:1614; BMJ 375:n2233 | Required since 2022 by most epidemiology journals |
| Bonferroni for pheWAS-MR | Standard | Many outcomes; FDR if exploratory |
TwoSampleMR Standard Workflow
Goal: Produce a defensible primary IVW estimate plus a full sensitivity battery from two-sample summary statistics.
Approach: Extract genome-wide significant instruments -> clump (local plink preferred) -> extract outcome -> harmonise -> mr -> pleiotropy + heterogeneity + leave-one-out -> Steiger -> MR-PRESSO -> report.
library(TwoSampleMR)
library(ieugwasr)
exposure_raw <- read_exposure_data(
filename = 'exposure_gwas.tsv', sep = '\t',
snp_col = 'SNP', beta_col = 'BETA', se_col = 'SE',
effect_allele_col = 'A1', other_allele_col = 'A2',
eaf_col = 'EAF', pval_col = 'P'
)
exposure_sig <- subset(exposure_raw, pval.exposure < 5e-08) # genome-wide significance
# F-statistic computed from EXPOSURE (Burgess 2011); ratio of squared effect to its variance
exposure_sig$f_stat <- (exposure_sig$beta.exposure / exposure_sig$se.exposure)^2
exposure_sig <- subset(exposure_sig, f_stat >= 10) # Staiger-Stock 1997 weak-IV heuristic
clumped <- ld_clump(
data.frame(rsid = exposure_sig$SNP, pval = exposure_sig$pval.exposure),
clump_r2 = 0.001, clump_kb = 10000, # polygenic MR convention
plink_bin = genetics.binaRies::get_plink_binary(),
bfile = '1kg_EUR/EUR'
)
exposure_dat <- subset(exposure_sig, SNP %in% clumped$rsid)
outcome_dat <- read_outcome_data(
filename = 'outcome_gwas.tsv', snps = exposure_dat$SNP, sep = '\t',
snp_col = 'SNP', beta_col = 'BETA', se_col = 'SE',
effect_allele_col = 'A1', other_allele_col = 'A2',
eaf_col = 'EAF', pval_col = 'P'
)
dat <- harmonise_data(exposure_dat, outcome_dat, action = 2) # infer from EAF; drops MAF~0.5 palindromes
primary <- mr(dat, method_list = c('mr_ivw', 'mr_egger_regression',
'mr_weighted_median', 'mr_weighted_mode'))
heterogeneity <- mr_heterogeneity(dat) # Cochran Q
pleiotropy <- mr_pleiotropy_test(dat) # Egger intercept
loo <- mr_leaveoneout(dat) # influential-SNP check
steiger <- directionality_test(dat) # variance-explained direction
MR-PRESSO Outlier Detection
Goal: Detect horizontal-pleiotropy outliers, remove them, and test whether the corrected estimate differs from the uncorrected one (distortion test).
Approach: Three-step framework: global test (presence of pleiotropy), outlier test (per-SNP), distortion test (effect change after outlier removal).
library(MRPRESSO)
presso <- mr_presso(
BetaOutcome = 'beta.outcome', BetaExposure = 'beta.exposure',
SdOutcome = 'se.outcome', SdExposure = 'se.exposure',
OUTLIERtest = TRUE, DISTORTIONtest = TRUE,
data = dat, NbDistribution = 10000, # >= 10000 for publication-grade p-value precision
SignifThreshold = 0.05
)
print(presso$`MR-PRESSO results`$`Global Test`) # any pleiotropy
print(presso$`MR-PRESSO results`$`Distortion Test`) # change after outlier removal
outlier_snps <- which(presso$`MR-PRESSO results`$`Outlier Test`$Pvalue < 0.05 / nrow(dat))
CAUSE for Correlated Horizontal Pleiotropy
CAUSE (Morrison 2020 Nat Genet 52:740) fits a shared-factor mixture to genome-wide sumstats and compares causal vs sharing-only models by delta-ELPD. Workflow: gwas_merge() -> sample ~1M variants for est_cause_params() -> filter sig SNPs (P < 1e-3) and optionally LD-prune -> cause(X, variants, param_ests). Needs >= 100 sig SNPs. Full annotated example and ELPD interpretation in causal-genomics/pleiotropy-detection.
MVMR with Conditional F
Goal: Estimate the causal effect of exposure X1 on Y, adjusting for measured pleiotropy via X2.
Approach: Format exposures + outcome into MVMR object; compute conditional F per exposure (>10 required); run multivariable IVW; report Q_A heterogeneity.
library(MVMR)
mvmr_dat <- format_mvmr(
BXGs = cbind(dat$beta.x1, dat$beta.x2),
BYG = dat$beta.y,
seBXGs = cbind(dat$se.x1, dat$se.x2),
seBYG = dat$se.y,
RSID = dat$SNP
)
condF <- strength_mvmr(r_input = mvmr_dat, gencov = 0) # per-exposure conditional F
# condF must be > 10 for EACH exposure (Sanderson 2019); total F is misleading
mv_ivw <- ivw_mvmr(r_input = mvmr_dat)
mv_qa <- qhet_mvmr(r_input = mvmr_dat, pcor = 0) # Q_A heterogeneity
gencov = 0 is valid ONLY if the exposure GWAS samples don't overlap; for overlapping exposures use the bivariate LDSC intercept matrix as gencov. If any conditional F < 10, the IVW point estimate is weak-IV-biased; switch to the Q-minimization estimator: qhet_mvmr(r_input, pcor, CI = TRUE, iterations = 1000) (Sanderson 2021 Stat Med 40:5434), which minimizes Q-statistic heterogeneity rather than weighting by inverse variance and is robust to weak conditional instruments.
Bidirectional and Steiger
exposure_rev <- format_data(outcome_raw, type = 'exposure') # treat former outcome as exposure
outcome_rev <- format_data(exposure_raw, type = 'outcome')
dat_rev <- harmonise_data(exposure_rev, outcome_rev, action = 2)
results_rev <- mr(dat_rev, method_list = 'mr_ivw')
dat_filt <- steiger_filtering(dat) # per-SNP; flags SNPs where variance(Y) > variance(X)
dir_test <- directionality_test(dat) # global; correct_causal_direction == TRUE if forward
Report direction operationally: forward p < 5e-8 with reverse p > 0.05; directionality_test() correct_causal_direction == TRUE with Steiger p < 0.05; point estimate in reverse direction has |effect| substantially smaller than forward (reflecting reverse-instrument-strength asymmetry). Example sentence: "Forward MR showed BMI -> T2D (IVW beta = 0.85, p = 2e-15); reverse MR was null (IVW beta = 0.02, p = 0.61); Steiger directionality test favored the forward direction (p = 3e-7)."
Reconciliation: When Methods Disagree
| Pattern | Likely cause | Action |
|---|---|---|
| IVW sig, Egger null with non-zero intercept | Directional pleiotropy | Report Egger as primary if I^2_GX >= 0.9; otherwise SIMEX-correct |
| IVW sig, weighted median sig, mode null | Mode underpowered or bimodal pleiotropy | Trust the agreement of IVW + median |
| MR-PRESSO distortion test sig | Outliers materially shift estimate | Report PRESSO-adjusted estimate as primary |
| CAUSE sig, IVW sig, same direction | High-confidence causal claim | Report both; emphasize CAUSE rules out CHP |
| CAUSE null, IVW sig, same direction | CHP indistinguishable from causation | Downgrade to "consistent with causation but not separable from CHP" |
| Forward MR sig, reverse MR also sig | Bidirectional causation OR confounded | Use LHC-MR or LCV for genome-wide resolution |
IVW sig but mean F << 10 in one-sample design | Weak-IV bias toward observational | Re-run with MR-RAPS; report adjusted estimate |
Operational rule for publication: Primary IVW + concordant Egger (or weighted median if NOME violated) + non-significant MR-PRESSO global test + Steiger correct direction = publication-ready evidence. CAUSE concordance is required when the exposure is polygenic and the prior on CHP is high (e.g. BMI -> outcome, lipids -> outcome). Single-method "significant IVW" claims should be downgraded to exploratory.
Cohort Gotchas
- UKB GWAS commonly include non-EUR participants; pan-UKB EUR/AFR/EAS/SAS subsets are separate releases. Mixing ancestries inflates instrument strength via population stratification rather than biology.
- FinnGen DF12 (2024) cohort: Finnish founder effects produce narrower LD blocks and higher winner's curse magnitude than UKB at matched sample size.
- MVP / GBMI / AoU: multi-ancestry meta-analyses; stratify by ancestry before MR or use ancestry-specific subsets.
- UKB-on-UKB MR creates one-sample-equivalent bias regardless of "different GWAS file" appearance; use MRlap or move the outcome to an external cohort (FinnGen, BBJ, MVP).
Anticipated Reviewer Pushback
| Pushback | Standard response |
|---|---|
| "Sample overlap between exposure and outcome GWAS?" | LDSC bivariate intercept reported; MRlap or sample-overlap-corrected IVW applied; document overlap fraction |
| "Weak instruments (F < 10)?" | F computed from EXPOSURE; per-instrument and mean F reported; MR-RAPS used as sensitivity if mean F borderline |
| "Horizontal pleiotropy?" | IVW + Egger + weighted median + weighted mode + MR-PRESSO; if rg > 0.3 also CAUSE (see pleiotropy-detection) |
| "Reverse causation?" | Steiger filter applied; bidirectional MR ran; LCV gcp reported if rg > 0.3 |
| "Pre-registered?" | OSF protocol filed; STROBE-MR all 20 items reported |
| "InSIDE assumption?" | INstrument Strength Independent of Direct Effect -- pleiotropic effects alpha uncorrelated with instrument-exposure effects gamma. Tested via Egger intercept + CHP-aware sensitivity (CAUSE) |
Common Errors
| Error / symptom | Cause | Solution |
|---|---|---|
| F-statistic computed from outcome | Reading off beta.outcome / se.outcome | Compute from EXPOSURE; outcome F is meaningless for IV strength |
| All instruments dropped at harmonise | All SNPs palindromic; or allele coding mismatch | Verify A1/A2 conventions match; try action = 3 to drop, not assume |
Unauthorized / 403 from OpenGWAS | OAuth deprecated May 2024; JWT token required | Generate at api.opengwas.io -> set OPENGWAS_JWT=<token> in ~/.Renviron -> restart R -> verify with ieugwasr::get_opengwas_jwt(). For production, skip the API: use ld_clump_local() with local 1KG bfile (saves rate-limit + auth headaches). |
| Conditional F vs total F confusion in MVMR | Reporting mean(F) not strength_mvmr() per-exposure | Always use MVMR::strength_mvmr() and report each column |
install.packages("mr.raps") fails | CRAN-archived 2025-03-01 | remotes::install_github('qingyuanzhao/mr.raps'); TwoSampleMR's mr_raps() wrapper internally calls this package |
| MR-PRESSO returns NA p-value | NbDistribution too small; signal too thin | Increase to >= 10000; check that >= 4 SNPs remain after harmonization |
| Egger intercept "highly significant" with 5 SNPs | Underpowered Egger over-fits the slope | Egger needs >= 10 SNPs; below that, intercept is unreliable |
| Sample-overlap correction ignored | Treating UKB-on-UKB as two-sample | Apply Burgess 2016 correction; or use MR-RAPS |
cause() runs forever | Default model fit on too many SNPs | Filter to sig SNPs (P < 1e-3) before cause(); est_cause_params uses the random subset |
| MAF column missing -> harmonise action=2 silently downgrades | EAF unavailable | Provide EAF or use action = 3 and document the loss |
Tool Installation Notes
# CRAN-stable
install.packages(c('remotes', 'MendelianRandomization', 'MVMR', 'coloc'))
# GitHub-only or recently archived
remotes::install_github('MRCIEU/TwoSampleMR') # primary orchestrator
remotes::install_github('MRCIEU/ieugwasr') # OpenGWAS client + local clumping
remotes::install_github('rondolab/MR-PRESSO') # never on CRAN
remotes::install_github('qingyuanzhao/mr.raps') # CRAN-archived 2025-03-01
remotes::install_github('jean997/cause') # depends on mixsqp; suggests Rfast
remotes::install_github('cnfoley/mrclust') # heterogeneity clusters
remotes::install_github('LizaDarrous/lhcMR') # bidirectional + heritable confounder
remotes::install_github('HDTian/DRMR') # doubly-ranked stratification
remotes::install_github('n-mounier/MRlap') # joint overlap + winner's-curse + weak-IV correction
TwoSampleMR::mr_raps() is a thin wrapper that calls mr.raps::mr.raps() under the hood; the GitHub mr.raps install above is therefore required. The MendelianRandomization package does NOT export mr_raps() (verify with ls('package:MendelianRandomization')); only TwoSampleMR offers a MR-RAPS entry point. For local clumping, install plink2 and download a 1KG EUR (or matched-ancestry) reference bfile.
STROBE-MR Reporting
STROBE-MR (Skrivankova 2021 JAMA 326:1614; BMJ 375:n2233): 20-item checklist required by Eur J Epi / Nat Genet / JAMA / Diabetologia since 2022. See causal-genomics/pleiotropy-detection for the per-item table -- that skill is the consolidated reporting reference for the full MR + sensitivity battery.
References
- Davey Smith G & Ebrahim S 2003 IJE 32:1 (foundational framework)
- Burgess S & Thompson SG 2017 SAGE textbook (MR canonical reference)
- Bowden J et al 2015 IJE 44:512 (MR-Egger)
- Bowden J et al 2016 Genet Epidemiol 40:304 (weighted median)
- Hartwig FP et al 2017 IJE 46:1985 (weighted mode)
- Zhao Q et al 2020 Ann Stat 48:1742 (MR-RAPS)
- Morrison J et al 2020 Nat Genet 52:740 (CAUSE)
- Zhu Z et al 2018 Nat Commun 9:224 (GSMR + HEIDI)
- Verbanck M et al 2018 Nat Genet 50:693 (MR-PRESSO)
- Sanderson E et al 2019 IJE 48:713 (MVMR conditional F)
- Foley CN et al 2020 Bioinformatics 37:531 (MR-Clust)
- Burgess S et al 2020 Nat Commun 11:376 (contamination mixture)
- O'Connor LJ & Price AL 2018 Nat Genet 50:1728 (LCV)
- Darrous L et al 2021 Nat Commun 12:7274 (LHC-MR)
- Tian H et al 2023 PLoS Genet 19:e1010823 (DRMR)
- Burgess S et al 2016 Genet Epidemiol 40:597 (sample-overlap correction)
- Schmidt AF et al 2020 Nat Commun 11:3255 (drug-target / cis-MR framework)
- Hemani G & Tilling K 2022 Wellcome Open Res 7:14 (Steiger filter caveat)
- Skrivankova VW et al 2021 JAMA 326:1614 (STROBE-MR statement)
- Mounier N & Kutalik Z 2023 Genet Epidemiol 47:314 (MRlap joint correction)
- Sanderson E et al 2021 Stat Med 40:5434 (qhet_mvmr Q-minimization estimator)
- Burgess S & Labrecque JA 2018 Eur J Epidemiol 33:947 (binary outcomes non-collapsibility)
- Hu Y et al 2022 IJE 51:1289 (collider bias in case-only / progression cohorts)
- Bowden J et al 2019 IJE 48:728 (one-sample MR bias formulas)
- Hamilton FW et al 2023 medRxiv 23293658 (NLMR stratum-specific bias critique)
- Sadreev II et al 2024 IJE 52:1209 (winner's curse empirical magnitude)
- Ma S et al 2023 JASA (RIVW rerandomized IVW)
Related Skills
- causal-genomics/colocalization-analysis - Confirm shared causal variant for cis-MR drug-target work
- causal-genomics/pleiotropy-detection - Deep dive on MR-PRESSO, Egger, contamination mixture diagnostics
- causal-genomics/fine-mapping - Credible-set construction at instrument loci before cis-MR
- causal-genomics/mediation-analysis - Two-step MR and MVMR difference method for X -> M -> Y mediation
- causal-genomics/transcriptome-wide-association - TWAS as MR-adjacent framework for gene-level inference
- causal-genomics/proteome-mr-drug-target - Dedicated cis-pQTL drug-target MR workflow with UKB-PPP/deCODE/Fenland and coloc triangulation
- population-genetics/association-testing - GWAS source for instrument selection
- clinical-databases/clinvar-lookup - Annotate instrument SNPs for downstream interpretation