Master causal inference. Complete guide to causal machine learning, causal graphs, treatments, and discovering cause-and-effect relationships from data.
Introduction: Causal Inference
Correlation is not causation. Everyone knows this.
Yet it’s the hardest lesson to apply.
You observe: “Users who receive the email campaign spend more.”
You conclude: “Email causes spending.”
You increase email frequency.
Spending drops.
What happened? The original relationship was correlation (users inclined to spend also open emails), not causation. More emails didn’t cause spending; they annoyed users.
Causal inference—rigorously determining cause-and-effect—is one of ML’s hardest and most important problems.
Why important? Because:
- Policy decisions depend on causal effects (“Will this intervention help?”)
- Predictions fail when underlying causal structure changes
- Optimization requires understanding what actions cause outcomes
- Fairness requires causal reasoning (discrimination vs correlation)
This guide covers causal inference: from understanding why correlation fails to methods for discovering causation from data to practical applications.
Correlation vs Causation
Why Correlation Fails
Example: Coffee and Heart Disease
Observation: “People who drink more coffee have more heart disease.”
Possible explanations:
- Coffee causes heart disease (causal)
- Unhealthy people drink more coffee to stay awake (reverse causation)
- Coffee drinkers smoke more (confounding: smoking causes both)
All are compatible with the observed correlation.
Confounding
Most common reason correlation ≠ causation.
Confounding Variable (Confounder): A variable that affects both treatment and outcome.
            Smoking (confounder)
           ↙                  ↘
Coffee Drinking          Heart Disease

Observed: coffee drinking and heart disease rise together
But: smoking is the real cause of both
Result: Coffee appears to cause disease when it doesn’t.
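This story can be simulated in a few lines. All numbers below (smoking rate, disease risks, coffee habits) are invented for illustration, and coffee has zero causal effect by construction; yet the naive comparison shows a large gap that vanishes once we stratify by smoking:

```python
import random

random.seed(0)

population = []
for _ in range(20000):
    smoker = random.random() < 0.3                          # 30% smoke (made up)
    coffee = random.random() < (0.8 if smoker else 0.3)     # smokers drink more coffee
    disease = random.random() < (0.25 if smoker else 0.05)  # risk depends ONLY on smoking
    population.append((smoker, coffee, disease))

def disease_rate(rows):
    return sum(d for _, _, d in rows) / len(rows)

# Naive comparison: coffee drinkers vs non-drinkers (confounded by smoking).
naive_gap = (disease_rate([r for r in population if r[1]])
             - disease_rate([r for r in population if not r[1]]))

# Stratified comparison: hold smoking fixed, then compare.
strata_gaps = []
for s in (True, False):
    stratum = [r for r in population if r[0] == s]
    strata_gaps.append(disease_rate([r for r in stratum if r[1]])
                       - disease_rate([r for r in stratum if not r[1]]))

print(f"naive coffee 'effect':     {naive_gap:+.3f}")      # large, but spurious
print(f"effect within smokers:     {strata_gaps[0]:+.3f}")  # near zero
print(f"effect within non-smokers: {strata_gaps[1]:+.3f}")  # near zero
```

Stratifying on the confounder is the simplest form of "adjustment" discussed later in this guide.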
Selection Bias
Study participants differ from population.
Example: Online survey about dieting
Who responds? People interested in dieting
Biased sample: Over-represents diet enthusiasts
Results don't generalize to population
Causal Graphs and DAGs
Directed Acyclic Graph (DAG): Visual representation of causal relationships.
Example: Income and Education
Parents' Income        Intelligence
            ↘          ↙        ↘
          Education Level ──→ Personal Income
Interpretation:
- Parents’ income affects education (direct cause)
- Intelligence affects education (direct cause)
- Education affects personal income (direct cause)
- Intelligence also affects personal income directly
- No cycles (acyclic)
Reading DAGs
Arrow → means: Causality flows this direction
X → Y: X causes Y (direct effect)
X → Z → Y: X affects Y through Z (indirect effect)
X ← Z → Y: Z affects both X and Y (confounding)
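One lightweight way to work with a DAG in code is a plain adjacency mapping. The sketch below uses a hypothetical three-node graph (Z confounds X and Y, matching the patterns above) and shows how to read off parents, find common causes, and verify acyclicity:

```python
# DAG as a mapping from each node to the nodes it directly causes.
dag = {
    "Z": ["X", "Y"],   # Z causes both X and Y (confounder)
    "X": ["Y"],        # X causes Y (direct effect)
    "Y": [],
}

def parents(dag, node):
    """Nodes with an arrow pointing into `node`."""
    return [p for p, children in dag.items() if node in children]

def confounders(dag, treatment, outcome):
    """Common causes: parents of both treatment and outcome."""
    return sorted(set(parents(dag, treatment)) & set(parents(dag, outcome)))

def is_acyclic(dag):
    """Depth-first search; a back-edge means a directed cycle."""
    visiting, done = set(), set()
    def visit(n):
        if n in done:
            return True
        if n in visiting:
            return False
        visiting.add(n)
        ok = all(visit(c) for c in dag.get(n, []))
        visiting.discard(n)
        done.add(n)
        return ok
    return all(visit(n) for n in dag)

print(confounders(dag, "X", "Y"))  # ['Z']
print(is_acyclic(dag))             # True
```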
Constructing DAGs
Based on:
- Subject matter expertise: Understand how variables relate
- Temporal order: Causes must precede effects
- Prior research: What’s known about relationships
- Domain knowledge: How do systems work
Critical step: Getting the DAG right determines whether causal inference works.
Causal Identification
Question: Can we estimate causal effect from data?
Answer depends on causal structure.
Identifiability
Identifiable: Can estimate causal effect from observational data
Parent Income → Education → Personal Income
Even without randomization, can estimate:
- Effect of education on income
(Using causal inference methods)
Non-Identifiable: Cannot estimate from observational data
Education ← Ability → Earnings
(Ability is unobserved)
Cannot disentangle:
- Effect of education on earnings
- Effect of ability on earnings
Need randomization (or another identification strategy)
Key Concepts
Back-door Criterion: Identifies the confounding paths between treatment and outcome that must be blocked.
Treatment ← Confounder → Outcome
This "back-door" path creates correlation without causation
Must block it (adjust for confounder)
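Blocking the back-door path has a concrete formula: with one observed confounder Z, P(Y=1 | do(T=t)) = Σ_z P(z) · P(Y=1 | t, z), i.e. average the stratum-specific outcome rates using the population weights of Z, not the weights among the treated. A worked toy example with hand-picked (hypothetical) probabilities:

```python
# P(Y=1 | T, Z) and P(Z), chosen by hand for illustration only.
p_y_given_tz = {(1, 1): 0.5, (1, 0): 0.3, (0, 1): 0.4, (0, 0): 0.2}
p_z = {1: 0.4, 0: 0.6}

def p_y_do(t):
    # Back-door adjustment: weight strata by the population distribution of Z.
    return sum(p_z[z] * p_y_given_tz[(t, z)] for z in p_z)

ate = p_y_do(1) - p_y_do(0)
print(f"P(Y=1|do(T=1)) = {p_y_do(1):.2f}")  # 0.38
print(f"P(Y=1|do(T=0)) = {p_y_do(0):.2f}")  # 0.28
print(f"ATE = {ate:.2f}")                   # 0.10
```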
Front-door Criterion: Can estimate causation through mediating variables.
Treatment → Mediator → Outcome
Even with unobserved confounding of Treatment-Outcome
Can estimate the causal effect if the mediating path is fully observed
Randomized Controlled Trials (RCTs)
Gold Standard: Randomly assign treatment, measure outcome.
Why RCTs Work
Random assignment breaks confounding.
Without randomization:
Healthy people exercise more (health causes exercise)
Exercisers healthier (correlation)
Confounded: Can't tell if exercise helps
With randomization:
Randomly assign: Exercise vs No Exercise
Health status equal in both groups (on average)
Difference in outcomes = causal effect
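The exercise example can be simulated. The effect size (+5 health-score points) and the self-selection rule are assumptions for illustration; the point is that the observational contrast is badly inflated while the randomized contrast recovers the true effect:

```python
import random

random.seed(1)
TRUE_EFFECT = 5.0

def outcome(exercises, baseline):
    # Health score: baseline trait + true causal effect of exercise + noise.
    return baseline + (TRUE_EFFECT if exercises else 0.0) + random.gauss(0, 1)

n = 10000
baselines = [random.gauss(50, 10) for _ in range(n)]

# Observational world: healthier people self-select into exercise.
obs = [(b > 50, outcome(b > 50, b)) for b in baselines]

# RCT world: a coin flip assigns exercise, independent of baseline health.
rct = []
for b in baselines:
    assigned = random.random() < 0.5
    rct.append((assigned, outcome(assigned, b)))

def diff_in_means(rows):
    treated = [y for t, y in rows if t]
    control = [y for t, y in rows if not t]
    return sum(treated) / len(treated) - sum(control) / len(control)

print(f"observational estimate: {diff_in_means(obs):+.1f}")  # wildly inflated
print(f"RCT estimate:           {diff_in_means(rct):+.1f}")  # close to +5
```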
Design
Process:
- Recruit participants
- Randomly assign to treatment/control
- Give treatment to treatment group
- Measure outcome in both
- Compare outcomes
Strengths:
- Unbiased estimate of the causal effect
- Simple interpretation
- Handles unmeasured confounders
Limitations:
- Expensive
- Time-consuming
- Sometimes unethical
- May not generalize
Observational Studies
When randomization impossible (unethical or impractical), use observational data.
Observational Data: Not randomly assigned, just observed.
Challenge
With observational data, must be careful about confounding.
Example: Does smoking cause cancer?
Can't randomize (unethical)
Use observational data (who smokes, who gets cancer)
Account for confounders (age, genetics, etc.)
Estimate causal effect
Methods for Adjustment
1. Matching: Compare smokers to similar non-smokers (matched on age, genetics, etc.)
Smoker: Age 45, Genetics A
Compare to non-smoker: Age 45, Genetics A
Difference in cancer → causal effect of smoking
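A minimal sketch of exact matching, using a handful of invented records (the ages, genetic markers, and outcomes are hypothetical): pair each smoker with a not-yet-used non-smoker sharing the same covariates, then average the within-pair outcome differences:

```python
smokers = [
    {"age": 45, "gene": "A", "cancer": 1},
    {"age": 45, "gene": "A", "cancer": 0},
    {"age": 60, "gene": "B", "cancer": 1},
]
nonsmokers = [
    {"age": 45, "gene": "A", "cancer": 0},
    {"age": 60, "gene": "B", "cancer": 0},
    {"age": 60, "gene": "B", "cancer": 1},
]

def match_key(person):
    return (person["age"], person["gene"])

# Index potential controls by their covariates.
pool = {}
for person in nonsmokers:
    pool.setdefault(match_key(person), []).append(person)

diffs = []
for s in smokers:
    candidates = pool.get(match_key(s), [])
    if candidates:
        control = candidates.pop()   # matching without replacement
        diffs.append(s["cancer"] - control["cancer"])

att = sum(diffs) / len(diffs)        # effect among the treated who found a match
print(f"matched pairs: {len(diffs)}, estimated effect: {att:+.2f}")  # 2 pairs, +0.50
```

The unmatched smoker is simply dropped, which is why matching estimates the effect among the treated (and only the matchable treated) rather than the whole population.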
2. Regression Adjustment: Include confounders in regression model.
Cancer ~ Smoking + Age + Genetics + ...
Coefficient on Smoking = causal effect (if all confounders included)
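Regression adjustment can be sketched without any libraries by solving the normal equations directly. The data below are simulated with an assumed true treatment effect of 2.0 and a confounder Z that both pushes people into treatment and raises the outcome; including Z in the regression removes the bias:

```python
import random

random.seed(2)
n = 5000
rows = []
for _ in range(n):
    z = random.gauss(0, 1)                             # confounder
    t = 1.0 if random.gauss(0, 1) + z > 0 else 0.0     # Z drives treatment uptake
    y = 2.0 * t + 3.0 * z + random.gauss(0, 1)         # true treatment effect = 2.0
    rows.append((t, z, y))

def ols(X, y):
    """Solve (X'X) beta = X'y by Gaussian elimination with partial pivoting."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    beta = [0.0] * k
    for r in range(k - 1, -1, -1):
        beta[r] = (b[r] - sum(A[r][c] * beta[c] for c in range(r + 1, k))) / A[r][r]
    return beta

y = [yv for _, _, yv in rows]
naive = ols([(1.0, t) for t, _, _ in rows], y)          # Y ~ T only
adjusted = ols([(1.0, t, z) for t, z, _ in rows], y)    # Y ~ T + Z

print(f"naive coefficient on T:    {naive[1]:.2f}")     # inflated by Z
print(f"adjusted coefficient on T: {adjusted[1]:.2f}")  # close to 2.0
```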
3. Instrumental Variables: Use variable that affects treatment but not outcome (except through treatment).
Example: Tax on cigarettes
Affects whether people smoke (treatment)
Doesn't directly affect health (except through smoking)
Can use tax as instrument to estimate smoking's effect on health
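The tax example maps onto the classic IV ratio estimator: effect = Cov(Z, Y) / Cov(Z, T). A simulated sketch, where the instrument strength, the hidden confounder, and the true effect of 1.5 are all assumptions made for illustration:

```python
import random

random.seed(3)
n = 20000
Z, T, Y = [], [], []
for _ in range(n):
    u = random.gauss(0, 1)                       # unobserved confounder
    z = random.choice([0.0, 1.0])                # instrument (e.g. high-tax region)
    t = 0.5 * z + u + random.gauss(0, 1)         # Z and U both move treatment
    y = 1.5 * t + 2.0 * u + random.gauss(0, 1)   # true effect of T = 1.5
    Z.append(z)
    T.append(t)
    Y.append(y)

def cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (v - mb) for x, v in zip(a, b)) / len(a)

naive = cov(T, Y) / cov(T, T)   # OLS slope, biased upward by U
iv = cov(Z, Y) / cov(Z, T)      # IV ratio estimate

print(f"naive slope: {naive:.2f}")  # biased by the confounder
print(f"IV estimate: {iv:.2f}")     # close to 1.5
```

The key assumptions are that the instrument moves the treatment (relevance) and affects the outcome only through the treatment (exclusion); neither is testable from the data alone.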
Treatment Effects
Types of Treatment Effects
Average Treatment Effect (ATE):
ATE = E[Y(1)] - E[Y(0)]
= Expected outcome if everyone treated - if no one treated
Example: Average effect of medication on health
Heterogeneous Treatment Effects (HTE): Effect differs by person.
Medication helps people with severe disease
Doesn't help people with mild disease
Effect heterogeneous by disease severity
Conditional Average Treatment Effect (CATE):
CATE(X) = E[Y(1) - Y(0) | X]
= Treatment effect for people with characteristics X
Example: Medicine works for older people, not younger
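A simulation makes the ATE/CATE distinction concrete. Note that the code below uses both potential outcomes Y(0) and Y(1) for each person, which only a simulation can do; in real data you observe at most one of them. The effect pattern (+10 for severe cases, 0 for mild) is invented:

```python
import random

random.seed(4)
people = []
for _ in range(1000):
    severe = random.random() < 0.5
    y0 = random.gauss(40, 5)                 # outcome without treatment
    y1 = y0 + (10.0 if severe else 0.0)      # outcome with treatment
    people.append({"severe": severe, "y0": y0, "y1": y1})

def mean(xs):
    return sum(xs) / len(xs)

# ATE = E[Y(1) - Y(0)] over everyone: a blend of the two subgroups.
ate = mean([p["y1"] - p["y0"] for p in people])

# CATE(X) = E[Y(1) - Y(0) | X] for each subgroup.
cate = {
    sev: mean([p["y1"] - p["y0"] for p in people if p["severe"] == sev])
    for sev in (True, False)
}

print(f"ATE:           {ate:+.1f}")          # about +5, a blend
print(f"CATE | severe: {cate[True]:+.1f}")   # +10
print(f"CATE | mild:   {cate[False]:+.1f}")  # 0
```

The ATE here is positive, yet it describes nobody exactly: severe cases gain twice as much, mild cases gain nothing. That is precisely why heterogeneous effects matter for decisions.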
Why Heterogeneous Effects Matter
Example: A/B Test
Overall: Treatment improves conversion 2%
But: Works for new users (+5%), hurts for loyal users (-2%)
Optimal: Treat new users, don’t treat loyal users
Methods for Causal Inference
Propensity Score Matching
Idea: Treated and untreated similar except for treatment.
1. Estimate probability of treatment given characteristics (propensity score)
2. Match treated units with similar untreated units (same propensity score)
3. Compare outcomes
4. Difference = causal effect
Advantage: Simple, interpretable
Disadvantage: Requires strong assumptions
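A sketch of the propensity-score idea, here via stratification on the score (a close cousin of one-to-one matching). The data are simulated with a true effect of 1.0, and for simplicity the true propensity equals the covariate x itself, so binning x stands in for estimating the score:

```python
import random

random.seed(5)
rows = []
for _ in range(20000):
    x = random.uniform(0, 1)                      # observed confounder
    t = random.random() < x                       # true propensity P(T=1|x) = x
    y = 1.0 * t + 3.0 * x + random.gauss(0, 0.5)  # x also raises the outcome
    rows.append((x, t, y))

# Step 1: estimate the propensity score within bins of x.
BINS = 10
def bin_of(x):
    return min(int(x * BINS), BINS - 1)

# Steps 2-4: within each propensity stratum the treated/untreated comparison
# is (approximately) unconfounded; average the strata weighted by size.
total, weight = 0.0, 0
for b in range(BINS):
    stratum = [r for r in rows if bin_of(r[0]) == b]
    treated = [y for _, t, y in stratum if t]
    control = [y for _, t, y in stratum if not t]
    if treated and control:
        gap = sum(treated) / len(treated) - sum(control) / len(control)
        total += gap * len(stratum)
        weight += len(stratum)

stratified_ate = total / weight

naive = (sum(y for _, t, y in rows if t) / sum(1 for _, t, _ in rows if t)
         - sum(y for _, t, y in rows if not t) / sum(1 for _, t, _ in rows if not t))

print(f"naive difference:      {naive:.2f}")           # confounded by x
print(f"propensity-stratified: {stratified_ate:.2f}")  # close to 1.0
```

The "strong assumption" is visible in the code: every confounder must be in x. If an unmeasured confounder drives both t and y, no amount of propensity modeling fixes it.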
Doubly Robust Estimation
Combine two methods, more robust.
1. Propensity score weighting
2. Outcome regression
If either model is correct, the estimate stays consistent
More reliable than either method alone
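A sketch of the doubly robust (AIPW) estimator on simulated data with an assumed true effect of 1.0. To illustrate the robustness, the outcome models below are deliberately crude (plain group means, ignoring the confounder entirely), while the propensity model is correct; AIPW still lands near the truth:

```python
import random

random.seed(6)
rows = []
for _ in range(30000):
    x = random.uniform(0.1, 0.9)                   # confounder, bounded away from 0/1
    t = 1 if random.random() < x else 0            # true propensity e(x) = x
    y = 1.0 * t + 2.0 * x + random.gauss(0, 0.5)   # true effect = 1.0
    rows.append((x, t, y))

# Deliberately misspecified outcome models: just the group means.
m1 = sum(y for _, t, y in rows if t) / sum(1 for _, t, _ in rows if t)
m0 = sum(y for _, t, y in rows if not t) / sum(1 for _, t, _ in rows if not t)

# Correct propensity model (known by construction in this simulation).
def e(x):
    return x

# AIPW: outcome-model prediction plus an inverse-propensity-weighted
# correction using the residuals.
terms = []
for x, t, y in rows:
    aug1 = m1 + t * (y - m1) / e(x)
    aug0 = m0 + (1 - t) * (y - m0) / (1 - e(x))
    terms.append(aug1 - aug0)

aipw = sum(terms) / len(terms)
naive = m1 - m0
print(f"misspecified outcome model alone: {naive:.2f}")  # biased
print(f"AIPW estimate:                    {aipw:.2f}")   # near 1.0
```

The symmetric case also holds: with a correct outcome model and a wrong propensity model, the estimator again stays consistent, which is what "doubly robust" means.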
Causal Forests
Machine learning approach to estimate heterogeneous treatment effects.
Forest learns:
- Who benefits from treatment
- Who doesn't
- How much each group benefits
More flexible than linear methods
Captures non-linear effects
Causal Machine Learning
Combining ML with causal inference.
Challenges
Standard ML optimizes: Predict Y given X (whatever correlation helps)
Causal ML needs: Understand how X causes Y (causal mechanism)
Different objectives:
- Prediction: P(Y | X) any correlation works
- Causation: What happens if we change X?
Approaches
1. Causal Forests (Athey & Wager): ML + causal inference for treatment effects
2. CATE Estimation: Use ML to estimate treatment effect as function of X
ML model learns:
Effect(X) = How much treatment helps someone with characteristics X
3. Double Machine Learning (Chernozhukov et al.): Use ML for nuisance parameters, causal inference for the parameter of interest
Advantage: Leverage ML’s flexibility, maintain causal guarantees
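A minimal sketch of the double-ML recipe: fit flexible nuisance models for E[T|X] and E[Y|X] (simple bin-average predictors stand in for any ML learner here), use cross-fitting so each point is predicted by a model trained on the other fold, then regress outcome residuals on treatment residuals. The data-generating process and the true effect of 1.0 are assumptions for illustration:

```python
import math
import random

random.seed(7)
n = 20000
data = []
for _ in range(n):
    x = random.uniform(0, 1)
    t = x ** 2 + random.gauss(0, 0.3)                    # treatment depends on x
    y = 1.0 * t + math.sin(3 * x) + random.gauss(0, 0.3)  # true effect = 1.0
    data.append((x, t, y))

def fit_bins(train, target_idx, bins=20):
    """Crude 'ML' regressor: predict the bin-average of the target given x."""
    sums, counts = [0.0] * bins, [0] * bins
    for row in train:
        b = min(int(row[0] * bins), bins - 1)
        sums[b] += row[target_idx]
        counts[b] += 1
    overall = sum(sums) / max(sum(counts), 1)
    means = [sums[b] / counts[b] if counts[b] else overall for b in range(bins)]
    return lambda x: means[min(int(x * bins), bins - 1)]

# Cross-fitting: each half is predicted by models trained on the other half.
half = n // 2
folds = [(data[:half], data[half:]), (data[half:], data[:half])]
num = den = 0.0
for train, test in folds:
    t_hat = fit_bins(train, 1)   # nuisance model for E[T|X]
    y_hat = fit_bins(train, 2)   # nuisance model for E[Y|X]
    for x, t, y in test:
        rt = t - t_hat(x)        # partial x out of the treatment
        ry = y - y_hat(x)        # partial x out of the outcome
        num += rt * ry
        den += rt * rt

theta = num / den                # residual-on-residual regression slope
print(f"DML estimate of treatment effect: {theta:.2f}")  # near 1.0
```

Cross-fitting is the detail that preserves valid inference: without it, overfit nuisance models leak into the residuals and bias the final slope.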
Applications
Healthcare
Treatment Selection: Which patients benefit from treatment?
ML identifies:
- Patients most likely to respond
- Patients likely to have side effects
- Optimal treatment per patient
E-commerce
Recommendation Interventions: What recommendations actually cause purchases?
Traditional: Show recommendation, user buys
Causal: Did recommendation cause purchase or was user buying anyway?
Causal inference answers:
Which recommendations actually convert
Policy
Policy Evaluation: Does policy cause intended outcome?
Example: Job training program
Causal inference estimates:
How much does training increase future earnings?
How much for different groups?
What is the ROI of the program?
Key Takeaways
✓ Correlation ≠ Causation – Fundamental principle, easy to forget in practice
✓ Confounding is the usual culprit – Most common reason correlation misleads
✓ DAGs help reasoning – Visualize causal structures
✓ Identifiability crucial – Can we estimate causal effect?
✓ RCTs gold standard – But expensive, sometimes unethical
✓ Observational studies possible – With careful methods
✓ Methods exist – Matching, regression, instrumental variables, etc.
✓ Heterogeneous effects important – Effects differ by person
✓ Causal ML emerging – Combining ML with causal inference
✓ Domain knowledge essential – Causal inference requires understanding domain
Frequently Asked Questions
Q: How do I know if a relationship is causal?
A: Randomization (RCT) is best. Observational: Domain knowledge, DAGs, careful analysis.
Q: Can ML solve causality?
A: No. ML helps with estimation, but causality rests on assumptions about how the data were generated, not on the data alone.
Q: What if I can’t randomize?
A: Use observational methods (matching, regression, IV) but acknowledge limitations.
Q: Are causal effects real?
A: Yes, but often heterogeneous. What works for one person may not work for another.
Q: How do I get causal graphs right?
A: Domain expertise, literature review, expert consultation. Iterative process.

