Tae Hyun Kim (Lowell)

Score-Based Methods Overview

Overview

Score-based methods assign a score function to each graph and search for the graph that best fits the data. Unlike constraint-based methods, they optimize model fit without CI tests.

Score-Based Methods Overview

Mermaid source (click to expand)
> flowchart LR
>     Data[Data] --> Score[Score Function]
>     Score --> Search[Search Algorithm]
>     Search --> Best[Best Graph/MEC]
>

Core Idea

Optimization problem: G=argmaxGDAGsS(G;D)\mathcal{G}^* = \arg\max_{\mathcal{G} \in \text{DAGs}} S(\mathcal{G}; D)

  • SS: Score function (BIC, BGe, etc.)
  • DD: Data
  • Search over MECs for efficiency

Comparison of Main Algorithms

AlgorithmSearch StrategyComplexityFeature
GESGreedyO(n4)O(n^4)Forward-backward phases
FGESParallel GreedyFastParallelized GES
Hill-climbingLocal searchO(n2)O(n^2)Simple but local optima
NOTEARSContinuous optO(n3)O(n^3)Acyclicity constraint

Score Functions

BIC (Bayesian Information Criterion)

SBIC(G;D)=logP(Dθ^,G)k2lognS_{\text{BIC}}(\mathcal{G}; D) = \log P(D | \hat{\theta}, \mathcal{G}) - \frac{k}{2}\log n

  • kk: Number of parameters
  • nn: Sample size
  • Penalizes complexity

BGe (Bayesian Gaussian equivalent)

SBGe(G;D)=logP(DG)S_{\text{BGe}}(\mathcal{G}; D) = \log P(D | \mathcal{G})

Marginal likelihood with Gaussian assumption:

  • Uses conjugate priors
  • Closed-form computation

BDeu (Bayesian Dirichlet equivalent uniform)

SBDeu(G;D)=ij[logΓ(αij)Γ(αij+nij)+klogΓ(αijk+nijk)Γ(αijk)]S_{\text{BDeu}}(\mathcal{G}; D) = \sum_{i} \sum_{j} \left[ \log\frac{\Gamma(\alpha_{ij})}{\Gamma(\alpha_{ij} + n_{ij})} + \sum_k \log\frac{\Gamma(\alpha_{ijk} + n_{ijk})}{\Gamma(\alpha_{ijk})} \right]

Score for discrete data

Score Equivalence

Definition: The score is constant within a MEC

G1G2    S(G1;D)=S(G2;D)\mathcal{G}_1 \equiv \mathcal{G}_2 \implies S(\mathcal{G}_1; D) = S(\mathcal{G}_2; D)

Implication:

  • Does not distinguish DAGs within a MEC
  • Can directly search over CPDAGs

Advantages and Disadvantages

Advantages

  1. No Faithfulness Required

    • Weaker assumption than constraint-based methods
    • Can work even under unfaithful distributions
  2. Principled Model Selection

    • Bayesian framework
    • Automatic complexity control (BIC penalty)
  3. Global Optimization

    • Global fit rather than local CI decisions

Disadvantages

  1. NP-hard Problem

    • DAG space is super-exponential
    • Greedy search → local optima
  2. Score Function Choice

    • Results depend on the score
    • Misspecification issues
  3. Computational Cost

    • Slow in high dimensions
    • → Mitigated by FGES

Constraint vs Score Comparison

AspectConstraint-BasedScore-Based
Core operationCI testsScore optimization
AssumptionFaithfulnessScore decomposability
Local vs GlobalLocal decisionsGlobal fit
Noise sensitivityHigh (CI test errors)Moderate
ComputationMany CI testsScore evaluations

Directions for Improvement

Hybrid Methods

Hybrid Methods Overview:

  • Build skeleton with constraints, then orient edges with scores
  • Examples: GFCI, MMHC

Continuous Optimization

NOTEARS:

  • Transforms the DAG space into a continuous space
  • Leverages an acyclicity penalty

Parallelization

FGES:

  • Parallelizes edge addition/removal
  • Handles large-scale data

Algorithm Details

  • GES - Greedy Equivalence Search
  • FGES - Fast GES
  • Scoring Criteria - BIC, BGe, BDeu details

Theoretical Foundations

  • Markov Equivalence Class - Search space
  • CPDAG - Output representation

Comparison

  • Constraint-Based Methods Overview - Alternative approach
  • Hybrid Methods Overview - Combining constraint and score
  • Asymmetry-Based Methods Overview - A different approach

Key Papers

  • Chickering, D.M. (2002). Optimal Structure Identification with Greedy Search
  • Heckerman et al. (1995). Learning Bayesian Networks
  • zangaSurveyCausalDiscovery2023 - Section 3.2

Local graph