Modelling for Pandemic Preparedness and Response Modular Shortcourse, 2025
2025-09-06
By the end of this lecture, you will be able to:
Question: How do we estimate \(\beta\) and \(\gamma\) from this noisy data?
Today’s Focus: Least Squares and MLE as foundations
Definition: The process of finding parameter values that make a mathematical model’s predictions as close as possible to observed data.
\[\hat{\theta} = \arg\min_{\theta} \mathcal{L}(\theta, \mathbf{y})\]
Where: \(\hat{\theta}\) is the estimated parameter vector, \(\mathcal{L}\) is a loss function measuring the mismatch between model predictions and data, and \(\mathbf{y}\) is the observed data.
Differential Equations:
\[\begin{align} \frac{dS}{dt} &= -\beta SI \\ \frac{dI}{dt} &= \beta SI - \gamma I \\ \frac{dR}{dt} &= \gamma I \end{align}\]
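The fitting code later in this lecture assumes an sir_model function and an init_conds vector for deSolve::ode(). A minimal sketch consistent with the equations above (states as population fractions; the specific initial values are illustrative assumptions):

library(deSolve)

# SIR derivatives in the form expected by deSolve::ode()
sir_model <- function(time, state, parameters) {
  with(as.list(c(state, parameters)), {
    dS <- -beta * S * I
    dI <- beta * S * I - gamma * I
    dR <- gamma * I
    list(c(dS, dI, dR))
  })
}

# Hypothetical initial conditions (fractions of the population)
init_conds <- c(S = 0.999, I = 0.001, R = 0)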
Identifiability Problem: Multiple parameter combinations can produce similar model outputs
Example: a higher transmission rate \(\beta\) paired with a faster recovery rate \(\gamma\), versus a lower \(\beta\) paired with a slower \(\gamma\).
Both might give similar epidemic curves!
Solution: Use additional information (e.g., known infectious period)
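For example, if the infectious period is known to be roughly 10 days, we can fix \(\gamma = 1/10 = 0.1\ \text{day}^{-1}\) and estimate only \(\beta\) from the data, removing the ambiguity.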
Core Idea: Minimize the sum of squared differences between model predictions and observations
Mathematical Formulation: \[\text{SSE} = \sum_{i=1}^{n} (y_i - f(t_i, \theta))^2\]
Where: \(y_i\) is the observation at time \(t_i\), \(f(t_i, \theta)\) is the model prediction at \(t_i\) for parameters \(\theta\), and \(n\) is the number of observations.
Advantages:
# Define objective function
sse_function <- function(params, data) {
  beta <- params[1]
  gamma <- params[2]

  # Simulate model
  out <- ode(
    y = init_conds,
    times = data$time,
    func = sir_model,
    parms = c(beta = beta, gamma = gamma)
  )

  # Extract and rescale the predicted cases
  predicted <- out[, "I"] * 1000

  # Calculate sum of squared errors
  sse <- sum((data$observed - predicted)^2)
  return(sse)
}
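The "find minimum" step below searches over a pre-computed grid of candidate parameter values. A minimal sketch of how such a test_params grid might be built (the grid ranges and the covid_data object name are assumptions):

# Hypothetical grid of candidate (beta, gamma) values
test_params <- expand.grid(
  beta  = seq(0.1, 0.5, by = 0.01),
  gamma = seq(0.05, 0.20, by = 0.01)
)

# Evaluate the SSE at every grid point
test_params$sse <- apply(test_params[, c("beta", "gamma")], 1, function(p) {
  sse_function(c(p["beta"], p["gamma"]), data = covid_data)
})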
# Find minimum
best_idx <- which.min(test_params$sse)
best_params <- test_params[best_idx, ]
cat("True parameters: Beta =", true_params[1], ", Gamma =", true_params[2])
True parameters: Beta = 0.3 , Gamma = 0.1
cat("Best fit parameters: Beta =", round(best_params$beta, 3), ", Gamma =", round(best_params$gamma, 3), "\n")
Best fit parameters: Beta = 0.322 , Gamma = 0.1
Computational Advantages:
Statistical Limitations:
Practical Limitations:
Core Idea: Find parameter values that make the observed data most probable
Mathematical Formulation: \[\hat{\theta} = \arg\max_{\theta} L(\theta) = \arg\max_{\theta} \prod_{i=1}^{n} f(y_i | \theta)\]
Where: \(L(\theta)\) is the likelihood function and \(f(y_i \mid \theta)\) is the probability (or density) of observation \(y_i\) given parameters \(\theta\).
In Practice: Maximize log-likelihood \[\hat{\theta} = \arg\max_{\theta} \ell(\theta) = \arg\max_{\theta} \sum_{i=1}^{n} \log f(y_i | \theta)\]
For Count Data (e.g., Cases): Poisson or negative binomial likelihoods
For Continuous Data: normal (Gaussian) likelihood
Note
For Our SIR Example: We’ll use a Poisson likelihood since we’re modeling case counts
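Concretely, writing \(\lambda_i(\beta, \gamma)\) for the model-predicted case count at time \(t_i\), the Poisson log-likelihood that the code below implements via dpois() is:
\[\ell(\beta, \gamma) = \sum_{i=1}^{n} \left[ y_i \log \lambda_i(\beta, \gamma) - \lambda_i(\beta, \gamma) - \log(y_i!) \right]\]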
# Define negative log-likelihood function
nll_function <- function(beta, gamma, data) {
  # Simulate model
  out <- ode(y = init_conds, times = data$time,
             func = sir_model, parms = c(beta = beta, gamma = gamma))

  # Model predictions (scaled to cases)
  predicted <- out[, "I"] * 1000

  # Poisson negative log-likelihood
  nll <- -sum(dpois(data$observed, lambda = predicted, log = TRUE))
  return(nll)
}
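The estimates and standard errors reported next could be obtained with a numerical optimizer. A minimal sketch using optim() with Hessian-based standard errors (the starting values and the covid_data object name are assumptions):

# Minimise the negative log-likelihood over (beta, gamma)
fit <- optim(
  par = c(beta = 0.2, gamma = 0.08),   # starting values (assumed)
  fn  = function(p) nll_function(p["beta"], p["gamma"], data = covid_data),
  hessian = TRUE
)

mle     <- fit$par                         # point estimates
std_err <- sqrt(diag(solve(fit$hessian)))  # approximate standard errors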
Beta: 0.291 ± 0.003
Gamma: 0.05 ± 0
R0: 5.79
True Beta: 0.3
MLE Beta: 0.291
True Gamma: 0.1
MLE Gamma: 0.05
# Fit different models and compare
# Model 1: SIR with Poisson
# Model 2: SIR with negative binomial

# Negative binomial likelihood
nll_nb <- function(beta, gamma, phi, data) {
  out <- ode(y = init_conds, times = data$time,
             func = sir_model, parms = c(beta = beta, gamma = gamma))

  # Extract and scale predictions to cases
  predicted <- out[, "I"] * 1000

  # Negative binomial negative log-likelihood
  nll <- -sum(dnbinom(data$observed, mu = predicted, size = phi, log = TRUE))
  return(nll)
}
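One way the AIC values shown below might be computed, assuming fit_pois and fit_nb are optim() fits of the Poisson and negative binomial negative log-likelihoods:

# AIC = 2k + 2 * NLL at the optimum
aic_pois  <- 2 * 2 + 2 * fit_pois$value   # k = 2 parameters: beta, gamma
aic_nb    <- 2 * 3 + 2 * fit_nb$value     # k = 3 parameters: beta, gamma, phi
delta_aic <- aic_nb - aic_pois            # negative values favor the negative binomial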
Poisson AIC: 19650.9
Negative Binomial AIC: 3189.442
Delta AIC: -16461.45
Statistical Rigor:
Computational Challenges:
Use Least Squares When:
Use Maximum Likelihood When:
Why We Need Advanced Methods:
Core Idea: Treat parameters as random variables with prior distributions
Bayes’ Theorem: \[P(\theta | \mathbf{y}) = \frac{P(\mathbf{y} | \theta) P(\theta)}{P(\mathbf{y})}\]
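Here \(P(\theta)\) is the prior, \(P(\mathbf{y} \mid \theta)\) the likelihood, \(P(\mathbf{y})\) the evidence (a normalizing constant), and \(P(\theta \mid \mathbf{y})\) the posterior distribution of the parameters given the data.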
# Stan model for SIR fitting
"
// SIR model in Stan (walkthrough)

// Functions block: derivative of [S, I, R]
functions {
  vector SIR(real t, vector y, array[] real theta) {
    real S = y[1];
    real I = y[2];
    real R = y[3];
    real beta = theta[1];
    real gamma = theta[2];
    vector[3] dydt;
    dydt[1] = -beta * S * I;
    dydt[2] = beta * S * I - gamma * I;
    dydt[3] = gamma * I;
    return dydt;
  }
}

// Data block: obs counts & time grid
data {
  int<lower=1> n_obs;
  int<lower=1> n_pop;
  array[n_obs] int y;
  real t0;
  array[n_obs] real ts;
}

// Parameters: beta, gamma, S0
parameters {
  array[2] real<lower=0> theta;   // {beta, gamma}
  real<lower=0,upper=1> S0;       // initial susceptible fraction
}

// Transformed params: ODE solve + Poisson rate
transformed parameters {
  vector[3] y_init;
  array[n_obs] vector[3] y_hat;
  array[n_obs] real lambda;
  y_init[1] = S0;
  y_init[2] = 1 - S0;
  y_init[3] = 0;
  y_hat = ode_rk45(SIR, y_init, t0, ts, theta);
  for (i in 1:n_obs) lambda[i] = y_hat[i, 2] * n_pop;
}

// Model: priors + likelihood
model {
  theta ~ lognormal(0, 1);
  S0 ~ beta(1, 1);
  y ~ poisson(lambda);
}

// GQ: derived R0
generated quantities {
  real R_0 = theta[1] / theta[2];
}
"
Core Idea: Sequential Monte Carlo method for state-space models
When to Use:
# Particle filter for SIR model
library(pomp)

# Define SIR model with stochasticity
# (the "sir_step", "cases_measure", "cases_dmeasure", and "sir_init"
#  routines are assumed to be defined elsewhere, e.g. as C snippets)
sir_pomp <- pomp(
  data = data.frame(time = covid_times, cases = covid_observed),
  times = "time",
  t0 = 0,
  rprocess = euler(          # euler() replaces the older euler.sim()
    step.fun = "sir_step",
    delta.t = 0.1
  ),
  rmeasure = "cases_measure",
  dmeasure = "cases_dmeasure",
  rinit = "sir_init",        # rinit replaces the older initializer argument
  paramnames = c("beta", "gamma", "sigma"),
  statenames = c("S", "I", "R")
)

# Run particle filter with 1000 particles
pf <- pfilter(sir_pomp, Np = 1000, params = c(beta = 0.3, gamma = 0.1, sigma = 1))
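The resulting object provides a Monte Carlo estimate of the log-likelihood at the supplied parameters via logLik(pf), which can then be used inside an outer estimation routine such as iterated filtering.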
Core Idea: Approximate posterior without likelihood evaluation
When to Use:
Algorithm: rejection sampling (see the sketch after this list)
Advantages:
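A minimal sketch of the basic rejection-sampling version of ABC for the SIR model; the priors, distance function, tolerance, and data object names are all illustrative assumptions:

# ABC rejection sampling (illustrative settings only)
n_sims <- 10000
accepted <- list()
for (s in seq_len(n_sims)) {
  # 1. Draw candidate parameters from the priors
  beta  <- runif(1, 0.05, 1)
  gamma <- runif(1, 0.02, 0.5)

  # 2. Simulate data from the model at these parameters
  out <- ode(y = init_conds, times = covid_times, func = sir_model,
             parms = c(beta = beta, gamma = gamma))
  simulated <- rpois(length(covid_times), lambda = pmax(out[, "I"], 0) * 1000)

  # 3. Accept the draw if the simulation is close enough to the observations
  distance <- sqrt(sum((simulated - covid_observed)^2))
  if (distance < 500) accepted[[length(accepted) + 1]] <- c(beta = beta, gamma = gamma)
}

posterior_sample <- do.call(rbind, accepted)   # approximate posterior draws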
Core Idea: Combine multiple models or methods
Types:
Least Squares:
Maximum Likelihood:
Advanced Methods:
For Quick Exploration: Least Squares
For Publication: Maximum Likelihood
For Complex Models: Bayesian Methods
For Real-time: Particle Filtering
For Model Comparison: ABC or MCMC
General Principle: Start simple, add complexity as needed
Model fitting is both art and science:
The goal is not just to fit models, but to:
Any Questions?