Introduction
In this post, we extend our simulation techniques to binomial and Poisson regression models. These models are commonly used for binary and count data, respectively. By simulating data, we can:
- Understand the mechanics of these models.
- Evaluate statistical methods.
- Test hypotheses in controlled environments.
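As a rough illustration of the second point, the sketch below repeats a simulation many times to see how well logistic regression recovers a known slope. The helper name `simulate_slope`, the number of replicates, and the parameter values are assumptions chosen for illustration, not part of the original analysis.

```r
# Hypothetical helper: simulate one data set and return the estimated slope
simulate_slope <- function(n, beta_0, beta_1) {
  x <- runif(n, -2, 2)                  # random predictor
  p <- plogis(beta_0 + beta_1 * x)      # inverse logit of the linear predictor
  y <- rbinom(n, size = 1, prob = p)    # binary outcome
  fit <- glm(y ~ x, family = binomial)
  unname(coef(fit)["x"])
}

set.seed(123)
n_sims <- 500  # assumed number of replicates
slopes <- replicate(n_sims, simulate_slope(n = 100, beta_0 = -1, beta_1 = 0.5))

mean(slopes)        # average estimate, to compare with the true slope of 0.5
mean(slopes) - 0.5  # approximate bias of the estimator
sd(slopes)          # empirical standard error across replicates
```

Because the true slope is known, the gap between `mean(slopes)` and 0.5 gives a direct estimate of bias, which is not available with real data.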
Step 1: Simulating a Binomial Regression Model
A binomial regression model (logistic regression) relates a binary outcome \( y \) to predictors \( x \) through a logistic function:
\[ \text{logit}(p) = \log\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 x \]
Where \( p \) is the probability of success.
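To make the link between the linear predictor and the probability concrete, here is a small worked example using base R's `plogis()` (inverse logit) and `qlogis()` (logit). The parameter values and the names `x_vals`, `eta`, and `p_vals` are assumptions for illustration.

```r
# Illustrative parameter values (assumed)
beta_0 <- -1
beta_1 <- 0.5
x_vals <- c(-2, 0, 2)

eta <- beta_0 + beta_1 * x_vals  # linear predictor on the log-odds scale: -2, -1, 0
p_vals <- plogis(eta)            # inverse logit: exp(eta) / (1 + exp(eta))

round(p_vals, 3)                 # 0.119 0.269 0.500
qlogis(p_vals)                   # logit maps the probabilities back to eta
exp(beta_1)                      # odds ratio for a one-unit increase in x
```

A slope of 0.5 on the log-odds scale means each one-unit increase in x multiplies the odds of success by `exp(0.5)`, roughly 1.65.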
Simulating Data for Binomial Regression
# Parameters
n <- 100 # Number of observations
beta_0 <- -1 # Intercept
beta_1 <- 0.5 # Slope
# Simulate data
set.seed(123) # For reproducibility
x <- runif(n, -2, 2) # Random predictor variable
logit_p <- beta_0 + beta_1 * x # Linear predictor
p <- exp(logit_p) / (1 + exp(logit_p)) # Convert to probability
y <- rbinom(n, size = 1, prob = p) # Generate binary outcome
# Combine into a data frame
data_binomial <- data.frame(x = x, y = y)
# View the first few rows
head(data_binomial)
Visualizing the Data
library(ggplot2)
# Scatter plot with logistic curve
ggplot(data_binomial, aes(x = x, y = y)) +
  geom_jitter(height = 0.1, color = "blue", alpha = 0.7) +
  stat_smooth(method = "glm", method.args = list(family = "binomial"), color = "red") +
  labs(title = "Simulated Binomial Regression Data",
       x = "Predictor (x)",
       y = "Binary Outcome (y)") +
  theme_minimal()
Fitting a Logistic Regression Model
# Fit a logistic regression model
model_binomial <- glm(y ~ x, family = binomial, data = data_binomial)
# Summary of the model
summary(model_binomial)
# Compare true and estimated parameters
true_params <- c(beta_0 = beta_0, beta_1 = beta_1)
estimated_params <- coef(model_binomial)
# Display parameters
true_params
estimated_params
Step 2: Simulating a Poisson Regression Model
A Poisson regression model relates a count outcome \( y \) to predictors \( x \) through a log link function:
\[ \log(\lambda) = \beta_0 + \beta_1 x \]
Where \( \lambda \) is the expected count.
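A short worked example may help show what the log link implies: the expected count is the exponential of the linear predictor, so the slope acts multiplicatively. The parameter values and the names `x_vals` and `lambda_vals` are assumptions for illustration.

```r
# Illustrative parameter values (assumed)
beta_0 <- 1
beta_1 <- 0.3
x_vals <- c(0, 1, 5)

# Expected count: exponentiate the linear predictor
lambda_vals <- exp(beta_0 + beta_1 * x_vals)
round(lambda_vals, 2)              # 2.72 3.67 12.18

# A one-unit increase in x multiplies the expected count by exp(beta_1)
lambda_vals[2] / lambda_vals[1]
exp(beta_1)
```

Here `exp(0.3)` is about 1.35, so each one-unit increase in x raises the expected count by roughly 35%.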
Simulating Data for Poisson Regression
# Parameters
n <- 100 # Number of observations
beta_0 <- 1 # Intercept
beta_1 <- 0.3 # Slope
# Simulate data
set.seed(123) # For reproducibility
x <- runif(n, 0, 5) # Random predictor variable
log_lambda <- beta_0 + beta_1 * x # Linear predictor
lambda <- exp(log_lambda) # Convert to rate
y <- rpois(n, lambda = lambda) # Generate count outcome
# Combine into a data frame
data_poisson <- data.frame(x = x, y = y)
# View the first few rows
head(data_poisson)
Visualizing the Data
# Scatter plot with Poisson curve
ggplot(data_poisson, aes(x = x, y = y)) +
  geom_point(color = "blue", alpha = 0.7) +
  stat_smooth(method = "glm", method.args = list(family = "poisson"), color = "red") +
  labs(title = "Simulated Poisson Regression Data",
       x = "Predictor (x)",
       y = "Count Outcome (y)") +
  theme_minimal()
Fitting a Poisson Regression Model
# Fit a Poisson regression model
model_poisson <- glm(y ~ x, family = poisson, data = data_poisson)
# Summary of the model
summary(model_poisson)
# Compare true and estimated parameters
true_params <- c(beta_0 = beta_0, beta_1 = beta_1)
estimated_params <- coef(model_poisson)
# Display parameters
true_params
estimated_params
Conclusion
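Simulating data from binomial and Poisson regression models shows how each link function turns a linear predictor into a probability or an expected count, and lets you check that the fitted coefficients land close to the values you chose. By varying the sample size, the coefficients, or the range of the predictor, you can explore how these choices influence the outcomes and the precision of the estimates.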
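One such experiment, sketched here under assumed settings, is to see how the precision of the Poisson slope estimate changes with sample size. The helper name `slope_se` and the chosen sample sizes are hypothetical.

```r
# Hypothetical helper: simulate one Poisson data set of size n and
# return the standard error of the estimated slope
slope_se <- function(n, beta_0 = 1, beta_1 = 0.3) {
  x <- runif(n, 0, 5)                               # random predictor
  y <- rpois(n, lambda = exp(beta_0 + beta_1 * x))  # count outcome
  fit <- glm(y ~ x, family = poisson)
  summary(fit)$coefficients["x", "Std. Error"]
}

set.seed(123)
sample_sizes <- c(50, 200, 800)  # assumed sizes for illustration
se_by_n <- sapply(sample_sizes, slope_se)

# Standard errors should shrink roughly in proportion to 1 / sqrt(n)
data.frame(n = sample_sizes, slope_se = se_by_n)
```

The same pattern works for other settings: change one input at a time, refit, and compare the quantity of interest.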
Further Reading
Happy simulating!
Citation
@online{jarvis2022,
author = {Jarvis, Christopher},
title = {Simulation in {R} - {Part} 5},
date = {2022-12-05},
url = {https://christopher.jarvis.io/posts/2024-12-05-rsim5/},
langid = {en}
}