Introduction
In this post, we extend our simulation techniques to binomial and Poisson regression models. These models are commonly used for binary and count data, respectively. By simulating data, we can:
- Understand the mechanics of these models.
- Evaluate statistical methods.
- Test hypotheses in controlled environments.
Step 1: Simulating a Binomial Regression Model
A binomial regression model (logistic regression) relates a binary outcome \( y \) to a predictor \( x \) through the logit link function:
\[ \text{logit}(p) = \log\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 x \]
where \( p \) is the probability of success.
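To make the link function concrete, here is a minimal sketch that moves between the log-odds scale and the probability scale. It relies on base R's `plogis()` (inverse logit) and `qlogis()` (logit); the coefficient values and `x_example` are illustrative assumptions chosen for this sketch.

```r
# Illustrative values (assumptions for this sketch)
beta_0 <- -1
beta_1 <- 0.5
x_example <- c(-2, 0, 2)

# Linear predictor on the log-odds scale
logit_p <- beta_0 + beta_1 * x_example

# Inverse logit: two equivalent ways to obtain probabilities
p_manual <- exp(logit_p) / (1 + exp(logit_p))
p_plogis <- plogis(logit_p)

# Applying the logit recovers the linear predictor
logit_back <- qlogis(p_plogis)

data.frame(x = x_example, logit_p = logit_p, p = p_plogis)
```

Both routes give the same probabilities; `plogis()` is usually the safer choice because it avoids overflow in `exp()` for very large linear predictors.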
Simulating Data for Binomial Regression
# Parameters
n <- 100       # Number of observations
beta_0 <- -1   # Intercept
beta_1 <- 0.5  # Slope
# Simulate data
set.seed(123)                           # For reproducibility
x <- runif(n, -2, 2)                    # Random predictor variable
logit_p <- beta_0 + beta_1 * x          # Linear predictor
p <- exp(logit_p) / (1 + exp(logit_p))  # Convert to probability
y <- rbinom(n, size = 1, prob = p)      # Generate binary outcome
# Combine into a data frame
data_binomial <- data.frame(x = x, y = y)
# View the first few rows
head(data_binomial)
Visualizing the Data
library(ggplot2)
# Scatter plot with logistic curve
ggplot(data_binomial, aes(x = x, y = y)) +
  geom_jitter(height = 0.1, color = "blue", alpha = 0.7) +
  stat_smooth(method = "glm", method.args = list(family = "binomial"), color = "red") +
  labs(title = "Simulated Binomial Regression Data",
       x = "Predictor (x)",
       y = "Binary Outcome (y)") +
  theme_minimal()
Fitting a Logistic Regression Model
# Fit a logistic regression model
model_binomial <- glm(y ~ x, family = binomial, data = data_binomial)
# Summary of the model
summary(model_binomial)
# Compare true and estimated parameters
true_params <- c(beta_0 = beta_0, beta_1 = beta_1)
estimated_params <- coef(model_binomial)
# Display parameters
true_params
estimated_params
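A single simulated data set gives only one noisy estimate, so the fitted coefficients will rarely match the true values exactly. One way to evaluate the estimator is to repeat the simulation many times and look at the distribution of estimates. The following is a sketch under stated assumptions: `simulate_logistic_fit()` is a hypothetical helper written for this example, and the choice of 200 repetitions is arbitrary.

```r
# Hypothetical helper: simulate one data set and return the fitted coefficients
simulate_logistic_fit <- function(n, beta_0, beta_1) {
  x <- runif(n, -2, 2)
  p <- plogis(beta_0 + beta_1 * x)
  y <- rbinom(n, size = 1, prob = p)
  coef(glm(y ~ x, family = binomial))
}

set.seed(123)
n_sims <- 200  # Number of repeated simulations (arbitrary choice)
estimates <- replicate(
  n_sims,
  simulate_logistic_fit(n = 100, beta_0 = -1, beta_1 = 0.5)
)

# Rows are coefficients, columns are simulations
mean_estimates <- rowMeans(estimates)     # Average estimate across simulations
sd_estimates <- apply(estimates, 1, sd)   # Simulation-based standard errors

mean_estimates
sd_estimates
```

If the averages sit close to the true values of -1 and 0.5, the estimator is behaving as expected for this sample size; the standard deviations indicate how much any single fit can vary.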
Step 2: Simulating a Poisson Regression Model
A Poisson regression model relates a count outcome \( y \) to predictors \( x \) through a log link function:
\[ \log(\lambda) = \beta_0 + \beta_1 x \]
where \( \lambda \) is the expected count.
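Because the model is linear on the log scale, a one-unit increase in the predictor multiplies the expected count by \( e^{\beta_1} \). The sketch below illustrates this with assumed coefficient values and a small grid `x_example`, both chosen only for illustration.

```r
# Illustrative values (assumptions for this sketch)
beta_0 <- 1
beta_1 <- 0.3
x_example <- 0:3

# Expected counts on the original scale
lambda <- exp(beta_0 + beta_1 * x_example)

# Ratio of expected counts for consecutive values of x
rate_ratio <- lambda[-1] / lambda[-length(lambda)]

lambda
rate_ratio   # Every ratio equals exp(beta_1)
exp(beta_1)
```

This is why Poisson regression coefficients are often reported as rate ratios after exponentiating.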
Simulating Data for Poisson Regression
# Parameters
n <- 100       # Number of observations
beta_0 <- 1    # Intercept
beta_1 <- 0.3  # Slope
# Simulate data
set.seed(123)                      # For reproducibility
x <- runif(n, 0, 5)                # Random predictor variable
log_lambda <- beta_0 + beta_1 * x  # Linear predictor
lambda <- exp(log_lambda)          # Convert to rate
y <- rpois(n, lambda = lambda)     # Generate count outcome
# Combine into a data frame
data_poisson <- data.frame(x = x, y = y)
# View the first few rows
head(data_poisson)
Visualizing the Data
# Scatter plot with Poisson curve
ggplot(data_poisson, aes(x = x, y = y)) +
  geom_point(color = "blue", alpha = 0.7) +
  stat_smooth(method = "glm", method.args = list(family = "poisson"), color = "red") +
  labs(title = "Simulated Poisson Regression Data",
       x = "Predictor (x)",
       y = "Count Outcome (y)") +
  theme_minimal()
Fitting a Poisson Regression Model
# Fit a Poisson regression model
model_poisson <- glm(y ~ x, family = poisson, data = data_poisson)
# Summary of the model
summary(model_poisson)
# Compare true and estimated parameters
true_params <- c(beta_0 = beta_0, beta_1 = beta_1)
estimated_params <- coef(model_poisson)
# Display parameters
true_params
estimated_params
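Beyond comparing point estimates, it can help to check the Poisson assumption that the variance equals the mean, and to look at interval estimates. The sketch below re-simulates the data so that it runs on its own, then computes the Pearson dispersion statistic and Wald confidence intervals with `confint.default()`. The object names `fit`, `dispersion`, and `ci` are introduced here for illustration.

```r
# Re-create the simulated data so this sketch is self-contained
set.seed(123)
n <- 100
x <- runif(n, 0, 5)
lambda <- exp(1 + 0.3 * x)
y <- rpois(n, lambda = lambda)

fit <- glm(y ~ x, family = poisson)

# Pearson dispersion statistic: values near 1 are consistent with
# the Poisson assumption that the variance equals the mean
dispersion <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
dispersion

# Wald 95% confidence intervals on the log scale
ci <- confint.default(fit)
ci

# Exponentiate to express the intervals as rate ratios
exp(ci)
```

Since the data were generated from a true Poisson model, the dispersion statistic should land near 1; values well above 1 in real data would suggest overdispersion.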
Conclusion
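Simulating binomial and Poisson regression models shows how each model turns a linear predictor into binary or count outcomes, and comparing the fitted coefficients with the true values shows how well the model recovers them. By experimenting with these techniques, you can explore how the parameters, the predictor range, and the sample size influence the outcomes and the estimates.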
Happy simulating!
Citation
@online{jarvis2022,
author = {Jarvis, Christopher},
title = {Simulation in {R} - {Part} 5},
date = {2022-12-05},
url = {https://christopher.jarvis.io/posts/2024-12-05-rsim5/},
langid = {en}
}