Introduction
Simulating data is a powerful way to explore statistical concepts, test hypotheses, or validate models. In this post, you’ll learn how to simulate a continuous variable in R, step by step.
Step 1: Understanding Continuous Distributions
A continuous variable can take on any value within a range. R provides functions to simulate random numbers from various continuous distributions, such as:
- Normal Distribution (rnorm())
- Uniform Distribution (runif())
- Exponential Distribution (rexp())
In this example, we’ll focus on the normal distribution.
Step 2: Simulating a Normal Distribution
The normal distribution is characterized by its mean (mean) and standard deviation (sd). Use the rnorm() function to simulate data.
# Parameters for the normal distribution
n <- 1000 # Number of observations
mean <- 50 # Mean of the distribution
sd <- 10 # Standard deviation of the distribution
# Simulating data
set.seed(123) # Set a seed for reproducibility
simulated_data <- rnorm(n, mean, sd)
# Display the first few simulated values
head(simulated_data)
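As a quick sanity check on draws like these, the sketch below compares the sample mean and standard deviation with the parameters and confirms that resetting the seed reproduces the same vector. It is a minimal illustration, not part of the original post: the names mu, sigma, draws_a, and draws_b are assumptions chosen to avoid reusing the names of R's mean() and sd() functions.

```r
# Parameters mirror the example above; mu and sigma are illustrative names
n <- 1000
mu <- 50
sigma <- 10

set.seed(123)
draws_a <- rnorm(n, mean = mu, sd = sigma)

# Sample statistics should be near, but not exactly equal to, the parameters
sample_mean <- mean(draws_a)
sample_sd <- sd(draws_a)
print(c(mean = sample_mean, sd = sample_sd))

# Resetting the seed before drawing again reproduces the same values
set.seed(123)
draws_b <- rnorm(n, mean = mu, sd = sigma)
same_draws <- identical(draws_a, draws_b)
print(same_draws)
```

With 1000 observations, the standard error of the sample mean is about 10 / sqrt(1000), roughly 0.32, so small deviations from 50 are expected.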
Step 3: Visualizing the Simulated Data
Visualizing the data helps ensure it looks as expected. Use a histogram to check the shape of the distribution.
# Load ggplot2 for visualization
library(ggplot2)
# Create a histogram
ggplot(data.frame(x = simulated_data), aes(x)) +
  geom_histogram(binwidth = 2, fill = "blue", alpha = 0.7) +
  labs(title = "Histogram of Simulated Data", x = "Value", y = "Frequency") +
  theme_minimal()
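One way to judge whether the histogram "looks as expected" is to draw the theoretical density on top of it. The sketch below uses base R graphics rather than ggplot2 so that it runs without extra packages. The choice of 30 breaks and the plot title are assumptions made for illustration.

```r
# Regenerate the data so this block runs on its own
set.seed(123)
simulated_data <- rnorm(1000, mean = 50, sd = 10)

# freq = FALSE puts the y-axis on the density scale,
# which makes the bars comparable with a density curve
h <- hist(simulated_data, breaks = 30, freq = FALSE,
          main = "Simulated Data vs. Normal Density",
          xlab = "Value")

# Overlay the theoretical normal density with mean 50 and sd 10
curve(dnorm(x, mean = 50, sd = 10), add = TRUE, lwd = 2)
```

If the simulation behaves as intended, the bars should follow the curve closely, with some random wobble from bin to bin.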
Step 4: Simulating Other Continuous Distributions
Uniform Distribution
The uniform distribution describes values between a specified minimum and maximum, with every value in that range equally likely. Use runif() to draw from it.
# Simulating a uniform distribution
min_val <- 0
max_val <- 100
uniform_data <- runif(n, min_val, max_val)
head(uniform_data)
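A short check, sketched here as a self-contained example, can confirm that uniform draws respect the requested bounds and that their average sits near the midpoint of the interval. The variables in_range and theoretical_mean are illustrative names that do not appear in the original post.

```r
n <- 1000
min_val <- 0
max_val <- 100

set.seed(123)
uniform_data <- runif(n, min = min_val, max = max_val)

# Every draw should fall inside the requested interval
in_range <- all(uniform_data >= min_val & uniform_data <= max_val)
print(in_range)

# The theoretical mean of a uniform distribution is (min + max) / 2
theoretical_mean <- (min_val + max_val) / 2
print(c(sample = mean(uniform_data), theoretical = theoretical_mean))
```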
Exponential Distribution
The exponential distribution is often used to model time between events.
# Simulating an exponential distribution
rate <- 0.1
exponential_data <- rexp(n, rate)
head(exponential_data)
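The rate parameter of rexp() is the reciprocal of the distribution's mean, so a rate of 0.1 corresponds to an average waiting time of 10. The sketch below checks that relationship on simulated data. It is a self-contained illustration, and the names theoretical_mean and all_nonnegative are assumptions introduced here.

```r
n <- 1000
rate <- 0.1

set.seed(123)
exponential_data <- rexp(n, rate = rate)

# The theoretical mean of an exponential distribution is 1 / rate
theoretical_mean <- 1 / rate
print(c(sample = mean(exponential_data), theoretical = theoretical_mean))

# Waiting times cannot be negative, so every draw should be >= 0
all_nonnegative <- all(exponential_data >= 0)
print(all_nonnegative)
```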
Conclusion
Simulating continuous variables in R is straightforward and highly versatile. Whether you’re modeling normal, uniform, or exponential data, R’s built-in functions make it easy to generate and explore synthetic datasets.
Try experimenting with different distributions and parameters to see how they influence your data!
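As one possible starting point for that kind of experiment, the sketch below draws normal samples under several standard deviations and tabulates the resulting sample statistics. The helper summarize_draws() and the chosen values 1, 10, and 25 are hypothetical and only meant as an illustration.

```r
# Hypothetical helper: summarize n normal draws for a given standard deviation
summarize_draws <- function(sd_value, n = 1000, mean_value = 50) {
  draws <- rnorm(n, mean = mean_value, sd = sd_value)
  c(sd_param = sd_value, sample_mean = mean(draws), sample_sd = sd(draws))
}

set.seed(123)
sd_values <- c(1, 10, 25)

# One row per parameter setting
results <- t(sapply(sd_values, summarize_draws))
print(results)
```

The sample standard deviation should track the parameter closely, while the sample mean stays near 50 but becomes noisier as the spread grows.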
What’s Next?
Ready to take your simulations further? Check out the next post in this series:
Part 3: Simulating Discrete Distributions in R
Happy coding!
Citation
@online{jarvis2022,
author = {Jarvis, Christopher},
title = {Simulation in {R} - Part 2},
date = {2022-12-02},
url = {https://christopher.jarvis.io/posts/2024-12-02-rsim2/},
langid = {en}
}