For simple statistical models (e.g., t-test, correlation), calculating the estimated power can be done analytically (for example, one can use the ‘pwr’ package). But for more complex models, it is difficult to provide a good estimate of power without the use of simulation. Simulations repeatedly generate random data based on one’s predefined model, then analyze each data set and count the proportion of results that are significant. That proportion is the estimated power for the model.
The ‘paramtest’ package makes this process simple by providing a
general-purpose, flexible function to perform simulations. You simply
create a function that generates the data and analyses you want, and
then use one of the functions, such as grid_search()
, to
run your user-defined function repeatedly and collate the results. The
grid_search()
function allows you to easily run your
function iteratively while varying particular parameters, so you can
test the sensitivity of your analyses or how these parameters change the
results.
# load all packages used in this vignette
library('paramtest')
library('pwr')
library('ggplot2')
library('knitr')
library('nlme')
library('lavaan')
library('dplyr')
If we wanted to estimate the power for a two-sample t-test, we could calculate it analytically using the ‘pwr’ package:
pwr.t.test(n=50, d=.5, type='two.sample')
#>
#> Two-sample t test power calculation
#>
#> n = 50
#> d = 0.5
#> sig.level = 0.05
#> power = 0.6968934
#> alternative = two.sided
#>
#> NOTE: n is number in *each* group
We can see that the estimated power when Cohen’s d = .50 and n = 50 per cell is approximately .70. Simulating power in this simple case is likely overkill, but this example will demonstrate that simulations provide comparable results, at least for this model.
# create user-defined function to generate and analyze data
t_func <- function(simNum, N, d) {
x1 <- rnorm(N, 0, 1)
x2 <- rnorm(N, d, 1)
t <- t.test(x1, x2, var.equal=TRUE) # run t-test on generated data
stat <- t$statistic
p <- t$p.value
return(c(t=stat, p=p, sig=(p < .05)))
# return a named vector with the results we want to keep
}
power_ttest <- run_test(t_func, n.iter=5000, output='data.frame', N=50, d=.5) # simulate data
results(power_ttest) %>%
summarise(power=mean(sig))
#> power
#> 1 0.7024
We first create a function that simulates normally distributed data
for two groups, and performs a t-test. The t statistic
and p-value are then returned as a named vector, along with a
boolean value determining whether the test is significant or not.
Finally, we use the run_test()
function to perform 5000
simulations, and then summarize the ‘sig’ value, which (by default)
calculates the mean, giving us the proportion of simulations that were
significant. This number agrees very closely with the analytic solution
above.
While in the above example we used run_test()
because we
were not varying any parameters, with the ‘paramtest’ package we can
also use a function called grid_search()
to easily vary
parameters. For example, using the same t-test simulation as
above:
# give 'params' a list of parameters we want to vary;
# testing at N=25, N=50, and N=100
power_ttest_vary <- grid_search(t_func, params=list(N=c(25, 50, 100)),
n.iter=5000, output='data.frame', d=.5)
results(power_ttest_vary) %>%
group_by(N.test) %>%
summarise(power=mean(sig))
N.test | power |
---|---|
25 | 0.402 |
50 | 0.703 |
100 | 0.940 |
We were easily able to run simuations for three different sample
sizes: 25 per cell, 50 per cell, and 100 per cell. The
summary()
function shows us the proportion of simulations
that were significant for each sample size. Clearly, power increases as
N increases. If we wanted to vary across two separate
parameters, that is easy to do as well:
# varying N and Cohen's d
power_ttest_vary2 <- grid_search(t_func, params=list(N=c(25, 50, 100), d=c(.2, .5)),
n.iter=5000, output='data.frame')
power <- results(power_ttest_vary2) %>%
group_by(N.test, d.test) %>%
summarise(power=mean(sig))
print(power)
ggplot(power, aes(x=N.test, y=power, group=factor(d.test), colour=factor(d.test))) +
geom_point() +
geom_line() +
ylim(c(0, 1)) +
labs(x='Sample Size', y='Power', colour="Cohen's d") +
theme_minimal()
#> `summarise()` has grouped output by 'N.test'. You can override using the
#> `.groups` argument.
N.test | d.test | power |
---|---|---|
25 | 0.2 | 0.098 |
25 | 0.5 | 0.409 |
50 | 0.2 | 0.171 |
50 | 0.5 | 0.704 |
100 | 0.2 | 0.298 |
100 | 0.5 | 0.936 |
Note that grid_search()
will fully cross all parameters:
the three sample sizes are tested at d = .2, and at d
= .5. So be careful when adding new parameters, as this can greatly
increase the total number of simulations to be run!
In order to cut down on the time taken to run simulations, the ‘paramtest’ package supports parallel processing via the ‘parallel’ package. If your system has multiple processor cores or supports multithreading, you can incorporate this into your function call:
# time with no parallel processing (your mileage may very greatly)
system.time(power_ttest_vary3 <- grid_search(t_func,
params=list(N=c(25, 50, 100), d=c(.2, .5)), n.iter=5000, output='data.frame'))
#> user system elapsed
#> 4.33 0.03 4.36
# using parallel processing; I am using a Windows system, so I use parallel='snow';
# see the documentation for the 'parallel' package for more details
system.time(power_ttest_vary4 <- grid_search(t_func,
params=list(N=c(25, 50, 100), d=c(.2, .5)), n.iter=5000, output='data.frame', parallel='snow', ncpus=4))
#> user system elapsed
#> 0.42 0.03 2.25
Theoretically, when using n cores/threads, your code should take 1/n the amount of time. In practice, there is some overhead to setting up the parallel threads, so you may not get quite such an improvement. But the proportion of time saved should grow the longer the overall job is.
In addition to running a function that creates simulated data, the ‘paramtest’ package can also handle bootstrapping, where you have data that you want to randomly sample from. This features relies on the ‘boot’ package:
# user function must take data and indices as first two arguments; see 'boot'
# package documentation for more details
t_func_boot <- function(data, indices) {
sample_data <- data[indices, ]
treatGroup <- sample_data[sample_data$group == 'trt2', 'weight']
ctrlGroup <- sample_data[sample_data$group == 'ctrl', 'weight']
t <- t.test(treatGroup, ctrlGroup, var.equal=TRUE)
stat <- t$statistic
p <- t$p.value
return(c(t=stat, p=p, sig=(p < .05)))
}
# example using built-in dataset PlantGrowth
power_ttest_boot <- run_test(t_func_boot, n.iter=5000, output='data.frame', boot=TRUE,
bootParams=list(data=PlantGrowth))
results(power_ttest_boot) %>%
summarise(power=mean(sig))
#> power
#> 1 0.5376
We can use grid_search()
to calculate the power for a
particular coefficient in a linear model:
lm_test <- function(simNum, N, b1, b0=0, xm=0, xsd=1) {
x <- rnorm(N, xm, xsd)
y <- rnorm(N, b0 + b1*x, sqrt(1 - b1^2)) # var. approx. 1 after accounting
# for explained variance by x
model <- lm(y ~ x)
# pull output from model
est <- coef(summary(model))['x', 'Estimate']
se <- coef(summary(model))['x', 'Std. Error']
p <- coef(summary(model))['x', 'Pr(>|t|)']
return(c(xm=mean(x), xsd=sd(x), ym=mean(y), ysd=sd(y), est=est, se=se, p=p,
sig=(p < .05)))
}
# we vary N at 200 and 300; we are also setting coefficient of x predicting
# y to be approx. .15 across all simulations
power_lm <- grid_search(lm_test, params=list(N=c(200, 300)), n.iter=5000, output='data.frame', b1=.15,
parallel='snow', ncpus=4)
results(power_lm) %>%
group_by(N.test) %>%
summarise(power=mean(sig))
N.test | power |
---|---|
200 | 0.568 |
300 | 0.733 |
Of course, in the case of a single predictor, one can determine an analytic solution to this:
# f2 = R^2 / (1 - R^2)
pwr.f2.test(u=1, v=c(200-2, 300-2), f2=(.15^2) / (1 - .15^2))
#>
#> Multiple regression power calculation
#>
#> u = 1
#> v = 198, 298
#> f2 = 0.0230179
#> sig.level = 0.05
#> power = 0.5695666, 0.7451719
However, using simulation, we can determine the power for more complex models, including interactions and simple effects.
lm_test_interaction <- function(simNum, N, b1, b2, b3, b0=0, x1m=0, x1sd=1,
x2m=0, x2sd=1) {
x1 <- rnorm(N, x1m, x1sd)
x2 <- rnorm(N, x2m, x2sd)
yvar <- sqrt(1 - b1^2 - b2^2 - b3^2) # residual variance
y <- rnorm(N, b0 + b1*x1 + b2*x2 + b3*x1*x2, yvar)
model <- lm(y ~ x1 * x2)
# pull output from model (two main effects and interaction)
est_x1 <- coef(summary(model))['x1', 'Estimate']
p_x1 <- coef(summary(model))['x1', 'Pr(>|t|)']
sig_x1 <- p_x1 < .05
est_x2 <- coef(summary(model))['x2', 'Estimate']
p_x2 <- coef(summary(model))['x2', 'Pr(>|t|)']
sig_x2 <- p_x2 < .05
est_int <- coef(summary(model))['x1:x2', 'Estimate']
p_int <- coef(summary(model))['x1:x2', 'Pr(>|t|)']
sig_int <- p_int < .05
return(c(est_x1=est_x1, p_x1=p_x1, sig_x1=sig_x1, est_x2=est_x2, p_x2=p_x2,
sig_x2=sig_x2, est_int=est_int, p_int=p_int, sig_int=sig_int))
}
# varying N at 200 and 300; setting coefficient of x1 = .15, coefficient of
# x2 = 0, and coefficien of interaction = .3
power_lm_int <- grid_search(lm_test_interaction, params=list(N=c(200, 300)),
n.iter=5000, output='data.frame', b1=.15, b2=0, b3=.3, parallel='snow', ncpus=4)
results(power_lm_int) %>%
group_by(N.test) %>%
summarise(
power_x1=mean(sig_x1),
power_x2=mean(sig_x2),
power_int=mean(sig_int))
N.test | power_x1 | power_x2 | power_int |
---|---|---|---|
200 | 0.593 | 0.052 | 0.988 |
300 | 0.774 | 0.051 | 0.999 |
Here, we are able to calculate the power for three coefficients at the same time. Note that for the coefficient b2, even though we set the true parameter equal to 0, power will trend toward your alpha level (typically .05), as this is the rate of false positives.
We can also calculate the power for simple effects:
lm_test_simple <- function(simNum, N, b1, b2, b3, b0=0, x1m=0, x1sd=1, x2m=0, x2sd=1) {
x1 <- rnorm(N, x1m, x1sd)
x2 <- rnorm(N, x2m, x2sd)
yvar <- sqrt(1 - b1^2 - b2^2 - b3^2)
y <- rnorm(N, b0 + b1*x1 + b2*x2 + b3*x1*x2, yvar)
model <- lm(y ~ x1 * x2) # here is the original model
est_int <- coef(summary(model))['x1:x2', 'Estimate']
p_int <- coef(summary(model))['x1:x2', 'Pr(>|t|)']
sig_int <- p_int < .05
# calculate x1 at +/- 1 SD, to look at simple effects
x1minus1sd <- x1 - mean(x1) + sd(x1)
x1plus1sd <- x1 - mean(x1) - sd(x1)
# new models to examine simple effects
model2 <- lm(y ~ x1minus1sd * x2)
model3 <- lm(y ~ x1plus1sd * x2)
# test effect of x2 when x1 is at +/- 1 SD
est_x2_minus1 <- coef(summary(model2))['x2', 'Estimate']
p_x2_minus1 <- coef(summary(model2))['x2', 'Pr(>|t|)']
sig_x2_minus1 <- p_x2_minus1 < .05
est_x2_plus1 <- coef(summary(model3))['x2', 'Estimate']
p_x2_plus1 <- coef(summary(model3))['x2', 'Pr(>|t|)']
sig_x2_plus1 <- p_x2_plus1 < .05
return(c(est_int=est_int, p_int=p_int, sig_int=sig_int,
est_x2_minus1=est_x2_minus1, p_x2_minus1=p_x2_minus1,
sig_x2_minus1=sig_x2_minus1, est_x2_plus1=est_x2_plus1,
p_x2_plus1=p_x2_plus1, sig_x2_plus1=sig_x2_plus1))
}
power_lm_simple <- grid_search(lm_test_simple, params=list(N=c(200, 300)),
n.iter=5000, output='data.frame', b1=.15, b2=0, b3=.3, parallel='snow', ncpus=4)
results(power_lm_simple) %>%
group_by(N.test) %>%
summarise(
power_x2_minus1=mean(sig_x2_minus1),
power_x2_plus1=mean(sig_x2_plus1))
N.test | power_x2_minus1 | power_x2_plus1 |
---|---|---|
200 | 0.862 | 0.866 |
300 | 0.964 | 0.968 |
Multilevel models (MLM) can be especially difficult to estimate power for, as there are numerous parameters that can vary. In the example below, we examine a simple MLM examining the effect of time on a variable measured at 4 time points.
mlm_test <- function(simNum, N, b1, b0=0, xm=0, xsd=1, varInt=1, varSlope=1, varResid=1) {
timePoints <- 4
subject <- rep(1:N, each=timePoints)
sub_int <- rep(rnorm(N, 0, sqrt(varInt)), each=timePoints) # random intercept
sub_slope <- rep(rnorm(N, 0, sqrt(varSlope)), each=timePoints) # random slope
time <- rep(0:(timePoints-1), N)
y <- (b0 + sub_int) + (b1 + sub_slope)*time + rnorm(N*timePoints, 0, sqrt(varResid))
# y-intercept as a function of b0 plus random intercept;
# slope as a function of b1 plus random slope
data <- data.frame(subject, sub_int, sub_slope, time, y)
# for more complex models that might not converge, tryCatch() is probably
# a good idea
return <- tryCatch({
model <- nlme::lme(y ~ time, random=~time|subject, data=data)
# when using parallel processing, we must refer to functions from
# packages directly, e.g., package::function()
est <- summary(model)$tTable['time', 'Value']
se <- summary(model)$tTable['time', 'Std.Error']
p <- summary(model)$tTable['time', 'p-value']
return(c(est=est, se=se, p=p, sig=(p < .05)))
},
error=function(e) {
#message(e) # print error message
return(c(est=NA, se=NA, p=NA, sig=NA))
})
return(return)
}
# I am cutting this down to 500 iterations so that the document compiles faster; I would, however,
# recommend more iterations for a stable estimate
power_mlm <- grid_search(mlm_test, params=list(N=c(200, 300)), n.iter=500, output='data.frame', b1=.15,
varInt=.05, varSlope=.15, varResid=.4, parallel='snow', ncpus=4)
results(power_mlm) %>%
group_by(N.test) %>%
summarise(
power=mean(sig, na.rm=TRUE),
na=sum(is.na(sig))) # we use this to count up how many cases did not properly converge
N.test | power | na |
---|---|---|
200 | 0.993 | 65 |
300 | 1.000 | 32 |
Here is an example of a simple mediation model:
med_test <- function(simNum, N, aa, bb, cc) {
x <- rnorm(N, 0, 1)
m <- rnorm(N, aa*x, sqrt(1 - aa^2))
y <- rnorm(N, cc*x + bb*m, sqrt(1 - cc^2 - bb^2))
data <- data.frame(x, m, y)
# set up lavaan model to calculate indirect effect (ab) and total effect
model <- '
m ~ a*x
y ~ c*x
y ~ b*m
ab := a*b
total := c + (a*b)'
fit <- lavaan::sem(model, data=data)
ests <- lavaan::parameterEstimates(fit)
# when using parallel processing, we must refer to functions from
# packages directly, e.g., package::function()
# pull output from model
a_est <- ests[ests$label == 'a', 'est']
a_p <- ests[ests$label == 'a', 'pvalue']
b_est <- ests[ests$label == 'b', 'est']
b_p <- ests[ests$label == 'b', 'pvalue']
c_est <- ests[ests$label == 'c', 'est']
c_p <- ests[ests$label == 'c', 'pvalue']
ab_est <- ests[ests$label == 'ab', 'est']
ab_p <- ests[ests$label == 'ab', 'pvalue']
return(c(a_est=a_est, a_p=a_p, b_est=b_est, b_p=b_p, c_est=c_est, c_p=c_p,
ab_est=ab_est, ab_p=ab_p, sig=(ab_p < .05)))
}
# set up mediation model where x -> m = .15, m -> y = .2, and x -> y = .05
power_med <- grid_search(med_test, params=list(N=c(200, 300)), n.iter=1000, output='data.frame', aa=.15,
bb=.2, cc=.05, parallel='snow', ncpus=4)
results(power_med) %>%
group_by(N.test) %>%
summarise(
power=mean(sig, na.rm=TRUE))
N.test | power |
---|---|
200 | 0.238 |
300 | 0.482 |
And an example of predicting a latent variable:
latent_test <- function(simNum, N, b1, ind1, ind2, ind3) {
# matrix of factor structure; we have x as observed predictor, and y is a
# latent variable with three indicators
fmodel <- matrix(
c(1, 0, # x
0, ind1, # y1
0, ind2, # y2
0, ind3), # y3
nrow=4, ncol=2, byrow=TRUE, dimnames=list(
c('x', 'y1', 'y2', 'y3'), # rows (observed)
c('x', 'y'))) # cols (latent)
# matrix of effects structure (variance-covariance); we are using x to
# predict y (with coefficient specified as b1)
y_resid_var <- sqrt(1 - b1^2)
effects <- matrix(
c(1, b1, # x
0, y_resid_var), # y
nrow=2, ncol=2, byrow=TRUE, dimnames=list(
c('x', 'y'), c('x', 'y')))
data <- paramtest::gen_data(fmodel, effects)
# generates the data using factor and effects matrices
model <- '
y =~ y1 + y2 + y3
y ~ b1*x'
fit <- lavaan::sem(model, data=data)
ests <- lavaan::parameterEstimates(fit)
est <- ests[ests$label == 'b1', 'est']
p <- ests[ests$label == 'b1', 'pvalue']
return(c(est=est, p=p, sig=(p < .05)))
}
power_sem <- grid_search(latent_test, params=list(N=c(200, 300)), n.iter=1000, output='data.frame',
b1=.15, ind1=.4, ind2=.4, ind3=.4, parallel='snow', ncpus=4)
results(power_sem) %>%
group_by(N.test) %>%
summarise(
power=mean(sig, na.rm=TRUE))
N.test | power |
---|---|
200 | 0.802 |
300 | 0.801 |
Simulating the statistical power of complex models can be challenging due to the number of parameters that one needs to estimate. Making assumptions about how the variables covary, how they relate to each other, etc. can make it difficult. However, using the ‘paramtest’ package provides a flexible way to run simulations in order to properly estimate power across a variety of assumptions. Hopefully, the examples in this vignette can provide you with a useful template from which to create models that fit your particular needs.