December 16, 2013

Nested Bootstrap for Hypothesis Testing and Sample Power in R

Another one in the series

Because I’m a bit slow I’ve been stuck for a while wondering how to determine sample size for Levene’s test. I understood how to shift data so we had two data sets with a known difference in medians for tests like Mood’s Median, but I couldn’t figure out how to change the data so we had a known difference in variance.

It’s embarrassingly simple.

Say, for the sake of example, I have a set of non-normal data and I’m measuring its variation using the span from the 5th to the 95th percentile, and this value is 150. I have a second set of data whose span is 50 and I’m going to compare these sets with Levene’s test, but I want to know the power my sample size provides.

Well, all I have to do is take my first set of data and make a new data set from it by multiplying every value by 50/150. Multiplying by a constant scales every percentile by the same factor, so the new set has a span of 50. This “shifts” it so we have two data sets with a known difference in variation (span).
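To sanity-check that scaling, here is a minimal sketch. The sample is made up (an exponential draw standing in for data1, which isn't shown here), and `span` is a helper name introduced just for this illustration.

```r
# Hypothetical skewed sample standing in for data1 (not the real data)
set.seed(42)
x <- rexp(200, rate = 1/50)

# Variation measured as the 5th-to-95th percentile span
span <- function(v) unname(diff(quantile(v, probs = c(0.05, 0.95))))

# Scale by the ratio of the target span to the current span
target.ratio <- 50/150
x.scaled <- x * target.ratio

# The ratio of spans matches the scaling factor
span(x.scaled) / span(x)
```

Scaling also moves the median, but leveneTest centres each group on its own median by default, so that shouldn't get in the way of the comparison.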

Then I can bootstrap as before to determine sample power:

#leveneTest comes from the car package
library(car)

#First data set
data1 <- c(...)

#Shift data (No real need for integers, beyond prettier data;
#note that as.integer truncates towards zero)
data.shift <- as.integer(data1 * 50/150)

#Function to assess power of sample size:
#the proportion of resamples where Levene's test gives p < 0.05
power.of.levenes <- function(group1, group2, reps=5000, size=10) {
	results <- sapply(1:reps, function(r) {
		resample1 <- sample(group1, size=size, replace=TRUE)
		resample2 <- sample(group2, size=size, replace=TRUE)
		#stack() gives the columns "values" and "ind" (the group label)
		test <- leveneTest(values ~ ind, data=stack(data.frame(resample1, resample2)))
		test$"Pr(>F)"[1]
	})
	sum(results<0.05)/reps
}

#Calculate power
power.of.levenes(data1, data.shift, reps=1000, size=18)

And this will give me the power of my sample size for detecting the difference in variation I’m interested in.
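If the aim is to pick a sample size rather than check one, the same idea can be run over several sizes. This is only a sketch under assumptions: the data is made up, `levene.p` and `power.at` are hypothetical helper names, and `levene.p` is a base R stand-in for leveneTest with its default median centring (a one-way ANOVA on absolute deviations from each group's median), used so the example runs without the car package.

```r
# Base R stand-in for car::leveneTest with median centring:
# one-way ANOVA on absolute deviations from each group's median
levene.p <- function(a, b) {
  dev <- c(abs(a - median(a)), abs(b - median(b)))
  grp <- factor(rep(c("a", "b"), c(length(a), length(b))))
  anova(lm(dev ~ grp))[["Pr(>F)"]][1]
}

# Proportion of bootstrap resamples of a given size with p < 0.05
power.at <- function(group1, group2, size, reps = 500) {
  p <- replicate(reps, levene.p(sample(group1, size, replace = TRUE),
                                sample(group2, size, replace = TRUE)))
  mean(p < 0.05)
}

set.seed(1)
data1 <- rexp(200, rate = 1/50)   # hypothetical data, not the real data set
data.shift <- data1 * 50/150      # span scaled to a third

# Estimated power at a few candidate sample sizes
sizes <- c(10, 20, 40)
powers <- sapply(sizes, function(n) power.at(data1, data.shift, n))
data.frame(size = sizes, power = powers)
```

Reading down the table, the smallest size that reaches whatever power you're after (0.8 is a common choice) is the one to use.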

Tags:
- program

atomicules

Mostly walking the dogs
