Standard disclaimer applies

And so, just like that - hey, I have a shiny new hammer so suddenly everything is a nail - I came across another situation that required bootstrapping. I was even more stuck this time though: very non-normal data and wanting to prove (or at least get a “feel” for) whether the sample size I had was adequate, but there was no suitable* standard hypothesis test I could use; Median tests like Mood’s or Mann Whitney Wilcoxon made no sense because although the medians were EXACTLY the same in my sample and “population” a median test comparing the two would reject the null hypothesis.

Searching around I came across this paper, Bootstrap Hypothesis Testing by Rob Gould, that showed how a bootstrap could be used for hypothesis testing. So I had a cunning idea to perform a nested bootstrap: the inner bootstrap performing a hypothesis test and the outer determining sample power.

The benefit of using a bootstrap for hypothesis testing is that you can perform any test you can dream up and you can also therefore invert the null hypothesis: that is, the null hypothesis of standard tests usually assumes the “norm”, that there is no difference, but with a bootstrap hypothesis test you can switch to the norm being the alternative hypothesis. This actually suits testing for statistical power better because of the definition, “the probability that the test will reject the null hypothesis when the alternative hypothesis is true”.

For the sample and population I decided my hypothesis test would be to look at the range between the 25th percentile and the 75th percentile (hey, I was short of time) as I knew they matched. Therefore my null hypothesis would be, “the range will differ” and the alternative that, “the range will be the same”. So the bootstrap for the hypothesis test ends up like so:

#Build up distribution of quantile ranges by repeatedly resampling from population
dist.qnt.range <- sapply(1:1000, function(r) {
	smp <- sample(pop, size=size, replace = T)
	qnt <- quantile(smp, c(0.25, 0.75))
	test <- qnt[[2]] - qnt[[1]]
})
#Use distribution to calculate a probability of not equalling the expected quantile range
prob.qnt.h0 <- sum(dist.qnt.range != expected.qnt.range)/reps

The above can then be nested in a bootstrap to determine power for a given sample size. I.e I can increase the sample size until the null hypothesis is being rejected to such an extent that I can be confident the sample represents the population (using our test as a way of assessing that representation).

power = function(pop, reps=500, size=30) {
	#Build up probability distribution of not equalling the expected quantile range
	dist.prob.h0  <- sapply(1:reps, function(r) {
		#Build up distribution of quantile ranges by repeatedly resampling from population
		dist.qnt.range  <- sapply(1:reps, function(r) {
			smp <- sample(pop, size=size, replace = T)
			qnt <- quantile(smp, c(0.25, 0.75))
			test <- qnt[[2]] - qnt[[1]]
		})
		#Use distribution to calculate a probability of not equalling the expected quantile range
		prob.qnt.h0 <- sum(dist.qnt.range != expected.qnt.range)/reps
	})
	#Power is then count of how many times null hypothesis is rejected
	sum(dist.prob.h0<0.05)/reps
}

The only problem with a nested bootstrap is that it’s a nested bootstrap: Computers seem really fast nowadays until you nest loops of thousands!

* Of course, it’s only as I write this post that I realise the data was bi-modal, although very skewed, and so I might have been able to use some kind of standard test.