Introduction

Imagine we have a new drug to treat an illness, and we gave that drug to 8 different people who had that illness. For 5 of them, the drug helped them feel better, but 3 of them felt worse. If we calculate the mean response to the drug, it is 0.5.

0.5 is not a huge improvement.
But, since most (5 of 8) people improved, maybe this drug is better than using no drugs at all.
However, maybe these 5 people all felt better because they were healthier to begin with.
So, how can we tell whether the response of 0.5 (or any other number, for that matter) reflects a real effect rather than random things out of our control?
Is there anything we can do to decide if the drug works or not? YES
- Replicate the experiment a bunch of times (expensive and time-consuming)
- Bootstrapping

Bootstrapping

How do we bootstrap the above example?

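For $i = 1 \rightarrow n$ (where $n$ is the number of bootstrapped datasets, typically in the 1000s):
- From the 8 original measurements, choose one at random.
- Repeat this until 8 values have been chosen, so the new dataset is the same size as the original (Note: the same value can be selected more than once $\rightarrow$ Sampling with Replacement)
  - This new dataset is called a Bootstrapped Dataset.
- Calculate the mean of the bootstrapped dataset.

Create a histogram of the $n$ bootstrapped means.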
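
The loop above can be sketched in a few lines of Python. This is only an illustration: the notes do not give the 8 individual responses, so the `responses` list below is hypothetical (5 positive values, 3 negative, chosen so the mean is 0.5), and the function name `bootstrap_means` is made up for this sketch.

```python
import random
import statistics

# Hypothetical responses for the 8 patients (not from the original notes):
# 5 positive, 3 negative, chosen so that the mean is 0.5.
responses = [1.5, 2.0, 0.5, 1.0, 3.0, -1.5, -0.5, -2.0]


def bootstrap_means(data, n_boot=1000, seed=0):
    """Return the mean of each of n_boot bootstrapped datasets."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    means = []
    for _ in range(n_boot):
        # Draw len(data) values with replacement: one bootstrapped dataset.
        resample = rng.choices(data, k=len(data))
        means.append(statistics.mean(resample))
    return means


boot_means = bootstrap_means(responses)
```

Passing `boot_means` to any histogram routine gives the histogram of bootstrapped means described above.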

Bootstrapping consists of 4 steps:

1. Make a Bootstrapped Dataset.
2. Calculate something (e.g. mean, median, standard deviation, etc.)
3. Keep track of that calculation.
4. Repeat steps 1 to 3 a bunch of times (1000s of times).

Standard Error and CI with Bootstrapping

You can plot a histogram of those calculated values to get a sense of how likely each value is.

Note: Because the histogram tells us how the mean might change if we redid the experiment a bunch of times, if we want to know the Standard Error of the mean value from the original dataset, we only need to calculate the Standard Deviation of the bootstrapped means. A 95% Confidence Interval is just an interval that covers the middle 95% of the bootstrapped means.
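
A minimal sketch of both calculations, assuming the simple percentile approach for the interval. The `responses` values are hypothetical (the notes only say there are 8 responses with a mean of 0.5), and the variable names are chosen for this illustration.

```python
import random
import statistics

# Hypothetical responses (not from the original notes); their mean is 0.5.
responses = [1.5, 2.0, 0.5, 1.0, 3.0, -1.5, -0.5, -2.0]

rng = random.Random(0)  # seeded so the sketch is reproducible
n_boot = 10_000

# Mean of each bootstrapped dataset, sorted so percentiles are easy to read off.
boot_means = sorted(
    statistics.mean(rng.choices(responses, k=len(responses)))
    for _ in range(n_boot)
)

# Standard Error: the standard deviation of the bootstrapped means.
standard_error = statistics.stdev(boot_means)

# 95% Confidence Interval: drop the lowest 2.5% and the highest 2.5%
# of the bootstrapped means and keep the range of what is left.
lower = boot_means[n_boot * 25 // 1000]
upper = boot_means[n_boot * 975 // 1000 - 1]

print(f"SE = {standard_error:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

If the resulting interval excludes 0, that is some evidence that the observed mean is not just noise; if it includes 0, the data are compatible with the drug having no effect.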

Note: There are other fancier ways to use bootstrapping to calculate confidence intervals.

Note: So far, we have used bootstrapping to calculate SE and CI for the mean. BUT, for the mean, both SE and CI can be calculated directly with a formula, without having to create bootstrapped datasets. So, what is it that makes bootstrapping so awesome?
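
For reference, the formulas in question are the standard ones for the sample mean. The symbols are introduced here for illustration and are not from the original notes: $\bar{x}$ is the sample mean, $s$ the sample standard deviation, and $N$ the sample size ($N = 8$ in the drug example). The interval assumes the data are roughly normally distributed:

$$
\mathrm{SE}(\bar{x}) = \frac{s}{\sqrt{N}}, \qquad
\text{95\% CI} = \bar{x} \pm t_{0.975,\,N-1} \cdot \frac{s}{\sqrt{N}}
$$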

The awesome thing about bootstrapping is that we can apply it to any statistic to create a histogram of what might happen if we repeated the experiment a bunch of times. We can use that histogram to calculate stuff like SE and CI without having to worry about whether or not there is a nice formula.
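
As a sketch of that idea, the helper below takes the statistic as an argument, so the same code works for the median (which has no simple standard error formula) as for the mean. The function name `bootstrap_distribution` and the `responses` values are hypothetical, made up for this illustration.

```python
import random
import statistics


def bootstrap_distribution(data, statistic, n_boot=1000, seed=0):
    """Apply `statistic` to n_boot bootstrapped datasets and return the results."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [
        statistic(rng.choices(data, k=len(data)))  # resample with replacement
        for _ in range(n_boot)
    ]


# Hypothetical responses (not from the original notes); their mean is 0.5.
responses = [1.5, 2.0, 0.5, 1.0, 3.0, -1.5, -0.5, -2.0]

# Same procedure, different statistic: only the function passed in changes.
boot_medians = bootstrap_distribution(responses, statistics.median)
se_median = statistics.stdev(boot_medians)
```

Swapping `statistics.median` for `statistics.mean`, `statistics.stdev`, or any other function of a list of numbers gives the bootstrapped distribution of that statistic instead.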

Regardless of the statistic we calculate, bootstrapping allows us to see it in the context of a distribution, and we can use that distribution to help us interpret the initial results.