Introduction

Imagine we have a new drug to treat an illness, and we gave that drug to 8 different people who had that illness. For 5 of them, the drug helped them feel better, but 3 of them felt worse. If we calculate the mean response to the drug, it’s 0.5.

Bootstrapping

How do we bootstrap the example above?


Bootstrapping consists of 4 steps:

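  1. Make a Bootstrapped Dataset by randomly sampling from the original data with replacement until it has the same size as the original.
  2. Calculate something (e.g. mean, median, standard deviation, etc.)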
  3. Keep track of that calculation.
  4. Repeat steps 1 to 3 a bunch of times (1000s of times).
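
The four steps above can be sketched in a few lines of Python. This is only an illustration: the original measurements are not given in the text, so the `responses` values below are invented (5 positive, 3 negative, mean 0.5), and the helper name `bootstrap_statistic`, the number of repetitions and the seed are arbitrary choices.

```python
import random
import statistics

# Hypothetical responses, invented for illustration only:
# 5 people felt better (positive), 3 felt worse (negative), mean = 0.5.
responses = [1.4, 2.1, 0.9, 1.8, 2.3, -1.2, -1.9, -1.4]


def bootstrap_statistic(data, statistic, n_boot=10_000, seed=0):
    """Return `n_boot` values of `statistic`, one per bootstrapped dataset."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_boot):  # Step 4: repeat many times
        # Step 1: sample with replacement, same size as the original data
        resample = rng.choices(data, k=len(data))
        # Steps 2 and 3: calculate the statistic and keep track of it
        results.append(statistic(resample))
    return results


boot_means = bootstrap_statistic(responses, statistics.mean)
print(len(boot_means), "bootstrapped means collected")
```

Passing the statistic as a function means the same loop can be reused for the median or the standard deviation without changing anything else.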

Standard Error and CI with Bootstrapping

You can plot a histogram of those calculated values to get a sense of how likely each value is.

Note: Because the histogram tells us how the mean might change if we redid the experiment a bunch of times, if we want to know the Standard Error of the mean from the original dataset, we only need to calculate the Standard Deviation of the bootstrapped means. A 95% Confidence Interval is just an interval that covers 95% of the bootstrapped means.
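
As a rough sketch of the note above, the Standard Error and a 95% Confidence Interval can be read straight off the bootstrapped means. The data values are again invented for illustration (the text does not list the real measurements), and the interval shown is the simple percentile interval, which is only one of several ways to build a bootstrap CI.

```python
import random
import statistics

# Hypothetical responses, invented for illustration only (mean = 0.5).
responses = [1.4, 2.1, 0.9, 1.8, 2.3, -1.2, -1.9, -1.4]

rng = random.Random(0)  # fixed seed so the run is repeatable
boot_means = [
    statistics.mean(rng.choices(responses, k=len(responses)))
    for _ in range(10_000)
]

# Standard Error: the standard deviation of the bootstrapped means.
boot_se = statistics.stdev(boot_means)

# 95% CI: the middle 95% of the bootstrapped means.
# quantiles(n=40) returns cut points every 2.5%, so the first and last
# cut points are the 2.5th and 97.5th percentiles.
cuts = statistics.quantiles(boot_means, n=40)
ci_lower, ci_upper = cuts[0], cuts[-1]

# For comparison: the textbook formula for the SE of the mean, s / sqrt(n).
formula_se = statistics.stdev(responses) / len(responses) ** 0.5

print(f"bootstrap SE = {boot_se:.3f}, formula SE = {formula_se:.3f}")
print(f"95% CI = ({ci_lower:.2f}, {ci_upper:.2f})")
```

With a sample this small, the bootstrap SE tends to come out a little below the formula value, because resampling treats the 8 observations as if they were the whole population.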

Note: There are other fancier ways to use bootstrapping to calculate confidence intervals.

Note: So far, we have used bootstrapping to calculate the SE and CI for the mean. BUT, for the mean, both SE and CI can be calculated directly with a formula, without having to create bootstrapped datasets. So, what is it that makes bootstrapping so awesome?

Regardless of the statistic we calculate, bootstrapping allows us to see it in the context of a distribution, and we can use that distribution to help us interpret the initial result.
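
To make this concrete, here is a small sketch that bootstraps the median, a statistic with no simple textbook formula for its Standard Error. The data values are invented for illustration, and looking at the share of bootstrapped medians below zero is just one possible way to use the distribution, not a formal test.

```python
import random
import statistics

# Hypothetical responses, invented for illustration only.
responses = [1.4, 2.1, 0.9, 1.8, 2.3, -1.2, -1.9, -1.4]

rng = random.Random(0)  # fixed seed so the run is repeatable
boot_medians = [
    statistics.median(rng.choices(responses, k=len(responses)))
    for _ in range(10_000)
]

observed_median = statistics.median(responses)

# Spread of the bootstrapped medians: a bootstrap Standard Error.
median_se = statistics.stdev(boot_medians)

# How often a bootstrapped dataset suggests the drug made things worse.
share_below_zero = sum(m < 0 for m in boot_medians) / len(boot_medians)

print(f"observed median = {observed_median:.2f}")
print(f"bootstrap SE of the median = {median_se:.2f}")
print(f"share of bootstrapped medians below 0 = {share_below_zero:.2f}")
```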