Bootstrap Numerical
Description
Bootstrapping is a statistical procedure that creates multiple simulated datasets from a single dataset. It is an alternative to traditional statistical approaches but has the same general goal: to allow calculation of descriptive and inferential statistics on your data.
So how does it work?
The bootstrapping process takes the values in a dataset (see the example dataset below) and randomly resamples them, with replacement, into a new simulated dataset of the same size. Because sampling is with replacement, some values from the original set may be selected once, multiple times or not at all. The result is a new dataset with the same number of values but slightly different contents, which approximates what you might expect if you went out and repeated the original experiment. With bootstrapping you can repeat this resampling process hundreds or thousands of times.
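The resampling step described above can be sketched in a few lines of Python. This is a minimal illustration, not the program's own code: the values in `original` are hypothetical, and `bootstrap_resample` is a name chosen for this example. Sampling with replacement (`random.choices`) is what allows a value to appear several times or not at all.

```python
import random

def bootstrap_resample(data, rng):
    """Draw one simulated dataset: same size as `data`, sampled with replacement."""
    return rng.choices(data, k=len(data))

# Hypothetical example values on a 1-10 scale (not the values from the example file).
original = [3, 7, 7, 2, 9, 5, 6, 4, 8, 1]

rng = random.Random(42)  # fixed seed so the run is reproducible
resamples = [bootstrap_resample(original, rng) for _ in range(1000)]

first = resamples[0]
print(len(first) == len(original))  # True: same quantity of values
print(set(first) <= set(original))  # True: only original values can appear
print(sorted(first))                # typically some values repeat and others are missing
```

Each of the 1,000 lists in `resamples` plays the role of one "repeated experiment"; a statistic such as the mean is then calculated on every one of them.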
Why would you do this?
The main benefit of bootstrap methods is that they make few assumptions about the distribution of your data: unlike parametric methods, bootstrapping does not rely on the data following a normal distribution. Additionally, the resampling process lets you generate many simulated datasets, and so estimate how much a statistic would vary between repeated experiments, without the inconvenience of actually repeating your experiment (but only IF your initial dataset is of good quality).
Worked example:
Download the example Excel file ‘Bootstrap Numerical’ below.
The file contains five columns: Group A, B, C, D and E.
Each column has 25 values between 1 and 10 (this is just example data, but it could represent questionnaire answers or blood test results, for instance).
- Click Analyze above to open the numerical bootstrap program.
- Browse your computer for ‘Bootstrap Numerical.xlsx’ (or the dataset you would like to analyse)
- Wait for it to upload
- If you uploaded a .csv file, you now need to choose its separator from the options provided
- Select the variable you would like to bootstrap. For instance ‘Group A’.
- You may then click on one of three options:
  - Bootstrap Mean - calculates the bootstrapped mean and its confidence intervals
  - Bootstrap Quartiles (BCa) - calculates the bootstrapped quartiles with Bias-Corrected and Accelerated (BCa) confidence intervals
  - Bootstrap Quartiles (Perc) - calculates the bootstrapped quartiles with percentile confidence intervals
- View each tab to see its statistical output.
- To download your output, select a format and click Download.
- Congratulations, you have run a bootstrap numerical analysis!
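For readers who want to see roughly what the three options compute, here is a Python sketch of the same workflow under stated assumptions. The semicolon-separated CSV text, its values and the helper names (`read_column`, `percentile`, `bootstrap_ci`, `quartile`) are hypothetical; 2,000 resamples and 95% intervals are assumed settings; and the percentile method is used for every interval. The program itself may use different settings and methods.

```python
import csv
import io
import random
import statistics

def read_column(csv_text, column, separator=","):
    """Parse CSV text using the chosen separator and return one column as floats."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=separator)
    return [float(row[column]) for row in reader]

def percentile(sorted_values, q):
    """Linearly interpolated percentile (q between 0 and 1) of a sorted list."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)

def bootstrap_ci(data, statistic, n_boot=2000, level=0.95, seed=0):
    """Return (estimate, lower, upper) using a bootstrap percentile interval."""
    rng = random.Random(seed)
    estimates = sorted(statistic(rng.choices(data, k=len(data))) for _ in range(n_boot))
    alpha = (1 - level) / 2
    return statistic(data), percentile(estimates, alpha), percentile(estimates, 1 - alpha)

def quartile(i):
    """Statistic returning Q1 (i=0), the median (i=1) or Q3 (i=2)."""
    return lambda xs: statistics.quantiles(xs, n=4)[i]

# Hypothetical semicolon-separated upload (not the contents of the example file).
csv_text = """Group A;Group B
4;2
7;5
6;6
8;4
5;8
9;3
3;7
6;5
7;6
5;4
"""

group_a = read_column(csv_text, "Group A", separator=";")

# "Bootstrap Mean"-style output.
mean, low, high = bootstrap_ci(group_a, statistics.mean)
print(f"Mean {mean:.2f}, 95% CI [{low:.2f}, {high:.2f}]")

# "Bootstrap Quartiles (Perc)"-style output.
quartile_results = {}
for name, idx in (("Q1", 0), ("Median", 1), ("Q3", 2)):
    quartile_results[name] = bootstrap_ci(group_a, quartile(idx))
    est, q_low, q_high = quartile_results[name]
    print(f"{name} {est:.2f}, 95% CI [{q_low:.2f}, {q_high:.2f}]")
```

Passing the statistic in as a function keeps a single resampling routine for the mean and for each quartile; only the quantity calculated on each simulated dataset changes.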
Should I use bootstrap BCa or bootstrap Perc to calculate my confidence intervals?
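The short answer: BCa (bias-corrected and accelerated) intervals adjust the percentile interval for bias and skewness in the bootstrap distribution, so they are generally more accurate, at the expense of taking slightly longer to compute.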
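To show where the extra computation comes from, here is a sketch of a percentile interval and a textbook BCa interval for the mean, both computed from the same bootstrap distribution. It assumes Python 3.8+ (for `statistics.NormalDist`), uses hypothetical right-skewed data and illustrative function names, and is not the program's own implementation. BCa shifts the percentile cut-offs using a bias correction `z0` and an acceleration `a`; estimating `a` with the jackknife needs one extra evaluation of the statistic per data point, which is the additional cost.

```python
import random
from statistics import NormalDist

def mean(xs):
    return sum(xs) / len(xs)

def percentile(sorted_values, q):
    """Linearly interpolated percentile (q between 0 and 1) of a sorted list."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)

def percentile_ci(boot, level=0.95):
    """Plain percentile interval: read the cut-offs straight off the bootstrap distribution."""
    alpha = (1 - level) / 2
    return percentile(boot, alpha), percentile(boot, 1 - alpha)

def bca_ci(data, statistic, boot, level=0.95):
    """BCa interval: same bootstrap distribution, adjusted cut-offs."""
    norm = NormalDist()
    theta_hat = statistic(data)
    n_boot = len(boot)

    # Bias correction z0: share of bootstrap estimates below the original estimate.
    prop_below = sum(b < theta_hat for b in boot) / n_boot
    prop_below = min(max(prop_below, 1 / (n_boot + 1)), n_boot / (n_boot + 1))  # keep inv_cdf finite
    z0 = norm.inv_cdf(prop_below)

    # Acceleration a from leave-one-out (jackknife) estimates: the extra work BCa needs.
    jack = [statistic(data[:i] + data[i + 1:]) for i in range(len(data))]
    jack_mean = sum(jack) / len(jack)
    num = sum((jack_mean - j) ** 3 for j in jack)
    den = 6 * sum((jack_mean - j) ** 2 for j in jack) ** 1.5
    a = num / den if den else 0.0

    def adjusted(q):
        z = norm.inv_cdf(q)
        return norm.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))

    alpha = (1 - level) / 2
    return percentile(boot, adjusted(alpha)), percentile(boot, adjusted(1 - alpha))

# Hypothetical, right-skewed example data.
data = [1, 1, 2, 2, 2, 3, 3, 4, 5, 7, 9, 10]
rng = random.Random(1)
boot = sorted(mean(rng.choices(data, k=len(data))) for _ in range(5000))

perc_lo, perc_hi = percentile_ci(boot)
bca_lo, bca_hi = bca_ci(data, mean, boot)
print(f"Percentile 95% CI: [{perc_lo:.2f}, {perc_hi:.2f}]")
print(f"BCa 95% CI:        [{bca_lo:.2f}, {bca_hi:.2f}]")
```

Both functions read the same sorted bootstrap distribution, so in this sketch the only extra cost of BCa is the jackknife loop and a few normal-distribution look-ups. On skewed data like this the two intervals typically differ slightly; on roughly symmetric data they are usually close.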
Are there any disadvantages to bootstrap analyses?
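Bootstrapping only resamples values that are already in the original dataset, so it cannot fix problems with your data. If you have significant outliers or other data quality issues, every simulated dataset will inherit them, and bootstrap methods may exacerbate rather than correct these issues.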
Written by Daniel Richardson