Bootstrap Categorical



Bootstrapping is a statistical technique that creates many new data sets from a single observed data set. Each new data set is built by randomly selecting a value from the observed set, replacing it, and then randomly selecting another value (sampling with replacement). This is repeated until the new data set contains the same number of values as the observed one, so each observed value may appear once, more than once or not at all. Thousands of new data sets can be generated in this way from the one that was observed. Assuming that the observed data set is representative of the population it was drawn from, this approximates carrying out the original study thousands of times, and it lets you estimate how much a statistic such as a mean or proportion would vary from one sample to the next.
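The resampling step described above can be sketched in a few lines of Python. This is only an illustration, not the tool's own code: the `observed` values, the seed and the variable names are all made up for the example.

```python
import random
from collections import Counter

# Hypothetical observed data set (made-up values for illustration only).
observed = ["M", "F", "F", "M", "F", "M", "M", "F", "F", "F"]
n = len(observed)

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Pick n positions at random WITH replacement, then look up their values.
picked = rng.choices(range(n), k=n)
new_data = [observed[i] for i in picked]

# Count how often each original observation ended up in the new data set.
times_selected = Counter(picked)
for i in range(n):
    print(f"observation {i} ({observed[i]}): selected {times_selected[i]} time(s)")
```

The printed counts show which observations were drawn once, more than once or not at all, while the new data set stays the same size as the observed one.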

The Bootstrap – categorical function calculates the mean occurrence (the proportion) of a binary variable, together with its standard error and 95% confidence interval. Examples of such variables in medical statistics are sex (M/F) and presence of a pathology (affected/unaffected).
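As a rough sketch of what such a calculation involves, the Python below bootstraps the proportion of a binary variable and reports a standard error and a simple percentile 95% interval. The data, the number of resamples (5000) and the helper name `bootstrap_proportions` are assumptions made for the example; the tool may compute its intervals differently.

```python
import random
import statistics

# Hypothetical binary data: 1 = affected, 0 = unaffected (made-up values).
observed = [1] * 12 + [0] * 28

def bootstrap_proportions(data, n_resamples, rng):
    """Return the proportion of 1s in each of n_resamples bootstrap data sets."""
    n = len(data)
    return [sum(rng.choices(data, k=n)) / n for _ in range(n_resamples)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
props = bootstrap_proportions(observed, 5000, rng)

# Standard error: the spread of the bootstrapped proportions.
std_error = statistics.stdev(props)

# Percentile 95% interval: with n=40 the first and last cut points
# are the 2.5th and 97.5th percentiles of the bootstrap distribution.
cuts = statistics.quantiles(props, n=40)
lower, upper = cuts[0], cuts[-1]

print(f"observed proportion: {sum(observed) / len(observed):.3f}")
print(f"bootstrap standard error: {std_error:.3f}")
print(f"95% percentile interval: {lower:.3f} to {upper:.3f}")
```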


What is the benefit of bootstrapping over parametric methods?

Bootstrapping makes no assumption about the distribution of the data, whereas parametric analysis assumes that the data follow a specific distribution, most commonly the normal distribution.


What are the disadvantages of bootstrapping?

The main disadvantage of bootstrapping is that every new data set is drawn from the observed values alone. Because those values are resampled many thousands of times, any outliers or inaccurate values are carried into the new data sets again and again and could affect the validity of the results.


Worked Example: 

Download the example Excel file ‘Bootstrap Categorical’ below.

In the file there is a spreadsheet with three columns - Group, Sex and Pathology.


  1. Click analyze above to open the categorical bootstrap program.
  2. Upload ‘Bootstrap categorical example data.xlsx’ (or the dataset you would like to analyse) using the browse function.
  3. If you have uploaded a .csv file, define your separator by choosing from the options provided.
  4. Select the variable you would like to bootstrap, for instance ‘Sex’.
  5. Select the ‘Bootstrap – categorical’ tab to see the following:
    • Bootstrap Statistics - calculates the bootstrapped mean, bias and standard error
    • Bootstrap Quartiles (BCa) - calculates the bootstrapped quartiles using the bias-corrected and accelerated (BCa) method
    • Bootstrap Quartiles (Perc) - calculates the bootstrapped quartiles using the percentile method
  6. View these tabs to see the statistical output.
  7. If you would like to download your output, select the format and click download.
  8. Congratulations, you have run a bootstrap categorical analysis.
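
For readers who want to see roughly where figures like those in the output tabs come from, here is a minimal Python sketch. It is an assumption about what the outputs represent, not the tool's implementation: the ‘Sex’ values are made up (they are not the contents of the example file), the choice of ‘F’ as the level coded 1 is arbitrary, and the BCa correction is not shown.

```python
import random
import statistics

# Hypothetical 'Sex' column (made-up values, not the example file's contents).
sex = ["M"] * 18 + ["F"] * 22

# Code the level of interest as 1 and the other level as 0.
coded = [1 if s == "F" else 0 for s in sex]
observed_mean = statistics.fmean(coded)

rng = random.Random(1)  # fixed seed so the sketch is repeatable
n = len(coded)
boot_means = [statistics.fmean(rng.choices(coded, k=n)) for _ in range(2000)]

# Bootstrap statistics: mean, bias and standard error.
boot_mean = statistics.fmean(boot_means)
bias = boot_mean - observed_mean          # bootstrap mean minus observed mean
std_error = statistics.stdev(boot_means)  # spread of the bootstrapped means

# Percentile-style quartiles of the bootstrap distribution.
q1, median, q3 = statistics.quantiles(boot_means, n=4)

print(f"observed mean: {observed_mean:.3f}")
print(f"bootstrap mean: {boot_mean:.3f}, bias: {bias:.4f}, SE: {std_error:.3f}")
print(f"quartiles: {q1:.3f}, {median:.3f}, {q3:.3f}")
```

A bias close to zero suggests that the bootstrapped means are centred on the observed mean, which is what you would expect for a simple proportion.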


Written by: Aneya Scott