Lecture 18 – Permutation Testing, Bootstrapping

DSC 10, Fall 2022



Permutation testing


Permutation tests help answer questions of the form:

I have two samples, but no information about any population distributions. Do these samples look like they were drawn from the same population?

Smoking and birth weight 👶

Setup for the hypothesis test

Strategy and implementation

Shuffling the labels

The 'Maternal Smoker' column defines the original groups. The 'Shuffled_Labels' column defines the random groups.
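As a rough sketch of the shuffling step, the snippet below builds a tiny made-up table and adds a column of shuffled labels. The data values, the seed, and the use of plain pandas and NumPy are assumptions for illustration; only the column names come from the lecture.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the real babies table is not reproduced here.
babies = pd.DataFrame({
    'Maternal Smoker': [True, True, False, False, False, True],
    'Birth Weight': [110, 105, 125, 118, 130, 99],
})

rng = np.random.default_rng(42)  # seeded only so the sketch is repeatable

# Shuffle the labels; the weights stay in their original order.
shuffled = rng.permutation(babies['Maternal Smoker'].to_numpy())
with_shuffled = babies.assign(Shuffled_Labels=shuffled)
print(with_shuffled)
```

Shuffling rearranges the labels but never changes how many babies end up in each group.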

Calculating the test statistic

For the original groups:

For the random groups:
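One way to compute the statistic for both sets of groups is sketched below. The helper name `difference_in_means`, the toy values, and the direction of the subtraction (non-smokers minus smokers) are assumptions, not taken from the lecture.

```python
import numpy as np
import pandas as pd

# Hypothetical toy table; 'Shuffled_Labels' is filled in by hand here
# to stand for one possible shuffle of 'Maternal Smoker'.
with_shuffled = pd.DataFrame({
    'Maternal Smoker': [True, True, False, False, False, True],
    'Shuffled_Labels': [False, True, True, False, True, False],
    'Birth Weight': [110, 105, 125, 118, 130, 99],
})

def difference_in_means(df, label_col):
    """Mean weight where label_col is False minus mean weight where it is True."""
    weights = df['Birth Weight'].to_numpy()
    is_smoker = df[label_col].to_numpy()
    return weights[~is_smoker].mean() - weights[is_smoker].mean()

# Same function, two different label columns.
observed = difference_in_means(with_shuffled, 'Maternal Smoker')
one_simulated = difference_in_means(with_shuffled, 'Shuffled_Labels')
print(observed, one_simulated)
```

The only thing that changes between the two calls is which column defines the groups.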

Repeating the process
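A minimal sketch of the repetition step, assuming made-up weights, 500 repetitions, and a statistic defined as the non-smokers' mean minus the smokers' mean (none of these specifics come from the lecture):

```python
import numpy as np

# Hypothetical toy data standing in for the babies table.
weights = np.array([110, 105, 125, 118, 130, 99, 121, 108])
is_smoker = np.array([True, True, False, False, False, True, False, True])

def difference_in_means(weights, labels):
    # Assumed direction: non-smokers' mean minus smokers' mean.
    return weights[~labels].mean() - weights[labels].mean()

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
n_repetitions = 500
differences = np.empty(n_repetitions)

for i in range(n_repetitions):
    # Each pass: shuffle the labels, then recompute the statistic.
    shuffled_labels = rng.permutation(is_smoker)
    differences[i] = difference_in_means(weights, shuffled_labels)
```

The array `differences` is an empirical distribution of the test statistic under the assumption that both groups come from the same population.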

Comparing the empirical distribution to the observed statistic
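The comparison usually boils down to a proportion. The sketch below assumes that larger values of the statistic favor the alternative hypothesis; if the statistic were defined the other way around, the inequality would flip. The numbers are invented purely to show the arithmetic.

```python
import numpy as np

def p_value(simulated, observed):
    """Proportion of simulated statistics at least as large as the observed one."""
    simulated = np.asarray(simulated)
    return np.count_nonzero(simulated >= observed) / len(simulated)

# Made-up simulated statistics, not results from the lecture.
simulated = [-1.2, 0.3, 2.5, -0.4, 0.9, 3.1, -2.0, 0.1]
print(p_value(simulated, 2.5))  # 2 of the 8 values are >= 2.5, so 0.25
```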


Concept Check ✅ – Answer at cc.dsc10.com

Recall, babies has two columns.

To randomly assign weights to groups, we shuffled the 'Maternal Smoker' column. Could we have shuffled the 'Birth Weight' column instead?

Answer (read this after you've submitted your own): Yes, we could have. It doesn't matter which column we shuffle: we could shuffle one or the other, or even both, as long as we shuffle each separately. Think about it like this: pretend you bring a gift 🎁 to a Christmas party 🎄 for a gift exchange, where everyone must leave the party with a random person's gift. Everyone stands around a circular table and puts the gift they bought in front of them. To randomly assign people to gifts, you could shuffle the gifts on the table and have all the people stay in the same spot, or you could have the people physically shuffle and keep the gifts in the same spots, or you could do both. Either way, everyone will end up with a random gift!
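For a dataset small enough to enumerate, this can be checked exactly. The sketch below uses five made-up weights and an assumed statistic (mean of the False group minus mean of the True group). It lists every possible shuffle of the labels and every possible shuffle of the weights, and compares the resulting collections of statistics.

```python
from itertools import permutations
import numpy as np

# Hypothetical toy data.
weights = np.array([110, 105, 125, 118, 130])
labels = np.array([True, True, False, False, False])

def stat(w, l):
    # Assumed statistic: mean of the False group minus mean of the True group.
    return w[~l].mean() - w[l].mean()

# All 5! = 120 orderings of the labels, weights held fixed.
from_shuffled_labels = sorted(stat(weights, np.array(p)) for p in permutations(labels))

# All 120 orderings of the weights, labels held fixed.
from_shuffled_weights = sorted(stat(np.array(p), labels) for p in permutations(weights))

print(np.allclose(from_shuffled_labels, from_shuffled_weights))
```

Both approaches produce the same collection of statistics, so either column can be shuffled.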

Example: Did the New England Patriots cheat? 🏈


The measurements

The question

Did the Patriots' footballs drop in pressure more than the Colts'?

The test statistic

Similar to the baby weights example, our test statistic will be the difference between the teams' average pressure drops. We'll calculate the mean drop for the 'Patriots' minus the mean drop for the 'Colts'.

The average pressure drop for the Patriots' footballs was about 0.74 psi greater than that for the Colts' footballs.

Creating random groups and calculating one value of the test statistic

We'll run a permutation test to see if 0.74 psi is a significant difference.
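Since the procedure is the same as in the birth weight example, it can be wrapped in a reusable function. The sketch below is one possible way to do that. The function name `permutation_test`, the column names 'Team' and 'Pressure Drop', and all of the measurements are hypothetical, so its output says nothing about the real footballs.

```python
import numpy as np
import pandas as pd

def permutation_test(df, label_col, value_col, group_a, group_b,
                     n_repetitions=1000, seed=None):
    """Return (observed, simulated, p_value) for mean(group_a) - mean(group_b)."""
    rng = np.random.default_rng(seed)
    values = df[value_col].to_numpy()
    labels = df[label_col].to_numpy()

    def stat(lab):
        return values[lab == group_a].mean() - values[lab == group_b].mean()

    observed = stat(labels)
    # Shuffle the labels and recompute the statistic, many times.
    simulated = np.array([stat(rng.permutation(labels))
                          for _ in range(n_repetitions)])
    # Assumes large positive differences favor the alternative.
    p_value = np.count_nonzero(simulated >= observed) / n_repetitions
    return observed, simulated, p_value

# Made-up measurements, NOT the real data.
footballs = pd.DataFrame({
    'Team': ['Patriots'] * 5 + ['Colts'] * 3,
    'Pressure Drop': [1.2, 1.0, 1.4, 0.9, 1.1, 0.5, 0.4, 0.6],
})

observed, simulated, p = permutation_test(
    footballs, 'Team', 'Pressure Drop', 'Patriots', 'Colts', seed=1)
print(observed, p)
```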

The simulation


It doesn't look good for the Patriots. What is the p-value?

This p-value is low enough to consider this result to be highly statistically significant ($p<0.01$).

Caution! ⚠️


Quote from an investigative report commissioned by the NFL:

“[T]he average pressure drop of the Patriots game balls exceeded the average pressure drop of the Colts balls by 0.45 to 1.02 psi, depending on various possible assumptions regarding the gauges used, and assuming an initial pressure of 12.5 psi for the Patriots balls and 13.0 for the Colts balls.”

Aside: Establishing causation

To actually establish causation, we need the following two statements to be true:

  1. The data must come from a randomized controlled trial, to mitigate the effects of confounding factors.
  2. A permutation test must show a statistically significant difference in the outcome between the treatment and control group.

If both of these conditions are met, then we can conclude that the treatment causes the outcome.

Bootstrapping 🥾

City of San Diego employee salary data

All City of San Diego employee salary data is public. We are using the latest available data.

When you load in a dataset that has so many columns that you can't see them all, it's a good idea to look at the column names.

We only need the 'TotalWages' column, so let's get just that column.

Concept Check ✅ – Answer at cc.dsc10.com

Consider the question

What is the median salary of all San Diego city employees?

What is the right tool to answer this question?

The median salary

Let's be realistic...

In the language of statistics

The sample median

Let's survey 500 employees at random. To do so, we can use the .sample method.

We won't reassign my_sample at any point in this notebook, so it will always refer to this particular sample.
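A minimal sketch of this step follows, written with plain pandas, whose `.sample` may differ slightly from the course's version. The salary values are randomly generated stand-ins, and the population size and seeds are arbitrary assumptions. Only the names 'TotalWages' and `my_sample` come from the lecture.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the salary table; these are NOT real salaries.
rng = np.random.default_rng(10)
population = pd.DataFrame({
    'TotalWages': rng.lognormal(mean=11, sigma=0.6, size=12_000).round()
})

# Survey 500 employees at random, without replacement.
my_sample = population.sample(500, replace=False, random_state=10)

# The sample median is our estimate of the population median.
sample_median = my_sample.get('TotalWages').median()
print(sample_median)
```

Sampling without replacement guarantees that no employee is surveyed twice.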

How confident are we that this is a good estimate?

The sample median is random
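To see this randomness directly, the sketch below draws several different samples of 500 from a synthetic population and prints each sample's median. The generated salaries and the seeds are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in population; these are NOT real salaries.
rng = np.random.default_rng(10)
population = pd.DataFrame({
    'TotalWages': rng.lognormal(mean=11, sigma=0.6, size=12_000).round()
})

# Five different random samples give five (generally different) medians.
medians = [
    population.sample(500, replace=False, random_state=seed)['TotalWages'].median()
    for seed in range(5)
]
print(medians)
```

Each sample gives a somewhat different estimate, which is why a single sample median cannot tell us on its own how close it is to the population median.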

An impractical approach