Lecture 17 – TVD, Permutation Testing

DSC 10, Spring 2023


Remember, this lecture will not be delivered live! Instead, the corresponding videos have been posted here.


Total variation distance

Recall: Jury panels in Alameda County

We have two distributions:

Total variation distance

The Total Variation Distance (TVD) of two categorical distributions is the sum of the absolute differences of their proportions, all divided by 2.

One way of interpreting it is as the total overrepresentation across all categories.
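As a minimal sketch of the definition above, here is one way to compute the TVD with NumPy. The function name `total_variation_distance` and the two example distributions are made up for illustration; they are not the jury panel data.

```python
import numpy as np

def total_variation_distance(dist1, dist2):
    """Sum of the absolute differences in proportions, all divided by 2."""
    dist1 = np.asarray(dist1, dtype=float)
    dist2 = np.asarray(dist2, dtype=float)
    return np.abs(dist1 - dist2).sum() / 2

# Hypothetical distributions over the same three categories (illustrative only).
dist_a = np.array([0.5, 0.3, 0.2])
dist_b = np.array([0.4, 0.4, 0.2])

tvd_example = total_variation_distance(dist_a, dist_b)
print(tvd_example)  # approximately 0.1, up to floating-point rounding
```

Here the first category is overrepresented by 0.1 in `dist_a` and the second by 0.1 in `dist_b`; dividing the total absolute difference (0.2) by 2 counts that shift once rather than twice.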

Concept Check ✅ – Answer at cc.dsc10.com

What is the TVD between the distributions of class standing in DSC 10 and DSC 40A?

| Class Standing | DSC 10 | DSC 40A |
| --- | --- | --- |
| Freshman | 0.45 | 0.15 |
| Sophomore | 0.35 | 0.35 |
| Junior | 0.15 | 0.35 |
| Senior+ | 0.05 | 0.15 |

Simulate drawing jury panels

Note: np.random.multinomial creates samples drawn with replacement, even though real jury panels would be drawn without replacement. However, when the sample size (1453) is small relative to the population (number of people in Alameda County), the resulting distributions will be roughly the same whether we sample with or without replacement.
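A sketch of drawing a single simulated panel with `np.random.multinomial`. The proportions in `eligible` are placeholder values assumed for illustration, not necessarily the ones used in lecture; only the sample size (1453) comes from the note above.

```python
import numpy as np

# Placeholder proportions for the eligible population, one entry per category.
# These are assumptions for illustration; they must sum to 1.
eligible = np.array([0.15, 0.18, 0.12, 0.54, 0.01])
panel_size = 1453  # sample size mentioned in the note above

# multinomial returns the number of draws that landed in each category.
counts = np.random.multinomial(panel_size, eligible)

# Dividing by the sample size turns counts into a distribution of proportions.
simulated_panel = counts / panel_size
print(simulated_panel)
```

Each run gives a slightly different `simulated_panel`, which is exactly the sampling variability the simulation is meant to capture.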

The experiment

We need to repeat the process of drawing a sample and computing the total variation distance many, many times.
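One way the repetition might look, as a sketch. The `eligible` proportions are placeholders assumed for illustration, and `repetitions = 10_000` is an arbitrary choice rather than a value taken from the lecture.

```python
import numpy as np

def total_variation_distance(dist1, dist2):
    """Sum of the absolute differences in proportions, all divided by 2."""
    return np.abs(np.asarray(dist1) - np.asarray(dist2)).sum() / 2

eligible = np.array([0.15, 0.18, 0.12, 0.54, 0.01])  # placeholder population
panel_size = 1453
repetitions = 10_000  # arbitrary; more repetitions give a smoother histogram

simulated_tvds = np.empty(repetitions)
for i in range(repetitions):
    # Draw one panel under the null and convert counts to proportions.
    sample = np.random.multinomial(panel_size, eligible) / panel_size
    # Record how far this simulated panel is from the population.
    simulated_tvds[i] = total_variation_distance(sample, eligible)
```

The array `simulated_tvds` is the empirical distribution of the test statistic under the assumption that panels are drawn at random from the eligible population.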

Repeating the experiment

Calculating the p-value
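A sketch of the p-value calculation: the proportion of simulated TVDs that are at least as large as the observed TVD. Both `eligible` and `observed_panel` below are placeholder distributions assumed for illustration. This version uses the `size` argument of `np.random.multinomial` to draw all panels at once instead of looping.

```python
import numpy as np

eligible = np.array([0.15, 0.18, 0.12, 0.54, 0.01])        # placeholder population
observed_panel = np.array([0.26, 0.08, 0.08, 0.54, 0.04])  # placeholder observed panels
panel_size = 1453
repetitions = 10_000

# Observed test statistic: TVD between the observed panels and the population.
observed_tvd = np.abs(observed_panel - eligible).sum() / 2

# Each row of `samples` is one simulated panel, as counts per category.
samples = np.random.multinomial(panel_size, eligible, size=repetitions)
simulated_tvds = np.abs(samples / panel_size - eligible).sum(axis=1) / 2

# Large TVDs favor the alternative, so count simulations at least as extreme.
p_value = np.count_nonzero(simulated_tvds >= observed_tvd) / repetitions
print(p_value)
```

A p-value near 0 would mean that panels this far from the population almost never arise by random sampling alone.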

Are the jury panels representative?

Motivating A/B testing


So far, we've used hypothesis tests to answer questions of the form:

I have a population distribution, and I have one sample. Does this sample look like it was drawn from the population?

Looking forward

Consider the following form of question:

I have two samples, but no information about any population distributions. Do these samples look like they were drawn from the same population?

We can't use hypothesis testing to answer such questions yet, because all of our hypothesis tests have relied on knowing the population distribution. But what if you don't know the population distribution?

These questions are answered through A/B testing. Permutation testing is one type of A/B testing.

2008 Obama Campaign

Button choices

The winner

It is estimated that this combination of image and button brought in an additional 60 million dollars in donations versus the original version of the site.

Example: Smoking and birth weight 👶

Smoking and birth weight

Note: The 'Birth Weight' column is measured in ounces; 100 ounces = 6.25 pounds.

Visualizing the distribution of each group

What do you notice? 👀

The question

Setting up a hypothesis test

Discussion Question

We recently introduced the total variation distance (TVD) as a test statistic. Why can't we use the TVD as our test statistic in this hypothesis test?

Test statistic: the difference in group means

The test statistic we'll use is the difference in group means:

$$\substack{\text{mean birth weight of} \\ \text{non-smokers' babies}} \hspace{0.5in} - \hspace{0.5in} \substack{\text{mean birth weight of} \\ \text{smokers' babies}}$$

Note that large values of this test statistic favor the alternative hypothesis.

Let's compute the observed statistic:
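A sketch of this computation in plain pandas, using a tiny made-up table. The column names `'Maternal Smoker'` and `'Birth Weight'` match the lecture, but the six rows are invented for illustration, and the course's own DataFrame library may use slightly different method names.

```python
import pandas as pd

# Tiny made-up table with the same column names as the lecture's babies data.
babies = pd.DataFrame({
    'Maternal Smoker': [False, False, False, True, True, True],
    'Birth Weight': [120, 128, 115, 110, 105, 118],  # ounces, invented values
})

is_smoker = babies['Maternal Smoker']
non_smoker_mean = babies.loc[~is_smoker, 'Birth Weight'].mean()
smoker_mean = babies.loc[is_smoker, 'Birth Weight'].mean()

# Non-smokers' mean minus smokers' mean, matching the formula above.
observed_difference = non_smoker_mean - smoker_mean
print(observed_difference)  # 10.0 for this made-up table (121 - 111)
```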

Setting up a hypothesis test

Generating new samples under the null hypothesis

Constructing a population


Permutation tests

A permutation test is a type of A/B test (and a type of hypothesis test). It tests whether two samples come from the same population distribution. To conduct a permutation test:

  1. Shuffle the group labels (i.e. the Trues and Falses) to generate two new samples under the null.
  2. Compute the difference in group means (the test statistic).
  3. Repeat steps 1 and 2 to generate an empirical distribution of the difference in group means.
  4. See where the observed statistic lies in the empirical distribution. If, in our simulations, we rarely saw a difference in group means as extreme as the observed difference in group means, we have evidence to reject the null.
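The four steps above can be sketched with plain NumPy arrays. The data, the helper name `difference_in_means`, and the number of repetitions are all assumptions made for illustration.

```python
import numpy as np

# Made-up data: one measurement per individual and a True/False group label.
values = np.array([120, 128, 115, 131, 124, 110, 105, 118, 112, 108], dtype=float)
labels = np.array([False] * 5 + [True] * 5)

def difference_in_means(values, labels):
    """Mean of the False group minus mean of the True group."""
    return values[~labels].mean() - values[labels].mean()

observed = difference_in_means(values, labels)

repetitions = 5_000  # arbitrary choice
simulated = np.empty(repetitions)
for i in range(repetitions):
    shuffled_labels = np.random.permutation(labels)               # step 1
    simulated[i] = difference_in_means(values, shuffled_labels)   # step 2
# The loop itself is step 3: `simulated` is the empirical distribution.

# Step 4: how often was a shuffled difference at least as large as the observed one?
p_value = np.count_nonzero(simulated >= observed) / repetitions
print(observed, p_value)
```

Shuffling only reorders the labels, so each shuffle keeps the same number of Trues and Falses as the original data. The comparison uses `>=` because, with this ordering of the subtraction, large values of the statistic favor the alternative.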

Permutation tests with DataFrames

Shuffling one column

As mentioned before, we'll shuffle the 'Maternal Smoker' column.
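A sketch of shuffling a single column, written in plain pandas with a made-up six-row table. The new column name `'Shuffled_Labels'` is a hypothetical choice, and the course's own DataFrame library may spell some of these methods differently.

```python
import numpy as np
import pandas as pd

# Tiny made-up table with the same column names as the lecture's babies data.
babies = pd.DataFrame({
    'Maternal Smoker': [False, False, False, True, True, True],
    'Birth Weight': [120, 128, 115, 110, 105, 118],
})

# np.random.permutation returns a shuffled copy; the original column is untouched.
shuffled_labels = np.random.permutation(babies['Maternal Smoker'].to_numpy())

# Keep the original labels and add the shuffled ones as a new column.
with_shuffled = babies.assign(Shuffled_Labels=shuffled_labels)
print(with_shuffled)
```

The birth weights stay in place while the labels move, so each weight is randomly assigned to one of the two groups.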

Let's look at the distributions of the two new samples we just generated.

What do you notice? 👀

How close are the means of the shuffled groups?

This is the test statistic for one experiment (one "shuffle"). Let's write a function that can compute this test statistic for any shuffle.
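One possible shape for such a function, as a sketch in plain pandas. The function name `difference_in_group_means`, its `label_column` parameter, and the `'Shuffled_Labels'` column are hypothetical, and the table is made up.

```python
import numpy as np
import pandas as pd

def difference_in_group_means(df, label_column):
    """Mean birth weight where the label is False minus mean where it is True."""
    is_smoker = df[label_column].to_numpy(dtype=bool)
    weights = df['Birth Weight'].to_numpy()
    return weights[~is_smoker].mean() - weights[is_smoker].mean()

# Tiny made-up table with the same column names as the lecture's babies data.
babies = pd.DataFrame({
    'Maternal Smoker': [False, False, False, True, True, True],
    'Birth Weight': [120, 128, 115, 110, 105, 118],
})
with_shuffled = babies.assign(
    Shuffled_Labels=np.random.permutation(babies['Maternal Smoker'].to_numpy())
)

original_stat = difference_in_group_means(with_shuffled, 'Maternal Smoker')
shuffled_stat = difference_in_group_means(with_shuffled, 'Shuffled_Labels')
print(original_stat, shuffled_stat)
```

Passing the label column as an argument lets the same function compute both the observed statistic (from the original labels) and the statistic for any shuffle.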


Running the simulation

Conclusion of the test

Where does our observed statistic lie?


Concept Check ✅ – Answer at cc.dsc10.com

Recall, babies has two columns.

To randomly assign weights to groups, we shuffled the 'Maternal Smoker' column. Could we have shuffled the 'Birth Weight' column instead?

Answer (read this only after you've submitted your own answer): Yes, we could have. It doesn't matter which column we shuffle. We could shuffle one or the other, or even both, as long as we shuffle each separately.

Think about it like this: pretend you bring a gift 🎁 to a Christmas party 🎄 for a gift exchange, where everyone must leave the party with a random person's gift. Everyone stands around a circular table and puts the gift they bought in front of them. To randomly assign people to gifts, you could shuffle the gifts on the table and have all the people stay in the same spot, or you could have the people physically shuffle and keep the gifts in the same spots, or you could do both. Either way, everyone will end up with a random gift!

Summary, next time


A/B testing

Next time