Lecture 15 – Models and Viewpoints

DSC 10, Spring 2023



🚨 The second half of the course is more conceptual than the first. Reading the textbook (and coming to lecture) will become even more important.

Statistical models


A model is a set of assumptions about how data was generated.


Galileo's Leaning Tower of Pisa Experiment


Example: Jury selection

Swain vs. Alabama, 1965

$\substack{\text{eligible} \\ \text{population}} \xrightarrow{\substack{\text{representative} \\ \text{sample}}} \substack{\text{jury} \\ \text{panel}} \xrightarrow{\substack{\text{selection by} \\ \text{judge/attorneys}}} \substack{\text{actual} \\ \text{jury}}$

Supreme Court ruling

“... the overall percentage disparity has been small...”


Our approach: Simulation

Simulating statistics

Recall, a statistic is a number calculated from a sample.

Our plan:

  1. Run an experiment once to generate one value of our chosen statistic.
    • In this case, sample 100 people randomly from a population that is 26% Black, and count the number of Black men in the sample. That count is our statistic.
  2. Run the experiment many times, generating many values of the statistic, and store these statistics in an array.
  3. Visualize the resulting empirical distribution of the statistic.

Step 1 – Running the experiment once

np.random.multinomial(sample_size, pop_distribution)

Aside: Example usage of np.random.multinomial

On Halloween 👻 you'll trick-or-treat at 35 houses, each of which has an identical candy box, containing:

At each house, you'll select one candy blindly from the candy box.

To simulate the act of going to 35 houses, we can use np.random.multinomial:
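A minimal sketch of that call. The three candy proportions below are hypothetical stand-ins, since the contents of the candy box aren't listed above; only the 35 houses come from the example.

```python
import numpy as np

# Hypothetical candy box: three candy types with these proportions.
# These numbers are assumptions for illustration only.
candy_proportions = [0.3, 0.3, 0.4]

# One candy per house, 35 houses. The result is an array of counts,
# one per candy type, in the same order as candy_proportions.
candy_counts = np.random.multinomial(35, candy_proportions)
print(candy_counts)
```

The counts always add up to 35, and they change each time the cell is run.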

Step 1 – Running the experiment once

In our case, a randomly selected member of our population is Black with probability 0.26 and not Black with probability 1 - 0.26 = 0.74.

Each time we run the following cell, we'll get a new random sample of 100 people from this population.
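The cell itself isn't reproduced here. A sketch of what it might look like, reusing the names `sample_size` and `pop_distribution` from the call shown earlier:

```python
import numpy as np

sample_size = 100
pop_distribution = [0.26, 0.74]  # [Black, not Black]

# Draw one random sample of 100 people.
# The result has two counts: [number Black, number not Black].
sample_counts = np.random.multinomial(sample_size, pop_distribution)
print(sample_counts)
```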

We also need to calculate the statistic, which in this case is the number of Black men in the random sample of 100.
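One way to package the experiment and its statistic together, as a sketch. The function name `count_black_men` is made up for this illustration.

```python
import numpy as np

def count_black_men(sample_size=100, prob_black=0.26):
    """Run the experiment once and return the statistic."""
    counts = np.random.multinomial(sample_size, [prob_black, 1 - prob_black])
    # The first entry is the number of Black men in the sample.
    return counts[0]

one_statistic = count_black_men()
print(one_statistic)
```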

Step 2 – Repeat the experiment many times
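A sketch of the repetition step using a `for`-loop and `np.append`. The choice of 10,000 repetitions is an assumption; any large number works.

```python
import numpy as np

repetitions = 10_000  # assumed number of repetitions
counts = np.array([])

for i in np.arange(repetitions):
    # Run the experiment once and keep only the number of Black men.
    new_count = np.random.multinomial(100, [0.26, 0.74])[0]
    # Store the new statistic alongside all the previous ones.
    counts = np.append(counts, new_count)

print(counts[:10])
```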

Step 3 – Visualize the resulting distribution
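The empirical distribution is normally drawn as a histogram. As a rough text-only stand-in, the sketch below tallies how often each count appeared and prints one bar per value. It draws all the samples in a single call via the `size` argument of `np.random.multinomial`, which is a shortcut rather than the loop described in the plan.

```python
import numpy as np

# 10,000 samples of 100 people each; column 0 is the number of Black men.
counts = np.random.multinomial(100, [0.26, 0.74], size=10_000)[:, 0]

# proportions[k] is the fraction of simulated samples with exactly k Black men.
proportions = np.bincount(counts, minlength=101) / len(counts)

for value in range(101):
    if proportions[value] >= 0.005:
        bar = "#" * int(round(proportions[value] * 200))
        print(f"{value:3d} | {bar}")
```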

Was a jury panel with 8 Black men suspiciously unusual?
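One way to put a number on "suspiciously unusual" is to ask how often the simulation produced a panel with 8 or fewer Black men. This is a sketch under the model above (panels of 100 drawn from a population that is 26% Black); the variable names are illustrative.

```python
import numpy as np

observed_count = 8

# Simulate 10,000 panels of 100 people under the model.
counts = np.random.multinomial(100, [0.26, 0.74], size=10_000)[:, 0]

# Fraction of simulated panels at least as extreme as the observed one.
proportion_as_low = np.count_nonzero(counts <= observed_count) / len(counts)
print(proportion_as_low)
```

A proportion near zero means that panels like the observed one almost never arise under random selection from the eligible population.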


Example: Genetics of peas 🟢

Gregor Mendel, 1822-1884

Mendel is known as the father of genetics. Many of his experiments involved pea plants.

Mendel's model and observation

Choosing a statistic for simulation

$$| \text{sample proportion of plants with purple flowers} - 0.75 |$$

Simulating Mendel's experiment
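A sketch of the simulation, using the statistic defined above. The number of plants isn't stated here, so `num_plants = 929` is an assumption (it is the figure usually quoted in textbook accounts of this experiment); swap in the real sample size if it differs.

```python
import numpy as np

model = [0.75, 0.25]  # [purple, not purple] under Mendel's model
num_plants = 929      # assumed sample size, not given in these notes

def one_distance():
    """Simulate one sample and return |proportion purple - 0.75|."""
    purple_prop = np.random.multinomial(num_plants, model)[0] / num_plants
    return abs(purple_prop - 0.75)

distances = np.array([])
for i in np.arange(10_000):
    distances = np.append(distances, one_distance())

print(distances[:5])
```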

Without context, these numbers aren't helpful – we need to see where the value of the statistic in Mendel's original observation lies in this distribution!

Mendel's experiment

Was Mendel's model any good?

Mendelian inheritance

Viewpoints and test statistics

Choosing one of two viewpoints

Goal: Choose between two views of the world, based on data in a sample.

Test statistics

How do we choose between viewpoints?

Step 1: Start by assuming one of the viewpoints is true.

Step 2: Simulate many samples according to that viewpoint.

Choosing between viewpoints

Step 3: Ask whether the observed value of the test statistic (black line) is consistent with the simulated distribution of the test statistic (red histogram).

Example: Is our coin fair?

Let's put these values in an array, since our simulations will also result in arrays.

Designing a test statistic for a pair of viewpoints

Let's consider the pair of viewpoints “This coin is fair.” OR “This coin is not fair.”

Simulating a fair coin
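A sketch of the simulation for the viewpoints “fair” versus “not fair”. Since “not fair” covers bias in either direction, one reasonable test statistic is the distance between the number of heads and half the number of flips. The number of flips isn't stated here, so `num_flips = 400` is a hypothetical value.

```python
import numpy as np

num_flips = 400  # hypothetical number of flips

def distance_from_half(num_heads):
    """Test statistic: how far the number of heads is from num_flips / 2."""
    return abs(num_heads - num_flips / 2)

results = np.array([])
for i in np.arange(10_000):
    # Simulate num_flips flips of a fair coin; entry 0 is the number of heads.
    num_heads = np.random.multinomial(num_flips, [0.5, 0.5])[0]
    results = np.append(results, distance_from_half(num_heads))

print(results[:10])
```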

Concept Check ✅ – Answer at cc.dsc10.com

Let's now consider the pair of viewpoints “This coin is fair.” OR “This coin is biased towards heads.” Which test statistic would be appropriate?

Another pair of viewpoints

Simulating a fair coin, again

All that will change from our previous simulation is the function we use to compute our test statistic.
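To make that concrete, the sketch below takes the test statistic as a function argument, so the same simulation serves both pairs of viewpoints. The helper names and `num_flips = 400` are hypothetical. Using the number of heads for “biased towards heads” is one reasonable choice, since only large values point away from fairness.

```python
import numpy as np

num_flips = 400  # hypothetical number of flips

def simulate_fair_coin(statistic, repetitions=10_000):
    """Simulate a fair coin many times, applying `statistic` to each result."""
    heads = np.random.multinomial(num_flips, [0.5, 0.5], size=repetitions)[:, 0]
    return np.array([statistic(h) for h in heads])

def num_heads(heads):
    # Suits "fair" vs. "biased towards heads": only large values are evidence.
    return heads

def distance_from_half(heads):
    # Suits "fair" vs. "not fair": bias in either direction gives large values.
    return abs(heads - num_flips / 2)

heads_stats = simulate_fair_coin(num_heads)
distance_stats = simulate_fair_coin(distance_from_half)
print(heads_stats[:5], distance_stats[:5])
```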

Questions to consider before choosing a test statistic

Summary, next time


Next time