# Lecture 15 – Models and Viewpoints

## DSC 10, Fall 2022

### Announcements

• The Midterm Project is due Tuesday 11/1 at 11:59PM. Use pair programming 👯. See this post for clarifications.
• The Midterm Exam is this Friday during lecture. See this post for lots of details, including how to find your assigned seat, what to bring, and how to study.
• Check the calendar for the updated office hours schedule.
• Janine and Suraj are holding OH from 7-9PM tomorrow in the SDSC Auditorium. Come with questions from past midterms!

### Agenda

• Statistical models.
• Example: Jury selection.
• Example: Genetics of peas. 🟢
• Viewpoints and test statistics.
• Example: Is our coin fair?

## Statistical models

### Models

• A model is a set of assumptions about how data was generated.
• We want a way to assess the quality of a given model.

### Example

Galileo's Leaning Tower of Pisa Experiment

## Example: Jury selection

### Swain vs. Alabama, 1965

• Robert Swain was a Black man convicted of a crime in Talladega County, Alabama.
• He appealed the jury's decision all the way to the Supreme Court, on the grounds that Talladega County systematically excluded Black people from juries.
• At the time, only men 21 years or older were allowed to serve on juries. 26% of this eligible population was Black.
• But of the 100 men on Robert Swain's jury panel, only 8 were Black.
$\substack{\text{eligible} \\ \text{population}} \xrightarrow{\substack{\text{representative} \\ \text{sample}}} \substack{\text{jury} \\ \text{panel}} \xrightarrow{\substack{\text{selection by} \\ \text{judge/attorneys}}} \substack{\text{actual} \\ \text{jury}}$

### Supreme Court ruling

• About disparities between the percentages in the eligible population and the jury panel, the Supreme Court wrote:

"... the overall percentage disparity has been small..."

• The Supreme Court denied Robert Swain’s appeal and he was sentenced to life in prison.
• We now have the tools to show quantitatively that the Supreme Court's claim was misguided.
• This "overall percentage disparity" turns out to be not so small, and is an example of racial bias.
• Jury panels were often made up of people in the jury commissioner's professional and social circles.
• Of the 8 Black men on the jury panel, none were selected to be part of the actual jury.

### Our model for simulating Swain's jury panel

• We will assume the jury panel consists of 100 men, randomly chosen from a population that is 26% Black.
• Our question: is this model (i.e. assumption) right or wrong?

### Our approach: simulation

• We'll start by assuming that this model is true.
• We'll generate many jury panels using this assumption.
• We'll count the number of Black men in each simulated jury panel to see how likely it is for a random panel to contain 8 or fewer Black men.

### Simulating statistics

Recall, a statistic is a number calculated from a sample.

1. Run an experiment once to generate one value of a statistic.
    • In this case, sample 100 men randomly from a population that is 26% Black, and count the number of Black men (the statistic).
2. Run the experiment many times, generating many values of the statistic, and store these statistics in an array.
3. Visualize the resulting empirical distribution of the statistic.

### Step 1 – Running the experiment once

• How do we randomly sample a jury panel?
• np.random.choice won't help us, because we don't know how large the eligible population is.
• The function np.random.multinomial helps us sample at random from a categorical distribution.
np.random.multinomial(sample_size, pop_distribution)

• np.random.multinomial samples at random from the population, with replacement, and returns a random array containing counts in each category.
• pop_distribution needs to be an array containing the probabilities of each category.

Aside: Example usage of np.random.multinomial

Halloween is on Monday, and you're getting ready to go trick-or-treating 👻. Suppose you'll visit 35 houses, and that each of the 35 houses you'll visit has the same candy box, containing:

• 30% Starbursts.
• 30% Sour Patch Kids.
• 40% Twix.

At each house, you'll select one candy blindly from the candy box.

To simulate the act of going to 35 houses, we can use np.random.multinomial:
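The notebook cell for this aside is not reproduced in this text, so here is a minimal sketch of what it might look like. The names `candy_probs` and `candy_counts` are assumptions made for illustration.

```python
import numpy as np

# Probability of drawing each candy, in the order
# [Starburst, Sour Patch Kids, Twix]. The probabilities must sum to 1.
candy_probs = np.array([0.3, 0.3, 0.4])

# One simulated night of trick-or-treating: 35 houses, one candy per house.
candy_counts = np.random.multinomial(35, candy_probs)

# An array of three counts, one per candy type, that always sum to 35.
print(candy_counts)
```

Running this repeatedly gives a different array each time, but the three counts always add up to 35.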

### Step 1 – Running the experiment once

In our case, a randomly selected member of our population is Black with probability 0.26 and not Black with probability 1 - 0.26 = 0.74.

Each time we run the following cell, we'll get a new random sample of 100 people from this population.

• The first element of the resulting array is the number of Black men in the sample.
• The second element is the number of non-Black men in the sample.
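The cell itself is not reproduced in this text. A minimal sketch, assuming the population distribution is stored in an array named `demographics` (a hypothetical name), could be:

```python
import numpy as np

# Population distribution, in the order [Black, not Black].
demographics = np.array([0.26, 0.74])

# One simulated jury panel of 100 men.
panel = np.random.multinomial(100, demographics)

# A two-element array of counts that sums to 100.
print(panel)
```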

### Step 1 – Running the experiment once

We also need to calculate the statistic, which in this case is the number of Black men in the random sample of 100.
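Since the count of Black men is the first element of the array returned by np.random.multinomial, the statistic can be extracted by indexing. A sketch, where `demographics` and `num_black` are illustrative names not taken from the original cell:

```python
import numpy as np

# Population distribution, in the order [Black, not Black].
demographics = np.array([0.26, 0.74])

# Simulate one panel of 100 men and keep only the first count,
# which is the number of Black men in the panel.
num_black = np.random.multinomial(100, demographics)[0]
print(num_black)
```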

### Step 2 – Repeat the experiment many times

• Let's run 10,000 simulations.
• We'll keep track of the number of Black men in each simulated jury panel in the array counts.
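One way to write this loop is sketched below, using the accumulate-with-np.append pattern. Apart from `counts`, the variable names are assumptions.

```python
import numpy as np

# Population distribution, in the order [Black, not Black].
demographics = np.array([0.26, 0.74])

# Start with an empty array and append one statistic per simulated panel.
counts = np.array([])
for i in np.arange(10000):
    # Number of Black men in one simulated panel of 100 men.
    new_count = np.random.multinomial(100, demographics)[0]
    counts = np.append(counts, new_count)

print(len(counts))    # number of simulated panels
print(counts.mean())  # should land near 100 * 0.26 = 26
```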

### Step 3 – Visualize the resulting distribution

Was a jury panel with 8 Black men suspiciously unusual?
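The histogram from the lecture is not reproduced in this text. As a rough stand-in, the sketch below tallies the simulated counts as a text histogram and computes the proportion of simulated panels with 8 or fewer Black men. It draws all 10,000 panels in one call using the `size` argument of np.random.multinomial, which is a shortcut assumed here rather than the lecture's loop.

```python
import numpy as np

# Population distribution, in the order [Black, not Black].
demographics = np.array([0.26, 0.74])

# Simulate 10,000 panels at once; column 0 holds the number of Black men.
counts = np.random.multinomial(100, demographics, size=10000)[:, 0]

# Text histogram: one row per observed count, one '#' per 20 panels.
values, freqs = np.unique(counts, return_counts=True)
for value, freq in zip(values, freqs):
    print(value, '#' * (freq // 20))

# Proportion of simulated panels with 8 or fewer Black men.
prop_at_most_8 = np.count_nonzero(counts <= 8) / len(counts)
print(prop_at_most_8)
```

If the observed value of 8 falls far to the left of every bar in the tally, that suggests the observed panel is not consistent with the model.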

### Conclusion

• Our simulation shows that there's essentially no chance that a random sample of 100 men drawn from a population in which 26% of men are Black will contain 8 or fewer Black men.
• As a result, it seems that the model we proposed – that the jury panel was drawn at random from the eligible population – is flawed.
• There were likely factors other than chance that explain why there were only 8 Black men on the jury panel.
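As an optional cross-check on the simulation, the chance of 8 or fewer Black men can also be computed exactly under the model. This is not part of the lecture's simulation approach. The sketch assumes the number of Black men in a random panel follows a binomial distribution with 100 draws and success probability 0.26, which is what sampling with replacement from the two-category population implies.

```python
from math import comb

n, p = 100, 0.26  # panel size and proportion of Black men in the population

# P(at most 8 Black men) = sum over k = 0..8 of C(n, k) * p^k * (1 - p)^(n - k)
prob_at_most_8 = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(9))
print(prob_at_most_8)
```

The printed probability is tiny, consistent with the simulation producing essentially no panels with 8 or fewer Black men.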