In [1]:

```
# Set up packages for lecture. Don't worry about understanding this code, but
# make sure to run it if you're following along.
import numpy as np
import babypandas as bpd
import pandas as pd
from matplotlib_inline.backend_inline import set_matplotlib_formats
import matplotlib.pyplot as plt
set_matplotlib_formats("svg")
plt.style.use('ggplot')
np.set_printoptions(threshold=20, precision=2, suppress=True)
pd.set_option("display.max_rows", 7)
pd.set_option("display.max_columns", 8)
pd.set_option("display.precision", 2)
```

- Midterm Exam scores are available. See this post for details.
- Only worth 10%. Take it as a learning experience!

- The Midterm Project is due **tomorrow at 11:59PM**.
- Slip days can be used if needed. Using one will deduct from both partners' allocations.
- Only **one** partner should submit and "Add Group Member" on Gradescope.

- Lab 5 is due **Saturday 2/18 at 11:59PM**.

- Statistical models.
- Example: Jury selection.
- Example: Genetics of peas. 🟢
- Viewpoints and test statistics.
- Example: Is our coin fair?

- A model is a set of assumptions about how data was generated.

- We want a way to assess the quality of a given model.

- Robert Swain was a Black man convicted of a crime in Talladega County, Alabama.

- But of the 100 men on Robert Swain's jury panel, only 8 were Black.

- About disparities between the percentages in the eligible population and the jury panel, the Supreme Court wrote:

"... the overall percentage disparity has been small...”

- The Supreme Court denied Robert Swain’s appeal and he was sentenced to life in prison.

- We now have the tools to show **quantitatively** that the Supreme Court's claim was misguided.

- This "overall percentage disparity" turns out to be not so small, and is an example of racial bias.
- Jury panels were often made up of people in the jury commissioner's professional and social circles.
- Of the 8 Black men on the jury panel, **none** were selected to be part of the actual jury.

- We will **assume** the jury panel consists of 100 men, **randomly** chosen from a population that is 26% Black.

**Our question: is this model (i.e. assumption) right or wrong?**

- We'll start by assuming that this model is true.

- We'll generate many jury panels using this assumption.

Recall, a *statistic* is a number calculated from a sample.

- Run an experiment once to generate one value of a statistic.
- In this case, sample 100 people randomly from a population that is 26% Black, and count **the number of Black men (statistic)**.
- Repeat the experiment many times to generate many values of the statistic.
- Visualize the resulting **empirical distribution of the statistic**.

- How do we randomly sample a jury panel? `np.random.choice` won't help us, because we don't know how large the eligible population is.

- The function `np.random.multinomial` helps us sample at random from a **categorical distribution**.

```
np.random.multinomial(sample_size, pop_distribution)
```

- `np.random.multinomial` samples at random from the population, **with replacement**, and returns a random array containing counts in each category.
- `pop_distribution` needs to be an array containing the probabilities of each category.

**Aside: Example usage of np.random.multinomial**

On Halloween 👻 you'll trick-or-treat at 35 houses, each of which has an identical candy box, containing:

- 30% Starbursts.
- 30% Sour Patch Kids.
- 40% Twix.

At each house, you'll select one candy blindly from the candy box.

To simulate the act of going to 35 houses, we can use `np.random.multinomial`:

In [2]:

```
np.random.multinomial(35, [0.3, 0.3, 0.4])
```

Out[2]:

array([10, 11, 14])
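As a quick sanity check on the output above, here is a minimal sketch of two properties of the returned array: the counts always add up to the number of draws, and dividing by the number of draws gives sample proportions. The names `candy_probs`, `num_houses`, and `candy_counts` are introduced here for illustration only, and the printed values will differ from run to run.

```python
import numpy as np

# Probabilities for Starbursts, Sour Patch Kids, and Twix, in that order.
candy_probs = [0.3, 0.3, 0.4]
num_houses = 35

# One simulated night of trick-or-treating: counts of each candy type.
candy_counts = np.random.multinomial(num_houses, candy_probs)
print(candy_counts)

# The counts always add up to the number of houses visited.
print(candy_counts.sum())

# Dividing by the number of houses gives the proportion of each candy type.
print(candy_counts / num_houses)
```

The proportions will usually be close to, but not exactly equal to, `[0.3, 0.3, 0.4]`, because each run is a random sample.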

In [3]:

```
demographics = [0.26, 0.74]
```

Each time we run the following cell, we'll get a new random sample of 100 people from this population.

- The first element of the resulting array is the number of Black men in the sample.
- The second element is the number of non-Black men in the sample.

In [4]:

```
np.random.multinomial(100, demographics)
```

Out[4]:

array([26, 74])

We also need to calculate the statistic, which in this case is the number of Black men in the random sample of 100.

In [5]:

```
np.random.multinomial(100, demographics)[0]
```

Out[5]:

22

- Let's run 10,000 simulations.
- We'll keep track of the number of Black men in each simulated jury panel in the array `counts`.

In [6]:

```
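# Start with an empty array, then add one simulated count per iteration.
counts = np.array([])
for i in np.arange(10000):
    # Simulate one jury panel of 100 men and keep the number of Black men.
    new_count = np.random.multinomial(100, demographics)[0]
    counts = np.append(counts, new_count)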
```
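As an aside, the same simulation can likely be done without a loop, since `np.random.multinomial` accepts an optional `size` argument that draws many samples at once. This is a sketch of an alternative, not the approach used in lecture; the names `panels` and `counts_vectorized` are made up for illustration.

```python
import numpy as np

demographics = [0.26, 0.74]

# size=10000 draws 10,000 panels at once; the result has one row per panel
# and one column per category (Black men, non-Black men).
panels = np.random.multinomial(100, demographics, size=10000)

# The first column holds the number of Black men in each simulated panel.
counts_vectorized = panels[:, 0]

print(panels.shape)
print(counts_vectorized.mean())
```

One difference from the loop: this array holds integers, whereas appending to an empty `np.array([])` produces floats such as `27.`.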

In [7]:

```
counts
```

Out[7]:

array([27., 28., 25., ..., 27., 20., 22.])

Was a jury panel with 8 Black men suspiciously unusual?

In [8]:

```
# Histogram of the simulated counts, with one bin per integer count.
(bpd.DataFrame().assign(count_black_men=counts)
                .plot(kind='hist', bins=np.arange(9.5, 45, 1),
                      density=True, ec='w', figsize=(10, 5),
                      title='Empirical Distribution of the Number of Black Men in Simulated Jury Panels of Size 100'));

# Mark the count observed in Robert Swain's actual jury panel.
observed_count = 8
plt.axvline(observed_count, color='black', linewidth=4, label='Observed Number of Black Men in Actual Jury Panel')
plt.legend();
```

In [9]:

```
# In 10,000 random experiments, the panel with the fewest Black men had how many?
counts.min()
```

Out[9]:

11.0

- Our simulation shows that there's essentially no chance that a random sample of 100 men drawn from a population in which 26% of men are Black will contain 8 or fewer Black men.
- As a result, it seems that the model we proposed – that the jury panel was drawn at random from the eligible population – is flawed.
- There were likely factors **other than chance** that explain why there were only 8 Black men on the jury panel.
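To put a number on "essentially no chance", one option is to compute the proportion of simulated panels with 8 or fewer Black men, and compare it to the exact probability under the same model. The sketch below assumes the model above (100 men, each Black with probability 0.26), in which case the count of Black men follows a binomial distribution. The names `prop_as_extreme` and `exact_prob` are introduced here for illustration.

```python
import numpy as np
from math import comb

demographics = [0.26, 0.74]
observed_count = 8

# Re-run the simulation: number of Black men in each of 10,000 panels.
counts = np.random.multinomial(100, demographics, size=10000)[:, 0]

# Proportion of simulated panels with 8 or fewer Black men.
prop_as_extreme = np.count_nonzero(counts <= observed_count) / len(counts)
print(prop_as_extreme)

# Exact probability of 8 or fewer Black men under the model,
# summing the binomial probabilities for k = 0, 1, ..., 8.
exact_prob = sum(
    comb(100, k) * 0.26**k * 0.74**(100 - k)
    for k in range(observed_count + 1)
)
print(exact_prob)
```

The simulated proportion will typically be 0, since the exact probability is far too small for such a panel to show up reliably in only 10,000 repetitions.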