# Set up packages for lecture. Don't worry about understanding this code, but
# make sure to run it if you're following along.
import numpy as np
import babypandas as bpd
import pandas as pd
from matplotlib_inline.backend_inline import set_matplotlib_formats
import matplotlib.pyplot as plt

set_matplotlib_formats("svg")
plt.style.use('ggplot')

np.set_printoptions(threshold=20, precision=2, suppress=True)
pd.set_option("display.max_rows", 7)
pd.set_option("display.max_columns", 8)
pd.set_option("display.precision", 2)
"... the overall percentage disparity has been small...”
np.random.choice won't help us, because we don't know how large the eligible population is.
np.random.multinomial helps us sample at random from a categorical distribution.
np.random.multinomial samples at random from the population, with replacement, and returns a random array containing counts in each category.
pop_distribution needs to be an array containing the probabilities of each category.
Aside: Example usage of np.random.multinomial
On Halloween 👻 you'll trick-or-treat at 35 houses, each of which has an identical candy box, containing:
At each house, you'll select one candy blindly from the candy box.
To simulate the act of going to 35 houses, we can use np.random.multinomial:
np.random.multinomial(35, [0.3, 0.3, 0.4])
array([10, 11, 14])
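To make the meaning of the returned array concrete, here is a minimal sketch. The names `candy_probs` and `candy_counts` are illustrative and not part of the lecture. It shows that the counts always add up to the number of houses, and that dividing by 35 turns them into proportions.

```python
import numpy as np

# Probabilities of the three candy types, matching the call above.
candy_probs = [0.3, 0.3, 0.4]

# One simulated night of trick-or-treating at 35 houses.
candy_counts = np.random.multinomial(35, candy_probs)

# The counts sum to the sample size, whatever the random draw was.
print(candy_counts, candy_counts.sum())

# Dividing by the sample size gives the proportions observed in this sample.
print(candy_counts / 35)
```

Running the cell again gives different counts, but the total stays at 35.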
In our case, a randomly selected member of our population is Black with probability 0.26 and not Black with probability 1 - 0.26 = 0.74.
demographics = [0.26, 0.74]
Each time we run the following cell, we'll get a new random sample of 100 people from this population.
We also need to calculate the statistic, which in this case is the number of Black men in the random sample of 100.
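A single simulated panel and its statistic might look like the sketch below. The names `panel` and `num_black_men` are illustrative assumptions, not taken from the lecture.

```python
import numpy as np

# Population proportions: [Black, not Black].
demographics = [0.26, 0.74]

# One random sample of 100 people: [count Black, count not Black].
panel = np.random.multinomial(100, demographics)
print(panel)

# The statistic is the first entry: the number of Black men in this sample.
num_black_men = panel[0]
print(num_black_men)
```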
counts = np.array([])

for i in np.arange(10000):
    # Keep only the first entry: the number of Black men in this sample of 100.
    new_count = np.random.multinomial(100, demographics)[0]
    counts = np.append(counts, new_count)

counts
array([27., 28., 25., ..., 27., 20., 22.])
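As an aside, the same kind of simulation can be sketched without an explicit loop by using the optional `size` argument of `np.random.multinomial`, which draws many samples in one call. This is an alternative shown for comparison, not the approach used in the lecture, and the name `counts_vectorized` is illustrative.

```python
import numpy as np

# Population proportions: [Black, not Black].
demographics = [0.26, 0.74]

# Draw 10,000 samples of 100 people at once; the result has shape (10000, 2).
all_panels = np.random.multinomial(100, demographics, size=10000)

# Column 0 holds the number of Black men in each simulated panel.
counts_vectorized = all_panels[:, 0]
print(counts_vectorized)
```

One difference from the loop version: this array holds integers, whereas appending to an empty array produces floats.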
Was a jury panel with 8 Black men suspiciously unusual?
(bpd.DataFrame().assign(count_black_men=counts)
 .plot(kind='hist', bins=np.arange(9.5, 45, 1), density=True, ec='w', figsize=(10, 5),
       title='Empirical Distribution of the Number of Black Men in Simulated Jury Panels of Size 100'));
observed_count = 8
plt.axvline(observed_count, color='black', linewidth=4, label='Observed Number of Black Men in Actual Jury Panel')
plt.legend();
# In 10,000 random experiments, the panel with the fewest Black men had how many?
counts.min()
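Beyond the minimum, one way to quantify how unusual the observed panel is would be to compute the proportion of simulated panels with 8 or fewer Black men. The sketch below is self-contained, so it re-runs the simulation using the `size` argument of `np.random.multinomial` rather than reusing the loop above. The name `proportion_at_or_below` is illustrative.

```python
import numpy as np

# Population proportions: [Black, not Black].
demographics = [0.26, 0.74]
observed_count = 8

# Simulate 10,000 panels of 100 and keep the number of Black men in each.
counts = np.random.multinomial(100, demographics, size=10000)[:, 0]

# Proportion of simulated panels with at most as many Black men as observed.
proportion_at_or_below = np.count_nonzero(counts <= observed_count) / len(counts)

print(counts.min(), proportion_at_or_below)
```

A proportion close to 0 would indicate that panels like the observed one rarely arise under random selection from this population.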