In [1]:

```
# Set up packages for lecture. Don't worry about understanding this code,
# but make sure to run it if you're following along.
import numpy as np
import babypandas as bpd
from matplotlib_inline.backend_inline import set_matplotlib_formats
import matplotlib.pyplot as plt
set_matplotlib_formats("svg")
plt.style.use('ggplot')
```

- Lab 3 is due
**tomorrow at 11:59PM**. - Homework 3 is due on
**Tuesday 5/2 at 11:59PM**. - The Midterm Project is due on
**Tuesday, 5/9 at 11:59PM**.- Start early!
- Post in this thread on Ed if you're still looking for a partner.

- Check out this post on Ed for explanations about common misconceptions from office hours.
- The Midterm Exam is on
**Friday 5/5 during your assigned lecture**.

The Midterm Exam is on **Friday 5/5 during your assigned lecture**.

- It will be a 50 minute, on-paper, closed-notes exam. We will provide you with first 2 pages of the reference sheet.
- It will consist of multiple choice, fill-in-the-blank code, and short answer questions.
- Bring a pen/pencil/eraser and a photo ID. No scantron or blue book needed.
- No calculator, computers, notes, or other aids are allowed.
- You will be assigned a specific seat sometime next week.
**Today's material is on the midterm; next week's is not.**- 🚨
**Look at past midterm exams at practice.dsc10.com!**

Simulations.

- Example: What's the probability of getting 60 or more heads if we flip 100 coins?
- Example: The "Monty Hall" Problem.

- What is the probability of getting 60 or more heads if we flip 100 coins?

- While we
*could*calculate it by hand (and will learn how to in future courses), we can also**estimate**it using the computer:- Figure out how to run the experiment (flipping 100 coins) once.
- Repeat the experiment many times.
- Find the proportion of experiments in which the number of heads was 60 or more.

- This is how we'll use
**simulations**– to**estimate**, or approximate, a probability through computation.- The techniques we will introduce in today's lecture will appear in almost every lecture for the remainder of the quarter!

- To simulate, we need a way to perform a random experiment on the computer (e.g. flipping a coin, rolling a die).

- A helpful function is
`np.random.choice(options)`

.- The input,
`options`

, is a list or array to choose from. - The output is a random element in
`options`

. By default, all elements are equally likely to be chosen.

- The input,

In [2]:

```
# Simulate a fair coin flip.
np.random.choice(['Heads', 'Tails'])
```

Out[2]:

'Heads'

In [3]:

```
# Simulate a roll of a die.
np.random.choice(np.arange(1, 7))
```

Out[3]:

1

`np.random.choice(options, n)`

will return an array of `n`

randomly selected elements from `options`

.

In [4]:

```
# Simulate 10 fair coin flips.
np.random.choice(['Heads', 'Tails'], 10)
```

Out[4]:

array(['Heads', 'Tails', 'Tails', 'Tails', 'Tails', 'Heads', 'Tails', 'Heads', 'Tails', 'Tails'], dtype='<U5')

- By default,
`np.random.choice`

selects**with**replacement. - That is, after making a selection, that option is still available.
- e.g. if every time you draw a marble from a bag, you put it back.

- If an option can only be selected once, select
**without**replacement by specifying`replace=False`

.- e.g. if every time you draw a marble from a bag, you do not put it back.

In [5]:

```
# Choose three colleges to win free HDH swag.
colleges = ['Revelle', 'John Muir', 'Thurgood Marshall',
'Earl Warren', 'Eleanor Roosevelt', 'Sixth', 'Seventh']
np.random.choice(colleges, 3, replace=False)
```

Out[5]:

array(['Thurgood Marshall', 'Sixth', 'Eleanor Roosevelt'], dtype='<U17')

What is the probability of getting 60 or more heads if we flip 100 coins?

**Plan**:

- Figure out how to run the experiment (flipping 100 coins) once.
- Repeat the experiment many times.
- Find the proportion of experiments in which the number of heads was 60 or more.

- Use
`np.random.choice`

to flip 100 coins. - Use
`np.count_nonzero`

to count the number of heads.`np.count_nonzero(array)`

returns the number of entries in`array`

that are`True`

.

In [6]:

```
coins = np.random.choice(['Heads', 'Tails'], 100)
coins
```

Out[6]:

array(['Heads', 'Tails', 'Tails', 'Tails', 'Tails', 'Tails', 'Tails', 'Heads', 'Heads', 'Tails', 'Heads', 'Heads', 'Tails', 'Tails', 'Heads', 'Heads', 'Tails', 'Heads', 'Heads', 'Tails', 'Tails', 'Tails', 'Heads', 'Heads', 'Heads', 'Tails', 'Tails', 'Tails', 'Tails', 'Heads', 'Tails', 'Tails', 'Heads', 'Heads', 'Tails', 'Heads', 'Tails', 'Tails', 'Heads', 'Heads', 'Tails', 'Heads', 'Tails', 'Heads', 'Heads', 'Tails', 'Heads', 'Tails', 'Heads', 'Tails', 'Tails', 'Tails', 'Tails', 'Tails', 'Tails', 'Tails', 'Tails', 'Tails', 'Heads', 'Tails', 'Heads', 'Heads', 'Tails', 'Tails', 'Tails', 'Heads', 'Heads', 'Heads', 'Heads', 'Tails', 'Tails', 'Tails', 'Heads', 'Tails', 'Heads', 'Tails', 'Tails', 'Heads', 'Heads', 'Heads', 'Tails', 'Tails', 'Heads', 'Heads', 'Heads', 'Tails', 'Heads', 'Heads', 'Tails', 'Tails', 'Tails', 'Tails', 'Tails', 'Heads', 'Heads', 'Heads', 'Heads', 'Heads', 'Tails', 'Heads'], dtype='<U5')

In [7]:

```
coins == 'Heads'
```

Out[7]:

array([ True, False, False, False, False, False, False, True, True, False, True, True, False, False, True, True, False, True, True, False, False, False, True, True, True, False, False, False, False, True, False, False, True, True, False, True, False, False, True, True, False, True, False, True, True, False, True, False, True, False, False, False, False, False, False, False, False, False, True, False, True, True, False, False, False, True, True, True, True, False, False, False, True, False, True, False, False, True, True, True, False, False, True, True, True, False, True, True, False, False, False, False, False, True, True, True, True, True, False, True])

In [8]:

```
(coins == 'Heads').sum()
```

Out[8]:

46

In [9]:

```
np.count_nonzero(coins == 'Heads') # Counts the number of Trues in the sequence.
```

Out[9]:

46

In [10]:

```
np.count_nonzero([5, 6, 0, 2])
```

Out[10]:

3

**Question**: Why is it called`count_nonzero`

?**Answer**: In Python,`True == 1`

and`False == 0`

, so counting the non-zero elements counts the number of`True`

s.

This makes it easy to run the experiment repeatedly.

In [11]:

```
def coin_experiment():
coins = np.random.choice(['Heads', 'Tails'], 100)
return np.count_nonzero(coins == 'Heads')
```

In [12]:

```
coin_experiment()
```

Out[12]:

59

- How do we run a piece of code many times?
**Using a**`for`

-loop! - Each time we run the experiment, we'll need to store the results in an array.
- To do this, we'll use
`np.append`

!

- To do this, we'll use

In [13]:

```
head_counts = np.array([])
head_counts
```

Out[13]:

array([], dtype=float64)

In [14]:

```
head_counts = np.append(head_counts, 15)
head_counts
```

Out[14]:

array([15.])

In [15]:

```
head_counts = np.append(head_counts, 25)
head_counts
```

Out[15]:

array([15., 25.])

- Imagine we start with a blank sheet of paper, and each time we run the experiment, we write the number of heads we see down on the sheet of paper.
- The sheet will start off empty, but eventually will have one number for each time we ran the experiment.

In [16]:

```
# Specify the number of repetitions.
repetitions = 10000
# Create an empty array to store the results.
head_counts = np.array([])
for i in np.arange(repetitions):
# For each repetition, run the experiment and add the result to head_counts.
head_count = coin_experiment()
head_counts = np.append(head_counts, head_count)
```

In [17]:

```
len(head_counts)
```

Out[17]:

10000

In [18]:

```
head_counts
```

Out[18]:

array([49., 40., 43., ..., 57., 46., 57.])

In [19]:

```
# In how many experiments was the number of heads >= 60?
at_least_60 = sum(head_counts >= 60)
at_least_60
```

Out[19]:

271

In [20]:

```
# What is this as a proportion?
at_least_60 / repetitions
```

Out[20]:

0.0271

In [21]:

```
# Can also use np.mean()! Why?
np.mean(head_counts >= 60)
```

Out[21]:

0.0271

This is quite close to the true theoretical answer!

In [22]:

```
# The theoretical answer – don't worry about how or why this code works.
import math
sum([math.comb(100, i) * (1 / 2) ** 100 for i in np.arange(60, 101)])
```

Out[22]:

0.028443966820490392

In [23]:

```
head_counts
```

Out[23]:

array([49., 40., 43., ..., 57., 46., 57.])

In [24]:

```
bpd.DataFrame().assign(
Number_of_Heads=head_counts
).plot(kind='hist', bins=np.arange(30, 70), density=True, ec='w', figsize=(10, 5));
plt.axvline(60, color='C1', linewidth=4);
```

- This histogram describes the distribution of the number of heads in each experiment.
- Now we see another reason to use density histograms.
- Using density means that areas are
**probabilities**.

- Using density means that areas are
- Next class, we'll learn more about
*why*it's valid to estimate probabilities using simulations.

Suppose you’re on a game show, and you’re given the choice of three doors. A car 🚗 is behind one of the doors, and goats 🐐🐐 are behind the other two.

You pick a door, say Door #2, and the host,

**who knows what’s behind the doors**, opens another door, say Door #3, which has a goat.The host then says to you, “Do you want to switch to Door #1 or stay with Door #2?”

**Question**: Should you stay or switch?

*(The question was posed in Parade magazine’s "Ask Marilyn" column in 1990. It is called the "Monty Hall problem" because Monty Hall hosted a similar game show called "Let's Make a Deal.")*

In [25]:

```
from IPython.display import IFrame
IFrame('https://montyhall.io/', width=600, height=400)
```

Out[25]:

Suppose you originally selected Door #2. The host reveals Door #3 to have a goat behind it. What should you do?

A. **Stay** with Door #2; it has just as high a chance of winning as Door #1. It doesn't matter whether you switch or not.

B. **Switch** to Door #1; it has a higher chance of winning than Door #2.

- Let's
**estimate**the probability of winning if you switch.

- If it's higher than 50%, then switching is the better strategy, otherwise staying is the better strategy.

**Plan**:

- Figure out how to simulate a single game.
- Play the game many times, switching each time.
- Compute the proportion of wins.

When you pick a door, there are three equally-likely outcomes:

- Car.
- Goat #1.
- Goat #2.

In [26]:

```
options = np.array(['Car', 'Goat #1', 'Goat #2'])
behind_picked_door = np.random.choice(options)
behind_picked_door
```

Out[26]:

'Car'

When the host opens a different door, they always reveal a goat.

In [27]:

```
if behind_picked_door == 'Goat #1':
revealed = 'Goat #2'
elif behind_picked_door == 'Goat #2':
revealed = 'Goat #1'
else:
# This is the case in which you originally picked a car!
revealed = np.random.choice(['Goat #1', 'Goat #2'])
revealed
```

Out[27]:

'Goat #1'

If you always switch, you'll end up winning the prize that is neither `behind_picked_door`

nor `revealed`

.

In [28]:

```
options
```

Out[28]:

array(['Car', 'Goat #1', 'Goat #2'], dtype='<U7')

In [29]:

```
behind_picked_door
```

Out[29]:

'Car'

In [30]:

```
revealed
```

Out[30]:

'Goat #1'

In [31]:

```
your_prize = options[(options != behind_picked_door) & (options != revealed)][0]
your_prize
```

Out[31]:

'Goat #2'

Let's put all of our work into a single function to make it easier to repeat.

In [32]:

```
def simulate_switch_strategy():
options = np.array(['Car', 'Goat #1', 'Goat #2'])
behind_picked_door = np.random.choice(options)
if behind_picked_door == 'Goat #1':
revealed = 'Goat #2'
elif behind_picked_door == 'Goat #2':
revealed = 'Goat #1'
else:
revealed = np.random.choice(['Goat #1', 'Goat #2'])
your_prize = options[(options != behind_picked_door) & (options != revealed)][0]
#print(behind_picked_door, 'was behind the door.', revealed, 'was revealed by the host. Your prize was:', your_prize)
return your_prize
```

Now, every time we call `simulate_switch_strategy`

, the result is your prize.

In [33]:

```
simulate_switch_strategy()
```

Out[33]:

'Car'

We should save your prize in each game; to do so, we'll use `np.append`

.

In [34]:

```
repetitions = 10000
your_prizes = np.array([])
for i in np.arange(repetitions):
your_prize = simulate_switch_strategy()
your_prizes = np.append(your_prizes, your_prize)
```

In [35]:

```
your_prizes
```

Out[35]:

array(['Car', 'Car', 'Car', ..., 'Car', 'Goat #1', 'Car'], dtype='<U32')

In [36]:

```
your_prizes
```

Out[36]:

array(['Car', 'Car', 'Car', ..., 'Car', 'Goat #1', 'Car'], dtype='<U32')

In [37]:

```
np.count_nonzero(your_prizes == 'Car')
```

Out[37]:

6684

In [38]:

```
np.count_nonzero(your_prizes == 'Car') / repetitions
```

Out[38]:

0.6684

This is quite close to the true probability of winning if you switch, $\frac{2}{3}$.

- Looking back at our implementation, we kept track of your prize in each game.

- However, all we really needed to keep track of was the
**number of games**in which you won a car.

**💡 Idea**: Keep a*tally*of the number of times you won a car. That is, initialize`car_count`

to 0, and add 1 to it each time your prize is a car.

In [39]:

```
car_count = 0
```

In [40]:

```
for i in np.arange(repetitions):
your_prize = simulate_switch_strategy()
if your_prize == 'Car':
car_count = car_count + 1
```

In [41]:

```
car_count / repetitions
```

Out[41]:

0.6634

No arrays needed! This strategy won't always work; it depends on the goal of the simulation.

In this case, your prize is always the same as what was behind the picked door.

In [42]:

```
car_count = 0
for i in np.arange(repetitions):
options = np.array(['Car', 'Goat #1', 'Goat #2'])
behind_picked_door = np.random.choice(options)
your_prize = behind_picked_door
if your_prize == 'Car':
car_count = car_count + 1
car_count / repetitions
```

Out[42]:

0.3364

- This is quite close to the true probability of winning if you stay, $\frac{1}{3}$.

**Conclusion**: It's better to switch.

**Why?**- If you originally choose a goat, Monty will reveal the other goat, and you'll win the car by switching.
- If you originally choose a car, you'll win by staying.
- But there are 2 goats and only 1 car, so you win twice as often by switching.

- When asked this question by a reader, vos Savant stated the correct answer:
*switch*. - She received over 10,000 letters in disagreement, including over 1,000 letters from people with Ph.D.s.
- This became a nationwide controversy, even getting a front-page New York Times article in 1991.

- Calculating probabilities is important, but can be hard!
- You'll learn plenty of formulas in future DSC classes, if you end up taking them.

- Simulations let us find probabilities through code rather than through math.
- Many real-world scenarios are complicated.
- Simulations are much easier than math in many of these cases.

To estimate the probability of an event through simulation:

- Make a function that runs the experiment once.
- Run that function many times (usually 10,000) with a
`for`

-loop, and save the results in an array with`np.append`

. - Compute the proportion of times the event occurs using
`np.count_nonzero`

.

- In the next class, we will start talking about sampling.
- Key idea: We want to learn something about a large population (e.g. all undergraduates at UCSD). However, it's far too difficult to survey everyone. If we collect a sample, what can we infer about the larger population?

**Next week's lecture material is not on the midterm.**