In [1]:

```
# Set up packages for lecture. Don't worry about understanding this code, but
# make sure to run it if you're following along.
import numpy as np
import babypandas as bpd
import pandas as pd
from matplotlib_inline.backend_inline import set_matplotlib_formats
import matplotlib.pyplot as plt
set_matplotlib_formats("svg")
plt.style.use('ggplot')
np.set_printoptions(threshold=20, precision=2, suppress=True)
pd.set_option("display.max_rows", 7)
pd.set_option("display.max_columns", 8)
pd.set_option("display.precision", 2)
```

- Homework 3 is due
**tomorrow at 11:59PM**. - The Midterm Project is due
**Tuesday 2/14 at 11:59PM.** - The Midterm Exam is
**this Friday during your assigned lecture.** **No lab**assignment due on Saturday.

The Midterm Exam is in four days, on **Friday 2/10 during your assigned lecture**.

- It will be a 50 minute, on-paper, closed-notes exam. We will provide you with first 2 pages of the reference sheet.
- It will consist of multiple choice, fill-in-the-blank code, and short answer questions.
- Bring a pen/pencil/eraser and a photo ID. No scantron or blue book needed.
- No calculator, computers, notes, or other aids are allowed.
- We will use assigned seats and multiple exam versions. Seating assignments will be released soon.
**Today's material is on the midterm; Wednesday's is not.**

Simulation.

- Example: What's the probability of getting 60 or more heads if we flip 100 coins?
- Example: The "Monty Hall" Problem.

- What is the probability of getting 60 or more heads if we flip 100 coins?

- While we
*could*calculate it by hand (and will learn how to in future courses), we can also approximate it using the computer:- Figure out how to do one experiment (i.e., flip 100 coins).
- Run the experiment a bunch of times.
- Find the proportion of experiments in which the number of heads was 60 or more.

- This is how we'll use
**simulation**– to approximate a probability through computation.- The techniques we will introduce in today's lecture will appear in almost every lecture for the remainder of the quarter!

- To simulate, we need a way to perform a random experiment on the computer (e.g. flipping a coin, rolling a die).

- A helpful function is
`np.random.choice(options)`

.- The input,
`options`

, is a list or array to choose from. - The output is a random element in
`options`

. By default, all elements are equally likely to be chosen.

- The input,

In [2]:

```
# Simulate a fair coin flip
np.random.choice(['Heads', 'Tails'])
```

Out[2]:

'Heads'

In [3]:

```
# Simulate a roll of a die
np.random.choice(np.arange(1, 7))
```

Out[3]:

3

`np.random.choice(options, n)`

will return an array of `n`

randomly selected elements from `options`

.

In [4]:

```
# Simulate 10 fair coin flips
np.random.choice(['Heads', 'Tails'], 10)
```

Out[4]:

array(['Heads', 'Heads', 'Tails', 'Heads', 'Heads', 'Heads', 'Heads', 'Tails', 'Tails', 'Tails'], dtype='<U5')

- By default,
`np.random.choice`

selects**with**replacement. - That is, after making a selection, that option is still available.
- e.g. if every time you draw a marble from a bag, you put it back.

- If an option can only be selected once, select
**without**replacement by specifying`replace=False`

.- e.g. if every time you draw a marble from a bag, you do not put it back.

In [5]:

```
# Choose three colleges to win free HDH swag
colleges = ['Revelle', 'John Muir', 'Thurgood Marshall',
'Earl Warren', 'Eleanor Roosevelt', 'Sixth', 'Seventh']
np.random.choice(colleges, 3, replace=False)
```

Out[5]:

array(['Seventh', 'Revelle', 'Eleanor Roosevelt'], dtype='<U17')

What is the probability of getting 60 or more heads if we flip 100 coins?

**Strategy:**

- Figure out how to do one experiment (i.e., flip 100 coins).
- Run the experiment a bunch of times.
- Find the proportion of experiments in which the number of heads was 60 or more.

- Use
`np.random.choice`

to flip 100 coins. - Use
`np.count_nonzero`

to count the number of heads.`np.count_nonzero(array)`

returns the number of entries in`array`

that are`True`

.

In [6]:

```
coins = np.random.choice(['Heads', 'Tails'], 100)
coins
```

Out[6]:

array(['Tails', 'Tails', 'Tails', ..., 'Heads', 'Tails', 'Tails'], dtype='<U5')

In [7]:

```
(coins == 'Heads').sum()
```

Out[7]:

44

In [8]:

```
np.count_nonzero(coins == 'Heads') # counts the number of Trues in sequence
```

Out[8]:

44

**Q:**Why is it called`count_nonzero`

?**A:**In Python,`True == 1`

and`False == 0`

, so counting the non-zero elements counts the number of`True`

s.

It's a good idea to do this, as it makes it easier to run the experiment repeatedly.

In [9]:

```
def coin_experiment():
coins = np.random.choice(['Heads', 'Tails'], 100)
return np.count_nonzero(coins == 'Heads')
```

In [10]:

```
coin_experiment()
```

Out[10]:

52

- How do we run the same code many times?
**Using a**`for`

-loop! - Each time we run the experiment, we'll need to store the results in an array.
- To do this, we'll use
`np.append`

!

- To do this, we'll use

In [11]:

```
head_counts = np.array([])
head_counts
```

Out[11]:

array([], dtype=float64)

In [12]:

```
head_counts = np.append(head_counts, 15)
head_counts
```

Out[12]:

array([15.])

In [13]:

```
head_counts = np.append(head_counts, 25)
head_counts
```

Out[13]:

array([15., 25.])

In [14]:

```
# Specify the number of repetitions
repetitions = 10000
# Create an empty array to store the results
head_counts = np.array([])
for i in np.arange(repetitions):
# For each repetition, run the experiment and add the result to head_counts
head_count = coin_experiment()
head_counts = np.append(head_counts, head_count)
```

In [15]:

```
len(head_counts)
```

Out[15]:

10000

In [16]:

```
head_counts
```

Out[16]:

array([57., 44., 55., ..., 48., 44., 49.])

In [17]:

```
# In how many experiments was the number of heads >= 60?
at_least_60 = np.count_nonzero(head_counts >= 60)
at_least_60
```

Out[17]:

278

In [18]:

```
# What is this as a proportion?
at_least_60 / repetitions
```

Out[18]:

0.0278

In [19]:

```
# Can also use np.mean()! Why?
np.mean(head_counts >= 60)
```

Out[19]:

0.0278

This is quite close to the true theoretical answer!

In [20]:

```
# The theoretical answer – don't worry about how or why this code works
import math
sum([math.comb(100, i) * (1 / 2) ** 100 for i in np.arange(60, 101)])
```

Out[20]:

0.028443966820490392

In [21]:

```
bpd.DataFrame().assign(
Number_of_Heads=head_counts
).plot(kind='hist', bins=np.arange(30, 70), density=True, ec='w', figsize=(10, 5));
plt.axvline(60, color='C1');
```

- This histogram describes the distribution of the number of heads in each experiment.
- Now we see another reason to use density histograms.
- Using density means that areas approximate
**probabilities**.

- Using density means that areas approximate

Suppose you’re on a game show, and you’re given the choice of three doors: behind one door is a car 🚗; behind the others, goats 🐐🐐.

You pick a door, say No. 2, and the host,

**who knows what’s behind the doors**, opens another door, say No. 3, which has a goat.He then says to you, “Do you want to pick door No. 1?”

**Question:**Is it to your advantage to switch your choice?

*(The question was posed in Parade magazine’s "Ask Marilyn" column in 1990. It is called the "Monty Hall problem" because Monty Hall hosted a similar game show called "Let's Make a Deal.")*

You originally selected door #2. The host reveals door #3 to have a goat behind it. What should you do?

A. Might as well stick with door number #2; it has just as high a chance of winning as door #1. It doesn't matter whether you switch or not.

B. Switch to door number #1; it has a higher chance of winning than door #2.

- We'll use simulation to compute:
- The probability of winning if we switch.
- The probability of winning if we stay.

- Whichever strategy has the higher probability of winning is better!

Let's **simulate** the Monty Hall problem many times to **estimate** the probability of winning if we switch.

- Figure out how to simulate one game of Monty Hall.
- Play the game many times, switching each time.
- Count the proportion of wins.

Then we'll repeat the process for staying each time.

When you pick a door, there are three equally-likely outcomes:

- Car.
- Goat #1.
- Goat #2.

In [22]:

```
behind_picked_door = np.random.choice(['Car', 'Goat #1', 'Goat #2'])
behind_picked_door
```

Out[22]:

'Goat #2'

When Monty opens a different door, he always reveals a goat.

In [23]:

```
if behind_picked_door == 'Goat #1':
revealed = 'Goat #2'
elif behind_picked_door == 'Goat #2':
revealed = 'Goat #1'
else:
revealed = np.random.choice(['Goat #1', 'Goat #2'])
revealed
```

Out[23]:

'Goat #1'

`behind_picked_door`

nor `revealed`

.

In [24]:

```
for prize in ['Car', 'Goat #1', 'Goat #2']:
if prize != behind_picked_door and prize != revealed:
your_prize = prize
your_prize
```

Out[24]:

'Car'

Let's turn this into a function to make it easier to repeat:

In [25]:

```
def simulate_switch_strategy():
behind_picked_door = np.random.choice(['Car', 'Goat #1', 'Goat #2'])
if behind_picked_door == 'Goat #1':
revealed = 'Goat #2'
elif behind_picked_door == 'Goat #2':
revealed = 'Goat #1'
else:
revealed = np.random.choice(['Goat #1', 'Goat #2'])
for prize in ['Car', 'Goat #1', 'Goat #2']:
if prize != behind_picked_door and prize != revealed:
your_prize = prize
#print(behind_picked_door, 'was behind the door.', revealed, 'was revealed by the host. Your prize was:', your_prize)
return your_prize
```

In [26]:

```
simulate_switch_strategy()
```

Out[26]:

'Goat #1'

We should save your prize in each game. To do so, let's use `np.append`

:

In [27]:

```
repetitions = 10000
your_prizes = np.array([])
for i in np.arange(repetitions):
your_prize = simulate_switch_strategy()
your_prizes = np.append(your_prizes, your_prize)
```

In [28]:

```
your_prizes
```

Out[28]:

array(['Goat #2', 'Car', 'Goat #2', ..., 'Goat #1', 'Car', 'Car'], dtype='<U32')

In [29]:

```
your_prizes
```

Out[29]:

array(['Goat #2', 'Car', 'Goat #2', ..., 'Goat #1', 'Car', 'Car'], dtype='<U32')

In [30]:

```
np.count_nonzero(your_prizes == 'Car')
```

Out[30]:

6759

In [31]:

```
np.count_nonzero(your_prizes == 'Car') / repetitions
```

Out[31]:

0.6759

This is quite close to the true probability of winning if you switch, $\frac{2}{3}$.

- Looking back at our implementation, we kept track of your prize in each game.

- However, all we really needed to keep track of was the
**number of games**in which you won a car.

- Idea: Keep a
*tally*of the number of times you won a car. That is, initialize`car_count`

to 0, and add 1 to it each time your prize is a car.

In [32]:

```
car_count = 0
```

In [33]:

```
for i in np.arange(repetitions):
your_prize = simulate_switch_strategy()
if your_prize == 'Car':
car_count = car_count + 1
```

In [34]:

```
car_count / repetitions
```

Out[34]:

0.6681

No arrays needed! This strategy won't always work; it depends on the goal of the simulation.

- In this case, your prize is always the same as what was behind the picked door.

In [35]:

```
car_count = 0
for i in np.arange(repetitions):
behind_picked_door = np.random.choice(['Car', 'Goat #1', 'Goat #2'])
your_prize = behind_picked_door
if your_prize == 'Car':
car_count = car_count + 1
car_count / repetitions
```

Out[35]:

0.3326

- This is quite close to the true probability of winning if you stay, $\frac{1}{3}$.

**Conclusion:**It's better to switch.

**Explanation:**- If you originally choose a goat, Monty will reveal the other goat, and you'll win the car by switching.
- If you originally choose a car, you'll win by staying.
- But there 2 goats and 1 car, so you win twice as often by switching.

- When asked this question by a reader, vos Savant stated the correct answer:
*switch*. - She received over 10,000 letters in disagreement, including over 1,000 letters from people with Ph.D.s.
- This became a nationwide controversy, even getting a front-page New York Times article in 1991.

- Calculating probabilities is important, but can be hard!
- You'll learn plenty of formulas in future DSC classes.

- Simulation lets us find probabilities through computing instead of math.
- Many real-world scenarios are complicated.
- Simulation is much easier than math in many of these cases.

To estimate the probability of an event through simulation:

- Make a function that runs the experiment once.
- Run that function many, many times (usually 10000) with a
`for`

-loop, and save the results in an array with`np.append`

. - Compute the proportion of times the event occurs using
`np.count_nonzero`

.

- In the next class, we will start talking about sampling.
- Key idea: We want to learn something about a large population (e.g. all undergraduates at UCSD). However, it's far too difficult to survey everyone. If we collect a sample, what can we infer about the larger population?

**Today's material is on the midterm; Wednesday's is not.**