In [1]:
# Set up packages for lecture. Don't worry about understanding this code, but
# make sure to run it if you're following along.
import numpy as np
import babypandas as bpd
import pandas as pd
from matplotlib_inline.backend_inline import set_matplotlib_formats
import matplotlib.pyplot as plt
set_matplotlib_formats("svg")
plt.style.use('ggplot')

np.set_printoptions(threshold=20, precision=2, suppress=True)
pd.set_option("display.max_rows", 7)
pd.set_option("display.max_columns", 8)
pd.set_option("display.precision", 2)

Lecture 12 – Simulation¶

DSC 10, Winter 2023¶

Announcements¶

  • Homework 3 is due tomorrow at 11:59PM.
  • The Midterm Project is due Tuesday 2/14 at 11:59PM.
  • The Midterm Exam is this Friday during your assigned lecture.
  • No lab assignment due on Saturday.

Midterm Exam details¶

The Midterm Exam is in four days, on Friday 2/10 during your assigned lecture.

  • It will be a 50 minute, on-paper, closed-notes exam. We will provide you with first 2 pages of the reference sheet.
  • It will consist of multiple choice, fill-in-the-blank code, and short answer questions.
  • Bring a pen/pencil/eraser and a photo ID. No scantron or blue book needed.
  • No calculator, computers, notes, or other aids are allowed.
  • We will use assigned seats and multiple exam versions. Seating assignments will be released soon.
  • Today's material is on the midterm; Wednesday's is not.

Agenda¶

Simulation.

  • Example: What's the probability of getting 60 or more heads if we flip 100 coins?
  • Example: The "Monty Hall" Problem.

Simulation¶

Simulation¶

  • What is the probability of getting 60 or more heads if we flip 100 coins?
  • While we could calculate it by hand (and will learn how to in future courses), we can also approximate it using the computer:
    1. Figure out how to do one experiment (i.e., flip 100 coins).
    2. Run the experiment a bunch of times.
    3. Find the proportion of experiments in which the number of heads was 60 or more.
  • This is how we'll use simulation – to approximate a probability through computation.
    • The techniques we will introduce in today's lecture will appear in almost every lecture for the remainder of the quarter!

Making a random choice¶

  • To simulate, we need a way to perform a random experiment on the computer (e.g. flipping a coin, rolling a die).
  • A helpful function is np.random.choice(options).
    • The input, options, is a list or array to choose from.
    • The output is a random element in options. By default, all elements are equally likely to be chosen.
In [2]:
# Simulate a fair coin flip
np.random.choice(['Heads', 'Tails'])
Out[2]:
'Heads'
In [3]:
# Simulate a roll of a die
np.random.choice(np.arange(1, 7))
Out[3]:
3

Making multiple random choices¶

np.random.choice(options, n) will return an array of n randomly selected elements from options.

In [4]:
# Simulate 10 fair coin flips
np.random.choice(['Heads', 'Tails'], 10)
Out[4]:
array(['Heads', 'Heads', 'Tails', 'Heads', 'Heads', 'Heads', 'Heads',
       'Tails', 'Tails', 'Tails'], dtype='<U5')

With replacement vs. without replacement¶

  • By default, np.random.choice selects with replacement.
  • That is, after making a selection, that option is still available.
    • e.g. if every time you draw a marble from a bag, you put it back.
  • If an option can only be selected once, select without replacement by specifying replace=False.
    • e.g. if every time you draw a marble from a bag, you do not put it back.
In [5]:
# Choose three colleges to win free HDH swag
colleges = ['Revelle', 'John Muir', 'Thurgood Marshall', 
            'Earl Warren', 'Eleanor Roosevelt', 'Sixth', 'Seventh']

np.random.choice(colleges, 3, replace=False)
Out[5]:
array(['Seventh', 'Revelle', 'Eleanor Roosevelt'], dtype='<U17')

Example: What's the probability of getting 60 or more heads if we flip 100 coins?¶

Flipping coins¶

What is the probability of getting 60 or more heads if we flip 100 coins?

Strategy:

  1. Figure out how to do one experiment (i.e., flip 100 coins).
  2. Run the experiment a bunch of times.
  3. Find the proportion of experiments in which the number of heads was 60 or more.

Step 1: Figure out how to do one experiment¶

  • Use np.random.choice to flip 100 coins.
  • Use np.count_nonzero to count the number of heads.
    • np.count_nonzero(array) returns the number of entries in array that are True.
In [6]:
coins = np.random.choice(['Heads', 'Tails'], 100)
coins
Out[6]:
array(['Tails', 'Tails', 'Tails', ..., 'Heads', 'Tails', 'Tails'],
      dtype='<U5')
In [7]:
(coins == 'Heads').sum()
Out[7]:
44
In [8]:
np.count_nonzero(coins == 'Heads') # counts the number of Trues in sequence
Out[8]:
44
  • Q: Why is it called count_nonzero?
  • A: In Python, True == 1 and False == 0, so counting the non-zero elements counts the number of Trues.

Aside: Putting the experiment in a function¶

It's a good idea to do this, as it makes it easier to run the experiment repeatedly.

In [9]:
def coin_experiment():
    coins = np.random.choice(['Heads', 'Tails'], 100)
    return np.count_nonzero(coins == 'Heads')
In [10]:
coin_experiment()
Out[10]:
52

Step 2: Repeat the experiment¶

  • How do we run the same code many times? Using a for-loop!
  • Each time we run the experiment, we'll need to store the results in an array.
    • To do this, we'll use np.append!
In [11]:
head_counts = np.array([])
head_counts
Out[11]:
array([], dtype=float64)
In [12]:
head_counts = np.append(head_counts, 15)
head_counts
Out[12]:
array([15.])
In [13]:
head_counts = np.append(head_counts, 25)
head_counts
Out[13]:
array([15., 25.])

Step 2: Repeat the experiment¶

In [14]:
# Specify the number of repetitions
repetitions = 10000

# Create an empty array to store the results
head_counts = np.array([])

for i in np.arange(repetitions):
    # For each repetition, run the experiment and add the result to head_counts
    head_count = coin_experiment()
    head_counts = np.append(head_counts, head_count)
In [15]:
len(head_counts)
Out[15]:
10000
In [16]:
head_counts
Out[16]:
array([57., 44., 55., ..., 48., 44., 49.])

Step 3: Find the proportion of experiments in which the number of heads was 60 or more¶

In [17]:
# In how many experiments was the number of heads >= 60?
at_least_60 = np.count_nonzero(head_counts >= 60)
at_least_60
Out[17]:
278
In [18]:
# What is this as a proportion?
at_least_60 / repetitions
Out[18]:
0.0278
In [19]:
# Can also use np.mean()! Why?
np.mean(head_counts >= 60)
Out[19]:
0.0278

This is quite close to the true theoretical answer!

In [20]:
# The theoretical answer – don't worry about how or why this code works
import math
sum([math.comb(100, i) * (1 / 2) ** 100 for i in np.arange(60, 101)])
Out[20]:
0.028443966820490392

Visualizing the distribution¶

In [21]:
bpd.DataFrame().assign(
    Number_of_Heads=head_counts
).plot(kind='hist', bins=np.arange(30, 70), density=True, ec='w', figsize=(10, 5));
plt.axvline(60, color='C1');
2023-02-06T00:15:18.587553 image/svg+xml Matplotlib v3.5.2, https://matplotlib.org/
  • This histogram describes the distribution of the number of heads in each experiment.
  • Now we see another reason to use density histograms.
    • Using density means that areas approximate probabilities.

Example: The "Monty Hall" Problem¶

The "Monty Hall" Problem¶

Suppose you’re on a game show, and you’re given the choice of three doors: behind one door is a car 🚗; behind the others, goats 🐐🐐.

  • You pick a door, say No. 2, and the host, who knows what’s behind the doors, opens another door, say No. 3, which has a goat.

  • He then says to you, “Do you want to pick door No. 1?”

  • Question: Is it to your advantage to switch your choice?

(The question was posed in Parade magazine’s "Ask Marilyn" column in 1990. It is called the "Monty Hall problem" because Monty Hall hosted a similar game show called "Let's Make a Deal.")

Concept Check ✅ – Answer at cc.dsc10.com¶

You originally selected door #2. The host reveals door #3 to have a goat behind it. What should you do?

A. Might as well stick with door number #2; it has just as high a chance of winning as door #1. It doesn't matter whether you switch or not.

B. Switch to door number #1; it has a higher chance of winning than door #2.

Let's see 🤔¶

  • We'll use simulation to compute:
    • The probability of winning if we switch.
    • The probability of winning if we stay.
  • Whichever strategy has the higher probability of winning is better!

Time to simulate!¶

Let's simulate the Monty Hall problem many times to estimate the probability of winning if we switch.

  1. Figure out how to simulate one game of Monty Hall.
  2. Play the game many times, switching each time.
  3. Count the proportion of wins.

Then we'll repeat the process for staying each time.

Step 1: Simulate a single game¶

When you pick a door, there are three equally-likely outcomes:

  1. Car.
  2. Goat #1.
  3. Goat #2.
In [22]:
behind_picked_door = np.random.choice(['Car', 'Goat #1', 'Goat #2'])
behind_picked_door
Out[22]:
'Goat #2'

Step 1: Simulate a single game¶

When Monty opens a different door, he always reveals a goat.

In [23]:
if behind_picked_door == 'Goat #1':
    revealed = 'Goat #2'
elif behind_picked_door == 'Goat #2':
    revealed = 'Goat #1'
else:
    revealed = np.random.choice(['Goat #1', 'Goat #2'])

revealed
Out[23]:
'Goat #1'

Step 1: Simulate a single game¶

If you always switch, you'll end up winning the prize that is neither behind_picked_door nor revealed.

In [24]:
for prize in ['Car', 'Goat #1', 'Goat #2']:
    if prize != behind_picked_door and prize != revealed:
        your_prize = prize

your_prize
Out[24]:
'Car'

Step 1: Simulate a single game¶

Let's turn this into a function to make it easier to repeat:

In [25]:
def simulate_switch_strategy():
    behind_picked_door = np.random.choice(['Car', 'Goat #1', 'Goat #2'])
    
    if behind_picked_door == 'Goat #1':
        revealed = 'Goat #2'
    elif behind_picked_door == 'Goat #2':
        revealed = 'Goat #1'
    else:
        revealed = np.random.choice(['Goat #1', 'Goat #2'])
        
    for prize in ['Car', 'Goat #1', 'Goat #2']:
        if prize != behind_picked_door and prize != revealed:
            your_prize = prize
        
    #print(behind_picked_door, 'was behind the door.', revealed, 'was revealed by the host. Your prize was:', your_prize)
    return your_prize
In [26]:
simulate_switch_strategy()
Out[26]:
'Goat #1'

Step 2: Play the game many times¶

We should save your prize in each game. To do so, let's use np.append:

In [27]:
repetitions = 10000

your_prizes = np.array([])

for i in np.arange(repetitions):
    your_prize = simulate_switch_strategy()
    your_prizes = np.append(your_prizes, your_prize)
In [28]:
your_prizes
Out[28]:
array(['Goat #2', 'Car', 'Goat #2', ..., 'Goat #1', 'Car', 'Car'],
      dtype='<U32')

Step 3: Count the proportion of wins for this strategy (switching)¶

In [29]:
your_prizes
Out[29]:
array(['Goat #2', 'Car', 'Goat #2', ..., 'Goat #1', 'Car', 'Car'],
      dtype='<U32')
In [30]:
np.count_nonzero(your_prizes == 'Car')
Out[30]:
6759
In [31]:
np.count_nonzero(your_prizes == 'Car') / repetitions
Out[31]:
0.6759

This is quite close to the true probability of winning if you switch, $\frac{2}{3}$.

Alternate implementation¶

  • Looking back at our implementation, we kept track of your prize in each game.
  • However, all we really needed to keep track of was the number of games in which you won a car.
  • Idea: Keep a tally of the number of times you won a car. That is, initialize car_count to 0, and add 1 to it each time your prize is a car.
In [32]:
car_count = 0
In [33]:
for i in np.arange(repetitions):
    your_prize = simulate_switch_strategy()
    if your_prize == 'Car':
        car_count = car_count + 1
In [34]:
car_count / repetitions
Out[34]:
0.6681

No arrays needed! This strategy won't always work; it depends on the goal of the simulation.

What if you always stay with your original door?¶

  • In this case, your prize is always the same as what was behind the picked door.
In [35]:
car_count = 0

for i in np.arange(repetitions):
    behind_picked_door = np.random.choice(['Car', 'Goat #1', 'Goat #2'])
    your_prize = behind_picked_door
    if your_prize == 'Car':
        car_count = car_count + 1
        
car_count / repetitions
Out[35]:
0.3326
  • This is quite close to the true probability of winning if you stay, $\frac{1}{3}$.
  • Conclusion: It's better to switch.
  • Explanation:
    • If you originally choose a goat, Monty will reveal the other goat, and you'll win the car by switching.
    • If you originally choose a car, you'll win by staying.
    • But there 2 goats and 1 car, so you win twice as often by switching.

Marilyn vos Savant's column in Parade magazine¶

  • When asked this question by a reader, vos Savant stated the correct answer: switch.
  • She received over 10,000 letters in disagreement, including over 1,000 letters from people with Ph.D.s.
  • This became a nationwide controversy, even getting a front-page New York Times article in 1991.

Summary, next time¶

Simulation finds probabilities¶

  • Calculating probabilities is important, but can be hard!
    • You'll learn plenty of formulas in future DSC classes.
  • Simulation lets us find probabilities through computing instead of math.
  • Many real-world scenarios are complicated.
    • Simulation is much easier than math in many of these cases.

The simulation "recipe"¶

To estimate the probability of an event through simulation:

  1. Make a function that runs the experiment once.
  2. Run that function many, many times (usually 10000) with a for-loop, and save the results in an array with np.append.
  3. Compute the proportion of times the event occurs using np.count_nonzero.

What's next?¶

  • In the next class, we will start talking about sampling.
    • Key idea: We want to learn something about a large population (e.g. all undergraduates at UCSD). However, it's far too difficult to survey everyone. If we collect a sample, what can we infer about the larger population?
  • Today's material is on the midterm; Wednesday's is not.