# Run this cell to set up packages for lecture.
from lec12_imports import *
Announcements¶
- Quiz 2 is today in your assigned quiz session.
- Check your email for your seating assignment.
- This is a 20 minute paper-based quiz with no aids allowed.
- The quiz covers Lecture 5 through 10 and related labs and homeworks.
- Lab 3 is due tomorrow. Homework 3 is due on Sunday.
- Make sure to get started on the Midterm Project!
- The Midterm Exam is one week from today, during your scheduled lecture section (including 1PM).
Agenda¶
- Recap: iteration.
- Simulations.
- Example: What's the probability of getting 60 or more heads if we flip 100 coins?
- Example: The "Monty Hall" Problem.
Recap: iteration¶
for
-loops¶
for
-loops are used to repeat the execution of code for every element of a sequence.- Lists, arrays, and strings are examples of sequences.
for x in ["my boyfriend", "a god", "the breeze in my hair on the weekend", "a relaxing thought"]:
print("Karma is " + x)
Karma is my boyfriend Karma is a god Karma is the breeze in my hair on the weekend Karma is a relaxing thought
# Saving the lyrics in a variable.
lyrics = ""
for x in ["my boyfriend", "a god", "the breeze in my hair on the weekend", "a relaxing thought"]:
lyrics = lyrics + "Karma is " + x +"\n"
lyrics
'Karma is my boyfriend\nKarma is a god\nKarma is the breeze in my hair on the weekend\nKarma is a relaxing thought\n'
print(lyrics)
Karma is my boyfriend Karma is a god Karma is the breeze in my hair on the weekend Karma is a relaxing thought
The accumulator pattern¶
- To store our results, we'll typically use an
int
, array, or string.
- If using an
int
, we define anint
variable (usually set to0
) before the loop, then use+
to add to it inside the loop.- Think of this like using a tally.
- If using an array, we create an array (usually empty) before the loop, then use
np.append
to add to it inside the loop.- Think of this like writing the results on a piece of paper.
- If using a string, we define a string variable (usually set to
""
) before the loop, then use string concatenation+
to add to it inside the loop.- Think of this like writing a word, one letter at a time.
- This pattern – of repeatedly adding to an
int
, array, or string – is called the accumulator pattern.
for
-loops in DSC 10¶
Almost every
for
-loop in DSC 10 will use the accumulator pattern.Do not use
for
-loops to perform mathematical operations on every element of an array or Series.- Instead use DataFrame manipulations and built-in array or Series methods.
Helpful video 🎥: For Loops (and when not to use them) in DSC 10.
Working with strings¶
String are sequences, so we can iterate over them, too!
for letter in 'uc san diego':
print(letter.upper())
U C S A N D I E G O
'california'.count('a')
2
Example: Vowel count¶
Below, complete the implementation of the function vowel_count
, which returns the number of vowels in the input string s
(including repeats). Example behavior is shown below.
>>> vowel_count('king triton')
3
>>> vowel_count('i go to uc san diego')
8
✅ Click here to see the solution after you've tried it yourself.
def vowel_count(s): # We need to keep track of the number of vowels seen so far. Before we start, we've seen zero vowels. number = 0 # For each of the 5 vowels: for vowel in 'aeiou': # Count the number of occurrences of this vowel in s. num_vowel = s.count(vowel) # Add on to the running total. number = number + num_vowel # Once we've gotten through all 5 vowels, return the answer. return number
def vowel_count(s):
# We need to keep track of the number of vowels seen so far.
# Before we start, we've seen zero vowels.
# For each of the 5 vowels:
# Count the number of occurrences of this vowel in s.
# Add on to the running total.
# Once we've gotten through all 5 vowels, return the answer.
return ...
vowel_count('king triton')
Ellipsis
vowel_count('i go to uc san diego')
Ellipsis
Simulations¶
Simulations to estimate probabilities¶
- What is the probability of getting 60 or more heads if we flip 100 coins?
- While we could calculate it by hand (and will learn how to in future courses), we can also estimate it using the computer:
- Figure out how to run the experiment (flipping 100 coins) once.
- Repeat the experiment many times.
- Find the proportion of experiments in which the number of heads was 60 or more.
- This is how we'll use simulations – to estimate, or approximate, probabilities through computation.
- The techniques we will introduce in today's lecture will appear in almost every lecture for the remainder of the quarter!
Making a random choice¶
- To simulate, we need a way to perform a random experiment on the computer (e.g. flipping a coin, rolling a die).
- A helpful function is
np.random.choice(options)
.- The input,
options
, is a list, array, or Series to choose from. - The output is a random element in
options
. By default, all elements are equally likely to be chosen.
- The input,
# Simulate a fair coin flip.
np.random.choice(['Heads', 'Tails'])
'Heads'
# Simulate a roll of a die.
np.random.choice(np.arange(1, 7))
2
Making multiple random choices¶
np.random.choice(options, n)
will return an array of n
randomly selected elements from options
.
# Simulate 10 fair coin flips.
np.random.choice(['Heads', 'Tails'], 10)
array(['Heads', 'Heads', 'Heads', 'Heads', 'Heads', 'Tails', 'Heads', 'Tails', 'Heads', 'Heads'], dtype='<U5')
With replacement vs. without replacement¶
- By default,
np.random.choice
selects with replacement. - That is, after making a selection, that option is still available.
- e.g. if every time you draw a marble from a bag, you put it back.
- If an option can only be selected once, select without replacement by specifying
replace=False
.- e.g. if every time you draw a marble from a bag, you do not put it back.
# Choose three colleges to win free HDH swag.
colleges = ['Revelle', 'John Muir', 'Thurgood Marshall',
'Earl Warren', 'Eleanor Roosevelt', 'Sixth', 'Seventh', 'Eighth']
np.random.choice(colleges, 3, replace=False)
array(['Sixth', 'John Muir', 'Revelle'], dtype='<U17')
Example: What's the probability of getting 60 or more heads if we flip 100 coins?¶
Flipping coins¶
What's the probability of getting 60 or more heads if we flip 100 coins?
Plan:
- Figure out how to run the experiment (flipping 100 coins) once.
- Repeat the experiment many times.
- Find the proportion of experiments in which the number of heads was 60 or more.
Step 1: Figure out how to run the experiment once¶
- Use
np.random.choice
to flip 100 coins. - Use
np.count_nonzero
to count the number of heads.np.count_nonzero(array)
returns the number of entries inarray
that areTrue
.
coins = np.random.choice(['Heads', 'Tails'], 100)
coins
array(['Heads', 'Tails', 'Heads', 'Tails', 'Heads', 'Tails', 'Tails', 'Tails', 'Tails', 'Heads', 'Heads', 'Heads', 'Tails', 'Tails', 'Heads', 'Heads', 'Heads', 'Tails', 'Heads', 'Heads', 'Tails', 'Heads', 'Heads', 'Tails', 'Heads', 'Tails', 'Tails', 'Tails', 'Tails', 'Tails', 'Heads', 'Tails', 'Tails', 'Heads', 'Heads', 'Heads', 'Heads', 'Tails', 'Heads', 'Tails', 'Tails', 'Tails', 'Heads', 'Tails', 'Tails', 'Tails', 'Heads', 'Heads', 'Heads', 'Heads', 'Heads', 'Tails', 'Tails', 'Heads', 'Heads', 'Tails', 'Heads', 'Heads', 'Tails', 'Tails', 'Tails', 'Tails', 'Heads', 'Heads', 'Tails', 'Heads', 'Heads', 'Tails', 'Heads', 'Heads', 'Heads', 'Tails', 'Heads', 'Tails', 'Heads', 'Heads', 'Tails', 'Tails', 'Heads', 'Heads', 'Tails', 'Tails', 'Heads', 'Heads', 'Tails', 'Heads', 'Heads', 'Heads', 'Tails', 'Heads', 'Heads', 'Heads', 'Tails', 'Heads', 'Heads', 'Heads', 'Heads', 'Tails', 'Heads', 'Heads'], dtype='<U5')
coins == 'Heads'
array([ True, False, True, False, True, False, False, False, False, True, True, True, False, False, True, True, True, False, True, True, False, True, True, False, True, False, False, False, False, False, True, False, False, True, True, True, True, False, True, False, False, False, True, False, False, False, True, True, True, True, True, False, False, True, True, False, True, True, False, False, False, False, True, True, False, True, True, False, True, True, True, False, True, False, True, True, False, False, True, True, False, False, True, True, False, True, True, True, False, True, True, True, False, True, True, True, True, False, True, True])
(coins == 'Heads').sum()
56
np.count_nonzero(coins == 'Heads') # Counts the number of Trues in the sequence.
56
np.count_nonzero([5, 6, 0, 2])
3
- Question: Why is it called
count_nonzero
? - Answer: In Python,
True == 1
andFalse == 0
, so counting the non-zero elements counts the number ofTrue
s.
Aside: Defining a function to run the experiment¶
This makes it easy to run the experiment repeatedly.
def coin_experiment():
coins = np.random.choice(['Heads', 'Tails'], 100)
return np.count_nonzero(coins == 'Heads')
coin_experiment()
52
Step 2: Repeat the experiment many times¶
- How do we run a piece of code many times? Using a
for
-loop! - Each time we run the experiment, we'll need to store the results in an array.
- To do this, we'll use
np.append
!
- To do this, we'll use
head_counts = np.array([])
head_counts
array([], dtype=float64)
head_counts = np.append(head_counts, 15)
head_counts
array([15.])
head_counts = np.append(head_counts, 25)
head_counts
array([15., 25.])
Step 2: Repeat the experiment many times¶
- Imagine we start with a blank sheet of paper, and each time we run the experiment, we write the number of heads we see down on the sheet of paper.
- The sheet will start off empty, but eventually will have one number for each time we ran the experiment.
# Specify the number of repetitions.
repetitions = 10000
# Create an empty array to store the results.
head_counts = np.array([])
for i in np.arange(repetitions):
# For each repetition, run the experiment and add the result to head_counts.
head_count = coin_experiment()
head_counts = np.append(head_counts, head_count)
len(head_counts)
10000
head_counts
array([54., 48., 43., ..., 53., 47., 45.])
Step 3: Find the proportion of experiments in which the number of heads was 60 or more¶
# In how many experiments was the number of heads >= 60?
at_least_60 = np.count_nonzero(head_counts >= 60)
at_least_60
293
# What is this as a proportion?
at_least_60 / repetitions
0.0293
# Can also use np.mean()! Why?
np.mean(head_counts >= 60)
0.0293
This is quite close to the true theoretical answer!
# The theoretical answer – don't worry about how or why this code works.
import math
sum([math.comb(100, i) * (1 / 2) ** 100 for i in np.arange(60, 101)])
0.028443966820490392
Visualizing the distribution¶
head_counts
array([54., 48., 43., ..., 53., 47., 45.])
bpd.DataFrame().assign(
Number_of_Heads=head_counts
).plot(kind='hist', bins=np.arange(30, 70), density=True, ec='w', figsize=(10, 5));
plt.axvline(60, color='C1', linewidth=4);
- This histogram describes the distribution of the number of heads in each experiment.
- Now we see another reason to use density histograms.
- Using density means that areas are probabilities.
- Next class, we'll learn more about why it's valid to estimate probabilities using simulations.
Example: The "Monty Hall" Problem¶
The "Monty Hall" Problem¶
Suppose you’re on a game show, and you’re given the choice of three doors. A car 🚗 is behind one of the doors, and goats 🐐🐐 are behind the other two.
You pick a door, say Door #2, and the host, who knows what’s behind the doors, opens another door, say Door #3, which has a goat.
The host then says to you, “Do you want to switch to Door #1 or stay with Door #2?”
Question: Should you stay or switch?
(The question was posed in Parade magazine’s "Ask Marilyn" column in 1990. It is called the "Monty Hall problem" because Monty Hall hosted a similar game show called "Let's Make a Deal.")
from IPython.display import IFrame
IFrame('https://montyhall.io/', width=600, height=400)
Concept Check ✅ – Answer at cc.dsc10.com¶
Suppose you originally selected Door #2. The host reveals Door #3 to have a goat behind it. What should you do?
A. Stay with Door #2; it has just as high a chance of winning as Door #1. It doesn't matter whether you switch or not.
B. Switch to Door #1; it has a higher chance of winning than Door #2.
Time to simulate!¶
- Let's estimate the probability of winning if you switch.
- If it's higher than 50%, then switching is the better strategy, otherwise staying is the better strategy.
Plan:
- Figure out how to simulate a single game.
- Play the game many times, switching each time.
- Compute the proportion of wins.
Step 1: Simulate a single game¶
When you pick a door, there are three equally-likely outcomes for what is behind the door you picked:
- Car.
- Goat #1.
- Goat #2.
options = np.array(['Car', 'Goat #1', 'Goat #2'])
behind_picked_door = np.random.choice(options)
behind_picked_door
'Car'
Step 1: Simulate a single game¶
When the host opens a different door, they always reveal a goat.
if behind_picked_door == 'Goat #1':
revealed = 'Goat #2'
elif behind_picked_door == 'Goat #2':
revealed = 'Goat #1'
else:
# This is the case in which you originally picked a car!
revealed = np.random.choice(['Goat #1', 'Goat #2'])
revealed
'Goat #2'
If you always switch, you'll end up winning the prize that is neither behind_picked_door
nor revealed
.
options
array(['Car', 'Goat #1', 'Goat #2'], dtype='<U7')
behind_picked_door
'Car'
revealed
'Goat #2'
your_prize = options[(options != behind_picked_door) & (options != revealed)][0]
your_prize
'Goat #1'
Step 1: Simulate a single game¶
Let's put all of our work into a single function to make it easier to repeat.
def simulate_switch_strategy():
options = np.array(['Car', 'Goat #1', 'Goat #2'])
behind_picked_door = np.random.choice(options)
if behind_picked_door == 'Goat #1':
revealed = 'Goat #2'
elif behind_picked_door == 'Goat #2':
revealed = 'Goat #1'
else:
revealed = np.random.choice(['Goat #1', 'Goat #2'])
your_prize = options[(options != behind_picked_door) & (options != revealed)][0]
#print(behind_picked_door, 'was behind the door.', revealed, 'was revealed by the host. Your prize was:', your_prize)
return your_prize
Now, every time we call simulate_switch_strategy
, the result is your prize.
simulate_switch_strategy()
'Car'
Step 2: Play the game many times¶
We should save your prize in each game; to do so, we'll use np.append
.
repetitions = 10000
your_prizes = np.array([])
for i in np.arange(repetitions):
your_prize = simulate_switch_strategy()
your_prizes = np.append(your_prizes, your_prize)
your_prizes
array(['Car', 'Car', 'Car', ..., 'Car', 'Car', 'Goat #1'], dtype='<U32')
Step 3: Count the proportion of wins for this strategy (switching)¶
your_prizes
array(['Car', 'Car', 'Car', ..., 'Car', 'Car', 'Goat #1'], dtype='<U32')
np.count_nonzero(your_prizes == 'Car')
6716
np.count_nonzero(your_prizes == 'Car') / repetitions
0.6716
This is quite close to the true probability of winning if you switch, $\frac{2}{3}$.
Alternate implementation¶
- Looking back at our implementation, we kept track of your prize in each game.
- However, all we really needed to keep track of was the number of games in which you won a car.
- 💡 Idea: Keep a tally of the number of times you won a car. That is, initialize
car_count
to 0, and add 1 to it each time your prize is a car.
car_count = 0
for i in np.arange(repetitions):
your_prize = simulate_switch_strategy()
if your_prize == 'Car':
car_count = car_count + 1
car_count / repetitions
0.6647
No arrays needed! This strategy won't always work; it depends on the goal of the simulation.
What if you always stay with your original door?¶
In this case, your prize is always the same as what was behind the picked door.
car_count = 0
for i in np.arange(repetitions):
options = np.array(['Car', 'Goat #1', 'Goat #2'])
behind_picked_door = np.random.choice(options)
your_prize = behind_picked_door
if your_prize == 'Car':
car_count = car_count + 1
car_count / repetitions
0.3313
- This is quite close to the true probability of winning if you stay, $\frac{1}{3}$.
- Conclusion: It's better to switch.
- Why?
- If you originally choose a goat, Monty will reveal the other goat, and you'll win the car by switching.
- If you originally choose a car, you'll win by staying.
- But there are 2 goats and only 1 car, so you win twice as often by switching.
Marilyn vos Savant's column in Parade magazine¶
- When asked this question by a reader, vos Savant stated the correct answer: switch.
- She received over 10,000 letters in disagreement, including over 1,000 letters from people with Ph.D.s.
- This became a nationwide controversy, even getting a front-page New York Times article in 1991.
Summary¶
Simulations find probabilities¶
- Calculating probabilities is important, but can be hard!
- You'll learn plenty of formulas in future DSC classes, if you end up taking them.
- Simulations let us find probabilities through code rather than through math.
- Many real-world scenarios are complicated.
- Simulations are much easier than math in many of these cases.
The simulation "recipe"¶
To estimate the probability of an event through simulation:
- Make a function that runs the experiment once.
- Run that function many times (usually 10,000) with a
for
-loop, and save the results in an array withnp.append
. - Compute the proportion of times the event occurs using
np.count_nonzero
.