In [1]:

```
# Run this cell to set up packages for lecture.
from lec12_imports import *
```

- Quiz 2 is
**today**in your assigned quiz session.- Check your email for your seating assignment.
- Bring your ID and a pencil.
- This is a 20 minute paper-based quiz with no aids allowed.
- The quiz covers Lecture 5 through 9 and related labs and homeworks.
- Quiz 2 is
**more challenging**than Quiz 1, and next week's Midterm Exam will be more challenging than Quiz 2. 📈

- Homework 3 is due on
**Tuesday**.- Problems 1 and 2 only are relevant to Quiz 2.

- Make sure to get started on the Midterm Project!

- Recap: iteration.
- Simulations.
- Example: What's the probability of getting 60 or more heads if we flip 100 coins?
- Example: The "Monty Hall" Problem.

`for`

-loops¶`for`

-loops are used to repeat the execution of code for every element of a sequence.- Lists, arrays, and strings are examples of sequences.

In [2]:

```
for x in ["my boyfriend", "a god", "the breeze in my hair on the weekend", "a relaxing thought"]:
print("Karma is " + x)
```

In [3]:

```
# Saving the lyrics in a variable.
lyrics = ""
for x in ["my boyfriend", "a god", "the breeze in my hair on the weekend", "a relaxing thought"]:
lyrics = lyrics + "Karma is " + x +"\n"
```

In [4]:

```
lyrics
```

Out[4]:

'Karma is my boyfriend\nKarma is a god\nKarma is the breeze in my hair on the weekend\nKarma is a relaxing thought\n'

In [5]:

```
print(lyrics)
```

- To store our results, we'll typically use an
`int`

, array, or string.

- If using an
`int`

, we define an`int`

variable (usually set to`0`

) before the loop, then use`+`

to add to it inside the loop.- Think of this like using a tally.

- If using an array, we create an array (usually empty) before the loop, then use
`np.append`

to add to it inside the loop.- Think of this like writing the results on a piece of paper.

- If using a string, we define a string variable (usually set to
`""`

) before the loop, then use string concatenation`+`

to add to it inside the loop.- Think of this like writing a word, one letter at a time.

- This pattern – of repeatedly adding to an
`int`

, array, or string – is called the accumulator pattern.

`for`

-loops in DSC 10¶Almost every

`for`

-loop in DSC 10 will use the accumulator pattern.Do

**not**use`for`

-loops to perform mathematical operations on every element of an array or Series.- Instead use DataFrame manipulations and built-in array or Series methods.

Helpful video 🎥: For Loops (and when not to use them) in DSC 10.

String are sequences, so we can iterate over them, too!

In [6]:

```
for letter in 'uc san diego':
print(letter.upper())
```

U C S A N D I E G O

In [7]:

```
'california'.count('a')
```

Out[7]:

2

Below, complete the implementation of the function `vowel_count`

, which returns the number of vowels in the input string `s`

(including repeats). Example behavior is shown below.

```
>>> vowel_count('king triton')
3
>>> vowel_count('i go to uc san diego')
8
```

def vowel_count(s): # We need to keep track of the number of vowels seen so far. Before we start, we've seen zero vowels. number = 0 # For each of the 5 vowels: for vowel in 'aeiou': # Count the number of occurrences of this vowel in s. num_vowel = s.count(vowel) # Add this count to the variable number. number = number + num_vowel # Once we've gotten through all 5 vowels, return the answer. return number

In [8]:

```
def vowel_count(s):
# We need to keep track of the number of vowels seen so far.
# Before we start, we've seen zero vowels.
# For each of the 5 vowels:
# Count the number of occurrences of this vowel in s.
# Add this count to the variable number.
# Once we've gotten through all 5 vowels, return the answer.
return ...
```

In [9]:

```
vowel_count('king triton')
```

Out[9]:

Ellipsis

In [10]:

```
vowel_count('i go to uc san diego')
```

Out[10]:

Ellipsis

- What is the probability of getting 60 or more heads if we flip 100 coins?

- While we
*could*calculate it by hand (and will learn how to in future courses), we can also**estimate**it using the computer:- Figure out how to run the experiment (flipping 100 coins) once.
- Repeat the experiment many times.
- Find the proportion of experiments in which the number of heads was 60 or more.

- This is how we'll use
**simulations**– to**estimate**, or approximate, probabilities through computation.- The techniques we will introduce in today's lecture will appear in almost every lecture for the remainder of the quarter!

- A helpful function is
`np.random.choice(options)`

.- The input,
`options`

, is a list or array to choose from. - The output is a random element in
`options`

. By default, all elements are equally likely to be chosen.

- The input,

In [11]:

```
# Simulate a fair coin flip.
np.random.choice(['Heads', 'Tails'])
```

Out[11]:

'Heads'

In [12]:

```
# Simulate a roll of a die.
np.random.choice(np.arange(1, 7))
```

Out[12]:

3

`np.random.choice(options, n)`

will return an array of `n`

randomly selected elements from `options`

.

In [13]:

```
# Simulate 10 fair coin flips.
np.random.choice(['Heads', 'Tails'], 10)
```

Out[13]:

array(['Tails', 'Heads', 'Tails', 'Tails', 'Heads', 'Tails', 'Tails', 'Tails', 'Tails', 'Tails'], dtype='<U5')

- By default,
`np.random.choice`

selects**with**replacement. - That is, after making a selection, that option is still available.
- e.g. if every time you draw a marble from a bag, you put it back.

- If an option can only be selected once, select
**without**replacement by specifying`replace=False`

.- e.g. if every time you draw a marble from a bag, you do not put it back.

In [14]:

```
# Choose three colleges to win free HDH swag.
colleges = ['Revelle', 'John Muir', 'Thurgood Marshall',
'Earl Warren', 'Eleanor Roosevelt', 'Sixth', 'Seventh', 'Eighth']
np.random.choice(colleges, 3, replace=False)
```

Out[14]:

array(['Earl Warren', 'Revelle', 'Seventh'], dtype='<U17')

What's the probability of getting 60 or more heads if we flip 100 coins?

**Plan**:

- Figure out how to run the experiment (flipping 100 coins) once.
- Repeat the experiment many times.
- Find the proportion of experiments in which the number of heads was 60 or more.

- Use
`np.random.choice`

to flip 100 coins. - Use
`np.count_nonzero`

to count the number of heads.`np.count_nonzero(array)`

returns the number of entries in`array`

that are`True`

.

In [15]:

```
coins = np.random.choice(['Heads', 'Tails'], 100)
coins
```

Out[15]:

array(['Tails', 'Heads', 'Heads', 'Heads', 'Tails', 'Tails', 'Tails', 'Heads', 'Tails', 'Heads', 'Heads', 'Tails', 'Tails', 'Tails', 'Tails', 'Heads', 'Heads', 'Tails', 'Heads', 'Tails', 'Tails', 'Tails', 'Tails', 'Heads', 'Tails', 'Heads', 'Heads', 'Heads', 'Heads', 'Heads', 'Tails', 'Heads', 'Tails', 'Tails', 'Tails', 'Tails', 'Tails', 'Tails', 'Heads', 'Heads', 'Heads', 'Tails', 'Heads', 'Tails', 'Tails', 'Heads', 'Heads', 'Tails', 'Heads', 'Heads', 'Heads', 'Tails', 'Tails', 'Heads', 'Tails', 'Tails', 'Heads', 'Tails', 'Tails', 'Tails', 'Heads', 'Tails', 'Tails', 'Heads', 'Tails', 'Tails', 'Tails', 'Heads', 'Heads', 'Heads', 'Heads', 'Heads', 'Heads', 'Heads', 'Tails', 'Tails', 'Tails', 'Heads', 'Heads', 'Tails', 'Tails', 'Heads', 'Tails', 'Heads', 'Heads', 'Tails', 'Tails', 'Heads', 'Heads', 'Tails', 'Heads', 'Heads', 'Tails', 'Tails', 'Heads', 'Heads', 'Tails', 'Heads', 'Tails', 'Heads'], dtype='<U5')

In [16]:

```
coins == 'Heads'
```

Out[16]:

array([False, True, True, True, False, False, False, True, False, True, True, False, False, False, False, True, True, False, True, False, False, False, False, True, False, True, True, True, True, True, False, True, False, False, False, False, False, False, True, True, True, False, True, False, False, True, True, False, True, True, True, False, False, True, False, False, True, False, False, False, True, False, False, True, False, False, False, True, True, True, True, True, True, True, False, False, False, True, True, False, False, True, False, True, True, False, False, True, True, False, True, True, False, False, True, True, False, True, False, True])

In [17]:

```
(coins == 'Heads').sum()
```

Out[17]:

49

In [18]:

```
np.count_nonzero(coins == 'Heads') # Counts the number of Trues in the sequence.
```

Out[18]:

49

In [19]:

```
np.count_nonzero([5, 6, 0, 2])
```

Out[19]:

3

**Question**: Why is it called`count_nonzero`

?**Answer**: In Python,`True == 1`

and`False == 0`

, so counting the non-zero elements counts the number of`True`

s.

This makes it easy to run the experiment repeatedly.

In [20]:

```
def coin_experiment():
coins = np.random.choice(['Heads', 'Tails'], 100)
return np.count_nonzero(coins == 'Heads')
```

In [21]:

```
coin_experiment()
```

Out[21]:

48

- How do we run a piece of code many times?
**Using a**`for`

-loop! - Each time we run the experiment, we'll need to store the results in an array.
- To do this, we'll use
`np.append`

!

- To do this, we'll use

In [22]:

```
head_counts = np.array([])
head_counts
```

Out[22]:

array([], dtype=float64)

In [23]:

```
head_counts = np.append(head_counts, 15)
head_counts
```

Out[23]:

array([15.])

In [24]:

```
head_counts = np.append(head_counts, 25)
head_counts
```

Out[24]:

array([15., 25.])

- Imagine we start with a blank sheet of paper, and each time we run the experiment, we write the number of heads we see down on the sheet of paper.
- The sheet will start off empty, but eventually will have one number for each time we ran the experiment.

In [25]:

```
# Specify the number of repetitions.
repetitions = 10000
# Create an empty array to store the results.
head_counts = np.array([])
for i in np.arange(repetitions):
# For each repetition, run the experiment and add the result to head_counts.
head_count = coin_experiment()
head_counts = np.append(head_counts, head_count)
```

In [26]:

```
len(head_counts)
```

Out[26]:

10000

In [27]:

```
head_counts
```

Out[27]:

array([60., 52., 45., ..., 44., 53., 47.])

In [28]:

```
# In how many experiments was the number of heads >= 60?
at_least_60 = np.count_nonzero(head_counts >= 60)
at_least_60
```

Out[28]:

270

In [29]:

```
# What is this as a proportion?
at_least_60 / repetitions
```

Out[29]:

0.027

In [30]:

```
# Can also use np.mean()! Why?
np.mean(head_counts >= 60)
```

Out[30]:

0.027

This is quite close to the true theoretical answer!

In [31]:

```
# The theoretical answer – don't worry about how or why this code works.
import math
sum([math.comb(100, i) * (1 / 2) ** 100 for i in np.arange(60, 101)])
```

Out[31]:

0.028443966820490392

In [32]:

```
head_counts
```

Out[32]:

array([60., 52., 45., ..., 44., 53., 47.])

In [33]:

```
bpd.DataFrame().assign(
Number_of_Heads=head_counts
).plot(kind='hist', bins=np.arange(30, 70), density=True, ec='w', figsize=(10, 5));
plt.axvline(60, color='C1', linewidth=4);
```

- This histogram describes the distribution of the number of heads in each experiment.
- Now we see another reason to use density histograms.
- Using density means that areas are
**probabilities**.

- Using density means that areas are
- Next class, we'll learn more about
*why*it's valid to estimate probabilities using simulations.

Suppose you’re on a game show, and you’re given the choice of three doors. A car 🚗 is behind one of the doors, and goats 🐐🐐 are behind the other two.

You pick a door, say Door #2, and the host,

**who knows what’s behind the doors**, opens another door, say Door #3, which has a goat.The host then says to you, “Do you want to switch to Door #1 or stay with Door #2?”

**Question**: Should you stay or switch?

*(The question was posed in Parade magazine’s "Ask Marilyn" column in 1990. It is called the "Monty Hall problem" because Monty Hall hosted a similar game show called "Let's Make a Deal.")*

In [34]:

```
from IPython.display import IFrame
IFrame('https://montyhall.io/', width=600, height=400)
```

Out[34]:

Suppose you originally selected Door #2. The host reveals Door #3 to have a goat behind it. What should you do?

A. **Stay** with Door #2; it has just as high a chance of winning as Door #1. It doesn't matter whether you switch or not.

B. **Switch** to Door #1; it has a higher chance of winning than Door #2.

- Let's
**estimate**the probability of winning if you switch.

**Plan**:

- Figure out how to simulate a single game.
- Play the game many times, switching each time.
- Compute the proportion of wins.

When you pick a door, there are three equally-likely outcomes for what is behind the door you picked:

- Car.
- Goat #1.
- Goat #2.

In [35]:

```
options = np.array(['Car', 'Goat #1', 'Goat #2'])
behind_picked_door = np.random.choice(options)
behind_picked_door
```

Out[35]:

'Goat #1'

When the host opens a different door, they always reveal a goat.

In [36]:

```
if behind_picked_door == 'Goat #1':
revealed = 'Goat #2'
elif behind_picked_door == 'Goat #2':
revealed = 'Goat #1'
else:
# This is the case in which you originally picked a car!
revealed = np.random.choice(['Goat #1', 'Goat #2'])
revealed
```

Out[36]:

'Goat #2'

`behind_picked_door`

nor `revealed`

.

In [37]:

```
options
```

Out[37]:

array(['Car', 'Goat #1', 'Goat #2'], dtype='<U7')

In [38]:

```
behind_picked_door
```

Out[38]:

'Goat #1'

In [39]:

```
revealed
```

Out[39]:

'Goat #2'

In [40]:

```
your_prize = options[(options != behind_picked_door) & (options != revealed)][0]
your_prize
```

Out[40]:

'Car'

Let's put all of our work into a single function to make it easier to repeat.

In [41]:

```
def simulate_switch_strategy():
options = np.array(['Car', 'Goat #1', 'Goat #2'])
behind_picked_door = np.random.choice(options)
if behind_picked_door == 'Goat #1':
revealed = 'Goat #2'
elif behind_picked_door == 'Goat #2':
revealed = 'Goat #1'
else:
revealed = np.random.choice(['Goat #1', 'Goat #2'])
your_prize = options[(options != behind_picked_door) & (options != revealed)][0]
#print(behind_picked_door, 'was behind the door.', revealed, 'was revealed by the host. Your prize was:', your_prize)
return your_prize
```

Now, every time we call `simulate_switch_strategy`

, the result is your prize.

In [42]:

```
simulate_switch_strategy()
```

Out[42]:

'Car'

We should save your prize in each game; to do so, we'll use `np.append`

.

In [43]:

```
repetitions = 10000
your_prizes = np.array([])
for i in np.arange(repetitions):
your_prize = simulate_switch_strategy()
your_prizes = np.append(your_prizes, your_prize)
```

In [44]:

```
your_prizes
```

Out[44]:

array(['Car', 'Goat #2', 'Car', ..., 'Car', 'Car', 'Car'], dtype='<U32')

In [45]:

```
your_prizes
```

Out[45]:

array(['Car', 'Goat #2', 'Car', ..., 'Car', 'Car', 'Car'], dtype='<U32')

In [46]:

```
np.count_nonzero(your_prizes == 'Car')
```

Out[46]:

6604

In [47]:

```
np.count_nonzero(your_prizes == 'Car') / repetitions
```

Out[47]:

0.6604

This is quite close to the true probability of winning if you switch, $\frac{2}{3}$.

- Looking back at our implementation, we kept track of your prize in each game.

- However, all we really needed to keep track of was the
**number of games**in which you won a car.

**💡 Idea**: Keep a*tally*of the number of times you won a car. That is, initialize`car_count`

to 0, and add 1 to it each time your prize is a car.

In [48]:

```
car_count = 0
```

In [49]:

```
for i in np.arange(repetitions):
your_prize = simulate_switch_strategy()
if your_prize == 'Car':
car_count = car_count + 1
```

In [50]:

```
car_count / repetitions
```

Out[50]:

0.667

No arrays needed! This strategy won't always work; it depends on the goal of the simulation.

In this case, your prize is always the same as what was behind the picked door.

In [51]:

```
car_count = 0
for i in np.arange(repetitions):
options = np.array(['Car', 'Goat #1', 'Goat #2'])
behind_picked_door = np.random.choice(options)
your_prize = behind_picked_door
if your_prize == 'Car':
car_count = car_count + 1
car_count / repetitions
```

Out[51]:

0.334

- This is quite close to the true probability of winning if you stay, $\frac{1}{3}$.

**Conclusion**: It's better to switch.

**Why?**- If you originally choose a goat, Monty will reveal the other goat, and you'll win the car by switching.
- If you originally choose a car, you'll win by staying.
- But there are 2 goats and only 1 car, so you win twice as often by switching.

- When asked this question by a reader, vos Savant stated the correct answer:
*switch*. - She received over 10,000 letters in disagreement, including over 1,000 letters from people with Ph.D.s.
- This became a nationwide controversy, even getting a front-page New York Times article in 1991.

- Calculating probabilities is important, but can be hard!
- You'll learn plenty of formulas in future DSC classes, if you end up taking them.

- Simulations let us find probabilities through code rather than through math.
- Many real-world scenarios are complicated.
- Simulations are much easier than math in many of these cases.

To estimate the probability of an event through simulation:

- Make a function that runs the experiment once.
- Run that function many times (usually 10,000) with a
`for`

-loop, and save the results in an array with`np.append`

. - Compute the proportion of times the event occurs using
`np.count_nonzero`

.