# Lecture 22 – The Normal Distribution, The Central Limit Theorem

## DSC 10, Fall 2022

### Announcements

• Lab 7 is due Saturday 11/19 at 11:59pm.
• The Final Project is released, and has two deadlines:
  • The checkpoint is due tomorrow at 11:59pm. No slip days!
  • The final submission is due Tuesday 11/29 at 11:59pm. Slip days allowed.
• See the calendar for the latest office hours schedule.

### Agenda

• The normal distribution.
• The Central Limit Theorem.

### Recap: Standard units

SAT scores range from 0 to 1600. The distribution of SAT scores has a mean of 950 and a standard deviation of 300. Your friend tells you that their SAT score, in standard units, is 2.5. What do you conclude?
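One way to reason about this question is to convert the score back to original units by inverting the standard-units formula. The sketch below uses only the numbers stated in the question; the variable names are illustrative.

```python
# Values taken from the question above.
mean_sat = 950
sd_sat = 300
score_in_standard_units = 2.5

# Invert z = (x - mean) / SD to get x = mean + z * SD.
score_in_original_units = mean_sat + score_in_standard_units * sd_sat
print(score_in_original_units)  # 1700.0

# The highest possible score stated in the question.
max_possible_score = 1600
print(score_in_original_units > max_possible_score)  # True
```

A score of 1700 is above the stated maximum of 1600, so the claimed value of 2.5 standard units cannot be correct.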

## The normal distribution

### Recap: The standard normal distribution

• The standard normal distribution can be thought of as a "continuous histogram."
• Like a histogram:
  • The area between $a$ and $b$ is the proportion of values between $a$ and $b$.
  • The total area underneath the normal curve is 1.
• The standard normal distribution's cumulative distribution function (CDF) describes the proportion of values in the distribution less than or equal to $z$, for all values of $z$.
• In Python, we use the function scipy.stats.norm.cdf.

### Areas under the standard normal curve

What does scipy.stats.norm.cdf(0) evaluate to? Why?
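A quick check of this question, assuming `scipy` is installed. The standard normal curve is symmetric about 0, so half of the total area lies to the left of 0.

```python
from scipy import stats

# Area under the standard normal curve to the left of 0.
area_left_of_0 = stats.norm.cdf(0)
print(area_left_of_0)  # 0.5
```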

### Areas under the standard normal curve

Suppose we want to find the area to the right of 2 under the standard normal curve.

The following expression gives us the area to the left of 2.

However, since the total area under the standard normal curve is 1:

$$\text{area right of 2} = 1 - (\text{area left of 2})$$
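The complement rule above can be sketched as follows. The variable names are illustrative.

```python
from scipy import stats

# Area to the left of 2 under the standard normal curve (roughly 0.977).
area_left_of_2 = stats.norm.cdf(2)

# The total area is 1, so the area to the right is the complement (roughly 0.023).
area_right_of_2 = 1 - area_left_of_2

print(area_left_of_2, area_right_of_2)
```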

### Areas under the standard normal curve

How might we use stats.norm.cdf to compute the area between -1 and 0?

Strategy:

$$\text{area from -1 to 0} = (\text{area left of 0}) - (\text{area left of -1})$$
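A minimal sketch of this strategy: subtract the area left of -1 from the area left of 0.

```python
from scipy import stats

# (area left of 0) - (area left of -1), roughly 0.341.
area_minus1_to_0 = stats.norm.cdf(0) - stats.norm.cdf(-1)
print(area_minus1_to_0)
```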

### General strategy for finding area

The area under the standard normal curve in the interval $[a, b]$ is

stats.norm.cdf(b) - stats.norm.cdf(a)
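To reuse this expression, it can be wrapped in a small helper. The function name `normal_area` is hypothetical and is not part of `scipy` or of this lecture's code.

```python
from scipy import stats

def normal_area(a, b):
    """Area under the standard normal curve on the interval [a, b]."""
    return stats.norm.cdf(b) - stats.norm.cdf(a)

# Same as the area between -1 and 0 computed earlier.
print(normal_area(-1, 0))

# An infinite endpoint also works: this is the area to the right of 2.
print(normal_area(2, float("inf")))
```

Passing `float("inf")` as the right endpoint works because the CDF evaluates to 1 there, which reproduces the "1 minus the area to the left" rule.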


What can we do with this? We're about to see!

## Using the normal distribution

As we saw before, both variables are roughly normal. What benefit is there to knowing that the two distributions are roughly normal?

### Standard units and the normal distribution

• Key idea: The $x$-axis in a plot of the standard normal distribution is in standard units.
• For instance, the area between -1 and 1 is the proportion of values within 1 standard deviation of the mean.
• Suppose a distribution is roughly normal. Then these two quantities are approximately equal:
  • The proportion of values in the distribution between $a$ and $b$.
  • The area between $z(a)$ and $z(b)$ under the standard normal curve. (Recall, $z(x_i) = \frac{x_i - \text{mean of } x}{\text{SD of } x}$.)

### Example: Proportion of weights between 200 and 225 pounds

Let's suppose, as is often the case, that we don't have access to the entire distribution of weights, just the mean and SD.

Using just this information, we can estimate the proportion of weights between 200 and 225 pounds:

1. Convert 200 to standard units.
2. Convert 225 to standard units.
3. Use stats.norm.cdf to find the area between (1) and (2).
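The three steps above can be sketched as follows. The mean and SD used here are hypothetical placeholders chosen only for illustration, not the values from the lecture's dataset, and `standard_units` is an illustrative helper name.

```python
from scipy import stats

# HYPOTHETICAL summary statistics (not the lecture's actual values).
weight_mean = 180.0
weight_sd = 30.0

def standard_units(x, mean, sd):
    """Convert x to standard units: (x - mean) / SD."""
    return (x - mean) / sd

# Steps 1 and 2: convert both endpoints to standard units.
z_200 = standard_units(200, weight_mean, weight_sd)
z_225 = standard_units(225, weight_mean, weight_sd)

# Step 3: area under the standard normal curve between the two endpoints.
approx_proportion = stats.norm.cdf(z_225) - stats.norm.cdf(z_200)
print(z_200, z_225, approx_proportion)
```

With the real mean and SD substituted in, the same three lines give the estimate described above.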

### Checking the approximation

Since we have access to the entire set of weights, we can compute the true proportion of weights between 200 and 225 pounds.

Pretty good for an approximation! 🤩
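The comparison can be sketched as below. The lecture's actual weights are not reproduced here, so this example uses a synthetic array as a stand-in; the seed, mean, SD, and sample size are all assumptions.

```python
import numpy as np
from scipy import stats

# SYNTHETIC stand-in for the real weights (assumed parameters, not lecture data).
rng = np.random.default_rng(22)
weights = rng.normal(loc=180, scale=30, size=10_000)

# True proportion: count the values that actually fall in [200, 225].
true_proportion = np.mean((weights >= 200) & (weights <= 225))

# Normal approximation: uses only the mean and SD of the data.
mean, sd = weights.mean(), weights.std()
approx_proportion = (
    stats.norm.cdf((225 - mean) / sd) - stats.norm.cdf((200 - mean) / sd)
)

print(true_proportion, approx_proportion)
```

Because the synthetic data is drawn from a normal distribution, close agreement is expected here. With real data, the gap depends on how close to normal the distribution actually is.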

### Warning: Standardization doesn't make a distribution normal!

Consider the distribution of delays from earlier in the lecture.

The distribution above does not look normal. It won't look normal even if we standardize it. By standardizing a distribution, all we do is shift it horizontally and rescale it: subtracting the mean moves it, and dividing by the SD stretches or compresses it. The shape itself doesn't change.
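To see this numerically, here is a sketch using a synthetic right-skewed sample in place of the delays data (the exponential distribution, its scale, and the seed are assumptions). Skewness, a measure of asymmetry, is identical before and after standardizing.

```python
import numpy as np
from scipy import stats

# SYNTHETIC right-skewed data standing in for the flight delays.
rng = np.random.default_rng(0)
delays = rng.exponential(scale=15, size=5_000)

# Standardize: subtract the mean, then divide by the SD.
standardized = (delays - delays.mean()) / delays.std()

# The standardized data has mean 0 and SD 1 (up to rounding error)...
print(standardized.mean(), standardized.std())

# ...but it is exactly as skewed as before, so it is no closer to normal.
print(stats.skew(delays), stats.skew(standardized))
```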

### Special cases

• As we just discovered, the $x$-axis in the standard normal curve represents standard units.
• Often, we want to know the proportion of values within $z$ standard deviations of the mean.

| Range | Percent in range (normal distribution) |
| --- | --- |
| $\text{mean} \pm 1 \: \text{SD}$ | $\approx 68\%$ |
| $\text{mean} \pm 2 \: \text{SDs}$ | $\approx 95\%$ |
| $\text{mean} \pm 3 \: \text{SDs}$ | $\approx 99.73\%$ |
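These percentages can be reproduced with `stats.norm.cdf`, since the horizontal axis of the standard normal curve is in standard units. A minimal sketch:

```python
from scipy import stats

# Area within k SDs of the mean, for k = 1, 2, 3.
areas = {k: stats.norm.cdf(k) - stats.norm.cdf(-k) for k in [1, 2, 3]}

for k, area in areas.items():
    print(f"mean ± {k} SD: {area:.4f}")
# mean ± 1 SD: 0.6827
# mean ± 2 SD: 0.9545
# mean ± 3 SD: 0.9973
```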

### 68% of values are within 1 SD of the mean

This means that if a variable follows a normal distribution, approximately 68% of values will be within 1 SD of the mean.