Lecture 22 – The Normal Distribution, The Central Limit Theorem

DSC 10, Fall 2022

Recap: Standard units

SAT scores range from 0 to 1600. The distribution of SAT scores has a mean of 950 and a standard deviation of 300. Your friend tells you that their SAT score, in standard units, is 2.5. What do you conclude?
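One way to reason about this is to convert the score back from standard units. The sketch below uses only the numbers stated in the question; the variable names are illustrative.

```python
# Values stated in the question.
sat_mean = 950
sat_sd = 300
max_possible_score = 1600

# Standard units count how many SDs a value lies above the mean,
# so converting back means: value = mean + z * SD.
z = 2.5
implied_score = sat_mean + z * sat_sd

print(implied_score)                       # 1700.0
print(implied_score > max_possible_score)  # True
```

The implied score is higher than the maximum possible score, so the claim cannot be correct.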

The normal distribution

Recap: The standard normal distribution

Areas under the standard normal curve

What does scipy.stats.norm.cdf(0) evaluate to? Why?
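A quick check, assuming `scipy` is installed. The variable name is illustrative.

```python
from scipy import stats

# The standard normal curve is symmetric about 0 and has total area 1,
# so half of the area lies to the left of 0.
area_left_of_zero = stats.norm.cdf(0)
print(area_left_of_zero)  # 0.5
```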

Areas under the standard normal curve

Suppose we want to find the area to the right of 2 under the standard normal curve.

The expression stats.norm.cdf(2) gives us the area to the left of 2.

However, since the total area under the standard normal curve is 1:

$$\text{area right of $2$} = 1 - (\text{area left of $2$})$$
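The complement rule above can be sketched as follows. The variable names are illustrative, not taken from the lecture notebook.

```python
from scipy import stats

# Area to the left of 2 under the standard normal curve.
area_left_of_2 = stats.norm.cdf(2)

# The total area is 1, so the area to the right is the complement.
area_right_of_2 = 1 - area_left_of_2

print(round(area_left_of_2, 4))   # 0.9772
print(round(area_right_of_2, 4))  # 0.0228
```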

Areas under the standard normal curve

How might we use stats.norm.cdf to compute the area between -1 and 0?

Strategy:

$$\text{area from $-1$ to $0$} = (\text{area left of $0$}) - (\text{area left of $-1$})$$
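The subtraction strategy translates directly into code. This is a minimal sketch with an illustrative variable name.

```python
from scipy import stats

# (area left of 0) - (area left of -1)
area_minus1_to_0 = stats.norm.cdf(0) - stats.norm.cdf(-1)

print(round(area_minus1_to_0, 4))  # 0.3413
```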

General strategy for finding area

The area under the standard normal curve in the interval $[a, b]$ is

stats.norm.cdf(b) - stats.norm.cdf(a)
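For repeated use, the expression can be wrapped in a small function. `area_between` is a hypothetical helper name, not something provided by `scipy`.

```python
from scipy import stats

def area_between(a, b):
    """Area under the standard normal curve on the interval [a, b]."""
    return stats.norm.cdf(b) - stats.norm.cdf(a)

print(round(area_between(-1, 0), 4))  # 0.3413
print(round(area_between(-2, 2), 4))  # 0.9545
```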

What can we do with this? We're about to see!

Using the normal distribution

Let's return to our data set of heights and weights.

As we saw before, both variables are roughly normal. What benefit is there to knowing that the two distributions are roughly normal?

Standard units and the normal distribution

Example: Proportion of weights between 200 and 225 pounds

Let's suppose, as is often the case, that we don't have access to the entire distribution of weights, just the mean and SD.

Using just this information, we can estimate the proportion of weights between 200 and 225 pounds:

  1. Convert 200 to standard units.
  2. Convert 225 to standard units.
  3. Use stats.norm.cdf to find the area between (1) and (2).
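The three steps above can be sketched as follows. The mean and SD here are placeholder values chosen for illustration, not the statistics of the lecture's data set, and `to_standard_units` is a hypothetical helper.

```python
from scipy import stats

# ASSUMPTION: placeholder summary statistics, not the lecture's actual values.
weight_mean = 180.0
weight_sd = 30.0

def to_standard_units(value, mean, sd):
    """Number of SDs that `value` lies above the mean."""
    return (value - mean) / sd

# Steps 1 and 2: convert both endpoints to standard units.
lower_su = to_standard_units(200, weight_mean, weight_sd)
upper_su = to_standard_units(225, weight_mean, weight_sd)

# Step 3: area under the standard normal curve between the two endpoints.
approx_proportion = stats.norm.cdf(upper_su) - stats.norm.cdf(lower_su)

print(lower_su, upper_su)
print(approx_proportion)
```

With the real mean and SD substituted in, `approx_proportion` is the normal-approximation estimate of the proportion of weights between 200 and 225 pounds.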

Checking the approximation

Since we have access to the entire set of weights, we can compute the true proportion of weights between 200 and 225 pounds.
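A sketch of this comparison is below. Because the lecture's data set is not reproduced here, the `weights` array is synthetic stand-in data; with the real data, it would be the weights column as a NumPy array.

```python
import numpy as np
from scipy import stats

# ASSUMPTION: synthetic stand-in for the weights column.
rng = np.random.default_rng(seed=0)
weights = rng.normal(loc=180, scale=30, size=100_000)

# True proportion: count the values that actually fall in the range.
true_proportion = np.mean((weights >= 200) & (weights <= 225))

# Normal approximation: uses only the mean and SD.
mean, sd = weights.mean(), weights.std()
approx_proportion = (
    stats.norm.cdf((225 - mean) / sd) - stats.norm.cdf((200 - mean) / sd)
)

print(true_proportion, approx_proportion)
```

The closer the data is to normal, the closer these two numbers should be.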

Pretty good for an approximation! 🤩

Warning: Standardization doesn't make a distribution normal!

Consider the distribution of delays from earlier in the lecture.

That distribution does not look normal, and it won't look normal even if we standardize it. Standardizing only shifts a distribution horizontally and rescales its horizontal axis (subtract the mean, then divide by the SD); the shape itself doesn't change.
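One way to see this numerically is to compare skewness before and after standardizing. The sketch below uses synthetic right-skewed data as a stand-in for the delays, which are not reproduced here. Skewness is one measure of shape and is unaffected by shifting and positive rescaling.

```python
import numpy as np
from scipy import stats

# ASSUMPTION: synthetic right-skewed data standing in for the flight delays.
rng = np.random.default_rng(seed=0)
delays = rng.exponential(scale=10, size=10_000)

# Standardize: subtract the mean, then divide by the SD.
standardized = (delays - delays.mean()) / delays.std()

# Center and spread change to (approximately) 0 and 1...
print(standardized.mean(), standardized.std())

# ...but the skewness, and hence the shape, is unchanged.
print(stats.skew(delays), stats.skew(standardized))
```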

Center and spread, revisited

Special cases

| Range | Percent in range (normal distribution) |
| --- | --- |
| $\text{mean} \pm 1 \: \text{SD}$ | $\approx 68\%$ |
| $\text{mean} \pm 2 \: \text{SDs}$ | $\approx 95\%$ |
| $\text{mean} \pm 3 \: \text{SDs}$ | $\approx 99.73\%$ |
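These percentages can be recovered from the standard normal CDF, since "within k SDs of the mean" corresponds to the interval from -k to k in standard units. A minimal sketch, with an illustrative dictionary name:

```python
from scipy import stats

# Percent of a normal distribution within k SDs of the mean, for k = 1, 2, 3.
percent_within = {}
for k in [1, 2, 3]:
    percent_within[k] = 100 * (stats.norm.cdf(k) - stats.norm.cdf(-k))
    print(k, round(percent_within[k], 2))

# 1 68.27
# 2 95.45
# 3 99.73
```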

68% of values are within 1 SD of the mean

This means that if a variable follows a normal distribution, approximately 68% of values will be within 1 SD of the mean.

95% of values are within 2 SDs of the mean