- Lab 0 is out and is now due on **Saturday, October 7 at 11:59PM**.
    - It's worthwhile to watch the 🎥 video towards the end on how to navigate DataHub and Jupyter Notebooks.

- **Discussion starts today**. Your assigned discussion section is based on the lecture section you are enrolled in, unless you were approved to change to a different section. Post privately on Ed with any questions about which section to attend.
    - Section A: Wednesday 3-3:50PM in Pepper Canyon Hall 109.
    - Section B: Wednesday 4-4:50PM in Pepper Canyon Hall 109.
    - Section C: Wednesday 5-5:50PM in Mandeville B-210.
    - The first quiz is next week.

- The **HDSI Undergraduate Social** is tomorrow from 4-6PM on the HDSI Patio.

- We're covering **a lot** of content very quickly. If you're overwhelmed, just know that we're here to support you!
    - Ed and office hours are your friends!

- Check the Resources tab of the course website for programming resources.

- Data types.
- Strings. 🧶
- Means and medians.
- Lists.
- Arrays.

`int` and `float`

- Every value in Python has a **type**.
- Use the `type` function to check a value's type.
- There are two numeric data types:
    - `int`: An integer of any size.
    - `float`: A number with a decimal point.

In [1]:

```
# int.
6 + 4
```

Out[1]:

10

In [2]:

```
# float.
20 / 2
```

Out[2]:

10.0

`int`

- If you add (`+`), subtract (`-`), multiply (`*`), or exponentiate (`**`) `int`s, the result will be another `int` (as long as the exponent isn't negative).
- `int`s have arbitrary precision in Python, meaning that your calculations will always be exact.

In [3]:

```
7 - 15
```

Out[3]:

-8

In [4]:

```
type(7 - 15)
```

Out[4]:

int

In [5]:

```
2 ** 300
```

Out[5]:

2037035976334486086268445688409378161051468393665936250636140449354381299763336706183397376

In [6]:

```
2 ** 3000
```

Out[6]:

1230231922161117176931558813276752514640713895736833715766118029160058800614672948775360067838593459582429649254051804908512884180898236823585082482065348331234959350355845017413023320111360666922624728239756880416434478315693675013413090757208690376793296658810662941824493488451726505303712916005346747908623702673480919353936813105736620402352744776903840477883651100322409301983488363802930540482487909763484098253940728685132044408863734754271212592471778643949486688511721051561970432780747454823776808464180697103083861812184348565522740195796682622205511845512080552010310050255801589349645928001133745474220715013683413907542779063759833876101354235184245096670042160720629411581502371248008430447184842098610320580417992206662247328722122088513643683907670360209162653670641130936997002170500675501374723998766005827579300723253474890612250135171889174899079911291512399773872178519018229989376

`float`

- A `float` is specified using a **decimal** point.
- A `float` might be printed using scientific notation.

In [7]:

```
3.2 + 2.5
```

Out[7]:

5.7

In [8]:

```
type(3.2 + 2.5)
```

Out[8]:

float

In [9]:

```
# The result is in scientific notation: e+90 means "times 10^90".
2.0 ** 300
```

Out[9]:

2.037035976334486e+90

`float`

- `float`s have limited precision; after arithmetic, the final few decimal places can be wrong in unexpected ways.
- `float`s have limited size, though the limit is huge.

In [10]:

```
1 + 0.2
```

Out[10]:

1.2

In [11]:

```
1 + 0.1 + 0.1
```

Out[11]:

1.2000000000000002

In [12]:

```
# Too big for a float: this raises an OverflowError instead of returning a value.
2.0 ** 3000
```

`int` and `float`

- If you mix `int`s and `float`s in an expression, the result will always be a `float`.
    - Note that when you divide two `int`s, you get a `float` back.
- A value can be explicitly **coerced** (i.e. converted) using the `int` and `float` functions.

In [13]:

```
2.0 + 3
```

Out[13]:

5.0

In [14]:

```
12 / 2
```

Out[14]:

6.0

In [15]:

```
# Want an integer back.
int(12 / 2)
```

Out[15]:

6

In [16]:

```
# int chops off everything after the decimal point (it does not round)!
int(-2.9)
```

Out[16]:

-2
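To make the difference between chopping and rounding concrete, here is a small sketch that compares `int` with Python's built-in `round` function. `round` is not covered in this lecture; it appears here only for contrast.

```python
# int drops the fractional part, moving the value toward zero.
chopped_positive = int(2.9)
chopped_negative = int(-2.9)

# round goes to the nearest integer instead.
rounded_negative = round(-2.9)

print(chopped_positive)  # 2
print(chopped_negative)  # -2
print(rounded_negative)  # -3
```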

- A string is a snippet of text of any length.
- In Python, strings are enclosed by either single quotes or double quotes.

In [17]:

```
'woof'
```

Out[17]:

'woof'

In [18]:

```
type('woof')
```

Out[18]:

str

In [19]:

```
"woof"
```

Out[19]:

'woof'

In [20]:

```
# A string, not an int!
"1998"
```

Out[20]:

'1998'

When using the `+` symbol between two strings, the operation is called "concatenation".

In [21]:

```
s1 = 'baby'
s2 = '🐼'
```

In [22]:

```
s1 + s2
```

Out[22]:

'baby🐼'

In [23]:

```
s1 + ' ' + s2
```

Out[23]:

'baby 🐼'

In [24]:

```
s2 * 3
```

Out[24]:

'🐼🐼🐼'

In [25]:

```
my_cool_string = 'data science is super cool!'
```

In [26]:

```
my_cool_string.title()
```

Out[26]:

'Data Science Is Super Cool!'

In [27]:

```
my_cool_string.upper()
```

Out[27]:

'DATA SCIENCE IS SUPER COOL!'

In [28]:

```
my_cool_string.replace('super cool', '💯' * 3)
```

Out[28]:

'data science is 💯💯💯!'

In [29]:

```
# len is not a method, since it doesn't use dot notation.
len(my_cool_string)
```

Out[29]:

27

- Any value can be converted to a string using `str`.
- Some strings can be converted to `int` and `float`.

In [30]:

```
str(3)
```

Out[30]:

'3'

In [31]:

```
float('3')
```

Out[31]:

3.0

In [32]:

```
int('4')
```

Out[32]:

4

In [33]:

```
# Raises a ValueError: this string doesn't look like an integer.
int('baby panda')
```

In [34]:

```
# Raises a ValueError: int can't convert a string that contains a decimal point.
int('4.3')
```
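One possible way around the last error, sketched below, is to convert the string to a `float` first and then to an `int`. The `try`/`except` syntax used to show the error safely has not been introduced in this lecture and is included only so the example can run from top to bottom.

```python
# Converting '4.3' straight to an int fails with a ValueError.
try:
    int('4.3')
    direct_conversion_worked = True
except ValueError:
    direct_conversion_worked = False

# Going through float first works; int then chops off the decimal part.
two_step = int(float('4.3'))

print(direct_conversion_worked)  # False
print(two_step)                  # 4
```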

Assume you have run the following statements:

```
x = 3
y = '4'
z = '5.6'
```

Choose the expression that will be evaluated **without** an error.

A. `x + y`

B. `x + int(y + z)`

C. `str(x) + int(y)`

D. `str(x) + z`

E. All of them have errors

- We now know how to store individual numbers (as `int`s or `float`s) and pieces of text (as strings). But we'll often work with **sequences**, or ordered collections, of several data values.

- For any collection of numbers, say temperatures, it can be helpful to summarize the data by its **mean** (i.e. average) or **median**.

- Both mean and median are measures of **central tendency** – that is, they tell us roughly where the "center" of the data falls.

The mean is a one-number summary of a collection of numbers.

For example, the mean of $1$, $4$, $7$, and $12$ is $\frac{1 + 4 + 7 + 12}{4} = 6$.

Observe that the mean:

- Doesn't have to be equal to one of the data points.

- Doesn't have to be an integer, even if all of the data points are integers.

- Is somewhere between the min and max, but not necessarily halfway between.

- Has the same units as the data.
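The calculation above can be written directly in Python using only arithmetic. This is a minimal sketch; the variable names are made up for illustration.

```python
# Mean of 1, 4, 7, and 12: add the values, then divide by how many there are.
total = 1 + 4 + 7 + 12
count = 4
mean = total / count
print(mean)  # 6.0

# The mean need not be an integer, even when every data point is.
other_mean = (1 + 2) / 2
print(other_mean)  # 1.5
```

Note that 6.0 is not one of the four data points, and it sits between the min (1) and max (12) without being halfway between them (which would be 6.5).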

Like the mean, the median is a one-number summary of a collection of numbers.

- To calculate it, **sort the data points and pick the number in the middle**.
    - If there are two middle numbers, we usually pick the number halfway between (i.e. the mean of the middle two).
- **Example:**
    - $\text{Median}(1, 4, 7, 12, 32) = 7$
    - $\text{Median}(1, 4, 7, 12) = 5.5$
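The sort-and-pick-the-middle recipe can be sketched in Python as follows. The function name `median` is hypothetical, and the code relies on lists, function definitions, and `if` statements, which are introduced later in this lecture or later in the course; it is meant as an illustration of the recipe, not as the way the course computes medians.

```python
def median(values):
    """Return the median of a list of numbers (hypothetical helper)."""
    ordered = sorted(values)   # Step 1: sort the data points.
    n = len(ordered)
    middle = n // 2            # Position of the (upper) middle element.
    if n % 2 == 1:
        # Odd number of values: there is exactly one middle number.
        return ordered[middle]
    # Even number of values: take the mean of the two middle numbers.
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([1, 4, 7, 12, 32]))  # 7
print(median([1, 4, 7, 12]))      # 5.5
```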

- The mean and median of a dataset can be the same, but they don't need to be. They measure the center of a dataset in two different ways.

- Two different datasets can have the same mean without having the same median, and vice versa.

Find two different datasets that have the same mean and different medians.

Find two different datasets that have the same median and different means.

Find two different datasets that have the same median and the same mean.

Means and medians are just summaries; they don't tell the whole story about a dataset!

In a few weeks, we'll learn about how to visualize the **distribution** of a collection of numbers using a **histogram**.

These two distributions have different means but the same median!

How would we store the temperatures for a week to compute the average temperature?

Our best solution right now is to create a separate variable for each day of the week.

In [35]:

```
temp_sunday = 68
temp_monday = 73
temp_tuesday = 70
temp_wednesday = 74
temp_thursday = 76
temp_friday = 72
temp_saturday = 74
```

This *technically* allows us to do things like compute the average temperature:

```
avg_temperature = 1/7 * (
temp_sunday
+ temp_monday
+ temp_tuesday
+ ...)
```

Imagine a whole month's data, or a whole year's data. It seems like we need a better solution.

In Python, a list is used to store multiple values within a single value. To create a new list from scratch, we use `[`square brackets`]`.

In [36]:

```
temperature_list = [68, 73, 70, 74, 76, 72, 74]
```

In [37]:

```
len(temperature_list)
```

Out[37]:

7

Notice that the elements in a list don't need to be unique!

To find the average temperature, we just need to divide the **sum of the temperatures** by the **number of temperatures recorded**:

In [38]:

```
temperature_list
```

Out[38]:

[68, 73, 70, 74, 76, 72, 74]

In [39]:

```
sum(temperature_list) / len(temperature_list)
```

Out[39]:

72.42857142857143

The `type` of a list is... `list`.

In [40]:

```
temperature_list
```

Out[40]:

[68, 73, 70, 74, 76, 72, 74]

In [41]:

```
type(temperature_list)
```

Out[41]:

list

Within a list, you can store elements of different types.

In [42]:

```
mixed_list = [-2, 2.5, 'ucsd', [1, 3]]
mixed_list
```

Out[42]:

[-2, 2.5, 'ucsd', [1, 3]]

- Lists are **very slow**.
    - This is not a big deal when there aren't many entries, but it's a big problem when there are millions or billions of entries.

- NumPy (pronounced "num pie") is a Python library (module) that provides support for **arrays** and operations on them.
- The `babypandas` library, which you will learn about soon, goes hand-in-hand with NumPy.
    - NumPy is used heavily in the real world.
- To use `numpy`, we need to import it. It's usually imported as `np` (but doesn't have to be!)

In [43]:

```
import numpy as np
```

Think of NumPy arrays (just "arrays" from now on) as fancy, faster lists.

To create an array, we pass a list as input to the `np.array` function.

In [44]:

```
np.array([4, 9, 1, 2])
```

Out[44]:

array([4, 9, 1, 2])

In [45]:

```
temperature_array = np.array([68, 73, 70, 74, 76, 72, 74])
temperature_array
```

Out[45]:

array([68, 73, 70, 74, 76, 72, 74])

In [46]:

```
temperature_list
```

Out[46]:

[68, 73, 70, 74, 76, 72, 74]

In [47]:

```
# No square brackets, because temperature_list is already a list!
np.array(temperature_list)
```

Out[47]:

array([68, 73, 70, 74, 76, 72, 74])
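To get a rough feel for the claim that arrays are faster than lists, the sketch below times summing a long list against summing the equivalent array. It uses the standard library's `timeit` module and a `lambda`, neither of which is covered in this lecture, and the names `big_list` and `big_array` are made up. Exact timings vary from machine to machine, so none are shown here.

```python
import timeit

import numpy as np

# Repeat the week of temperatures many times to get 700,000 entries.
big_list = [68, 73, 70, 74, 76, 72, 74] * 100_000
big_array = np.array(big_list)

# Time 10 runs of each approach (results are in seconds).
list_seconds = timeit.timeit(lambda: sum(big_list), number=10)
array_seconds = timeit.timeit(lambda: big_array.sum(), number=10)

print('list :', list_seconds)
print('array:', array_seconds)

# Both approaches compute the same total.
print(sum(big_list) == big_array.sum())  # True
```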

When people wait in line, each person has a position.

Similarly, each element of an array (and list) has a position.

- Python, like most programming languages, is "0-indexed."
- This means that the position of the first element in an array is 0, not 1.
- One interpretation is that **an element's position represents the number of elements in front of it**.
- To access the element in array `arr_name` at position `pos`, we use the syntax `arr_name[pos]`.

In [48]:

```
temperature_array
```

Out[48]:

array([68, 73, 70, 74, 76, 72, 74])

In [49]:

```
temperature_array[0]
```

Out[49]:

68

In [50]:

```
temperature_array[1]
```

Out[50]:

73

In [51]:

```
temperature_array[3]
```

Out[51]:

74

In [52]:

```
# Access the last element.
temperature_array[6]
```

Out[52]:

74

In [53]:

```
# Doesn't work! The last valid position is 6, so this raises an IndexError.
temperature_array[7]
```

In [54]:

```
# If a position is negative, count from the end!
temperature_array[-1]
```

Out[54]:

74
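Since lists also have positions, the same square-bracket syntax works on them. The short sketch below repeats the accesses above on `temperature_list` and shows how the last position relates to `len`.

```python
temperature_list = [68, 73, 70, 74, 76, 72, 74]

# Positions start at 0, so the first element is at position 0.
first = temperature_list[0]

# The last position is always one less than the length.
last = temperature_list[len(temperature_list) - 1]

# Negative positions count from the end: -1 is last, -2 is second to last.
also_last = temperature_list[-1]
second_to_last = temperature_list[-2]

print(first, last, also_last, second_to_last)  # 68 74 74 72
```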

Earlier in the lecture, we saw that lists can store elements of multiple types.

In [55]:

```
nums_and_strings_lst = ['uc', 'sd', 1961, 3.14]
nums_and_strings_lst
```

Out[55]:

['uc', 'sd', 1961, 3.14]

**This is not true of arrays – all elements in an array must be of the same type.**

In [56]:

```
# All elements are converted to strings!
np.array(nums_and_strings_lst)
```

Out[56]:

array(['uc', 'sd', '1961', '3.14'], dtype='<U32')
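The same rule applies when a list mixes `int`s and `float`s: NumPy converts every element to a single type, in this case a float type. The sketch below checks this using the array's `dtype` attribute, the same label that appears in the output above; the variable name is made up for illustration.

```python
import numpy as np

# A list containing two ints and one float.
mixed_numbers = np.array([1, 2, 3.5])

# Every element is stored as a float, including the former ints.
print(mixed_numbers)        # [1.  2.  3.5]
print(mixed_numbers.dtype)  # float64
```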

Arrays make it easy to perform the same operation on every element. This behavior is formally known as "broadcasting".

In [57]:

```
temperature_array
```

Out[57]:

array([68, 73, 70, 74, 76, 72, 74])

In [58]:

```
# Increase all temperatures by 3 degrees.
temperature_array + 3
```

Out[58]:

array([71, 76, 73, 77, 79, 75, 77])

In [59]:

```
# Halve all temperatures.
temperature_array / 2
```

Out[59]:

array([34. , 36.5, 35. , 37. , 38. , 36. , 37. ])

In [60]:

```
# Convert all temperatures to Celsius.
(5 / 9) * (temperature_array - 32)
```

Out[60]:

array([20. , 22.77777778, 21.11111111, 23.33333333, 24.44444444, 22.22222222, 23.33333333])

**Note**: In none of the above cells did we actually modify `temperature_array`! Each of those expressions created a new array.

In [61]:

```
temperature_array
```

Out[61]:

array([68, 73, 70, 74, 76, 72, 74])

To actually change `temperature_array`, we need to reassign it to a new array.

In [62]:

```
temperature_array = (5 / 9) * (temperature_array - 32)
```

In [63]:

```
# Now in Celsius!
temperature_array
```

Out[63]:

array([20. , 22.77777778, 21.11111111, 23.33333333, 24.44444444, 22.22222222, 23.33333333])
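For contrast, lists do not support this kind of arithmetic. In the sketch below (variable names are made up, and `try`/`except` is used only so the example runs without stopping), multiplying a list by 2 repeats it, much like multiplying a string, and adding a number to a list is an error.

```python
import numpy as np

temps_list = [68, 73, 70]
temps_array = np.array(temps_list)

# With a list, * repeats the whole list instead of doubling each element.
repeated = temps_list * 2
print(repeated)  # [68, 73, 70, 68, 73, 70]

# With an array, * is applied to every element.
doubled = temps_array * 2
print(doubled)   # [136 146 140]

# Adding a number to a list raises a TypeError.
try:
    temps_list + 3
    list_addition_worked = True
except TypeError:
    list_addition_worked = False
print(list_addition_worked)  # False
```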

- We can apply arithmetic operations to multiple arrays, provided they have the same length.
- The result is computed **element-wise**, which means that the arithmetic operation is applied to one pair of elements from each array at a time.

In [64]:

```
a = np.array([4, 5, -1])
b = np.array([2, 3, 2])
```

In [65]:

```
a + b
```

Out[65]:

array([6, 8, 1])

In [66]:

```
a / b
```

Out[66]:

array([ 2. , 1.66666667, -0.5 ])

In [67]:

```
a ** 2 + b ** 2
```

Out[67]:

array([20, 34, 5])
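To see why the arrays need to have the same length, the sketch below tries to add a three-element array to a two-element one. The array `c` is made up for illustration, and `try`/`except` (not yet covered) is used only to keep the example running after the error.

```python
import numpy as np

a = np.array([4, 5, -1])
b = np.array([2, 3, 2])
c = np.array([10, 20])   # Only two elements.

# Same length: each pair of elements is multiplied together.
product = a * b
print(product)  # [ 8 15 -2]

# Different lengths: NumPy can't pair up the elements and raises a ValueError.
try:
    a + c
    mismatched_sum_worked = True
except ValueError:
    mismatched_sum_worked = False
print(mismatched_sum_worked)  # False
```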

- Strings are used to store text. Enclose them in single or double quotes.
- Lists and arrays are used to store **sequences**.
    - Arrays are faster and more convenient for numerical operations.
    - Access elements by position, starting at position 0.

- Remember to refer to the resources from the start of lecture!

We'll learn more about arrays and we'll see how to use Python to work with real-world tabular data.