- Lab 0 is out and is due on **Tuesday, April 11th at 11:59PM**.
    - It contains a video 🎥 towards the end: Navigating DataHub and Jupyter Notebooks. Watching it should be a worthwhile investment of your time!

- Please fill out the Welcome Survey!
- You must be present when attendance is taken in discussion to get credit, even if you have a conflicting class.

- We're covering **a lot** of content very quickly. If you're overwhelmed, just know that we're here to support you!
    - Ed and office hours are your friends! 🫂

- Remember to check the Resources tab of the course website for programming resources.

- Recap: Data types.
- Strings. 🧶
- Lists.
- Arrays.
- Ranges.

`int` and `float`

- Every value in Python has a **type**.
- There are two numeric data types:
    - `int`: An integer of any size.
    - `float`: A number with a decimal point.

In [1]:

```
# int.
15 - 4
```

Out[1]:

11

In [2]:

```
# float.
6 * 0.2
```

Out[2]:

1.2000000000000002
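
The small error in the last output comes from how `float`s are stored: most decimal values are only approximated. The following is a minimal sketch of checking types and comparing floats; the variable names are illustrative, and `math.isclose` is a standard-library helper that this lecture does not otherwise use.

```python
import math

int_result = 15 - 4       # both operands are ints, so the result is an int
float_result = 6 * 0.2    # 0.2 is a float, so the result is a float

print(type(int_result))    # <class 'int'>
print(type(float_result))  # <class 'float'>

# Floats are stored approximately, so exact equality checks can be surprising.
print(float_result == 1.2)              # False
print(math.isclose(float_result, 1.2))  # True
```

For most calculations in this course the tiny discrepancy is harmless; it mainly matters when testing whether two floats are exactly equal.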

`int` and `float`

- If you mix `int`s and `float`s in an expression, the result will always be a `float`.
    - Note that when you divide two `int`s, you get a `float` back.
- A value can be explicitly **coerced** (i.e. converted) using the `int` and `float` functions.

In [3]:

```
2.0 + 3
```

Out[3]:

5.0

In [4]:

```
12 / 2
```

Out[4]:

6.0

In [5]:

```
# Want an integer back.
int(12 / 2)
```

Out[5]:

6

In [6]:

```
# int chops off everything after the decimal point; it doesn't round!
int(-2.9)
```

Out[6]:

-2
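
To make the "chopping" behavior concrete, here is a small sketch comparing `int` with two other ways of getting a whole number. `math.floor` and the built-in `round` are not covered in this lecture and are shown only for contrast; the variable names are illustrative.

```python
import math

truncated_positive = int(2.9)    # drops the fractional part
truncated_negative = int(-2.9)   # also drops the fractional part
floored = math.floor(-2.9)       # always rounds down
rounded = round(-2.9)            # rounds to the nearest integer

print(truncated_positive)  # 2
print(truncated_negative)  # -2
print(floored)             # -3
print(rounded)             # -3
```

For positive numbers, chopping and rounding down agree; for negative numbers they differ, which is why `int(-2.9)` is `-2` rather than `-3`.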

- A string is a snippet of text of any length.
- In Python, strings are enclosed by either single quotes or double quotes.

In [7]:

```
'woof'
```

Out[7]:

'woof'

In [8]:

```
type('woof')
```

Out[8]:

str

In [9]:

```
"woof"
```

Out[9]:

'woof'

In [10]:

```
# A string, not an int!
"1998"
```

Out[10]:

'1998'

When using the `+` symbol between two strings, the operation is called "concatenation".

In [11]:

```
s1 = 'baby'
s2 = '🐼'
```

In [12]:

```
s1 + s2
```

Out[12]:

'baby🐼'

In [13]:

```
s1 + ' ' + s2
```

Out[13]:

'baby 🐼'

In [14]:

```
s2 * 3
```

Out[14]:

'🐼🐼🐼'

- Associated with strings are special functions, called **string methods**.
- Access string methods with a `.` after the string ("dot notation").
    - For instance, to use the `upper` method on string `s`, we write `s.upper()`.
- Examples include `upper`, `title`, and `replace`.

In [15]:

```
my_cool_string = 'data science is super cool!'
```

In [16]:

```
my_cool_string.title()
```

Out[16]:

'Data Science Is Super Cool!'

In [17]:

```
my_cool_string.upper()
```

Out[17]:

'DATA SCIENCE IS SUPER COOL!'

In [18]:

```
my_cool_string.replace('super cool', '💯' * 3)
```

Out[18]:

'data science is 💯💯💯!'

In [19]:

```
# len is not a method, since it doesn't use dot notation.
len(my_cool_string)
```

Out[19]:

27
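
One detail the cells above imply but don't state: string methods return a *new* string and leave the original untouched. The sketch below illustrates this and also chains two methods together; the names `shouted` and `chained` are made up for illustration.

```python
my_cool_string = 'data science is super cool!'

# upper() returns a new string; my_cool_string itself is not changed.
shouted = my_cool_string.upper()
print(shouted)         # DATA SCIENCE IS SUPER COOL!
print(my_cool_string)  # data science is super cool!

# Methods can be chained: replace runs first, then title runs on its result.
chained = my_cool_string.replace('super cool', 'fun').title()
print(chained)         # Data Science Is Fun!
```

To keep the result of a method, assign it to a name, as done with `shouted` here.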

`print`

- By default, Jupyter Notebooks display the "raw" value of the expression of the last line in a cell.
- The `print` function displays the value in human-readable text when it's evaluated.

In [20]:

```
12 # 12 won't be displayed, since Jupyter only shows the value of the last expression in a cell.
23
```

Out[20]:

23

In [21]:

```
# Note, there is no Out[number] to the left! That only appears when displaying a non-printed value.
# But both 12 and 23 are displayed.
print(12)
print(23)
```

12
23

In [22]:

```
# '\n' inserts a new line.
my_newline_str = 'Here is a string with two lines.\nHere is the second line!'
my_newline_str
```

Out[22]:

'Here is a string with two lines.\nHere is the second line!'

In [23]:

```
# The quotes disappeared and the newline is rendered!
print(my_newline_str)
```

Here is a string with two lines.
Here is the second line!
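
For strings, the "raw" value that Jupyter shows is essentially what the built-in `repr` function returns. This sketch (an illustration, not something the lecture relies on) uses `repr` to reproduce that raw form outside of a notebook; `raw_form` is an illustrative name.

```python
my_newline_str = 'Here is a string with two lines.\nHere is the second line!'

# repr gives the "raw" form: quotes included, newline shown as \n.
raw_form = repr(my_newline_str)
print(raw_form)

# print renders the newline, so the text really does span two lines.
print(my_newline_str)
print(len(my_newline_str.splitlines()))  # 2
```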

- Any value can be converted to a string using `str`.
- Some strings can be converted to `int` and `float`.

In [24]:

```
str(3)
```

Out[24]:

'3'

In [25]:

```
float('3')
```

Out[25]:

3.0

In [26]:

```
int('4')
```

Out[26]:

4

In [27]:

```
# Error! 'baby panda' doesn't represent an integer, so this raises a ValueError.
int('baby panda')
```

In [28]:

```
# Error! int can't convert a string containing a decimal point; this also raises a ValueError.
int('4.3')
```
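
The two failing conversions above both raise a `ValueError`. As a hedged sketch, the code below shows one way to handle such failures and one way to get an integer out of `'4.3'`. The helper `to_int_or_none` is hypothetical, and `def` and `try`/`except` go beyond what this lecture covers.

```python
def to_int_or_none(text):
    """Hypothetical helper: return int(text), or None if the conversion fails."""
    try:
        return int(text)
    except ValueError:
        return None

print(to_int_or_none('4'))           # 4
print(to_int_or_none('baby panda'))  # None
print(to_int_or_none('4.3'))         # None

# Converting in two steps works: first to a float, then chop to an int.
two_step = int(float('4.3'))
print(two_step)                      # 4
```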

Assume you have run the following statements:

```
x = 3
y = '4'
z = '5.6'
```

Choose the expression that will be evaluated **without** an error.

A. `x + y`

B. `x + int(y + z)`

C. `str(x) + int(y)`

D. `str(x) + z`

E. All of them have errors

How would we store today's high temperature in several different cities?

Our best solution right now is to create a separate variable for each city.

In [29]:

```
temp_sandiego = 68
temp_losangeles = 73
temp_sanfrancisco = 60
temp_chicago = 50
temp_newyorkcity = 76
temp_boston = 50
```

This *technically* allows us to do things like compute the average temperature:

```
avg_temperature = 1/6 * (
temp_sandiego
+ temp_losangeles
+ temp_sanfrancisco
+ ...)
```

Imagine we had 10 or 100 cities – there must be a better way!

In Python, a list is used to store multiple values within a single value. To create a new list from scratch, we use `[`square brackets`]`.

In [30]:

```
temperature_list = [68, 73, 60, 50, 76, 50]
```

In [31]:

```
len(temperature_list)
```

Out[31]:

6

Notice that the elements in a list don't need to be unique!

To find the average temperature, we just need to divide the **sum of the temperatures** by the **number of temperatures recorded**:

In [32]:

```
temperature_list
```

Out[32]:

[68, 73, 60, 50, 76, 50]

In [33]:

```
sum(temperature_list) / len(temperature_list)
```

Out[33]:

62.833333333333336

The `type` of a list is... `list`.

In [34]:

```
temperature_list
```

Out[34]:

[68, 73, 60, 50, 76, 50]

In [35]:

```
type(temperature_list)
```

Out[35]:

list

Within a list, you can store elements of different types.

In [36]:

```
mixed_list = [-2, 2.5, 'ucsd', [1, 3]]
mixed_list
```

Out[36]:

[-2, 2.5, 'ucsd', [1, 3]]

- Lists are **very slow**.
    - This is not a big deal when there aren't many entries, but it's a big problem when there are millions or billions of entries.

- NumPy (pronounced "num pie") is a Python library (module) that provides support for **arrays** and operations on them.
- The `babypandas` library, which you will learn about next week, goes hand-in-hand with NumPy.
    - NumPy is used heavily in the real world.
- To use `numpy`, we need to import it. It's usually imported as `np` (but doesn't have to be!)

In [37]:

```
import numpy as np
```

Think of NumPy arrays (just "arrays" from now on) as fancy, faster lists.

To create an array, we pass a list as input to the `np.array` function.

In [38]:

```
np.array([4, 9, 1, 2])
```

Out[38]:

array([4, 9, 1, 2])

In [39]:

```
temperature_array = np.array([68, 73, 60, 50, 76, 50])
temperature_array
```

Out[39]:

array([68, 73, 60, 50, 76, 50])

In [40]:

```
temperature_list
```

Out[40]:

[68, 73, 60, 50, 76, 50]

In [41]:

```
# No square brackets, because temperature_list is already a list!
np.array(temperature_list)
```

Out[41]:

array([68, 73, 60, 50, 76, 50])
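
To get a rough feel for the claim that arrays are faster than lists, one can time the same operation on both. The sketch below uses the standard-library `timeit` module and a list comprehension, neither of which is covered in this lecture; the sizes and names are arbitrary choices, and the exact timings will vary from machine to machine.

```python
import timeit
import numpy as np

numbers_list = list(range(200_000))
numbers_array = np.array(numbers_list)

# Time doubling every element, repeated 10 times for each approach.
list_seconds = timeit.timeit(lambda: [n * 2 for n in numbers_list], number=10)
array_seconds = timeit.timeit(lambda: numbers_array * 2, number=10)

print('list :', list_seconds, 'seconds')
print('array:', array_seconds, 'seconds')
```

On typical machines the array version finishes in a small fraction of the list version's time, though no specific ratio should be assumed.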

When people stand in a line, each person has a position.

Similarly, each element of an array (and list) has a position.

- Python, like most programming languages, is "0-indexed."
    - This means that the position of the first element in an array is 0, not 1.
    - One interpretation is that **an element's position represents the number of elements in front of it**.
- To access the element in array `arr_name` at position `pos`, we use the syntax `arr_name[pos]`.

In [42]:

```
temperature_array
```

Out[42]:

array([68, 73, 60, 50, 76, 50])

In [43]:

```
temperature_array[0]
```

Out[43]:

68

In [44]:

```
temperature_array[1]
```

Out[44]:

73

In [45]:

```
temperature_array[3]
```

Out[45]:

50

In [46]:

```
# Access the last element.
temperature_array[5]
```

Out[46]:

50

In [47]:

```
# Doesn't work! Valid positions are 0 through 5, so this raises an IndexError.
temperature_array[6]
```

In [48]:

```
# If a position is negative, count from the end!
temperature_array[-1]
```

Out[48]:

50
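
Because positions start at 0, the last valid position is always one less than the length. The sketch below ties that to negative positions and shows what happens just past the end; `last_position` is an illustrative name, and `try`/`except` is used only to let the example run to completion.

```python
import numpy as np

temperature_array = np.array([68, 73, 60, 50, 76, 50])

# With 6 elements, the valid positions are 0 through 5.
last_position = len(temperature_array) - 1
print(temperature_array[last_position])  # 50
print(temperature_array[-1])             # 50, the same element
print(temperature_array[-2])             # 76, second from the end

# Position 6 is one past the end, so accessing it raises an IndexError.
try:
    temperature_array[len(temperature_array)]
except IndexError as error:
    print('IndexError:', error)
```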

Earlier in the lecture, we saw that lists can store elements of multiple types.

In [49]:

```
nums_and_strings_lst = ['uc', 'sd', 1961, 3.14]
nums_and_strings_lst
```

Out[49]:

['uc', 'sd', 1961, 3.14]

**This is not true of arrays – all elements in an array must be of the same type.**

In [50]:

```
# All elements are converted to strings!
np.array(nums_and_strings_lst)
```

Out[50]:

array(['uc', 'sd', '1961', '3.14'], dtype='<U32')
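
The same conversion happens, more quietly, when `int`s and `float`s are mixed. This sketch inspects the element type through the array's `dtype` attribute, which this lecture only shows in passing in the output above; the array names are illustrative.

```python
import numpy as np

ints_only = np.array([1, 2, 3])
ints_and_floats = np.array([1, 2.5, 3])
with_strings = np.array(['uc', 1961])

# An array of ints keeps an integer type (the exact size depends on the platform).
print(ints_only.dtype)

# One float is enough to turn every element into a float.
print(ints_and_floats.dtype)    # float64

# One string is enough to turn every element into a string ('U' means text).
print(with_strings.dtype.kind)  # U
```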

Arrays make it easy to perform the same operation on every element. This behavior is formally known as "broadcasting".

In [51]:

```
temperature_array
```

Out[51]:

array([68, 73, 60, 50, 76, 50])

In [52]:

```
# Increase all temperatures by 3 degrees.
temperature_array + 3
```

Out[52]:

array([71, 76, 63, 53, 79, 53])

In [53]:

```
# Halve all temperatures.
temperature_array / 2
```

Out[53]:

array([34. , 36.5, 30. , 25. , 38. , 25. ])

In [54]:

```
# Convert all temperatures to Celsius.
(5 / 9) * (temperature_array - 32)
```

Out[54]:

array([20. , 22.77777778, 15.55555556, 10. , 24.44444444, 10. ])

**Note**: In none of the above cells did we actually modify `temperature_array`! Each of those expressions created a new array.

In [55]:

```
temperature_array
```

Out[55]:

array([68, 73, 60, 50, 76, 50])

To actually change `temperature_array`, we need to reassign it to a new array.

In [56]:

```
temperature_array = (5 / 9) * (temperature_array - 32)
```

In [57]:

```
# Now in Celsius!
temperature_array
```

Out[57]:

array([20. , 22.77777778, 15.55555556, 10. , 24.44444444, 10. ])
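
Lists do not behave this way, which is part of why arrays are more convenient for numerical work. The sketch below contrasts the two; the names `temps_list` and `temps_array` are made up so as not to overwrite the variables used above.

```python
import numpy as np

temps_list = [68, 73, 60]
temps_array = np.array(temps_list)

# Multiplying a list by 2 repeats it; multiplying an array by 2 doubles each element.
doubled_list = temps_list * 2
doubled_array = temps_array * 2
print(doubled_list)   # [68, 73, 60, 68, 73, 60]
print(doubled_array)  # [136 146 120]

# Adding a number to a list is an error, while temps_array + 3 works fine.
try:
    temps_list + 3
except TypeError as error:
    print('TypeError:', error)
```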

- We can apply arithmetic operations to multiple arrays, provided they have the same length.
- The result is computed **element-wise**, which means that the arithmetic operation is applied to one pair of elements from each array at a time.
- For example, `a + b` is an array whose first element is the sum of the first element of `a` and first element of `b`.

In [58]:

```
a = np.array([4, 5, -1])
b = np.array([2, 3, 2])
```

In [59]:

```
a + b
```

Out[59]:

array([6, 8, 1])

In [60]:

```
a / b
```

Out[60]:

array([ 2. , 1.66666667, -0.5 ])

In [61]:

```
a ** 2 + b ** 2
```

Out[61]:

array([20, 34, 5])
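
The "same length" requirement can be seen directly by trying arrays whose lengths differ. In this sketch, `too_short` and `mismatch_failed` are illustrative names, and `try`/`except` is used only so the example runs to completion.

```python
import numpy as np

a = np.array([4, 5, -1])
b = np.array([2, 3, 2])
too_short = np.array([10, 20])

# Same length: each element of a is paired with the matching element of b.
products = a * b
print(products)  # element-wise products: 8, 15, -2

# Lengths 3 and 2 can't be paired up, so NumPy raises a ValueError.
mismatch_failed = False
try:
    a + too_short
except ValueError as error:
    mismatch_failed = True
    print('ValueError:', error)
```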

We decided to make a Series of TikToks called "A Day in the Life of a Data Scientist". The number of views each of these videos received is stored in the array `views` below.

In [62]:

```
views = np.array([158, 352, 195, 1423916, 46])
```

Some questions:

What was our average view count?

In [63]:

```
views
```

Out[63]:

array([ 158, 352, 195, 1423916, 46])

In [64]:

```
sum(views) / len(views)
```

Out[64]:

284933.4

In [65]:

```
# The mean method exists for arrays (but not for lists).
views.mean()
```

Out[65]:

284933.4

How many views did our most and least popular videos receive?

In [66]:

```
views
```

Out[66]:

array([ 158, 352, 195, 1423916, 46])

In [67]:

```
views.max()
```

Out[67]:

1423916

In [68]:

```
views.min()
```

Out[68]:

46

How many views **above average** did each of our videos receive? How many views above average did our most viewed video receive?

In [69]:

```
views
```

Out[69]:

array([ 158, 352, 195, 1423916, 46])

In [70]:

```
views - views.mean()
```

Out[70]:

array([-284775.4, -284581.4, -284738.4, 1138982.6, -284887.4])

In [71]:

```
(views - views.mean()).max()
```

Out[71]:

1138982.6
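
As a sanity check on the calculation above, the largest deviation from the mean should equal the largest view count minus the mean, and the deviations should add up to roughly zero. This sketch verifies both; `deviations` is an illustrative name and `np.isclose` is a NumPy helper not used elsewhere in this lecture.

```python
import numpy as np

views = np.array([158, 352, 195, 1423916, 46])
deviations = views - views.mean()

# Subtracting the same mean from every element doesn't change which video is on top.
print(np.isclose(deviations.max(), views.max() - views.mean()))  # True

# Deviations from the mean cancel out, up to floating-point rounding.
print(abs(deviations.sum()) < 1e-6)  # True
```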

In [72]:

```
views
```

Out[72]:

array([ 158, 352, 195, 1423916, 46])

In [73]:

```
views.max() * 0.03 / 1000
```

Out[73]:

42.717479999999995

We often find ourselves needing to make arrays like this:

In [74]:

```
months_in_year = np.array([
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
])
```

There needs to be an easier way to do this!

- A **range** is an array of evenly spaced numbers. We create ranges using `np.arange`.
- The most general way to create a range is `np.arange(start, end, step)`. This returns an array such that:
    - The first number is `start`. **By default,** `start` is 0.
    - All subsequent numbers are spaced out by `step`, until (but excluding) `end`. **By default,** `step` is 1.

In [75]:

```
# Start at 0, end before 8, step by 1.
# This will be our most common use-case!
np.arange(8)
```

Out[75]:

array([0, 1, 2, 3, 4, 5, 6, 7])

In [76]:

```
# Start at 5, end before 10, step by 1.
np.arange(5, 10)
```

Out[76]:

array([5, 6, 7, 8, 9])

In [77]:

```
# Start at 3, end before 32, step by 5.
np.arange(3, 32, 5)
```

Out[77]:

array([ 3, 8, 13, 18, 23, 28])

In [78]:

```
# Steps can be fractional!
np.arange(-3, 2, 0.5)
```

Out[78]:

array([-3. , -2.5, -2. , -1.5, -1. , -0.5, 0. , 0.5, 1. , 1.5])

In [79]:

```
# If step is negative, we count backwards.
np.arange(1, -10, -3)
```

Out[79]:

array([ 1, -2, -5, -8])
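
Returning to the array of months that motivated ranges, here is a short sketch of how it could be built with `np.arange`. Since `end` is excluded, the second argument has to be 13 to include 12.

```python
import numpy as np

# Start at 1, end before 13, step by 1.
months_in_year = np.arange(1, 13)
print(months_in_year)
print(len(months_in_year))  # 12

# The excluded end also explains why np.arange(3, 32, 5) stops at 28.
every_fifth = np.arange(3, 32, 5)
print(every_fifth.max())    # 28
```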

🎉 Congrats! 🎉 You won the lottery 💰. Here's how your payout works: on the first day of January, you are paid \$0.01. Every day thereafter, your pay doubles, so on the second day you're paid \$0.02, on the third day you're paid \$0.04, on the fourth day you're paid \$0.08, and so on.

January has 31 days.

Write a **one-line expression** that uses the numbers `2` and `31`, along with the function `np.arange` and the method `.sum()`, that computes the total amount **in dollars** you will be paid in January.

In [80]:

```
...
```

Out[80]:

Ellipsis

- Strings are used to store text. Enclose them in single or double quotes.
- Lists and arrays are used to store **sequences**.
    - Arrays are faster and more convenient for numerical operations.
    - You can easily perform numerical operations on all elements of an array and perform operations on multiple arrays.

- Ranges are arrays of equally-spaced numbers.
- Remember to refer to the resources from the start of lecture!

We'll learn about how to use Python to work with real-world tabular data.