- Lab 0 is released and is due **Saturday at 11:59PM**.
  - It contains a video 🎥 towards the end, Navigating DataHub and Jupyter Notebooks. Watching this video should be a worthwhile investment of your time! ⌚

- Fill out the Beginning of Quarter Survey.
- Have a question? Please contact us on EdStem instead of email.

- We're covering **a lot** of content very quickly. If you're overwhelmed, just know that we're here to support you!
  - Office hours and EdStem are your friends!

- Remember to check the Resources tab of the course website for programming resources.

- Strings. 🧶
- Lists.
- Arrays.
- Ranges.

- A string is a snippet of text of any length.
- In Python, strings are enclosed by either single quotes or double quotes.

In [1]:

```
'woof'
```

Out[1]:

'woof'

In [2]:

```
type('woof')
```

Out[2]:

str

In [3]:

```
"woof"
```

Out[3]:

'woof'

In [4]:

```
# A string, not an int!
"1998"
```

Out[4]:

'1998'

When using the `+` symbol between two strings, the operation is called "concatenation".

In [5]:

```
s1 = 'baby'
s2 = '🐼'
```

In [6]:

```
s1 + s2
```

Out[6]:

'baby🐼'

In [7]:

```
s1 + ' ' + s2
```

Out[7]:

'baby 🐼'

In [8]:

```
s2 * 3
```

Out[8]:

'🐼🐼🐼'

- Strings are associated with certain functions called **string methods**.
- Access string methods with a `.` after the string ("dot notation").
  - For instance, to use the `upper` method on string `s`, we write `s.upper()`.
- Examples include `upper`, `title`, and `replace`.

In [9]:

```
my_cool_string = 'data science is super cool!'
```

In [10]:

```
my_cool_string.title()
```

Out[10]:

'Data Science Is Super Cool!'

In [11]:

```
my_cool_string.upper()
```

Out[11]:

'DATA SCIENCE IS SUPER COOL!'

In [12]:

```
my_cool_string.replace('super cool', '💯' * 3)
```

Out[12]:

'data science is 💯💯💯!'

In [13]:

```
# len is not a method, since it doesn't use dot notation
len(my_cool_string)
```

Out[13]:

27
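One detail worth making explicit: string methods return a new string rather than changing the original. A minimal sketch, where `greeting` and `shouted` are hypothetical names chosen only for this illustration:

```python
# Hypothetical variable names, used only for this illustration.
greeting = 'hello world'

# upper() returns a new string; it does not change greeting.
shouted = greeting.upper()
print(shouted)   # HELLO WORLD
print(greeting)  # hello world

# To keep the result of a method, assign it to a name.
greeting = greeting.title()
print(greeting)  # Hello World
```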

Single quotes and double quotes are usually interchangeable, except when the string itself contains a single or double quote.

In [14]:

```
'my string's full of apostrophes!'
```

In [15]:

```
"my string's full of apostrophes!"
```

Out[15]:

"my string's full of apostrophes!"

In [16]:

```
# escape the apostrophe with a backslash!
'my string\'s "full" of apostrophes!'
```

Out[16]:

'my string\'s "full" of apostrophes!'

In [17]:

```
print('my string\'s "full" of apostrophes!')
```

my string's "full" of apostrophes!

`print`

- By default, Jupyter notebooks display the "raw" value of the expression on the last line of a cell.
- The `print` function displays the value in human-readable text when it's evaluated.

In [18]:

```
12 # 12 won't be displayed, since Python only shows the value of the last expression
23
```

Out[18]:

23

In [19]:

```
# Note, there is no Out[number] to the left! That only appears when displaying a non-printed value.
# But both 12 and 23 are displayed.
print(12)
print(23)
```

12
23

In [20]:

```
# '\n' inserts a new line
my_newline_str = 'here is a string with two lines.\nhere is the second line'
my_newline_str
```

Out[20]:

'here is a string with two lines.\nhere is the second line'

In [21]:

```
# The quotes disappeared and the newline is rendered!
print(my_newline_str)
```

here is a string with two lines.
here is the second line

- Any value can be converted to a string using `str`.
- Some strings can be converted to `int` and `float`.

In [22]:

```
str(3)
```

Out[22]:

'3'

In [23]:

```
float('3')
```

Out[23]:

3.0

In [24]:

```
int('4')
```

Out[24]:

4

In [25]:

```
int('baby panda')
```
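The last cell fails because `'baby panda'` does not look like a number. The sketch below assumes a hypothetical helper named `try_convert` and uses `try`/`except` (not covered in this lecture) so that a failed conversion is reported instead of stopping the cell:

```python
# try_convert is a hypothetical helper written only for this illustration.
def try_convert(converter, text):
    """Return converter(text), or a short message if the conversion fails."""
    try:
        return converter(text)
    except ValueError:
        return 'ValueError'

print(try_convert(int, '4'))           # 4
print(try_convert(float, '3'))         # 3.0
print(try_convert(int, 'baby panda'))  # ValueError
```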

Assume you have run the following statements:

```
x = 3
y = '4'
z = '5.6'
```

Choose the expression that will be evaluated **without** an error.

A. `x + y`

B. `x + int(y + z)`

C. `str(x) + int(y)`

D. `str(x) + z`

E. All of them have errors

In [ ]:

```
```

How would we store the temperatures for each of the first 6 days in the month of September?

Our best solution right now is to create a separate variable for each day.

In [26]:

```
temperature_on_sept_01 = 84
temperature_on_sept_02 = 78
temperature_on_sept_03 = 81
temperature_on_sept_04 = 75
temperature_on_sept_05 = 79
temperature_on_sept_06 = 75
```

This *technically* allows us to do things like compute the average temperature through the first 6 days:

```
avg_temperature = 1/6 * (
temperature_on_sept_01
+ temperature_on_sept_02
+ temperature_on_sept_03
+ ...)
```

Imagine a whole month's data, or a whole year's data. It seems like we need a better solution.

In Python, a list is used to store multiple values in a single value/variable. To create a new list from scratch, we use `[`square brackets`]`.

In [27]:

```
temperature_list = [84, 78, 81, 75, 79, 75]
```

In [28]:

```
len(temperature_list)
```

Out[28]:

6

Notice that the elements in a list don't need to be unique!

To find the average temperature, we just need to divide the **sum of the temperatures** by the **number of temperatures recorded**:

In [29]:

```
temperature_list
```

Out[29]:

[84, 78, 81, 75, 79, 75]

In [30]:

```
sum(temperature_list) / len(temperature_list)
```

Out[30]:

78.66666666666667

The `type` of a list is... `list`.

In [31]:

```
temperature_list
```

Out[31]:

[84, 78, 81, 75, 79, 75]

In [32]:

```
type(temperature_list)
```

Out[32]:

list

Within a list, you can store elements of different types.

In [33]:

```
mixed_list = [-2, 2.5, 'ucsd', [1, 3]]
mixed_list
```

Out[33]:

[-2, 2.5, 'ucsd', [1, 3]]

- Lists are **very slow**.
- This is not a big deal when there aren't many entries, but it's a big problem when there are millions or billions of entries.

- NumPy (pronounced "num pie") is a Python library (module) that provides support for **arrays** and operations on them.
- The `babypandas` library, which you will learn about next week, goes hand-in-hand with NumPy.
  - NumPy is used heavily in the real world.
- To use `numpy`, we need to import it. It's usually imported as `np` (but doesn't have to be!).

In [34]:

```
import numpy as np
```

Think of NumPy arrays (just "arrays" from now on) as fancy, faster lists.

To create an array, we pass a list as input to the `np.array` function.

In [35]:

```
np.array([4, 9, 1, 2])
```

Out[35]:

array([4, 9, 1, 2])

In [36]:

```
temperature_array = np.array([84, 78, 81, 75, 79, 75])
temperature_array
```

Out[36]:

array([84, 78, 81, 75, 79, 75])

In [37]:

```
temperature_list
```

Out[37]:

[84, 78, 81, 75, 79, 75]

In [38]:

```
# No square brackets, because temperature_list is already a list!
np.array(temperature_list)
```

Out[38]:

array([84, 78, 81, 75, 79, 75])
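The claim that arrays are faster than lists can be checked with a rough timing sketch. It uses the standard-library `timeit` module and hypothetical names (`numbers_list`, `numbers_array`); the list version relies on a list comprehension, which this lecture does not cover. Exact timings depend on the machine, so none are shown here.

```python
import timeit
import numpy as np

# Hypothetical data: the same 100,000 numbers stored as a list and as an array.
numbers_list = list(range(100_000))
numbers_array = np.array(numbers_list)

# Add 1 to every element, first with a list, then with an array.
list_result = [x + 1 for x in numbers_list]
array_result = numbers_array + 1

# Time 20 repetitions of each approach (results vary by machine).
list_seconds = timeit.timeit(lambda: [x + 1 for x in numbers_list], number=20)
array_seconds = timeit.timeit(lambda: numbers_array + 1, number=20)

print('list: ', list_seconds, 'seconds')
print('array:', array_seconds, 'seconds')
```

Both approaches produce the same numbers; only the time taken differs.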

When people stand in a line, each person has a position.

Similarly, each element of an array (and list) has a position.

- Python, like most programming languages, is "0-indexed."
- This means that the position of the first element in an array is 0, not 1.
- One reason: an element's position represents the number of elements in front of it.

- To access the element in array `arr_name` at position `pos`, we use the syntax `arr_name[pos]`.

In [39]:

```
temperature_array
```

Out[39]:

array([84, 78, 81, 75, 79, 75])

In [40]:

```
temperature_array[0]
```

Out[40]:

84

In [41]:

```
temperature_array[1]
```

Out[41]:

78

In [42]:

```
temperature_array[3]
```

Out[42]:

75

In [43]:

```
# Access last element
temperature_array[5]
```

Out[43]:

75

In [44]:

```
temperature_array[6]
```

In [45]:

```
# If a position is negative, count from the end!
temperature_array[-1]
```

Out[45]:

75
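The same positions work for lists. A short sketch using the temperatures from earlier in the lecture (the names `first`, `last`, and `also_last` are introduced here only for illustration):

```python
temperature_list = [84, 78, 81, 75, 79, 75]

# Position 0 is the first element.
first = temperature_list[0]

# The last position is always one less than the length.
last = temperature_list[len(temperature_list) - 1]

# Position -1 is a shortcut for the last element.
also_last = temperature_list[-1]

print(first, last, also_last)  # 84 75 75
```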

Earlier in the lecture, we saw that lists can store elements of multiple types.

In [46]:

```
nums_and_strings_lst = ['uc', 'sd', 1961, 3.14]
nums_and_strings_lst
```

Out[46]:

['uc', 'sd', 1961, 3.14]

**This is not true of arrays – all elements in an array must be of the same type.**

In [47]:

```
# All elements are converted to strings!
np.array(nums_and_strings_lst)
```

Out[47]:

array(['uc', 'sd', '1961', '3.14'], dtype='<U32')

Arrays make it easy to perform the same operation on every element. This behavior is formally known as "broadcasting".

In [48]:

```
temperature_array
```

Out[48]:

array([84, 78, 81, 75, 79, 75])

In [49]:

```
# Increase all temperatures by 3 degrees
temperature_array + 3
```

Out[49]:

array([87, 81, 84, 78, 82, 78])

In [50]:

```
# Halve all temperatures
temperature_array / 2
```

Out[50]:

array([42. , 39. , 40.5, 37.5, 39.5, 37.5])

In [51]:

```
# Convert all temperatures to Celsius
(5 / 9) * (temperature_array - 32)
```

Out[51]:

array([28.88888889, 25.55555556, 27.22222222, 23.88888889, 26.11111111, 23.88888889])

**Note:** In none of the above cells did we actually modify `temperature_array`! Each of those expressions created a new array.

In [52]:

```
temperature_array
```

Out[52]:

array([84, 78, 81, 75, 79, 75])

To actually change `temperature_array`, we need to reassign it to a new array.

In [53]:

```
temperature_array = (5 / 9) * (temperature_array - 32)
```

In [54]:

```
# Now in Celsius!
temperature_array
```

Out[54]:

array([28.88888889, 25.55555556, 27.22222222, 23.88888889, 26.11111111, 23.88888889])
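For contrast, lists do not support this kind of arithmetic. A minimal sketch, using hypothetical names `temps_list` and `temps_array` so that `temperature_array` above is left untouched:

```python
import numpy as np

temps_list = [84, 78, 81]
temps_array = np.array(temps_list)

# With an array, + 3 is applied to every element.
print(temps_array + 3)   # [87 81 84]

# With a list, * 2 repeats the list instead of doubling each element.
print(temps_list * 2)    # [84, 78, 81, 84, 78, 81]

# With a list, + 3 is an error: lists can only be added to other lists.
list_plus_number_failed = False
try:
    temps_list + 3
except TypeError:
    list_plus_number_failed = True
print(list_plus_number_failed)  # True
```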

- We can apply arithmetic operations to multiple arrays, provided they have the same length.
- The result is computed **element-wise**, which means that the arithmetic operation is applied to one pair of elements from each array at a time.
- For example, `a + b` is an array whose first element is the sum of the first element of `a` and first element of `b`.

In [55]:

```
a = np.array([1, 2, 3])
b = np.array([-4, 5, 9])
```

In [56]:

```
a + b
```

Out[56]:

array([-3, 7, 12])

In [57]:

```
a / b
```

Out[57]:

array([-0.25 , 0.4 , 0.33333333])

In [58]:

```
a ** 2 + b ** 2
```

Out[58]:

array([17, 29, 90])
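The same-length requirement can be seen by trying two arrays of different lengths. In this sketch, `short` is a hypothetical array introduced only for illustration, and `try`/`except` is used so the error does not stop the cell:

```python
import numpy as np

a = np.array([1, 2, 3])
short = np.array([10, 20])  # hypothetical array with only 2 elements

try:
    a + short
    lengths_matched = True
except ValueError:
    # NumPy cannot pair up 3 elements with 2 elements.
    lengths_matched = False

print(lengths_matched)  # False
```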

Baby Panda made a series of five TikTok videos called "A Day in the Life of a Data Science Mascot". The number of views they've received on these videos is stored in the array `views` below.

In [59]:

```
views = np.array([158, 352, 195, 1423916, 46])
```

Some questions:

What was their average view count?

In [60]:

```
views
```

Out[60]:

array([ 158, 352, 195, 1423916, 46])

In [61]:

```
sum(views) / len(views)
```

Out[61]:

284933.4

In [62]:

```
# The mean method exists for arrays (but not for lists)
views.mean()
```

Out[62]:

284933.4

How many views did their most and least popular videos receive?

In [63]:

```
views
```

Out[63]:

array([ 158, 352, 195, 1423916, 46])

In [64]:

```
views.max()
```

Out[64]:

1423916

In [65]:

```
views.min()
```

Out[65]:

46

How many views **above average** did each of their videos receive? How many views above average did their most viewed video receive?

In [66]:

```
views
```

Out[66]:

array([ 158, 352, 195, 1423916, 46])

In [67]:

```
views - views.mean()
```

Out[67]:

array([-284775.4, -284581.4, -284738.4, 1138982.6, -284887.4])

In [68]:

```
(views - views.mean()).max()
```

Out[68]:

1138982.6

In [69]:

```
views
```

Out[69]:

array([ 158, 352, 195, 1423916, 46])

In [70]:

```
views.max() * 0.03 / 1000
```

Out[70]:

42.717479999999995

We often find ourselves needing to make arrays like this:

In [71]:

```
days_in_september = np.array([
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
23, 24, 25, 26, 27, 28, 29, 30
])
```

There needs to be an easier way to do this!

- A **range** is an array of evenly spaced numbers. We create ranges using `np.arange`.
- The most general way to create a range is `np.arange(start, end, step)`. This returns an array such that:
  - The first number is `start`. **By default,** `start` is 0.
  - All subsequent numbers are spaced out by `step`, until (but excluding) `end`. **By default,** `step` is 1.

In [72]:

```
# Start at 0, end before 8, step by 1
# This will be our most common use-case!
np.arange(8)
```

Out[72]:

array([0, 1, 2, 3, 4, 5, 6, 7])

In [73]:

```
# Start at 5, end before 10, step by 1
np.arange(5, 10)
```

Out[73]:

array([5, 6, 7, 8, 9])

In [74]:

```
# Start at 3, end before 32, step by 5
np.arange(3, 32, 5)
```

Out[74]:

array([ 3, 8, 13, 18, 23, 28])

In [75]:

```
# Steps can be fractional!
np.arange(-3, 2, 0.5)
```

Out[75]:

array([-3. , -2.5, -2. , -1.5, -1. , -0.5, 0. , 0.5, 1. , 1.5])

In [76]:

```
# If step is negative, we count backwards.
np.arange(1, -10, -3)
```

Out[76]:

array([ 1, -2, -5, -8])
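Returning to the motivating example, a range can replace the hand-typed array of days. A minimal sketch; note that the end is 31 because `end` is excluded:

```python
import numpy as np

# Start at 1, end before 31, step by 1
days_in_september = np.arange(1, 31)

print(len(days_in_september))  # 30
print(days_in_september[0])    # 1
print(days_in_september[-1])   # 30
```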

🎉 Congrats! 🎉 You won the lottery 💰. Here's how your payout works: on the first day of September, you are paid \$0.01. Every day thereafter, your pay doubles, so on the second day you're paid \$0.02, on the third day you're paid \$0.04, on the fourth day you're paid \$0.08, and so on.

September has 30 days.

Write a **one-line expression** that uses the numbers `2` and `30`, along with the function `np.arange` and the method `.sum()`, that computes the total amount **in dollars** you will be paid in September.

In [77]:

```
...
```

Out[77]:

Ellipsis

- Strings are used to store text. Enclose them in single or double quotes.
- Lists and arrays are used to store **sequences**.
  - Arrays are faster and more convenient for numerical operations.
  - You can easily perform numerical operations on all elements of an array and perform operations on multiple arrays.

- Ranges are arrays of equally-spaced numbers.
- Remember to refer to the resources from the start of lecture!

We'll learn about how to use Python to work with real-world tabular data.