Principles of Data Science

DSC 10, Fall 2024 at UC San Diego

Janine Tiefenbruck
she/her

jlobue@ucsd.edu

Lecture(s): (A) MWF 9-9:50AM in Center 109, (B) MWF 10-10:50AM in Center 113, (C) MF 1-1:50PM in Solis 104 (No live lectures at 1PM on Wednesdays)

Tip: When working on assignments, use Ctrl+F on this page to search for a keyword and quickly find the relevant lecture. Click the “✍️ write” button to open a static version of the lecture for reference, which is much faster than loading it on DataHub. Also, make sure to use the reference sheet to quickly look up babypandas methods and see examples of how they work.

Week 0 – Welcome to DSC 10!

Fri Sep 27

LEC 1 Introduction   

CIT 1, BPD 1-3

Keywords: course logistics, syllabus, Little Women demo, Jupyter notebooks, expressions

Sun Sep 29

SUR Welcome Survey

SYL Syllabus Check

PRE Pretest

Week 1 – Python Basics

Mon Sep 30

LEC 2 Variables and Data Types   

BPD 3-5

Keywords: variables, assignment, functions, import, methods, int, float, string

DISC 1 Getting Started with Jupyter Notebooks

Wed Oct 2

LEC 3 Lists and Arrays   

BPD 7-8, CIT 14.1

Keywords: mean, median, lists, arrays, array arithmetic, array methods, np.arange

Thu Oct 3

LAB 0 Expressions and Data Types

Fri Oct 4

LEC 4 DataFrames   

BPD 9

Keywords: read_csv, .get, .assign, .sort_values, .iloc, .loc, .set_index, US states

Week 2 – DataFrames and Visualization

Mon Oct 7

LEC 5 Querying and Grouping   

BPD 10-11

Keywords: Booleans, querying, .shape, &, |, .take, .groupby, aggregation, .drop

DISC 2 Arrays and DataFrames

Wed Oct 9

LEC 6 Data Visualization   

CIT 7.0-7.1

Keywords: numerical vs. categorical, scatter plot, line plot, bar chart, exoplanets

QUIZ 1 Quiz 1 covers Lectures 1-4

Thu Oct 10

LAB 1 Arrays and DataFrames

Fri Oct 11

LEC 7 Distributions and Histograms   

CIT 7.2-7.3

Keywords: distributions, density histograms, binning, total area, overlaid plots

Sun Oct 13

HW 1 Basic Python, Arrays, and DataFrames

Week 3 – Functions and Control Flow

Mon Oct 14

LEC 8 Functions and Applying   

BPD 6, 12

Keywords: functions, arguments, print vs. return, .apply, .reset_index

DISC 3 Querying, Grouping, and Plotting

Wed Oct 16

LEC 9 Grouping on Multiple Columns, Merging

BPD 11, 13

Keywords: .groupby([col_1, col_2, …]), subgroups, MultiIndex, .merge, number of rows

Thu Oct 17

LAB 2 Data Visualizations and Python Functions

Fri Oct 18

LEC 10 Conditional Statements and Iteration

CIT 9.0-9.2

Keywords: in, not, and, or, if, else, elif, for-loops, np.append, accumulator pattern

Sun Oct 20

HW 2 DataFrames, Data Visualization, and Functions

Week 4 – Probability and Simulation

Mon Oct 21

LEC 11 Probability

CIT 9.5

Keywords: event, conditional prob., multiplication and addition rules, independence

DISC 4 Functions, DataFrames, and Control Flow

Wed Oct 23

LEC 12 Simulation

CIT 9.3-9.4

Keywords: np.random.choice, replacement, np.count_nonzero, coin flipping, Monty Hall

QUIZ 2 Quiz 2 covers Lectures 5-10

Thu Oct 24

LAB 3 DataFrames, Control Flow, and Probability

SUR Mid-Quarter Survey

Fri Oct 25

LEC 13 Distributions and Sampling

CIT 10.0-10.4

Keywords: probability vs. empirical distribution, SRS, .sample, parameter, statistic

Sun Oct 27

HW 3 DataFrames, Control Flow, and Probability

Week 5 – Midterm Exam

Mon Oct 28

LEC 14 Midterm Review

DISC 5 Probability and Simulation

Wed Oct 30

EXAM Midterm Exam covers Lectures 1-12

Fri Nov 1

LEC 15 Bootstrapping and Confidence Intervals

CIT 13.0-13.2

Keywords: inference, bootstrapping, resample, np.percentile, confidence interval

Sun Nov 3

PROJ Midterm Project

Week 6 – Confidence Intervals and the Normal Distribution

Mon Nov 4

LEC 16 Confidence Intervals, Center, and Spread

CIT 13.3-13.4

Keywords: interpreting CIs, robust vs. sensitive, center, standard deviation, Chebyshev

DISC 6 Sampling, Bootstrapping, and Confidence Intervals

Wed Nov 6

LEC 17 Standardization and the Normal Distribution

CIT 14.2-14.3

Keywords: Chebyshev, standard units, normal distribution, CDF, inflection points

Thu Nov 7

LAB 4 Simulation, Sampling, & Bootstrapping

Fri Nov 8

LEC 18 The Central Limit Theorem

CIT 14.4-14.5

Keywords: distribution of the sample mean, square root law, CLT-based CIs

Sun Nov 10

HW 4 Simulation, Sampling, Bootstrapping

Week 7 – Central Limit Theorem

Mon Nov 11

No Lecture (Veterans Day 🎖️)

Wed Nov 13

LEC 19 Choosing Sample Sizes, Statistical Models

CIT 14.6, 11.1

Keywords: standard deviation of 0s and 1s, np.random.multinomial, Robert Swain jury

QUIZ 3 Quiz 3 covers Lectures 13, 15-18

Thu Nov 14

LAB 5 Variability and the Normal Distribution

Fri Nov 15

LEC 20 Hypothesis Testing

CIT 11.3

Keywords: null and alternative hypotheses, test statistic, fair or unfair coin

Sun Nov 17

HW 5 The Normal Distribution and the Central Limit Theorem

Week 8 – Hypothesis and Permutation Testing

Mon Nov 18

LEC 21 Hypothesis Testing and Total Variation Distance

CIT 11.2, 11.4

Keywords: fair or unfair coin, p-value, midterm exam scores, Alameda County jury, TVD

DISC 7 The Normal Distribution and the CLT

Wed Nov 20

LEC 22 TVD, Hypothesis Testing, and Permutation Testing

CIT 12.0-12.1

Keywords: confidence intervals for hypothesis testing, body temperature, smoking/babies

Thu Nov 21

LAB 6 Hypothesis Testing

Fri Nov 22

LEC 23 Permutation Testing

CIT 12.3

Keywords: smoking/babies, np.random.permutation, shuffling, Deflategate

Week 9 – Prediction

Mon Nov 25

LEC 24 Correlation

CIT 15.0-15.2

Keywords: association, correlation coefficient (r), predicting heights, regression line in standard units

QUIZ 4 Quiz 4 covers Lectures 19-23

Tue Nov 26

HW 6 Hypothesis Testing and Permutation Testing

Wed Nov 27

LEC 25 Regression and Least Squares

CIT 15.2-15.4

Keywords: regression line in original units, outliers, errors, RMSE, best fit, least squares

DISC 8 Hypothesis Testing and Permutation Testing

Fri Nov 29

No Lecture (Thanksgiving 🦃)

Week 10 – Review

Mon Dec 2

LEC 26 Residuals and Inference

CIT 15.5-16.3

Keywords: residuals, residual plots, patterns, datasaurus dozen, prediction intervals

DISC 9 Regression

Tue Dec 3

PROJ Final Project

Wed Dec 4

LEC 27 Review

Thu Dec 5

LAB 7 Regression

Fri Dec 6

LEC 28 Review, Conclusion

Sun Dec 8

EXAM Final Exam (11:30AM-2:30PM)

SUR SETs and End-of-Quarter Survey (due 8AM)