Announcements¶
- Research Assessment 2 happening during lab section today (for those getting the extra credit).
- SETs, End-of-Quarter Survey, and Research Post-Survey due tomorrow at 8am.
- Final exam tomorrow at 8am. You should have received an email with your seat assignment by now. Contact us ASAP if you haven't.
Agenda¶
- Working on personal projects.
- Some parting thoughts.
- More review of old exam problems.
Personal projects¶
Using Jupyter Notebooks after DSC 10¶
- You may be interested in working on data science projects of your own.
- In this video, we show you how to make blank notebooks and upload datasets of your own to DataHub.
- After this quarter, depending on the classes you're enrolled in, you may not have access to DataHub. Eventually, you'll want to install Jupyter Notebooks on your computer.
- Anaconda is a great way to do that, as it also installs many commonly used packages.
- You'll install Jupyter locally in DSC 80 as well, if you take that class.
- You may want to download your work from DataHub so you can refer to it after the course ends (though you can look at it on Gradescope too).
- Remember, all `babypandas` code is regular `pandas` code, too!
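To make that last point concrete, here is a minimal sketch of DSC 10-style code running under plain `pandas`. The dataset and the variable names (`scores`, `high_scorers`, `team_totals`) are made up for illustration; the only assumption is that `pandas` is installed, which it is if you use Anaconda.

```python
import io
import pandas as pd  # in DSC 10, this line was `import babypandas as bpd`

# A tiny made-up dataset, inlined so the example runs without any files.
csv_text = """name,team,points
Ava,Red,12
Ben,Blue,7
Cam,Red,9
Dee,Blue,15
"""

scores = pd.read_csv(io.StringIO(csv_text)).set_index("name")

# The same methods used all quarter: assign, get, boolean queries,
# groupby, and sort_values.
scores = scores.assign(doubled=scores.get("points") * 2)
high_scorers = scores[scores.get("points") >= 9]
team_totals = (
    scores.groupby("team").sum().sort_values(by="points", ascending=False)
)

print(high_scorers)
print(team_totals)
```

For your own projects, you would replace `io.StringIO(csv_text)` with the path to a CSV file you downloaded.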
Finding data¶
These sites allow you to search for datasets (in CSV format) from a variety of different domains. Some may require you to sign up for an account; these are generally reputable sources.
Note that all of these links are also available at rampure.org/find-datasets.
- Data is Plural.
- FiveThirtyEight.
- CORGIS.
- Kaggle Datasets.
- Google’s dataset search.
- DataHub.io.
- Data.world.
- R datasets.
- Wikipedia. (Use this site to extract and download tables as CSVs.)
- Awesome Public Datasets GitHub repo.
- Links to even more sources.
Domain-specific sources of data¶
- Sports: Basketball Reference, Baseball Reference, etc.
- US Government Sources: census.gov, data.gov, data.ca.gov, data.sfgov.org, FBI’s Crime Data Explorer, Centers for Disease Control and Prevention.
- Global Development: data.worldbank.org, databank.worldbank.org, WHO.
- Transportation: New York Taxi trips, Bureau of Transportation Statistics, SFO Air Traffic Statistics.
- Music: Spotify Charts.
- COVID: Johns Hopkins.
- Any Google Forms survey you’ve administered! (Go to the results spreadsheet, then go to “File > Download > Comma-separated values”.)
Tip: if a site only allows you to download a file as an Excel file, not a CSV file, you can download it, open it in a spreadsheet viewer (Excel, Numbers, Google Sheets), and export it to a CSV.
Parting thoughts¶
From Lecture 1: What is "data science"?¶
Data science is about drawing useful conclusions from data using computation. Throughout the quarter, we touched on several aspects of data science:
- In the first 4 weeks, we used Python to explore data.
- Lots of visualization 📈📊 and "data manipulation", using industry-standard tools.
- In the next 4 weeks, we used data to make inferences about a population, given just a sample.
- Relied heavily on simulation, rather than formulas.
- In the last 2 weeks, we used data from the past to predict what may happen in the future.
- A taste of machine learning 🤖.
- In future DSC courses – including DSC 20 and 40A – you'll revisit all three of these aspects of data science.
- But you're already prepared to start doing cool things with data!
(Sam's) Career Advice (aka: Grades Matter Less Than You Think)¶
(Adapted from a talk originally by Kayvon Fatahalian)
Every year I have many students ask me how to stand out (so that they can get an internship / research lab position / job).
- The question underneath the question is: how do I make the most out of undergrad?
- This advice is my response to that question.
Take everything I say with a grain of salt since this is n=1.
What many students believe about doing well in undergrad¶
- Good undergrad student
- Works really hard to get As
- High GPA looks good on resume
- Hand out resume at job fair
- Get interview
- Do well in interview
- Good job!
What actually happens¶
- Good undergrad student
- Works really hard to get As in ALL of their CS/DS classes and takes on a DOUBLE / TRIPLE major.
- (Most waking hours spent on coursework, no sleep.)
- Hand out MANY resumes at job fair with almost 4.0 GPA
- Get 1% interview invitation rate
- Do well in interviews
- Good job (eventually, but man, that was brutal).
My central claim: maximizing grades will not help you make the most of your limited time at UCSD.¶
Imagine you're at a company reviewing resumes. Which resume would you prefer?
| | Resume A | Resume B |
|---|---|---|
| Major | Data Science + Econ double major | Data Science |
| Time to Graduate | 3 years | 4.5 years |
| School | Stanford | UCSD |
| GPA | 4.0 | 3.2 |
| Experience | Class projects, a few club projects listed on resume | Built and deployed an AI tutor for 200 students; first author on a published research paper; has a high-quality personal website describing their projects. |
(I would choose Resume B. Every other smart person I know would do the same.)
A brutally honest window into how my lab reviews applications:¶
- Major and minor: we don't look at it.
- GPA: we don't look at it if it's above 3.0.
- Class projects: skip past these.
The main questions we ask are:
- Did you do at least one interesting thing outside of class?
- Could we imagine ourselves nerding out together about that thing you did? (Basically, is what you did relevant to our research?)
A brutally honest window into cold emailing¶
I (and most other profs I know) get many emails a week that look something like this:
- I saw your website and your work looks really interesting.
- I'm interested in exploring research and want to join your lab.
- Here's my resume.
Consider this email instead (which is a summary of an email I actually got from a student who joined my lab). My reactions are in parentheses right below.
- I really like teaching, I've tutored a bunch of times already.
- (Nice, tutoring means he's actually invested in teaching.)
- I've spent the last year creating math animations for a class in the Math department, and the instructor liked them so much he let me guest lecture in his class.
- (Wow, animations are pretty hard to make, so I wonder how he built them. They must be pretty good if another faculty member was willing to share class time.)
- I know applications for your group are already closed, but I just wanted to chat about what you're working on.
- (Same here, it seems like I would have something to learn from his experiences!)
So here's the main takeaway for cold emailing: give before you take.
There’s only one way to start building this kind of network, which is to act with generosity towards others before you know whether it will be reciprocated. Generosity can be in the form of attention, e.g. being an unusually responsive conversational partner. It can be in the form of time, e.g. going out of your way to meet somebody where it’s convenient for them, or helping them out with small tasks. Or it can be in the form of resources — it’s really hard to go wrong with buying someone lunch.
https://usefulfictions.substack.com/p/how-to-increase-your-surface-area
How I got started in undergrad¶
- Took an intro to CS class at UC Berkeley
- Liked it a lot, stopped going to all of my other classes (that didn't require attendance).
- Spent a LOT of time outside of class learning more CS just for fun.
- Talked with the TAs a bunch. Realized that they were grading assignments by hand (!) which took a long time.
- Built a crappy autograder that only worked for one question. But it worked!
- TAs told professor: "You have to hire this guy, he's literally contributing more to the course than I am."
Okay, so what should I do?¶
- Meet your basic needs: health, food, sleep.
- Find something that 1) you love, 2) you're good at, and 3) helps others. If you don't know what that is yet, then you need to explore more options. Take classes that just seem interesting. Talk to other students doing stuff you think is cool.
- Find a small problem to solve. Solve it as best as you can. It doesn't have to work! E.g. my crappy autograder, which no one ever used.
- Find someone who knows more about the problem than you (e.g. PhD student, TA, prof, etc.) and share about what you worked on.
- Repeat steps 1-3 until you get lucky.
And yes, it takes luck, but this is how you increase your luck.
Want more advice?¶
Watch this video: The actual reason you can't get a job
More review¶
Ask some questions! 🙋‍♀️🙋‍♂️¶
We'll take some time to work on past exam questions. Feel free to ask about specific topics you want more practice with, or specific problems you've tried that you want me to explain.
Thank you!¶
- This course would not have been possible without...
- Graduate TA: Hemanth Bodala.
- Graduate Reader: Minchan Kim.
- Undergraduate tutors: Austin Flippo, Avi Mehta, Bianca Grunbaum, Ella Li, Jeffrey Kang, Kate Feng, Michelle Hong, Raymond Williams, Sofia Tkachenko.
- Learn more about tutoring – it's fun, and you can be a tutor as early as your 3rd quarter at UCSD!
- Keep in touch! dsc10.com/staff
- After grades are released, we'll make a post on Ed where you can ask course staff for advice on courses, data science, and UCSD more generally.