This course introduces descriptive statistics and probability, including measures of location and spread, random variables, distributions, parameters, categorical variables, and uncertainty.
By the end of the course, students are expected to:
- Explain fundamental concepts in probability, including conditional, joint, and marginal distributions.
- Develop a statistical view of data coming from a probability distribution.
- Compute summary statistics, such as expected value and variance, of simple discrete and continuous probability distributions.
- Compare/contrast location summary statistics such as mean/median/mode/quantiles.
- Estimate summary statistics such as mean/median/variance from a plot of a distribution's PDF or CDF.
- Identify common distributions such as Gaussian/Poisson/uniform from a plot of a distribution's PDF/PMF or CDF.
- Match common discrete distributions such as Bernoulli/binomial/multinomial to descriptions.
- Compare/contrast conditional, joint and marginal distributions.
- Explain the notion of "marginalizing out" a random variable.
- Identify independence between random variables from plots/tables of conditional/joint/marginal distributions.
- Connect conditional distributions to the notion of supervised learning.
- Explain the concept of maximum likelihood estimation.
- Identify the units of various quantities such as mean/variance/density for continuous distributions.
- Simulate sample generation from probability distributions, and interpret the results.
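
Several of the objectives above (marginalizing out a random variable, identifying independence, conditioning, and simulation) can be illustrated with a toy example. The following is a minimal sketch, not taken from the course notes: the joint probabilities are made-up numbers, and the helper names (`marginal`, `conditional_y_given_x`, `is_independent`, `simulate`) are hypothetical. It uses only the Python standard library.

```python
import random

# Hypothetical joint PMF of two binary random variables X and Y.
# Keys are (x, y) outcomes; the probabilities are made up and sum to 1.
joint = {
    (0, 0): 0.30, (0, 1): 0.20,
    (1, 0): 0.10, (1, 1): 0.40,
}

def marginal(joint, axis):
    """Marginalize out the other variable by summing the joint PMF over it.

    axis=0 returns the marginal of X, axis=1 the marginal of Y.
    """
    out = {}
    for outcome, p in joint.items():
        out[outcome[axis]] = out.get(outcome[axis], 0.0) + p
    return out

def conditional_y_given_x(joint, x):
    """P(Y = y | X = x) = P(X = x, Y = y) / P(X = x)."""
    p_x = marginal(joint, 0)[x]
    return {y: p / p_x for (xi, y), p in joint.items() if xi == x}

def is_independent(joint, tol=1e-9):
    """X and Y are independent iff the joint PMF equals the product
    of the marginals for every outcome."""
    p_x, p_y = marginal(joint, 0), marginal(joint, 1)
    return all(abs(p - p_x[x] * p_y[y]) <= tol for (x, y), p in joint.items())

def simulate(joint, n, seed=551):
    """Draw n (x, y) pairs from the joint PMF using a seeded generator."""
    rng = random.Random(seed)
    outcomes = list(joint)
    weights = [joint[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=n)

p_x = marginal(joint, 0)                  # marginal distribution of X
p_y = marginal(joint, 1)                  # marginal distribution of Y
p_y_given_x1 = conditional_y_given_x(joint, 1)
independent = is_independent(joint)

# The sample proportion of X = 1 is an empirical estimate of P(X = 1);
# it is also the maximum likelihood estimate of a Bernoulli parameter.
sample = simulate(joint, 10_000)
p_hat = sum(x for x, _ in sample) / len(sample)

print("P(X):", p_x)
print("P(Y):", p_y)
print("P(Y | X = 1):", p_y_given_x1)
print("Independent?", independent)
print("Empirical P(X = 1):", p_hat)
```

In this toy table, the conditional distribution of Y given X = 1 differs from the marginal of Y, which is one way to see that X and Y are not independent. With a large sample, the empirical proportion should land close to the marginal probability it estimates.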
This course occurs during Block 1 in the 2024/25 school year. The course notes can be accessed here.
Lecture Topic/Notes | Required Readings | Optional Readings |
---|---|---|
Depicting Uncertainty | lecture1 notes | Part 1: Core Probability |
Parametric Families | lecture2 notes | Part 2: Random Variables |
Joint Probability | lecture3 notes | Part 3: Probabilistic Models, Chapter 5.1, Covariance and correlation (video), How would you explain covariance ... |
Conditional Probabilities | lecture4 notes | Part 3: Probabilistic Models, Chapter 5.3 |
Continuous Distributions | lecture5 notes | Chapter 4 |
Common Distribution Families and Conditioning | lecture6 notes | Part 2: Random Variables |
Maximum Likelihood Estimation | lecture7 notes | Part 5: Machine Learning, Beyond Multiple Linear Regression, sections 2.1 to 2.4, Chapter 7.1 & 7.2 |
Simulation and Empirical Distributions | lecture8 notes | Chapter 9: Applications to Computing |
Here is a cheat sheet we created to summarize the main formulas and concepts covered in DSCI 551.
This is an assignment-based course. The following deliverables will determine your course grade:
Assessment | Weight |
---|---|
Lab Assignment 1 | 12% |
Lab Assignment 2 | 12% |
Lab Assignment 3 | 12% |
Lab Assignment 4 | 12% |
Quiz 1 | 25% |
Quiz 2 | 25% |
Lecture Attendance (iClicker) | 2% |
LLMs, such as ChatGPT, can be helpful tools when used responsibly. In this course, students are permitted to use these tools to gather more information, review concepts, or brainstorm, and must cite them whenever they are used for an assignment. However, students are not permitted to complete an assignment by copying and pasting AI-generated responses.
Note: Some of these resources cover much more material than DSCI 551.
- Introduction to Probability for Data Science
- Course Reader CS109 Stanford by Chris Piech
- Chapter 3: Probability and Information Theory, from the Deep Learning Book by Goodfellow, I., Bengio, Y., and Courville, A. (2016)
- Probability & Statistics with Applications to Computing by Alex Tsun, Stanford
- JBstatistics (also on YouTube)
- Introduction to Probability, Statistics, and Random Processes
- Harvard STAT 110 course, YouTube videos
- Probability Cheatsheet
- Word problems for conditional probability
See the general MDS policies.
The course builds on materials developed by instructors in previous years.
© 2024 Vincenzo Coia, Mike Gelbart, Aaron Berk, Alexi Rodríguez-Arelis, and Vincent Liu.
Software is licensed under the MIT License; non-software content is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License. See the license file for more information.