28. Relational Databases and SQL
Forthcoming…