Introducing BrightSpace!
HW solutions will be posted on BrightSpace
HW late tokens will be tracked on BrightSpace
Tutorial solutions will be posted on BrightSpace
Tutorial attendance will be shown on BrightSpace
GitHub lecture demos, course videos, course notes are now released for the whole term so you can look at things in advance!
Note: slides will have some slight changes (mostly announcements).
| Model | Parameters and hyperparameters | Strengths | Weaknesses |
|---|---|---|---|
| Decision Trees | | | |
| KNNs | | | |
| SVM RBF | | | |

| Aspect | Decision Trees | K-Nearest Neighbors (KNN) | Support Vector Machines (SVM) with RBF Kernel |
|---|---|---|---|
| Sensitivity to Outliers | | | |
| Memory Usage | | | |
| Training Time | | | |
| Prediction Time | | | |
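You’re trying to find a suitable date based on age and number of Facebook friends. The distances below are measured from your own profile, which the calculations imply is age 30 with 250 Facebook friends.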
| Person | Age | #FB Friends | Euclidean Distance Calculation | Distance |
|---|---|---|---|---|
| A | 25 | 400 | \(\sqrt{5^2 + 150^2}\) | 150.08 |
| B | 27 | 300 | \(\sqrt{3^2 + 50^2}\) | 50.09 |
| C | 30 | 500 | \(\sqrt{0^2 + 250^2}\) | 250.00 |
| D | 60 | 250 | \(\sqrt{30^2 + 0^2}\) | 30.00 |
Based on the distances, the two nearest neighbors (2-NN) are D (distance 30.00) and B (distance 50.09).
What’s the problem here?
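The distance column can be reproduced with a few lines of Python. This is only a sketch: it assumes the query person is 30 years old with 250 friends (which is what the differences in the table imply), and the names `query`, `people`, and `euclidean` are made up for illustration.

```python
import math

# Assumed query point, inferred from the differences in the table: age 30, 250 friends.
query = (30, 250)

# (age, number of Facebook friends) for each candidate in the table.
people = {"A": (25, 400), "B": (27, 300), "C": (30, 500), "D": (60, 250)}


def euclidean(p, q):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


distances = {name: euclidean(point, query) for name, point in people.items()}

# Sort candidate names by distance and keep the two closest.
nearest_two = sorted(distances, key=distances.get)[:2]

for name, dist in distances.items():
    print(f"{name}: {dist:.2f}")
print("2-NN:", nearest_two)
```

Notice that the friend counts differ by hundreds while the ages differ by at most 30, so the friends feature contributes almost all of each distance.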
Participate using Agora (code: canvas)
Take a guess: In your machine learning project, how much time will you typically spend on data preparation and transformation?
The question is adapted from here.
Participate using Agora (code: canvas)
Select all of the following statements which are TRUE.
StandardScaler ensures a fixed range (i.e., minimum and maximum values) for the features.
StandardScaler calculates mean and standard deviation for each feature separately.
After applying SimpleImputer, the transformed data has a different shape than the original data.

Participate using Agora (code: canvas)
Select all of the following statements which are TRUE.
Once you have a scikit-learn pipeline object with an estimator as the last step, you can call fit, predict, and score on it.
[...] scikit-learn pipeline.

Imputation: fill in missing data using a chosen strategy, such as the mean, median, or most frequent value.
Imputation is like filling in your average or median or most frequent grade for an assessment you missed.
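The grade analogy can be sketched with `SimpleImputer`. The grades below are invented for illustration, and `strategy="mean"` is just one choice; `"median"` and `"most_frequent"` are also accepted.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical grades for two assessments; np.nan marks a missed assessment.
grades = np.array(
    [
        [80.0, 90.0],
        [np.nan, 70.0],
        [60.0, np.nan],
    ]
)

# fit learns the per-column mean; transform fills each gap with that mean.
imputer = SimpleImputer(strategy="mean")
filled = imputer.fit_transform(grades)

print(imputer.statistics_)  # the learned column means
print(filled)
```

The output has the same shape as the input: imputation replaces missing entries but does not add or drop columns here.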
Ensure all features have a comparable range.
Scaling is like adjusting the number of everyone’s Facebook friends so that both the number of friends and their age are on a comparable scale. This way, one feature doesn’t dominate the other when making comparisons.
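To see the effect on the dating example, here is a sketch that standardizes both features with `StandardScaler` before measuring distances. It assumes the query person is 30 years old with 250 friends (inferred from the distance table), and fitting the scaler on the four candidates only is a simplification chosen for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

names = ["A", "B", "C", "D"]
# Columns: age, number of Facebook friends.
people = np.array([[25, 400], [27, 300], [30, 500], [60, 250]], dtype=float)
# Assumed query point: age 30, 250 friends.
query = np.array([[30, 250]], dtype=float)

# fit learns each column's mean and standard deviation from the candidates;
# transform then applies the same shift and scale to any data, including the query.
scaler = StandardScaler()
people_scaled = scaler.fit_transform(people)
query_scaled = scaler.transform(query)

# Euclidean distance from the query to each candidate in the scaled space.
distances = np.linalg.norm(people_scaled - query_scaled, axis=1)
nearest_two = [names[i] for i in np.argsort(distances)[:2]]

print(dict(zip(names, distances.round(2))))
print("2-NN after scaling:", nearest_two)
```

After scaling, the 30-year age gap to D counts for as much as a large gap in friend counts, so D is no longer among the two nearest neighbors.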
One-hot encoding: convert categorical features into binary columns, one column per category.
Turn “Apple, Banana, Orange” into binary columns:
| Fruit | 🍎 | 🍌 | 🍊 |
|---|---|---|---|
| Apple 🍎 | 1 | 0 | 0 |
| Banana 🍌 | 0 | 1 | 0 |
| Orange 🍊 | 0 | 0 | 1 |
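The fruit table can be reproduced with `OneHotEncoder`. This is a minimal sketch; the single-column `fruits` array is made up to match the table, and `.toarray()` is used because the encoder returns a sparse matrix by default.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# One categorical feature with three distinct values (one row per fruit).
fruits = np.array([["Apple"], ["Banana"], ["Orange"]])

# fit discovers the categories (sorted alphabetically);
# transform emits one binary column per category.
encoder = OneHotEncoder()
one_hot = encoder.fit_transform(fruits).toarray()

print(encoder.categories_[0])
print(one_hot)
```

Note that the transformed data has more columns than the original: one input column became three.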
Ordinal encoding: convert categories into integer values that have a meaningful order.
Turn “Poor, Average, Good” into 1, 2, 3:
| Rating | Ordinal |
|---|---|
| Poor | 1 |
| Average | 2 |
| Good | 3 |
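A sketch of the same mapping with `OrdinalEncoder`. The category order is passed explicitly because the default ordering is alphabetical, which would not match Poor < Average < Good. scikit-learn numbers categories from 0, so 1 is added here purely to match the table; the `ratings` data is invented for illustration.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical ratings column.
ratings = np.array([["Good"], ["Poor"], ["Average"], ["Good"]])

# Spell out the order so that Poor < Average < Good.
encoder = OrdinalEncoder(categories=[["Poor", "Average", "Good"]])

# The encoder yields 0, 1, 2; adding 1 gives the 1, 2, 3 used in the table.
codes = encoder.fit_transform(ratings) + 1

print(codes.ravel())
```

In a real model the 0-based codes are usually fine as they are, since only the order matters.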
sklearn Transformers vs Estimators
Transformers have fit and transform methods.
fit(X): Learns parameters from the data.
transform(X): Applies the learned transformation to the data.
Imputation (SimpleImputer): Fills missing values.
Scaling (StandardScaler): Standardizes features.
Estimators have fit and predict methods.
fit(X, y): Learns from labeled data.
predict(X): Makes predictions on new data.
Examples: DecisionTreeClassifier, SVC, KNeighborsClassifier

sklearn Pipelines
Example: chaining a StandardScaler with a KNeighborsClassifier model.
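Chaining a `StandardScaler` with a `KNeighborsClassifier` might look like the sketch below. The iris dataset, the 75/25 split, `random_state=123`, and `n_neighbors=5` are arbitrary choices made for illustration, not settings from the lecture.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Small built-in dataset, used here only as a stand-in.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=123
)

# Transformer first, estimator last.
pipe = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

# fit: the scaler learns mean/std on the training data only,
# then the classifier is trained on the scaled training data.
pipe.fit(X_train, y_train)

# predict/score: the test data goes through the already fitted scaler
# before it reaches the classifier.
predictions = pipe.predict(X_test)
accuracy = pipe.score(X_test, y_test)

print(list(pipe.named_steps))
print(f"Test accuracy: {accuracy:.2f}")
```

Because the scaler sits inside the pipeline, it is never fitted on the test data, which avoids leaking test-set statistics into training.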
Let’s take a break!