Three slides left over from last class: C and gamma in SVM RBF

C (Regularization): Controls the trade-off between perfect training accuracy and having a simpler decision boundary.

Gamma (Kernel Width): Controls the influence of individual data points.

Choosing good values of C and gamma is crucial for avoiding overfitting or underfitting.

Aspect | Decision Trees | K-Nearest Neighbors (KNN) | Support Vector Machines (SVM) with RBF Kernel |
---|---|---|---|
Main hyperparameters | Max depth, min samples split | Number of neighbors (\(k\)) | C (regularization), Gamma (RBF kernel width) |
Interpretability | | | |
Handling of Non-linearity | | | |
Scalability | | | |
Aspect | Decision Trees | K-Nearest Neighbors (KNN) | Support Vector Machines (SVM) with RBF Kernel |
---|---|---|---|
Sensitivity to Outliers | | | |
Memory Usage | | | |
Training Time | | | |
Prediction Time | | | |
Multiclass support | | | |
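As a quick sketch of the C and gamma discussion above: the snippet below fits an RBF SVM at a few settings and compares training accuracy. The dataset (`make_moons`) and the specific values of C and gamma are illustrative choices, not from the slides.

```python
# Sketch (illustrative data): how C and gamma change an RBF SVM's fit.
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.3, random_state=42)

for C in [0.1, 100]:
    for gamma in [0.01, 10]:
        model = SVC(kernel="rbf", C=C, gamma=gamma)
        model.fit(X, y)
        # High C and high gamma push toward memorizing the training set
        # (overfitting); low values give smoother, simpler boundaries.
        print(f"C={C:>5}, gamma={gamma:>5}: train accuracy = {model.score(X, y):.2f}")
```

Training accuracy alone can't distinguish a good fit from overfitting, which is why C and gamma are normally tuned with cross-validation.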
iClicker cloud join link: https://join.iclicker.com/VYFJ
Take a guess: In your machine learning project, how much time will you typically spend on data preparation and transformation?
The question is adapted from here.
Select all of the following statements which are TRUE.
After applying `SimpleImputer`, the transformed data has a different shape than the original data.
Select all of the following statements which are TRUE.
Let’s take a break!
You’re trying to find a suitable date based on two features: age and number of Facebook friends. (The distances below are measured from you: age 30, 250 friends.)
Person | Age | #FB Friends | Euclidean Distance Calculation | Distance |
---|---|---|---|---|
A | 25 | 400 | √(5² + 150²) | 150.08 |
B | 27 | 300 | √(3² + 50²) | 50.09 |
C | 30 | 500 | √(0² + 250²) | 250.00 |
D | 60 | 250 | √(30² + 0²) | 30.00 |
Based on the distances, the two nearest neighbors (2-NN) are D (30.00) and B (50.09).
What’s the problem here? The #FB Friends feature spans a much larger range than Age, so it dominates the Euclidean distance.
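The table's distances can be reproduced, and the scale problem made concrete, with a few lines of NumPy. The query point (age 30, 250 friends) is inferred from the table's arithmetic:

```python
# Sketch: reproduce the table's distances, then see how scaling changes them.
# Assumes the query person is (Age=30, Friends=250), consistent with the table.
import numpy as np
from sklearn.preprocessing import StandardScaler

people = np.array([[25, 400],   # A
                   [27, 300],   # B
                   [30, 500],   # C
                   [60, 250]])  # D
query = np.array([30, 250])

raw = np.linalg.norm(people - query, axis=1)
print(np.round(raw, 2))  # A≈150.08, B≈50.09, C=250.0, D=30.0 — friends dominate

# After standardizing both features, age differences count again.
scaler = StandardScaler().fit(people)
scaled = np.linalg.norm(scaler.transform(people) - scaler.transform([query])[0],
                        axis=1)
print(np.round(scaled, 2))
```

With this toy data, scaling flips the ranking: B and A become the two nearest neighbors, while D (the 60-year-old who was "nearest" on raw distances) drops to third.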
Imputation: Fill in missing data using a chosen strategy (e.g., mean, median, most frequent).
Analogy: filling empty seats in a classroom with the average student.
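A minimal sketch of mean imputation with `SimpleImputer` (the toy values are made up):

```python
# Sketch: mean imputation with SimpleImputer on made-up data.
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, 20.0],
              [np.nan, 30.0],
              [3.0, np.nan]])

imputer = SimpleImputer(strategy="mean")
X_filled = imputer.fit_transform(X)
print(X_filled)  # NaNs replaced by column means: 2.0 and 25.0
# Note: the shape is unchanged — (3, 2) in, (3, 2) out.
```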
Scaling: Ensure all features have a comparable range.
Analogy: rescaling everyone’s height to make basketball players and gymnasts comparable.
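A minimal sketch of `StandardScaler`, which rescales each feature to mean 0 and standard deviation 1 (heights are made-up values):

```python
# Sketch: StandardScaler gives each feature mean 0 and standard deviation 1.
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy heights (cm): two gymnasts and two basketball players (made-up values).
heights = np.array([[150.0], [160.0], [190.0], [200.0]])

scaled = StandardScaler().fit_transform(heights)
print(scaled.mean(), scaled.std())  # ≈ 0.0 and 1.0
```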
One-hot encoding: Convert categorical features into binary columns.
Example: turn “Apple, Banana, Orange” into binary columns:
Fruit | 🍎 | 🍌 | 🍊 |
---|---|---|---|
Apple 🍎 | 1 | 0 | 0 |
Banana 🍌 | 0 | 1 | 0 |
Orange 🍊 | 0 | 0 | 1 |
Ordinal encoding: Convert categories into integer values that have a meaningful order.
Example: turn “Poor, Average, Good” into 1, 2, 3:
Rating | Ordinal |
---|---|
Poor | 1 |
Average | 2 |
Good | 3 |
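A minimal sketch with `OrdinalEncoder`, passing the category order explicitly so Poor < Average < Good (note sklearn encodes from 0, not 1 as on the slide):

```python
# Sketch: OrdinalEncoder with an explicit category order (Poor < Average < Good).
from sklearn.preprocessing import OrdinalEncoder

ratings = [["Poor"], ["Average"], ["Good"]]
enc = OrdinalEncoder(categories=[["Poor", "Average", "Good"]])
codes = enc.fit_transform(ratings)
print(codes.ravel())  # [0. 1. 2.] — 0-based, unlike the slide's 1, 2, 3
```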
sklearn Transformers vs Estimators

Transformers have `fit` and `transform` methods.

- `fit(X)`: Learns parameters from the data.
- `transform(X)`: Applies the learned transformation to the data.
- Examples: `SimpleImputer` (fills missing values), `StandardScaler` (standardizes features).

Estimators (models) have `fit` and `predict` methods.

- `fit(X, y)`: Learns from labeled data.
- `predict(X)`: Makes predictions on new data.
- Examples: `DecisionTreeClassifier`, `SVC`, `KNeighborsClassifier`.
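The two patterns side by side, on made-up one-feature data:

```python
# Sketch: fit/transform (transformer) vs fit/predict (estimator) on toy data.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([0, 0, 1, 1])

scaler = StandardScaler()        # transformer: fit + transform
scaler.fit(X)                    # learns the column mean and std
X_scaled = scaler.transform(X)   # applies them

model = KNeighborsClassifier(n_neighbors=1)  # estimator: fit + predict
model.fit(X_scaled, y)                       # learns from labeled data
print(model.predict(scaler.transform([[3.9]])))  # → [1] (nearest point is 4.0)
```

Note that new data must go through the same fitted transformer before `predict` — exactly the bookkeeping that pipelines automate.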
sklearn Pipelines

Example: chaining a `StandardScaler` with a `KNeighborsClassifier` model.
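A minimal sketch of that chain with `make_pipeline`, reusing the dating-example numbers from earlier (the labels are made up for illustration):

```python
# Sketch: StandardScaler + KNeighborsClassifier chained into one Pipeline.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# Age and #FB Friends from the dating example; labels are invented.
X = np.array([[25, 400], [27, 300], [30, 500], [60, 250]], dtype=float)
y = np.array([1, 1, 0, 0])

pipe = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=1))
pipe.fit(X, y)               # fits the scaler, transforms X, then fits KNN
print(pipe.predict([[28, 350]]))  # scaling is applied automatically
```

The pipeline calls `fit`/`transform` on the scaler and `fit`/`predict` on the classifier for you, so new data is always scaled with the parameters learned from the training data.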