| Aspect | Decision Trees | K-Nearest Neighbors (KNN) | Support Vector Machines (SVM) with RBF Kernel |
|---|---|---|---|
| Main hyperparameters | Max depth, min samples split | Number of neighbors (\(k\)) | C (regularization), Gamma (RBF kernel width) |
| Interpretability | | | |
| Handling of Non-linearity | | | |
| Scalability | | | |
| Aspect | Decision Trees | K-Nearest Neighbors (KNN) | Support Vector Machines (SVM) with RBF Kernel |
|---|---|---|---|
| Sensitivity to Outliers | | | |
| Memory Usage | | | |
| Training Time | | | |
| Prediction Time | | | |
| Multiclass support | | | |
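As a quick reference for filling in the tables above, the main hyperparameters map onto scikit-learn arguments as follows. This is a sketch; the values shown are illustrative, not tuned choices.

```python
# Illustrative instantiation of the three model families compared above.
# Hyperparameter values here are examples, not recommendations.
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

models = {
    "decision tree": DecisionTreeClassifier(max_depth=5, min_samples_split=2),
    "knn": KNeighborsClassifier(n_neighbors=5),           # k = 5 neighbors
    "svm rbf": SVC(kernel="rbf", C=1.0, gamma="scale"),   # C and gamma
}
```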
iClicker cloud join link: https://join.iclicker.com/YJHS
Take a guess: In your machine learning project, how much time will you typically spend on data preparation and transformation?
The question is adapted from here.
Select all of the following statements which are TRUE.
When you use `cross_validate` with a pipeline object, it will call `fit` and `transform` on the training fold and only `transform` on the validation fold.

Let’s take a break!
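A minimal sketch of this behavior with synthetic data: because the scaler lives inside the pipeline, each cross-validation split fits the scaler on the training fold only, so no information from the validation fold leaks into preprocessing.

```python
# cross_validate with a pipeline: fit/transform on each training fold,
# transform only on the corresponding validation fold. Data is synthetic.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_validate

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = (X[:, 0] > 0).astype(int)

pipe = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
scores = cross_validate(pipe, X, y, cv=5)
print(scores["test_score"])  # one validation score per fold
```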
You’re trying to find a suitable date based on two features: Age and #FB Friends. Suppose your query point is Age = 30 with 250 FB friends.
| Person | Age | #FB Friends | Euclidean Distance Calculation | Distance |
|---|---|---|---|---|
| A | 25 | 400 | √(5² + 150²) | 150.08 |
| B | 27 | 300 | √(3² + 50²) | 50.09 |
| C | 30 | 500 | √(0² + 250²) | 250.00 |
| D | 60 | 250 | √(30² + 0²) | 30.00 |
Based on the distances, the two nearest neighbors (2-NN) are D (distance 30.00) and B (distance 50.09).
What’s the problem here? The feature with the larger range (#FB Friends) dominates the Euclidean distance, so Age barely affects which neighbors are chosen.
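The distances in the table can be reproduced directly, which makes the scale problem concrete: D wins purely on friend count, even though D is 30 years older than the query.

```python
# Reproducing the distance table: with raw features, #FB Friends
# dominates the Euclidean distance and Age is effectively ignored.
import numpy as np

query = np.array([30, 250])      # (Age, #FB Friends), as in the table
people = np.array([[25, 400],    # A
                   [27, 300],    # B
                   [30, 500],    # C
                   [60, 250]])   # D

dists = np.linalg.norm(people - query, axis=1)
print(dists.round(2))  # A=150.08, B=50.09, C=250.0, D=30.0 -> D is "closest"
```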
Imputation: fill in missing data using a chosen strategy (e.g., the column mean).
Analogy: filling in missing values is like filling empty seats in a classroom with the average student.
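A minimal sketch of mean imputation with scikit-learn’s `SimpleImputer`, on a tiny made-up array:

```python
# SimpleImputer with strategy="mean": each NaN is replaced by the
# mean of its column (the "average student" for that seat).
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [3.0, np.nan]])

imputer = SimpleImputer(strategy="mean")
X_filled = imputer.fit_transform(X)
print(X_filled)  # column means are 2.0 and 3.0, so NaNs become 2.0 and 3.0
```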
Scaling: ensure all features have a comparable range.
Analogy: rescaling everyone’s height to make basketball players and gymnasts comparable.
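A minimal sketch with `StandardScaler`: after fitting, each feature has mean 0 and standard deviation 1, so no single feature dominates the distance. The height/weight numbers are made up for illustration.

```python
# StandardScaler: subtract each column's mean, divide by its std,
# so all features end up on a comparable scale.
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[150.0, 45.0],    # gymnast-ish
              [170.0, 70.0],
              [210.0, 110.0]])  # basketball-player-ish

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
print(X_scaled.mean(axis=0))  # each column's mean is now ~0
print(X_scaled.std(axis=0))   # each column's std is now 1
```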
One-hot encoding: convert categorical features into binary columns.
Turn “Apple, Banana, Orange” into binary columns:
| Fruit | 🍎 | 🍌 | 🍊 |
|---|---|---|---|
| Apple 🍎 | 1 | 0 | 0 |
| Banana 🍌 | 0 | 1 | 0 |
| Orange 🍊 | 0 | 0 | 1 |
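The fruit table can be reproduced with scikit-learn’s `OneHotEncoder` (the encoder class is my addition here; the slides only show the resulting table). Categories are sorted alphabetically, which happens to match the column order above.

```python
# OneHotEncoder: one binary column per category.
import numpy as np
from sklearn.preprocessing import OneHotEncoder

fruits = np.array([["Apple"], ["Banana"], ["Orange"]])
encoder = OneHotEncoder()
onehot = encoder.fit_transform(fruits).toarray()  # output is sparse by default
print(onehot)
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]
```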
Ordinal encoding: convert categories into integer values that have a meaningful order.
Turn “Poor, Average, Good” into 1, 2, 3:
| Rating | Ordinal |
|---|---|
| Poor | 1 |
| Average | 2 |
| Good | 3 |
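A sketch with scikit-learn’s `OrdinalEncoder` (again, the class name is my addition). Note that scikit-learn encodes from 0, so “Poor, Average, Good” become 0, 1, 2 rather than the 1, 2, 3 shown in the table; the order is what matters.

```python
# OrdinalEncoder with an explicit category order, so the integer codes
# respect Poor < Average < Good rather than alphabetical order.
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

ratings = np.array([["Poor"], ["Average"], ["Good"]])
encoder = OrdinalEncoder(categories=[["Poor", "Average", "Good"]])
codes = encoder.fit_transform(ratings).ravel()
print(codes)  # [0. 1. 2.]
```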
sklearn Transformers vs. Estimators

Transformers have `fit` and `transform` methods.
- `fit(X)`: Learns parameters from the data.
- `transform(X)`: Applies the learned transformation to the data.
- Examples: `SimpleImputer` (fills missing values), `StandardScaler` (standardizes features).

Estimators have `fit` and `predict` methods.
- `fit(X, y)`: Learns from labeled data.
- `predict(X)`: Makes predictions on new data.
- Examples: `DecisionTreeClassifier`, `SVC`, `KNeighborsClassifier`.

sklearn Pipelines

Example: chaining a `StandardScaler` with a `KNeighborsClassifier` model.
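A minimal sketch of that chained pipeline on synthetic data: calling `fit` on the pipeline fits the scaler, transforms the training data, then fits the classifier; calling `predict` transforms new data with the already-fitted scaler before classifying.

```python
# StandardScaler chained with KNeighborsClassifier via make_pipeline.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(80, 2)) * [1, 100]  # second feature on a big scale
y_train = (X_train[:, 0] > 0).astype(int)

pipe = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
pipe.fit(X_train, y_train)        # scaler.fit_transform, then knn.fit
preds = pipe.predict(X_train[:5]) # scaler.transform, then knn.predict
print(preds)
```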