| Model | Parameters and hyperparameters | Strengths | Weaknesses |
|---|---|---|---|
| Decision Trees | |||
| KNNs | |||
| SVM RBF ||||
| Aspect | Decision Trees | K-Nearest Neighbors (KNN) | Support Vector Machines (SVM) with RBF Kernel |
|---|---|---|---|
| Sensitivity to Outliers | |||
| Memory Usage | |||
| Training Time | |||
| Prediction Time ||||
You’re trying to find a suitable date based on age and number of Facebook friends. Suppose your own profile is age 30 with 250 FB friends (the reference point used in the calculations below):
| Person | Age | #FB Friends | Euclidean Distance Calculation | Distance |
|---|---|---|---|---|
| A | 25 | 400 | \(\sqrt{5^2 + 150^2}\) | 150.08 |
| B | 27 | 300 | \(\sqrt{3^2 + 50^2}\) | 50.09 |
| C | 30 | 500 | \(\sqrt{0^2 + 250^2}\) | 250.00 |
| D | 60 | 250 | \(\sqrt{30^2 + 0^2}\) | 30.00 |
Based on the distances, the two nearest neighbors (2-NN) are D (30.00) and B (50.09).
What’s the problem here?
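The distances in the table can be reproduced with a short NumPy sketch (assuming the query profile of age 30 with 250 FB friends, which matches the calculations shown):

```python
import numpy as np

# Hypothetical query profile inferred from the table: age 30, 250 FB friends.
query = np.array([30.0, 250.0])
people = {
    "A": np.array([25.0, 400.0]),
    "B": np.array([27.0, 300.0]),
    "C": np.array([30.0, 500.0]),
    "D": np.array([60.0, 250.0]),
}

distances = {name: float(np.linalg.norm(p - query)) for name, p in people.items()}

# D (30.00) ranks "nearest" despite a 30-year age gap, because the
# #FB Friends feature dominates the unscaled Euclidean distance.
ranked = sorted(distances, key=distances.get)
print(ranked)  # ['D', 'B', 'A', 'C']
```

The ranking makes the scale problem concrete: a difference of 30 years counts exactly the same as a difference of 30 friends.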
Participate using Agora (code: agentic)
Take a guess: In your machine learning project, how much time will you typically spend on data preparation and transformation?
The question is adapted from here.
Select all of the following statements which are TRUE.
- `StandardScaler` ensures a fixed range (i.e., minimum and maximum values) for the features.
- `StandardScaler` calculates the mean and standard deviation for each feature separately.
- The data transformed by `SimpleImputer` has a different shape than the original data.
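These statements can be checked empirically with a small sketch (the data values below are illustrative, reusing the age/friends example):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[25.0, 400.0],
              [27.0, 300.0],
              [30.0, 500.0],
              [60.0, 250.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Mean and standard deviation are computed per feature (column), not globally.
print(scaler.mean_)               # one mean per feature
print(X_scaled.shape == X.shape)  # True: the shape is unchanged
# Unlike MinMaxScaler, StandardScaler does NOT guarantee a fixed [min, max]
# range; it only guarantees mean 0 and standard deviation 1 per feature.
```

Running this shows why only the "mean and standard deviation per feature" statement is true.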
Select all of the following statements which are TRUE.
- Given a `scikit-learn` pipeline object with an estimator as the last step, you can call `fit`, `predict`, and `score` on it.
- … `scikit-learn` pipeline.

Fill in missing data using a chosen strategy:
Imputation is like filling in your average or median or most frequent grade for an assessment you missed.
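The grade analogy above can be sketched with `SimpleImputer` (the grade values are illustrative):

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Illustrative grades with one missing assessment (np.nan).
grades = np.array([[80.0], [90.0], [np.nan], [70.0]])

imputer = SimpleImputer(strategy="mean")  # also: "median", "most_frequent"
filled = imputer.fit_transform(grades)
print(filled.ravel())  # the nan is replaced by the column mean, 80.0
```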
Ensure all features have a comparable range.
Scaling is like adjusting the number of everyone’s Facebook friends so that both the number of friends and their age are on a comparable scale. This way, one feature doesn’t dominate the other when making comparisons.
Convert categorical features into binary columns.
Turn “Apple, Banana, Orange” into binary columns:
| Fruit | 🍎 | 🍌 | 🍊 |
|---|---|---|---|
| Apple 🍎 | 1 | 0 | 0 |
| Banana 🍌 | 0 | 1 | 0 |
| Orange 🍊 | 0 | 0 | 1 |
Convert categories into integer values that have a meaningful order.
Turn “Poor, Average, Good” into 1, 2, 3:
| Rating | Ordinal |
|---|---|
| Poor | 1 |
| Average | 2 |
| Good | 3 |
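A sketch of the same encoding with `OrdinalEncoder` (note: sklearn's integer codes start at 0, so "Poor, Average, Good" maps to 0, 1, 2 rather than the 1, 2, 3 shown in the table):

```python
from sklearn.preprocessing import OrdinalEncoder

ratings = [["Poor"], ["Average"], ["Good"]]

# Pass the categories explicitly so the integer codes respect the intended
# order, rather than sklearn's default alphabetical ordering.
enc = OrdinalEncoder(categories=[["Poor", "Average", "Good"]])
codes = enc.fit_transform(ratings)
print(codes.ravel())  # [0. 1. 2.]
```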
sklearn Transformers vs Estimators

Transformers have `fit` and `transform` methods.

- `fit(X)`: Learns parameters from the data.
- `transform(X)`: Applies the learned transformation to the data.
- Examples: `SimpleImputer` (fills missing values), `StandardScaler` (standardizes features).

Estimators have `fit` and `predict` methods.

- `fit(X, y)`: Learns from labeled data.
- `predict(X)`: Makes predictions on new data.
- Examples: `DecisionTreeClassifier`, `SVC`, `KNeighborsClassifier`

sklearn Pipelines

Chaining a `StandardScaler` with a `KNeighborsClassifier` model.
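A minimal sketch of such a pipeline, using the iris dataset as illustrative data:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Because the last step is an estimator, the pipeline itself supports
# fit, predict, and score; the scaler is fit only on the training data.
pipe = make_pipeline(StandardScaler(), KNeighborsClassifier())
pipe.fit(X_train, y_train)
print(pipe.score(X_test, y_test))
```

Calling `fit` on the pipeline runs `fit_transform` on the scaler and then `fit` on the classifier, so scaling and modeling stay bundled together.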
Let’s take a break!