| Scenario | Data Imbalance | Main Concern | Best Metric(s) / Curve |
|---|---|---|---|
| Email Spam Detection | 10% spam | Avoid false positives | |
| Disease Screening | 1 in 10,000 | Avoid false negatives | |
| Credit Card Fraud | 0.1% fraud | Focus on rare positive class | |
| Customer Churn | 20% churn | Balance FP & FN | |
| Sentiment Analysis | 50/50 balanced | Overall correctness | |
| Face Recognition | Balanced pairs | Trade-off FP vs FN | |

| Metric / Plot | When to Use | Why |
|---|---|---|
| Precision, Recall, F1 | When you care about specific error types (FP vs FN) or a fixed threshold. | Focus on particular tradeoffs. |
| PR Curve & AP Score | When the dataset is highly imbalanced (rare positives). | Ignores TNs; focuses on positives. |
| ROC Curve & AUC | When classes are moderately imbalanced. | Measures ranking ability across thresholds. |

AUC–ROC measures the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative example.
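The pairwise interpretation of AUC can be checked numerically. The sketch below is only an illustration: the labels and scores are randomly generated (they are not from the lecture data), and the variable names are hypothetical. It compares the fraction of positive/negative pairs in which the positive example gets the higher score (ties counted as half) with the value returned by `roc_auc_score`.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Made-up labels and scores, for illustration only.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
# Positives receive slightly higher scores on average.
scores = rng.normal(loc=y_true * 1.0, scale=1.0)

pos = scores[y_true == 1]
neg = scores[y_true == 0]

# Compare every positive score with every negative score.
diff = pos[:, None] - neg[None, :]

# Fraction of pairs where the positive wins; ties count as half a win.
pairwise_auc = (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

sklearn_auc = roc_auc_score(y_true, scores)
print(f"pairwise estimate: {pairwise_auc:.4f}")
print(f"roc_auc_score:     {sklearn_auc:.4f}")
```

The two numbers should agree up to floating-point error, which is why AUC is often described as a measure of ranking quality rather than of performance at any single threshold.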

- `class_weight="balanced"` (preferred method for this course)
- The `alpha` hyperparameter of `Ridge` controls model complexity:
  - Larger `alpha`: simpler model, smaller coefficients.
  - Smaller `alpha`: more complex model, larger coefficients.
- `TransformedTargetRegressor`
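The effect of `alpha` on coefficient size can be seen on a toy problem. This is a minimal sketch on a synthetic dataset from `make_regression`; the dataset, the chosen `alpha` values, and the variable names are assumptions for illustration, not the lecture's demo.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# Synthetic regression data, for illustration only.
X, y = make_regression(n_samples=100, n_features=10, noise=10.0, random_state=0)

alphas = [0.01, 1.0, 100.0, 10000.0]
coef_norms = []
for alpha in alphas:
    model = Ridge(alpha=alpha).fit(X, y)
    # The L2 norm of the coefficient vector summarizes overall coefficient size.
    norm = np.linalg.norm(model.coef_)
    coef_norms.append(norm)
    print(f"alpha={alpha:>8}: ||coef|| = {norm:.3f}")
```

As `alpha` grows, the printed norm shrinks: stronger regularization pulls the coefficients toward zero, giving a simpler model.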
iClicker cloud join link: https://join.iclicker.com/YJHS
Select all of the following statements which are TRUE.
- X.
- The `alpha` hyperparameter of `Ridge` has a similar interpretation to the `C` hyperparameter of `LogisticRegression`; higher `alpha` means a more complex model.
- In `Ridge`, smaller `alpha` means bigger coefficients, whereas bigger `alpha` means smaller coefficients.

For this demo, each student should click this link to create a new repo in their account, then clone that repo locally to follow along with today's demo.

| Scenario | What matters most? | Best metric(s)? |
|---|---|---|
| Predicting house prices ranging from $60K–$800K. | A $30K error is huge for a $60K house but small for a $500K house. | |
| Predicting exam scores (0–100). | You want an interpretable measure of average error in points. | |
| Predicting energy consumption in a large industrial system. | Large errors are very costly and should be penalized heavily. | |
| Predicting insurance claim amounts. | You want to compare how well different models explain the variation in claims. | |

iClicker cloud join link: https://join.iclicker.com/YJHS
Select all of the following statements which are TRUE.
- In `sklearn`, for regression problems, using `r2_score()` and `.score()` (with default values) will produce the same results.
- `GridSearchCV` or `RandomizedSearchCV` for regression as well as classification problems.
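The regression metrics that the scenario table above asks about can all be computed with `sklearn.metrics`. The sketch below uses made-up house prices in which every prediction is off by the same $30K, so the numbers and variable names are assumptions for illustration only. It shows how the same absolute error looks through MAE, RMSE, MAPE, and R².

```python
import numpy as np
from sklearn.metrics import (
    mean_absolute_error,
    mean_absolute_percentage_error,
    mean_squared_error,
    r2_score,
)

# Made-up house prices in dollars, for illustration only.
y_true = np.array([60_000, 150_000, 500_000, 800_000], dtype=float)
y_pred = y_true + 30_000  # every prediction is off by exactly $30K

mae = mean_absolute_error(y_true, y_pred)              # average error, in dollars
mse = mean_squared_error(y_true, y_pred)               # squares errors, so large ones dominate
rmse = np.sqrt(mse)                                    # back in dollars
mape = mean_absolute_percentage_error(y_true, y_pred)  # returned as a fraction, not a percent
r2 = r2_score(y_true, y_pred)                          # share of variance explained

# Per-example relative error: the same $30K is a very different share of each price.
relative_errors = np.abs(y_pred - y_true) / y_true

print(f"MAE:  {mae:,.0f}")
print(f"RMSE: {rmse:,.0f}")
print(f"MAPE: {mape:.3f}")
print(f"R^2:  {r2:.3f}")
print("relative errors:", np.round(relative_errors, 3))
```

Note that `mean_absolute_percentage_error` returns a fraction, so multiply by 100 to report a percentage.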