CPSC 330 Lecture 12: Feature importances

Announcements

  • Midterm 1 grades are released on PrairieLearn
  • Please remember to view your midterm in the CBTF!

Midterm Debrief

How did you feel about the first midterm?

    1. I felt well-prepared and it went smoothly
    2. I think it went okay. We’ll see when grades come back
    3. I struggled and didn’t feel fully prepared
    4. I noticed some gaps between what we practiced and what appeared on the exam
    5. It was a stressful experience for me 😔

Motivating Feature importances

Consider the following two scenarios:

Scenario 1

Predicting whether a patient is likely to develop diabetes based on features such as age, blood pressure, glucose levels, and BMI. You have two models:

  • LGBM, which achieves an F1 score of 0.9
  • Logistic regression, which achieves an F1 score of 0.84

Which model would you pick? Why?

Scenario 2

Predicting whether a user will purchase a product next based on their browsing history, previous purchases, and click behavior. You have two models:

  • LGBM, which achieves an F1 score of 0.9
  • Logistic regression, which achieves an F1 score of 0.84

Which model would you pick? Why?

Transparency

  • In many domains, understanding the relationship between features and predictions is critical for trust and regulatory compliance.

Feature importances

  • How does the output depend upon the input?
  • How do the predictions change as a function of a particular feature?

How to get feature importances?

Correlations

  • What are some limitations of correlations?
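One limitation worth having in mind during this discussion is that Pearson correlation only captures linear association. The sketch below is a minimal illustration on synthetic data (the variables `x` and `y` are invented for this example): `y` is completely determined by `x`, yet the correlation is essentially zero.

```python
import numpy as np

# Synthetic, symmetric feature values (hypothetical data for illustration)
x = np.linspace(-1, 1, 201)

# The target depends on x perfectly, but the relationship is not linear
y = x ** 2

# Pearson correlation measures only the linear part of the relationship
r = np.corrcoef(x, y)[0, 1]
print(f"Pearson correlation between x and x**2: {r:.6f}")
```

A correlation near zero here does not mean the feature is unimportant. Correlations also look at one feature at a time, so they say nothing about how features act together.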

Interpreting coefficients

  • Linear models are interpretable because you get coefficients associated with different features.
  • Each coefficient represents the estimated change in the prediction for a one-unit increase in that feature, assuming all other features are held constant.
  • In a Ridge model,
    • A positive coefficient indicates that as the feature’s value increases, the predicted value also increases.
    • A negative coefficient indicates that as the feature’s value increases, the predicted value decreases.
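The points above can be sketched with a small Ridge model. This is a minimal illustration on synthetic data, not the lecture demo: the feature names `sqft` and `age`, the target, and `alpha=1.0` are all assumptions made for this example.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(42)
n = 200

# Hypothetical features, invented for this illustration
sqft = rng.uniform(50, 250, size=n)
age = rng.uniform(0, 50, size=n)
X = np.column_stack([sqft, age])

# Synthetic target: goes up with sqft, down with age, plus noise
y = 3.0 * sqft - 2.0 * age + rng.normal(0, 5, size=n)

model = Ridge(alpha=1.0).fit(X, y)
for name, coef in zip(["sqft", "age"], model.coef_):
    print(f"{name}: {coef:+.2f}")

# "Holding other features constant": increase sqft by one unit, keep age fixed
base = np.array([[100.0, 10.0]])
bumped = base + np.array([[1.0, 0.0]])
delta = model.predict(bumped)[0] - model.predict(base)[0]
print(f"Change in prediction for +1 sqft: {delta:+.2f}")
```

The sign of each coefficient matches the direction of the effect, and the change in prediction for a one-unit increase in `sqft` equals the `sqft` coefficient.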

Interpreting coefficients

  • When we have different types of preprocessed features, what challenges might you face in interpreting them?
    • Ordinally encoded features
    • One-hot encoded features
    • Scaled numeric features

Break

Let’s take a break!

Pause and Reflect

We are now just over halfway through CPSC 330!

You had a midterm a couple of weeks ago, and as the instructor I’d like some feedback on how things are going in class.

Class Survey

I’d love to hear how you think lectures are going, and how the course is going overall: bit.ly/cpsc330_2025W2.

Let’s take a couple of minutes to complete this before we get started on today’s content.

Group Work: Class Demo & Live Coding

For this demo, each student should click this link to create a new repo in their own account, then clone that repo locally to follow along with today’s demo.

Group Work: Class Demo & Live Coding (if time permits)

For this demo, each student should click this link to create a new repo in their own account, then clone that repo locally to follow along with today’s demo.

SHAP

You might need to install SHAP in your conda environment:

conda install -c conda-forge shap