Principles of good communication
Grid search activity
Go to this Google doc: https://tinyurl.com/5n8xf5yj
Explanation 1: https://tinyurl.com/msk2cfkb
Explanation 2: https://tinyurl.com/mt2z9ey5
Discussion questions
- What do you like about each explanation?
- What do you dislike about each explanation?
- What do you think is the intended audience for each explanation?
- Which explanation do you think is more effective overall for someone on Day 1 of CPSC 330?
- Each explanation has an image. Which one is more effective? What are the pros/cons?
- Each explanation has some sample code. Which one is more effective? What are the pros/cons?
Concepts then labels, not the other way around
Explanation 1: Machine learning algorithms, like an airplane’s cockpit, typically involve a bunch of knobs and switches that need to be set.
Explanation 2: Grid search is the process of performing hyper parameter tuning in order to determine the optimal values for a given model.
The effectiveness of these different statements depends on your audience.
Top down vs. bottom up
Top-down:
- Start with the big picture
- Then gradually reveal the structure and key components
Bottom-up:
- Start with the details
- Build to the big picture
In the previous explanations, which one represented a bottom-up explanation and which one a top-down explanation?
New ideas in small chunks
The hidden structure in the first explanation
- The concept of setting a bunch of values.
- Random forest example.
- The problem / pain point.
- The solution.
- How it works - high level.
- How it works - written example.
- How it works - code example.
- The name of what we were discussing all this time.
Reuse running examples
Effective explanations often use the same example throughout the text and code. This helps readers follow the line of reasoning.
Approach from all angles
- When we’re trying to draw mental boundaries around a concept, it’s helpful to see examples on all sides of those boundaries
- It would have been nice to include:
- Performance with and without hyperparameter tuning.
- Other types of hyperparameter tuning (e.g. RandomizedSearchCV).
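As a rough illustration of these two angles, the sketch below compares an untuned setting, an exhaustive grid search, and a random search in the spirit of RandomizedSearchCV. It uses only the Python standard library. The function `validation_score`, the grid values, and the "untuned" starting values are all hypothetical stand-ins; in practice the score would come from cross-validating a real model such as a random forest.

```python
import itertools
import random


def validation_score(n_estimators, max_depth):
    # Hypothetical stand-in for a cross-validation score.
    # Toy function that peaks at n_estimators=100, max_depth=10.
    return 1.0 - ((n_estimators - 100) / 200) ** 2 - ((max_depth - 10) / 20) ** 2


# Assumed grid of candidate values (illustrative only).
param_grid = {
    "n_estimators": [10, 50, 100, 200],
    "max_depth": [2, 5, 10, 20],
}


def grid_search(grid):
    """Score every combination in the grid and return (best_params, best_score)."""
    names = list(grid)
    candidates = [dict(zip(names, values))
                  for values in itertools.product(*grid.values())]
    best = max(candidates, key=lambda p: validation_score(**p))
    return best, validation_score(**best)


def random_search(grid, n_iter, seed=0):
    """Score n_iter randomly sampled settings and return (best_params, best_score)."""
    rng = random.Random(seed)
    candidates = [{name: rng.choice(values) for name, values in grid.items()}
                  for _ in range(n_iter)]
    best = max(candidates, key=lambda p: validation_score(**p))
    return best, validation_score(**best)


# "Without tuning": a single, arbitrarily chosen setting.
untuned_params = {"n_estimators": 10, "max_depth": 2}
untuned_score = validation_score(**untuned_params)

grid_best, grid_score = grid_search(param_grid)
rand_best, rand_score = random_search(param_grid, n_iter=5)

# Show the results right away, next to the code that produced them.
print(f"untuned:       {untuned_params} -> {untuned_score:.3f}")
print(f"grid search:   {grid_best} -> {grid_score:.3f}")
print(f"random search: {rand_best} -> {rand_score:.3f}")
```

Grid search scores all 16 combinations, while random search scores only 5. Random search can therefore never beat the grid on the same candidate values, but it is cheaper to run.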
When experimenting, show the results asap
The first explanation shows the output of the code, whereas the second does not. This is easy to do and makes a big difference.
It’s not about you
- Interesting to you != useful to the reader (aka it’s not about you)
- Examine the hidden intention of wanting to include something that’s not important
- Am I trying to sound smart or prove I know something?
- Am I afraid that leaving it out makes the work look too simple?
- Am I adding it because I spent time on it and want that effort to be visible?
- Am I overexplaining because I’m worried the audience will judge me?
If it doesn’t serve the audience, it’s noise.
Core questions you must be ready to answer
- What does this result mean (in plain language)?
- When does the model work? When does it fail? (failure modes)
- Why did it make this prediction? (explainability path)
- What are the risks & consequences of using it?
- How does it compare to doing nothing or current practice?
- What is the cost to maintain / retrain / monitor?
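The "compare to doing nothing" question can often be answered with a simple baseline. The sketch below assumes one possible definition of "doing nothing": predicting that next week's price equals this week's price. The prices and model forecasts are made up purely for illustration and are not results from any real model.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up weekly prices, for illustration only.
prices = [1.20, 1.25, 1.22, 1.30, 1.28, 1.35]

# Hypothetical model forecasts for weeks 2 to 6.
model_forecasts = [1.24, 1.23, 1.28, 1.29, 1.33]

actual = prices[1:]
naive_forecasts = prices[:-1]  # baseline: next week = this week

baseline_rmse = rmse(actual, naive_forecasts)
model_rmse = rmse(actual, model_forecasts)

print(f"baseline RMSE: {baseline_rmse:.4f}")
print(f"model RMSE:    {model_rmse:.4f}")
print(f"model beats baseline: {model_rmse < baseline_rmse}")
```

Reporting the model's error next to the baseline's error gives the audience a reference point that a single RMSE value lacks.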
Poor vs. Effective communication
Which one is poor and which one is effective? Why?
Communication 1
“I built a model to predict next week’s avocado prices. The ridge model had an RMSE of 0.79, but the random forest performed better with tuned hyperparameters. The cross-validation score improved after adding lag features. We should use the random forest.”
Communication 2
“Our avocado price forecast reduces weekly price uncertainty by 15%. This lets the procurement team lock in contracts earlier and avoid overpaying during high-volatility weeks, saving an estimated $45k per month.
To deploy: we need 2 days to automate data updates and a weekly accuracy review.
Risk: model performance drops during holiday spikes. Here’s our mitigation plan.”
Poor vs. Effective communication
❌ Poor communication:
“I built a model to predict next week’s avocado prices. The ridge model had an RMSE of 0.79, but the random forest performed better with tuned hyperparameters. The cross-validation score improved after adding lag features. We should use the random forest.”
Result: The manager doesn’t know why this matters, how it affects decisions, or what to do next. No adoption.
✅ Effective reframe:
“Our avocado price forecast reduces weekly price uncertainty by 15%. This lets the procurement team lock in contracts earlier and avoid overpaying during high-volatility weeks, saving an estimated $45k per month.
To deploy: we need 2 days to automate data updates and a weekly accuracy review.
Risk: model performance drops during holiday spikes. Here’s our mitigation plan.”
Result: Clear value, operational impact, required effort, and risks. Enables decision-making.
Key difference: Shift from model-centric communication → decision-ready communication.
Break
Let’s take a 5-min break
Confidence and predict_proba
- What does it mean to be “confident” in your results?
- When you perform analysis, you are responsible for many judgment calls.
- Your results will differ from other people's.
- As you make these judgments and start to form conclusions, how can you recognize your own uncertainties about the data so that you can communicate confidently?
Let’s imagine that the following claim is true:
Vancouver has the highest cost of living of all cities in Canada.
Now let’s consider a few beliefs we could hold:
- Vancouver has the highest cost of living of all cities in Canada. I am 95% sure of this.
- Vancouver has the highest cost of living of all cities in Canada. I am 55% sure of this.
The stated degree of certainty (95% or 55%) is called a credence. Which belief is better?
But what if it’s actually Toronto that has the highest cost of living in Canada?
- Vancouver has the highest cost of living of all cities in Canada. I am 95% sure of this.
- Vancouver has the highest cost of living of all cities in Canada. I am 55% sure of this.
Which belief is better now?
We don’t just want to be right. We want to be confident when we’re right and hesitant when we’re wrong.
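One way to make this preference concrete is to penalize each belief by the squared gap between the credence and what turned out to be true. This quadratic penalty is an assumption chosen for illustration; the function name `quadratic_penalty` is hypothetical, and other scoring rules would give the same ordering.

```python
def quadratic_penalty(credence, claim_is_true):
    """Squared gap between the credence and the outcome (1 if true, 0 if false)."""
    outcome = 1.0 if claim_is_true else 0.0
    return (credence - outcome) ** 2


# The four situations from the Vancouver example: two credences, two possible truths.
penalties = {}
for claim_is_true in (True, False):
    for credence in (0.95, 0.55):
        label = (
            f"{'right' if claim_is_true else 'wrong'}, "
            f"{'confident' if credence == 0.95 else 'hesitant'}"
        )
        penalties[label] = quadratic_penalty(credence, claim_is_true)

# Print from smallest penalty (best) to largest (worst).
for label, penalty in sorted(penalties.items(), key=lambda item: item[1]):
    print(f"{label:18s} penalty = {penalty:.4f}")
```

Under this penalty, the 95% belief is the best of the four when the claim is true and the worst when it is false.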
In our final exam, imagine if, along with your answers, we ask you to also provide a confidence score for each. This would involve rating how sure you are about each answer, perhaps on a percentage scale from 0% (completely unsure) to 100% (completely sure). This method not only assesses your knowledge but also your awareness of your own understanding, potentially impacting the grading process and highlighting areas for improvement. Who supports this idea 😉?
Loss in machine learning
When you call fit for LogisticRegression, it has similar preferences:
correct and confident > correct and hesitant > incorrect and hesitant > incorrect and confident
- This is a “loss” or “error” function like mean squared error, so lower values are better.
- When you call fit, it tries to minimize this metric.
Logistic regression loss
- confident and correct \(\rightarrow\) smaller loss
- hesitant and correct \(\rightarrow\) a bit higher loss
- hesitant and incorrect \(\rightarrow\) even higher loss
- confident and incorrect \(\rightarrow\) high loss
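The ordering above can be checked numerically. The sketch below assumes the loss in question is the standard log loss (cross-entropy), where the loss for one example is \(-\log p\) and \(p\) is the probability the model assigned to the true class. The four probabilities are illustrative choices, not values from a fitted model.

```python
import math


def log_loss_single(p_true_class):
    """Log loss for one example, given the probability assigned to the true class."""
    return -math.log(p_true_class)


# Probability assigned to the TRUE class in each situation (illustrative values).
cases = {
    "confident and correct": 0.95,
    "hesitant and correct": 0.55,
    "hesitant and incorrect": 0.45,
    "confident and incorrect": 0.05,
}

losses = {name: log_loss_single(p) for name, p in cases.items()}

for name, loss in losses.items():
    print(f"{name:24s} p(true class) = {cases[name]:.2f}  loss = {loss:.3f}")
```

The loss grows slowly while the prediction is correct and very quickly once the model is confidently wrong.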
Misleading visualizations
This chart is attempting to suggest a relationship between childhood MMR vaccination rates and the prevalence of autism spectrum disorders (AD/ASD) across several countries.
Do you see any problems with this visualization?
Visualizing your data and results can be very powerful, but it can also be misleading if not done properly.
Things to watch out for
- Chopping off the x-axis: starting the x-axis (or sometimes the y-axis) at a value other than zero to exaggerate the changes in the data
- Saturating the axes: setting the axes to ranges that are too narrow or too wide for the data being presented, making it difficult to identify patterns
- Bar charts for cherry-picked values
- Different y-axes
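The effect of chopping off an axis can be quantified without drawing anything. The sketch below compares the true ratio of two values with the ratio of their drawn bar heights when the value axis starts above zero. The two values, the axis start, and the helper name `apparent_ratio` are made up for illustration.

```python
def apparent_ratio(a, b, axis_start=0.0):
    """Ratio of drawn bar heights for values a and b when the axis starts at axis_start."""
    return (b - axis_start) / (a - axis_start)


# Made-up values: b is only 4% larger than a.
a, b = 50.0, 52.0

true_ratio = apparent_ratio(a, b)                        # axis starts at zero
truncated_ratio = apparent_ratio(a, b, axis_start=49.0)  # axis chopped at 49

print(f"true ratio of values:          {true_ratio:.2f}")
print(f"ratio of drawn bar heights:    {truncated_ratio:.2f}")
print(f"visual exaggeration factor:    {truncated_ratio / true_ratio:.2f}")
```

With the axis starting at 49, the second bar is drawn three times as tall as the first, even though the underlying value is only 4% larger.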
What did we learn today?
Principles of effective communication
- Concepts then labels, not the other way around
- Bottom-up explanations
- New ideas in small chunks
- Reuse your running examples
- Approach from all angles
- When experimenting, show the results asap
- It’s not about you.
- Decision variables, objectives, and context.
- Expressing your confidence about the results
- Misleading visualizations.
Have a great weekend!