Firas Moosvi (Slides adapted from Varada Kolhatkar)
🤝 Introductions ! 🤝
About your instructor
About my research interests
Group work in this class
This term we will try to work in “Pods” of 3-5 …
Research shows that there are tremendous benefits to students working (and struggling) together!
Students ask better and more insightful questions, engage more deeply with the work, and it adds a social element to class.
We will try this in CPSC 330 this term!
Group work in this class
Not everyone is a fan of group work, and I understand that!
So you will never be forced to work in groups. If you would like to opt out, move to the far left or far right side of the room so we know you prefer to work individually.
If everyone moves to the side of the room, we will re-evaluate this approach 😂
There are no marks or points associated with these groups, and everyone should still work on their own laptop.
Group work: Pods
Form a Pod of 3-5 people sitting close to you.
Each person should answer the following questions:
Preferred Name,
Year,
(intended) Major
Why are you taking CPSC 330?
Then, as a group, answer the following question:
What is the most interesting (good or bad) example of Machine Learning in society?
Meet Eva (a fictitious persona)!
Eva is one of you. She has some experience in Python programming. She knows machine learning as a buzzword. During her recent internship, she developed some interest and curiosity in the field. She wants to learn what it is and how to use it. She is a curious person and usually has a lot of questions!
Learning outcomes
From this lecture, you will be able to
Explain the motivation behind studying machine learning.
Briefly describe supervised learning.
Differentiate between traditional programming and machine learning.
Assess whether a given problem is suitable for a machine learning solution.
Navigate through the course material.
Be familiar with the policies and how the class is going to run.
Become familiar with CPSC 330 and how the course works
You can find the source code for everything we do here: https://ubc-cs.github.io/cpsc330-2024W1.
Important
Make sure you go through the syllabus thoroughly and complete the syllabus quiz before Monday, Sept 19th at 11:59pm.
Asking questions during class
You are welcome to ask questions by raising your hand!
If you would prefer to write notes and ask questions later, you are more than welcome to do that also! Use Piazza.
Registration, waitlist and prerequisites
Important
Please go through this document carefully before contacting your instructors about these issues. Even then, we are very unlikely to be able to help with registration, waitlist or prerequisite issues.
If you are on the waitlist and would like to try your chances, you should be able to access Canvas and Piazza.
If you’re unable to make it this time, there will be two sections of this course offered next semester and then again in the summer.
Lecture format
In-person lectures T/Th.
Sometimes there will be videos to watch before lecture. You will find the list of pre-watch videos in the schedule on the course webpage.
We will also try to work on some questions and exercises together during the class.
All materials will be posted in this GitHub repository.
Weekly tutorials follow an office-hour format, are run by the TAs, and are completely optional.
You do not need to be registered in a tutorial.
You can attend whichever tutorials or office hours you want, regardless of which tutorial you are registered in, or whether you are registered in one at all.
Homework assignments
The first homework assignment is due this coming Tuesday, September 10, at midnight. This is a relatively straightforward assignment on Python. If you struggle with this assignment, that could be a sign that you will struggle later in the course.
You must do the first two homework assignments on your own.
Exams
We’ll have two self-scheduled midterms and one final in the Computer-based Testing Facility (CBTF).
Course calendar
Here is our course Calendar. Make sure you check it on a regular basis:
LookAtMe!: Thanks for your purchase of a video clip from LookAtMe!, you've been charged 35p. Think you can do better? Why not send a video in a MMSto 32323.
ham | Aight, I'll hit you up when I get some cash
ham | Don no da:)whats you plan?
ham | Going to take your babe out ?
ham | No need lar. Jus testing e phone card. Dunno network not gd i thk. Me waiting 4 my sis 2 finish bathing so i can bathe. Dun disturb u liao u cleaning ur room.
Traditional programming vs. ML
Imagine writing a Python program for spam identification, i.e., whether a text message or an email is spam or non-spam.
Traditional programming
Come up with rules using human understanding of spam messages.
It is time-consuming and hard to come up with a robust set of rules.
Machine learning
Collect a large amount of data (spam and non-spam emails) and let the machine learning algorithm figure out the rules.
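To make the contrast concrete, here is a minimal sketch of the traditional-programming approach. The keyword list and the `is_spam_by_rules` helper are made up for illustration and are not part of the course code. The first two messages are taken from the example messages in these slides; the third is invented to show how a hand-written rule can misfire.

```python
# Hand-written rules for spam detection (traditional programming).
# The keyword list is a hypothetical illustration, not a recommended rule set.
SPAM_KEYWORDS = ("free", "winner", "prize", "call now", "150p")


def is_spam_by_rules(message: str) -> bool:
    """Flag a message as spam if it contains any hand-picked keyword."""
    text = message.lower()
    return any(keyword in text for keyword in SPAM_KEYWORDS)


messages = [
    "We know someone who you know that fancies you. "
    "Call 09058097218 to find out who. POBox 6, LS15HB 150p",
    "Aight, I'll hit you up when I get some cash",
    "Feel free to call me whenever you are done",  # invented, harmless message
]

for message in messages:
    print(is_spam_by_rules(message), "|", message)
```

The third message is harmless but gets flagged because it contains the word "free". Patching this requires yet another rule, which is the kind of effort the machine learning approach tries to avoid by learning the rules from labelled examples.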
Let’s train a model
There are several packages that help us perform machine learning.
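from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# train_df and test_df are assumed to be train/test splits of the SMS data,
# each with an "sms" column (message text) and a "target" column (spam/ham).
X_train, y_train = train_df["sms"], train_df["target"]
X_test, y_test = test_df["sms"], test_df["target"]

clf = make_pipeline(CountVectorizer(max_features=5000), LogisticRegression(max_iter=5000))
clf.fit(X_train, y_train)  # Training the model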
Unseen messages
Now use the trained model to predict targets of unseen messages:
index | sms
3245 | Funny fact Nobody teaches volcanoes 2 erupt, tsunamis 2 arise, hurricanes 2 sway aroundn no 1 teaches hw 2 choose a wife Natural disasters just happens
944 | I sent my scores to sophas and i had to do secondary application for a few schools. I think if you are thinking of applying, do a research on cost also. Contact joke ogunrinde, her school is one m...
1044 | We know someone who you know that fancies you. Call 09058097218 to find out who. POBox 6, LS15HB 150p
2484 | Only if you promise your getting out as SOON as you can. And you'll text me in the morning to let me know you made it in ok.
Predicting on unseen data
The model is accurately predicting labels for the unseen text messages above!
index | sms | spam_predictions
3245 | Funny fact Nobody teaches volcanoes 2 erupt, tsunamis 2 arise, hurricanes 2 sway aroundn no 1 teaches hw 2 choose a wife Natural disasters just happens | ham
944 | I sent my scores to sophas and i had to do secondary application for a few schools. I think if you are thinking of applying, do a research on cost also. Contact joke ogunrinde, her school is one me the less expensive ones | ham
1044 | We know someone who you know that fancies you. Call 09058097218 to find out who. POBox 6, LS15HB 150p | spam
2484 | Only if you promise your getting out as SOON as you can. And you'll text me in the morning to let me know you made it in ok. | ham
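A table like the one above can be produced by calling `predict` on the unseen messages and placing the result next to the text. The following is a self-contained sketch on a tiny made-up dataset, so the messages, the `unseen` series, and the `results` name are all illustrative assumptions. The lecture itself uses a real SMS corpus with a proper train/test split.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up training messages, for illustration only.
train_df = pd.DataFrame(
    {
        "sms": [
            "Win a free prize now, call this number",
            "Free entry to win cash, text WIN to claim",
            "Claim your free prize, call now",
            "Are we still meeting for lunch today?",
            "I'll be home late, see you tonight",
            "Can you send me the lecture notes?",
        ],
        "target": ["spam", "spam", "spam", "ham", "ham", "ham"],
    }
)

# Same pipeline shape as in the lecture: word counts, then logistic regression.
clf = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=5000))
clf.fit(train_df["sms"], train_df["target"])

# Messages the model has not seen during training (also made up).
unseen = pd.Series(
    ["Call now to claim your free cash prize", "See you at lunch today"],
    name="sms",
)

# Put the predicted labels next to the messages.
results = pd.DataFrame({"sms": unseen, "spam_predictions": clf.predict(unseen)})
print(results)
```

Six training messages are far too few for a real model. The point is only to show how the trained pipeline is applied to new text.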
A different way to solve problems
Machine learning uses computer programs to model data. It can be used to extract hidden patterns, make predictions in new situations, or generate novel content.
A field of study that gives computers the ability to learn without being explicitly programmed. – Arthur Samuel (1959)
ML vs. traditional programming
With machine learning, you’re likely to
Save time
Customize and scale products
Prevalence of ML
Let’s look at some examples.
Activity: For what types of problems is ML appropriate? (~5 mins)
Discuss with your neighbour which of the following problems you would use machine learning for:
Finding a list of prime numbers up to a limit
Given an image, automatically identifying and labeling objects in the image
Predicting the “rating” or “preference” a user would give to an item
What is supervised learning?
Training data comprises a set of observations (\(X\)) and their corresponding targets (\(y\)).
We wish to find a model function \(f\) that relates \(X\) to \(y\).
We use the model function to predict targets of new examples.
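In scikit-learn terms, these three points map onto `fit` (find \(f\) from \(X\) and \(y\)) and `predict` (apply \(f\) to new examples). Below is a minimal sketch with made-up data. The "hours studied" feature and the choice of a decision tree are assumptions for illustration only; any scikit-learn classifier follows the same pattern.

```python
from sklearn.tree import DecisionTreeClassifier

# Made-up training data: X holds one feature per example (hours studied),
# y holds the corresponding targets.
X = [[1], [2], [3], [8], [9], [10]]
y = ["fail", "fail", "fail", "pass", "pass", "pass"]

# The model plays the role of the function f that relates X to y.
f = DecisionTreeClassifier(random_state=0)
f.fit(X, y)

# Use f to predict the targets of new, unseen examples.
X_new = [[2.5], [9.5]]
print(f.predict(X_new))
```

Here the model predicts "fail" for 2.5 hours and "pass" for 9.5 hours, because the tree learns a single threshold between the two groups of training examples.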
🤔 Eva’s questions
At this point, Eva is wondering about many questions.
How exactly are we “learning” whether a message is spam or ham?
Are we expected to get correct predictions for all possible messages? How does it predict the label for a message it has not seen before?
What if the model mislabels an unseen example? For instance, what if the model incorrectly predicts a non-spam message as spam? What would be the consequences?
How do we measure the success or failure of spam identification?
If you want to use this model in the wild, how do you know how reliable it is?
Would it be useful to know how confident the model is about the predictions rather than just a yes or a no?
It’s great to think about these questions right now. But Eva has to be patient. By the end of this course you’ll know the answers to many of these questions!