Introduction to Python
In this class, we will watch the first of four lectures by Dr. Mike Gelbart, option co-director of the UBC-Vancouver MDS program.
Attribution
The original version of these Python lectures was written by Patrick Walls.
These lectures were delivered by Mike Gelbart and are available publicly here.
About this course (5 min)
High-level overview:
The MDS program has a programming prerequisite.
Therefore, this course does not start from “no programming knowledge”.
You should know what an `if` statement is.
You should know what a `for` loop is.
You should know what a function is.
However, not all of you have used Python/R.
So, this course is about Python-specific and R-specific syntax/knowledge.
We will cover things like loops, but just the syntax, not the concept of a loop.
Weeks 1&2: Python, lectures by Mike Gelbart
Weeks 3&4: R, lectures by Tiffany Timbers
Lecture Outline:
Basic datatypes (20 min)
Lists and tuples (20 min)
Break (5 min)
String methods (5 min)
Dictionaries (10 min)
Conditionals (10 min)
Basic datatypes (20 min)
A value is a piece of data that a computer program works with, such as a number or text.
There are different types of values: `42` is an integer and `"Hello!"` is a string.
A variable is a name that refers to a value.
In mathematics and statistics, we usually use variable names like \(x\) and \(y\).
In Python, we can use almost any word as a variable name, as long as it starts with a letter or an underscore, contains only letters, digits and underscores, and is not a reserved word in Python such as `for`, `while`, `class`, `lambda`, etc.
We use the assignment operator `=` to assign a value to a variable.
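Here is a minimal sketch that checks these rules with two standard-library tools. `str.isidentifier()` tests whether a string is spelled like a valid Python name, and `keyword.iskeyword()` tests whether it is a reserved word. The helper `is_valid_variable_name` is an illustrative name of our own, not a built-in.

```python
import keyword


def is_valid_variable_name(name):
    """Illustrative helper: True if `name` could be used as a variable name."""
    # isidentifier() checks spelling (letters, digits, underscores, no leading digit);
    # iskeyword() rejects reserved words such as for, while, class and lambda.
    return name.isidentifier() and not keyword.iskeyword(name)


for candidate in ["x", "my_value", "_hidden", "2x", "lambda", "λ"]:
    print(candidate, is_valid_variable_name(candidate))
```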
See the Python 3 documentation for a summary of the standard built-in Python datatypes. See Think Python (Chapter 2) for a discussion of variables, expressions and statements in Python.
Common built-in Python data types
English name | Type name | Description | Example |
---|---|---|---|
integer | `int` | positive/negative whole numbers | `42` |
floating point number | `float` | real number in decimal form | `3.14159` |
boolean | `bool` | true or false | `True` |
string | `str` | text | `"Hello!"` |
list | `list` | a collection of objects - mutable & ordered | `[1, 2, "THREE", 4, 0.5]` |
tuple | `tuple` | a collection of objects - immutable & ordered | `(1, 2, "THREE", 4, 0.5)` |
dictionary | `dict` | mapping of key-value pairs | `{'city': 'Vancouver', 'price': 2499999}` |
none | `NoneType` | represents no value | `None` |
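To see the type names from the table in practice, the sketch below calls `type()` on one example value per row. `type(value).__name__` returns the type's name as a plain string. The list `examples` is only illustrative.

```python
# One example value per row of the table, in the same order
examples = [42, 3.14159, True, "Hello!", [1, 2, "THREE"], (1, 2, "THREE"),
            {"city": "Vancouver"}, None]

for value in examples:
    # type(value).__name__ is the type's name as a string, e.g. 'int'
    print(repr(value), "->", type(value).__name__)
```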
Numeric Types
x = 42
type(x)
int
print(x)
42
x # in Jupyter/IPython we don't need to explicitly print for the last line of a cell
42
pi = 3.14159
print(pi)
3.14159
type(pi)
float
λ = 2  # variable names can also contain non-ASCII letters such as λ
Arithmetic Operators
The syntax for the arithmetic operators is:
Operator | Description |
---|---|
`+` | addition |
`-` | subtraction |
`*` | multiplication |
`/` | division |
`**` | exponentiation |
`//` | integer division |
`%` | modulo |
Let’s apply these operators to numeric types and observe the results.
1 + 2 + 3 + 4 + 5
15
0.1 + 0.2
0.30000000000000004
Tip
From Firas: This is floating point arithmetic. For an explanation of what’s going on, see this tutorial.
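One practical consequence is that comparing floats with `==` can give surprising answers. A common workaround, sketched below, is `math.isclose` from the standard library. It treats two floats as equal if they agree within a small relative tolerance (by default `rel_tol=1e-09`).

```python
import math

total = 0.1 + 0.2
print(total == 0.3)              # False: the stored binary values differ slightly
print(math.isclose(total, 0.3))  # True: equal within the default relative tolerance
```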
2 * 3.14159
6.28318
2**10
1024
type(2**10)
int
2.0**10
1024.0
int_2 = 2
float_2 = 2.0
float_2_again = 2.
101 / 2
50.5
101 // 2 # "integer division" - always rounds down
50
101 % 2 # "101 mod 2", or the remainder when 101 is divided by 2
1
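The comment above says `//` always rounds down. This matters for negative numbers, where rounding down means rounding toward negative infinity, not toward zero. The sketch below shows this and also uses the built-in `divmod`, which returns the quotient and remainder together.

```python
a, b = -101, 2
quotient = a // b        # -51: -50.5 rounded down toward negative infinity
remainder = a % b        # 1: the remainder has the same sign as the divisor
print(quotient, remainder)
print(quotient * b + remainder == a)  # True: // and % always fit together this way
print(divmod(101, 2))    # (50, 1): quotient and remainder in one call
```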
None
`NoneType` is its own type in Python. It only has one possible value, `None`.
x = None
print(x)
None
type(x)
NoneType
You may have seen similar things in other languages, like `null` in Java.
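`None` often appears when a function has no `return` statement, because such a function returns `None`. The function `greet` below is a hypothetical example. The idiomatic test for `None` is `x is None` (we discuss `is` next week).

```python
def greet(name):
    # Prints a message but has no return statement
    print(f"Hello, {name}!")


result = greet("Mike")   # prints: Hello, Mike!
print(result)            # None
print(result is None)    # True
```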
Strings
Text is stored as a type called a string.
We think of a string as a sequence of characters.
We write strings as characters enclosed with either:
single quotes, e.g., `'Hello'`
double quotes, e.g., `"Goodbye"`
triple single quotes, e.g., `'''Yesterday'''`
triple double quotes, e.g., `"""Tomorrow"""`
my_name = "Mike Gelbart"
print(my_name)
Mike Gelbart
type(my_name)
str
course = 'DSCI 511'
print(course)
DSCI 511
type(course)
str
If the string contains an apostrophe, we can define it with double quotes. If it contains both kinds of quotation marks, triple quotes work.
sentence = "It's a rainy day."
print(sentence)
It's a rainy day.
type(sentence)
str
saying = '''They say:
"It's a rainy day!"'''
print(saying)
They say:
"It's a rainy day!"
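The sketch below shows another option: escape a quotation mark with a backslash. In the same way, `\n` inside a string stands for a newline, so it can replace triple quotes for multi-line text.

```python
escaped = 'It\'s a rainy day.'            # \' is a literal apostrophe inside single quotes
print(escaped == "It's a rainy day.")     # True

two_lines = "They say:\n\"It's a rainy day!\""  # \n is a newline, \" a literal double quote
print(two_lines)
print(two_lines == '''They say:
"It's a rainy day!"''')                   # True: same text as the triple-quoted version
```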
Boolean
The Boolean (`bool`) type has two values: `True` and `False`.
the_truth = True
print(the_truth)
True
type(the_truth)
bool
lies = False
print(lies)
False
type(lies)
bool
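Comparison Operators
Compare objects using comparison operators. The result is a Boolean value.
Operator | Description |
---|---|
`x == y` | is `x` equal to `y`? |
`x != y` | is `x` not equal to `y`? |
`x > y` | is `x` greater than `y`? |
`x >= y` | is `x` greater than or equal to `y`? |
`x < y` | is `x` less than `y`? |
`x <= y` | is `x` less than or equal to `y`? |
`x is y` | is `x` the same object as `y`? |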
2 < 3
True
"Data Science" != "Deep Learning"
True
2 == "2"
False
2 == 2.0
True
Note: we will discuss `is` next week.
Operators on Boolean values:
Operator | Description |
---|---|
`x and y` | are `x` and `y` both `True`? |
`x or y` | is at least one of `x` and `y` `True`? |
`not x` | is `x` `False`? |
True and True
True
True and False
False
False or False
False
("Python 2" != "Python 3") and (2 <= 3)
True
not True
False
not not True
True
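Comparison operators can also be chained. In the minimal sketch below, `0 <= x <= 10` means the same as `(0 <= x) and (x <= 10)`, except that `x` is evaluated only once.

```python
x = 7
print(0 <= x <= 10)             # True
print((0 <= x) and (x <= 10))   # True: the equivalent expression written with and
print(1 < 3 > 2)                # True: means (1 < 3) and (3 > 2)
```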
Casting
Sometimes (but rarely) we need to explicitly cast a value from one type to another.
Python tries to do something reasonable, or raises an error if there is no sensible conversion.
x = int(5.0)
x
5
type(x)
int
x = str(5.0)
x
'5.0'
type(x)
str
str(5.0) == 5.0
False
# list(5.0) # there is no reasonable thing to do here
int(5.3)
5
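The last cell shows that `int()` drops the decimal part. The sketch below covers a few more conversions:

- `int()` truncates toward zero, which differs from `//` for negative numbers.
- `round()` rounds to the nearest integer.
- A string that does not look like a number of the requested type raises a `ValueError`.

```python
truncated = int(-5.3)      # -5: int() truncates toward zero
floored = -5.3 // 1        # -6.0: floor division rounds down
rounded = round(5.7)       # 6: nearest integer
from_text = int("42") + 1  # 43: a string of digits converts to int
print(truncated, floored, rounded, from_text, float("3.5"))

try:
    int("5.3")             # "5.3" is not a valid int literal
    parsed_ok = True
except ValueError as err:
    parsed_ok = False
    print("ValueError:", err)
```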
Lists and Tuples (20 min)
Lists and tuples allow us to store multiple things (“elements”) in a single object.
The elements are ordered.
my_list = [1, 2, "THREE", 4, 0.5]
print(my_list)
[1, 2, 'THREE', 4, 0.5]
type(my_list)
list
You can get the length of the list with `len`:
len(my_list)
5
today = (1, 2, "THREE", 4, 0.5)
print(today)
(1, 2, 'THREE', 4, 0.5)
type(today)
tuple
len(today)
5
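Two tuple details often trip people up, as the sketch below shows. A one-element tuple needs a trailing comma, because parentheses alone just group an expression. A tuple can also be unpacked into several variables at once.

```python
not_a_tuple = (5)    # just the integer 5 in parentheses
one_tuple = (5,)     # the trailing comma makes a one-element tuple
print(type(not_a_tuple), type(one_tuple))

# Unpacking: one variable per element, in order
first, second, third, fourth, fifth = (1, 2, "THREE", 4, 0.5)
print(third)         # THREE
```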
Indexing and Slicing Sequences
We can access values inside a list, tuple, or string using the bracket syntax.
Python uses zero-based indexing, which means the first element of the list is in position 0, not position 1.
Sadly, R uses one-based indexing, so get ready to be confused.
my_list
[1, 2, 'THREE', 4, 0.5]
my_list[0]
1
my_list[4]
0.5
my_list[5]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-75-075ca585e721> in <module>
----> 1 my_list[5]
IndexError: list index out of range
today[4]
0.5
We use negative indices to count backwards from the end of the list.
my_list
[1, 2, 'THREE', 4, 0.5]
my_list[-1]
0.5
We use the colon `:` to access a subsequence. This is called “slicing”.
my_list[1:4]
[2, 'THREE', 4]
Above: note that the start is inclusive and the end is exclusive.
So `my_list[1:3]` fetches elements 1 and 2, but not 3. In other words, it gets the 2nd and 3rd elements in the list.
We can omit the start or end:
my_list[:3]
[1, 2, 'THREE']
my_list[3:]
[4, 0.5]
my_list[:] # *almost* same as my_list - more details next week
[1, 2, 'THREE', 4, 0.5]
Strings behave the same as lists and tuples when it comes to indexing and slicing.
alphabet = "abcdefghijklmnopqrstuvwxyz"
alphabet[0]
'a'
alphabet[-1]
'z'
alphabet[-3]
'x'
alphabet[:5]
'abcde'
alphabet[12:20]
'mnopqrst'
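Slices also accept an optional third number, the step: `seq[start:end:step]`. A negative step walks backwards, so `[::-1]` is a common idiom for reversing a sequence. A minimal sketch:

```python
alphabet = "abcdefghijklmnopqrstuvwxyz"
print(alphabet[0:10:3])  # adgj: positions 0, 3, 6, 9
print(alphabet[::5])     # afkpuz: every 5th letter
print(alphabet[::-1])    # the alphabet reversed

my_list = [1, 2, "THREE", 4, 0.5]
print(my_list[::-1])     # [0.5, 4, 'THREE', 2, 1]
```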
List Methods
A list is an object and it has methods for interacting with its data.
For example, `list.append(item)` appends an item to the end of the list.
See the documentation for more list methods.
primes = [2,3,5,7,11]
primes
[2, 3, 5, 7, 11]
len(primes)
5
primes.append(13)
primes
[2, 3, 5, 7, 11, 13]
len(primes)
6
max(primes)
13
min(primes)
2
sum(primes)
41
[1,2,3] + ["Hello", 7]
[1, 2, 3, 'Hello', 7]
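Here is a minimal sketch of a few more list methods; the list `nums` is just an example. These methods change the list in place. Methods such as `append` and `sort` return `None` rather than a new list.

```python
nums = [5, 3, 8]
nums.append(1)        # add one item at the end         -> [5, 3, 8, 1]
nums.extend([9, 2])   # add every item of another list  -> [5, 3, 8, 1, 9, 2]
nums.insert(0, 7)     # insert 7 at position 0          -> [7, 5, 3, 8, 1, 9, 2]
last = nums.pop()     # remove and return the last item -> 2
nums.sort()           # sort in place                   -> [1, 3, 5, 7, 8, 9]
print(nums, last)

print(nums.append(10))  # None: append modifies nums and returns nothing
```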
Sets
Another built-in Python data type is the `set`, which stores an unordered collection of unique items.
More on sets in DSCI 512.
s = {2,3,5,11}
s
{2, 3, 5, 11}
{1,2,3} == {3,2,1}
True
[1,2,3] == [3,2,1]
False
s.add(2) # does nothing
s
{2, 3, 5, 11}
s[0]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-125-c9c96910e542> in <module>
----> 1 s[0]
TypeError: 'set' object is not subscriptable
Above: throws an error because elements are not ordered.
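Sets support the usual mathematical set operations. Converting a list to a set is also a quick way to drop duplicates. A minimal sketch with two small example sets:

```python
a = {1, 2, 3}
b = {3, 4}
union = a | b        # {1, 2, 3, 4}
common = a & b       # {3}
only_in_a = a - b    # {1, 2}
print(union, common, only_in_a, 3 in a)

letters = ["a", "b", "a", "c"]
unique_letters = set(letters)  # duplicates dropped; order is not guaranteed
print(len(letters), len(unique_letters))  # 4 3
```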
Mutable vs. Immutable Types
Strings and tuples are immutable types, which means they cannot be modified.
Lists are mutable, so we can assign new values to their entries.
This is the main difference between lists and tuples.
names_list = ["Indiana","Fang","Linsey"]
names_list
['Indiana', 'Fang', 'Linsey']
names_list[0] = "Cool guy"
names_list
['Cool guy', 'Fang', 'Linsey']
names_tuple = ("Indiana","Fang","Linsey")
names_tuple
('Indiana', 'Fang', 'Linsey')
names_tuple[0] = "Not cool guy"
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-130-bd6a1b77b220> in <module>
----> 1 names_tuple[0] = "Not cool guy"
TypeError: 'tuple' object does not support item assignment
Same goes for strings. Once a string is defined, we cannot modify its characters.
my_name = "Mike"
my_name[-1] = 'q'
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-132-94c4564b18e3> in <module>
----> 1 my_name[-1] = 'q'
TypeError: 'str' object does not support item assignment
x = ([1,2,3],5)
x[1] = 7
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-138-415ce6bd0126> in <module>
----> 1 x[1] = 7
TypeError: 'tuple' object does not support item assignment
x
([1, 2, 3], 5)
x[0][1] = 4
x
([1, 4, 3], 5)
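The last example shows that the tuple `x` itself never changed. It still holds the same list object, and only that list was modified. When we want a "modified" string or tuple, the usual approach is to build a new one and reassign the name, as in this sketch:

```python
my_name = "Mike"
my_name = my_name[:-1] + "q"   # slicing + concatenation builds a new string
print(my_name)                 # Mikq

names_tuple = ("Indiana", "Fang", "Linsey")
names_tuple = ("Cool guy",) + names_tuple[1:]  # a new tuple; the old one is untouched
print(names_tuple)             # ('Cool guy', 'Fang', 'Linsey')
```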
Break (5 min)
String Methods (5 min)
There are various useful string methods in Python.
MDS-CL students will soon be the experts we can go to for help!
all_caps = "HOW ARE YOU TODAY?"
print(all_caps)
HOW ARE YOU TODAY?
new_str = all_caps.lower()
new_str
'how are you today?'
Note that the method lower doesn’t change the original string but rather returns a new one.
all_caps
'HOW ARE YOU TODAY?'
There are many string methods. Check out the documentation.
all_caps.split()
['HOW', 'ARE', 'YOU', 'TODAY?']
all_caps.count("O")
3
One can explicitly cast a string to a list:
caps_list = list(all_caps)
caps_list
['H',
'O',
'W',
' ',
'A',
'R',
'E',
' ',
'Y',
'O',
'U',
' ',
'T',
'O',
'D',
'A',
'Y',
'?']
len(all_caps)
18
len(caps_list)
18
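Here is a sketch of a few other commonly used string methods; the variable `messy` is just an example. Like `lower()`, each returns a new string (or list) and leaves the original unchanged.

```python
messy = "  How are you today?  "
clean = messy.strip()                   # remove leading/trailing whitespace
print(clean)                            # How are you today?
print(clean.upper())                    # HOW ARE YOU TODAY?
print(clean.replace("today", "doing"))  # How are you doing?
print(clean.startswith("How"))          # True
words = clean.split()                   # ['How', 'are', 'you', 'today?']
print("-".join(words))                  # How-are-you-today?
```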
String formatting
Python has ways of creating strings by “filling in the blanks” and formatting them nicely.
There are a few ways of doing this. See here and here for some discussion.
Old formatting style (borrowed from the C programming language):
template = "Hello, my name is %s. I am %.2f years old."
template % ("Newborn Baby", 4/12)
'Hello, my name is Newborn Baby. I am 0.33 years old.'
New formatting style (see documentation):
template_new = "Hello, my name is {}. I am {:.2f} years old."
template_new.format('Newborn Baby', 4/12)
'Hello, my name is Newborn Baby. I am 0.33 years old.'
Newer formatting style (see here) - note the `f` before the start of the string:
name = "Newborn Baby"
age = 4/12
template_new = f'Hello, my name is {name}. I am {age:.2f} years old.'
template_new
'Hello, my name is Newborn Baby. I am 0.33 years old.'
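Inside the braces, the part after `:` is a format specification, like `.2f` above. The sketch below shows a few other specifications that often come in handy. The self-documenting `{price=}` form needs Python 3.8 or newer.

```python
price = 2499999
ratio = 0.5
label = "price"
print(f"{price:,}")     # 2,499,999      (thousands separator)
print(f"{ratio:.1%}")   # 50.0%          (percentage with 1 decimal)
print(f"[{label:>8}]")  # [   price]     (right-aligned in 8 characters)
print(f"{price=}")      # price=2499999  (Python 3.8+)
```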
Dictionaries (10 min)
A dictionary is a mapping between key-value pairs.
house = {'bedrooms': 3, 'bathrooms': 2, 'city': 'Vancouver', 'price': 2499999, 'date_sold': (1,3,2015)}
condo = {'bedrooms' : 2,
'bathrooms': 1,
'city' : 'Burnaby',
'price' : 699999,
'date_sold': (27,8,2011)
}
We can access a specific field of a dictionary with square brackets:
house['price']
2499999
condo['city']
'Burnaby'
We can also edit dictionaries (they are mutable):
condo['price'] = 5 # price already in the dict
condo
{'bedrooms': 2,
'bathrooms': 1,
'city': 'Burnaby',
'price': 5,
'date_sold': (27, 8, 2011)}
condo['flooring'] = "wood"
condo
{'bedrooms': 2,
'bathrooms': 1,
'city': 'Burnaby',
'price': 5,
'date_sold': (27, 8, 2011),
'flooring': 'wood'}
We can delete fields entirely (though I rarely use this):
del condo["city"]
condo
{'bedrooms': 2,
'bathrooms': 1,
'price': 5,
'date_sold': (27, 8, 2011),
'flooring': 'wood'}
condo[5] = 443345
condo
{'bedrooms': 2,
'bathrooms': 1,
'price': 5,
'date_sold': (27, 8, 2011),
'flooring': 'wood',
5: 443345}
condo[(1,2,3)] = 777
condo
{'bedrooms': 2,
'bathrooms': 1,
'price': 5,
'date_sold': (27, 8, 2011),
'flooring': 'wood',
5: 443345,
(1, 2, 3): 777}
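The cells above use an int and a tuple as keys. Dictionary keys must be hashable. In practice this means immutable values such as numbers, strings and tuples of immutable values, so a mutable list is rejected. A minimal sketch using a fresh example dict, `prices`:

```python
prices = {}
prices[(1, 2, 3)] = 777      # a tuple works as a key

try:
    prices[[1, 2, 3]] = 777  # a list does not: it is mutable, hence unhashable
    list_key_allowed = True
except TypeError as err:
    list_key_allowed = False
    print("TypeError:", err)
```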
condo["nothere"]
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Cell In[127], line 1
----> 1 condo["nothere"]
KeyError: 'nothere'
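A sometimes useful trick about default values:
condo["bedrooms"]
2
returns the same value (for a key that exists) as
condo.get("bedrooms")
2
The difference is what happens when the key is missing: square brackets raise a KeyError, as above, while get returns None. With get you can also supply your own default value: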
condo.get("bedrooms", "unknown")
2
condo.get("fireplaces", "unknown")
'unknown'
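To check whether a key exists, use `in`, which looks only at the keys. The methods `keys()`, `values()` and `items()` give the keys, the values and the key-value pairs. Here is a sketch using a smaller example dict, `small_condo`:

```python
small_condo = {"bedrooms": 2, "bathrooms": 1, "price": 5}

print("price" in small_condo)        # True: in checks the keys
print(2 in small_condo)              # False: values are not checked
print(list(small_condo.keys()))      # ['bedrooms', 'bathrooms', 'price']
print(list(small_condo.values()))    # [2, 1, 5]

for key, value in small_condo.items():
    print(key, "->", value)
```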
A common operation is finding the dictionary key with the largest value.
There are a few ways to do this; see this StackOverflow page.
One way of doing it (word_lengths here is a small example dict):
word_lengths = {"the": 3, "list": 4, "of": 2, "words": 5}
max(word_lengths, key=word_lengths.get)
'words'
We saw the get method above. The key argument tells max to call word_lengths.get on each key of the dict and compare the returned values instead of the keys themselves, so max returns the key whose value is largest.
Empties
lst = list() # empty list
lst
[]
lst = [] # empty list
lst
[]
tup = tuple() # empty tuple
tup
()
tup = () # empty tuple
tup
()
dic = dict() # empty dict
dic
{}
dic = {} # empty dict
dic
{}
st = set() # empty set
st
set()
st = {} # NOT an empty set!
type(st)
dict
st = {1}
type(st)
set
Conditionals (10 min)
Conditional statements allow us to write programs where only certain blocks of code are executed depending on the state of the program.
Let’s look at some examples and take note of the keywords, syntax and indentation.
Check out the Python documentation and Think Python (Chapter 5) for more information about conditional execution.
name = input("What's your name?")
if name.lower() == 'mike':
    print("That's my name too!")
elif name.lower() == 'santa':
    print("That's a funny name.")
else:
    print("Hello {}! That's a cool name.".format(name))
    print('Nice to meet you!')
What's your name?mike
That's my name too!
bool(None)
False
The main points to notice:
Use the keywords `if`, `elif` and `else`
The colon `:` ends each conditional expression
Indentation (conventionally 4 spaces) defines code blocks
In an `if` statement, the first block whose condition evaluates to `True` is executed, and the program then exits the `if` block
`if` statements don’t necessarily need `elif` or `else`
`elif` lets us check several conditions
`else` lets us evaluate a default block if all other conditions are `False`
The end of the entire `if` statement is where the indentation returns to the same level as the first `if` keyword
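The bool(None) cell above hints at a related idea: an if condition does not have to be a Boolean. Python treats None, zero, and empty strings and collections as false ("falsy"), and nearly everything else as true ("truthy"). A minimal sketch:

```python
falsy = [None, 0, 0.0, "", [], (), {}, set()]
truthy = [42, -1, "0", [0]]

for value in falsy + truthy:
    print(repr(value), bool(value))

my_list = []
if my_list:                 # same test as len(my_list) > 0
    print("The list has items")
else:
    print("The list is empty")
```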
If statements can also be nested inside of one another:
name = input("What's your name?")
if name.lower() == 'mike':
    print("That's my name too!")
elif name.lower() == 'santa':
    print("That's a funny name.")
else:
    print("Hello {0}! That's a cool name.".format(name))
    if name.lower().startswith("super"):
        print("Do you have superpowers?")
    print('Nice to meet you!')
What's your name?supersam
Hello supersam! That's a cool name.
Do you have superpowers?
Nice to meet you!
Inline if/else
words = ["the", "list", "of", "words"]
x = "long list" if len(words) > 10 else "short list"
x
'short list'
if len(words) > 10:
x = "long list"
else:
x = "short list"
x
'short list'
(optional) short-circuiting
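BLAH # not defined: raises NameError
True or BLAH # True: or stops at the first True, so BLAH is never evaluated
True and BLAH # raises NameError: and must evaluate BLAH to know the result
False and BLAH # False: and stops at the first False, so BLAH is never evaluated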
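The sketch below makes short-circuiting visible with a hypothetical helper, `check`, which records each operand it evaluates. The right-hand side of `and` is skipped when the left side is false. The right-hand side of `or` is skipped when the left side is true. A common practical use is guarding a comparison against `None`.

```python
calls = []


def check(value):
    """Hypothetical helper: record that this operand was evaluated, then return it."""
    calls.append(value)
    return value


result_and = check(False) and check(True)
print(result_and, calls)    # False [False]: the right side was never evaluated

calls.clear()
result_or = check(True) or check(False)
print(result_or, calls)     # True [True]: again only the left side ran

# Guard: x > 0 is only evaluated when x is not None
x = None
print(x is not None and x > 0)  # False, with no TypeError from None > 0
```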