Introduction to Python

In this class, we will watch the first of four lectures by Dr. Mike Gelbart, option co-director of the UBC-Vancouver MDS program.

Attribution

The original version of these Python lectures was created by Patrick Walls.
These lectures were delivered by Mike Gelbart and are available publicly here.

About this course (5 min)

High-level overview:

The MDS program has a programming prerequisite.
Therefore, this course does not start from “no programming knowledge”.
- You should know what an if statement is.
- You should know what a for loop is.
- You should know what a function is.
However, not all of you have used Python/R.
So, this course is about Python-specific and R-specific syntax/knowledge.
We will cover things like loops, but just the syntax, not the concept of a loop.
Weeks 1&2: Python, lectures by Mike Gelbart
Weeks 3&4: R, lectures by Tiffany Timbers

Lecture Outline:

Basic datatypes (20 min)
Lists and tuples (20 min)
Break (5 min)
String methods (5 min)
Dictionaries (10 min)
Conditionals (10 min)

Basic datatypes (20 min)

A value is a piece of data that a computer program works with, such as a number or text.
There are different types of values: 42 is an integer and "Hello!" is a string.
A variable is a name that refers to a value.
- In mathematics and statistics, we usually use variable names like \(x\) and \(y\).
- In Python, we can use any word as a variable name, as long as it starts with a letter or an underscore, contains only letters, digits and underscores, and is not a reserved word in Python such as for, while, class, lambda, etc.
And we use the assignment operator = to assign a value to a variable.
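You can check these naming rules directly. The sketch below uses the built-in str.isidentifier method and the standard-library keyword module. The candidate names are made up for illustration.

```python
import keyword

# Candidate variable names (made up for illustration)
candidates = ["x", "my_name", "_hidden", "λ", "2fast", "for", "lambda"]

for name in candidates:
    # A usable variable name must be a valid identifier and not a reserved word
    usable = name.isidentifier() and not keyword.iskeyword(name)
    print(f"{name!r}: {'ok' if usable else 'not allowed'}")
```

Unicode letters such as λ count as letters, so they are allowed. A name that starts with a digit, or one that is a reserved word, is not.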

See the Python 3 documentation for a summary of the standard built-in Python datatypes. See Think Python (Chapter 2) for a discussion of variables, expressions and statements in Python.

Common built-in Python data types

English name	Type name	Description	Example
integer	`int`	positive/negative whole numbers	`42`
floating point number	`float`	real number in decimal form	`3.14159`
boolean	`bool`	true or false	`True`
string	`str`	text	`"I Can Has Cheezburger?"`
list	`list`	a collection of objects - mutable & ordered	`['Ali','Xinyi','Miriam']`
tuple	`tuple`	a collection of objects - immutable & ordered	`('Thursday',6,9,2018)`
dictionary	`dict`	mapping of key-value pairs	`{'name':'DSCI','code':511,'credits':2}`
none	`NoneType`	represents no value	`None`
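As a quick sanity check of the table, this sketch passes each example value to type(). Only the values come from the table; the loop is just for illustration.

```python
# The example values from the table above, checked with type()
examples = [42, 3.14159, True, "I Can Has Cheezburger?",
            ['Ali', 'Xinyi', 'Miriam'], ('Thursday', 6, 9, 2018),
            {'name': 'DSCI', 'code': 511, 'credits': 2}, None]

for value in examples:
    # type(value).__name__ gives the short type name, e.g. 'int'
    print(type(value).__name__, "<-", repr(value))
```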

Numeric Types

x = 42

type(x)

int

print(x)

42

x # in Jupyter/IPython we don't need to explicitly print for the last line of a cell

42

pi = 3.14159

print(pi)

3.14159

type(pi)

float

λ = 2 # Unicode letters, such as Greek letters, are also allowed in variable names

Arithmetic Operators

The arithmetic operators are:

Operator	Description
`+`	addition
`-`	subtraction
`*`	multiplication
`/`	division
`**`	exponentiation
`//`	integer division
`%`	modulo

Let’s apply these operators to numeric types and observe the results.

1 + 2 + 3 + 4 + 5

15

0.1 + 0.2

0.30000000000000004

Tip

From Firas: This is floating point arithmetic. For an explanation of what’s going on, see this tutorial.
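A common way to handle this is to avoid == on floats and compare within a tolerance instead. The sketch below uses math.isclose from the standard library, whose default relative tolerance is 1e-09.

```python
import math

total = 0.1 + 0.2
print(total == 0.3)              # False: 0.1 and 0.2 are not exactly representable in binary
print(math.isclose(total, 0.3))  # True: equal within a small relative tolerance
print(round(total, 2) == 0.3)    # True: rounding first also works for display-level checks
```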

2 * 3.14159

6.28318

2**10

1024

type(2**10)

int

2.0**10

1024.0

int_2 = 2

float_2 = 2.0

float_2_again = 2.

101 / 2

50.5

101 // 2 # "integer division" - always rounds down

50

101 % 2 # "101 mod 2", or the remainder when 101 is divided by 2

1
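"Rounds down" means toward negative infinity, which matters for negative numbers. The sketch below contrasts // with truncation. It also shows the built-in divmod, which returns the quotient and the remainder together.

```python
# // rounds toward negative infinity ("floor"), and % is defined to match it
print(101 // 2, 101 % 2)    # 50 1
print(-101 // 2, -101 % 2)  # -51 1
print(int(-101 / 2))        # -50: int() truncates toward zero instead

# divmod returns both results at once; they always satisfy q * d + r == n
q, r = divmod(-101, 2)
print(q * 2 + r == -101)    # True
```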

None

NoneType is its own type in Python.
It only has one possible value, None.

x = None

print(x)

None

type(x)

NoneType

You may have seen similar things in other languages, like null in Java, etc.
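None often appears as the return value of a function that has no return statement. The function say_hello below is a made-up example.

```python
def say_hello(name):
    # Hypothetical example: prints a greeting but has no return statement
    print("Hello", name)

result = say_hello("Mike")  # prints: Hello Mike
print(result)               # None
print(type(result))         # <class 'NoneType'>
print(result is None)       # True (the idiomatic way to test for None)
```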

Strings

Text is stored as a type called a string.
We think of a string as a sequence of characters.
We write strings as characters enclosed with either:
- single quotes, e.g., 'Hello'
- double quotes, e.g., "Goodbye"
- triple single quotes, e.g., '''Yesterday'''
- triple double quotes, e.g., """Tomorrow"""
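The quoting style only affects how the string is written, not its value. Here is a small sketch comparing the four styles. It also shows that triple-quoted strings can span several lines.

```python
# All four quoting styles produce exactly the same str value
a = 'Hello'
b = "Hello"
c = '''Hello'''
d = """Hello"""
print(a == b == c == d)  # True (a chained comparison: all four are equal)

# Triple quotes can also span several lines; the line break becomes "\n"
poem = """Yesterday
Tomorrow"""
print(poem.count("\n"))  # 1
```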

my_name = "Mike Gelbart"

print(my_name)

Mike Gelbart

type(my_name)

str

course = 'DSCI 511'

print(course)

DSCI 511

type(course)

str

If the string contains a quotation or apostrophe, we can use double quotes or triple quotes to define the string.

sentence = "It's a rainy day."

print(sentence)

It's a rainy day.

type(sentence)

str
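Besides switching quote styles, you can also escape a quote with a backslash so that it does not end the string. A short sketch:

```python
# A backslash "escapes" the next character, so the quote does not end the string
s1 = 'It\'s a rainy day.'
s2 = "They say: \"It's a rainy day!\""
print(s1 == "It's a rainy day.")  # True
print(s2)                          # They say: "It's a rainy day!"

# \n is the escape for a newline, an alternative to triple quotes
print("They say: \n\"It's a rainy day!\"")
```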

saying = '''They say: 
"It's a rainy day!"'''

print(saying)

They say: 
"It's a rainy day!"

Boolean

The Boolean (bool) type has two values: True and False.

the_truth = True

print(the_truth)

True

type(the_truth)

bool

lies = False

print(lies)

False

type(lies)

bool

Comparison Operators

Compare objects using comparison operators. The result is a Boolean value.

Operator	Description
`x == y`	is `x` equal to `y`?
`x != y`	is `x` not equal to `y`?
`x > y`	is `x` greater than `y`?
`x >= y`	is `x` greater than or equal to `y`?
`x < y`	is `x` less than `y`?
`x <= y`	is `x` less than or equal to `y`?
`x is y`	is `x` the same object as `y`?

2 < 3

True

"Data Science" != "Deep Learning"

True

2 == "2"

False

2 == 2.0

True
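Two more comparison behaviours are worth a quick look, shown here as a sketch. Comparisons can be chained, and strings compare character by character.

```python
n = 5
# Comparisons can be chained: this means (1 < n) and (n <= 10)
print(1 < n <= 10)  # True

# Strings compare character by character (by Unicode code point)
print("apple" < "banana")  # True
print("Zebra" < "apple")   # True: uppercase letters come before lowercase ones
```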

Note: we will discuss the is operator next week.

Python also has operators that work on Boolean values:

Operator	Description
`x and y`	are `x` and `y` both true?
`x or y`	is at least one of `x` and `y` true?
`not x`	is `x` false?

True and True

True

True and False

False

False or False

False

("Python 2" != "Python 3") and (2 <= 3)

True

not True

False

not not True

True
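To see all cases of and and or at once, this sketch builds a truth table. It uses itertools.product from the standard library to generate every pair of Booleans.

```python
from itertools import product

headers = ["x", "y", "x and y", "x or y"]
rows = []
# product([True, False], repeat=2) yields all four (x, y) pairs
for x, y in product([True, False], repeat=2):
    rows.append([x, y, x and y, x or y])

print("  ".join(f"{h:<7}" for h in headers))
for row in rows:
    print("  ".join(f"{str(v):<7}" for v in row))
```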

Casting

Sometimes (but rarely) we need to explicitly cast a value from one type to another.
Python tries to do something reasonable, or raises an error if there is no sensible conversion.

x = int(5.0)
x

5

type(x)

int

x = str(5.0)
x

'5.0'

type(x)

str

str(5.0) == 5.0

False

# list(5.0) # there is no reasonable thing to do here

int(5.3)

5
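The int(5.3) result shows that int() truncates toward zero instead of rounding. The sketch below illustrates this. It also covers casting from strings, where a string that is not an integer literal raises a ValueError.

```python
print(int(5.3), int(5.9), int(-5.9))  # 5 5 -5: int() truncates toward zero
print(round(5.9))                     # 6: use round() to get the nearest integer
print(int("42") + 1)                  # 43: a string of digits can be cast to int
print(float("3.5") * 2)               # 7.0

try:
    int("5.3")                        # not a valid integer literal
except ValueError as err:
    print("ValueError:", err)
```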

Lists and Tuples (20 min)

Lists and tuples allow us to store multiple things (“elements”) in a single object.
The elements are ordered.

my_list = [1, 2, "THREE", 4, 0.5]

print(my_list)

[1, 2, 'THREE', 4, 0.5]

type(my_list)

list

You can get the length of the list with len:

len(my_list)

5

today = (1, 2, "THREE", 4, 0.5)

print(today)

(1, 2, 'THREE', 4, 0.5)

type(today)

tuple

len(today)

5

Indexing and Slicing Sequences

We can access values inside a list, tuple, or string using the bracket syntax.
Python uses zero-based indexing, which means the first element of the list is in position 0, not position 1.
Sadly, R uses one-based indexing, so get ready to be confused.

my_list

[1, 2, 'THREE', 4, 0.5]

my_list[0]

1

my_list[4]

0.5

my_list[5]

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-75-075ca585e721> in <module>
----> 1 my_list[5]

IndexError: list index out of range

today[4]

0.5

We use negative indices to count backwards from the end of the list.

my_list

[1, 2, 'THREE', 4, 0.5]

my_list[-1]

0.5

We use the colon : to access a subsequence. This is called “slicing”.

my_list[1:4]

[2, 'THREE', 4]

Above: note that the start is inclusive and the end is exclusive.
So my_list[1:4] fetches the elements at positions 1, 2 and 3, but not 4.
In other words, it gets the 2nd, 3rd and 4th elements in the list.

We can omit the start or end:

my_list[:3]

[1, 2, 'THREE']

my_list[3:]

[4, 0.5]

my_list[:] # *almost* same as my_list - more details next week

[1, 2, 'THREE', 4, 0.5]
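Slices behave differently from single indices when they go out of range. Indexing past the end raises an IndexError, while a slice is simply cut off at the end of the sequence. A quick sketch:

```python
my_list = [1, 2, "THREE", 4, 0.5]

# Slicing past the end does not raise an error; the slice is truncated
print(my_list[3:100])  # [4, 0.5]
print(my_list[10:])    # []

# Negative indices work in slices too: the last two elements
print(my_list[-2:])    # [4, 0.5]
```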

Strings behave the same as lists and tuples when it comes to indexing and slicing.

alphabet = "abcdefghijklmnopqrstuvwxyz"

alphabet[0]

'a'

alphabet[-1]

'z'

alphabet[-3]

'x'

alphabet[:5]

'abcde'

alphabet[12:20]

'mnopqrst'
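Slices also accept an optional third number, the step. This goes slightly beyond the lecture, so treat it as a sketch. The same syntax works for lists and tuples.

```python
alphabet = "abcdefghijklmnopqrstuvwxyz"

# The full slice syntax is start:stop:step
print(alphabet[0:10:2])  # acegi
print(alphabet[::5])     # afkpuz
print(alphabet[::-1])    # zyxwvutsrqponmlkjihgfedcba (a negative step walks backwards)
```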

List Methods

A list is an object and it has methods for interacting with its data.
For example, list.append(item) appends an item to the end of the list.
See the documentation for more list methods.

primes = [2,3,5,7,11]
primes

[2, 3, 5, 7, 11]

len(primes)

5

primes.append(13)

primes

[2, 3, 5, 7, 11, 13]

len(primes)

6

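max, min and sum are built-in functions rather than list methods, but they work on lists too:

max(primes)

13

min(primes)

2

sum(primes)

41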

[1,2,3] + ["Hello", 7]

[1, 2, 3, 'Hello', 7]
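The documentation lists many more list methods. Here is a sketch of a few common ones, continuing from primes after the append above. Note that sort changes the list in place, while the built-in sorted returns a new list.

```python
primes = [2, 3, 5, 7, 11, 13]

primes.extend([17, 19])   # add several items to the end
primes.insert(0, 1)       # insert 1 at index 0 (1 is not prime; just for illustration)
removed = primes.pop(0)   # remove and return the item at index 0
print(removed)            # 1
print(primes.index(7))    # 3: position of the first 7

letters = ["c", "a", "b"]
letters.sort()            # sorts in place and returns None
print(letters)            # ['a', 'b', 'c']
print(sorted([3, 1, 2]))  # [1, 2, 3]: sorted() returns a new list instead
```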

Sets

Another built-in Python data type is the set, which stores an unordered collection of unique items.
More on sets in DSCI 512.

s = {2,3,5,11}
s

{2, 3, 5, 11}

{1,2,3} == {3,2,1}

True

[1,2,3] == [3,2,1]

False

s.add(2) # 2 is already in the set, so this does nothing
s

{2, 3, 5, 11}

s[0]

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-125-c9c96910e542> in <module>
----> 1 s[0]

TypeError: 'set' object is not subscriptable

Above: throws an error because elements are not ordered.
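Sets are not indexed, but they support membership tests and the usual set operations. Here is a sketch ahead of the fuller treatment in DSCI 512. The names list is made up. Because sets are unordered, printed element order is not guaranteed.

```python
# Building a set from a list is a quick way to drop duplicates
names = ["Ali", "Xinyi", "Ali", "Miriam", "Xinyi"]
unique = set(names)
print(len(unique))  # 3

a = {2, 3, 5, 11}
b = {1, 2, 3, 4}
print(a | b)    # union: contains 1, 2, 3, 4, 5, 11
print(a & b)    # intersection: contains 2, 3
print(a - b)    # difference: contains 5, 11
print(5 in a)   # True: membership testing does not need an index
```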

Mutable vs. Immutable Types

Strings and tuples are immutable types which means they cannot be modified.
Lists are mutable, so we can assign new values to their entries.
This is the main difference between lists and tuples.

names_list = ["Indiana","Fang","Linsey"]
names_list

['Indiana', 'Fang', 'Linsey']

names_list[0] = "Cool guy"
names_list

['Cool guy', 'Fang', 'Linsey']

names_tuple = ("Indiana","Fang","Linsey")
names_tuple

('Indiana', 'Fang', 'Linsey')

names_tuple[0] = "Not cool guy"

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-130-bd6a1b77b220> in <module>
----> 1 names_tuple[0] = "Not cool guy"

TypeError: 'tuple' object does not support item assignment

Same goes for strings. Once defined, we cannot modify the characters of the string.

my_name = "Mike"

my_name[-1] = 'q'

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-132-94c4564b18e3> in <module>
----> 1 my_name[-1] = 'q'

TypeError: 'str' object does not support item assignment

x = ([1,2,3],5)

x[1] = 7

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-138-415ce6bd0126> in <module>
----> 1 x[1] = 7

TypeError: 'tuple' object does not support item assignment

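x

([1, 2, 3], 5)

The tuple itself cannot be modified, but the list stored inside it is mutable, so we can still change that list's elements:

x[0][1] = 4
x

([1, 4, 3], 5)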

Break (5 min)

String Methods (5 min)

There are various useful string methods in Python.
MDS-CL students will soon be the experts we can go to for help!

all_caps = "HOW ARE YOU TODAY?"
print(all_caps)

HOW ARE YOU TODAY?

new_str = all_caps.lower()
new_str

'how are you today?'

Note that the method lower doesn’t change the original string but rather returns a new one.

all_caps

'HOW ARE YOU TODAY?'

There are many string methods. Check out the documentation.
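A few string methods that come up often are sketched below. The string text is a made-up example. Like lower, each method returns a new string or value and leaves text unchanged.

```python
text = "  How are you today?  "

print(text.strip())                       # How are you today? (surrounding spaces removed)
print(text.strip().upper())               # methods can be chained
print(text.replace("today", "tomorrow"))  # returns a new string
print("today" in text)                    # True: substring test
print(text.strip().startswith("How"))     # True
print("-".join(["DSCI", "511"]))          # DSCI-511
```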

all_caps.split()

['HOW', 'ARE', 'YOU', 'TODAY?']

all_caps.count("O")

3

One can explicitly cast a string to a list:

caps_list = list(all_caps)
caps_list

['H', 'O', 'W', ' ', 'A', 'R', 'E', ' ', 'Y', 'O', 'U', ' ', 'T', 'O', 'D', 'A', 'Y', '?']

len(all_caps)

18

len(caps_list)

18

String formatting

Python has ways of creating strings by “filling in the blanks” and formatting them nicely.
There are a few ways of doing this. See here and here for some discussion.

Old formatting style (borrowed from the C programming language):

template = "Hello, my name is %s. I am %.2f years old."

template % ("Newborn Baby", 4/12)

'Hello, my name is Newborn Baby. I am 0.33 years old.'

New formatting style (see documentation):

template_new = "Hello, my name is {}. I am {:.2f} years old."

template_new.format('Newborn Baby', 4/12)

'Hello, my name is Newborn Baby. I am 0.33 years old.'

Newer formatting style (see here) - note the f before the start of the string:

name = "Newborn Baby"
age = 4/12
template_new = f'Hello, my name is {name}. I am {age:.2f} years old.'
template_new

'Hello, my name is Newborn Baby. I am 0.33 years old.'
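The part after the colon, like .2f above, is a format specification. A few other specifications are handy for reporting numbers. The values below are made up for illustration.

```python
price = 2499999
share = 1 / 3
course = "DSCI"

print(f"{price:,}")                  # 2,499,999  (thousands separator)
print(f"{share:.1%}")                # 33.3%      (percentage with 1 decimal)
print(f"[{course:>8}]")              # [    DSCI]  (right-aligned in a width of 8)
print(f"{price / 1e6:.2f} million")  # 2.50 million
```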

Dictionaries (10 min)

A dictionary is a mapping between key-value pairs.

house = {'bedrooms': 3, 'bathrooms': 2, 'city': 'Vancouver', 'price': 2499999, 'date_sold': (1,3,2015)}

condo = {'bedrooms' : 2, 
         'bathrooms': 1, 
         'city'     : 'Burnaby', 
         'price'    : 699999, 
         'date_sold': (27,8,2011)
        }

We can access a specific field of a dictionary with square brackets:

house['price']

2499999

condo['city']

'Burnaby'

We can also edit dictionaries (they are mutable):

condo['price'] = 5 # price already in the dict
condo

{'bedrooms': 2,
 'bathrooms': 1,
 'city': 'Burnaby',
 'price': 5,
 'date_sold': (27, 8, 2011)}

We can also add a new key-value pair by assigning to a key that is not in the dictionary yet:

condo['flooring'] = "wood"

condo

{'bedrooms': 2,
 'bathrooms': 1,
 'city': 'Burnaby',
 'price': 5,
 'date_sold': (27, 8, 2011),
 'flooring': 'wood'}

We can delete fields entirely (though I rarely use this):

del condo["city"]

condo

{'bedrooms': 2,
 'bathrooms': 1,
 'price': 5,
 'date_sold': (27, 8, 2011),
 'flooring': 'wood'}

Keys don't have to be strings. Immutable values such as numbers and tuples can be keys too:

condo[5] = 443345

condo

{'bedrooms': 2,
 'bathrooms': 1,
 'price': 5,
 'date_sold': (27, 8, 2011),
 'flooring': 'wood',
 5: 443345}

condo[(1,2,3)] = 777
condo

{'bedrooms': 2,
 'bathrooms': 1,
 'price': 5,
 'date_sold': (27, 8, 2011),
 'flooring': 'wood',
 5: 443345,
 (1, 2, 3): 777}
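Mutable values such as lists cannot be used as keys. The sketch below shows the error. It uses a small stand-in dictionary d instead of condo so the notebook state is not disturbed.

```python
d = {"bedrooms": 2}  # a small stand-in dictionary for this sketch

d[(1, 2, 3)] = 777      # a tuple is immutable, so it can be a key
try:
    d[[1, 2, 3]] = 777  # a list is mutable, so Python refuses to use it as a key
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)
```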

Looking up a key that is not in the dictionary raises a KeyError:

condo["nothere"]

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[127], line 1
----> 1 condo["nothere"]

KeyError: 'nothere'

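A sometimes useful trick about default values. When the key exists,

condo["bedrooms"]

2

gives the same result as

condo.get("bedrooms")

2

The difference is that get returns None instead of raising a KeyError when the key is missing, and it also lets you supply your own default value:

condo.get("bedrooms", "unknown")

2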

condo.get("fireplaces", "unknown")

'unknown'
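A common use of a get default is counting things. The sketch below counts words in a made-up sentence. counts.get(word, 0) returns 0 the first time a word is seen, so no KeyError is raised.

```python
words = "the cat and the hat and the bat".split()

counts = {}
for word in words:
    # Start from 0 for unseen words, then add one
    counts[word] = counts.get(word, 0) + 1

print(counts)  # {'the': 3, 'cat': 1, 'and': 2, 'hat': 1, 'bat': 1}
```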

A common operation is finding the maximum dictionary key by value.
There are a few ways to do this, see this StackOverflow page.
One way of doing it:

word_lengths = {"the": 3, "list": 4, "of": 2, "words": 5} # example dict: word -> its length
max(word_lengths, key=word_lengths.get)

'words'

Here word_lengths.get is the get method we saw above, passed without calling it. max calls it on each key of the dict and returns the key whose value is largest.

Empties

lst = list() # empty list
lst

[]

lst = [] # empty list
lst

[]

tup = tuple() # empty tuple
tup

()

tup = () # empty tuple
tup

()

dic = dict() # empty dict
dic

{}

dic = {} # empty dict
dic

{}

st = set() # empty set
st

set()

st = {} # NOT an empty set!
type(st)

dict

st = {1}
type(st)

set
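Empty containers are handy in conditions because Python treats them as False in a Boolean context, and non-empty ones as True. The sketch below checks a few values with bool().

```python
# Empty containers (and 0, "" and None) count as False;
# non-empty ones count as True, even if they contain a "falsy" value
for value in [[], (), {}, set(), "", 0, None, [0], "0", {1}]:
    print(repr(value), bool(value))
```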

Conditionals (10 min)

Conditional statements allow us to write programs where only certain blocks of code are executed depending on the state of the program.
Let’s look at some examples and take note of the keywords, syntax and indentation.
Check out the Python documentation and Think Python (Chapter 5) for more information about conditional execution.

name = input("What's your name?")

if name.lower() == 'mike':
    print("That's my name too!")
elif name.lower() == 'santa':
    print("That's a funny name.")
else:
    print("Hello {}! That's a cool name.".format(name))

    print('Nice to meet you!')

What's your name?mike
That's my name too!

bool(None)

False

The main points to notice:

- Use the keywords if, elif and else.
- A colon : ends each if, elif and else line.
- Indentation (by 4 spaces) defines code blocks.
- In an if statement, only the first block whose condition evaluates to True is executed; the program then skips the rest of the if statement.
- An if statement doesn't necessarily need elif or else.
- elif lets us check several conditions.
- else lets us run a default block if all other conditions are False.
- The end of the entire if statement is where the indentation returns to the same level as the first if keyword.
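The "first True condition wins" rule is easier to see without input(). The sketch below uses a hypothetical grading function, and the cutoffs are made up.

```python
def letter_grade(score):
    # Hypothetical cutoffs. Conditions are checked top to bottom and only the
    # first block whose condition is True runs, so 95 returns "A" even though
    # score >= 80 is also True
    if score >= 90:
        return "A"
    elif score >= 80:
        return "B"
    elif score >= 70:
        return "C"
    else:
        return "F"

for s in [95, 85, 72, 40]:
    print(s, letter_grade(s))
```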

If statements can also be nested inside of one another:

name = input("What's your name?")

if name.lower() == 'mike':
    print("That's my name too!")
elif name.lower() == 'santa':
    print("That's a funny name.")
else:
    print("Hello {0}! That's a cool name.".format(name))
    if name.lower().startswith("super"):
        print("Do you have superpowers?")

print('Nice to meet you!')

What's your name?supersam
Hello supersam! That's a cool name.
Do you have superpowers?
Nice to meet you!

Inline if/else

words = ["the", "list", "of", "words"]

x = "long list" if len(words) > 10 else "short list"
x

'short list'

This is equivalent to the longer if/else statement:

if len(words) > 10:
    x = "long list"
else:
    x = "short list"
x

'short list'

(optional) short-circuiting

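Python evaluates Boolean expressions from left to right and stops as soon as the result is known. This is called short-circuiting. It means the right-hand side is sometimes never evaluated:

BLAH # not defined

NameError: name 'BLAH' is not defined

True or BLAH # True no matter what BLAH is, so BLAH is never evaluated

True

True and BLAH # the result depends on BLAH, so it is evaluated

NameError: name 'BLAH' is not defined

False and BLAH # False no matter what BLAH is, so BLAH is never evaluated

False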
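To watch short-circuiting happen, this sketch uses a hypothetical helper, check. It records which operands were actually evaluated.

```python
calls = []

def check(label, value):
    # Hypothetical helper: records that it was evaluated, then returns value
    calls.append(label)
    return value

result = check("left", False) and check("right", True)
print(result, calls)  # False ['left']: `and` stopped after a False left side

calls.clear()
result = check("left", True) or check("right", False)
print(result, calls)  # True ['left']: `or` stopped after a True left side

calls.clear()
result = check("left", True) and check("right", False)
print(result, calls)  # False ['left', 'right']: both sides were needed
```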