Introduction to Python#

In this class, we will watch the first of four lectures by Dr. Mike Gelbart, option co-director of the UBC-Vancouver MDS program.

About this course (5 min)#

High-level overview:#

  • The MDS program has a programming prerequisite.

  • Therefore, this course does not start from “no programming knowledge”.

    • You should know what an if statement is.

    • You should know what a for loop is.

    • You should know what a function is.

  • However, not all of you have used Python/R.

  • So, this course is about Python-specific and R-specific syntax/knowledge.

  • We will cover things like loops, but just the syntax, not the concept of a loop.

  • Weeks 1&2: Python, lectures by Mike Gelbart

  • Weeks 3&4: R, lectures by Tiffany Timbers

Lecture Outline:#

  • Basic datatypes (20 min)

  • Lists and tuples (20 min)

  • Break (5 min)

  • String methods (5 min)

  • Dictionaries (10 min)

  • Conditionals (10 min)

Basic datatypes (20 min)#

  • A value is a piece of data that a computer program works with such as a number or text.

  • There are different types of values: 42 is an integer and "Hello!" is a string.

  • A variable is a name that refers to a value.

    • In mathematics and statistics, we usually use variable names like \(x\) and \(y\).

    • In Python, we can use any word as a variable name, as long as it starts with a letter or underscore, contains only letters, digits, and underscores, and is not a reserved word such as for, while, class, or lambda.

  • And we use the assignment operator = to assign a value to a variable.

See the Python 3 documentation for a summary of the standard built-in Python datatypes. See Think Python (Chapter 2) for a discussion of variables, expressions and statements in Python.
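The naming rules above can be checked from Python itself. This is a minimal sketch using the standard-library keyword module and the str.isidentifier method; the candidate names and the is_valid_name helper are made up for illustration.

```python
import keyword

# Candidate names (hypothetical examples, not from the lecture)
candidates = ["x", "course_name", "_private", "2fast", "for", "lambda"]

def is_valid_name(name):
    """Return True if `name` can be used as a variable name."""
    # isidentifier() checks the characters; iskeyword() rules out reserved words
    return name.isidentifier() and not keyword.iskeyword(name)

results = {name: is_valid_name(name) for name in candidates}
print(results)
```

Note that isidentifier alone is not enough: it returns True for reserved words such as for, which is why the keyword check is included.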

Common built-in Python data types#

| English name | Type name | Description | Example |
|---|---|---|---|
| integer | int | positive/negative whole numbers | 42 |
| floating point number | float | real number in decimal form | 3.14159 |
| boolean | bool | true or false | True |
| string | str | text | "I Can Has Cheezburger?" |
| list | list | a collection of objects - mutable & ordered | ['Ali','Xinyi','Miriam'] |
| tuple | tuple | a collection of objects - immutable & ordered | ('Thursday',6,9,2018) |
| dictionary | dict | mapping of key-value pairs | {'name':'DSCI','code':511,'credits':2} |
| none | NoneType | represents no value | None |

Numeric Types#

x = 42
type(x)
int
print(x)
42
x # in Jupyter/IPython we don't need to explicitly print for the last line of a cell
42
pi = 3.14159
print(pi)
3.14159
type(pi)
float
λ = 2

Arithmetic Operators#

The syntax for the arithmetic operators is:

| Operator | Description |
|---|---|
| + | addition |
| - | subtraction |
| * | multiplication |
| / | division |
| ** | exponentiation |
| // | integer division |
| % | modulo |

Let’s apply these operators to numeric types and observe the results.

1 + 2 + 3 + 4 + 5
15
0.1 + 0.2
0.30000000000000004

Tip

From Firas: This is floating point arithmetic. For an explanation of what’s going on, see this tutorial.
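A practical consequence is that floats should rarely be compared with ==. Here is a small sketch of two common workarounds, assuming the default tolerance of the standard-library math.isclose is acceptable for your use case:

```python
import math

total = 0.1 + 0.2

# Exact comparison fails because 0.1 and 0.2 have no exact binary representation
exactly_equal = (total == 0.3)

# math.isclose compares within a relative tolerance (default rel_tol=1e-09)
close_enough = math.isclose(total, 0.3)

# Rounding before comparing is another common workaround
rounded_equal = (round(total, 10) == 0.3)

print(exactly_equal, close_enough, rounded_equal)
```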

2 * 3.14159
6.28318
2**10
1024
type(2**10)
int
2.0**10
1024.0
int_2 = 2
float_2 = 2.0
float_2_again = 2.
101 / 2
50.5
101 // 2 # "integer division" - always rounds down
50
101 % 2 # "101 mod 2", or the remainder when 101 is divided by 2
1
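"Always rounds down" matters most for negative numbers. A short sketch of how // and % behave there, and how that differs from int(), which truncates toward zero:

```python
# Floor division rounds toward negative infinity, not toward zero
positive = 7 // 2        # 3
negative = -7 // 2       # -4, because -3.5 rounds down to -4

# The remainder takes the sign of the divisor
remainder = -7 % 2       # 1, since -7 == 2 * (-4) + 1

# divmod returns the quotient and the remainder together
quotient, rest = divmod(101, 2)

# int() truncates toward zero instead, so the two can differ
truncated = int(-7 / 2)  # -3
```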

None#

  • NoneType is its own type in Python.

  • It only has one possible value, None

x = None
print(x)
None
type(x)
NoneType

You may have seen similar things in other languages, like null in Java, etc.
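None shows up most often as the result of something that does not return a value. A minimal sketch (the say_hello function is a made-up example):

```python
primes = [5, 3, 2]

# list.sort() sorts in place and returns None
result = primes.sort()

# A function with no return statement also returns None
def say_hello():
    print("Hello!")

returned = say_hello()

# The usual way to test for None is with `is`
print(result is None, returned is None)
```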

Strings#

  • Text is stored as a type called a string.

  • We think of a string as a sequence of characters.

  • We write strings as characters enclosed with either:

    • single quotes, e.g., 'Hello'

    • double quotes, e.g., "Goodbye"

    • triple single quotes, e.g., '''Yesterday'''

    • triple double quotes, e.g., """Tomorrow"""

my_name = "Mike Gelbart"
print(my_name)
Mike Gelbart
type(my_name)
str
course = 'DSCI 511'
print(course)
DSCI 511
type(course)
str

If the string contains a quotation or apostrophe, we can use double quotes or triple quotes to define the string.

sentence = "It's a rainy day."
print(sentence)
It's a rainy day.
type(sentence)
str
saying = '''They say: 
"It's a rainy day!"'''
print(saying)
They say: 
"It's a rainy day!"

Boolean#

  • The Boolean (bool) type has two values: True and False.

the_truth = True
print(the_truth)
True
type(the_truth)
bool
lies = False
print(lies)
False
type(lies)
bool

Comparison Operators#

Compare objects using comparison operators. The result is a Boolean value.

| Operator | Description |
|---|---|
| x == y | is x equal to y? |
| x != y | is x not equal to y? |
| x > y | is x greater than y? |
| x >= y | is x greater than or equal to y? |
| x < y | is x less than y? |
| x <= y | is x less than or equal to y? |
| x is y | is x the same object as y? |

2 < 3
True
"Data Science" != "Deep Learning"
True
2 == "2"
False
2 == 2.0
True

Note: we will discuss is next week.

Operators on Boolean values.

| Operator | Description |
|---|---|
| x and y | are x and y both true? |
| x or y | is at least one of x and y true? |
| not x | is x false? |

True and True
True
True and False
False
False or False
False
("Python 2" != "Python 3") and (2 <= 3)
True
not True
False
not not True
True

Casting#

  • Sometimes (but rarely) we need to explicitly cast a value from one type to another.

  • Python tries to do something reasonable, or throws an error if it has no ideas.

x = int(5.0)
x
5
type(x)
int
x = str(5.0)
x
'5.0'
type(x)
str
str(5.0) == 5.0
False
# list(5.0) # there is no reasonable thing to do here
int(5.3)
5
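A few more casts that come up in practice, as a rough sketch. The try/except is only there so the failing conversion does not stop the cell:

```python
# Strings that look like numbers can be converted
count = int("42")
price = float("3.50")

# int() truncates toward zero; round() goes to the nearest integer
truncated = int(5.9)    # 5
nearest = round(5.9)    # 6

# A conversion with no sensible answer raises an error
try:
    int("five")
    failed = False
except ValueError:
    failed = True
```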

Lists and Tuples (20 min)#

  • Lists and tuples allow us to store multiple things (“elements”) in a single object.

  • The elements are ordered.

my_list = [1, 2, "THREE", 4, 0.5]
print(my_list)
[1, 2, 'THREE', 4, 0.5]
type(my_list)
list

You can get the length of the list with len:

len(my_list)
5
today = (1, 2, "THREE", 4, 0.5)
print(today)
(1, 2, 'THREE', 4, 0.5)
type(today)
tuple
len(today)
5

Indexing and Slicing Sequences#

  • We can access values inside a list, tuple, or string using the bracket syntax.

  • Python uses zero-based indexing, which means the first element of the list is in position 0, not position 1.

  • Sadly, R uses one-based indexing, so get ready to be confused.

my_list
[1, 2, 'THREE', 4, 0.5]
my_list[0]
1
my_list[4]
0.5
my_list[5]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-75-075ca585e721> in <module>
----> 1 my_list[5]

IndexError: list index out of range
today[4]
0.5

We use negative indices to count backwards from the end of the list.

my_list
[1, 2, 'THREE', 4, 0.5]
my_list[-1]
0.5

We use the colon : to access a subsequence. This is called “slicing”.

my_list[1:4]
[2, 'THREE', 4]
  • Above: note that the start is inclusive and the end is exclusive.

  • Similarly, my_list[1:3] fetches the elements at indices 1 and 2, but not 3.

  • In other words, it gets the 2nd and 3rd elements in the list.

We can omit the start or end:

my_list[:3]
[1, 2, 'THREE']
my_list[3:]
[4, 0.5]
my_list[:] # *almost* same as my_list - more details next week
[1, 2, 'THREE', 4, 0.5]
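Slices have a few more features worth knowing. A short sketch using the same my_list; the optional third number (the step) is not covered above:

```python
my_list = [1, 2, "THREE", 4, 0.5]

# A slice from start to stop has stop - start elements
middle = my_list[1:4]        # [2, 'THREE', 4]

# Negative indices work in slices too
last_two = my_list[-2:]      # [4, 0.5]

# An optional third number is the step
every_other = my_list[::2]   # [1, 'THREE', 0.5]
backwards = my_list[::-1]    # [0.5, 4, 'THREE', 2, 1]

# Unlike indexing, slicing past the end does not raise an IndexError
past_end = my_list[3:100]    # [4, 0.5]
```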

Strings behave the same as lists and tuples when it comes to indexing and slicing.

alphabet = "abcdefghijklmnopqrstuvwxyz"
alphabet[0]
'a'
alphabet[-1]
'z'
alphabet[-3]
'x'
alphabet[:5]
'abcde'
alphabet[12:20]
'mnopqrst'

List Methods#

  • A list is an object and it has methods for interacting with its data.

  • For example, list.append(item) appends an item to the end of the list.

  • See the documentation for more list methods.

primes = [2,3,5,7,11]
primes
[2, 3, 5, 7, 11]
len(primes)
5
primes.append(13)
primes
[2, 3, 5, 7, 11, 13]
len(primes)
6
max(primes)
13
min(primes)
2
sum(primes)
41
[1,2,3] + ["Hello", 7]
[1, 2, 3, 'Hello', 7]
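Beyond append, a handful of other list methods cover most everyday needs. A quick sketch (the scores list is a made-up example):

```python
primes = [2, 3, 5, 7, 11]

primes.extend([13, 17])    # add several items at the end
primes.insert(0, 1)        # insert 1 at position 0 (1 is not actually prime!)
removed = primes.pop(0)    # remove and return the item at position 0

scores = [3, 1, 2]
ordered = sorted(scores)   # returns a new sorted list; scores is unchanged
scores.sort(reverse=True)  # sorts in place, largest first

position = primes.index(7)  # position of the first 7
has_eleven = 11 in primes   # membership test
```

Note the difference between sorted, which returns a new list, and sort, which changes the list in place and returns None.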

Sets#

  • Another built-in Python data type is the set, which stores an unordered collection of unique items.

  • More on sets in DSCI 512.

s = {2,3,5,11}
s
{2, 3, 5, 11}
{1,2,3} == {3,2,1}
True
[1,2,3] == [3,2,1]
False
s.add(2) # does nothing
s
{2, 3, 5, 11}
s[0]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-125-c9c96910e542> in <module>
----> 1 s[0]

TypeError: 'set' object is not subscriptable

Above: throws an error because elements are not ordered.
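Sets are handy for removing duplicates and for set algebra. A minimal sketch; if you need a particular element by position, one option is to sort the set into a list first:

```python
names = ["Ali", "Xinyi", "Ali", "Miriam", "Xinyi"]
unique_names = set(names)      # duplicates are dropped

a = {2, 3, 5, 11}
b = {1, 2, 3}

union = a | b                  # items in either set
intersection = a & b           # items in both sets
difference = a - b             # items in a but not in b
has_five = 5 in a              # membership test

# Sets cannot be indexed, but sorted() returns a list that can
first = sorted(a)[0]
```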

Mutable vs. Immutable Types#

  • Strings and tuples are immutable types which means they cannot be modified.

  • Lists are mutable, so we can assign new values to their entries.

  • This is the main difference between lists and tuples.

names_list = ["Indiana","Fang","Linsey"]
names_list
['Indiana', 'Fang', 'Linsey']
names_list[0] = "Cool guy"
names_list
['Cool guy', 'Fang', 'Linsey']
names_tuple = ("Indiana","Fang","Linsey")
names_tuple
('Indiana', 'Fang', 'Linsey')
names_tuple[0] = "Not cool guy"
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-130-bd6a1b77b220> in <module>
----> 1 names_tuple[0] = "Not cool guy"

TypeError: 'tuple' object does not support item assignment

The same goes for strings. Once a string is defined, we cannot modify its characters.

my_name = "Mike"
my_name[-1] = 'q'
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-132-94c4564b18e3> in <module>
----> 1 my_name[-1] = 'q'

TypeError: 'str' object does not support item assignment
x = ([1,2,3],5)
x[1] = 7
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-138-415ce6bd0126> in <module>
----> 1 x[1] = 7

TypeError: 'tuple' object does not support item assignment
x
([1, 2, 3], 5)
x[0][1] = 4
x
([1, 4, 3], 5)
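The last cell can look surprising: the tuple is immutable, yet its contents changed. A sketch of what is going on, assuming the usual mental model that a tuple stores references to objects (the name inner is introduced here for illustration):

```python
inner = [1, 2, 3]
x = (inner, 5)

# Changing the list in place is allowed: the tuple still refers to the same list
x[0][1] = 4
same_object = x[0] is inner

# Replacing the element itself is not allowed
try:
    x[0] = [9, 9, 9]
    replaced = True
except TypeError:
    replaced = False

print(x, same_object, replaced)
```

So immutability means the tuple always points at the same objects, not that those objects can never change.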

Break (5 min)#

String Methods (5 min)#

  • There are various useful string methods in Python.

  • MDS-CL students will soon be the experts we can go to for help!

all_caps = "HOW ARE YOU TODAY?"
print(all_caps)
HOW ARE YOU TODAY?
new_str = all_caps.lower()
new_str
'how are you today?'

Note that the method lower doesn’t change the original string but rather returns a new one.

all_caps
'HOW ARE YOU TODAY?'

There are many string methods. Check out the documentation.

all_caps.split()
['HOW', 'ARE', 'YOU', 'TODAY?']
all_caps.count("O")
3

One can explicitly cast a string to a list:

caps_list = list(all_caps)
caps_list
['H',
 'O',
 'W',
 ' ',
 'A',
 'R',
 'E',
 ' ',
 'Y',
 'O',
 'U',
 ' ',
 'T',
 'O',
 'D',
 'A',
 'Y',
 '?']
len(all_caps)
18
len(caps_list)
18
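A few more string methods, as a quick sketch. Like lower, each of these returns a new string (or list) and leaves the original untouched; the raw string is a made-up example:

```python
raw = "  how are you today?  "

cleaned = raw.strip()                            # remove leading/trailing whitespace
title = cleaned.title()                          # capitalize each word
replaced = cleaned.replace("today", "tomorrow")  # substitute a substring
words = cleaned.split()                          # split on whitespace
rejoined = "-".join(words)                       # glue a list of strings back together
starts = cleaned.startswith("how")               # True/False test
```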

String formatting#

  • Python has ways of creating strings by “filling in the blanks” and formatting them nicely.

  • There are a few ways of doing this. See here and here for some discussion.

Old formatting style (borrowed from the C programming language):

template = "Hello, my name is %s. I am %.2f years old."
template % ("Newborn Baby", 4/12)
'Hello, my name is Newborn Baby. I am 0.33 years old.'

New formatting style (see documentation):

template_new = "Hello, my name is {}. I am {:.2f} years old."
template_new.format('Newborn Baby', 4/12)
'Hello, my name is Newborn Baby. I am 0.33 years old.'

Newer formatting style (see here) - note the f before the start of the string:

name = "Newborn Baby"
age = 4/12
template_new = f'Hello, my name is {name}. I am {age:.2f} years old.'
template_new
'Hello, my name is Newborn Baby. I am 0.33 years old.'
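The part after the colon inside the braces is a format specification, and .2f is only one of many. A short sketch of a few others that are useful when printing numbers (the values are made up):

```python
price = 2499999
share = 0.256
city = "Vancouver"

with_commas = f"{price:,}"    # thousands separator
as_percent = f"{share:.1%}"   # multiply by 100 and add a % sign
padded = f"{city:>12}"        # right-align in a field 12 characters wide
zero_padded = f"{7:03d}"      # pad an integer with leading zeros

print(with_commas, as_percent, padded, zero_padded)
```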

Dictionaries (10 min)#

A dictionary is a mapping from keys to values, stored as key-value pairs.

house = {'bedrooms': 3, 'bathrooms': 2, 'city': 'Vancouver', 'price': 2499999, 'date_sold': (1,3,2015)}

condo = {'bedrooms' : 2, 
         'bathrooms': 1, 
         'city'     : 'Burnaby', 
         'price'    : 699999, 
         'date_sold': (27,8,2011)
        }

We can access a specific field of a dictionary with square brackets:

house['price']
2499999
condo['city']
'Burnaby'

We can also edit dictionaries (they are mutable):

condo['price'] = 5 # price already in the dict
condo
{'bedrooms': 2,
 'bathrooms': 1,
 'city': 'Burnaby',
 'price': 5,
 'date_sold': (27, 8, 2011)}
condo['flooring'] = "wood"
condo
{'bedrooms': 2,
 'bathrooms': 1,
 'city': 'Burnaby',
 'price': 5,
 'date_sold': (27, 8, 2011),
 'flooring': 'wood'}

We can delete fields entirely (though I rarely use this):

del condo["city"]
condo
{'bedrooms': 2,
 'bathrooms': 1,
 'price': 5,
 'date_sold': (27, 8, 2011),
 'flooring': 'wood'}
condo[5] = 443345
condo
{'bedrooms': 2,
 'bathrooms': 1,
 'price': 5,
 'date_sold': (27, 8, 2011),
 'flooring': 'wood',
 5: 443345}
condo[(1,2,3)] = 777
condo
{'bedrooms': 2,
 'bathrooms': 1,
 'price': 5,
 'date_sold': (27, 8, 2011),
 'flooring': 'wood',
 5: 443345,
 (1, 2, 3): 777}
condo["nothere"]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[127], line 1
----> 1 condo["nothere"]

KeyError: 'nothere'

A sometimes useful trick about default values:

condo["bedrooms"]
2

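gives the same result as the call below when the key exists; for a missing key, get returns None instead of raising a KeyError: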

condo.get("bedrooms")
2

With this syntax you can also use default values:

condo.get("bedrooms", "unknown")
2
condo.get("fireplaces", "unknown")
'unknown'
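Two other everyday dictionary operations are checking whether a key exists and looping over the contents. A minimal sketch on a smaller, made-up version of condo:

```python
condo = {"bedrooms": 2, "bathrooms": 1, "price": 699999}

# `in` checks the keys, not the values
has_price = "price" in condo
has_city = "city" in condo

keys = list(condo.keys())
values = list(condo.values())

# items() yields (key, value) pairs
lines = []
for key, value in condo.items():
    lines.append(f"{key}: {value}")

# get without a default returns None for a missing key
missing = condo.get("fireplaces")
```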
  • A common operation is finding the dictionary key with the largest value.

  • There are a few ways to do this, see this StackOverflow page.

  • One way of doing it:

word_lengths = {"the": 3, "list": 4, "of": 2, "words": 5} # example dict, added here so the cell runs
max(word_lengths, key=word_lengths.get)
'words'

We saw the get method above. Passing key=word_lengths.get tells max to call this function on each key of the dict and to compare the results, so the keys are ranked by their values.

Empties#

lst = list() # empty list
lst
[]
lst = [] # empty list
lst
[]
tup = tuple() # empty tuple
tup
()
tup = () # empty tuple
tup
()
dic = dict() # empty dict
dic
{}
dic = {} # empty dict
dic
{}
st = set() # empty set
st
set()
st = {} # NOT an empty set!
type(st)
dict
st = {1}
type(st)
set
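The {} pitfall usually bites when building a set step by step. A short sketch; the numbers are made up, and the try/except is only there so the failing call does not stop the cell:

```python
# Start from set(), not {}, when building a set step by step
seen = set()
for n in [2, 3, 2, 5, 3]:
    seen.add(n)          # duplicates are ignored

# {} creates a dict, which has no add method
not_a_set = {}
try:
    not_a_set.add(1)
    add_worked = True
except AttributeError:
    add_worked = False

# For the other types, the constructor call and the literal are equal
same_list = list() == []
same_tuple = tuple() == ()
same_dict = dict() == {}
```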

Conditionals (10 min)#

  • Conditional statements allow us to write programs where only certain blocks of code are executed depending on the state of the program.

  • Let’s look at some examples and take note of the keywords, syntax and indentation.

  • Check out the Python documentation and Think Python (Chapter 5) for more information about conditional execution.

name = input("What's your name?")

if name.lower() == 'mike':
    print("That's my name too!")
elif name.lower() == 'santa':
    print("That's a funny name.")
else:
    print("Hello {}! That's a cool name.".format(name))

    print('Nice to meet you!')
What's your name?mike
That's my name too!
bool(None)
False

The main points to notice:

  • Use keywords if, elif and else

  • The colon : ends each conditional expression

  • Indentation (four spaces by convention) defines code blocks

  • In an if statement, the first block whose condition evaluates to True is executed, and the program then skips the rest of the if statement

  • if statements don’t necessarily need elif or else

  • elif lets us check several conditions

  • else lets us evaluate a default block if all other conditions are False

  • the end of the entire if statement is where the indentation returns to the same level as the first if keyword
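The "first matching block wins" rule is easiest to see when several conditions are true at once. A minimal sketch; size_label is a hypothetical function written for this illustration, and it returns strings rather than using input() so it is easy to test:

```python
def size_label(n):
    """Return a rough label for the number n."""
    if n > 100:
        label = "huge"
    elif n > 10:
        label = "big"      # skipped for n = 150, even though 150 > 10 is True
    elif n > 0:
        label = "small"
    else:
        label = "empty"    # default when no condition is True
    # Back at the indentation level of `if`: the if statement has ended
    return label

labels = [size_label(n) for n in [150, 42, 3, 0]]
print(labels)
```

For n = 150 all three conditions hold, but only the first block runs.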

If statements can also be nested inside of one another:

name = input("What's your name?")

if name.lower() == 'mike':
    print("That's my name too!")
elif name.lower() == 'santa':
    print("That's a funny name.")
else:
    print("Hello {0}! That's a cool name.".format(name))
    if name.lower().startswith("super"):
        print("Do you have superpowers?")

print('Nice to meet you!')
What's your name?supersam
Hello supersam! That's a cool name.
Do you have superpowers?
Nice to meet you!

Inline if/else#

words = ["the", "list", "of", "words"]

x = "long list" if len(words) > 10 else "short list"
x
'short list'
if len(words) > 10:
    x = "long list"
else:
    x = "short list"
x
'short list'
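The inline form is most useful for small choices inside a larger expression. A short sketch; pluralize is a made-up helper, not something from the lecture:

```python
def pluralize(count, noun):
    """Return e.g. '1 word' or '4 words'."""
    # The inline if/else picks one of two values in a single expression
    suffix = "" if count == 1 else "s"
    return f"{count} {noun}{suffix}"

words = ["the", "list", "of", "words"]
summary = pluralize(len(words), "word")
print(summary)
```

For anything longer than a simple choice between two values, the regular multi-line if/else tends to be easier to read.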

(optional) short-circuiting#

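BLAH # not defined: raises NameError
True or BLAH # True: BLAH is never evaluated
True and BLAH # raises NameError: BLAH must be evaluated
False and BLAH # False: BLAH is never evaluated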
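Python evaluates and/or from left to right and stops as soon as the answer is known. A sketch that makes this visible; check is a hypothetical helper that records which operands were actually evaluated:

```python
calls = []

def check(label, value):
    """Record that this operand was evaluated, then return its value."""
    calls.append(label)
    return value

result_or = check("a", True) or check("b", True)      # "b" is never evaluated
result_and = check("c", False) and check("d", True)   # "d" is never evaluated
result_both = check("e", True) and check("f", False)  # both are evaluated

# The same reasoning applies to an undefined name such as BLAH
try:
    True and BLAH
    raised = False
except NameError:
    raised = True

safe = False and BLAH   # no error: BLAH is never looked up

print(calls)
```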