Pyton - part 1

Language basics, collections, and hardware-software interfaceMaterial adapted from Justin Johnson’s Python tutorial, from Python 3 Tutorial, from Cornell Virtual Workshop tutorial on Python for High Performance, and from When to Use a List Comprehension in Python by James Timmins.A Jupyter notebook version of this material is available online in Google Colab.

Lesson plan

Q&A for homework 1 + lecture outline
Language basics (with an in-class exercise), common operations, and built-in values
ICE: Write a program to compute the greatest common divisor
Collections and loops (with a couple of ICEs)
Hardware-software interface: virtual space
Summary

Basics of Python

Python overview

Python is easy to start working with.
It is a great general-purpose programming language on its own, but with the help of a few popular libraries (numpy, scipy, matplotlib) it becomes a powerful environment for scientific computing (similar to Matlab or Octave).
It is used by major deep learning frameworks like PyTorch and TensorFlow.
Python is a high-level, strongly typed (interpreter always respects the types of each variable), dynamically typed (a variable is simply a value bound to a name), multiparadigm programming language (one can code in imperative as well as functional paradigms). Python code is often said to be almost like pseudocode, since it allows you to express very powerful ideas in very few lines of code while being very readable.

Python Internals

Python is interpreted (i.e., executed in an interpreter)
Compiled programs (e.g., C/C++, Fortran) generally run faster than interpreted programs
Interpreted languages are often better for rapid prototyping and high-level program control
Optimizing “time to science” might suggest prioritizing program development time over computational run time (or maybe vice versa)
How should we best use Python for applications that require numerically intensive computations?
Can we have the best of both worlds (expressiveness and performance)?
If there are tradeoffs, where are they and how can they be mitigated?
Python is both an interpreted and a compiled language. But calling Python a compiled language would be misleading. This would imply that the compiler translates the Python code into machine language. Python code is translated into intermediate code, then compiled by a VM implementation, known as the PVM, the Python Virtual Machine, into machine instructions (most commonly using C). This is a similar approach to the one taken by Java. There is even a way of translating Python programs into Java byte code for the Java Virtual Machine (JVM). This can be achieved with Jython.

Because of this, Python is “slower,” but it can run highly optimized C/C++ subroutines which make scientific computing (e.g. matrix multiplication with numpy) really fast.

Therefore, should we compile Python scripts to make them faster or how can Python scripts be compiled? Normally, you don’t need to do anything and you shouldn’t bother, because “Python” is already doing the thinking for you, i.e., it takes the necessary steps automatically.

For whatever reason you want to compile a python program manually? No problem. It can be done with the module py_compile, either using the interpreter shell or using the following command at the shell prompt

import py_compile
py_compile.compile('my_first_simple_program.py')

!python -m py_compile my_first_simple_program.py

Either way, you may notice two things. First, there will be a new subdirectory “pycache”, if it doesn’t already exist. You will find a file “my_first_simple_script.cpython-34.pyc” in this subdirectory. This is the compiled version of our file in byte code.

You can also automatically compile all Python files using the compileall module. You can do it from the shell prompt by running compileall.py and providing the path of the directory containing the Python files to compile.

However, you don’t have to and shouldn’t bother about compiling Python code. The compilation is hidden from the user for a good reason. Some newbies wonder sometimes where these ominous files with the .pyc suffix might come from. If Python has write-access for the directory where the Python program resides, it will store the compiled byte code in a file that ends with a .pyc suffix. If Python has no write access, the program will work anyway. The byte code will be produced but discarded when the program exits. Whenever a Python program is called, Python will check, if a compiled version with the .pyc suffix exists. This file has to be newer than the file with the .py suffix. If such a file exists, Python will load the byte code, which will speed up the start up time of the script. If there is no byte code version, Python will create the byte code before it starts the execution of the program. Execution of a Python program means execution of the byte code on the Python.

Every time a Python script is executed, a byte code is created. If a Python script is imported as a module, the byte code will be stored in the corresponding .pyc file.

Comments

# single-line comments

'''
multi-line comments
'''

"""
multi-line comments as well
"""

Indenting Code

Python programs get structured through indentation, i.e., code blocks are defined by their indentation. Okay that’s what we expect from any program code, isn’t it? Yes, but in the case of Python it’s a language requirement, not a matter of style. This principle makes it easier to read and understand other people’s Python code.

For example, the following program implements an algorithm to calculate Pythagorean triples. A Pythagorean triple consists of three positive integers $a$, $b$, and $c$, such that $a^2 + b^2 = c^2$.

# calculate Pythagorean triples
from math import sqrt
n = input("Maximum Number? ")
n = int(n)+1
for a in range(1,n):
    for b in range(a,n):
        c_square = a**2 + b**2
        c = int(sqrt(c_square))
    if ((c_square - c**2) == 0):
        print(a, b, c)

So, how does it work? All statements with the same distance to the right belong to the same block of code, i.e. the statements within a block line up vertically. The block ends at a line less indented or the end of the file. If a block has to be more deeply nested, it is simply indented further to the right.

In addition, loops and conditional statements end with a colon, :. The same is true for functions and other structures introducing blocks. All in all, Python structures by colons and indentation.

In-class exercise

What do you think the following function does?

def someGreatFunction(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return someGreatFunction(left) + middle + someGreatFunction(right)

print (someGreatFunction([3,6,8,10,1,2,1]))

Output and Print Statements

Python’s print statement is very flexible. It can print many data types and data structures and even on the same line if separated by commas. Additional tricks for printing and formating can be found in the documentation.

dictionary = {'a': 'apple', 'b': 'banana', 'c': 'crabcakes'}
array = [1, 2, 3, 4, 5]

print (1, 'hello', 2, dictionary, array)

Basic data types

Have you programmed low-level languages like C, C++ or other similar programming languages? If so, you might think you know already enough about data types and variables. You know a lot, that’s right, but not enough for Python, as there are differences in the way Python and C deal with variables. There are integers, floating point numbers, strings, and many more, but things are not the same as in C or C++.

If you want to use lists or associative arrays in C, e.g., you will have to construct the data type list or associative arrays from scratch, i.e., design memory structure and the allocation management. You will have to implement the necessary search and access methods as well. Python provides power data types like lists and associative arrays called "dict", as a genuine part of the language.

Variables in Python

There is no declaration of variables required in Python, which makes it quite easy. It’s not even possible to declare the variables. If there is need for a variable, you should think of a name and start using it as a variable.

Another remarkable aspect of Python: Not only the value of a variable may change during program execution, but the type as well. You can assign an integer value to a variable, use it as an integer for a while and then assign a string to the same variable.

i = 42          # data type is implicitly set to integer
print (i)
i = 42 + 0.11   # data type is changed to float
print (i)
i = "forty"     # and now it will be a string
print (i)

When Python executes an assignment like “i = 42”, it evaluates the right side of the assignment and recognizes that it corresponds to the integer number 42. It creates an object of the integer class to save this data. In other words, Python automatically takes care of the physical representation for the different data types.

Object References

We want to take a closer look on variables now. Python variables are references to objects, but the actual data is contained in the objects. As variables are pointing to objects and objects can be of arbitrary data types, variables cannot have types associated with them. This is a huge difference from C, C++ or Java, where a variable is associated with a fixed data type. In these languages, this association can’t be changed as long as the program is running.

Therefore it is possible to write code like the following in Python:

x = 42
print(x)
x = "Now x references a string"
print(x)

We want to demonstrate something else now. Let’s look at the following code:

x = 42
y = x

We created an integer object 42 and assigned it to the variable x. After this we assigned x to the variable y. This means that both variables reference the same object.

What will happen when we execute y = 78 after the previous code?

Python will create a new integer object with the content 78 and then the variable y will reference this newly created object.

Most probably, we will see further changes to the variables in the flow of our program. There might be, for example, a string assignment to the variable x. The previously integer object 42 will be orphaned after this assignment. It will be removed by Python, because no other variable is referencing it.

You may ask yourself, how can we see or prove that x and y really reference the same object after the assignment y = x of our previous example?

The identity function id() can be used for this purpose. Every instance (object or variable) has an identity, i.e., an integer which is unique within the script or program, i.e., other objects have different identities. So, let’s have a look at our previous example and how the identities will change:

x = 42
id(x)

y = x
id(x), id(y)

y = 78
id(x), id(y)

Valid Variable Names

The naming of variables follows the more general concept of an identifier. A Python identifier is a name used to identify a variable, function, class, module or other object.

A variable name and an identifier can consist of the uppercase letters “A” through “Z”, the lowercase letters “a” through “z”, the underscore _ and, except for the first character, the digits 0 through 9. Python 3.x is based on Unicode. That is, variable names and identifier names can additionally contain Unicode characters as well.

Identifiers are unlimited in length. Case is significant. The fact that identifier names are case-sensitive can cause problems to some Windows users, where file names are case-insensitive.

Exceptions from the rules above are the special Python keywords.

Numbers

Python’s built-in core data types are in some cases also called object types. There are three built-in data types for numbers: integer (which in Python3 can be of unlimited size), hexadecimal, and binary. There are also two kinds of division operators:

“true division” performed by “/”
“floor division” performed by “//”

x = 787366098712738903245678234782358292837498729182728
print(x)
x * x * x

Integers and floats work as you would expect from other languages:

x = 3
print (x, type(x))

print (x + 1)   # Addition;
print (x - 1)   # Subtraction;
print (x * 2)   # Multiplication;
print (x ** 2)  # Exponentiation;

x += 1
print (x)  # Prints "4"
x *= 2
print (x)  # Prints "8"

y = 2.5
print (type(y))                 # Prints "<type 'float'>"
print (y, y + 1, y * 2, y ** 2) # Prints "2.5 3.5 5.0 6.25"

Note that unlike many languages, Python does not have unary increment (x++) or decrement (x--) operators.

Python also has built-in types for long integers and complex numbers; you can find all of the details in the documentation.

Booleans and None

Python implements all of the usual operators for Boolean logic, but uses English words rather than symbols (&&, ||, etc.):

t, f = True, False
print (type(t)) # Prints "<type 'bool'>"

Now let’s look at the operations:

print (t and f) # Logical AND;
print (t or f)  # Logical OR;
print (not t)   # Logical NOT;
print (t != f)  # Logical XOR;

None is a valid object and can be used like one. It represents the absence of something.

x = None
print (x)

Strings

Like strings in Java and unlike C or C++, Python strings cannot be changed. Trying to change an indexed position will raise an error.

Strings are created by putting a sequence of characters in quotes. Strings can be surrounded by single quotes, double quotes or triple quotes, which are made up of three single or three double quotes. Strings are immutable. In other words, once defined, they cannot be changed. We will cover this topic in detail shortly.

hello = 'hello'   # String literals can use single quotes
world = "world"   # or double quotes; it does not matter.
print (hello, len(hello))

hw = hello + ' ' + world  # String concatenation
print (hw)                # prints "hello world"

hw12 = '%s %s %d' % (hello, world, 12)  # sprintf style string formatting
print (hw12)                            # prints "hello world 12"

"Hello" + " " + "World"

A string in triple quotes can span several lines without using the escape character:

city = """
... Toronto is the largest city in Canada
... and the provincial capital of Ontario.
... It is located in Southern Ontario on the
... northwestern shore of Lake Ontario.
... """
print(city)

Multiplication on strings is defined, which is essentially a multiple concatenation:

":hello:" * 4

String objects have several useful methods, such as:

s = "hello"
print (s.capitalize())           # Capitalize a string; prints "Hello"
print (s.upper())                # Convert a string to uppercase; prints "HELLO"
print (s.rjust(7))               # Right-justify a string, padding with spaces; prints "  hello"
print (s.center(7))              # Center a string, padding with spaces; prints " hello "
print (s.replace('l', '(ell)'))  # Replace all instances of one substring with another;
                                 # prints "he(ell)(ell)o"
print ('  world '.strip())       # Strip leading and trailing whitespace; prints "world"

String equality checks

If both a and b are strings, the is-operator checks if they have the same identity, i.e., share the same memory location. If a is b is True, then it trivially follows that a == b has to be True as well. Yet, if a == b is True it doesn’t imply that a is b is True as well!

Let’s have a look at how strings are stored in Python:

a = "Linux"
b = "Linux"
a is b

Okay, but what happens, if the strings are longer? We use the longest village name in the world in the following example. It’s a small village with about 3000 inhabitants in the South of the island of Anglesey in the North-West of Wales:

a = "Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch"
b = "Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch"
a is b

Nothing has changed in our first “Linux” example. But what works for Wales doesn’t work for Baden-Württemberg in Germany:

a = "Baden-Württemberg"
b = "Baden-Württemberg"
a is b

a == b

You are right, it has nothing to do with geographical places. The special character, i.e., the hyphen, is to “blame”.

a = "Baden!"
b = "Baden!"
a is b

a = "Baden1"
b = "Baden1"
a is b

You can find a list of all string methods in the documentation.

In-class exercise

Write a program to compute the greatest common divisor (GCD) of two integers, $a$ and $b$, using Euclid’s algorithm.

Containers

Python includes several built-in container types: lists, dictionaries, sets, and tuples.

Lists

A list is the Python equivalent of an array (its elements can be updated, thus mutable), but is resizeable and can contain elements of different types:

x = []                                 # empty list
y = list()                             # empty list
stuff = [1, ['hi','bye'], -0.12, None] # Can mix types

xs = [3, 1, 2]   # Create a list
print (xs, xs[2])
print (xs[-1])   # Negative indices count from the end of the list; prints "2"

xs[2] = 'foo'    # Lists can contain elements of different types
print (xs)

xs.append('bar') # Add a new element to the end of the list
print (xs)

x = xs.pop()     # Remove and return the last element of the list
print (x, xs)

We say earlier how repetition operator can be used on strings. We can apply it to nested lists as well:

x = ["a","b","c"]
y = [x] * 4
y

y[0][0] = "p"
y

This result is quite astonishing for beginners of Python programming. We have assigned a new value to the first element of the first sublist of y, i.e. y[0][0] and we have “automatically” changed the first elements of all the sublists in y, i.e. y[1][0], y[2][0], y[3][0].

The reason is that the repetition operator, * 4, creates 4 references to the list x: and so it’s clear that every element of y is changed, if we apply a new value to y[0][0].

As usual, you can find all the gory details about lists in the documentation.

Slicing

In addition to accessing list elements one at a time, Python provides concise syntax to access sublists. This is known as slicing.

nums = list(range(5))# range is a built-in function that creates a list of integers
print (nums)         # Prints "[0, 1, 2, 3, 4]"
print (nums[2:4])    # Get a slice from index 2 to 4 (exclusive); prints "[2, 3]"
print (nums[2:])     # Get a slice from index 2 to the end; prints "[2, 3, 4]"
print (nums[:2])     # Get a slice from the start to index 2 (exclusive); prints "[0, 1]"
print (nums[:])      # Get a slice of the whole list; prints ["0, 1, 2, 3, 4]"
print (nums[:-1])    # Slice indices can be negative; the index wraps around; this prints ["0, 1, 2, 3]"
print (nums[2:-1])   # can mix and match positive and negative indeces
nums[2:4] = [8, 9]   # Assign a new sublist to a slice
print (nums)         # Prints "[0, 1, 8, 8, 4]"

Loops

Python has two primitive loop commands: while loops and for loops (i.e., there are no do-while loops in Python).

You can loop over the elements of a list like this:

animals = ['cat', 'dog', 'monkey']
for animal in animals:
    print (animal)

If you want access to the index of each element within the body of a loop, use the built-in enumerate function:

animals = ['cat', 'dog', 'monkey']
for idx, animal in enumerate(animals):
    print ('#%d: %s' % (idx + 1, animal))

Lists can also be used in loops (e.g., range(0,5)). This is similar to for (i = 0; i < 5; i++) in C/C++ and Java.

for i in range(0,5):
  print (i)

List comprehensions

When programming, frequently we want to transform some values. As a simple example, consider the following code that computes square numbers:

nums = [0, 1, 2, 3, 4]
squares = []
for x in nums:
    squares.append(x ** 2)
print (squares)

You can make this code simpler using a list comprehension:

nums = [0, 1, 2, 3, 4]
squares = [x ** 2 for x in nums]
print (squares)

Three elements in every comprehension

Every list comprehension in Python includes three elements:

expression is the member itself, a call to a method, or any other valid expression that returns a value. In the example above, the expression i * i is the square of the member value.
member is the object or value in the list or iterable. In the example above, the member value is i.
iterable is a list, set, sequence, generator, or any other object that can return its elements one at a time. In the example above, the iterable is range(10).

new_list = [expression for member in iterable]

Rather than creating an empty list and adding each element to the end using a for loop, you simply define the list and its contents at the same time by following this format:

squares = [i * i for i in range(10)]
squares
# [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

Conditionals in comprehensions

Filtering conditionals

A more complete description of the comprehension formula adds support for optional conditionals. The most common way to add conditional logic to a list comprehension is to add a conditional to the end of the expression:

new_list = [expression for member in iterable (if conditional)]

Here, your conditional statement comes just before the closing bracket.

Conditionals are important because they allow list comprehensions to filter out unwanted values, which would normally require a call to filter():

sentence = 'the rocket came back from mars'
# filter out any characters in sentence that aren’t a vowel
vowels = [i for i in sentence if i in 'aeiou']
vowels

Different handling conditionals

You can place the conditional at the end of the statement for simple filtering, but what if you want to change a member value instead of filtering it out? In this case, it’s useful to place the conditional near the beginning of the expression:

new_list = [expression (if conditional) for member in iterable]

With this formula, you can use conditional logic to select from multiple possible output options. For example, if you have a list of prices, then you may want to replace negative prices with 0 and leave the positive values unchanged:

original_prices = [1.25, -9.45, 10.22, 3.78, -5.92, 1.16]
prices = [i if i > 0 else 0 for i in original_prices]
prices

Here, your expression i contains a conditional statement, if i > 0 else 0. This tells Python to output the value of i if the number is positive, but to change i to 0 if the number is negative. If this seems overwhelming, then it may be helpful to view the conditional logic as its own function:

def get_price(price):
    return price if price > 0 else 0

prices = [get_price(i) for i in original_prices]
prices

Now, your conditional statement is contained within get_price(), and you can use it as part of your list comprehension expression.

Another example of list comprehension with conditions:

nums = [0, 1, 2, 3, 4]
even_squares = [x ** 2 for x in nums if x % 2 == 0]
some_squares_some_cubes = [x ** 2 if x % 2 == 0 else x ** 3 for x in nums]
print (even_squares)
print (some_squares_some_cubes)

In-class exercise

Use list comprehensions to find the prime numbers between 1 and 100.

Dictionaries

A dictionary stores (key, value) pairs, similar to a Map in Java or an object in Javascript. You can use it like this:

d = {}                               # empty dictionary
d = dict()                           # emtpy dictionary
d = {'cat': 'cute', 'dog': 'furry'}  # Create a new dictionary with some data
d['bunny'] = 'cute2'                 # add another item
print (d['cat'])                     # Get an entry from a dictionary; prints "cute"
print ('cat' in d)                   # Check if a dictionary has a given key; prints "True"

d['fish'] = 'wet'      # Set an entry in a dictionary
print (d['fish'])      # Prints "wet"

print (d['lion'])  # KeyError: 'lion' not a key of d

print (d.get('lion', 'N/A'))    # Get an element with a default; prints "N/A"
print (d.get('fish', 'N/A'))    # Get an element with a default; prints "wet"

del (d['fish'])              # Remove an element from a dictionary
print (d.get('fish', 'N/A')) # "fish" is no longer a key; prints "N/A"

You can find all you need to know about dictionaries in the documentation.

It is easy to iterate over the keys in a dictionary:

d = {'person': 2, 'cat': 4, 'spider': 8}
for animal in d:
    legs = d[animal]
    print ('A %s has %d legs' % (animal, legs))

If you want access to keys and their corresponding values, use the items method:

d = {'person': 2, 'cat': 4, 'spider': 8}
for animal, legs in d.items():
    print ('A %s has %d legs' % (animal, legs))

Dictionary comprehensions

These are similar to list comprehensions, but allow you to easily construct dictionaries. For example:

nums = [0, 1, 2, 3, 4]
even_num_to_square = {x: x ** 2 for x in nums if x % 2 == 0}
print (even_num_to_square)

Defaultdict

You can also initialize a dictionary with a default value type using a defaultdict. Additional examples can be found here.

from collections import defaultdict

grocery_lists = defaultdict(list)
grocery_lists['Leon'] = ['Milk', 'Apples', 'Cereal']
grocery_lists['Leon'].append('Chocolate')

grocery_lists['Greg'] = ['Flour', 'Egg', 'Baking Powder', 'Salt', 'Sugar']
grocery_lists['Greg'].extend(['Pancake Mix', 'Whipped Cream'])

print (grocery_lists)

You can also have a defaultdict of a defaultdict. This comes up when you want a 2D dictionary, where you can index multiple times.

A good example is if you want to store a friendship network, with edges between friends with edge weights. You can easily achieve this using a standard dict, with keys as tuples representing friendship pairs. However this will make finding neighbors in the graph difficult.

Note: lambda functions are python’s anonymous function. You can read more about it here.

network = defaultdict(lambda: defaultdict(int))

network['Mary']['Sally'] = 1
network['Sue']['Elizabeth'] = 4
network['George']['Lucas'] = 3

print (network)

You can even make an infinitely nested defaultdict. But with great power, comes great responsibility. Your code may no longer be readable (depending on how you implement it). Also, the print statement becomes nonesense.

Some good methods are detailed here. I will show one.

nests = lambda: defaultdict(nests)
inf_nest = nests()

inf_nest[1][2][3][4][5] = 4
inf_nest[1][2][4] = 3
inf_nest['hello']['world'] = True

print (inf_nest)

Counters

Another useful built-in dictionary type is the Counter. It is functionally equivalent to defaultdict(int), except there are additional built-in functions for Counters.

You can call a Counter on a string, and it will count the frequency of the characters, or on a list and it will count the frequency of the elements.

from collections import Counter

letter_count = Counter('Hello World!')
print (letter_count)

element_count = Counter([1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 5, 5, 6])
print (element_count)

Counters have two great functions, elements() and most_common().

elements() will return an iterator over elements repeating each as many times as its count.

most_common([n]) returns a list of the n most common elements and their counts from the most common to the least. If n is omitted or None, most_common() returns all elements in the counter.

print (list(element_count.elements()))

print (letter_count.most_common(3))

Sets

A set is an unordered collection of distinct elements. As a simple example, consider the following:

animals = {'cat', 'dog'}
print ('cat' in animals)   # Check if an element is in a set; prints "True"
print ('fish' in animals)  # prints "False"

animals.add('fish')        # Add an element to a set
print ('fish' in animals)
print (len(animals))       # Number of elements in a set;

animals.add('cat')         # Adding an element that is already in the set does nothing
print (len(animals))
animals.remove('cat')      # Remove an element from a set
print (len(animals))

Loops

Iterating over a set has the same syntax as iterating over a list. However, since sets are unordered, you cannot make assumptions about the order in which you visit the elements of the set:

animals = {'cat', 'dog', 'fish'}
for idx, animal in enumerate(animals):
    print ('#%d: %s' % (idx + 1, animal))
# Prints "#1: fish", "#2: dog", "#3: cat"

Set comprehensions

Like lists and dictionaries, we can easily construct sets using set comprehensions:

from math import sqrt
print ({int(sqrt(x)) for x in range(30)})

You can also perform many mathematical operations on python sets, like union, intersection, subset and difference. Documentation can be found here.

household_pets = {'cat', 'dog', 'turtle', 'hamster'}
felines = {'cat', 'lion', 'cheetah', 'leopard'}

print ('Union: ', household_pets | felines)
print ('Intersection: ', household_pets & felines)
print ('Difference: ', household_pets - felines) # A-B = elements in A not in B

household_pets &= felines                       # set household_pets to the intersection
print (household_pets)

In-class exercise

Calculate the list of prime numbers up to an arbitrary number n.

Tuples

A tuple is an immutable ordered list of values (i.e., its elements cannot be updated).

names = ('Zach', 'Jay') # Note the parentheses
names[0] == 'Zach'
len(names) == 2
print (names)
empty = tuple()         # Empty tuple
single = (10,)          # Single-element tuple. Comma matters!
names[0] = 'Richard'    # TypeError: 'tuple' object does not support item assignment

A tuple is in many ways similar to a list; one of the most important differences is that tuples can be used as keys in dictionaries and as elements of sets, while lists cannot. Here is a trivial example:

d = {(x, x + 1): x for x in range(10)}  # Create a dictionary with tuple keys
t = (5, 6)                              # Create a tuple
print (type(t))
print (d[t])
print (d[(1, 2)])

t[0] = 1

Shallow and Deep Copy

Python shows a strange behaviour for beginners of the language - in comparison with some other traditional programming languages - when assigning and copying simple data types like integers and strings. The difference between shallow and deep copying is only relevant for compound objects, i.e., objects containing other objects, like lists or class instances.

In the following code snippet, y points to the same memory location as x. We can see this by applying the id() function on x and y. However, unlike “real” pointers like those in C and C++, things change when we assign a new value to y. In this case y will receive a separate memory location.

x = 3
y = x
print(id(x), id(y))

y = 4
print(id(x), id(y))
print(x,y)

But even if this internal behaviour appears strange compared to programming languages like C, C++ and Perl, the observable results of the assignments answer our expectations. However, it can be problematic, if we copy mutable objects like lists and dictionaries.

Python creates only real copies if it has to, i.e., if the user, the programmer, explicitly demands it.

We will introduce you the most crucial problems, which can occur when copying mutable objects, i.e. when copying lists and dictionaries.

Copying a list:

colors1 = ["red", "blue"]
colors2 = colors1
print(colors1)
print(colors2)
print(id(colors1),id(colors2))

In the example above a simple list is assigned to colors1. This list is a so-called “shallow list”, because it doesn’t have a nested structure, i.e., no sublists are contained in the list. In the next step we assign colors1 to colors2.

The id() function shows us that both variables point at the same list object, i.e., they share this object.

Now we want to see, what happens, if we assign a new list object to colors2. As expected, the values of colors1 remained unchanged - a new memory location had been allocated for colors2, as we have assigned a completely new list, i.e. a new list object to this variable.

colors2 = ["rouge", "vert"]
print(colors1)
print(colors2)
print(id(colors1),id(colors2))

Now we have to examine, what will happen, if we change just one element of the list of colors2 or colors1.

colors1 = ["red", "blue"]
colors2 = colors1
print(id(colors1),id(colors2))

colors2[1] = "green"
print(id(colors1),id(colors2))
print(colors1)
print(colors2)

Let’s see, what has happened in detail in the previous code. We assigned a new value to the second element of colors2, i.e. the element with the index 1. Lots of beginners will be surprised as the list of colors1 has been “automatically” changed as well. Of course, we don’t have two lists: We have only two names for the same list!

The explanation is that we didn’t assign a new object to colors2. We changed colors2 inside or as it is usually called “in-place”. Both variables colors1 and colors2 still point to the same list object.

Virtual space

Memory is not unbounded, even though in programs we treat it as such (RAM).
How programs are structured in memory can lead to bugs (referencing -> segmentation fault).

Memory hierarchy

The computer memory is hierarchically organized, as shown in Figure 1.

⊕ <a name='figure1'>Figure 1</a>. The computer memory is organized hierarchically: the faster memory is closer to the CPU, and the slower memory is further from the CPU. The faster the memory, the more expensive and smaller it is, whereas the slower the memory the cheaper and larger it is.
Figure 1. The computer memory is organized hierarchically: the faster memory is closer to the CPU, and the slower memory is further from the CPU. The faster the memory, the more expensive and smaller it is, whereas the slower the memory the cheaper and larger it is.

Virtual memory

Each process has the virtual memory organized as follows. The program code and data is at lower addresses (Figure 2a). The runtime heap grows up as memory is allocated with malloc (C) or new (C++) (Figure 2b). Any imported libraries are placed in the shared libraries region (Figure 2c). The user stack grows and shrinks dynamically during the code execution (Figure 2d), and the kernel is at the top of the virtual memory, available through system calls (Figure 2e).

⊕ <a name='figure2a'>Figure 2a</a>. The program code and data are placed at the beginning of virtual memory.
Figure 2a. The program code and data are placed at the beginning of virtual memory.

⊕ <a name='figure2b'>Figure 2b</a>. Runtime heap is used for dynamically allocated memory.
Figure 2b. Runtime heap is used for dynamically allocated memory.

⊕ <a name='figure2c'>Figure 2c</a>. Libraries that are imported in the source code of the application are placed in the shared libraries region of the virtual memory.
Figure 2c. Libraries that are imported in the source code of the application are placed in the shared libraries region of the virtual memory.

⊕ <a name='figure2d'>Figure 2d</a>. The user stack expands as functions are called and contracts when functions return control to the caller.
Figure 2d. The user stack expands as functions are called and contracts when functions return control to the caller.

⊕ <a name='figure2e'>Figure 2e</a>. The kernel is at the top of the virtual memory space and is invisible to the user code.
Figure 2e. The kernel is at the top of the virtual memory space and is invisible to the user code.

Virtual memory example

Let’s look at an example to see where the different components of a source code are stored in the virtual memory. All global variables are stored in the un/initialized data segment (Figure 3a). When main starts executing, its frame is placed on top of the user stack, and its arguments are allocated in this frame (Figure 3b). Static variables are also stored in the un/initialized data segment, whereas local variables are stored in the frame of the function, on the stack (Figure 3c). Dynamically allocated memory is stored in the heap, with its pointer stored in the frame of the function, on the stack (Figure 3d). During the program execution, doCalc is called on line 38, at which point its frame is created on top of the user stack (Figure 3e). This in turn calls square, on line 19, and its frame is created on top of the user stack. When it returns, its frame is popped off the stack (Figure 3f).

⊕ <a name='figure3a'>Figure 3a</a>. Global variables are stored in the un/initialized data segment.
Figure 3a. Global variables are stored in the un/initialized data segment.

⊕ <a name='figure3b'>Figure 3b</a>. When the program starts executing, the frame for main function is placed on top of the user stack, and its arguments are allocated in this frame.
Figure 3b. When the program starts executing, the frame for main function is placed on top of the user stack, and its arguments are allocated in this frame.

⊕ <a name='figure3c'>Figure 3c</a>. Any static variables are stored in the un/initialized data segment, whereas the local variables are stored in the frame of the function, on the stack.
Figure 3c. Any static variables are stored in the un/initialized data segment, whereas the local variables are stored in the frame of the function, on the stack.

⊕ <a name='figure3d'>Figure 3d</a>. Dynamically allocated memory is stored in the heap, with its pointer stored in the frame of the function, on the stack.
Figure 3d. Dynamically allocated memory is stored in the heap, with its pointer stored in the frame of the function, on the stack.

⊕ <a name='figure3e'>Figure 3e</a>. **doCalc** is called on line 38, at which point its frame is created on top of the user stack.
Figure 3e. doCalc is called on line 38, at which point its frame is created on top of the user stack.

⊕ <a name='figure3f'>Figure 3f</a>. **square** is called on line 19, and its frame is created on top of the user stack. When it returns, its frame is popped off the stack.
Figure 3f. square is called on line 19, and its frame is created on top of the user stack. When it returns, its frame is popped off the stack.

User stack

The frames on top of the user stack when the code is executing the statements on lines 10-12 are shown in Figure 4.

⊕ <a name='figure4'>Figure 4</a>. When the code is executing the statements on lines 10-12, the user stack has the frames for **main**, **doCalc**, and **square**, in this order, since **main** called **doCalc**, which in turned called **square**.
Figure 4. When the code is executing the statements on lines 10-12, the user stack has the frames for main, doCalc, and square, in this order, since main called doCalc, which in turned called square.

Stack details

Each stack frame contains the return address, miscellaneous bookkeeping, local variables, temporaries, and the arguments to the called routines, as shown in Figure 5.

⊕
Figure 5. An example of a user stack when subroutine D is executing. Each stack frame contains the return address, miscellaneous bookkeeping, local variables, temporaries, and the arguments to the called routines. The stack pointer (sp), always points to the top of the user stack, and the frame pointer (fp) points to the start of the stack frame of the executing subroutine.

Summary

Language basics, common operations, and built-in values
Collections and loops
Hardware-software interface: virtual space
Next time: Python programming language, part 2 of 2
Reminders:
- Provide feedback/unofficial course evaluation (any time)
- Submit questions for quiz 1 by midnight

Back to course home