Haskell - part 4
Declaring types and classes, the countdown problem, and lazy evaluationMaterial adapted from Erik Meijer’s Functional ProgrammingA Jupyter notebook version of this material is available online in Google Colab. Although Colab does not support Haskell, the notebook can be downloaded and run in Jupyter Notebook, if the IHaskell extension is installed.
Declaring types and classes
Type declarations
In Haskell, a new name for an existing type can be defined using a type declaration.
type String = [Char]
String is a synonym for the type [Char]
.
type String = [Char]
Type declarations can be used to make other types easier to read. For example, given
type Pos = (Int,Int)
we can define:
origin :: Pos
origin = (0,0)
left :: Pos → Pos
left (x,y) = (x-1,y)
type Pos = (Int,Int)
origin :: Pos
origin = (0,0)
left :: Pos -> Pos
left (x,y) = (x-1,y)
Like function definitions, type declarations can also have parameters. For example, given
type Pair a = (a,a)
we can define:
mult :: Pair Int → Int
mult (m,n) = m*n
copy :: a → Pair a
copy x = (x,x)
type Pair a = (a,a)
mult :: Pair Int -> Int
mult (m,n) = m*n
copy :: a -> Pair a
copy x = (x,x)
Type declarations can be nested:
-- OK
type Pos = (Int,Int)
type Trans = Pos → Pos
However, they cannot be recursive:
-- Not OK
type Tree = (Int,[Tree])
-- OK
type Pos = (Int,Int)
type Trans = Pos -> Pos
-- Not OK
type Tree = (Int,[Tree])
Data declarations
A completely new type can be defined by specifying its values using a data declaration.
data Bool = False | True
Bool
is a new type, with two new values False
and True
.
data Bool = False | True
Note:
- The two values
False
andTrue
are called the constructors for the typeBool
. - Type and constructor names must begin with an upper-case letter.
- Data declarations are similar to context free grammars. The former specifies the values of a type, the latter the sentences of a language.
Values of new types can be used in the same ways as those of built in types. For example, given
data Answer = Yes | No | Unknown
we can define:
answers :: [Answer]
answers = [Yes,No,Unknown]
flip :: Answer → Answer
flip Yes = No
flip No = Yes
flip Unknown = Unknown
data Answer = Yes | No | Unknown
answers :: [Answer]
answers = [Yes,No,Unknown]
flip :: Answer -> Answer
flip Yes = No
flip No = Yes
flip Unknown = Unknown
The constructors in a data declaration can also have parameters. For example, given
data Shape = Circle Float
| Rect Float Float
we can define:
square :: Float → Shape
square n = Rect n n
area :: Shape → Float
area (Circle r) = pi * r^2
area (Rect x y) = x * y
data Shape = Circle Float
| Rect Float Float
square :: Float -> Shape
square n = Rect n n
area :: Shape -> Float
area (Circle r) = pi * r^2
area (Rect x y) = x * y
Note:
- Shape has values of the form
Circle r
wherer
is a float, andRect x y
wherex
andy
are floats. Circle
andRect
can be viewed as functions that construct values of typeShape
:
Circle :: Float → Shape
Rect :: Float → Float → Shape
Circle :: Float -> Shape
Rect :: Float -> Float -> Shape
Not surprisingly, data declarations themselves can also have parameters. For example, given
data Maybe a = Nothing | Just a
we can define:
safediv :: Int → Int → Maybe Int
safediv _ 0 = Nothing
safediv m n = Just (m `div` n)
safehead :: [a] → Maybe a
safehead [] = Nothing
safehead xs = Just (head xs)
data Maybe a = Nothing | Just a
safediv :: Int -> Int -> Maybe Int
safediv _ 0 = Nothing
safediv m n = Just (m `div` n)
safehead :: [a] -> Maybe a
safehead [] = Nothing
safehead xs = Just (head xs)
Recursive types
In Haskell, new types can be declared in terms of themselves. That is, types can be recursive.
data Nat = Zero | Succ Nat
Nat
is a new type, with constructors Zero :: Nat
and Succ :: Nat → Nat
.
data Nat = Zero | Succ Nat
Note:
- A value of type
Nat
is eitherZero
, or of the formSucc n
wheren :: Nat
. That is,Nat
contains the following infinite sequence of values:
Zero
Succ Zero
Succ (Succ Zero)
•
•
•
-
We can think of values of type
Nat
as natural numbers, whereZero
represents0
, andSucc
represents the successor function1+
. -
For example, the value
Succ (Succ (Succ Zero))
represents the natural number 1 + (1 + (1 + 0)) = 3
.
Using recursion, it is easy to define functions that convert between values of type Nat
and Int
:
nat2int :: Nat → Int
nat2int Zero = 0
nat2int (Succ n) = 1 + nat2int n
int2nat :: Int → Nat
int2nat 0 = Zero
int2nat n = Succ (int2nat (n-1))
nat2int :: Nat -> Int
nat2int Zero = 0
nat2int (Succ n) = 1 + nat2int n
int2nat :: Int -> Nat
int2nat 0 = Zero
int2nat n = Succ (int2nat (n-1))
Two naturals can be added by converting them to integers, adding, and then converting back:
add :: Nat → Nat → Nat
add m n = int2nat (nat2int m + nat2int n)
However, using recursion the function add
can be defined without the need for conversions:
add Zero n = n
add (Succ m) n = Succ (add m n)
add :: Nat -> Nat -> Nat
add m n = int2nat (nat2int m + nat2int n)
add Zero n = n
add (Succ m) n = Succ (add m n)
For example:
add (Succ (Succ Zero)) (Succ Zero)
= Succ (add (Succ Zero) (Succ Zero))
= Succ (Succ (add Zero (Succ Zero))
= Succ (Succ (Succ Zero))
Note:
- The recursive definition for
add
corresponds to the laws0+n = n
and(1+m)+n = 1+(m+n)
.
Arithmetic expressions
Consider a simple form of expressions built up from integers using addition and multiplication, as shown in Figure 1.
Figure 1. The syntax tree for the arithmetic expression 1 + 2 * 3.
Using recursion, a suitable new type to represent such expressions can be declared by:
data Expr = Val Int
| Add Expr Expr
| Mul Expr Expr
For example, the expression on the previous slide would be represented as follows:
Add (Val 1) (Mul (Val 2) (Val 3))
Using recursion, it is now easy to define functions that process expressions. For example:
size :: Expr → Int
size (Val n) = 1
size (Add x y) = size x + size y
size (Mul x y) = size x + size y
eval :: Expr → Int
eval (Val n) = n
eval (Add x y) = eval x + eval y
eval (Mul x y) = eval x * eval y
size :: Expr -> Int
size (Val n) = 1
size (Add x y) = size x + size y
size (Mul x y) = size x + size y
eval :: Expr -> Int
eval (Val n) = n
eval (Add x y) = eval x + eval y
eval (Mul x y) = eval x * eval y
Note:
- The three constructors have types:
Val :: Int → Expr
Add :: Expr → Expr → Expr
Mul :: Expr → Expr → Expr
- Many functions on expressions can be defined by replacing the constructors by other functions using a suitable
fold
function. For example:
eval = fold id (+) (*)
Val :: Int -> Expr
Add :: Expr -> Expr -> Expr
Mul :: Expr -> Expr -> Expr
eval = fold id (+) (*)
Binary trees
In computing, it is often useful to store data in a two-way branching structure or binary tree, as shown in Figure 2.
Figure 2. Binary trees are useful in computing to store data.
Using recursion, a suitable new type to represent such binary trees can be declared by:
data Tree = Leaf Int
| Node Tree Int Tree
For example, the tree on the previous slide would be represented as follows:
Node (Node (Leaf 1) 3 (Leaf 4))
5
(Node (Leaf 6) 7 (Leaf 9))
data Tree = Leaf Int
| Node Tree Int Tree
Node (Node (Leaf 1) 3 (Leaf 4))
5
(Node (Leaf 6) 7 (Leaf 9))
We can now define a function that decides if a given integer occurs in a binary tree:
occurs :: Int → Tree → Bool
occurs m (Leaf n) = m==n
occurs m (Node l n r) = m==n
|| occurs m l
|| occurs m r
But… in the worst case, when the integer does not occur, this function traverses the entire tree.
occurs :: Int -> Tree -> Bool
occurs m (Leaf n) = m==n
occurs m (Node l n r) = m==n
|| occurs m l
|| occurs m r
Now consider the function flatten
that returns the list of all the integers contained in a tree:
flatten :: Tree → [Int]
flatten (Leaf n) = [n]
flatten (Node l n r) = flatten l
++ [n]
++ flatten r
A tree is a search tree if it flattens to a list that is ordered. Our example tree is a search tree, as it flattens to the ordered list [1,3,4,5,6,7,9]
.
flatten :: Tree -> [Int]
flatten (Leaf n) = [n]
flatten (Node l n r) = flatten l
++ [n]
++ flatten r
Search trees have the important property that when trying to find a value in a tree we can always decide which of the two sub-trees it may occur in:
occurs m (Leaf n) = m==n
occurs m (Node l n r) | m==n = True
| m<n = occurs m l
| m>n = occurs m r
This new definition is more efficient, because it only traverses one path down the tree.
occurs m (Leaf n) = m==n
occurs m (Node l n r) | m==n = True
| m<n = occurs m l
| m>n = occurs m r
Exercises
- Using recursion and the function
add
, define a function that multiplies two natural numbers. - Define a suitable function
fold
for expressions, and give a few examples of its use. - A binary tree is complete if the two sub-trees of every node are of equal size. Define a function that decides if a binary tree is complete.
The countdown problem
What is countdown?
- A popular quiz program on British television that has been running since 1982.
- Based upon an original French version called “Des Chiffres et Des Lettres”.
- Includes a numbers game that we shall refer to as the countdown problem.
Example
Using the numbers
1 3 7 10 25 50
and the arithmetic operators
+ - * ÷
construct an expression whose value is 765
.
Rules
- All the numbers, including intermediate results, must be positive naturals,
(1,2,3,…)
. - Each of the source numbers can be used at most once when constructing the expression.
- We abstract from other rules that are adopted on television for pragmatic reasons.
For our example, one possible solution is
(25-10) * (50+1) = 765
Notes:
- There are
780
solutions for this example. - Changing the target number to
831
gives an example that has no solutions.
Evaluating expressions
Operators:
data Op = Add | Sub | Mul | Div
Apply an operator:
apply :: Op → Int → Int → Int
apply Add x y = x + y
apply Sub x y = x - y
apply Mul x y = x * y
apply Div x y = x `div` y
data Op = Add | Sub | Mul | Div
apply :: Op -> Int -> Int -> Int
apply Add x y = x + y
apply Sub x y = x - y
apply Mul x y = x * y
apply Div x y = x `div` y
Decide if the result of applying an operator to two positive natural numbers is another such:
valid :: Op → Int → Int → Bool
valid Add _ _ = True
valid Sub x y = x > y
valid Mul _ _ = True
valid Div x y = x `mod` y == 0
Expressions:
data Expr = Val Int | App Op Expr Expr
valid :: Op -> Int -> Int -> Bool
valid Add _ _ = True
valid Sub x y = x > y
valid Mul _ _ = True
valid Div x y = x `mod` y == 0
data Expr = Val Int | App Op Expr Expr
Return the overall value of an expression, provided that it is a positive natural number:
eval :: Expr → [Int]
eval (Val n) = [n | n > 0]
eval (App o l r) = [apply o x y | x ← eval l
, y ← eval r
, valid o x y]
Either succeeds and returns a singleton list, or fails and returns the empty list.
eval :: Expr -> [Int]
eval (Val n) = [n | n > 0]
eval (App o l r) = [apply o x y | x <- eval l
, y <- eval r
, valid o x y]
Formalising the problem
Return a list of all possible ways of choosing zero or more elements from a list:
choices :: [a] → [[a]]
For example:
> choices [1,2]
[[],[1],[2],[1,2],[2,1]]
choices :: [a] -> [[a]]
choices [1,2]
Return a list of all the values in an expression:
values :: Expr → [Int]
values (Val n) = [n]
values (App _ l r) = values l ++ values r
Decide if an expression is a solution for a given list of source numbers and a target number:
solution :: Expr → [Int] → Int → Bool
solution e ns n = elem (values e) (choices ns)
&& eval e == [n]
values :: Expr -> [Int]
values (Val n) = [n]
values (App _ l r) = values l ++ values r
solution :: Expr -> [Int] -> Int -> Bool
solution e ns n = elem (values e) (choices ns)
&& eval e == [n]
Brute force solution
Return a list of all possible ways of splitting a list into two non-empty parts:
split :: [a] → [([a],[a])]
For example:
> split [1,2,3,4]
[([1],[2,3,4]),([1,2],[3,4]),([1,2,3],[4])]
split :: [a] -> [([a],[a])]
split [1,2,3,4]
Return a list of all possible expressions whose values are precisely a given list of numbers:
exprs :: [Int] → [Expr]
exprs [] = []
exprs [n] = [Val n]
exprs ns = [e | (ls,rs) ← split ns
, l ← exprs ls
, r ← exprs rs
, e ← combine l r]
The key function in this lecture.
exprs :: [Int] <- [Expr]
exprs [] = []
exprs [n] = [Val n]
exprs ns = [e | (ls,rs) <- split ns
, l <- exprs ls
, r <- exprs rs
, e <- combine l r]
Combine two expressions using each operator:
combine :: Expr → Expr → [Expr]
combine l r =
[App o l r | o ← [Add,Sub,Mul,Div]]
Return a list of all possible expressions that solve an instance of the countdown problem:
solutions :: [Int] → Int → [Expr]
solutions ns n = [e | ns' ← choices ns
, e ← exprs ns
, eval e == [n]]
combine :: Expr -> Expr -> [Expr]
combine l r =
[App o l r | o <- [Add,Sub,Mul,Div]]
solutions :: [Int] -> Int -> [Expr]
solutions ns n = [e | ns' <- choices ns
, e <- exprs ns
, eval e == [n]]
How fast is it?
System: 1.2GHz Pentium M laptop
Compiler: GHC version 6.4.1
Example: solutions [1,3,7,10,25,50] 765
One solution: 0.36 seconds
All solutions: 43.98 seconds
Can we do better?
- Many of the expressions that are considered will typically be invalid - fail to evaluate.
- For our example, only around 5 million of the 33 million possible expressions are valid.
- Combining generation with evaluation would allow earlier rejection of invalid expressions.
Fusing two functions
Valid expressions and their values:
type Result = (Expr,Int)
We seek to define a function that fuses together the generation and evaluation of expressions:
results :: [Int] → [Result]
results ns = [(e,n) | e ← exprs ns
, n ← eval e]
type Result = (Expr,Int)
results :: [Int] -> [Result]
results ns = [(e,n) | e <- exprs ns
, n <- eval e]
This behaviour is achieved by defining
results [] = []
results [n] = [(Val n,n) | n > 0]
results ns =
[res | (ls,rs) ← split ns
, lx ← results ls
, ry ← results rs
, res ← combine' lx ry]
where
combine' :: Result → Result → [Result]
results [] = []
results [n] = [(Val n,n) | n > 0]
results ns =
[res | (ls,rs) <- split ns
, lx <- results ls
, ry <- results rs
, res <- combine' lx ry]
combine' :: Result -> Result -> [Result]
Combining results:
combine’ (l,x) (r,y) =
[(App o l r, apply o x y)
| o ← [Add,Sub,Mul,Div]
, valid o x y]
New function that solves countdown problems:
solutions' :: [Int] → Int → [Expr]
solutions' ns n =
[e | ns' ← choices ns
, (e,m) ← results ns'
, m == n]
combine' (l,x) (r,y) =
[(App o l r, apply o x y)
| o <- [Add,Sub,Mul,Div]
, valid o x y]
solutions' :: [Int] -> Int -> [Expr]
solutions' ns n =
[e | ns' <- choices ns
, (e,m) <- results ns'
, m == n]
How fast is it now?
Example: solutions' [1,3,7,10,25,50] 765
One solution: 0.04 seconds
All solutions: 3.47 seconds
Around 10 times faster in both cases.
Can we do better?
- Many expressions will be essentially the same using simple arithmetic properties, such as:
x * y = y * x
x * 1 = x
- Exploiting such properties would considerably reduce the search and solution spaces.
Exploiting properties
Strengthening the valid predicate to take account of commutativity and identity properties:
valid :: Op → Int → Int → Bool
valid Add x y = x ≤ y
valid Sub x y = x > y
valid Mul x y = x ≤ y && x ≠ 1 && y ≠ 1
valid Div x y = x `mod` y == 0 && y ≠ 1
valid :: Op -> Int -> Int -> Bool
valid Add x y = x <= y
valid Sub x y = x > y
valid Mul x y = x <= y && x != 1 && y != 1
valid Div x y = x `mod` y == 0 && y != 1
How fast is it now?
Example: solutions'' [1,3,7,10,25,50] 765
Valid: 250,000 expressions
Around 20 times less.
Solutions: 49 expressions
Around 16 times less.
One solution: 0.02 seconds
Around 2 times faster.
All solutions: 0.44 seconds
Around 7 times faster.
More generally, our program usually produces a solution to problems from the television show in an instant, and all solutions in under a second.
Lazy evaluation
Introduction
Up to now, we have not looked in detail at how Haskell expressions are evaluated.
In fact they are evaluated using a simple technique that, among other things:
1. Avoids doing unnecessary evaluation
2. Allows programs to be more modular
3. Allows us to program with infinite lists
The evaluation technique is called lazy evaluation, and Haskell is a lazy functional language.
Evaluating expressions
Basically, expressions are evaluated or reduced by successively applying definitions until no further simplification is possible.
For example, given the definition: square n = n * n
The expression square(3 + 4)
can be evaluated using the following sequence of reductions:
Evaluating expressions
square n = n * n
square (3 + 4)
= square 7
= 7 * 7
= 49
However, this is not the only possible reduction sequence.
Another is the following:
square (3 + 4)
= (3 + 4) * (3 + 4)
= 7 * (3 + 4)
= 7 * 7
= 49
Now we have applied square before doing the addition, but the final result is the same.
FACT
In Haskell, two different (but terminating) ways of evaluating the same expression will always give the same final result.
Reduction strategies
At each stage during evaluation of an expression, there may be many possible subexpressions that can be reduced by applying a definition.
There are two common strategies for deciding which redex (reducible subexpression) to choose:
- Innermost reduction: An innermost redex is always reduced
- Outermost reduction: An outermost redex is always reduced
How do the two strategies compare … ?
Termination
Given the definition: loop = tail loop
Let’s evaluate the expression fst (1, loop)
using these two reduction strategies.
Innermost reduction
fst (1, loop)
= fst (1, tail loop)
= fst (1, tail (tail loop))
= ...
This strategy does not terminate.
Outermost reduction
fst (1, loop)
= 1
This strategy gives a result in one step.
Facts
- Outermost reduction may give a result when innermost reduction fails to terminate.
- For a given expression if there exists any reduction sequence that terminates, then outermost reduction also terminates, with the same result.
Number of reductions
Innermost
square (3 + 4)
= square 7
= 7 * 7
= 49
Outermost
square (3 + 4)
= (3 + 4) * (3 + 4)
= 7 * (3 + 4)
= 7 * 7
= 49
The outermost version is inefficient: the subexpression 3 + 4
is duplicated when square is reduced, and so must be reduced twice.
Fact Outermost reduction may require more steps than innermost reduction.
The problem can be solved by using pointers to indicate sharing of expressions during evaluation, as shown in Figure 3.
Figure 3. Square uses pointers to the expression 3 + 4, evaluates the expression to 7, and then multiplies the result.
This gives a new reduction strategy:
Lazy evaluation = Outermost reduction + sharing
Facts
- Never requires more steps than innermost reduction
- Haskell uses lazy evaluation
Infinite lists
In addition to the termination advantages, using lazy evaluation allows us to program with infinite lists of values!
Consider the recursive definition:
ones :: [Int]
ones = 1 : ones
Unfolding the recursion a few times gives:
ones = 1 : ones
= 1 : 1 : ones
= 1 : 1 : 1 : ones
That is, ones is the infinite list of 1’s.
Now consider evaluating the expression
head ones
using innermost reduction and lazy evaluation.
Innermost reduction
head ones = head (1 : ones)
= head (1 : 1 : ones)
= head (1 : 1 : 1 : ones)
= ...
In this case, evaluation does not terminate.
Lazy evaluation
head ones = head (1 : ones)
= 1
In this case, evaluation gives the result 1.
That is, using lazy evaluation only the first value in the infinite list ones is actually produced, since this is all that is required to evaluate the expression head ones as a whole.
Lazy evaluation
In general we have the slogan:
Using lazy evaluation, expressions are only evaluated as much as required to produce the final result.
We see now that
ones = 1 : ones
really only defines a potentially infinite list that is only evaluated as much as required by the context in which it is used.
Modular programming
We can generate finite lists by taking elements from infinite lists.
> take 5 ones
[1,1,1,1,1]
> take 5 [1..]
[1,2,3,4,5]
Lazy evaluation allows us to make programs more modular, by separating control from data, as shown in Figure 4.
Figure 4. Using lazy evaluation, the data is only evaluated as much as required by the control part.
Example: generating primes
A simple procedure for generating the infinite list of prime numbers is as follows:
- Write down the list
2, 3, 4, …
; - Mark the first value
p
in the list as prime; - Delete all multiples of
p
from the list; - Return to step 2.
Example: generating primes
The first few steps are shown in Figure 5.
Figure 5. The procedure known as the “seive of Eratosthenes,” after the Greek mathematician who first described it, can be used to identify prime numbers.
It can be translated directly into Haskell:
primes :: [Int]
primes = seive [2..]
seive :: [Int] -> [Int]
seive (p:xs) = p : seive [x | x <- xs, x `mod` p /= 0]
and executed as follows:
> primes
[2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,...
primes :: [Int]
primes = seive [2..]
seive :: [Int] -> [Int]
seive (p:xs) = p : seive [x | x <- xs, x `mod` p /= 0]
primes
By freeing the generation of primes from the constraint of finiteness, we obtain a modular definition on which different boundary conditions can be imposed in different situations:
Selecting the first 10 primes:
> take 10 primes
[2,3,5,7,11,13,17,19,23,29]
Selecting the primes less than 15:
> takeWhile (<15) primes
[2,3,5,7,11,13]
Lazy evaluation is powerful programming tool!
take 10 primes
takeWhile (<15) primes
Fun exercises
Define a program
fibs :: [Integer]
that generates the infinite Fibonacci sequence [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, ...
using the following simple procedure:
- The first two numbers are
0
and1
; - The next is the sum of the previous two;
- Return to step 2.
Fun exercises
Define a program
fib :: Int -> Integer
that calculates the n
th Fibonacci number.