Lambda-calculus

Review: programming modelsMaterial adapted from CSCI 3136 Principles of Programming Languages at Dalhousie University

Imperative programming: Computation is a sequence of side effects.

Functional Programming: Computation by function evalution. There are no side effects.

This requires functions to be ﬁrst-class objects: they can be passed as function arguments and returned as the results of computations.

Functions that operate on functions and/or return functions are called higher-order functions.

Iteration is impossible in functional languages because iteration inherently requires side effects. Iteration can be simulated using recursion.

Other common features of functional programming languages

Strong reliance on list types and strong support for manipulating them
Structured function returns
Structured data constructors
Garbage collection

Functional programming and the real world

Advantages of functional programming

The lack of side effects makes it easier to reason about program:

– Formal proofs of program correctness become easier.

– Tests always produce the same result.
Higher level of abstraction $\to$ more expressive

Real-world concerns

The interaction with the real world (ﬁle I/O, user I/O, …) happens through side effects.
Most functional programming languages have support for side effects to accommodate this.

– Scheme: very ﬂexible: supports mixing of different programming styles

– Haskell: Controlled side effects through monads

$\lambda$-calculus (Alonzo Church 1941)

. . . is the theoretical foundation of functional programming. It expresses computation entirely using function application.

A context-free grammar for lambda expressions

$$ \begin{align*} Expr &\to name\mid number\mid\lambda\ name.Expr\mid Func\ Arg\\ Func &\to name\mid (\lambda\ name.Expr)\mid Func\ Arg\\ Arg &\to name\mid number\mid (\lambda\ name.Expr)\mid (Func\ Arg) \end{align*} $$

Numbers are not really part of this, but we introduce them here to make the exposition easier.

Examples

$x$, times, plus, 7
$\lambda x.$x
$\lambda y.$times
$\lambda x.x y$z
$\lambda y.$times $y y$
$\lambda x.($times $x\lambda y.y)$

Interpretation

A variable is bound if it is in the scope of a $\lambda$ and free otherwise.

Examples

x is bound in $\lambda x.x y$ but free in the subexpression x y
x is free in $\lambda y . x y$

$\lambda x.Expr$ (function deﬁnition)

is a function with formal parameter x, which becomes a bound variable in Expr

$(\lambda x.Expr)A\equiv Expr[A\setminus x]$ (function application)

replaces every free occurrence of x in Expr with A

$(\lambda x.\lambda y.Expr)AB\equiv(\lambda y.Expr[A\setminus x])B\equiv Expr[A\setminus x$ and $B\setminus y]$

Computing using $\lambda$-calculus

Computation is done using three transformations:

$\beta$-reduction

$(\lambda x.E) M \to_\beta E[M \setminus x$]
Permissible only if no free variable of M becomes bound in $E[M \setminus x]$

$\alpha$-conversion

$\lambda x.E\to_\alpha\lambda y.E[y\setminus x]$, where y is a variable not occurring in E
Needed to make $\beta$-reduction possible

$\eta$-reduction

$\lambda x.F x \to_\eta F$, where F contains no free occurrences of x
Eliminates “surplus” lambda abstractions

The need for $\alpha$-conversion

\[(\lambda x.\lambda y.x y)(\lambda z.y)c\]

Substitution without $\alpha$-conversion

\[\begin{align} (\lambda x.\lambda y.x y)(\lambda z.y)c &\to_\beta (\lambda y.(\lambda z.y)y)c\\ &\to_\beta(\lambda z.c)c\\ &\to_\beta c \end{align}\]

Substitution with $\alpha$-conversion

\[\begin{align} (\lambda x.\lambda y.x y)(\lambda z.y)c &\to_\alpha (\lambda x.\lambda w.xw)(\lambda z.y)c\\ &\to_\beta(\lambda w.(\lambda z.y)w)c\\ &\to_\eta(\lambda z.y)c\\ &\to_\beta y \end{align}\]

Associativity in $\lambda$-expressions

Function deﬁnition is right-associative:

\[\lambda x.\lambda y.\lambda z.E\equiv \lambda x.(\lambda y.(\lambda z.E)))\]

Function application is left-associative:

\[ABCD\equiv((AB)C)D\]

$\lambda$ extends as far to the right as possible (to the end of the expression or to the next parenthesis):

\[\lambda x.ABCD\equiv\lambda x.(ABCD)\]

Labels

Labels can be created as shortcuts for long expressions. This is kind of like a global function deﬁnition in a programming language.

Examples

\[\begin{align} square&\equiv\lambda x.times\ x\ x\\ id&\equiv\lambda x.x\\ const&\equiv\lambda x.\lambda y.x\\ hypot&\equiv\lambda x.\lambda y.sqrt(plus(square\ x)(square\ y)) \end{align}\]

Logic and control flow (1)

\[\begin{align} true&\equiv\lambda x.\lambda y.x\\ false&\equiv\lambda x.\lambda y. y\\ not&\equiv\lambda t.t\ false\ true\\ if&\equiv\lambda c.\lambda t.\lambda e.cte \end{align}\]

Example 1

\[\begin{align} not\ true&\equiv(\lambda t.t(\lambda x.\lambda y.y)(\lambda x.\lambda y.x))(\lambda x.\lambda y.x)\\ &\to_\beta(\lambda x.\lambda y.x)(\lambda x.\lambda y.y)(\lambda x.\lambda y.x)\\ &\to_\beta(\lambda y.(\lambda x.\lambda y.y))(\lambda x.\lambda y.x)\\ &\to_\beta(\lambda x.\lambda y.y)\\ &\equiv false \end{align}\]

Logic and control flow (2)

Example 2

\[\begin{align} if\ true\ 3\ 4&\equiv(\lambda c.\lambda t.\lambda e.cte)(\lambda x.\lambda y.x)\ 3\ 4\\ &\to_\beta(\lambda t.\lambda e.(\lambda x.\lambda y.x)te)\ 3\ 4\\ &\to_\beta(\lambda e.(\lambda x.\lambda y.x)3e) 4\\ &\to_\beta(\lambda x.\lambda y.x)\ 3\ 4\\ &\to_\beta(\lambda y.3)\ 4\\ &\to_\beta3 \end{align}\]

What about “and” and “or”?

Representing natural numbers using functions

Building data structures: lists (1)

\[\begin{align} cons&\equiv\lambda h.\lambda t.\lambda x.xht&\textrm{(prepend single element h to list t)}\\ car&\equiv\lambda l.l(\lambda x.\lambda y.x)&\textrm{(return head of list l)}\\ cdr&\equiv\lambda l.l(\lambda x.\lambda y.y)&\textrm{(return tail of list l)}\\ nil&\equiv\lambda x.true&\textrm{(empty list)}\\ null?&\equiv\lambda l.l(\lambda x.\lambda y.false)&\textrm{(is list l empty?)} \end{align}\]

Example 1

\[\begin{align} null?nil&\equiv(\lambda l.l(\lambda x.\lambda y.false))(\lambda x.true)\\ &\to_\beta(\lambda x.true)(\lambda x.\lambda y.false)\\ &\to_\beta true \end{align}\]

Building data structures: lists (2)

Example 2

\[\begin{align} null?(cons\ 1\ nil)&\equiv(\lambda l.l(\lambda x.\lambda y.false))((\lambda h.\lambda t.\lambda x.xht)1(\lambda x.true))\\ &\to_\beta(\lambda h.\lambda t.\lambda x.xht)1(\lambda x.true)(\lambda x.\lambda y.false)\\ &\to_\beta(\lambda t.\lambda x.x1t)(\lambda x.true)(\lambda x.\lambda y.false)\\ &\to_\beta(\lambda x.x1(\lambda x.true))(\lambda x.\lambda y.false)\\ &\to_\beta(\lambda x.\lambda y.false)1(\lambda x.true)\\ &\to_\beta(\lambda y.false)(\lambda x.true)\\ &\to_\beta false \end{align}\]

Building data structures: lists (3)

Example 3

\[\begin{align} car(cdr(cons\ 1\ (cons\ 2\ (cons\ 3\ nil)))) &\equiv(\lambda l.l(\lambda x.\lambda y.x))(cdr(cons\ 1\ (cons\ 2\ (cons\ 3\ nil))))\\ &\to_\beta cdr(cons\ 1\ (cons\ 2\ (cons\ 3\ nil)))(\lambda x.\lambda y.x)\\ &\equiv(\lambda l.l(\lambda x.\lambda y.y))(cons\ 1\ (cons\ 2\ (cons\ 3\ nil)))(\lambda x.\lambda y.x)\\ &\to_\beta cons\ 1\ (cons\ 2\ (cons\ 3\ nil))(\lambda x.\lambda y.y)(\lambda x.\lambda y.x)\\ &\equiv(\lambda h.\lambda t.\lambda x.xht)\ 1\ (cons\ 2\ (cons\ 3\ nil))(\lambda x.\lambda y.y)(\lambda x.\lambda y.x)\\ &\to_\beta^* (\lambda x.\lambda y.y)\ 1\ (cons\ 2\ (cons\ 3\ nil))(\lambda x.\lambda y.x)\\ &\to_\beta^* (cons\ 2\ (cons\ 3\ nil))(\lambda x.\lambda y.x)\\ &\equiv(\lambda h.\lambda t.\lambda x.xht)\ 2\ (cons\ 3\ nil)(\lambda x.\lambda y.x)\\ &\to_\beta^* (\lambda x.\lambda y.x)\ 2\ (cons\ 3\ nil)\\ &\to_\beta^* 2 \end{align}\]

Currying

(Haskell Curry, logician)

Currying: No need for multi-argument functions. Instead, represent such functions as functions returning functions returning functions …

Currying makes “partial function application” possible:

times$\equiv\lambda x.\lambda y.x*y$ (a function with two arguments)
Full application: times 2 3 $\equiv$ 6 (a number)
Partial application: times 2 $\equiv \lambda y .2 * y$ (a function doubling its argument)

Currying is cool (1)

Currying Is Cool (2)

Currying is cool (3)

Scheme (Anonymous functions and higher-order functions for list processing, but no currying)

(if cond
   (map (lambda (x) (* 10 x)) A)
   (map (lambda (x) (+ 2 x)) A))

Haskell (Anonymous functions + currying)

if cond then map (* 10) A
          else map (+ 2) A

Recursion in $\lambda$-calculus (1)

Example: Computing factorials

Attempt 1 (Does not work)

\[fac\equiv\lambda n.if(= n\ 0)\ 1\ (*n(fac(−\ n\ 1)))\]

The reason this is not correct is because “fac” is just a shorthand for the $\lambda$-expression on its right-hand side. In order to obtain a valid $\lambda$-expression, all such shorthands need to be replaced with the $\lambda$-expressions they represent.

This leads to an inﬁnite process here.

Towards a correct deﬁnition

\[fac\equiv(\lambda f .\lambda n.if(= n\ 0)\ 1\ (*n(f(−\ n\ 1)))fac\]

In other words,

\[fac\equiv F\ fac\]

where

\[F\equiv(\lambda f.\lambda n.if(= n\ 0)\ 1\ (*\ n(f (−\ n\ 1)))\]

“fac” is a ﬁxed point of F.

Recursion in $\lambda$-calculus (2)

The Y-combinator (ﬁxed-point operator)

\[Y\equiv\lambda h.(\lambda x.h(x\ x))(\lambda x.h(x\ x))\]

Whenever we would normally write a recursive function

\[f\equiv\lambda x.\ ... f\ ...\]

we now write

\[F\equiv Y(\lambda f .\lambda x.\ ... f\ ...)\]

and then evaluate F x instead of f x.

Back to factorials:

Incorrect: $fac\equiv\lambda n.if(=\ n\ 0)\ 1\ (*\ n\ (fac(−\ n\ 1)))$
Correct: $Fac\equiv Y(\lambda f .\lambda n.if(= n\ 0)\ 1\ (*\ n(f (−\ n\ 1))))$

Recursion in $\lambda$-calculus (3)

\[\begin{align} Fac\ 3&\equiv&Y(\lambda f .\lambda n.if(=\ n\ 0)1(*\ n(f (−\ n\ 1))))3\\ &\equiv&(\lambda h.(\lambda x.h(x\ x))(\lambda x.h(x\ x)))\\ &&(\lambda f .\lambda n.if(=\ n\ 0)\ 1\ (*\ n(f (−\ n\ 1))))3\\ &\to_\beta&(\lambda x.(\lambda f .\lambda n.if(=\ n\ 0)\ 1\ (*\ n(f (−\ n\ 1))))(x\ x))\\ &&(\lambda x.(\lambda f .\lambda n.if(=\ n\ 0)\ 1\ (*\ n(f (−\ n\ 1))))(x\ x))3\\ &\to_\beta&(\lambda f .\lambda n.if(=\ n\ 0)\ 1\ (*\ n\ (f (−\ n\ 1))))\\ &&((\lambda x.(\lambda f .\lambda n.if(=\ n\ 0)\ 1\ (*\ n(f (−\ n\ 1))))(x\ x))\\ &&(\lambda x.(\lambda f .\lambda n.if(=\ n\ 0)\ 1\ (*\ n(f (−\ n\ 1))))(x\ x)))3\\ &\to_\beta^*&if(= 3\ 0)1\\ &&(*\ 3(((\lambda x.(\lambda f .\lambda n.if(=\ n\ 0)1(*\ n(f (−\ n\ 1))))(x\ x))\\ &&(\lambda x.(\lambda f .\lambda n.if(=\ n\ 0)1(*\ n(f (−\ n\ 1))))(x\ x)))(−\ 3\ 1)))\\ &\to_\beta^*&3(((\lambda x.(\lambda f .\lambda n.if(=\ n\ 0)\ 1\ (*\ n(f (−\ n\ 1))))(x\ x))\\ &&(\lambda x.(\lambda f .\lambda n.if(=\ n\ 0)1(*\ n(f (−\ n\ 1))))(x\ x)))2)\\ &\to_\beta^*&3(*\ 2(*\ 1(((\lambda x.(\lambda f .\lambda n.if(=\ n\ 0)1(*\ n(f (−\ n\ 1))))(x\ x))\\ &&(\lambda x.(\lambda f .\lambda n.if(=\ n\ 0)1(*\ n(f (−\ n\ 1))))(x\ x)))0)))\\ &\to_\beta^*&*3(*2(*1(((\lambda x.(\lambda f .\lambda n.if(=\ n\ 0)1(*n(f (−\ n\ 1))))(x\ x))\\ &&(\lambda x.(\lambda f .\lambda n.if(=\ n\ 0)1(* n(f (−\ n\ 1))))(x\ x)))0)))\\ &\to_\beta^*&*3(*2(*1(if(=\ 0\ 0)1\\ &&(*0(((\lambda x.(\lambda f .\lambda n.if(=\ n\ 0)1(*\ n(f (−\ n\ 1))))(x\ x))\\ &&(\lambda x.(\lambda f .\lambda n.if(=\ n\ 0)1(*\ n(f (−\ n\ 1))))(x\ x)))(−\ 0\ 1))))))\\ &\to_\beta^*&*3(*2(*11))\\ &\to_\beta^*&6 \end{align}\]

(Non-)termination of reductions

Some reductions never terminate:

\[(\lambda x.x\ x)(\lambda x.x\ x)\to_\beta (\lambda x.x\ x)(\lambda x.x\ x)\]

Others may produce longer and longer expressions:

\[\begin{align} (\lambda x.x\ x\ x)(\lambda x.x\ x\ x)&\to_\beta (\lambda x.x\ x\ x)(\lambda x.x\ x\ x)(\lambda x.x\ x\ x)\\ &\to_\beta(\lambda x.x\ x\ x)(\lambda x.x\ x\ x)(\lambda x.x\ x\ x)(\lambda x.x\ x\ x)\\ &\to_\beta... \end{align}\]

This is the same as inﬁnite loops or inﬁnite recursion in incorrect programs.

Normal-order vs applicative-order reduction (1)

Normal-order reduction: Subtitute the argument into the function unevaluated and continue reducing the resulting expression.

\[(\lambda x.h)((\lambda x.x\ x)(\lambda x.x\ x))\to_beta h\]

Applicative-order reduction: Reduce both the function and the argument expressions to the simplest possible form before substituting the argument into the function. Then continue to reduce the resulting expression.

\[(\lambda x.h)((\lambda x.x\ x)(\lambda x.x\ x))\to_\beta^* ...\textrm{(infinite expansion)}\]

Normal-order vs applicative-order reduction (2)

Why does this matter?

Normal-order reduction allows us to construct inﬁnite objects and use them, as long as we only ever use a ﬁnite part of them.

Example: Fibonacci numbers in Haskell

fibonacci = 1:1:(zipWith (+) fibonacci (tail fibonacci))

> take 10 fibonacci
[1, 1, 2, 3, 5, 8, 13, 21, 34, 55]

Church-Rosser theorem

If two reductions of the same $\lambda$-expression terminate, they terminate with the same result.

If some reduction of a $\lambda$-expression terminates, the normal-order reduction terminates.

Back to course home