Overview
Lesson plan
- Introductions and motivation for taking this course
- Programming domains, language evaluation criteria, influences on language design and their tradeoffs
- Compilation vs. interpretation
- Summary
Course theme – first part
Course theme
“A career in computer science is a commitment to a lifetime of learning. You will not be taught every detail you will need in your career while you are a student. The goal of a computer science education is to give you the tools you need so you can teach yourself new languages, frameworks, and architectures as they come along. The creativity encouraged by a lifetime of learning makes computer science one of the most exciting fields today” (Kent D. Lee).
- Introduction – 1 lecture
- Python – 2 lectures
- Prolog – 2 lectures
- Haskell – 4 lectures
Course theme – second part
“Education should prepare young people for jobs that do not yet exist, using technologies that have not yet been invented, to solve problems of which we are not yet aware” (Richard Riley, former Secretary of Education).
- Parsing and scanning – 2 lectures
- Formal semantics – 2 lectures
- Review for final exam – 1 lecture
What will we study?
- Programming languages are characterized by:
  - Syntax – what a program looks like
  - Semantics – what a program means
  - Implementation – how a program executes
- Trade-offs between design (specifying syntax and semantics) and implementation
- We will focus on these concepts, illustrated by various languages, rather than on the languages themselves
Course theme
Tell me
And I will forget
Show me
And I may remember
Involve me
And I will understand
Why study programming languages?
- Help you choose a language
  - Systems programming \(\to\) C, Modula-3
  - Scientific computations \(\to\) Fortran, Ada
  - Embedded systems \(\to\) C, Ada, Modula-2
  - Symbolic computing \(\to\) Lisp, Scheme, ML
- Make it easier to learn new languages
  - Some languages are similar
  - Some concepts are even more similar: iteration, recursion, abstraction
- Make better use of the languages you know
  - Improve your ability to develop effective algorithms
  - Understand obscure features
  - Understand implementation costs
  - Simulate useful features in languages that do not support them
- Prepare for further study in language design and implementation (compilers)
- Help understand other system software (assemblers, linkers, debuggers)
History of programming languages
Early history
- 1940’s: programming meant modifying the hardware; programmers had intimate contact with the computer
- Early 1950’s: early low-level languages developed
  - machine code
  - assembly language
- Late 1950’s – early 1960’s: first higher-level languages arrive
  - Fortran
  - LISP
  - COBOL
  - ALGOL
Structured programming, module systems, and the \(\lambda\)-calculus
- Late 1960’s – 1970’s: Structured programming
  - Pascal
  - C
- Late 1970’s – early 1980’s: Data abstraction/module systems
  - Modula-2
  - CLU
  - Ada
- 1960’s: the \(\lambda\)-calculus really starts to influence languages
  - ISWIM (Landin)
  - ML (late 1970’s)
Object orientation, dynamic languages, and the Internet
- 1980’s: Object-oriented languages start to catch on
  - Smalltalk
  - C++
  - Extensions to existing languages: Objective-C, Object Pascal, etc.
- 1990’s: The Internet starts to impact languages more
  - Java
- 1990’s – 2000’s: Dynamic languages become more popular
  - Web scripting languages (JavaScript, etc.)
  - Perl, Python, PHP
  - Some new interest in Lisp
  - Functional programming starts to become trendy (Scala, Haskell, Clojure)
Classification of languages
Imperative languages
- Main focus: machine state – the set of values stored in memory locations
- Command-driven: each statement uses the current state to compute a new state
- Syntax: S1; S2; S3; …
- Example languages: C, Pascal, FORTRAN, COBOL, CLU, Ada
Imperative Languages, C
while (b > 0) {
    a = a * 2;
    b = b - 1;
}
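The same state-transforming loop can be sketched in Python; the starting values 3 and 4 are arbitrary, chosen only to show the state changing:

```python
# Each statement reads the current state (a, b) and computes a new state.
a, b = 3, 4        # initial machine state (illustrative values)
while b > 0:
    a = a * 2      # the new value of a depends on the old one
    b = b - 1
print(a, b)        # final state: 48 0
```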
Object-oriented languages
- Classes are complex data types grouped with operations (methods) for creating, examining, and modifying elements (objects); subclasses include (inherit) the objects and methods from superclasses
- Computation is based on objects sending messages (methods applied to arguments) to other objects
- Syntax: varies; object \(\leftarrow\) method(args)
- Example languages: Java, C++, Smalltalk, Simula-67, Beta
Object-oriented languages, C++
class Square {
public:
    int x, y, h, l;
    Square() { x = y = h = l = 0; }
};
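The inheritance and message sending described above can also be sketched in Python; the fields mirror the C++ example, while ColoredSquare and area are invented for illustration:

```python
class Square:
    def __init__(self):
        # same fields as the C++ example, initialized to 0
        self.x = self.y = self.h = self.l = 0

    def area(self):
        return self.h * self.l

# A subclass inherits the fields and methods of its superclass.
class ColoredSquare(Square):
    def __init__(self, color):
        super().__init__()
        self.color = color

s = ColoredSquare("red")   # create an object
s.h = s.l = 3              # modify its state
print(s.area())            # send it a message: prints 9
```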
Applicative/Functional Languages
- Programs as functions that take arguments and return values; arguments and returned values may themselves be functions
- Programming consists of building the function that computes the answer; function application and composition are the main methods of computation
- Syntax: P1(P2(P3 X))
- Example languages: ML, LISP, Scheme, Haskell, Miranda, Clean, Erlang, OCaml
Applicative/Functional Languages, OCaml
# let twice f x = f(f(x));;
# let inc x = x + 1;;
# inc 5;;
6
# twice inc 5;;
7
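The OCaml session above, transliterated into Python to emphasize that functions are ordinary values that can be passed as arguments:

```python
def twice(f, x):
    # apply the function argument f two times
    return f(f(x))

def inc(x):
    return x + 1

print(inc(5))         # 6
print(twice(inc, 5))  # 7
```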
Logic/rule-based languages
- Programs as sets of basic rules for decomposing a problem
- Computation by deduction: search, unification, and backtracking are the main components
- Syntax: answer :- specification (rules)
- Example languages: Prolog, Datalog, BNF parsing

human(socrates).
mortal(X) :- human(X).

?- mortal(Who).
Who = socrates
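The deduction behind this query can be mimicked in Python as a sketch (real Prolog uses unification and backtracking; here the rule mortal(X) :- human(X) simply derives a set from the facts):

```python
# Fact: human(socrates).
humans = {"socrates"}

# Rule: mortal(X) :- human(X).
def mortal():
    return {x for x in humans}

# Query: ?- mortal(Who).
print(mortal())  # {'socrates'}
```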
Domain-specific languages (DSLs)
- Languages targeted at specific domains: data access, program manipulation, modeling/verification, etc.
- May be subsets or extensions of existing languages
- May be textual (SQL) or graphical (UML)
- May also have very limited computational power (XML and HTML)
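As an illustration of a textual DSL, a SQL query embedded in a general-purpose host language; a minimal sketch using Python's built-in sqlite3 module (the table and its contents are invented for the example):

```python
import sqlite3

# An in-memory database; the table and rows are invented for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course (name TEXT, lectures INTEGER)")
conn.executemany("INSERT INTO course VALUES (?, ?)",
                 [("Python", 2), ("Prolog", 2), ("Haskell", 4)])

# The SQL string is a program in a data-access DSL: declarative and of
# deliberately limited power, interpreted by the database engine.
rows = conn.execute("SELECT name FROM course WHERE lectures > 2").fetchall()
print(rows)  # [('Haskell',)]
```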
Themes in language development
- Languages have become more abstract – abstraction of data, memory, machine errors, execution, etc.
- Languages have shifted the performance burden to compilers – machines used to be expensive; now people are
- DSLs have begun to proliferate, both as extensions to existing languages and as languages in their own right
Language evaluation
Evaluation criteria
- Readability
- Writability
- Reliability
- Cost
Readability
- Overall simplicity
- Orthogonality
- Control statements (structured control vs. gotos)
- Flexible data types and structures
- Flexible, consistent syntax (identifiers, keywords, consistency)
Writability
- Simplicity and orthogonality
- Support for process and data abstractions
- Expressivity
Reliability
- Type checking
- Type safety
- Exception handling
- Aliasing
- Readability and writability
Cost
- Training
- Writing programs
- Programming environment availability
- Compilation
- Execution performance
- Language implementation/execution environment expense
- Reliability
- Maintenance
- Others: portability? generality?
Goals can conflict…
- Orthogonality is important, but too much can be confusing
- Sometimes more readable languages are not as writable, or vice versa (nested comments, overloaded operators, etc.)
- Reliability and cost can be in tension – more reliable programs may be more expensive to execute
Compilation vs. interpretation
Language implementation
- Develop layers of machines, each more primitive than the previous
- Translate between successive layers (compile or interpret)
- Ultimately, all programs execute in hardware, as shown in Figure 1.1, from Concepts of Programming Languages, by Robert W. Sebesta.
Figure 1.1. With the von Neumann computer architecture, the central processing unit reads instructions and data from memory, writes the results of operations back to memory, and interacts with the input and output devices.
Virtual Machines
- At first, programs were written in assembly language (or, at the very first, in machine language)
  - Hand-coded to be very efficient
- Now, we no longer write in native assembly language
  - We use layers of software (e.g., the operating system)
  - Each layer creates a virtual machine in which the next layer is defined, as shown in Figure 1.2, from Concepts of Programming Languages, by Robert W. Sebesta.
Figure 1.2. The operating system and the compilers/interpreters create virtual machines on which the programs can run.
- Compilers often define virtual machine layers as well: the lambda calculus, continuations, and graph reduction in functional languages; bytecode machines; etc.
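As a sketch of such a layer, a minimal stack-based bytecode machine in Python (the three opcodes are invented for this illustration):

```python
# A tiny stack machine: each instruction pops its operands from a stack
# and pushes its result.
def run(bytecode):
    stack = []
    for op, *args in bytecode:
        if op == "PUSH":
            stack.append(args[0])
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop()

# The expression (2 + 3) * 4, compiled by hand for this machine:
program = [("PUSH", 2), ("PUSH", 3), ("ADD",), ("PUSH", 4), ("MUL",)]
print(run(program))  # 20
```

A compiler targeting this layer never needs to know what hardware the stack machine itself runs on.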
Compilation
What is a compiler?
A compiler from language \(L_1\) to language \(L_2\) is a program that takes an \(L_1\) program and for each piece of code in \(L_1\) generates a piece of code in \(L_2\) with the same meaning
There are different ways to compile a program:
- Traditional compilation, as shown in Figure 1.a, from Programming Language Pragmatics, by Michael L. Scott
Figure 1.a. With traditional compilation, the compiler translates the source program into an executable that runs on the target machine: the target program. The target program then runs, converting its input into output.
- Compilations with linking, as shown in Figure 1.b, from Programming Language Pragmatics, by Michael L. Scott
Figure 1.b. If the source program imports any libraries, the compiler first translates it into an incomplete machine language. The linker then combines it with the compiled library routines to produce an executable program in machine language.
- Compiling to assembler, as shown in Figure 1.c, from Programming Language Pragmatics, by Michael L. Scott
Figure 1.c. An alternative is for the compiler to translate the source program into assembly language, and then for the assembler to convert it into machine language.
- Traditional C compilation, as shown in Figure 1.d, from Programming Language Pragmatics, by Michael L. Scott
Figure 1.d. Traditionally, C code is first passed through the preprocessor, which expands macros into the larger constructs they abbreviate, after which the compiler translates the result into assembly language.
- The cfront C++ compiler, as shown in Figure 1.e, from Programming Language Pragmatics, by Michael L. Scott
Figure 1.e. The cfront compiler takes the source code after it has been expanded by the preprocessor and converts it into C code. This is then passed through the C compiler, which converts it into assembly language.
The compilation steps are shown in Figure 1.3, from Concepts of Programming Languages, by Robert W Sebesta.
Figure 1.3. The compilation process. The source code goes through the scanner (lexical analyzer), the parser (syntax analyzer), and the intermediate code generator (semantic analyzer). Then the code is optimized and converted to machine language.
Interpretation
What is an interpreter?
An interpreter of \(L_1\) in \(L_2\) is an \(L_2\) program that executes the meaning of a given \(L_1\) program
Interpreters and compilers must know about both the syntax of the language and its semantics, and must preserve the semantics (the meaning)
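As a minimal illustration, a sketch of an interpreter for a toy arithmetic language (\(L_1\)) written in Python (\(L_2\)); representing programs as nested tuples is an assumption of the sketch:

```python
# The meaning of an L1 program: a number means itself, and
# ("+", e1, e2) means the sum of the meanings of e1 and e2.
def eval_expr(e):
    if isinstance(e, (int, float)):
        return e
    op, left, right = e
    a, b = eval_expr(left), eval_expr(right)
    if op == "+":
        return a + b
    if op == "*":
        return a * b
    raise ValueError(f"unknown operator: {op}")

# The L1 program 1 + 2 * 3, already parsed into a tree:
print(eval_expr(("+", 1, ("*", 2, 3))))  # 7
```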
Different types of interpretation are shown below:
- Traditional interpretation, shown in Figure 1.f, from Programming Language Pragmatics, by Michael L. Scott
Figure 1.f. With traditional interpretation, the interpreter is the locus of control during the code execution. It takes the source program and the user input to generate the output.
- Virtual machines/IL, shown in Figure 1.g, from Programming Language Pragmatics, by Michael L. Scott
Figure 1.g. Some programming languages translate the source code into intermediate code that is then executed on a virtual machine.
- Pascal P-code, shown in Figure 1.h, from Programming Language Pragmatics, by Michael L. Scott
Figure 1.h. The Pascal compiler, written in Pascal, generates output in P-code. A P-code interpreter, also written in Pascal, then executes this code on the target machine.
The interpretation steps are shown in Figure 1.5, from Concepts of Programming Languages, by Robert W Sebesta.
Figure 1.5. The interpretation process. The source code goes through the scanner (lexical analyzer), the parser (syntax analyzer), and the intermediate code generator (semantic analyzer). The interpreter then runs the intermediate code, gets the user input, and generates the results.
Major compiler phases
The compiler phases are shown in Figure 1.i, from Programming Language Pragmatics, by Michael L. Scott.
Figure 1.i. Compilation has two major components: the front-end (scanner, parser, and semantic analysis), and the back-end (optimization, code generation, and machine-specific optimization).
Compiler front-end
- Lex: break the source into separate tokens
- Parse: analyze the phrase structure and apply semantic actions, usually to build an abstract syntax tree
- Semantic analysis: determine what each phrase means, connect variable names to their definitions (typically with symbol tables), check types
GCD program, C-like language
int main() {
    int i = getint(), j = getint();
    while (i != j) {
        if (i > j) i = i - j;
        else j = j - i;
    }
    putint(i);
}
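The GCD program above, transliterated into Python so its control flow can be run directly (getint and putint become a parameter list and print):

```python
# Euclid's algorithm by repeated subtraction, as in the C-like source.
def gcd(i, j):
    while i != j:
        if i > j:
            i = i - j
        else:
            j = j - i
    return i

print(gcd(36, 24))  # 12
```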
Lexemes
int main ( ) { int i =
getint ( ) , j = getint (
) ; while ( i != j )
{ if ( i > j ) i
= i - j ; else j =
j - i ; } putint ( i
) ; }
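The scanning step can be sketched as a small regex-based lexer in Python that breaks source text into lexemes like those listed above (the token classes are simplified; != is the only multi-character operator handled):

```python
import re

# Ordered token patterns under re.VERBOSE (whitespace in the pattern
# is ignored); SKIP matches the whitespace between lexemes.
TOKEN_RE = re.compile(r"""
      (?P<ID>    [A-Za-z_]\w* )
    | (?P<NUM>   \d+ )
    | (?P<OP>    != | [-+*/=<>] )
    | (?P<PUNCT> [(){},;] )
    | (?P<SKIP>  \s+ )
""", re.VERBOSE)

def scan(source):
    # Walk the source left to right, keeping every non-whitespace match.
    return [m.group() for m in TOKEN_RE.finditer(source)
            if m.lastgroup != "SKIP"]

print(scan("while (i != j) {"))
# ['while', '(', 'i', '!=', 'j', ')', '{']
```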
The parser takes the lexemes generated by the scanner and generates a parse tree, as shown in Figure 1.j and Figure 1.k, and then reduces it to an abstract syntax tree, as shown in Figure 1.l (figures from Programming Language Pragmatics, by Michael L. Scott).
Figure 1.j. Part of the parse tree generated from the lexemes.
Figure 1.k. Part of the parse tree generated from the lexemes.
Figure 1.l. Most of the parse tree is removed to generate an abstract syntax tree.
Compiler back-end
- Translate to an intermediate representation (IR)
- Optimize: improve the generated code while maintaining the original meaning
- Generate code: instruction selection, register allocation
Sample IR
Source:
X = A + B + C;
Three address code:
%tmp.1 = A + B;
X = %tmp.1 + C;
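A sketch of how a compiler front-end might flatten an expression tree into three-address code, generating one fresh temporary per operator (the tuple representation and the %tmp naming follow the example above; the final copy to X is kept as a separate instruction):

```python
from itertools import count

# Leaves are variable names; internal nodes are (op, left, right) tuples.
def flatten(expr, code, tmps):
    if isinstance(expr, str):
        return expr                      # a variable is already an address
    op, left, right = expr
    a = flatten(left, code, tmps)
    b = flatten(right, code, tmps)
    tmp = f"%tmp.{next(tmps)}"           # fresh temporary for this operator
    code.append(f"{tmp} = {a} {op} {b};")
    return tmp

# X = A + B + C; parsed left-associatively:
code = []
root = flatten(("+", ("+", "A", "B"), "C"), code, count(1))
code.append(f"X = {root};")
print("\n".join(code))
# %tmp.1 = A + B;
# %tmp.2 = %tmp.1 + C;
# X = %tmp.2;
```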
Sample optimization
Source:
while (X != 0) {
B = 5;
C = X + B;
X = ...
}
Target (after hoisting the loop-invariant assignment B = 5 out of the loop, guarded so it only runs if the loop executes):
if (X != 0) B = 5;
while (X != 0) {
C = X + B;
X = ...
}
For further information
- For history, most books on languages. Sebesta Chapter 2 is good, as is Chapter 2 of Louden’s Programming Languages: Principles and Practice. The books for the History of Programming Languages workshops are good in-depth sources as well. Finally, IEEE Annals of the History of Computing occasionally has articles on this theme.
- Language design principles can be found in the same sources. Obviously, there is disagreement about which features are best, so it is best to consult multiple sources.
- For compilers, Compilers: Principles, Techniques, and Tools by Aho, Sethi, and Ullman is the classic text. Cooper and Torczon’s Engineering a Compiler is a more recent treatment.
Summary
- Motivation for taking this course
- Syllabus details
- Programming domains, language evaluation criteria, influences on language design and their tradeoffs
- Compilation vs. interpretation

Next time: Python programming language, part 1 of 2

Reminders:
- Provide feedback/unofficial course evaluation (any time)
- Submit questions for quiz 1 by Thursday, August 13 at 11:59 PM