Overview
Lesson plan
- Introductions and motivation for taking this course
- Programming domains, language evaluation criteria, influences on language design and their tradeoffs
- Compilation vs. interpretation
- Summary
Course theme – first part
Course theme
“A career in computer science is a commitment to a lifetime of learning. You will not be taught every detail you will need in your career while you are a student. The goal of a computer science education is to give you the tools you need so you can teach yourself new languages, frameworks, and architectures as they come along. The creativity encouraged by a lifetime of learning makes computer science one of the most exciting fields today” (Kent D. Lee).
- Introduction – 1 lecture
- Python – 2 lectures
- Prolog – 2 lectures
- Haskell – 4 lectures
Course theme – second part
“Education should prepare young people for jobs that do not yet exist, using technologies that have not yet been invented, to solve problems of which we are not yet aware” (Richard Riley, former Secretary of Education).
- Parsing and scanning – 2 lectures
- Formal semantics – 2 lectures
- Review for final exam – 1 lecture
What will we study?
- Programming languages are characterized by:
  - Syntax – what a program looks like
  - Semantics – what a program means
  - Implementation – how a program executes
- Trade-offs between design (specifying syntax and semantics) and implementation
- We will focus on these concepts, illustrated by various languages, rather than on the languages themselves
Course theme
Tell me
And I will forget
Show me
And I may remember
Involve me
And I will understand
Why study programming languages?
- Help you choose a language
  - Systems programming \(\to\) C, Modula-3
  - Scientific computations \(\to\) Fortran, Ada
  - Embedded systems \(\to\) C, Ada, Modula-2
  - Symbolic computing \(\to\) Lisp, Scheme, ML
- Make it easier to learn new languages
  - Some languages are similar
  - Some concepts are even more similar: iteration, recursion, abstraction
- Make better use of the languages you know
  - Improve your ability to develop effective algorithms
  - Understand obscure features
  - Understand implementation costs
  - Simulate useful features in languages that do not support them
- Prepare for further study in language design and implementation (compilers)
- Help understand other system software (assemblers, linkers, debuggers)
History of programming languages
Early history
- 1940’s: programming meant modifying the hardware; programmers had intimate contact with the computer
- Early 1950’s: early low-level languages developed
  - machine code
  - assembly language
- Late 1950’s – early 1960’s: first higher-level languages arrive
  - Fortran
  - LISP
  - COBOL
  - ALGOL
Structured programming, module systems, and the \(\lambda\)-calculus
- Late 1960’s – 1970’s: Structured programming
  - Pascal
  - C
- Late 1970’s – early 1980’s: Data abstraction/module systems
  - Modula-2
  - CLU
  - Ada
- 1960’s: the \(\lambda\)-calculus really starts to influence languages
  - ISWIM (Landin)
  - ML (late 1970’s)
Object orientation, dynamic languages, and the Internet
- 1980’s: Object-oriented languages start to catch on
  - Smalltalk
  - C++
  - Extensions to existing languages: Objective-C, Object Pascal, etc.
- 1990’s: The Internet starts to impact languages more
  - Java
- 1990’s – 2000’s: Dynamic languages become more popular
  - Web scripting languages (JavaScript, etc.)
  - Perl, Python, PHP
  - Some new interest in Lisp
  - Functional programming starts to become trendy (Scala, Haskell, Clojure)
Classification of languages
Imperative languages
- Main focus: machine state – the set of values stored in memory locations
- Command-driven: each statement uses the current state to compute a new state
- Syntax: S1; S2; S3; …
- Example languages: C, Pascal, FORTRAN, COBOL, CLU, Ada
Imperative Languages, C
while (b > 0) {
    a = a * 2;
    b = b - 1;
}
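The same state-transforming loop can be sketched in Python; the starting values 3 and 4 are arbitrary, chosen only to show the state changing:

```python
# Each statement reads the current state (a, b) and computes a new state.
a, b = 3, 4        # initial machine state (illustrative values)
while b > 0:
    a = a * 2      # the new value of a depends on the old one
    b = b - 1
print(a, b)        # final state: 48 0
```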
Object-oriented languages
- Classes are complex data types grouped with operations (methods) for creating, examining, and modifying elements (objects); subclasses include (inherit) the objects and methods from superclasses
- Computation is based on objects sending messages (methods applied to arguments) to other objects
- Syntax: varies; object \(\leftarrow\) method(args)
- Example languages: Java, C++, Smalltalk, Simula-67, Beta
Object-oriented languages, C++
class Square {
public:
    int x, y, h, l;
    Square() { x = y = h = l = 0; }
};
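The inheritance and message sending described above can also be sketched in Python; the fields mirror the C++ example, while ColoredSquare and area are invented for illustration:

```python
class Square:
    def __init__(self):
        # same fields as the C++ example, initialized to 0
        self.x = self.y = self.h = self.l = 0

    def area(self):
        return self.h * self.l

# A subclass inherits the fields and methods of its superclass.
class ColoredSquare(Square):
    def __init__(self, color):
        super().__init__()
        self.color = color

s = ColoredSquare("red")   # create an object
s.h = s.l = 3              # modify its state
print(s.area())            # send it a message: prints 9
```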
Applicative/Functional Languages
- Programs as functions that take arguments and return values; arguments and returned values may themselves be functions
- Programming consists of building the function that computes the answer; function application and composition are the main methods of computation
- Syntax: P1(P2(P3 X))
- Example languages: ML, LISP, Scheme, Haskell, Miranda, Clean, Erlang, OCaml
Applicative/Functional Languages, OCaml
# let twice f x = f(f(x));;
# let inc x = x + 1;;
# inc 5;;
6
# twice inc 5;;
7
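The OCaml session above, transliterated into Python to emphasize that functions are ordinary values that can be passed as arguments:

```python
def twice(f, x):
    # apply the function argument f two times
    return f(f(x))

def inc(x):
    return x + 1

print(inc(5))         # 6
print(twice(inc, 5))  # 7
```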
Logic/rule-based languages
- Programs as sets of basic rules for decomposing a problem
- Computation by deduction: search, unification, and backtracking are the main components
- Syntax: answer :- specification (rules)
- Example languages: Prolog, Datalog, BNF parsing

human(socrates).
mortal(X) :- human(X).

?- mortal(Who).
Who = socrates
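The deduction behind this query can be mimicked in Python as a sketch (real Prolog uses unification and backtracking; here the rule mortal(X) :- human(X) simply derives a set from the facts):

```python
# Fact: human(socrates).
humans = {"socrates"}

# Rule: mortal(X) :- human(X).
def mortal():
    return {x for x in humans}

# Query: ?- mortal(Who).
print(mortal())  # {'socrates'}
```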
Domain-specific languages (DSLs)
- Languages targeted at specific domains: data access, program manipulation, modeling/verification, etc.
- May be subsets or extensions of existing languages
- May be textual (SQL) or graphical (UML)
- May also have very limited computational power (XML and HTML)
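As an illustration of a textual DSL, a SQL query embedded in a general-purpose host language; a minimal sketch using Python's built-in sqlite3 module (the table and its contents are invented for the example):

```python
import sqlite3

# An in-memory database; the table and rows are invented for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course (name TEXT, lectures INTEGER)")
conn.executemany("INSERT INTO course VALUES (?, ?)",
                 [("Python", 2), ("Prolog", 2), ("Haskell", 4)])

# The SQL string is a program in a data-access DSL: declarative and of
# deliberately limited power, interpreted by the database engine.
rows = conn.execute("SELECT name FROM course WHERE lectures > 2").fetchall()
print(rows)  # [('Haskell',)]
```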
Themes in language development
- Languages have become more abstract – abstraction of data, memory, machine errors, execution, etc.
- Languages have shifted the performance burden to compilers – machines used to be expensive; now people are
- DSLs have begun to proliferate, both as extensions to existing languages and as languages in their own right
Language evaluation
Evaluation criteria
- Readability
- Writability
- Reliability
- Cost
Readability
- Overall simplicity
- Orthogonality
- Control statements (structured control vs. gotos)
- Flexible data types and structures
- Flexible, consistent syntax (identifiers, keywords, consistency)
Writability
- Simplicity and orthogonality
- Support for process and data abstractions
- Expressivity
Reliability
- Type checking
- Type safety
- Exception handling
- Aliasing
- Readability and writability
Cost
- Training
- Writing programs
- Programming environment availability
- Compilation
- Execution performance
- Language implementation/execution environment expense
- Reliability
- Maintenance
- Others: portability? generality?
Goals can conflict…
- Orthogonality is important, but too much can be confusing
- Sometimes more readable languages are not as writable, or vice versa (nested comments, overloaded operators, etc.)
- Reliability and cost can be in tension – more reliable programs may be more expensive to execute
Compilation vs. interpretation
Language implementation
- Develop layers of machines, each more primitive than the previous
- Translate between successive layers (compile or interpret)
- Ultimately, all programs execute in hardware, as shown in Figure 1.1, from Concepts of Programming Languages, by Robert W. Sebesta.
Figure 1.1. With the von Neumann computer architecture, the central processing unit reads instructions and data from memory, writes the results of operations back to memory, and interacts with the input and output devices.
Virtual Machines
- At first, programs were written in assembly language (or, at the very first, in machine language)
  - Hand-coded to be very efficient
- Now, we no longer write in native assembly language
  - We use layers of software (e.g., the operating system)
  - Each layer creates a virtual machine in which the next layer is defined, as shown in Figure 1.2, from Concepts of Programming Languages, by Robert W. Sebesta.
Figure 1.2. The operating system and the compilers/interpreters create virtual machines on which the programs can run.
- Compilers often define virtual machine layers as well: the lambda calculus, continuations, and graph reduction in functional languages; bytecode machines; etc.
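As a sketch of such a layer, a minimal stack-based bytecode machine in Python (the three opcodes are invented for this illustration):

```python
# A tiny stack machine: each instruction pops its operands from a stack
# and pushes its result.
def run(bytecode):
    stack = []
    for op, *args in bytecode:
        if op == "PUSH":
            stack.append(args[0])
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop()

# The expression (2 + 3) * 4, compiled by hand for this machine:
program = [("PUSH", 2), ("PUSH", 3), ("ADD",), ("PUSH", 4), ("MUL",)]
print(run(program))  # 20
```

A compiler targeting this layer never needs to know what hardware the stack machine itself runs on.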
Compilation
What is a compiler?
A compiler from language \(L_1\) to language \(L_2\) is a program that takes an \(L_1\) program and for each piece of code in \(L_1\) generates a piece of code in \(L_2\) with the same meaning
There are different ways to compile a program:
- Traditional compilation, as shown in Figure 1.a, from Programming Language Pragmatics, by Michael L. Scott
Figure 1.a. With traditional compilation, the compiler translates the source program into an executable that runs on the target machine: the target program. The target program then runs, converting its input into output.
- Compilations with linking, as shown in Figure 1.b, from Programming Language Pragmatics, by Michael L. Scott
Figure 1.b. If the source program imports any libraries, the compiler first translates it into an incomplete machine language. The linker then combines it with the compiled library routines to produce an executable program in machine language.
- Compiling to assembler, as shown in Figure 1.c, from Programming Language Pragmatics, by Michael L. Scott
Figure 1.c. An alternative is for the compiler to translate the source program into assembly language, and then for the assembler to convert it into machine language.
- Traditional C compilation, as shown in Figure 1.d, from Programming Language Pragmatics, by Michael L. Scott
Figure 1.d. Traditionally, C code is first passed through the preprocessor, which expands macros into the larger constructs they abbreviate, after which the compiler translates the result into assembly language.
- The cfront C++ compiler, as shown in Figure 1.e, from Programming Language Pragmatics, by Michael L. Scott
Figure 1.e. The cfront compiler takes the source code after it has been expanded by the preprocessor and converts it into C code. This is then passed through the C compiler, which converts it into assembly language.
The compilation steps are shown in Figure 1.3, from Concepts of Programming Languages, by Robert W Sebesta.
Figure 1.3. The compilation process. The source code goes through the scanner (lexical analyzer), the parser (syntax analyzer), and the intermediate code generator (semantic analyzer). Then the code is optimized and converted to machine language.
Interpretation
What is an interpreter?
An interpreter of \(L_1\) in \(L_2\) is an \(L_2\) program that executes the meaning of a given \(L_1\) program
Interpreters and compilers must know about both the syntax of the language and its semantics, and must preserve the semantics (the meaning)
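As a minimal illustration, a sketch of an interpreter for a toy arithmetic language (\(L_1\)) written in Python (\(L_2\)); representing programs as nested tuples is an assumption of the sketch:

```python
# The meaning of an L1 program: a number means itself, and
# ("+", e1, e2) means the sum of the meanings of e1 and e2.
def eval_expr(e):
    if isinstance(e, (int, float)):
        return e
    op, left, right = e
    a, b = eval_expr(left), eval_expr(right)
    if op == "+":
        return a + b
    if op == "*":
        return a * b
    raise ValueError(f"unknown operator: {op}")

# The L1 program 1 + 2 * 3, already parsed into a tree:
print(eval_expr(("+", 1, ("*", 2, 3))))  # 7
```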
Different types of interpretation are shown below:
- Traditional interpretation, shown in Figure 1.f, from Programming Language Pragmatics, by Michael L. Scott
Figure 1.f. With traditional interpretation, the interpreter is the locus of control during the code execution. It takes the source program and the user input to generate the output.
- Virtual machines/IL, shown in Figure 1.g, from Programming Language Pragmatics, by Michael L. Scott
Figure 1.g. Some programming languages translate the source code into intermediate code that is then executed on a virtual machine.
- Pascal P-code, shown in Figure 1.h, from Programming Language Pragmatics, by Michael L. Scott
Figure 1.h. The Pascal compiler, written in Pascal, generates output in P-code. A P-code interpreter, also written in Pascal, then executes this code on the target machine.
The interpretation steps are shown in Figure 1.5, from Concepts of Programming Languages, by Robert W Sebesta.
Figure 1.5. The interpretation process. The source code goes through the scanner (lexical analyzer), the parser (syntax analyzer), and the intermediate code generator (semantic analyzer). The interpreter then runs the intermediate code, gets the user input, and generates the results.
Major compiler phases
The compiler phases are shown in Figure 1.i, from Programming Language Pragmatics, by Michael L. Scott.
Figure 1.i. Compilation has two major components: the front-end (scanner, parser, and semantic analysis), and the back-end (optimization, code generation, and machine-specific optimization).
Compiler front-end
- Lex: break the source into separate tokens
- Parse: analyze the phrase structure and apply semantic actions, usually to build an abstract syntax tree
- Semantic analysis: determine what each phrase means, connect variable names to their definitions (typically with symbol tables), check types
GCD program, C-like language
int main() {
    int i = getint(), j = getint();
    while (i != j) {
        if (i > j) i = i - j;
        else j = j - i;
    }
    putint(i);
}
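The GCD program above, transliterated into Python so its control flow can be run directly (getint and putint become a parameter list and print):

```python
# Euclid's algorithm by repeated subtraction, as in the C-like source.
def gcd(i, j):
    while i != j:
        if i > j:
            i = i - j
        else:
            j = j - i
    return i

print(gcd(36, 24))  # 12
```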
Lexemes
int main ( ) { int i =
getint ( ) , j = getint (
) ; while ( i != j )
{ if ( i > j ) i
= i - j ; else j =
j - i ; } putint ( i
) ; }
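The scanning step can be sketched as a small regex-based lexer in Python that breaks source text into lexemes like those listed above (the token classes are simplified; != is the only multi-character operator handled):

```python
import re

# Ordered token patterns under re.VERBOSE (whitespace in the pattern
# is ignored); SKIP matches the whitespace between lexemes.
TOKEN_RE = re.compile(r"""
      (?P<ID>    [A-Za-z_]\w* )
    | (?P<NUM>   \d+ )
    | (?P<OP>    != | [-+*/=<>] )
    | (?P<PUNCT> [(){},;] )
    | (?P<SKIP>  \s+ )
""", re.VERBOSE)

def scan(source):
    # Walk the source left to right, keeping every non-whitespace match.
    return [m.group() for m in TOKEN_RE.finditer(source)
            if m.lastgroup != "SKIP"]

print(scan("while (i != j) {"))
# ['while', '(', 'i', '!=', 'j', ')', '{']
```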
The parser takes the lexemes generated by the scanner and generates a parse tree, as shown in Figure 1.j and Figure 1.k, and then reduces it to an abstract syntax tree, as shown in Figure 1.l (figures from Programming Language Pragmatics, by Michael L. Scott).
Figure 1.j. Part of the parse tree generated from the lexemes.
Figure 1.k. Part of the parse tree generated from the lexemes.
Figure 1.l. Most of the parse tree is removed to generate an abstract syntax tree.
Compiler back-end
- Translate to an intermediate representation (IR)
- Optimize: improve the generated code while maintaining the original meaning
- Generate code: instruction selection, register allocation
Sample IR
Source:
X = A + B + C;
Three address code:
%tmp.1 = A + B;
X = %tmp.1 + C;
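A sketch of how a compiler front-end might flatten an expression tree into three-address code, generating one fresh temporary per operator (the tuple representation and the %tmp naming follow the example above; the final copy to X is kept as a separate instruction):

```python
from itertools import count

# Leaves are variable names; internal nodes are (op, left, right) tuples.
def flatten(expr, code, tmps):
    if isinstance(expr, str):
        return expr                      # a variable is already an address
    op, left, right = expr
    a = flatten(left, code, tmps)
    b = flatten(right, code, tmps)
    tmp = f"%tmp.{next(tmps)}"           # fresh temporary for this operator
    code.append(f"{tmp} = {a} {op} {b};")
    return tmp

# X = A + B + C; parsed left-associatively:
code = []
root = flatten(("+", ("+", "A", "B"), "C"), code, count(1))
code.append(f"X = {root};")
print("\n".join(code))
# %tmp.1 = A + B;
# %tmp.2 = %tmp.1 + C;
# X = %tmp.2;
```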
Sample optimization
Source:
while (X != 0) {
B = 5;
C = X + B;
X = ...
}
Target (after hoisting the loop-invariant assignment B = 5 out of the loop, guarded so it only runs if the loop executes):
if (X != 0) B = 5;
while (X != 0) {
C = X + B;
X = ...
}
For further information
- For history, most books on languages. Sebesta Chapter 2 is good, as is Chapter 2 of Louden’s Programming Languages: Principles and Practice. The books for the History of Programming Languages workshops are good in-depth sources as well. Finally, IEEE Annals of the History of Computing occasionally has articles on this theme.
- Language design principles can be found in the same sources. Obviously, there is disagreement about which features are best, so it is best to consult multiple sources.
- For compilers, Compilers: Principles, Techniques, and Tools by Aho, Sethi, and Ullman is the classic text. Cooper and Torczon’s Engineering a Compiler is a more recent treatment.
Summary
- Motivation for taking this course
- Syllabus details
- Programming domains, language evaluation criteria, influences on language design and their tradeoffs
- Compilation vs. interpretation

Next time: Python programming language, part 1 of 2

Reminders:
- Provide feedback/unofficial course evaluation (any time)
- Submit questions for quiz 1 by Thursday, August 13 at 11:59 PM