Semantics
Motivation
Material adapted from CSCI 3136 Principles of Programming Languages at Dalhousie University, and from Mark Hills’ CSCI 3675 Programming Languages.
Semantic analysis is the third step in compilation, as shown in Figure 1, from Programming Language Pragmatics, by Michael L. Scott.
Figure 1. Semantic analysis converts the parse tree into an abstract syntax tree or other intermediate form.
Syntax and semantics
Syntax
- Describes the form of a valid program
- Can be described by a context-free grammar
Semantics
- Describes the meaning of a program
- Cannot be described by a context-free grammar
Some constraints that may appear syntactic are enforced by semantic analysis:
- E.g., use of identifier only after its declaration
Semantic analysis
- Enforces semantic rules
- Builds intermediate representation (e.g., abstract syntax tree)
- Fills symbol table
- Passes results to intermediate code generator
Two approaches
- Interleaved with syntactic analysis
- As a separate phase
Formal mechanism: Attribute grammars
Enforcing semantic rules
Static semantic rules
- Enforced by the compiler at compile time
- Example: Do not use an undeclared variable
Dynamic semantic rules
- Compiler generates code for enforcement at run time
- Examples: division by zero, array index out of bounds
- Some compilers allow these checks to be disabled
Varieties of semantics
There are many methods of giving semantics to a programming language. Several common methods include:
- Operational semantics
- Axiomatic semantics
- Denotational semantics
Note that:
- it may be easier to represent certain languages with certain types of semantics
- the types of semantics are complementary - they are good for different purposes, with no method “the best”
Operational semantics
- First, start with a definition of our “machine” which we run programs on
- The machine state, including the program, is often called the configuration
- Next, describe how to execute programs in a given language by describing how to execute individual statements and parts of statements - rules follow the structure of the program
- A program’s “meaning” is defined by how it changes the configuration
- Useful as a basis for implementations
Axiomatic semantics
Axiomatic semantics are also called Floyd-Hoare logic
- based on logic - first-order predicate calculus
- semantics represented as a logical system built from axioms and inference rules
- mainly suited to simple imperative languages
- used to prove a post-condition from a pre-condition: given something holds in the starting state, we can show something else holds in the end state
Denotational semantics
In denotational semantics, we want to find the meaning, or denotation, of phrases in our language.
- we construct a function \(\mathcal{M}\) assigning a mathematical meaning to each language construct;
- these functions are compositional - we can construct the meaning of a language construct by composing the meanings of its components
- useful for proving properties of programs - used for early type soundness proofs and in theorem provers (plus elsewhere)
Alternative methods
There are many other methods which have been devised to give semantics to languages.
- “Compiler-based” semantics - the meaning is whatever the compiler says it is
- Abstract Machines - similar to operational semantics; an abstract machine with instructions, etc. is devised and used to give semantics
- Term Rewriting - semantics are formulated as term rewriting systems, with evaluation given by the rewriting relation
- Rewriting Logic - an extension of equational logic that provides for concurrency
Transition semantics
Transition semantics is a form of operational semantics.
- Configurations include the code and machine state: \((C,m)\)
- Semantics specified as transitions between configurations, altering the machine state
- Rules of the form: \(\langle C,m \rangle \rightarrow \langle C',m' \rangle\)
- \(C\), \(C'\) represent the code yet to be executed
- \(m\), \(m'\) represent the state (store, memory, etc), often a finite map from names to values
- May not need \(m\) - simple calculator languages with only numbers and operations don’t, for instance
Key point: each transition indicates exactly one step of computation
IMP - a simple imperative language
We can use a simple imperative language, IMP, as a sample language for semantics. This language has the following syntactic categories:
- numbers N, which are the integers (including negatives)
- truth values \(\mathbf{T} = \lbrace \mathbf{true},\mathbf{false} \rbrace\)
- locations Loc
- arithmetic expressions Aexp
- boolean expressions Bexp
- commands Com
For shorthand, \(n,m \in \mathbf{N}; X,Y \in \mathbf{Loc}; a \in \mathbf{Aexp}; b \in \mathbf{Bexp}; c \in \mathbf{Com}\).
IMP syntax
Using a variant of BNF, we can specify the syntax for IMP as: \(\begin{align*} & Aexp & a & ::= & & n\ |\ X\ |\ a_0 + a_1\ |\ a_0 - a_1\ |\ a_0 \times a_1 & \\ & Bexp & b & ::= & & \mathbf{true}\ |\ \mathbf{false}\ |\ a_0 = a_1\ |\ a_0 \leq a_1\ |\ \neg b\ |\ b_0 \wedge b_1\ |\ b_0 \vee b_1 & \\ & Com & c & ::= & & \mathbf{skip}\ |\ X\ :=\ a\ |\ c_0; c_1\ |\ \mathbf{if}\ b\ \mathbf{then}\ c_0\ \mathbf{else}\ c_1\ |\ \\ & & & & & \mathbf{while}\ b\ \mathbf{do}\ c & \\ \end{align*}\)
Important Point: We assume reasonable input programs that parse, and we don’t care about precedence, associativity, etc. - we assume all of that has been figured out for us.
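The grammar above can be sketched as abstract syntax in code. A minimal sketch in Python; the class and field names here are my own illustrative choices, not part of the notes.

```python
# A sketch of IMP abstract syntax as Python dataclasses. Parsing,
# precedence, etc. are assumed to already be done for us.
from dataclasses import dataclass

@dataclass
class Num:                 # n
    value: int

@dataclass
class Var:                 # X (a location)
    name: str

@dataclass
class BinOp:               # a0 + a1, a0 - a1, a0 x a1
    op: str
    left: object
    right: object

@dataclass
class Assign:              # X := a
    name: str
    expr: object

@dataclass
class Seq:                 # c0 ; c1
    first: object
    second: object

# Y := X + 1, already parsed into a tree
prog = Assign("Y", BinOp("+", Var("X"), Num(1)))
```

Each grammar production becomes one node class; a program is a tree of these nodes, which the semantic rules below can follow structurally.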
IMP configurations
Our configurations will contain two elements:
- The code (\(a\), \(b\), or \(c\))
- The state
We can define the set of states \(\Sigma\) as functions \(\sigma: \mathbf{Loc} \rightarrow \mathbf{N}\), or functions from locations to integer values.
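A state \(\sigma\) can be sketched as a Python dict from location names to integers; this is an illustration of the idea (a finite map rather than a total function), and the helper name is mine.

```python
# States sigma: Loc -> N, sketched as finite maps (dicts) from
# location names to integers.
sigma = {"x": 7, "y": 0}

def lookup(sigma, X):
    # sigma(X): the integer currently stored at location X
    return sigma[X]
```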
Arithmetic rules and simple expressions
Our semantic relation for arithmetic expressions will be of the form: \[\langle a,\sigma \rangle \rightarrow \langle a',\sigma \rangle\]
We assume arithmetic expressions have no side-effects, so the state \(\sigma\) never changes.
- Numerals need no rule: a configuration \(\langle n,\sigma \rangle\) is final - a numeral is already a value
- Location lookup is a simple axiom: \[\langle X,\sigma \rangle \rightarrow \langle \sigma(X),\sigma \rangle\]
Arithmetic expressions: sums
We reduce the left operand to a numeral first, then the right operand, then add: \[\frac{\langle a_0,\sigma \rangle \rightarrow \langle a_0',\sigma \rangle}{\langle a_0 + a_1,\sigma \rangle \rightarrow \langle a_0' + a_1,\sigma \rangle} \qquad \frac{\langle a_1,\sigma \rangle \rightarrow \langle a_1',\sigma \rangle}{\langle n + a_1,\sigma \rangle \rightarrow \langle n + a_1',\sigma \rangle} \qquad \langle n_0 + n_1,\sigma \rangle \rightarrow \langle n,\sigma \rangle\] where \(n\) is the sum of \(n_0\) and \(n_1\).
Arithmetic expressions: subtraction
The rules mirror those for sums: \[\frac{\langle a_0,\sigma \rangle \rightarrow \langle a_0',\sigma \rangle}{\langle a_0 - a_1,\sigma \rangle \rightarrow \langle a_0' - a_1,\sigma \rangle} \qquad \frac{\langle a_1,\sigma \rangle \rightarrow \langle a_1',\sigma \rangle}{\langle n - a_1,\sigma \rangle \rightarrow \langle n - a_1',\sigma \rangle} \qquad \langle n_0 - n_1,\sigma \rangle \rightarrow \langle n,\sigma \rangle\] where \(n\) is the difference of \(n_0\) and \(n_1\).
Arithmetic expressions: products
Again the rules mirror those for sums: \[\frac{\langle a_0,\sigma \rangle \rightarrow \langle a_0',\sigma \rangle}{\langle a_0 \times a_1,\sigma \rangle \rightarrow \langle a_0' \times a_1,\sigma \rangle} \qquad \frac{\langle a_1,\sigma \rangle \rightarrow \langle a_1',\sigma \rangle}{\langle n \times a_1,\sigma \rangle \rightarrow \langle n \times a_1',\sigma \rangle} \qquad \langle n_0 \times n_1,\sigma \rangle \rightarrow \langle n,\sigma \rangle\] where \(n\) is the product of \(n_0\) and \(n_1\).
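The arithmetic rules above can be sketched as a one-step reduction function. The tuple encoding `("num", n)`, `("loc", X)`, `(op, a0, a1)` and the function names are my own, not from the notes.

```python
# One-step reduction for arithmetic expressions: reduce the left operand
# to a numeral first, then the right operand, then apply the operator.
# The state sigma is read but never changed.
OPS = {"+": lambda x, y: x + y,
       "-": lambda x, y: x - y,
       "*": lambda x, y: x * y}

def is_num(a):
    return a[0] == "num"

def step(a, sigma):
    """One transition <a, sigma> -> <a', sigma>; a must not be a numeral."""
    if is_num(a):
        raise ValueError("numerals are final configurations")
    if a[0] == "loc":                          # <X, s> -> <s(X), s>
        return ("num", sigma[a[1]])
    op, a0, a1 = a
    if not is_num(a0):                         # reduce the left operand
        return (op, step(a0, sigma), a1)
    if not is_num(a1):                         # then the right operand
        return (op, a0, step(a1, sigma))
    return ("num", OPS[op](a0[1], a1[1]))      # <n0 op n1, s> -> <n, s>
```

Each call performs exactly one step of computation, matching the key point about transition semantics.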
Boolean rules and simple expressions
Our semantic relation for boolean expressions will be of the form: \[\langle b,\sigma \rangle \rightarrow \langle b',\sigma \rangle\]
We assume boolean expressions have no side-effects.
- The boolean constants need no rules: configurations \(\langle \mathbf{true},\sigma \rangle\) and \(\langle \mathbf{false},\sigma \rangle\) are final - the constants are already values
Boolean expressions: equality
\[\frac{\langle a_0,\sigma \rangle \rightarrow \langle a_0',\sigma \rangle}{\langle a_0 = a_1,\sigma \rangle \rightarrow \langle a_0' = a_1,\sigma \rangle} \qquad \frac{\langle a_1,\sigma \rangle \rightarrow \langle a_1',\sigma \rangle}{\langle n = a_1,\sigma \rangle \rightarrow \langle n = a_1',\sigma \rangle}\] \[\langle n_0 = n_1,\sigma \rangle \rightarrow \langle \mathbf{true},\sigma \rangle\ \text{if}\ n_0 = n_1 \qquad \langle n_0 = n_1,\sigma \rangle \rightarrow \langle \mathbf{false},\sigma \rangle\ \text{if}\ n_0 \neq n_1\]
The rules for \(\leq\) are similar.
Boolean expressions: and
\[\frac{\langle b_0,\sigma \rangle \rightarrow \langle b_0',\sigma \rangle}{\langle b_0 \wedge b_1,\sigma \rangle \rightarrow \langle b_0' \wedge b_1,\sigma \rangle} \qquad \langle \mathbf{true} \wedge b_1,\sigma \rangle \rightarrow \langle b_1,\sigma \rangle \qquad \langle \mathbf{false} \wedge b_1,\sigma \rangle \rightarrow \langle \mathbf{false},\sigma \rangle\]
The rules for \(\vee\) and \(\neg\) are similar.
Commands
While expressions in our language cannot have side effects, commands can. So, here we need to model the changes in state that occur when commands run. Here, our semantic relation will be of the form: \[\langle c,\sigma \rangle \rightarrow \langle c',\sigma' \rangle \qquad \text{or} \qquad \langle c,\sigma \rangle \rightarrow \sigma'\]
So, when command \(c\) is fully evaluated, the potentially altered memory \(\sigma'\) is returned.
Simple commands
The \(\mathbf{skip}\) command does not alter the state: \[\langle \mathbf{skip},\sigma \rangle \rightarrow \sigma\]
Assignment - some new notation
The assignment command does alter the state. One way we can view it is that we get back a new state function which is the same everywhere except at the location we’ve updated, which now holds the new value. We can define this as: \[\sigma[n/X](Y) = \begin{cases} n & \text{if } Y = X \\ \sigma(Y) & \text{otherwise} \end{cases}\]
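The update operation \(\sigma[n/X]\) can be sketched as a tiny helper (the name `update` is my own): build a new map, leaving the original state untouched.

```python
# sigma[n/X]: a new state identical to sigma everywhere except X,
# which now holds n. The original state is not modified.
def update(sigma, X, n):
    new_sigma = dict(sigma)    # copy the old state
    new_sigma[X] = n           # change only location X
    return new_sigma
```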
Assignment
With our new notation, assignment can be shown as follows. Note we reduce the expression we are assigning to \(X\) first, before we do the actual assignment.
\[\frac{\langle a,\sigma \rangle \rightarrow \langle a',\sigma \rangle}{\langle X := a,\sigma \rangle \rightarrow \langle X := a',\sigma \rangle} \qquad \langle X := n,\sigma \rangle \rightarrow \sigma[n/X]\]
Sequencing
We can sequence commands as well in our language. We always complete execution of the first command before starting on the second. This gives us the following semantic rules:
\[\frac{\langle c_0,\sigma \rangle \rightarrow \langle c_0',\sigma' \rangle}{\langle c_0; c_1,\sigma \rangle \rightarrow \langle c_0'; c_1,\sigma' \rangle} \qquad \frac{\langle c_0,\sigma \rangle \rightarrow \sigma'}{\langle c_0; c_1,\sigma \rangle \rightarrow \langle c_1,\sigma' \rangle}\]
Conditionals
We have a conditional statement in our language. We need to evaluate the guard first before deciding which branch to take. We can represent this as:
\[\frac{\langle b,\sigma \rangle \rightarrow \langle b',\sigma \rangle}{\langle \mathbf{if}\ b\ \mathbf{then}\ c_0\ \mathbf{else}\ c_1,\sigma \rangle \rightarrow \langle \mathbf{if}\ b'\ \mathbf{then}\ c_0\ \mathbf{else}\ c_1,\sigma \rangle}\] \[\langle \mathbf{if}\ \mathbf{true}\ \mathbf{then}\ c_0\ \mathbf{else}\ c_1,\sigma \rangle \rightarrow \langle c_0,\sigma \rangle \qquad \langle \mathbf{if}\ \mathbf{false}\ \mathbf{then}\ c_0\ \mathbf{else}\ c_1,\sigma \rangle \rightarrow \langle c_1,\sigma \rangle\]
Loops
We have one loop, the while command. Here, we need to evaluate the guard - if it is still true, we want to evaluate the body, and then evaluate the loop again. In some sense, this takes us back where we started, but most likely with a different state (if not, we probably won’t terminate). We expand to a conditional to do this, or else we would “lose” the guard when we evaluate it: \[\langle \mathbf{while}\ b\ \mathbf{do}\ c,\sigma \rangle \rightarrow \langle \mathbf{if}\ b\ \mathbf{then}\ (c;\ \mathbf{while}\ b\ \mathbf{do}\ c)\ \mathbf{else}\ \mathbf{skip},\sigma \rangle\]
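The expansion can be sketched as a single rewriting step on tuple-encoded commands (the encoding is mine): a while loop rewrites, in one step, to a conditional whose then-branch runs the body and then the loop again, leaving the state unchanged.

```python
# <while b do c, s> -> <if b then (c ; while b do c) else skip, s>
def step_while(c, sigma):
    _, b, body = c                                   # ("while", b, body)
    expanded = ("if", b, ("seq", body, c), ("skip",))
    return (expanded, sigma)                         # state unchanged
```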
Example
To see how we can “evaluate” something operationally, start with:
In this state, we have at some point assigned the value \(7\) to location \(x\).
Example
Since this is a conditional, we first need to evaluate the guard until we get a value, either \(\mathbf{true}\) or \(\mathbf{false}\);

Example
Now that we just have values in the relational expression, we can use the appropriate rule to determine the truth or falsity of the guard.

Example
Now that we have a truth value for the guard, we can use the appropriate conditional rule to pick the correct statement to continue with.
Example
We can now evaluate the arithmetic expression, since we need to have an integer value before we can do assignment.

Example
Finally, we can use the assignment rule to assign a value to y.
Evaluation in transition semantics
As we saw above, we can view evaluation as a sequence of steps with trees of justifications for each step. This gives us a sequence of evaluation steps: \[\langle C_0,m_0 \rangle \rightarrow \langle C_1,m_1 \rangle \rightarrow \cdots \rightarrow \langle C_i,m_i \rangle \rightarrow \cdots \rightarrow m\]
We can then define \(\rightarrow^*\) as the transitive closure of \(\rightarrow\), essentially giving us a relation that evaluates from \(\langle C_i,m_i \rangle\) to \(m\) (or, from a starting program and state to the final state).
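Taking the transitive closure corresponds to a driver loop: apply one step at a time until a terminal configuration is reached. A sketch, assuming a `step` function that returns `None` on terminal configurations (a convention of mine):

```python
# ->*: iterate the one-step relation until no rule applies. A step limit
# guards against programs that diverge.
def run(config, step, limit=10_000):
    for _ in range(limit):
        nxt = step(config)
        if nxt is None:            # terminal configuration: we are done
            return config
        config = nxt
    raise RuntimeError("no terminal configuration reached (diverges?)")
```

For example, with a toy step function that counts an integer down, `run(5, step)` walks the chain 5 → 4 → … → 0 and returns the terminal configuration.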
Adding functions and local bindings
We don’t currently have functions or let expressions in our language. How would adding them impact our semantics? We will change arithmetic expressions to just expressions (no need to change anything else defined so far; this just gives it a name more consistent with its new use) and add syntax there.
\[\begin{align*} & Aexp & a & ::= & & n\ |\ X\ |\ a_0 + a_1\ |\ a_0 - a_1\ |\ a_0 \times a_1\ |\ & \\ & & & & & \mathbf{let}\ X = a_0\ \mathbf{in}\ a_1\ |\ \mathbf{fun}\ X\ \rightarrow\ a\ |\ a_0\ a_1& \\ & Bexp & b & ::= & & \mathbf{true}\ |\ \mathbf{false}\ |\ a_0 = a_1\ |\ a_0 \leq a_1\ |\ \neg b\ |\ b_0 \wedge b_1\ |\ b_0 \vee b_1 & \\ & Com & c & ::= & & \mathbf{skip}\ |\ X\ :=\ a\ |\ c_0; c_1\ |\ \mathbf{if}\ b\ \mathbf{then}\ c_0\ \mathbf{else}\ c_1\ |\ \\ & & & & & \mathbf{while}\ b\ \mathbf{do}\ c & \\ \end{align*}\]
Semantics for functions and lets
- To keep track of bindings, we could use an environment, similar to the type environment we used for types - but in this case a finite map from locations to values
- This would “violate” what we mean by expression, though - an expression would then have a side effect
- So, we will use substitution here instead - \([E'/X]E\), meaning to replace all free \(X\) by \(E'\) in \(E\)
We also have one axiom - a function, without application, just evaluates to itself: \[\langle \mathbf{fun}\ X \rightarrow a,\sigma \rangle \rightarrow \mathbf{fun}\ X \rightarrow a\]
Call by value
Using call-by-value, we will evaluate terms before substitution. First, the let expression, where \(v\) is a fully evaluated value: \[\frac{\langle a_0,\sigma \rangle \rightarrow \langle a_0',\sigma \rangle}{\langle \mathbf{let}\ X = a_0\ \mathbf{in}\ a_1,\sigma \rangle \rightarrow \langle \mathbf{let}\ X = a_0'\ \mathbf{in}\ a_1,\sigma \rangle} \qquad \langle \mathbf{let}\ X = v\ \mathbf{in}\ a_1,\sigma \rangle \rightarrow \langle [v/X]a_1,\sigma \rangle\]
Call by value
Next, function application, again with \(v\) a fully evaluated value: \[\frac{\langle a_1,\sigma \rangle \rightarrow \langle a_1',\sigma \rangle}{\langle (\mathbf{fun}\ X \rightarrow a_0)\ a_1,\sigma \rangle \rightarrow \langle (\mathbf{fun}\ X \rightarrow a_0)\ a_1',\sigma \rangle} \qquad \langle (\mathbf{fun}\ X \rightarrow a_0)\ v,\sigma \rangle \rightarrow \langle [v/X]a_0,\sigma \rangle\]
Call by name
Using call-by-name, we will perform substitution first. This lets us get by without additional rules to reduce the arguments first:
\[\langle \mathbf{let}\ X = a_0\ \mathbf{in}\ a_1,\sigma \rangle \rightarrow\langle [a_0/X]a_1, \sigma \rangle\] \[\langle (\mathbf{fun}\ X \rightarrow a_0)\ a_1, \sigma \rangle \rightarrow\langle [a_1/X]a_0, \sigma \rangle\]
Call by value vs. call by name
Question: Is there any difference between the two?
Answer: Maybe, depending on the language…
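One classic difference: if an argument diverges (or fails) but is never used, call-by-name still produces an answer. A Python sketch using thunks (zero-argument lambdas) to delay evaluation; the setup is my own, since Python itself is call-by-value.

```python
def boom():
    raise RuntimeError("diverges / fails")

def const_five_cbv(x):
    return 5          # under call-by-value, x was evaluated before the call

def const_five_cbn(x_thunk):
    return 5          # under call-by-name, the thunk is never forced

cbn_result = const_five_cbn(lambda: boom())   # argument never evaluated
try:
    cbv_result = const_five_cbv(boom())       # boom() runs first and raises
except RuntimeError:
    cbv_result = "error"
```

Here call-by-name yields 5, while call-by-value hits the error before the function body ever runs.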
Church-Rosser
A language (reduction system) that is Church-Rosser is one in which reduction order does not affect the final result (although it can affect convergence): if \(E \rightarrow^{*} E_1\) and \(E \rightarrow^{*} E_2\) then there exists a value \(V\) such that \(E_1 \rightarrow^{*} V\) and \(E_2 \rightarrow^{*} V\). This is also known as confluence or the diamond property.

Are all systems Church-Rosser?
No. Especially considering side-effects, most languages are not Church-Rosser.
- Alonzo Church and J. Barkley Rosser proved that the \(\lambda\)-calculus is Church-Rosser in 1936
- One benefit - we can check equality of terms by evaluating them both to canonical forms (if this terminates!)
ICE on transition semantics
Using the semantic rules, show the steps taken to a final value for the following program. If this program does not terminate, write diverges. For this program, the state begins empty (i.e., there are no mappings from names to values).
\[X := 1 + (2 * 3)\]
Natural semantics
Natural semantics are similar to transition semantics except
- transition semantics specifies a relation between individual steps of a computation
- while natural semantics specifies a relation between a state in the computation and a final result
Rules will be of the form: \[\langle C,m \rangle \Downarrow m'\]
And it should be the case that: \[\langle C,m \rangle \Downarrow m' \iff \langle C,m \rangle \rightarrow^{*} m'\]
An aside
Both transition semantics and natural semantics are forms of operational semantics. Transition semantics is also known as structural operational semantics, or small-step semantics. Natural semantics is also known as big-step semantics.
Natural semantics of atomic expressions
For atomic expressions, evaluation takes only one step in either style, so the transition and natural semantics are roughly the same:
- Identifiers: \(\langle X,\sigma \rangle \Downarrow \sigma(X)\)
- Numerals: \(\langle n,\sigma \rangle \Downarrow n\)
- Booleans: \(\langle \mathbf{true},\sigma \rangle \Downarrow \mathbf{true}\) and \(\langle \mathbf{false},\sigma \rangle \Downarrow \mathbf{false}\)
Arithmetic expressions
Since the rules for addition, subtraction, and multiplication all look the same except for the operator used, we will just let \(\texttt{op} \in \lbrace +,-,\times \rbrace\) below:
\[\frac{\langle a_0,\sigma \rangle \Downarrow n_0 \qquad \langle a_1,\sigma \rangle \Downarrow n_1}{\langle a_0\ \texttt{op}\ a_1,\sigma \rangle \Downarrow n}\] where \(n = n_0\ \texttt{op}\ n_1\).
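The big-step rule reads off directly as a recursive evaluator: evaluate both operands to numbers, then combine them with the operator in one judgement. The tuple encoding and names are mine.

```python
# <a, sigma> evaluates in one judgement to a number.
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def aeval(a, sigma):
    kind = a[0]
    if kind == "num":
        return a[1]                 # numerals evaluate to themselves
    if kind == "loc":
        return sigma[a[1]]          # locations evaluate to their contents
    op, a0, a1 = a
    return OPS[op](aeval(a0, sigma), aeval(a1, sigma))
```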
Relational operations
We again use \(\texttt{op}\) here to stand for a number of operations which all have similar rules. Let \(\texttt{op} \in \lbrace =, \leq \rbrace\) below:
\[\frac{\langle a_0,\sigma \rangle \Downarrow n_0 \qquad \langle a_1,\sigma \rangle \Downarrow n_1}{\langle a_0\ \texttt{op}\ a_1,\sigma \rangle \Downarrow \mathbf{true}}\ \text{if}\ n_0\ \texttt{op}\ n_1\ \text{holds} \qquad \frac{\langle a_0,\sigma \rangle \Downarrow n_0 \qquad \langle a_1,\sigma \rangle \Downarrow n_1}{\langle a_0\ \texttt{op}\ a_1,\sigma \rangle \Downarrow \mathbf{false}}\ \text{otherwise}\]
Logical operations: and
Since our and operation “short-circuits”, we need two rules, one that tells us the result when the first operand is false, and one that tells us the result when the first operand is true:
\[\frac{\langle b_0,\sigma \rangle \Downarrow \mathbf{false}}{\langle b_0 \wedge b_1,\sigma \rangle \Downarrow \mathbf{false}} \qquad \frac{\langle b_0,\sigma \rangle \Downarrow \mathbf{true} \qquad \langle b_1,\sigma \rangle \Downarrow t}{\langle b_0 \wedge b_1,\sigma \rangle \Downarrow t}\]
Logical operations: or
Or is similar to and - both “short-circuit”, so we also need two rules here. Note the main difference is that we flip around the value used to decide whether to continue with the second argument:
\[\frac{\langle b_0,\sigma \rangle \Downarrow \mathbf{true}}{\langle b_0 \vee b_1,\sigma \rangle \Downarrow \mathbf{true}} \qquad \frac{\langle b_0,\sigma \rangle \Downarrow \mathbf{false} \qquad \langle b_1,\sigma \rangle \Downarrow t}{\langle b_0 \vee b_1,\sigma \rangle \Downarrow t}\]
Logical operations: not
Negation is fairly simple - we can just handle it with two cases, one for evaluating the argument to true, the other for false:
\[\frac{\langle b,\sigma \rangle \Downarrow \mathbf{true}}{\langle \neg b,\sigma \rangle \Downarrow \mathbf{false}} \qquad \frac{\langle b,\sigma \rangle \Downarrow \mathbf{false}}{\langle \neg b,\sigma \rangle \Downarrow \mathbf{true}}\]
Commands: skip
Similarly to how expressions are handled, we will evaluate commands completely, yielding the state after the command is executed. Skip is defined similarly to the definition in transition semantics: \[\langle \mathbf{skip},\sigma \rangle \Downarrow \sigma\]
Commands: assignment
With assignment, we build the evaluation of the expression directly into the assignment rule, instead of requiring two steps.
\[\frac{\langle a,\sigma \rangle \Downarrow n}{\langle X := a,\sigma \rangle \Downarrow \sigma[n/X]}\]
Commands: sequencing
Sequencing again requires we completely evaluate the component commands. Here, we can directly see the result of the first command being “passed on” to the second.
\[\frac{\langle c_0,\sigma \rangle \Downarrow \sigma' \qquad \langle c_1,\sigma' \rangle \Downarrow \sigma''}{\langle c_0; c_1,\sigma \rangle \Downarrow \sigma''}\]
Commands: conditional
We make the choice directly in the rule’s premises, since we are required to evaluate to the final state.
\[\frac{\langle b,\sigma \rangle \Downarrow \mathbf{true} \qquad \langle c_0,\sigma \rangle \Downarrow \sigma'}{\langle \mathbf{if}\ b\ \mathbf{then}\ c_0\ \mathbf{else}\ c_1,\sigma \rangle \Downarrow \sigma'} \qquad \frac{\langle b,\sigma \rangle \Downarrow \mathbf{false} \qquad \langle c_1,\sigma \rangle \Downarrow \sigma'}{\langle \mathbf{if}\ b\ \mathbf{then}\ c_0\ \mathbf{else}\ c_1,\sigma \rangle \Downarrow \sigma'}\]
Commands: while
The rule for the while command is “recursive”, in that it refers to itself. One important point to note is that we can use while directly here, instead of needing to wrap it inside an if like we did with the transition semantics.
\[\frac{\langle b,\sigma \rangle \Downarrow \mathbf{false}}{\langle \mathbf{while}\ b\ \mathbf{do}\ c,\sigma \rangle \Downarrow \sigma} \qquad \frac{\langle b,\sigma \rangle \Downarrow \mathbf{true} \qquad \langle c,\sigma \rangle \Downarrow \sigma'' \qquad \langle \mathbf{while}\ b\ \mathbf{do}\ c,\sigma'' \rangle \Downarrow \sigma'}{\langle \mathbf{while}\ b\ \mathbf{do}\ c,\sigma \rangle \Downarrow \sigma'}\]
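Taken together, the natural-semantics rules for commands read off as a recursive interpreter. A sketch with my own tuple encoding and helper names (`aeval`/`beval` evaluate arithmetic and boolean expressions to values, mirroring the expression rules above):

```python
def aeval(a, sigma):
    if a[0] == "num": return a[1]
    if a[0] == "loc": return sigma[a[1]]
    op, a0, a1 = a
    return {"+": lambda x, y: x + y, "-": lambda x, y: x - y,
            "*": lambda x, y: x * y}[op](aeval(a0, sigma), aeval(a1, sigma))

def beval(b, sigma):
    if b[0] == "true":  return True
    if b[0] == "false": return False
    if b[0] == "<=":    return aeval(b[1], sigma) <= aeval(b[2], sigma)
    if b[0] == "=":     return aeval(b[1], sigma) == aeval(b[2], sigma)
    raise ValueError(b)

def ceval(c, sigma):
    """<c, sigma> evaluates to a final state sigma'."""
    kind = c[0]
    if kind == "skip":
        return sigma
    if kind == "assign":                       # X := a
        _, X, a = c
        return {**sigma, X: aeval(a, sigma)}
    if kind == "seq":                          # thread the state through
        return ceval(c[2], ceval(c[1], sigma))
    if kind == "if":                           # choose a branch via the guard
        _, b, c0, c1 = c
        return ceval(c0 if beval(b, sigma) else c1, sigma)
    if kind == "while":                        # the "recursive" rule
        _, b, body = c
        if not beval(b, sigma):
            return sigma
        return ceval(c, ceval(body, sigma))
    raise ValueError(c)
```

As the notes say, the big-step rules correspond nicely to exactly this kind of recursive interpreter; a diverging while loop makes `ceval` recurse forever, matching natural semantics’ inability to observe nontermination.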
Example
To see how we can “evaluate” something with natural semantics, start with:
In this state, we have at some point assigned the value \(7\) to location \(x\).
Commands: let
The rule for let is slightly different than in our transition semantics example - here we allow the let body to be a command, not just an arithmetic expression. This means we now could have side effects in the let body.
\[\frac{\langle a,\sigma \rangle \Downarrow n \qquad \langle c,\sigma[n/X] \rangle \Downarrow \sigma'}{\langle \mathbf{let}\ X = a\ \mathbf{in}\ c,\sigma \rangle \Downarrow \sigma''}\]
Note that, overall, we return \(\sigma''\), not \(\sigma'\). \(\sigma''\) is defined to be identical to \(\sigma'\) everywhere except \(X\) - for \(X\), it is defined as identical to \(\sigma\) if \(X\) is defined in \(\sigma\), else it is undefined. Why? Because we need to propagate all state changes, but also need to enforce scope.
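The construction of \(\sigma''\) can be sketched directly (the helper name is mine): keep every update the body made, but restore \(X\) to its outer binding, or remove it if it had none.

```python
# sigma'' from the let rule: identical to sigma' except at X, where the
# outer state sigma wins (or X becomes undefined again).
def restore(sigma, sigma_prime, X):
    result = dict(sigma_prime)       # propagate all state changes from c
    if X in sigma:
        result[X] = sigma[X]         # X was defined outside: restore it
    else:
        result.pop(X, None)          # X was local to the let: drop it
    return result
```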
The let construct, with the need to track state, somewhat conflicts with the assignment operation, leading to semantics that are a bit clumsy.
Why have two semantics?
We now have natural and transition semantics for our language. Why have both?
- Natural semantics has a nice correspondence to a recursive interpreter for our programs
- Transition semantics corresponds nicely to an imperative interpreter, taking individual steps and modifying the state
- Natural semantics provides added conciseness…
- …but cannot model nontermination (and cannot distinguish nontermination from some types of failures)