Semantics

MotivationMaterial adapted from CSCI 3136 Principles of Programming Languages at Dalhousie University, and from Mark Hills’ CSCI 3675 Programming Languages

Semantic analysis is the third step in compilation, as shown in Figure 1, from Programming Language Pragmatics, by Michael L. Scott.

<a name='figure1'>Figure 1</a>. The syntax analysis converts the parse tree into an abstract syntax tree or other intermediate form.
Figure 1. The syntax analysis converts the parse tree into an abstract syntax tree or other intermediate form.

Syntax and semantics

Syntax

Semantics

Some constraints that may appear syntactic are enforced by semantic analysis:

Semantic analysis

Two approaches

Formal mechanism: Attributes grammars

Enforcing semantic rules

Static semantic rules

Dynamic semantic rules

Varieties of semantics

There are many methods of giving semantics to a programming language. Several common methods include:

Note that:

Operational semantics

Axiomatic semantics

Axiomatic semantics are also called Floyd-Hoare logic

{Precondition} Program {Postcondition}

Denotational semantics

In denotational semantics, we want to find the meaning, or denotation, of phrases in our language.

Alternative methods

There are many other methods which have been devised to give semantics to languages.

Transition semantics

Transition semantics is a form of operational semantics.

Key point: each transition indicates exactly one step of computation

IMP - a simple imperative language

We can use a simple imperative language, IMP, as a sample language for semantics. This language has the following syntactic categories:

For shorthand, \(n,m \in \mathbf{N}; X,Y \in \mathbf{Loc}; a \in \mathbf{Aexp}; b \in \mathbf{Bexp}; c \in \mathbf{Com}\).

IMP syntax

Using a variant of BNF, we can specify the syntax for IMP as: \(\begin{align*} & Aexp & a & ::= & & n\ |\ X\ |\ a_0 + a_1\ |\ a_0 - a_1\ |\ a_0 \times a_1 & \\ & Bexp & b & ::= & & \mathbf{true}\ |\ \mathbf{false}\ |\ a_0 = a_1\ |\ a_0 \leq a_1\ |\ \neg b\ |\ b_0 \wedge b_1\ |\ b_0 \vee b_1 & \\ & Com & c & ::= & & \mathbf{skip}\ |\ X\ :=\ a\ |\ c_0; c_1\ |\ \mathbf{if}\ b\ \mathbf{then}\ c_0\ \mathbf{else}\ c_1\ |\ \\ & & & & & \mathbf{while}\ b\ \mathbf{do}\ c & \\ \end{align*}\)

Important Point: We assume reasonable input programs that parse and we don’t care about precedence, associativity, etc - we assume all that has been figured out for us

IMP configurations

Our configurations will contain two elements:

We can define the set of states \(\Sigma\) as functions \(\sigma: \mathbf{Loc} \rightarrow \mathbf{N}\), or functions from locations to integer values.

Arithmetic rules and simple expressions

Our semantic relation for arithmetic expressions will be of the form:

$$\langle a,\sigma \rangle \rightarrow n$$

We assume arithmetic expressions have no side-effects.

$$\langle n,\sigma\rangle \rightarrow n$$
$$\langle X,\sigma \rangle \rightarrow \sigma(X)$$

Arithmetic expressions: sums

Sum rules

Arithmetic expressions: subtraction

Substraction rules

Arithmetic expressions: products

Product rules

Boolean rules and simple expressions

Our semantic relation for boolean expressions will be of the form:

$$\langle b,\sigma \rangle \rightarrow t$$

We assume boolean expressions have no side-effects.

$$\langle \mathbf{true},\sigma\rangle \rightarrow \mathbf{true}$$
$$\langle \mathbf{false},\sigma\rangle \rightarrow \mathbf{false}$$

Boolean expressions: equality

Equality rules

The rules for \(\leq\) are similar.

Boolean expressions: and

And rules

The rules for \(\vee\) and \(\neg\) are similar.

Commands

While expressions in our language cannot have side effects, commands can. So, here we need to model the changes in state that occur when commands run. Here, our semantic relation will be of the form:

$$\langle c,\sigma \rangle \rightarrow \sigma'$$

So, when command \(c\) is fully evaluated, potentially altered memory \(\sigma'\) is returned.

Simple commands

The \(\mathbf{skip}\) command does not alter the state:

$$\langle \mathbf{skip},\sigma \rangle \rightarrow \sigma$$

Assignment - some new notation

The assignment command does alter the state. One way we can view it is we get back a new state function which is the same everywhere except at the location we’ve updated, which now holds the new value. We can define this as:

$$\sigma[m/X](Y) = \left\{ \begin{array}{rr} m & \mbox{if}\ Y = X \\ \sigma(Y) & \mbox{if}\ Y \neq X \end{array}\right.$$

Assignment

With our new notation, assignment can be shown as follows. Note we reduce the expression we are assigning to \(X\) first, before we do the actual assignment.

Assignment rule

Sequencing

We can sequence commands as well in our language. We always complete execution of the first command before starting on the second. This gives us the following semantic rules:

Sequencing rules

Conditionals

We have a conditional statement in our language. We need to evaluate the guard first before deciding which branch to take. We can represent this as:

Conditional rules

Loops

We have one loop, the while command. Here, we need to evaluate the guard - if it is still true, we want to evaluate the body, and we want to then evaluate the loop again. In some sense, this takes us back where we started, but most likely with a different state (if not, we probably won’t terminate). We will expand to a conditional to do this, or else we would “lose” the guard when we evaluate it.

$$\langle \mathbf{while}\ b\ \mathbf{do}\ c, \sigma \rangle \rightarrow \langle \mathbf{if}\ b\ \mathbf{then}\ (c; \mathbf{while}\ b\ \mathbf{do}\ c)\ \mathbf{else}\ \mathbf{skip}, \sigma \rangle $$

Example

To see how we can “evaluate” something operationally, start with:

$$\langle \mathtt{if}\ x > 5\ \mathtt{then}\ y := 2 + 3\ \mathtt{else}\ y := 3 + 4, \lbrace x \mapsto 7 \rbrace \rangle$$

In this state, we have at some point assigned the value \(7\) to location \(x\).

Example

$$\langle \mathtt{if}\ x > 5\ \mathtt{then}\ y := 2 + 3\ \mathtt{else}\ y := 3 + 4, \lbrace x \mapsto 7 \rbrace \rangle$$

Since this is a conditional, we first need to evaluate the guard until we get a value, either \(\mathbf{true}\) or \(\mathbf{false}\);

Evaluation example

Example

$$\langle \mathtt{if}\ x > 5\ \mathtt{then}\ y := 2 + 3\ \mathtt{else}\ y := 3 + 4, \lbrace x \mapsto 7 \rbrace \rangle$$

Now that we just have values in the relational expression, we can use the appropriate rule to determine the truth or falsity of the guard.

Evaluation example

Example

$$\langle \mathtt{if}\ x > 5\ \mathtt{then}\ y := 2 + 3\ \mathtt{else}\ y := 3 + 4, \lbrace x \mapsto 7 \rbrace \rangle$$

Now that we have a truth value for the guard, we can use the appropriate conditional rule to pick the correct statement to continue with.

$$\langle \mathtt{if}\ \mathbf{true}\ \mathtt{then}\ y := 2 + 3\ \mathtt{else}\ y := 3 + 4, \lbrace x \mapsto 7 \rbrace \rangle \rightarrow$$ $$\langle y := 2 + 3, \lbrace x \mapsto 7 \rbrace \rangle$$

Example

$$\langle \mathtt{if}\ x > 5\ \mathtt{then}\ y := 2 + 3\ \mathtt{else}\ y := 3 + 4, \lbrace x \mapsto 7 \rbrace \rangle$$

We can now evaluate the arithmetic expression, since we need to have an integer value before we can do assignment.

Evaluation example

Example

$$\langle \mathtt{if}\ x > 5\ \mathtt{then}\ y := 2 + 3\ \mathtt{else}\ y := 3 + 4, \lbrace x \mapsto 7 \rbrace \rangle$$

Finally, we can use the assignment rule to assign a value to y.

$$\langle y := 5, \lbrace x \mapsto 7 \rbrace \rangle \rightarrow \lbrace x \mapsto 7, y \mapsto 5 \rbrace$$

Execution summary

Evaluation example

Evaluation in transition semantics

As we saw above, we can view evaluation as a sequence of steps with trees of justifications for each step. This gives us a sequence of evaluation steps:

$$\langle C_1,m_1 \rangle \rightarrow\langle C_2,m_2 \rangle \rightarrow\cdots \rightarrow m$$

We can then define \(\rightarrow^*\) as the transitive closure of \(\rightarrow\), essentially giving us a relation that evaluates from \(\langle C_i,m_i \rangle\) to \(m\) (or, from a starting program and state to the final state).

Adding functions and local bindings

We don’t currently have functions or let expressions in our language. How would adding them impact our semantics? We will change arithmetic expressions to just expressions (no need to change anything else defined so far, this just gives it a name more consistent with it’s new use) and add syntax there.

\[\begin{align*} & AExp & a & ::= & & n\ |\ X\ |\ a_0 + a_1\ |\ a_0 - a_1\ |\ a_0 \times a_1\ |\ & \\ & & & & & \mathbf{let}\ X = a_0\ \mathbf{in}\ a_1\ |\ \mathbf{fun}\ X\ \rightarrow\ a\ |\ a_0\ a_1& \\ & Bexp & b & ::= & & \mathbf{true}\ |\ \mathbf{false}\ |\ a_0 = a_1\ |\ a_0 \leq a_1\ |\ \neg b\ |\ b_0 \wedge b_1\ |\ b_0 \vee b_1 & \\ & Com & c & ::= & & \mathbf{skip}\ |\ X\ :=\ a\ |\ c_0; c_1\ |\ \mathbf{if}\ b\ \mathbf{then}\ c_0\ \mathbf{else}\ c_1\ |\ \\ & & & & & \mathbf{while}\ b\ \mathbf{do}\ c & \\ \end{align*}\]

Semantics for functions and lets

We also have one axiom - a function, without application, just evaluates to itself:

$$\langle (\mathbf{fun}\ X \rightarrow a), \sigma \rangle \rightarrow(\mathbf{fun}\ X \rightarrow a)$$

Call by value

Using call-by-value, we will evaluate terms before substitution. First, the let expression:

Call by value rules

Call by value

Next, function application:

Call by value rules

Call by name

Using call-by-name, we will perform substitution first. This lets us get by without additional rules to reduce the arguments first:

\[\langle \mathbf{let}\ X = a_0\ \mathbf{in}\ a_1,\sigma \rangle \rightarrow\langle [a_0/X]a_1, \sigma \rangle\] \[\langle (\mathbf{fun}\ X \rightarrow a_0) a_1, \sigma \rangle \rightarrow\langle [a_1/X]a_0, \sigma \rangle\]

Call by value vs. call by name

Question: Is there any difference between the two?

Answer: Maybe, depending on the language…

Church-Rosser

A language (reduction system) that is Church-Rosser is one in which reduction order does not affect the final result (although it can affect convergence): if \(E \rightarrow^{*} E_1\) and \(E \rightarrow^{*} E_2\) then there exists a value \(V\) such that \(E_1 \rightarrow^{*} V\) and \(E_2 \rightarrow^{*} V\). This is also known as confluence or the diamond property.

Diamon property

Are all systems Church-Rosser?

No. Especially considering side-effects, most languages are not Church-Rosser.

ICE on transition semantics

Using the semantic rules, show the steps taken to a final value for the following program. If this program does not terminate, write diverges. For this program, the state begins empty (i.e., there are no mappings from names to values).

\[X := 1 + (2 * 3)\]

Natural semantics

Natural semantics are similar to transition semantics except

Rules will be of the form

$$\langle C,m \rangle \Downarrow m'$$

And it should be the case that

$$\langle C,m \rangle \rightarrow^{*} m'\ \mbox{iff}\ \langle C,m \rangle\Downarrow m'$$

An aside

Both transition semantics and natural semantics are forms of operational semantics. Transition semantics is also known as structural operational semantics, or small-step semantics. Natural semantics is also known as big-step semantics.

Natural semantics of atomic expressions

In both cases we only take one step, so the semantics are roughly the same:

\[\langle true,\sigma \rangle \Downarrow true\] \[\langle false,\sigma \rangle \Downarrow false\]

Arithmetic expressions

Since the rules for addition, subtraction, and multiplication all look the same except for the operator used, we will just let \(\texttt{op} = \lbrace +,-,\times \rbrace\) below:

Arithmetic expressions rules

Relational operations

We again use {\tt op} here to stand for a number of operations which all have similar rules. Let \(\texttt{op} = \lbrace =, \leq \rbrace\) below:

Relational operations rules

Logical operations: and

Since our and operation “short-circuits”, we need two rules, one that tells us the result when the first operation is false, and one that tells us the result when the first operation is true:

And rules

Logical operations: or

Or is similar to and - both “short-circuit”, so we also need two rules here. Note the main difference is that we flip around the value used to decide whether to continue with the second argument:

Or rules

Logical operations: not

Negation is fairly simple - we can just handle it with two cases, one for evaluating the first argument to true, the other for false:

Not rules

Commands: skip

Similarly to how expressions are handled, we will evaluate commands completely, yielding the state after the command is executed. Skip is defined similarly to the definition in transition semantics:

$$\langle \mathbf{skip},\sigma \rangle \Downarrow \sigma$$

Commands: assignment

With assignment, we build in the evaluation of the expression directly into the assignment rule, instead of requiring two steps.

Assignment rules

Commands: sequencing

Sequencing again requires we completely evaluate the component commands. Here, we can directly see the result of the first command being “passed on” to the second.

Sequencing rules

Commands: conditional

We make the choice directly in the rules premises, since we are required to evaluate to the final state.

Conditional rules

Commands: while

The rule for the while command is “recursive”, in that it refers to itself. One important point to note is that we can use while directly here, instead of needing to wrap it inside an if like we did with the transition semantics.

While rules

Example

To see how we can “evaluate” something with natural semantics, start with:

$$\langle \mathtt{if}\ x \leq 5\ \mathtt{then}\ y := 2 + 3\ \mathtt{else}\ y := 3 + 4, \lbrace x \mapsto 7 \rbrace \rangle$$

In this state, we have at some point assigned the value \(7\) to location \(x\).

Example

Evaluation example

Example

Evaluation example

Example

Evaluation example

Example

Evaluation example

Example

Evaluation example

Example

Evaluation example

Example

Evaluation example

Example

Evaluation example

Example

Evaluation example

Example

Evaluation example

Example

Evaluation example

Example

Evaluation example

Commands: let

The rule for let is slightly different than in our transition semantics example - here we allow the let body to be a command, not just an arithmetic expression. This means we now could have side effects in the let body.

Let rules

Note that, overall, we return \(\sigma"\), not \(\sigma'\). \(\sigma"\) is defined to be identical to \(\sigma'\) everywhere except \(x\) - for \(x\), it is defined as identical to \(\sigma\) if \(x\) is defined in \(\sigma\), else it is undefined. Why? Because we need to propagate all state changes, but also need to enforce scope.

Example

Evaluation example

Example

Evaluation example

The let construct, with the need to track state, somewhat conflicts with the assignment operation, leading to semantics that are a bit clumsy.

Why have two semantics?

We now have natural and transition semantics for our language. Why have both?


Back to course home