联系方式

  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-23:00
  • 微信:codinghelp2

您当前位置:首页 >> Java编程Java编程

日期:2021-07-13 09:34

UCA - HUAT Formal Languages and Compilation - 2020/2021

Compiler - Based on a similar lab work in Caen University

Antlr, Calculator and MVaP language

The objectives of the lab work (on machine) :

1. to be familiar with a lexical and syntactic analyser tool. We have choosen the ANTLRtool,

which uses Java language.

Your friend is the ANTLRdocumentation : https ://github.com/antlr/antlr4/blob/master/doc/index.md

2. In a second part, you will have to design a programming language dealing with variables,

conditional structures and loops. Apart from conditional structures, assignements to

variables and loops, the language is intented to manipulate aritmetical and boolean

expressions. You will use antlr for the lexical, syntactic and code generation. The target

language for the code generation will be the MVaP language, the language of the MVaP

stack machine. The descriptions of the instructions are given in the MVaP-Cheat-Sheet

pdf.

1 Quick start of Antlr

Question. Download the install file install.txt and follow the instructions to install

ANTLR. The installation instructions are for a Gnu/Linux or similar operating systems (Unix,

Mac OS X). I recommend then to read https ://github.com/antlr/antlr4/blob/master/doc/index.md.

The following is an attribute grammar for arithmetical expressions using sum and product.

The first axiom is always start, et every non-terminal should start with a lower case letter.

Tokens always are in upper-case. We end the start axiom with the system token EOF, which

is automatically produced by the lexical analyser after reading the input.

grammar Calculator;

// grammar rules

start

: expr EOF;

expr

: expr ’*’ expr

| expr ’+’ expr

| INT

;

//lexical rules. When something is unmatched, we skip it.

NEWLINE : ’\r’? ’\n’ -> skip;

WS : (’ ’|’\t’)+ -> skip;

INT : (’0’...’9’)+;

UNMATCH : . -> skip;

Question. Explain why this grammar is ambigous.

Question. Put this grammar in a file (say Calculator.g4) and compile (follow the instructions

in the install.txt file).

Question. Use the grun tool to visualise the syntactic tree (see install.txt file for the corresponding

command line). A use case with grun :

grun Calculator ’start’ -gui

Analyse the results with the following expressions :

42

24+ 24- 6

5*8+2* 1

5+2 * 8+ 3

6 *4 / 5+38

Question. We use attribute grammars to evaluate such arithmetical expressions, by associating

each rule with an attribute. We can do the following for the addition rule for instance.

expr returns [int val]

: a=expr ’+’ b=expr {$val=$a.val +$b.val;}

Complete so that the result of the evaluation of the arithmetical expression is printed.

To obtain the text of a token, use the token’s text attribute, and the token’s attribute int to

get the integer value held by the token if the text is a valid numeric string. See https ://github.com/antlr/antlr4/blob/master/doc/actions.md

for the set of token’s and rules attributes,

given by ANTLR and also the syntax for adding your own attributes and retreiving their values.

Question. Analyse the output of the syntactic tree (with grun) and of the result of the

evaluation of the expression 5 + 2 ? 8 + 3. Permute now in expr the order in which the rules for

the sum and product are given, and do the same as above with the expression 5 + 2 ? 8 + 3.

Do we have same outputs ? Why ?

2 Calculator

The objective is to enrich our calculator language to manipulate complex arithmetical

expressions with variables and types, loops, printing and reading, and conditional structures.

We recommend to use standard techniques to deal with ambiguity. For instance, for

arithmetical expressions, division and product have high priorities on sum and substractions,

with same priorities we evaluate from left to right. We recommend to read again

https ://github.com/antlr/antlr4/blob/master/doc/left-recursion.md to know which rules are

used by default by ANTLR to resolve ambiguity.

We will add more and more rules to our grammar Calculator.g4.

2

Question. Enrich the grammar to include divisions, substractions and parenthesized expressions.

Test with the following expressions (they all evaluate to the same value). We suppose

from now that every expression is ended by a line break, but you can enrich by adding for

instance the possibility to end them also with semi-colons (you encapsulate them in a token

EndOfInstruction).

42

24+24-6

5*8+2*1

6*4/5+38

42+1+2+-3

5*8+2*-1/-1

(5*6*7*11 + 2)/11*5-1008

(5*6*7*11 + 2)/(11*5)

(5*6*7*11 + 2)/11/5

(5*6*7*11 + 2)/(11/5)-1114

Question. We have seen in exercises’ lecture rules for manipulating boolean expressions

with comparisons between arithmetical expressions. Complete these rules to add comparisons

between boolean expressions and add the rules to the Calculator grammar. We recall that

the comparisons operations are == (equality), <> (difference), < (less than), > (greater

than), <= (less or equal than) et >= (greater or equal than), and the operators for boolean

expressions are and, or and not. The rule should also accept parenthesized boolean expressions.

Test with the following expressions.

true and false or true and not true

true or false and false

true or (false or false)

false or (true and not false)

(not true and false) or (true and not false)

false or not true or true and not 42 == 42

false or 42 <= 40

false or (5*6*7*11 + 2)/11*5-1008 <= 5*8+2*-1/-1

50 <> 51

42+1+2+-3 <> 5*8+2*-1/-1

(5*6*7*11 + 2)/11/5 < (5*6*7*11 + 2)/(11/5)-1114

(5*6*7*11 + 2)/11/5 <= (5*6*7*11 + 2)/(11/5)-1114

5*8+2*1 > 6*4/5+38

5*8+2*1 >= 6*4/5+38

42 == (5*6*7*11 + 2)/11/5

3 Code Generation

The goal is to use the MVaP machine to execute the programs written in our language,

and then we need to generate a machine code instead of evaluating directly the expressions.

The target language for the code generation will be the MVaP language.

3

Reminder. If you did not yet install MVaP, please follow the instructions in install.txt.

Read also the MVaP-Cheat-Sheet.pdf file for the set of instructions of the MVaP machine.

Until now, we have associated an integer attribute with each rule to store the result of the

evaluation. For the code generation with target language MVaP, we will instead use a string

accumulator to store the MVaP code.

Question. Modify the grammar so that it generates, for each rule, the associated MVaP

code, instead of evaluating its value.

Tests. Test your parser to check that you really generate the right MVaP, you can use the

expressions above. Run the MVaP generated code to check that the results are correct (see

install.txt on how to run MVaP machine). The tool grun can be quickly too heavy for

testing. Instead, you can use the following to automate the tests :

import java.io.*;

import org.antlr.v4.runtime.*;

public class MainCalculator {

public static void main(String args[]) throws Exception {

CalculatorLexer lex;

if (args.length == 0)

lex = new CalculatorLexer(new ANTLRInputStream(System.in));

else

lex = new CalculatorLexer(new ANTLRFileStream(args[0]));

CommonTokenStream tokens = new CommonTokenStream(lex);

CalculatorParser parser = new CalculatorParser(tokens);

try {

parser.start(); // start the first axiom of the grammar

} catch (RecognitionException e) {

e.printStackTrace();

} catch (Exception e) {

e.printStackTrace();

}

}

}

4 Global Variables

We would like to be able to manipulate global variables and types. We will manipulate

only int, float and bool types. Values by default for variables is 0. In our language, any variable

should be declared before first use. One should be able to assign to variables constants, e.g.,,

x = 4, as well expressions, e.g.,, x = 6 ? y (in this latter we are expecting that the value of y

is computed and multiplied with 6 before assigning it to x).

We will follow the C language structure as follows : a program in our language starts with

declarations of variables first and then can do computations. See below, a proposition for the

structure of the file Calculator and a beginning of the rules.

4

grammar Calculator;

@parser::members {

private int _label = 0;

/** generator of label names */

private int nextLabel() { return _label++; }

}

start : calcul EOF;

calcul returns [ String code ]

@init{ $code = new String(); } // we initialise $code, to use it as accumulator

@after{ System.out.println($code); } // we print the MVaP code produced by the parser

: (decl { $code += $decl.code; })*

NEWLINE*

(instruction { $code += $instruction.code; })*

{ $code += " HALT\n"; }

;

EndInstruction

: (NEWLINE | ’;’)+

;

decl returns [ String code ] // declarations of variables

: TYPE IDENTIFIER EndInstruction

{

// to be completed

}

| TYPE IDENTIFIER ’=’ expression EndInstruction

{

// to be completed

}

;

instruction returns [ String code ] // an instruction other than declaration

: expression EndInstruction // boolean or arithmetical expressions

{

//to be completed

}

| assignment EndInstruction // assigning values to variables

{

// to be completed

}

| EndInstruction

{

$code="";

}

;

assignment returns [ String code ] // Assign an expression

5

: IDENTIFIER ’=’ expression

{

//to be completed

}

;

// lexer

TYPE : ’int’ | ’float’ | ’bool’; // three types

IDENTIFIER : ; //to be completed, any sequence of [a-zA-Z0-9_], starting always with [a-zA-Z] or _

Question. To deal with variables, you will probably need symbol tables to be able to decide

whether an identifier has been already used, and also to be able to recover the value of a variable

whenever it is used, in particular know its address in the stack (recall that the declarations are

done at the beginning, and then if you declare variables first in your MVaP code, you are able

to compute their addresses in the stack). To keep track of the tuples (id-variable,type-variable,

adress-in-stack), you can use for instance an HashMap, declared and initialised in the header

@parser : :members.

Modify now the Calculator grammar to generate MVaP code for all the rules we have

written so far (arithmetical and boolean expressions), and for declarations and assignments of

global variables.

Use the test collection to test your grammar. Each of the wanted behaviour has a test file

with a collection of expressions to be tested. From now on, it is better to use the test set to

check whether your proposal corresponds to the wanted behaviour.

Adding Input and Output operations. To ease the tests and the use of our language,

it might interesting to include a mechanism for reading from the system input and writing

on the system output. We recall that the MVaP machine has two dedicated instructions READ

and WRITE. The following rules allow to deal with input and output operations. Add the io

rule to the grammar file and modify also the instruction rule so that one can readln and

println in our language.

io returns [ String code ]

: READ ’(’ IDENTIFIER ’)’

{

int at = // to be completed, adress of the variable $IDENTIFIER.text;

$code=" READ" ;

$code+="\n"; // new line after READ

$code+=" STOREG" +" " + at + "\n";

}

| WRITE ’(’ expression ’)’

{

$code = $expression.code

+ " WRITE" + "\n"

+ " POP" + "\n";

}

;

WRITE : ’println’;

6

READ : ’readln’;

5 Blocks

In order to deal with conditional structures and loops with several instructions, we need

to be able to encapsulate a set of instructions in a block, so that we make a difference between

the instructions to execute inside the conditional structure, and the others outside.

Question. Add to the grammar the possibility to manipulate a block of instructions, encapsulated

between { and }. The beginning of the rule can be the following :

block returns [ String code ]

@init{ $code = new String ()}

: // to be completed

6 While Loops

Let’s enrich our language with while loops. The wanted syntax is the following, with cond

a boolean expression :

while (cond)

instructions

Question. Add the necessary rules to our grammar file. We are expecting that you allow

the two following forms.

while (cond)

// 1 single instruction.

// or a block of instructions

while (cond) {

//several instructions

}

7 Conditional Structures

Let’s now add jumps to our language with the if . . . then else . . . construction. The intended

syntax is the following (cond still a boolean expression) :

if (cond)

instructionthen

else instructionelse

The else is not mandatory. Also, as with the while instruction, the set of instructions

instructionthen and instructionelse might be a single or a block of instructions.

Question. Add the necessary rules for manipulating the if . . . then else . . . instruction.

7


版权所有:留学生编程辅导网 2020 All Rights Reserved 联系方式:QQ:99515681 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。