Building a compiler for a custom MIPS-like instruction set Vittorio Giovara Fabio Margaglia Francesco Meo Lorenzo Simone
document revision 1.2
Contents 1 Introduction
2
2 Instruction List 2.1 Instruction Rules . . . . . . . . 2.1.1 Arithmetical Operations 2.1.2 Logical Operations . . . 2.1.3 Memory Operations . . 2.2 Jump Operations . . . . . . . . 2.2.1 Branch Operations . . . 2.2.2 Shift Operations . . . . 2.2.3 Special Operations . . . 3 Compiler structure 3.1 The JFlex scanner . . . . . 3.2 The CUP parser . . . . . . 3.2.1 Instruction Encoding 3.2.2 Error Handling . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . . . . . . . . . . . . and Emulation . . . . . . . . .
1
. . . . . . . .
. . . .
. . . . . . . .
. . . .
. . . . . . . .
. . . .
. . . . . . . .
. . . .
. . . . . . . .
. . . .
. . . . . . . .
. . . .
. . . . . . . .
. . . .
. . . . . . . .
. . . .
. . . . . . . .
. . . .
. . . . . . . .
3 4 4 4 4 4 4 5 5
. . . .
6 6 7 8 8
Chapter 1
Introduction The aim of this report is to provide documentation about the use of the instruction set of a custom MIPS-like processor and about the internal code of the compiler. This document will list the implemented instructions and their assemblylike structure. Afterwards it will present the use of the compiler itself as well as the explanation of the involved code about the scanner and parsing technique. The compiler comes with an emulator, that preforms simulated execution of the compiled code.
2
Chapter 2
Instruction List • NOP
• ST8
• ADDU
• JMP
• ADDS
• JAL
• SUBU
• JZ
• SUBS
• JNZ
• MULU
• JLZ
• MULS
• JGZ
• DIVU
• JLEZ
• DIVS • JGEZ
• AND
• LLS
• OR
• LRS
• XOR
• ALS
• LD32 • LD16
• ARS
• LD8
• LR
• ST32
• RR
• ST16
• MOV
3
2.1 2.1.1
Instruction Rules Arithmetical Operations
parameter 1 : Destination Register (Rd 1 or Rt 2 ) parameter 2 : Source Register (Rs ) parameter 3 : Source Register (Rt ) or Immediate
2.1.2
Logical Operations
parameter 1 : Destination Register (Rd or Rt ) parameter 2 : Source Register (Rs ) parameter 3 : Source Register (Rt ) or Immediate
2.1.3
Memory Operations
LOAD parameter 1 : Destination Register (Rt ) parameter 2 : Source Register (Rs ) parameter 3 : Immediate STORE parameter 1 : Source Register (Rt ) parameter 2 : Destination Register (Rs ) parameter 3 : Immediate
2.2
Jump Operations
parameter : Immediate or Register
2.2.1
Branch Operations
parameter 1 : Register (Rs ) parameter 2 : Immediate or Register (Rt ) 1 2
in register to register operations in register to immediate operations
4
2.2.2
Shift Operations
parameter 1 : Destination Register (Rd ) parameter 2 : Source Register (Rs ) parameter 3 : Register (Rt )
2.2.3
3
Special Operations
MOV parameter 1 : Destination Register (Rt or Rd ) parameter 2 : Immediate or Register (Rt ) NOP No arguments needed.
3
Defines the shift or rotate amount
5
Chapter 3
Compiler structure The compiler needs 3 arguments to be passed by command line: Input file: The source file for the assembly-like code; Verbosity level: Selects the quantity of ouput message printed; Emulation level: Turns on or off the emulation. The verbosity level is a number between 0 (meaning no output at all) and 4 (meaning that every single step of the scanner and of the parser are notified); error messages are always printed. When none is specified, a default is set to 1. The emulation level is a flag that manages the emulation execution, 0 deactivates this feature, 1 activates it and 2 gives information about the program counter. When none is specified, a default is set to 0. The output file on which the binary encoding is saved is always named code.bin.
3.1
The JFlex scanner
The scanner initially defines two additional contexes for handling the comments: one for single line comments (active from // to a new line), and the other for multi line comments (enclosed in /* and */). Then it describes the syntactic rules, using the following macros: nl The new line symbol for different operating systems; ignored List of charaters or symbols that can be safely ignored; register The structure for registers (capital R followed by a number between 0 and 31); integer Structure for integers (optional sign followed by a natural number); 6
separator The character that separates the instruction arguments; eol The character that concludes the instruction. The actual operations are not defined by macros but are aggregated: for example mathematicatical operations that can be signed or unsigned are described by the same rule (eg. the addition is described as ADD(U|S) not as ADDU and ADDS but inside that rule the two possible symbols are differenciated. Every rule passes a symbol to CUP, with some attributes: the number for the register and the kind of operation for the base instructions.
3.2
The CUP parser
The parser defines a terminal symbol for each possible operation, and the set of arguments (INT, COMA, REG, EOL); then it describes the productions in the following way: • op −→ any possible operation translates from terminal symbol to non-terminal saving the string attribute; • register −→ REG passes the attribute to top level ; • arguments −→ register COMA register COMA INT EOL for register-to-immediate instructions −→ register COMA register COMA register EOL for register-to-register instructions; −→ register COMA INT EOL or register COMA register EOL for conditional branch instructions; −→ register EOL or INT EOL for unconditional branch instructions; −→ EOL for the no-operation instruction. • line −→ op arguments joins the operation with its arguments; • instruction −→ line renames the non terminal symbol and discards the attributes; • list −→ instruction or list instruction allows to parse more than one instruction; 7
• program −→ list renames the non terminal symbol and produces the starting symbol.
3.2.1
Instruction Encoding and Emulation
When line gets produced (operation and arguments are available) the instruction is both encoded and emulated. According to the type of arguments in the production arguments a status variable (flag) is set up and it changes the output encoding and emulation result. In order to perform the encoding, a file named rules encoding is loaded in the program and generates an hash map in which each instruction is associated with its binary code; the string used to determine the type of operation is used as a key to the hash map. When an instruction is encoded the integer values are converted in binary string with the function printBin() and placed in the output string. As for the emulation, the registers and the cache memory are emulated using two integer vectors of size 32 and 2561 respectively and both are initialized with initRegandMem(). Then using the same structure of the encoding section it proceeds to perfom a the detected operation by getting the integer values of the register (used as vector index) and immediate numbers and to print on screen the output result. The emulation is capable of handlint unsigned integer numbers (whose native type is absent in Java) through the use of unsignedOperation() function.
3.2.2
Error Handling
The main focus of error handling is to force the parser to continue reading from the file instead of aborting execution; so after an error the parser must not stop. In order to achieve this behaviour, special terminal symbol error is used and inserted in productions that can cause an unexpected behaviour. As a matter of fact the error symbol is present in the arguments production for correctly handling wrong argument formats like inserting more than three arguments. However error correction is also obtained using once again the flag structure: since a different argument format gives a different value to this variable, it is possible to recognise a wrong argument format just by checking its value: for example in mathematical operations that can handle only two registers and an immediate (flag is 0) or three registers (flag is 1), only those two values are accepted while all the other configurations are rejected.
1
this is due to the actual processor that has such dimesion because of synthesis constraints.
8