Assembler
CS 230 이준원
1
System Software •
•
components – translator • assembler • compiler • interpreter – system manager • operating system – other utilities • loader • linker • DBMS, editor, debugger, ... purpose of this course – understand how to build system software – understand how these components work
2
Issues in System Software •
not many in this area – mature area
•
advanced architectures complicates system software – superscalar CPU – memory model – multiprocessor new applications – embedded systems – mobile/ubiquitous computing
•
3
Assembler Overview •
• •
functions – translate programs written in assembly language to machine code • mnemonic code to machine code • symbols to addresses – handles • constants • literals • addressing 32 bit constant or address 32 bit offset
4
Assembler Overview (cont’d) •
•
pass 1: loop until the end of the program 1. read in a line of assembly code 2. assign an address to this line • increment N (word addressing or byte addressing) 3. save address values assigned to labels • in symbol tables 4. process assembler directives • constant declaration • space reservation pass2: same loop 1. read in a line of code 2. translate op code using op code table 3. change labels to address using the symbol table 4. process assembler directives 5. produce object program 5
Data Structures for Assembler add $t0, $t1, $t2
•
000000 01001 01010 01000 00000 100000
op code table – looked up for the translation of mnemonic code • key: mnemonic code • result: bits – hashing is usually used • once prepared, the table is not changed • efficient lookup is desired • since mnemonic code is predefined, the hashing function can be tuned a priori – the table may have the instruction format and length • to decide where to put op code bits, operands bits, offset bits • for variable instruction size • used to calculate the address
6
Data Structures for Assembler (cont’d) .text .globl main
•
symbol table – stored and looked up to assign address to labels • efficient insertion and retrieval is needed • deletion does not occur – difficulties in hashing • non random keys – problem • the size varies widely
main: la lw lw
$t0, array $t1, count $t2, ($t0)
loop: lw $t3, 4($t0) ble $t3, $t2, loop2 move $t2, $t3 loop2: add $t1, $t1, -1 add $t0, $t0, 4 bnez $t1, loop … …. .data array: count: string1:
.word 3, 5, 5, 1, 6, 7, ….. .word 15 .asciiz “\nmax = “
7
Symbol Table Construction .text .globl main main: la lw lw
$t0, array $t1, count $t2, ($t0)
loop: lw $t3, 4($t0) ble $t3, $t2, loop2 move $t2, $t3 loop2: add $t1, $t1, -1 add $t0, $t0, 4 bnez $t1, loop … …. .data array: count: string1: bad:
symbol name
value
main
0
loop
12
loop2
24
… array
408
count
468
string1
472
bad
478
.word 3, 5, 5, 1, 6, 7, ….. .word 15 .asciiz “\nmax = “ .word 7 8
Assembler Algorithm: pass1
begin if starting address is given LOCCTR = starting address; else LOCCTR = 0; while OPCODE != END do ;; or EOF begin read a line from the code if there is a label if this label is in SYMTAB, then error else insert (label, LOCCTR) into SYMTAB search OPTAB for the op code if found LOCCTR += N ;; N is the length of this instruction (4 for MIPS) else if this is an assembly directive update LOCCTR as directed else error write line to intermediate file end program size = LOCCTR - starting address; end
9
Assembler Algorithm: pass2
begin read a line; if op code = START then ;; .globl xxx for MIPS write header record; while op code != END do ;; or EOF begin search OPTAB for the op code; if found if the operand is a symbol then replace it with an address using SYMTAB; assemble the object code; else if is a defined directive add $t0, $t1, $t2 => 000000 01001 01010 01000 00000 100000 convert it to object code; add object code to the text; read next line; end write End record to the text; output text; end
10
Program Relocation 0 . . jump to 1004 .
1076
. . jump to 1004 .
1004 5000
6076
program is loaded at 0
•
. . jump to 1004 .
program is loaded at 5000
motivations for relocation – a program may consists of several pieces of codes that are assembled independently – when a program is assembled, it is impossible to know the exact location where the program starts
11
Program Relocation (cont’d) •
distances from the origin of a program do not change – make the address relative to the origin – provides loader with information about • which address needs fixing • length of address field – the loader change those addresses as • distance + start address of a program – only absolute addresses need to be changed
12
Literals •
•
usage – encoded as an operand (similar to the immediate in MIPS, but different) • load $7, =X’0A7F’ – simple way to declare a constant – assembler does • declare a constant with a label • use the label to use the value comparison with immediate – literal is an assembler directive • immediate is a machine recognizable data – full word can be used for literals • immediate: full word – (opcode, registers) – values are obtained from data memory - slow • immediate data is within the instruction itself
13
Literals (cont’d) •
•
literal pool – assembler collects all the literals into one or more literal pools – default location is at the end of the program • for better code reading – programmer can declare a place (LTORG) • to use PC-relative addressing • to keep data close to instruction optimization – make one literal for the same value • compare character string or value? – x’454F46’ = c’EOF’
•
• value comparison needs evaluation literal table – name(label), operand value, operand length, address in the table – name and value are all used as a key 14
Literal Handling Algorithm pass 1 at a recognition of a literal search LITTAB by name if found but different value, error else if the same value, no action else if not found insert a new literal (no address yet) if the code is LTORG or END allocate each literal assigning an address pass 2 replace each literal with the address in the LITTAB if these addresses are absolute, prepare modification for relocation
15
Symbol Defining Statement •
•
MAXLEN EQU 4096 – makes program structure better – easier to modify a single location – easier to remember than numbers – registers can be given meaningful names – (maxlen = 4096) in MIPS assembler – searches SYMTAB and replace the symbol with the value in the table – resulting object code is the same as using the value instead of symbol – remember that with 2 passes there is restriction X Y
EQU Y EQU 100
• X cannot be defined in pass 1
16
Expressions BUFFER: .space 4096 ; reserve 4096 bytes here BUFEND: ; set current location to BUFFEND (MAXLEN = BUFEND – BUFFER) ; calculate the size of the buffer
• •
allows simple arithmetic operations in symbol definition operands may have relative values for relocation – relative values should be modified by the loader later • we need to know which is relative – symbol table needs a type field to discern absolute symbols from relative symbols
17
Expression Rules •
•
basic – constant is absolute – address is relative using expressions – expression with absolute arguments is absolute – expression that has multiplication and division is absolute – relative_1 - relative_2 is absolute • dependencies on starting address are canceled out – all the other expressions having relative terms are neither relative nor absolute (error?) • constant - relative • relative_1 + relative_2 • 3 x relative_1
18
Program Blocks source
object code
block 0 block 1 block 2
block 0 assembled
block 0
block 1
block 1 block 2
block 2
19
Program Blocks (cont’d) •
•
motivation – programmer’s view may be different from machine’s view • affects only efficiency not functionality – addressing can be simplified • large data area can be moved to the end of code while source code places it close to the instructions that use this data data structure and algorithm – block table (name, block number, address, length) – pass 1 • maintain separate LOCCTR for each block – each label is assigned address relative to the start of the block that contains it
• SYMTAB stores block number for each symbol • store starting address of each block in block table – pass 2 • assign address to each symbol by adding the relative address to the block starting address 20
Control Sections •
•
control section is a part of program that can be assembled independent of other parts – a large problem can be divided into many control sections – each control section can be developed independently – each control section can be modified independently symbols defined in other control sections – called external – assembler prepares those symbols – loader & linker resolves the value of external symbols
21
Control Sections (cont’d) •
•
a table prepared by assembler – define record • name of symbol defined in this control section • relative address of the symbol – refer record • name of external symbols – modification record • starting address of field to be modified • length of this field • name of external symbol loader – for every external symbol • find the relative address from the define record • add the starting address of the control section where the symbol is defined • modify the field 22
One-Pass Assembler • •
•
problem – forward reference: reference to symbols that are not defined yet why do we need one-pass assembler? – fast • useful for program development and testing • university computing environment load-and-go assembler – writes the object code on memory not on disk file – since it is on memory it is easy to modify a part of object code
23
One-Pass Assembler (cont’d) •
•
•
one-pass assembler for load-and-go – stores undefined symbols in the SYMTAB with the address of the field that references this symbol – when the symbol is defined later, look up the SYMTAB and modify the field with correct address • there may be many places to be modified what if object code is written on disk? – bring back the text to memory • efficiency of one-pass assembler cannot be justified – make loader to modify the address at loading time • modification record again optimization – require all the data declaration be placed at the beginning of the program • reduces reference resolution
24
Multi-Pass Assembler •
support forwarding reference even though it is bad for program readability
1.(A = B/2) 2.(B = C-D) .... 8. C ..... 9. D ..…
at 1, store in a table two tuples (A, 1, B/2, 0) 1: one symbol is missing 0: no other symbol depends on A (B, *, , &LB) *: don’t know how many symbols missing yet LB: list of symbols that depend on B (now, there is only A in this list) at 2, insert (C,*, ,&LC), (D,*, ,&LD) LC and LD contains only B modify (B,*, ,&LB) as (B,2,C-D,&LB) after 8 from LC, B is found change 2 to 1 in the B tuple meaning one symbol remains to be defined after 9 from LD, B is found now evaluate B with defined C, D values since B is done from LB, A is found now A can be evaluated 25