Introduction to Assembly Language 2nd Semester SY 2009-2010 Benjie A. Pabroa
What is Assembly Language?
"High"-level languages such as BASIC, FORTRAN, Pascal, Lisp, APL, etc. are designed to ease the strain of programming by providing the user with a set of fairly sophisticated operations that are easy to access.
Assembly as a Low-level Language
◦ A very low-level language can be very flexible and efficient (in terms of speed and memory use), but it can also be very difficult to program in, since no sophisticated operations are provided and the programmer must understand in detail how the computer operates.
◦ Assembly language is essentially the lowest level of language a programmer normally writes: each assembly instruction stands for a single machine-code instruction.
Built-in Features
◦ The ability to read the values stored at various "memory locations"
◦ The ability to write a new value into a memory location
◦ The ability to do integer arithmetic of limited precision (add, subtract, multiply, divide)
◦ The ability to do logical operations (or, and, not, xor)
◦ The ability to "jump" to programs stored at various locations in the computer's memory
Features not included
◦ The ability to perform graphics
◦ The ability to access files
◦ The ability to directly perform floating-point arithmetic
Assembly vs High-Level Languages
FORTRAN code to average together the N numbers stored in the array X(I):
      INTEGER*2 I,X(N)
      INTEGER*4 AVG
      . . .
C     AVERAGE THE ARRAY X, STORING THE RESULT AS AVG:
      AVG=0
      DO 10 I=1,N
   10 AVG=AVG+X(I)
      AVG=AVG/N
      . . .
The same computation in assembly language:
; clear the running sum AVG, kept in the register pair DX:AX
        mov dx,0             ; the dx register stores the two most significant bytes of the running sum
        mov ax,0             ; use ax to store the two least significant bytes
; use the si register to point to the currently accessed element X(I), starting with I=0
        mov si,offset x
; cx is used as the loop counter. It starts at N and counts down to zero.
        mov cx,n
addloop:
        add ax,word ptr [si] ; add X(I) to the two least significant bytes of AVG
        adc dx,0             ; add the "carry" into the two most significant bytes of AVG
        add si,2             ; move si to point to X(I+1)
        loop addloop         ; decrement cx and loop again if not zero
        div n                ; unsigned divide of DX:AX by N; quotient in AX, remainder in DX
        mov avg,ax           ; save the result as AVG
The assembly version does the same job, but writing it required intimate knowledge of how the variables x, n, and avg were stored in memory.
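To see why the assembly version needs the register pair DX:AX and the ADC instruction, here is a minimal sketch that models the loop's 16-bit registers in Python (used only as a modeling language; the function name average_words is hypothetical). It assumes, as the ADD/ADC/DIV instructions do, that the elements are unsigned 16-bit words.

```python
# Sketch: model the add/adc averaging loop with 16-bit "registers".
# Assumption: elements are treated as unsigned 16-bit words, as the
# ADD/ADC/DIV instructions in the listing treat them.

MASK16 = 0xFFFF


def average_words(x):
    """Return (AX, DX) after the loop and DIV: quotient and remainder."""
    dx, ax = 0, 0                      # mov dx,0 / mov ax,0
    for word in x:                     # one pass through addloop per X(I)
        total = ax + (word & MASK16)   # add ax,word ptr [si]
        carry = total >> 16            # CF = 1 if the sum needed a 17th bit
        ax = total & MASK16
        dx = (dx + carry) & MASK16     # adc dx,0
    dividend = (dx << 16) | ax         # DIV uses the 32-bit value DX:AX
    quotient, remainder = divmod(dividend, len(x))
    if quotient > MASK16:
        # on a real 8086 this raises a divide-error interrupt instead
        raise OverflowError("quotient does not fit in AX")
    return quotient, remainder


print(average_words([60000, 60000, 3]))  # (40001, 0)
```

The first two additions overflow 16 bits, so without the carry into DX the sum (120003) would be lost; with it, DIV sees the full 32-bit value.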
PC System Architecture
Microprocessor
◦ Reads instructions from memory and executes them
◦ Accesses memory
◦ Does arithmetic and logical operations
◦ Performs other services as well
1971:
◦ Intel’s 4004 was the first microprocessor—a 4-bit CPU (like the one from CS231) that fit all on one chip.
1978:
◦ The 8086 was one of the earliest 16-bit processors.
1981:
◦ IBM uses the 8088 in their little PC project.
1989:
◦ The 80486 includes a floating-point unit in the same chip as the main processor, and uses RISC-based implementation ideas like pipelining for greatly increased performance.
1997:
◦ The Pentium II is superscalar, supports multiprocessing, and includes special instructions for multimedia applications.
2002:
◦ The Pentium 4 runs at insane clock rates (3.06 GHz), implements extended multimedia instructions and has a large on-chip cache.
Memory
◦ Stores instructions (the program) or data
◦ Appears as a sequence of locations (addresses); each address holds one byte
◦ Types:
   ROM: stored bytes may only be read by the CPU; they cannot be changed
   RAM: stored bytes may be both read and written (changed); RAM is volatile, so all data is lost at shutdown
◦ Both types are random access
The Process of Assembly
Assembly language is a compiled language
◦ Source code must first be created with a text-editor program
◦ Then the source code is compiled (translated into machine code)
◦ Compilers for assembly language are called assemblers
Auxiliary Programs
◦ First: text editor (source-code editor)
◦ Second: assembler
   Assembles the source code, generating object code in the process.
◦ Third: linker
   Combines the object-code modules created by the assembler into relocatable code.
◦ Fourth: loader
   Built into the operating system and never explicitly executed. Takes the “relocatable” code created by the linker, “loads” it into memory at the lowest available location, then runs it.
◦ Fifth: debugger
   An environment for running and testing assembly language programs.
[Figure: The process of assembly. Source Code → Assembler → Object Code → Linker (together with Other Object Code 1 and Other Object Code 2) → Relocatable Code → Loader → RAM]
DOS and Simple File Operation
DOS
◦ Provides the environment in which programs run
◦ Provides a set of helpful utility functions
◦ Must be understood in order to create programs in DOS
Creating Assembly Source Code
You can use the EDIT command in DOS or simply use Notepad.
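[Figure: 8086 CPU block diagram showing the general-purpose registers (AH/AL, BH/BL, CH/CL, DH/DL), the pointer and index registers (SP, BP, SI, DI), the segment registers (CS, DS, SS, ES), the Instruction Pointer, the Flag Register, the ALU and Control Unit (CU), and the Bus Control Unit]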
CPU Registers
Assembly language
◦ Much of the programmer's thought goes into the use of the computer memory and the CPU registers
Register
◦ Like a memory location in that it can store a byte (or word) value
◦ Has no address in memory: it is not part of the computer memory but is built into the CPU
Importance of Registers in Assembly Programs
◦ Instructions that use registers are preferred over instructions that operate on values stored at memory locations:
   ◦ They tend to be shorter (they take less room in memory)
   ◦ They execute faster than memory-oriented instructions, since the hardware can access a register much faster than a memory location
CPU Registers (8086 family)
AX     The accumulator
BX     The pointer register
CX     The loop counter
DX     Used for multiplication and division
SI     The “source” string index register
DI     The “destination” string index register
BP     Used for passing arguments on the stack
SP     The stack pointer
IP     The instruction pointer
CS     The “code segment” register
DS     The “data segment” register
SS     The “stack segment” register
ES     The “extra segment” register
FLAG   The flag register
Segment Registers
CS   Code Segment    16-bit number that points to the active code segment
DS   Data Segment    16-bit number that points to the active data segment
SS   Stack Segment   16-bit number that points to the active stack segment
ES   Extra Segment   16-bit number that points to the active extra segment
Pointer Registers
IP   Instruction Pointer   16-bit offset (within CS) of the next instruction to execute
SP   Stack Pointer         16-bit offset (within SS) of the current top of the stack
BP   Base Pointer          used to pass data to and from the stack
General Purpose Registers
AX   Accumulator Register   mostly used for calculations and for input/output
BX   Base Register          the only one of AX, BX, CX, DX that can be used to address memory (e.g. [bx])
CX   Count Register         used as the counter by the LOOP instruction
DX   Data Register          used for input/output and by multiply and divide
Index Registers
SI   Source Index        used by string operations as the source
DI   Destination Index   used by string operations as the destination
CPU registers
◦ AX, BX, CX, and DX are more flexible than the other registers: they can be used as word registers (16-bit values) or as pairs of byte registers (8-bit values)
◦ Each of these general-purpose registers can be “split” into a high byte and a low byte: AX = AH:AL, BX = BH:BL, CX = CH:CL, DX = DH:DL
◦ Example: if DX = 1234h, then DH = 12h and DL = 34h
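The relationship between a 16-bit register and its two byte halves is just a shift and a mask. A minimal Python sketch (the helper names split and join are hypothetical, chosen for illustration):

```python
def split(reg16):
    """Split a 16-bit register value into (high byte, low byte)."""
    return (reg16 >> 8) & 0xFF, reg16 & 0xFF


def join(high, low):
    """Combine a high byte and a low byte into one 16-bit value."""
    return ((high & 0xFF) << 8) | (low & 0xFF)


dh, dl = split(0x1234)
print(f"DH={dh:02X}h DL={dl:02X}h")   # DH=12h DL=34h

dx = join(dh, 0xFF)                    # writing DL changes the low half of DX
print(f"DX={dx:04X}h")                 # DX=12FFh
```

Because DH and DL are not separate storage but the two halves of DX, changing one byte register always changes the word register it belongs to.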
Flag Register
◦ Contains 9 flags: 6 status flags (OF, SF, ZF, AF, PF, CF) and 3 control flags (DF, IF, TF)
◦ They are called flags because each bit can be either:
   ◦ SET (1)
   ◦ NOT SET (0)
Abbr.  Name               Bit  Description
OF     Overflow Flag      11   set when a signed result is too large or too small to fit (signed overflow)
DF     Direction Flag     10   controls the direction of string operations (0 = SI/DI increase, 1 = SI/DI decrease)
IF     Interrupt Flag      9   if set, interrupts are enabled; otherwise they are disabled
TF     Trap Flag           8   if set, the CPU runs in single-step mode (used by debuggers)
SF     Sign Flag           7   if set, the result of a calculation is negative (its most significant bit is 1)
ZF     Zero Flag           6   if set, the result of a calculation is zero
AF     Auxiliary Carry     4   a second carry flag: carry or borrow out of the low 4 bits (used for BCD arithmetic)
PF     Parity Flag         2   indicates parity: set if the low byte of the result has an even number of 1 bits
CF     Carry Flag          0   holds the carry out of (or borrow into) the most significant bit after a calculation
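To make the status-flag descriptions concrete, here is a minimal Python sketch of which flags a 16-bit ADD would set. The function name add16_flags is hypothetical, and the sketch covers ADD only (SUB, for instance, sets CF and OF with different rules).

```python
def add16_flags(a, b):
    """Return the 16-bit sum of a and b and the status flags ADD would set."""
    a &= 0xFFFF
    b &= 0xFFFF
    full = a + b
    result = full & 0xFFFF
    sign = 0x8000
    flags = {
        "CF": full > 0xFFFF,                           # carry out of bit 15
        "ZF": result == 0,                             # result is zero
        "SF": bool(result & sign),                     # bit 15 of the result
        # OF: both operands had the same sign but the result's sign differs
        "OF": bool(~(a ^ b) & (a ^ result) & sign),
        "AF": (a & 0xF) + (b & 0xF) > 0xF,             # carry out of bit 3
        "PF": bin(result & 0xFF).count("1") % 2 == 0,  # even number of 1s in low byte
    }
    return result, flags


result, flags = add16_flags(0x7FFF, 0x0001)
print(f"{result:04X}h", flags)   # 8000h: SF and OF set, CF and ZF clear
```

Note how 7FFFh + 1 sets OF (two positive numbers gave a "negative" result) but not CF, while FFFFh + 1 does the opposite: CF and ZF are set and OF is clear.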
Test it
Want to see all these registers and flags?
◦ Go to DOS
◦ Type debug
◦ Type "r". Then you’ll see all the registers and some abbreviations for the flags.
◦ Type "q" to quit again.
Memory Segmentation
How DOS uses memory
◦ The data bus is 16 bits wide, so the processor can move and store 16 bits (1 word = 2 bytes) at a time.
◦ When the processor stores a word (16 bits), it stores the bytes in reverse order in memory: the low byte goes at the lower address.
   1234h (word) ---> memory: 34h (byte), 12h (byte)
   Memory value: 78h 56h ---> derived word value: 5678h
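This "reverse order" is called little-endian byte order. Python's standard struct module can reproduce both examples, as a small sketch ("<" selects little-endian and "H" an unsigned 16-bit word):

```python
import struct

stored = struct.pack("<H", 0x1234)      # store the word 1234h
print(stored.hex(" "))                  # 34 12

(value,) = struct.unpack("<H", bytes([0x78, 0x56]))  # read the bytes 78h 56h
print(f"{value:04X}h")                  # 5678h
```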
The computer divides its memory into segments
◦ This is standard in DOS
◦ Segments are 64 KB in size and each has a number
◦ These numbers are stored in the segment registers (see above)
◦ The three main segments are the code, data, and stack segments
◦ These segments can overlap each other almost completely
◦ Try typing d in debug: addresses are shown as, e.g., 4576:0100, where 4576 is the segment number and 0100 is the offset
Segments overlap
◦ The address 0000:0010 is the same location as 0001:0000
◦ Segment numbers count in paragraphs: a paragraph is 16 bytes, so a segment always starts at an address divisible by 16 (a paragraph boundary)
◦ Linear address = (segment × 16) + offset
◦ 0000:0010 => 0h:10h => 0:16 => linear address (0 × 16) + 16 = 16
◦ 0001:0000 => 1h:0h => 1:0 => linear address (1 × 16) + 0 = 16
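The segment:offset rule can be written as a one-line function. A minimal Python sketch (the name linear_address is hypothetical); the final mask models the fact that the 8086 forms a 20-bit address, so results wrap at 1 MB:

```python
def linear_address(segment, offset):
    """Physical address of segment:offset on the 8086 (wraps at 1 MB)."""
    return (segment * 16 + offset) & 0xFFFFF


print(linear_address(0x0000, 0x0010))              # 16
print(linear_address(0x0001, 0x0000))              # 16
print(f"{linear_address(0x4576, 0x0100):05X}h")    # 45860h
```

Since many segment:offset pairs give the same result, the pair is a name for a location, not a unique label for it.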
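My First Program
.model small
.stack
.data
message db "Hello world, I'm learning Assembly !!!", "$"
.code
main proc
    mov ax,seg message   ; get the segment number of message
    mov ds,ax            ; ...and load it into DS
    mov ah,09            ; DOS function 09h: print a "$"-terminated string
    lea dx,message       ; DS:DX = address of message
    int 21h              ; call DOS
    mov ax,4c00h         ; DOS function 4Ch: exit, with exit code 00h in AL
    int 21h              ; call DOS
main endp
end main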
Names
Identifiers
◦ An identifier is a name you apply to items in your program. The two types of identifiers are "name", which refers to the address of a data item, and "label", which refers to the address of an instruction. The same rules apply to names and labels.
Statements
◦ A program is made of a set of statements. There are two types of statements: "instructions", such as MOV and LEA, and "directives", which tell the assembler to perform a specific action, like ".model small" or ".code".
Here's the general format of a statement:
identifier - operation - operand(s) - comment
◦ The identifier is the name, as explained above.
◦ The operation is an instruction like MOV.
◦ The operands provide information for the operation to act on, as in MOV (operation) AX,BX (operands).
◦ The comment is text you add to explain the code; the assembler ignores everything after a ";" on a line.
Example ◦ MOV AX,BX ;this is a MOV instruction
How to Assemble
Source code is turned into a program by an assembler and then a linker. Some assemblers:
◦ A86
◦ MASM
◦ TASM – we will use this one
Install TASM
Then use the tasm.exe and tlink.exe
◦ To assemble, type the following at the command prompt:
   cd c:\tasm\bin
   tasm c:\first.asm
◦ To link:
   tlink c:\tasm\bin\first.obj
   or simply: tlink first.obj
◦ To run, call the .exe at the command prompt.
◦ Example: our program (First.asm):
.model small
.stack
.data
message db "Hello world, I'm learning Assembly !!!", "$"
.code
main proc
    mov ax,seg message
    mov ds,ax
    mov ah,09
    lea dx,message
    int 21h
    mov ax,4c00h
    int 21h
main endp
end main
Dissecting Code
.model small
◦ Lines that start with a "." provide the assembler with information.
◦ The word(s) after the "." say what kind of information. In this case it tells the assembler that the program is small and doesn't need a lot of memory. We'll come back to this later.
.stack ◦ This one tells the assembler that the "stack" segment starts here. The stack is used to store temporary data.
.data
◦ Indicates that the data segment starts here and that the stack segment ends here.
.code
◦ Indicates that the code segment starts here and the data segment ends here.
main proc
◦ Code must be in procedures, much like functions in C. This line indicates that a procedure called main starts here.
◦ main endp states that the procedure is finished.
◦ end main tells the assembler that the program is finished.
◦ It also tells the assembler where the program starts: at the procedure called main, in this case.
message db "xxxx"
◦ DB means Define Byte, and that is what it does: in the data segment it defines a sequence of bytes.
◦ These bytes contain the text between the quotation marks. The final "$" marks the end of the string for the DOS print function (09h) used later.
◦ "message" is a name that identifies this byte string.
◦ It's called an "identifier".
Memory space for variables
◦ DB (Byte – 8 bits)
◦ DW (Word – 16 bits)
◦ DD (Doubleword – 32 bits)
◦ Example:
   foo          db 27    ; by default all numbers are decimal
   bar          dw 3e1h  ; appending an "h" means hexadecimal
   real_fat_rat dd ?     ; "?" means "don't care about the value"
◦ A variable name stands for an address: the address can't be changed, but the value stored there can.
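A minimal Python sketch of how these three definitions might be laid out in memory, assuming (hypothetically) that they are the first items in the data segment and are placed back to back with no padding. The "?" value is modeled as 0 here, although in a real program its contents are simply unspecified:

```python
import struct

# Hypothetical layout: definitions placed back to back from offset 0.
# struct codes: "B" = 1 byte (db), "H" = 2 bytes (dw), "I" = 4 bytes (dd).
fields = [("foo", "B", 27),
          ("bar", "H", 0x3E1),
          ("real_fat_rat", "I", 0)]   # "?" modeled as 0 (assumption)

offsets = {}
data = b""
for name, fmt, value in fields:
    offsets[name] = len(data)                 # offset = bytes defined so far
    data += struct.pack("<" + fmt, value)     # "<" = little-endian, no padding

print(offsets)        # {'foo': 0, 'bar': 1, 'real_fat_rat': 3}
print(data.hex(" "))  # 1b e1 03 00 00 00 00
```

The names foo, bar, and real_fat_rat correspond to fixed offsets, which is why the address behind a variable name can't change while the bytes stored there can.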
mov ax, seg message
◦ AX is a register. You use registers all the time, which is why you had to learn about them first.
◦ MOV is an instruction that moves data. It takes two "operands"; here the operands are AX and seg message.
◦ seg message can be seen as a number: the number of the segment that contains "message" (the data segment). We have to know this number so we can load the DS register with it; otherwise we can't get to the string in memory. We need to know WHERE the string is located in memory.
◦ The number is loaded into the AX register. MOV always moves data into the operand left of the comma, from the operand right of the comma.
The MOV Instruction
Syntax:
◦ MOV destination, source
Moves data into and out of registers and memory
◦ Destination: either a register or a memory location
◦ Source: a register, a memory location, or a numeric value
◦ Memory-to-memory transfers are NOT ALLOWED
Code from earlier:
foo          db 27    ; by default all numbers are decimal
bar          dw 3e1h  ; appending an "h" means hexadecimal
real_fat_rat dd ?     ; "?" means "don't care about the value"

mov ax,bar   ; load the word-size register ax with the word value stored at location bar
mov dl,foo   ; load the byte-size register dl with the byte value stored at location foo
mov bx,ax    ; load the word-size register bx with the word value in ax
mov bl,ch    ; load the byte-size register bl with the byte value in ch
mov bar,si   ; store the value in the word-size register si at the memory location labelled "bar"
mov foo,dh   ; store the byte value in the register dh at memory location foo
mov ax,5     ; store the word 5 in the ax register
mov al,5     ; store the byte 5 in the al register
mov bar,5    ; store the word 5 at location bar
mov foo,5    ; store the byte 5 at location foo

◦ Notice the sizes of the source and destination: they must match in register-register, memory-register, and register-memory transfers.
◦ A constant must be consistent with (fit in) the size of the destination.
Illegal MOV Statements
◦ MOV AL, 3172
◦ MOV foo, 3172
Why are the statements above illegal?
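One way to check whether a constant is "consistent with the destination" is to test whether it fits in the destination's size. A minimal Python sketch (the helper name fits is hypothetical); it assumes the assembler accepts either the signed or the unsigned range, e.g. -128..255 for a byte:

```python
def fits(value, bits):
    """True if value fits in a destination of the given size (8 or 16 bits).

    Assumption: either the signed or the unsigned range is accepted,
    e.g. -128..255 for a byte, -32768..65535 for a word.
    """
    return -(1 << (bits - 1)) <= value <= (1 << bits) - 1


print(fits(5, 8))       # True:  mov al,5 / mov foo,5
print(fits(3172, 8))    # False: al and foo are byte-size
print(fits(3172, 16))   # True:  a word destination could hold it
```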
mov ds,ax
◦ Here it moves the number in the AX register (the number of the data segment) into the DS register.
◦ We have to load the DS register this way, with two instructions, because the 8086 cannot move a constant directly into a segment register.
◦ Just typing "mov ds,seg message" isn't possible.
mov ah, 09
◦ MOV again. This time it loads the AH register with the constant value nine.
lea dx, message
◦ LEA - Load Effective Address.
This instruction stores the offset of the string message within the data segment into the DX register. This offset is the second thing we need to know to locate "message" in memory. So now we have the full address DS:DX.
int 21h
◦ This instruction causes an interrupt.
◦ The processor calls a routine somewhere in memory.
◦ 21h tells the processor which routine to call, in this case a DOS routine.
◦ For now, assume that INT just calls a procedure from DOS.
◦ The procedure looks at the AH register to find out what it has to do.
◦ In this example, the value 9 in the AH register tells the procedure to write the "$"-terminated string at DS:DX to the screen.
mov ax, 4c00h
◦ Load the AX register with the constant value 4c00h.
int 21h
◦ This time the AH register contains the value 4ch (AX = 4c00h), which to the DOS procedure means "exit program".
◦ The value of AL is used as an "exit code"; 00h means "no error".
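The idea that one INT 21h entry point does different jobs depending on AH can be modeled as a dispatch on a register value. This is a toy Python sketch, not real DOS: the function int21h, the register dictionary, and the tiny memory array are all hypothetical, and only functions 09h and 4Ch are modeled.

```python
def int21h(regs, memory):
    """Toy model of DOS INT 21h: choose the service from AH."""
    ah = regs["AX"] >> 8
    al = regs["AX"] & 0xFF
    if ah == 0x09:
        # print the "$"-terminated string at DS:DX
        start = regs["DS"] * 16 + regs["DX"]
        end = memory.index(ord("$"), start)
        return ("print", memory[start:end].decode("ascii"))
    if ah == 0x4C:
        return ("exit", al)               # AL is the exit code
    raise NotImplementedError(f"function {ah:02X}h not modeled")


memory = bytearray(0x100)
msg = b"Hello world, I'm learning Assembly !!!$"
memory[0x20:0x20 + len(msg)] = msg        # hypothetical location: 0002:0000

regs = {"AX": 0x0900, "DS": 0x0002, "DX": 0x0000}
print(int21h(regs, memory))   # ('print', "Hello world, I'm learning Assembly !!!")
regs["AX"] = 0x4C00
print(int21h(regs, memory))   # ('exit', 0)
```

The "$" is not printed: it only tells function 09h where the string stops.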
After running:
◦ Go to DOS and type “debug first.exe”.
◦ Type d -> displays the contents of some addresses
◦ Type u -> unassembles the program; you will see something like:

Segment:Offset   Machine Code   Instruction
0F77:0000        B8790F         MOV AX,0F79
0F77:0003        8ED8           MOV DS,AX
0F77:0005        B409           MOV AH,09
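MOV AX,0F79 was originally mov ax,seg message. B8 is the machine code for "mov ax, <number>", and 790F is the number, stored low byte first (79h, then 0Fh), so it reads 0F79h. It means that the data is stored in the segment with number 0F79.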
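Decoding these bytes by hand is a good exercise in little-endian order. A minimal Python sketch that handles only the two MOV-immediate opcodes appearing in the listing (B8 = MOV AX with a 16-bit number, B4 = MOV AH with an 8-bit number); the function name decode_mov_imm is hypothetical and this is far from a full disassembler:

```python
def decode_mov_imm(code):
    """Decode the two MOV-immediate forms seen in the debug listing."""
    opcode = code[0]
    if opcode == 0xB8:                                # MOV AX, 16-bit number
        imm = int.from_bytes(code[1:3], "little")     # low byte comes first
        return f"MOV AX,{imm:04X}"
    if opcode == 0xB4:                                # MOV AH, 8-bit number
        return f"MOV AH,{code[1]:02X}"
    raise ValueError(f"opcode {opcode:02X} not handled in this sketch")


print(decode_mov_imm(bytes.fromhex("B8790F")))   # MOV AX,0F79
print(decode_mov_imm(bytes.fromhex("B409")))     # MOV AH,09
```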
The other instruction, lea dx,message, turned into mov dx,0.
◦ That means the offset of the string is 0, so its address is 0F79:0000.
◦ Try typing d 0F79:0000
◦ Calculating another address for the same location: subtract 2 from the segment number, 0F79 - 2 = 0F77. Two segment steps are 2 paragraphs = 32 bytes = 20h, so the offset must grow by 20h: the other address is 0F77:0020.
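The same calculation works for any pair of segments: convert to a linear address, then subtract the new segment's start. A minimal Python sketch (the function name same_location is hypothetical):

```python
def same_location(segment, offset, new_segment):
    """Offset that reaches the same byte as segment:offset from new_segment."""
    linear = segment * 16 + offset
    new_offset = linear - new_segment * 16
    if not 0 <= new_offset <= 0xFFFF:
        raise ValueError("that byte is not reachable from new_segment")
    return new_offset


print(f"0F77:{same_location(0x0F79, 0x0000, 0x0F77):04X}")   # 0F77:0020
```

Both 0F79:0000 and 0F77:0020 give the linear address 0F790h.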
The Stack
◦ The stack is a place where data is temporarily stored.
◦ The SS and SP registers point to that place like this: SS:SP
◦ The SS register holds the segment and the SP register holds the offset.
There are a few instructions that make use of the stack:
◦ PUSH - push a value onto the stack
◦ POP - retrieve that value from the stack
Example:
MOV AX,1234H
PUSH AX
MOV AH,09
INT 21H
POP AX
◦ The final value of AX will be 1234h. First we load 1234h into AX, then we push that value onto the stack. Next we store 9 in AH, so AX becomes 0934h, and execute an INT. Then we pop the AX register, retrieving the pushed value from the stack.
So AX contains 1234h again.
Example:
MOV AX, 1234H
MOV BX, 5678H
PUSH AX
POP BX
◦ We pushed AX onto the stack and popped that value into BX.
◦ What are the final values of AX and BX?
◦ Creating the stack is easy: the .stack directive creates a stack of 1024 bytes by default.
◦ The stack uses a LIFO system (Last In, First Out).
Example:
MOV AX,1234H
MOV BX,5678H
PUSH AX
PUSH BX
POP AX
POP BX
First the value 1234h is pushed, and after that the value 5678h. According to LIFO, 5678h comes off first, so AX pops that value and BX pops the next one. What are the values of AX and BX?
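LIFO behaviour is exactly what a Python list gives you with append and pop, so the example above can be traced in a few lines (a sketch; Python variables stand in for the registers):

```python
stack = []                  # a Python list used as a LIFO stack
ax, bx = 0x1234, 0x5678     # MOV AX,1234H / MOV BX,5678H
stack.append(ax)            # PUSH AX
stack.append(bx)            # PUSH BX
ax = stack.pop()            # POP AX: takes the last value pushed
bx = stack.pop()            # POP BX: takes the value pushed before it
print(f"AX={ax:04X}h BX={bx:04X}h")
```

Pushing two registers and popping them in the same order swaps their values.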
How does the stack look in memory? It "grows" downwards in memory. When you push a word (2 bytes), for example, SP is first decreased by 2 and then the word is stored at SS:SP. So in the beginning SP points to the top of the stack, and (if you don't pay attention) the stack can grow so far downwards in memory that it overwrites your program's code or data. A major system crash is the result.
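To see the downward growth at the byte level, here is a minimal Python model of PUSH and POP on SS:SP. The class name Stack8086 and the starting values SS = 0000h, SP = 0400h (chosen to match a 1024-byte stack) are hypothetical; only word-sized pushes are modeled.

```python
class Stack8086:
    """Minimal model of PUSH/POP on SS:SP (word-sized values only)."""

    def __init__(self, ss, sp, memory):
        self.ss, self.sp, self.memory = ss, sp, memory

    def _address(self):
        return self.ss * 16 + self.sp              # linear address of SS:SP

    def push(self, word):
        self.sp = (self.sp - 2) & 0xFFFF           # SP is decreased by 2 first...
        a = self._address()
        self.memory[a] = word & 0xFF               # ...then the low byte is stored
        self.memory[a + 1] = (word >> 8) & 0xFF    # and the high byte above it

    def pop(self):
        a = self._address()
        word = self.memory[a] | (self.memory[a + 1] << 8)
        self.sp = (self.sp + 2) & 0xFFFF           # SP moves back up
        return word


memory = bytearray(0x10000)
stack = Stack8086(ss=0x0000, sp=0x0400, memory=memory)
stack.push(0x1234)
stack.push(0x5678)
print(f"SP={stack.sp:04X}h")                      # SP=03FCh
print(f"{stack.pop():04X}h {stack.pop():04X}h")   # 5678h 1234h
```

Each push moves SP toward lower addresses, which is why an unchecked stack eventually runs into whatever lies below it.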
Congratulations!!
If you fully understand this stuff (registers, flags, segments, stack, names, etc.) you may, from now on, call yourself a
"Level 0 Assembly Coder"