ARM Architecture and Instruction Set
Ingo Sander
[email protected]
2B1447 Embedded Systems
August 31, 2004

ARM Microprocessor Core
• ARM is a family of RISC architectures that share the same design principles and a common instruction set.
• ARM does not manufacture the CPU itself. It licenses the design to other manufacturers, who integrate it into their own systems.
• The ARM core is widely used in mobile phones, handheld organizers, and many other portable consumer devices.
• Depending on the application, ARM processors are available with, e.g.,
  • different cache sizes
  • different bus widths
  • varying clock speeds
• Different versions use different architectures, e.g.,
  • ARM7: von Neumann
  • ARM9: Harvard
• Assembly programs are not affected by the underlying architecture.

The ARM Core as part of a system-on-chip
[Figure: an ARM core on a single chip together with memory (Mem), I/O units, a DSP and an ASIC]

ARM assembly language
• The assembly language reflects the instruction set (almost one to one).
• One instruction per line.
• Labels provide names for addresses (usually in the first column).
• Instructions often start in later columns.
• Comments start with ";" and run to the end of the line.

Example:

      LDR   r1, =100000     ; a wait loop (100000 does not fit an immediate operand)
Loop  SUBS  r1, r1, #1      ; the S suffix updates the flags tested by BGE
      BGE   Loop
Cont  ...

Loop and Cont are labels in the first column. The instructions start in a later column, and the comment follows the semicolon.
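A small Python model makes it easier to see what the wait loop does. This is a sketch, not ARM code, and the function name wait_loop is hypothetical. It treats r1 as a signed value and assumes a non-negative start value below 2**31. Under that assumption SUBS can never overflow, so V stays 0, and BGE (taken when N = V) keeps looping as long as the result is not negative.

```python
def wait_loop(start):
    """Return how many times the loop body (SUBS + BGE) runs for r1 = start.

    Assumes 0 <= start < 2**31, so SUBS never overflows and V stays 0.
    """
    r1 = start                  # LDR r1, =start
    iterations = 0
    while True:
        iterations += 1
        r1 -= 1                 # SUBS r1, r1, #1
        n = int(r1 < 0)         # N flag: the result is negative
        v = 0                   # V flag: no signed overflow for these inputs
        if n != v:              # BGE Loop is taken only when N == V
            return iterations   # fall through to Cont


print(wait_loop(100000))  # 100001
```

The body runs one more time than the start value, because BGE also branches when the result is exactly zero.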
The von Neumann architecture
• Consists of a CPU and one single memory.
• The memory holds both instructions and data.
• The central processing unit (CPU) fetches instructions from memory.
• The separation of CPU and memory distinguishes a programmable computer.
• CPU registers help out: program counter (PC), instruction register (IR), general-purpose registers, etc.
[Figure: a simple CPU (Register 1-3, instruction register IR, status register, program counter PC, ALU) connected to a single memory by one address bus, one data bus and a Read/Write (R/W) signal]

Example Von Neumann Architecture
The single memory holds both the program (start address 0x100) and its data:

      0x100:  LDR R1, =0x400
      0x104:  LDR R2, =0x404
      0x108:  ADD R3, R2, R1
      0x10C:  STR R3, =0x408
      ...
      0x400:  2
      0x404:  3
      0x408:  ?

In this simplified CPU, LDR R1, =0x400 means "load R1 with the word stored at address 0x400", and STR R3, =0x408 means "store R3 at address 0x408". In real ARM assembly, LDR R1, =0x400 loads the constant 0x400 itself (see Loading Constants).

For every instruction the CPU repeats the cycle Fetch Instruction, Execute Instruction, Increment Program Counter. Instruction fetches and data accesses share the same address bus and data bus:

      Step                  PC     IR              Address bus  R1  R2  R3  mem[0x408]
      Fetch instruction     0x100  LDR R1, =0x400  0x100 (R)    -   -   -   ?
      Execute instruction   0x100  LDR R1, =0x400  0x400 (R)    2   -   -   ?
      Increment PC, fetch   0x104  LDR R2, =0x404  0x104 (R)    2   -   -   ?
      Execute instruction   0x104  LDR R2, =0x404  0x404 (R)    2   3   -   ?
      Increment PC, fetch   0x108  ADD R3, R2, R1  0x108 (R)    2   3   -   ?
      Execute instruction   0x108  ADD R3, R2, R1  (ALU only)   2   3   5   ?
      Increment PC, fetch   0x10C  STR R3, =0x408  0x10C (R)    2   3   5   ?
      Execute instruction   0x10C  STR R3, =0x408  0x408 (W)    2   3   5   5

(R = read, W = write on the Read/Write signal.)

Harvard Architecture
• Consists of a CPU and two separate memories.
• In the original Harvard architecture, one memory holds instructions and the other holds data.
[Figure: the same simple CPU connected to an instruction memory (the program at 0x100-0x10C) and to a data memory (the values at 0x400-0x408), each through its own address bus and data bus]

Comparison of von Neumann and Harvard
• Harvard allows two simultaneous memory fetches (an instruction and a data word).
• Harvard can't use self-modifying code, since the program cannot write to the instruction memory.
• Most DSPs use a Harvard architecture for streaming data:
  • greater memory bandwidth
  • more predictable bandwidth
• Harvard requires additional hardware, since two address buses and two data buses are needed.
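The von Neumann walkthrough can be replayed with a tiny Python model of the simple CPU. This is a sketch: the tuple instruction format, the run function and the HALT marker at 0x110 are all hypothetical and not part of the example. Instructions and data live in one dict, mirroring the single memory, and every access is logged as a transfer on one shared bus.

```python
def run(memory, pc):
    """Fetch-execute loop of the simplified CPU; returns the registers and the bus trace."""
    regs = {"R1": 0, "R2": 0, "R3": 0}
    trace = []                               # every access on the single shared bus
    while True:
        ir = memory[pc]                      # fetch: the PC drives the address bus
        trace.append((pc, "R"))
        op = ir[0]
        if op == "HALT":                     # hypothetical stop marker
            return regs, trace
        if op == "LDR":                      # LDR Rd, =addr: read a data word
            _, rd, addr = ir
            regs[rd] = memory[addr]
            trace.append((addr, "R"))
        elif op == "STR":                    # STR Rs, =addr: write a data word
            _, rs, addr = ir
            memory[addr] = regs[rs]
            trace.append((addr, "W"))
        elif op == "ADD":                    # ADD Rd, Rn, Rm: ALU only, no bus access
            _, rd, rn, rm = ir
            regs[rd] = regs[rn] + regs[rm]
        pc += 4                              # increment program counter


memory = {
    0x100: ("LDR", "R1", 0x400),
    0x104: ("LDR", "R2", 0x404),
    0x108: ("ADD", "R3", "R2", "R1"),
    0x10C: ("STR", "R3", 0x408),
    0x110: ("HALT",),
    0x400: 2,
    0x404: 3,
    0x408: None,                             # "?" in the example
}
regs, trace = run(memory, 0x100)
print(regs, memory[0x408])                   # {'R1': 2, 'R2': 3, 'R3': 5} 5
```

In the trace, instruction fetches (0x100, 0x104, ...) and data accesses (0x400, 0x404, 0x408) alternate on the same bus. A Harvard machine would send them over two separate buses, which is where its extra memory bandwidth comes from.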
Programming Model: Registers available in User Mode
• The ARM processor has 17 active registers in user mode:
  • 16 data registers (r0-r15)
  • 1 processor status register (cpsr)
• The registers r13-r15 have a special task:
  • r13 is the stack pointer (sp)
  • r14 is the link register (lr)
  • r15 is the program counter (pc)
[Figure: the 32-bit user-mode registers r0-r12, r13 (sp), r14 (lr), r15 (pc) and cpsr; the spsr (saved program status register) is not available in user mode]

Generic Program Status Register
• The cpsr (Current Program Status Register) is used to monitor and control internal operations.
[Figure: the 32-bit program status register with the fields Flags (bits 31-24), Status (bits 23-16), Extension (bits 15-8) and Control (bits 7-0). The condition flags N (Negative), Z (Zero), C (Carry) and V (Overflow) occupy bits 31-28. The control field contains the interrupt mask bits I (Interrupt) and F (Fast Interrupt) and the Thumb state bit T.]
Data Movement
• The ARM has a load-store architecture: data operands must be loaded into registers before they can be processed by the ALU.
• Data is moved between registers by means of Move instructions:

      MOV r1, r2      ; r1 = r2
      MOV r3, #1      ; r3 = 1

• Data is moved between memory and registers by Load and Store instructions.

Addresses and Endianness
• The ARM uses 32-bit addresses.
• A word is 32 bits (4 bytes) long.
• An address refers to a byte (not a word).
• The ARM processor can be configured to use a little-endian or a big-endian memory system:
  • Little-endian: the byte at the lowest address (byte 0) occupies the low-order bits 0-7 of a word.
  • Big-endian: the byte at the lowest address (byte 0) occupies the high-order bits 24-31 of a word.
[Figure: a 32-bit word (bit 31 ... bit 0) split into bytes. Little-endian: byte 3, byte 2, byte 1, byte 0 from bit 31 down to bit 0. Big-endian: byte 0, byte 1, byte 2, byte 3 from bit 31 down to bit 0.]
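Python's standard struct module can show the two byte orders side by side. The word value 0x12345678 is an arbitrary example. Byte 0 of each bytes object corresponds to the lowest memory address.

```python
import struct

word = 0x12345678                      # arbitrary example word

little = struct.pack("<I", word)       # little-endian: low-order byte at the lowest address
big = struct.pack(">I", word)          # big-endian: high-order byte at the lowest address

print(little.hex())                    # 78563412
print(big.hex())                       # 12345678

# A byte load (like LDRB) from the word's own address reads byte 0:
print(hex(little[0]), hex(big[0]))     # 0x78 0x12

# Interpreting little-endian bytes with the wrong byte order gives a different word.
misread = struct.unpack(">I", little)[0]
print(hex(misread))                    # 0x78563412
```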
Single Register-Memory Transfers
• Data operands must be loaded into registers before they can be processed by the ALU.
• The Load and Store instructions can be combined with different addressing modes.
• The basic Load instruction is LDR (load a word into a register). There are variations that work on bytes (LDRB), halfwords (LDRH) and signed bytes (LDRSB).
• The basic Store instruction is STR (store a word from a register). Its variations are STRB and STRH.

Example for Load
Before: r0 = 0x00000000, r1 = 0x00070000, mem32[0x00070000] = 0x00000005

      LDR r0, [r1]

After:  r0 = 0x00000005, r1 = 0x00070000
Useful addressing modes: Preindexing
Before: r0 = 0x00000000, r1 = 0x00007000, mem32[0x00007000] = 0x00001000, mem32[0x00007004] = 0x00002000

      LDR r0, [r1, #4]

After:  r0 = 0x00002000, r1 = 0x00007000

Useful addressing modes: Preindexing with Writeback
Before: r0 = 0x00000000, r1 = 0x00007000, mem32[0x00007000] = 0x00001000, mem32[0x00007004] = 0x00002000

      LDR r0, [r1, #4]!

After:  r0 = 0x00002000, r1 = 0x00007004
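Preindexing with and without writeback differ only in whether the computed address is written back to the base register. The following minimal Python model reproduces the Before/After states of both slides. The function ldr_preindexed and the dict-based memory are hypothetical helpers.

```python
def ldr_preindexed(regs, mem, rd, rn, offset, writeback=False):
    """LDR rd, [rn, #offset] (writeback=False) or LDR rd, [rn, #offset]! (writeback=True)."""
    address = regs[rn] + offset        # address = base + offset, computed before the access
    regs[rd] = mem[address]            # load the word at that address
    if writeback:
        regs[rn] = address             # '!' writes the new address back to the base register
    return regs


mem = {0x7000: 0x1000, 0x7004: 0x2000}  # mem32[...] from the "Before" state

pre = ldr_preindexed({"r0": 0, "r1": 0x7000}, mem, "r0", "r1", 4)
wb = ldr_preindexed({"r0": 0, "r1": 0x7000}, mem, "r0", "r1", 4, writeback=True)
print({k: hex(v) for k, v in pre.items()})   # {'r0': '0x2000', 'r1': '0x7000'}
print({k: hex(v) for k, v in wb.items()})    # {'r0': '0x2000', 'r1': '0x7004'}
```

Writeback is handy for walking through an array: repeating LDR r0, [r1, #4]! advances r1 by one word each time.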
Useful addressing modes: Postindexing
Before: r0 = 0x00000000, r1 = 0x00007000, mem32[0x00007000] = 0x00001000, mem32[0x00007004] = 0x00002000

      LDR r0, [r1], #4

After:  r0 = 0x00001000, r1 = 0x00007004

Multiple Register-Memory Transfers
• Load-store multiple instructions transfer multiple registers between memory and the processor in a single instruction:
  • LDM (Load Multiple Registers)
  • STM (Store Multiple Registers)
• There are four addressing modes: IA (increment after), IB (increment before), DA (decrement after) and DB (decrement before).
• Be careful which addressing mode you select. Otherwise you may overwrite the wrong memory locations, possibly even program code (unintended self-modifying code)!

Example Load Store Multiple Instructions
Before: r0 = 0x00000005, r1 = 0x00000006, r2 = 0x00000007, r3 = 0x00007000

      STMIA r3!, {r0-r2}

After (1): mem32[0x00007000] = 0x00000005, mem32[0x00007004] = 0x00000006, mem32[0x00007008] = 0x00000007, r3 = 0x0000700C

      MOV   r0, #1
      MOV   r1, #2
      MOV   r2, #3
      LDMDB r3!, {r0-r2}

After (2): r0 = 0x00000005, r1 = 0x00000006, r2 = 0x00000007, r3 = 0x00007000

• Such pairs of load-store multiple instructions can be used to temporarily save registers in memory.

Stack Operations
• The ARM architecture uses load-store multiple instructions to push data onto and pop data from the stack.
• You have to decide whether the stack is ascending (A) or descending (D), and whether you use a full (F) or an empty (E) stack:
  • Full stack: the stack pointer points at the last used address.
  • Empty stack: the stack pointer points at the first empty address.
• STMFA sp!, {r5,r7} pushes registers r5 and r7 onto a full ascending stack. After the instruction, sp points at the memory location where r7 is stored.
• STMFA sp!, {r5,r7} is equivalent to STMIB r13!, {r5,r7}.
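The following Python sketch models the four LDM/STM addressing modes with a dict standing in for memory. The helper names block_addresses, stm and ldm, and the values 0x55, 0x77 and 0x8000, are hypothetical. The sketch replays the STMIA/LDMDB example and checks that STMFA sp!, {r5,r7} stores exactly like STMIB r13!, {r5,r7}.

```python
def block_addresses(base, n, mode):
    """Word addresses used by an LDM/STM of n registers, plus the written-back base.

    The lowest-numbered register always uses the lowest address.
    """
    if mode == "IA":                 # increment after: first word at base
        low = base
    elif mode == "IB":               # increment before: first word at base + 4
        low = base + 4
    elif mode == "DA":               # decrement after: last word at base
        low = base - 4 * (n - 1)
    elif mode == "DB":               # decrement before: last word at base - 4
        low = base - 4 * n
    else:
        raise ValueError(f"unknown mode {mode}")
    addresses = [low + 4 * i for i in range(n)]
    new_base = base + 4 * n if mode in ("IA", "IB") else base - 4 * n
    return addresses, new_base


def reg_num(name):
    return int(name[1:])             # "r5" -> 5, used to order the register list


def stm(regs, mem, rn, reglist, mode, writeback=False):
    """STM<mode> rn{!}, {reglist}: store the registers to consecutive words."""
    addresses, new_base = block_addresses(regs[rn], len(reglist), mode)
    for reg, addr in zip(sorted(reglist, key=reg_num), addresses):
        mem[addr] = regs[reg]
    if writeback:                    # the '!' suffix
        regs[rn] = new_base


def ldm(regs, mem, rn, reglist, mode, writeback=False):
    """LDM<mode> rn{!}, {reglist}: load the registers from consecutive words."""
    addresses, new_base = block_addresses(regs[rn], len(reglist), mode)
    for reg, addr in zip(sorted(reglist, key=reg_num), addresses):
        regs[reg] = mem[addr]
    if writeback:
        regs[rn] = new_base


# The example from the slides: save r0-r2, overwrite them, restore them.
regs = {"r0": 5, "r1": 6, "r2": 7, "r3": 0x7000}
mem = {}
stm(regs, mem, "r3", ["r0", "r1", "r2"], "IA", writeback=True)   # STMIA r3!, {r0-r2}
mem_after_stm, r3_after_stm = dict(mem), regs["r3"]              # "After (1)"
regs.update(r0=1, r1=2, r2=3)                                     # MOV r0,#1 / MOV r1,#2 / MOV r2,#3
ldm(regs, mem, "r3", ["r0", "r1", "r2"], "DB", writeback=True)   # LDMDB r3!, {r0-r2}
print(regs)                          # "After (2)": r0-r2 hold 5, 6, 7 and r3 is 0x7000 again

# STMFA sp!, {r5,r7} behaves like STMIB r13!, {r5,r7} (hypothetical register values).
cpu = {"r5": 0x55, "r7": 0x77, "r13": 0x8000}
stack = {}
stm(cpu, stack, "r13", ["r5", "r7"], "IB", writeback=True)
print({hex(a): hex(v) for a, v in stack.items()}, hex(cpu["r13"]))  # sp points at the word holding r7
```

Save/restore sequences like the one above work by pairing modes that mirror each other (IA with DB, IB with DA). For a stack, the F/E and A/D names just select the matching pair for push and pop.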
Loading Constants
• There are two pseudo-instructions to load constants:

      LDR r1, =0x7000     ; loads r1 with the constant 0x7000
      ADR r2, label       ; loads r2 with the address of label

Data Processing Instructions
• Data processing instructions manipulate data within registers (Move, Arithmetic, Logical, Comparison, Multiply).
• If the S suffix is used, the CPSR flags N, Z, C and V are updated. Comparison instructions always update them.
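The Loading Constants slide calls LDR r1, =0x7000 a pseudo-instruction. One reason it exists is that a data processing instruction such as MOV can only hold an 8-bit value rotated right by an even number of bits (0, 2, ..., 30). The Python sketch below checks whether a constant fits that format; the function names are hypothetical. Assemblers typically turn LDR rd, =constant into a MOV (or MVN) when the constant fits. Otherwise they emit a PC-relative LDR from a literal pool placed near the code.

```python
def rotr32(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    value &= 0xFFFFFFFF
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def arm_immediate(constant):
    """Return (imm8, rotate) if `constant` fits an ARM data processing immediate, else None.

    The encoded value is imm8 rotated right by `rotate`, with rotate in 0, 2, ..., 30.
    """
    constant &= 0xFFFFFFFF
    for rotate in range(0, 32, 2):
        imm8 = rotr32(constant, 32 - rotate)   # undo the rotation (rotate left by `rotate`)
        if imm8 <= 0xFF:
            return imm8, rotate
    return None


for constant in (0x7000, 100000, 0xFF000000, 0x101):
    print(hex(constant), arm_immediate(constant))
# 0x7000 (7, 20)        7 rotated right by 20 bits, so MOV r1, #0x7000 is encodable
# 0x186a0 None          100000 does not fit
# 0xff000000 (255, 8)
# 0x101 None            the set bits span 9 bits
```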
Data Processing Instructions
• Move: MOV, MVN
• Arithmetic: ADD, ADC, SUB, SBC, RSB, RSC
• Logical: AND, ORR, EOR, BIC
• Comparison: CMP, CMN, TST, TEQ
• Multiply: MUL, MLA, SMULL, SMLAL, UMULL, UMLAL

Data Processing Instructions and CPSR
• ADD r1, r2, r3 does not update the CPSR.
• ADDS r1, r2, r3 updates the CPSR.
• Example, starting with all flags cleared:

      MOVS r1, #1          ; NZCV = 0000
      MOVS r2, #-1         ; NZCV = 1000
      ADD  r3, r2, r1      ; NZCV = 1000 (flags unchanged)
      ADDS r3, r2, r1      ; NZCV = 0110
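The NZCV values in the example can be checked with a short Python model of the flag logic. This is a sketch with hypothetical helper names. movs only models immediates that need no rotation, for which C and V are left unchanged. adds computes C as the unsigned carry out of bit 31 and V as signed overflow.

```python
MASK = 0xFFFFFFFF


def nz(result):
    """N and Z flags of a 32-bit result."""
    return (result >> 31) & 1, int(result == 0)


def movs(value, flags):
    """MOVS rd, #imm (unrotated immediate): sets N and Z, keeps C and V."""
    result = value & MASK            # #-1 becomes 0xFFFFFFFF
    n, z = nz(result)
    return result, (n, z, flags[2], flags[3])


def adds(a, b):
    """ADDS rd, rn, rm: returns the 32-bit result and the new (N, Z, C, V)."""
    a, b = a & MASK, b & MASK
    full = a + b
    result = full & MASK
    n, z = nz(result)
    c = int(full > MASK)                                 # unsigned carry out of bit 31
    v = (((a ^ result) & (b ^ result)) >> 31) & 1        # result sign differs from both operands
    return result, (n, z, c, v)


flags = (0, 0, 0, 0)                 # assumption: all flags cleared at the start
r1, flags = movs(1, flags)           # MOVS r1, #1
print(flags)                         # (0, 0, 0, 0)
r2, flags = movs(-1, flags)          # MOVS r2, #-1
print(flags)                         # (1, 0, 0, 0)
r3 = (r2 + r1) & MASK                # ADD r3, r2, r1: no S suffix, flags untouched
print(flags)                         # (1, 0, 0, 0)
r3, flags = adds(r2, r1)             # ADDS r3, r2, r1
print(flags)                         # (0, 1, 1, 0)
```

The last step shows why C and V differ. 0xFFFFFFFF + 1 carries out of bit 31 (C = 1), but as signed numbers it is -1 + 1 = 0, which does not overflow (V = 0).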
Formats for data processing instructions
• Basic format:

      SUB r3, r2, r1            ; r3 = r2 - r1

• Immediate operand:

      SUB r3, r2, #3            ; r3 = r2 - 3

• Preprocessing (barrel shifter):

      SUB r3, r2, r1, LSL #1    ; r3 = r2 - (r1 * 2)

The Barrel Shifter
• The barrel shifter allows an initial shift operation on the second operand before it enters the ALU.
• Shift operations: LSL, LSR, ASR, ROR, RRX
• Example:

      MOV r3, r4, LSL #3        ; r3 = 8 * r4

[Figure: operand Rn goes directly to the arithmetic logic unit, while operand Rm first passes through the barrel shifter; the shifted value (Result N) enters the ALU, and the ALU result is written to Rd]
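The five shift operations can be modeled on 32-bit values in Python. This is a sketch: the helper names are hypothetical, and shift amounts are assumed to lie in the range the instructions accept. The same operations supply the second ALU operand in instructions such as SUB r3, r2, r1, LSL #1.

```python
MASK = 0xFFFFFFFF


def lsl(x, n):
    """Logical shift left: zeros enter from the right."""
    return (x << n) & MASK


def lsr(x, n):
    """Logical shift right: zeros enter from the left."""
    return (x & MASK) >> n


def asr(x, n):
    """Arithmetic shift right: copies of the sign bit enter from the left."""
    x &= MASK
    if x & 0x80000000:
        x -= 1 << 32                 # view as a negative number
    return (x >> n) & MASK           # Python's >> on negative ints is arithmetic


def ror(x, n):
    """Rotate right: bits leaving on the right re-enter on the left."""
    n %= 32
    x &= MASK
    return ((x >> n) | (x << (32 - n))) & MASK


def rrx(x, carry):
    """Rotate right by one through the carry flag; returns (result, new carry)."""
    x &= MASK
    return (carry << 31) | (x >> 1), x & 1


r4 = 5
print(lsl(r4, 3))                    # MOV r3, r4, LSL #3 -> 40
r2, r1 = 100, 7
print((r2 - lsl(r1, 1)) & MASK)      # SUB r3, r2, r1, LSL #1 -> 86
print(hex(lsr(0xFFFFFFF0, 2)))       # 0x3ffffffc
print(hex(asr(0xFFFFFFF0, 2)))       # 0xfffffffc (-16 / 4 = -4)
print(hex(ror(0xF1, 4)))             # 0x1000000f
result, c = rrx(3, 1)
print(hex(result), c)                # 0x80000001 1
```

LSR and ASR differ only for negative signed values. ASR preserves the sign, so it divides signed numbers by powers of two, while LSR treats the word as unsigned.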
Branch Instructions
• The branch instruction is used to change the flow of execution (if-then-else, for-loop, while-loop).
• Branch instructions: B, BL, BX, BLX
• Branches are often used with conditions (EQ, NE, CS, CC, MI, PL, VS, VC, HI, LS, GE, LT, GT, LE):

      BEQ label         ; branch to label if Z = 1

• The address label is stored in the instruction as a PC-relative offset and must be within ±32 MB of the branch instruction.

Subroutines
• The BL (Branch and Link) instruction can be used for subroutine calls, since it writes the return address to the link register (lr).

            BL  subroutine
            ...
subroutine  ...                 ; code for subroutine
            MOV pc, lr          ; return by moving lr to pc
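The ±32 MB limit follows from the encoding. In ARM state, B and BL store a signed 24-bit offset counted in words, relative to the PC value at execution time, which reads as the branch address + 8. That gives ±2^23 words × 4 bytes = ±32 MB. The Python sketch below, with hypothetical function names, encodes and decodes that field for ARM state only; Thumb branches use different encodings.

```python
def encode_branch_offset(branch_addr, target_addr):
    """Signed 24-bit offset field of an ARM-state B/BL, or None if the target is out of range.

    The PC reads as branch_addr + 8 when the branch executes, and the offset counts words.
    """
    delta = target_addr - (branch_addr + 8)
    if delta % 4:
        raise ValueError("ARM-state branch targets must be word aligned")
    words = delta // 4
    if not -(1 << 23) <= words < (1 << 23):   # must fit a signed 24-bit field
        return None
    return words & 0xFFFFFF                   # two's-complement bit pattern


def decode_target(branch_addr, field):
    """Inverse of encode_branch_offset: sign-extend the field and add it to PC + 8."""
    words = field - (1 << 24) if field & 0x800000 else field
    return branch_addr + 8 + 4 * words


print(hex(encode_branch_offset(0x8000, 0x8000)))   # 0xfffffe (a branch to itself)
print(hex(encode_branch_offset(0x8000, 0x8100)))   # 0x3e
print(encode_branch_offset(0x0, 0x2000008))        # None: 32 MB past PC + 8
```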
Conditional Execution
• Not only branch instructions can be used with conditions:

      ADDEQ r4, r5, r6    ; only executed if Z = 1

• Conditional execution helps to write shorter programs that need less memory, since short branches around single instructions can be avoided.

Summary
• ARM is a family of microprocessor cores.
• Load/store architecture
• Most instructions are RISC-like and operate in a single cycle.
• Some multi-register operations take longer.
• All instructions can be executed conditionally.
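Conditional execution works by checking the instruction's condition against the NZCV flags before executing it. The Python sketch below writes the standard ARM condition codes as predicates over (N, Z, C, V). The names CONDITIONS and execute_conditionally and the register values are hypothetical. The sketch uses the predicates to mimic ADDEQ r4, r5, r6.

```python
# Standard ARM condition codes as predicates over the flags N, Z, C, V.
CONDITIONS = {
    "EQ": lambda n, z, c, v: z == 1,               # equal
    "NE": lambda n, z, c, v: z == 0,               # not equal
    "CS": lambda n, z, c, v: c == 1,               # carry set / unsigned higher or same
    "CC": lambda n, z, c, v: c == 0,               # carry clear / unsigned lower
    "MI": lambda n, z, c, v: n == 1,               # negative
    "PL": lambda n, z, c, v: n == 0,               # positive or zero
    "VS": lambda n, z, c, v: v == 1,               # overflow
    "VC": lambda n, z, c, v: v == 0,               # no overflow
    "HI": lambda n, z, c, v: c == 1 and z == 0,    # unsigned higher
    "LS": lambda n, z, c, v: c == 0 or z == 1,     # unsigned lower or same
    "GE": lambda n, z, c, v: n == v,               # signed greater than or equal
    "LT": lambda n, z, c, v: n != v,               # signed less than
    "GT": lambda n, z, c, v: z == 0 and n == v,    # signed greater than
    "LE": lambda n, z, c, v: z == 1 or n != v,     # signed less than or equal
    "AL": lambda n, z, c, v: True,                 # always (the default)
}


def execute_conditionally(cond, flags, action):
    """Run `action` only if `cond` holds for flags = (N, Z, C, V); otherwise skip it."""
    if CONDITIONS[cond](*flags):
        return action()
    return None                                    # skipped, like a NOP


regs = {"r4": 0, "r5": 2, "r6": 3}


def add_r4():                                      # ADD r4, r5, r6
    regs["r4"] = regs["r5"] + regs["r6"]


execute_conditionally("EQ", (0, 0, 0, 0), add_r4)  # Z = 0: ADDEQ is skipped
print(regs["r4"])                                  # 0
execute_conditionally("EQ", (0, 1, 0, 0), add_r4)  # Z = 1: ADDEQ executes
print(regs["r4"])                                  # 5
```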