ARM


3. What Is ARM?
• Advanced RISC Machine
• First RISC microprocessor for commercial use
• Market leader for low-power and cost-sensitive embedded applications
• Architectural simplicity allows very small implementations, which result in very low power consumption

3. ARM Architecture
• Typical RISC architecture:
  - Large uniform register file
  - Load/store architecture
  - Simple addressing modes
  - Uniform and fixed-length instruction fields
• Enhancements:
  - Each instruction controls the ALU and shifter
  - Auto-increment and auto-decrement addressing modes
  - Multiple Load/Store
  - Conditional execution

3. Pipeline Organization
• Increases speed: most instructions execute in a single cycle
• Versions:
  - 3-stage (ARM7TDMI and earlier)
  - 5-stage (ARM8, ARM9TDMI)
  - 6-stage (ARM10TDMI)

3. Pipeline Organization
• 3-stage pipeline: Fetch - Decode - Execute
• Three-cycle latency, one instruction per cycle throughput

instruction   t        t+1      t+2      t+3      t+4     (cycle)
i             Fetch    Decode   Execute
i+1                    Fetch    Decode   Execute
i+2                             Fetch    Decode   Execute

3. Pipeline Organization
• The pipeline is flushed and refilled on a branch, which slows execution
• Special features in the instruction set eliminate small jumps in code to obtain the best flow through the pipeline
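The latency and throughput figures above can be turned into a rough cycle estimate. The sketch below is an idealized model, not a description of any specific core: it assumes every instruction takes one cycle to execute and that each taken branch costs one full refill of the stages behind Execute. The function name and the penalty model are assumptions made for illustration.

```c
/* Idealized cycle count for a 3-stage Fetch-Decode-Execute pipeline.
 * Assumptions (not from the slides): single-cycle execution for every
 * instruction, and a refill penalty of (stages - 1) cycles per taken branch. */
#define PIPELINE_STAGES 3ul

unsigned long pipeline_cycles(unsigned long instructions,
                              unsigned long taken_branches)
{
    if (instructions == 0) {
        return 0;
    }
    /* The first instruction needs PIPELINE_STAGES cycles to complete,
     * every later one completes one cycle after its predecessor. */
    return instructions + (PIPELINE_STAGES - 1)
         + taken_branches * (PIPELINE_STAGES - 1);
}
```

With three instructions and no branches the model gives five cycles, which matches the t to t+4 span of the diagram.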

3. Operating Modes
• Seven operating modes:
  - User
  - Privileged:
    System (version 4 and above)
    Exception modes: FIQ, IRQ, Abort, Undefined, Supervisor

3. Exceptions

Exception               Mode         Priority   IV Address
Reset                   Supervisor   1          0x00000000
Undefined instruction   Undefined    6          0x00000004
Software interrupt      Supervisor   6          0x00000008
Prefetch Abort          Abort        5          0x0000000C
Data Abort              Abort        2          0x00000010
Interrupt               IRQ          4          0x00000018
Fast interrupt          FIQ          3          0x0000001C

Table 1 - Exception types, sorted by Interrupt Vector addresses
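Table 1 can be written down as a small lookup table. The sketch below shows one way to pick the exception that wins when several are pending at once, taking priority 1 as the highest. The struct, the bitmask convention, and the function name are hypothetical; only the table values come from the slide.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *name;
    const char *mode;
    int         priority;   /* 1 = highest priority */
    uint32_t    vector;     /* interrupt vector address */
} ArmException;

/* Rows in the same order as Table 1 (sorted by vector address). */
static const ArmException arm_exceptions[] = {
    { "Reset",                 "Supervisor", 1, 0x00000000u },
    { "Undefined instruction", "Undefined",  6, 0x00000004u },
    { "Software interrupt",    "Supervisor", 6, 0x00000008u },
    { "Prefetch Abort",        "Abort",      5, 0x0000000Cu },
    { "Data Abort",            "Abort",      2, 0x00000010u },
    { "Interrupt",             "IRQ",        4, 0x00000018u },
    { "Fast interrupt",        "FIQ",        3, 0x0000001Cu },
};

#define ARM_EXCEPTION_COUNT (sizeof arm_exceptions / sizeof arm_exceptions[0])

/* Hypothetical helper: bit i of `pending` set means arm_exceptions[i] is
 * pending. Returns the pending entry with the lowest priority number,
 * or NULL when nothing is pending. */
const ArmException *highest_priority(unsigned int pending)
{
    const ArmException *best = NULL;
    for (size_t i = 0; i < ARM_EXCEPTION_COUNT; i++) {
        if ((pending & (1u << i)) != 0 &&
            (best == NULL || arm_exceptions[i].priority < best->priority)) {
            best = &arm_exceptions[i];
        }
    }
    return best;
}
```

If both an IRQ and an FIQ are pending, this helper returns the FIQ entry, because its priority number (3) is lower than the IRQ's (4).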

3. ARM Registers
• Current Program Status Register (CPSR)
• Saved Program Status Register (SPSR)
• On exception, entering mode mod:
  - (PC + 4) → LR
  - CPSR → SPSR_mod
  - PC ← IV address
  - R13, R14 replaced by R13_mod, R14_mod
  - In FIQ mode, R8 – R12 are also replaced

3. ARM Registers

System & User   FIQ                 Supervisor   Abort      IRQ        Undefined
R0 – R7         R0 – R7             R0 – R7      R0 – R7    R0 – R7    R0 – R7
R8 – R12        R8_fiq – R12_fiq    R8 – R12     R8 – R12   R8 – R12   R8 – R12
R13             R13_fiq             R13_svc      R13_abt    R13_irq    R13_und
R14             R14_fiq             R14_svc      R14_abt    R14_irq    R14_und
R15 (PC)        R15 (PC)            R15 (PC)     R15 (PC)   R15 (PC)   R15 (PC)
CPSR            CPSR                CPSR         CPSR       CPSR       CPSR
                SPSR_fiq            SPSR_svc     SPSR_abt   SPSR_irq   SPSR_und

3. Instruction Set
• Two instruction sets:
  - ARM
    Standard 32-bit instruction set
  - THUMB
    16-bit compressed form
    Code density better than most CISC
    Dynamic decompression in pipeline

3. ARM Instruction Set
• Conditional execution:
  - Each instruction carries a 4-bit condition code, written as a suffix on the mnemonic
  - Result: smooth flow of instructions through the pipeline
  - 16 condition codes:

EQ   equal                      NE   not equal
CS   unsigned higher or same    CC   unsigned lower
MI   negative                   PL   positive or zero
VS   overflow                   VC   no overflow
HI   unsigned higher            LS   unsigned lower or same
GE   signed greater than or equal
LT   signed less than
GT   signed greater than        LE   signed less than or equal
AL   always                     NV   special purpose

3. ARM Instruction Set
The ARM instruction set consists of:
• Data processing instructions
• Data transfer instructions
• Block transfer instructions
• Branching instructions
• Multiply instructions
• Software interrupt instructions

3. Data Processing Instructions
• Arithmetic and logical operations
• 3-address format:
  - Two 32-bit operands (op1 is a register, op2 is a register or an immediate)
  - 32-bit result placed in a register
• Barrel shifter for op2 allows a full 32-bit shift within the instruction cycle

3. Data Processing Instructions
• Arithmetic operations:
  - ADD, ADC, SUB, SBC, RSB, RSC
• Bit-wise logical operations:
  - AND, EOR, ORR, BIC
• Register movement operations:
  - MOV, MVN
• Comparison operations:
  - TST, TEQ, CMP, CMN

3. Data Processing Instructions
e.g.: if (z==1) R1=R2+(R3*4) compiles to
  ADDEQS R1, R2, R3, LSL #2
(a single instruction)
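A well-known illustration of conditional execution is the subtraction-based greatest common divisor loop. The C below is portable and runs anywhere. The claim that a compiler targeting ARM can replace both arms of the if/else with conditionally executed subtract instructions, leaving only the loop branch, is an expectation about typical code generation, not something guaranteed for every compiler or option set.

```c
/* Subtraction-based GCD. Both arguments must be non-zero, otherwise the
 * loop does not terminate. On ARM, the two arms of the if/else are good
 * candidates for conditionally executed subtract instructions, so the loop
 * body needs no branches of its own. */
unsigned int gcd(unsigned int a, unsigned int b)
{
    while (a != b) {
        if (a > b) {
            a -= b;
        } else {
            b -= a;
        }
    }
    return a;
}
```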

3. Data Transfer Instructions
• Load/store instructions
• Used to move signed and unsigned Word, Half Word and Byte to and from registers
• Can be used to load PC (if the target address is beyond the branch instruction range)

LDR     Load Word                 STR     Store Word
LDRH    Load Half Word            STRH    Store Half Word
LDRSH   Load Signed Half Word
LDRB    Load Byte                 STRB    Store Byte
LDRSB   Load Signed Byte

Stores have no signed variants: STRH and STRB write the low 16 or 8 bits of the register regardless of sign.

3. Multiply Instructions
• Integer multiplication (32-bit result)
• Long integer multiplication (64-bit result)
• Built-in Multiply Accumulate Unit (MAC)
• Multiply and accumulate instructions add the product to a running total

3. Multiply Instructions
• Instructions:

MUL     Multiply                       32-bit result
MLA     Multiply accumulate            32-bit result
UMULL   Unsigned multiply              64-bit result
UMLAL   Unsigned multiply accumulate   64-bit result
SMULL   Signed multiply                64-bit result
SMLAL   Signed multiply accumulate     64-bit result
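In C, the multiply-accumulate pattern is a running total of products. The sketch below widens each product to 64 bits before adding it, which is the shape a compiler targeting ARM can map onto a signed long multiply-accumulate such as SMLAL. Whether it actually does so depends on the compiler and options, so treat that mapping as an assumption. The function name is illustrative.

```c
#include <stdint.h>

/* Dot product with a 64-bit accumulator. The cast makes the multiplication
 * itself 64-bit, so large 32-bit operands do not overflow the product. */
int64_t dot_product(const int32_t *a, const int32_t *b, unsigned int n)
{
    int64_t acc = 0;
    for (unsigned int i = 0; i < n; i++) {
        acc += (int64_t)a[i] * b[i];
    }
    return acc;
}
```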

3. Software Interrupt
• SWI instruction: forces the CPU into supervisor mode
• Usage: SWI #n

Bits:    31 – 28    27 – 24    23 – 0
Field:   Cond       Opcode     Ordinal

• Maximum 2^24 calls
• Suitable for running privileged code and making OS calls
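The field layout above can be exercised with a few shifts and masks. This is a minimal sketch with hypothetical helper names. It relies on the ARM encoding in which the SWI opcode field (bits 27 to 24) is all ones and the condition value 0xE means "always".

```c
#include <stdint.h>

#define SWI_OPCODE       0xFu          /* bits 27..24 of an ARM SWI */
#define SWI_ORDINAL_MASK 0x00FFFFFFu   /* bits 23..0: 2^24 possible calls */

/* Build an ARM SWI instruction word from a condition code and an ordinal. */
uint32_t swi_encode(uint32_t cond, uint32_t ordinal)
{
    return ((cond & 0xFu) << 28) | (SWI_OPCODE << 24)
         | (ordinal & SWI_ORDINAL_MASK);
}

/* Non-zero if the instruction word has the SWI opcode. */
int is_swi(uint32_t insn)
{
    return ((insn >> 24) & 0xFu) == SWI_OPCODE;
}

/* Extract the ordinal, as a supervisor-mode handler would after reading
 * the instruction that caused the exception. */
uint32_t swi_ordinal(uint32_t insn)
{
    return insn & SWI_ORDINAL_MASK;
}
```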

3. Branching Instructions
• Branch (B): jumps forwards/backwards up to 32 MB
• Branch link (BL): same + saves (PC+4) in LR
• Suitable for function call/return
• Condition codes for conditional branches

3. Thumb Instruction Set
• Compressed form of ARM
  - Instructions are stored as 16-bit, decompressed into ARM instructions, and executed
• Lower performance (ARM 40% faster)
• Higher density (THUMB saves 30% space)
• Optimal: combine the two sets; "interworking" is supported by the compiler

3. THUMB Instruction Set
• More traditional:
  - No condition codes
  - Two-address data processing instructions
• Access to R8 – R15 restricted to
  - MOV, ADD, CMP
• PUSH/POP for stack manipulation
  - Descending stack (SP hardwired to R13)
• No MSR and MRS; must change to ARM to modify CPSR (change using BX or BLX)
• ARM entered automatically after RESET or on entering an exception mode
• Maximum 256 SWI calls (8-bit SWI number)

4. Creating Assembly listing
• The following commands produce an assembly listing:
    arm-elf-gcc -O2 -fomit-frame-pointer -c -o test.o test.c
    arm-elf-objdump -d test.o > test.txt

4. Efficient C Programming for ARM
• Data Types
• Loops
• Register allocation
• Function Calls
• Pointer Aliasing
• Structure Arrangement
• Endianness
• Inline Functions and assembly
• Portability Issues

4. Data Types

Data Type    Implementation
char         Unsigned 8-bit
short        Signed 16-bit
int          Signed 32-bit
long         Signed 32-bit
long long    Signed 64-bit

4. Rules for efficient use of Data-types
• For local variables held in registers, don't use char or short: they occupy a 32-bit register anyway, and extra instructions are needed to keep the value within the range of the narrower type. Signed or unsigned int is recommended.
• For arrays and global data held in main memory, use the type with the smallest size.
• Use explicit casts when reading global values into local variables.
• Avoid char or short for function arguments and return values.
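The rules above can be sketched in a few lines: the data stays in memory at its smallest natural size, while the locals that live in registers are full-width. The function and parameter names are made up for the example.

```c
#include <stdint.h>

/* Sums 16-bit samples held in memory. The array keeps the narrow type to
 * save space; the accumulator and the index are register-width locals, so
 * no extra instructions are needed to wrap them to 8 or 16 bits. */
int32_t sum_samples(const int16_t *data, unsigned int n)
{
    int32_t sum = 0;
    for (unsigned int i = 0; i < n; i++) {
        sum += (int32_t)data[i];   /* explicit cast when widening into a local */
    }
    return sum;
}
```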

4. Loops
• Use loops that count down to zero. This frees the register that would otherwise hold the comparison value, and the comparison with zero adds no overhead.
• If the loop is known to execute at least once, a do-while loop saves overhead by avoiding the initial zero comparison.
• Use unsigned loop counters and test i != 0 rather than i > 0, to avoid a cmp instruction and reduce overhead.
• If the loop overhead is high and the repeat count is low, it helps to unroll the loop.
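A minimal sketch of the first three loop rules combined: an unsigned counter that counts down to zero inside a do-while. The function name is illustrative, and the caller is assumed to guarantee at least one element, which is what makes the do-while form valid.

```c
/* Sums n integers, n >= 1. The counter counts down and is tested against
 * zero, so the decrement itself can set the flags the loop branch needs. */
int checksum(const int *data, unsigned int n)
{
    int sum = 0;
    do {
        sum += *data++;
    } while (--n != 0);
    return sum;
}
```

Calling this with n equal to zero would wrap the counter around, so a caller that cannot rule that out should keep an ordinary while loop instead.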

4. Efficient Register Allocation
• Try to limit the number of local variables in the inner loop of a function to 12. The compiler should be able to allocate these to ARM registers.
• You can guide the compiler as to which variables are important by ensuring these variables are used within the innermost loop.

4. Calling Functions Efficiently
• Try to restrict functions to four arguments. This will make them more efficient to call. Use structures to group related arguments and pass structure pointers instead of multiple arguments.
• Define small functions in the same source file and before the functions that call them. The compiler can then optimize the function call or inline the small function.
• Critical functions can be inlined using the __inline keyword.
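The advice on grouping arguments can be sketched as follows. Five related parameters are bundled into one structure, so the call passes a single pointer. Under the ARM procedure call standard the first four word-sized arguments travel in registers and the rest go on the stack, which is the reason for the limit of four. The structure and function names are hypothetical.

```c
/* Hypothetical argument bundle: five values that would otherwise need
 * five separate parameters. */
typedef struct {
    const int   *data;
    unsigned int len;
    int          scale;
    int          offset;
    int          limit;
} ScaleArgs;

/* Sums data[i] * scale + offset, clamping each term to `limit`.
 * Only one pointer is passed, however many fields the bundle holds. */
int scaled_sum(const ScaleArgs *args)
{
    int sum = 0;
    for (unsigned int i = 0; i < args->len; i++) {
        int v = args->data[i] * args->scale + args->offset;
        if (v > args->limit) {
            v = args->limit;
        }
        sum += v;
    }
    return sum;
}
```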

4. Efficient Structure Arrangement
• Lay structures out in order of increasing element size. Start the structure with the smallest elements and finish with the largest.
• Avoid very large structures. Instead use a hierarchy of smaller structures.
• For portability, manually add padding (that would appear implicitly) into API structures so that the layout of the structure does not depend on the compiler.
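The effect of element ordering can be checked with sizeof. In the sketch below both structures hold the same four members; only the order differs. On a typical 32-bit ABI with 4-byte int alignment the unordered layout is expected to take 12 bytes and the ordered one 8, but the exact numbers depend on the compiler and target, so the code only computes the difference. All names are illustrative.

```c
#include <stddef.h>

/* Same members, two orders. The compiler inserts padding so that each
 * member sits at an address suitable for its type. */
struct Unordered {
    char  a;
    int   b;   /* usually preceded by padding after `a` */
    char  c;
    short d;   /* usually preceded by padding after `c` */
};

struct Ordered {
    char  a;
    char  c;
    short d;
    int   b;   /* smallest members first, largest last */
};

/* Bytes saved by the ordered layout on the current compiler and target. */
size_t padding_saved(void)
{
    return sizeof(struct Unordered) - sizeof(struct Ordered);
}
```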

4. Bit-fields
• Avoid using bit-fields. Instead use #define or enum to define mask values.
• Test, toggle, and set bit-fields using integer logical AND, OR, and exclusive OR operations with the mask values. These operations compile efficiently, and you can test, toggle, or set multiple fields at the same time.
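A minimal sketch of the mask approach described above. The flag names are invented for the example. Each helper is a single logical operation on an unsigned word, and several flags can be combined in one mask.

```c
/* Hypothetical status flags, one bit each. */
enum {
    FLAG_READY = 1 << 0,
    FLAG_BUSY  = 1 << 1,
    FLAG_ERROR = 1 << 2
};

/* Set every flag in `mask` (OR). */
unsigned int set_flags(unsigned int status, unsigned int mask)
{
    return status | mask;
}

/* Clear every flag in `mask` (AND with the complement). */
unsigned int clear_flags(unsigned int status, unsigned int mask)
{
    return status & ~mask;
}

/* Flip every flag in `mask` (exclusive OR). */
unsigned int toggle_flags(unsigned int status, unsigned int mask)
{
    return status ^ mask;
}

/* Non-zero if at least one flag in `mask` is set (AND). */
int any_set(unsigned int status, unsigned int mask)
{
    return (status & mask) != 0;
}
```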

4. Endianness and Alignment
• Avoid using unaligned data if you can.
• Use the type char* for data that can be at any byte alignment. Access the data by reading bytes and combining them with logical operations. Then the code won't depend on alignment or ARM endianness configuration.
• For fast access to unaligned structures, write different variants according to pointer alignment and processor endianness.
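The byte-by-byte approach can be sketched as below for a 32-bit value stored in little-endian order. The function name and the choice of little-endian storage are assumptions for the example. The pointer is unsigned char so that bytes above 0x7F are not sign-extended.

```c
#include <stdint.h>

/* Reads a 32-bit little-endian value from any byte address. Because it
 * only performs byte loads, shifts, and ORs, the result is the same on
 * big- and little-endian processors and needs no particular alignment. */
uint32_t read_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```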

4. Division
• Avoid divisions as much as possible. Do not use them for circular buffer handling.
• If you can't avoid a division, then try to take advantage of the fact that divide routines often generate the quotient n/d and modulus n%d together.
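Both points can be sketched briefly. The first two helpers advance a circular-buffer index without the % operator: one uses a compare and reset, the other a mask that is only valid when the buffer size is a power of two. The third uses the standard library's div(), which returns quotient and remainder from one call. The names and the buffer size are illustrative.

```c
#include <stdlib.h>

#define BUF_SIZE 8u   /* assumed to be a power of two for the mask variant */

/* Advance an index with a compare instead of (i + 1) % size. */
unsigned int next_index_cmp(unsigned int i, unsigned int size)
{
    i++;
    if (i >= size) {
        i = 0;
    }
    return i;
}

/* Advance an index with a mask; valid only for power-of-two sizes. */
unsigned int next_index_mask(unsigned int i)
{
    return (i + 1u) & (BUF_SIZE - 1u);
}

/* Obtain quotient and remainder from a single division call. */
void split_tens(int n, int *tens, int *ones)
{
    div_t qr = div(n, 10);
    *tens = qr.quot;
    *ones = qr.rem;
}
```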

4. Inline Functions and Assembly
• Use inline functions to declare new operations or primitives not supported by the C compiler.
• Use inline assembly to access ARM instructions not supported by the C compiler. Examples are coprocessor instructions or ARMv5E extensions.
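One operation that C has no operator for is saturating addition. The sketch below defines it as an inline function in portable C. On processors with the ARMv5E extensions the same operation exists as a single saturating-add instruction, which is where inline assembly would come in, but the version here deliberately stays in plain C so it runs on any host. The function name is illustrative.

```c
#include <stdint.h>

/* Saturating 32-bit add: results beyond the int32_t range are clamped to
 * INT32_MAX or INT32_MIN instead of wrapping around. */
static inline int32_t sat_add32(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + (int64_t)b;   /* cannot overflow in 64 bits */
    if (sum > INT32_MAX) {
        return INT32_MAX;
    }
    if (sum < INT32_MIN) {
        return INT32_MIN;
    }
    return (int32_t)sum;
}
```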

GNU For ARM Development
• arm-elf-gcc
• objcopy
• ld
• make

5. arm-elf-gcc
• The ARM compiler component of gcc.
• Typical usage:
    arm-elf-gcc -mcpu=arm7tdmi -mthumb -O2 -g -c hello_world.c
    arm-elf-gcc -mcpu=arm7tdmi -mthumb -o hello_world hello_world.o -lc
• -mcpu specifies the processor type; -mthumb selects Thumb mode instead of ARM mode.

5. ld
• The linker used to link ARM applications.
• Typical usage:
    ld -g -Tlpc2104-rom.ld -Map=main.map --cref -o helloworld helloworld.o boot.o crt0.o -lc -lm
• -T names the linker script, -Map produces a map file, -l links libraries.

5. objcopy
• Converts the linked object file to a downloadable .hex format
• Typical usage:
    objcopy --output-target ihex main main.hex
