CS220 March 28, 2007
Inline
• Inline function:
  – The compiler replaces a call to the function directly with the function body, so the performance overhead of a function call is avoided. (The inline keyword is a request; the compiler may decline it.)

    inline int add(int a, int b) { return a + b; }

  The compiler can replace c = add(3, 5); with c = 3 + 5;
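A compilable sketch of the add example, assuming a C99 compiler. Because inline is only a request, the function is also declared static, which keeps the definition local to this file so no separate external definition is required. The helper use_add is a hypothetical name used only for illustration.

```c
#include <assert.h>

/* static inline: ask the compiler to expand calls in place.
   static avoids C99's rule that a plain `inline` function needs an
   external definition somewhere if the compiler chooses not to inline it. */
static inline int add(int a, int b)
{
    return a + b;
}

/* With optimization enabled, compilers typically fold add(3, 5)
   into the constant 8, so no call instruction is emitted. */
static int use_add(void)
{
    int c = add(3, 5);
    return c;
}
```

Checking the emitted assembly (for example with gcc -O2 -S) is the easiest way to see whether a particular call was actually inlined.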
Inline Assembly
• Reasons:
  – optimization
  – access to processor-specific instructions
  – system calls
• Format:
  – __asm__("asm code");
  – The instructions must be enclosed in quotation marks.
  – If more than one instruction is included, the newline character ('\n') MUST be used to separate each line of assembly language code. Often a tab character ('\t') is also included to indent the assembly language code and make the lines more readable.
  – The volatile modifier can be placed in the asm statement to tell the compiler not to optimize that code away (for example, not to delete it because its results appear unused):
    • __asm__ volatile ("assembly code");
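The multi-line rule follows from how C treats string literals: adjacent literals are concatenated by the compiler, so the assembler receives one template with "\n\t" between instructions. A small sketch, assuming GCC or Clang; template_text and two_nops are hypothetical names, and the asm itself is only emitted on x86. Note that GCC documents basic asm (no operands) as implicitly volatile, so the keyword matters most for extended asm with outputs.

```c
#include <assert.h>
#include <string.h>   /* strcmp, for checking the concatenated template */

/* Two adjacent literals become ONE string: "pusha\n\tpopa". */
static const char template_text[] =
    "pusha\n\t"
    "popa";

/* A basic asm statement written the same way; volatile asks the
   compiler not to delete it even though it produces no values. */
static void two_nops(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ volatile ("nop\n\t"
                      "nop");
#endif
}
```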
Basic Inline

    #include <stdio.h>

    int a = 10;
    int b = 20;
    int result;

    int main()
    {
        /* 32-bit x86 only (e.g. gcc -m32): pusha/popa do not exist in 64-bit mode */
        __asm__(
            "pusha\n\t"
            "movl a, %eax\n\t"
            "movl b, %ebx\n\t"
            "imull %ebx, %eax\n\t"
            "movl %eax, result\n\t"
            "popa"
        );
        printf("the answer is %d\n", result);
        return 0;
    }
Global variables a and b are initialized; result is not.
Basic inline assembly can access global variables by name, but not local variables.
Assembly code

            .globl a
            .data
            .align 4
            .type a, @object
            .size a, 4
    a:
            .long 10
            .globl b
            .align 4
            .type b, @object
            .size b, 4
    b:
            .long 20
            .section .rodata
    .LC0:
            .string "the answer is %d\n"
            .text
            .globl main
            .type main, @function
    main:
            pushl %ebp
            movl %esp, %ebp
    #APP
            pusha
            movl a, %eax
            movl b, %ebx
            imull %ebx, %eax
            movl %eax, result
            popa
    #NO_APP
            pushl result
            pushl $.LC0
            call printf
            addl $8, %esp
            movl $0, %eax
            leave
            ret
            .size main, .-main
            .comm result,4,4
.comm declares a common memory area for data that is not initialized (here, result).
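The .comm directive can also be issued from C through a file-scope asm statement. A sketch, assuming an ELF target with GNU-style assembler syntax (other targets fall back to an ordinary C global); demo_counter and bump_counter are hypothetical names. Because common storage that no object file defines ends up in zero-filled memory, the counter starts at 0.

```c
#include <assert.h>

#if defined(__GNUC__) && defined(__ELF__)
/* ".comm name, size, alignment": a 4-byte, 4-byte-aligned common symbol.
   The linker allocates it in uninitialized (zero-filled) memory. */
__asm__(".comm demo_counter, 4, 4");
extern int demo_counter;
#else
/* Fallback on non-ELF targets: an ordinary uninitialized global. */
int demo_counter;
#endif

static int bump_counter(void)
{
    return ++demo_counter;
}
```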
Extended Inline Assembly
• __asm__ ("assembly code" : output locations : input operands : changed registers);
  – Registers in the assembly code must use a %% prefix (e.g. %%eax), because a single % introduces an operand placeholder such as %0.
• Output locations: A list of registers and memory locations that will contain the output values from the inline assembly code.
• Input operands: A list of registers and memory locations that contain input values for the inline assembly code.
• Changed registers: A list of any additional registers that are changed by the inline code.
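The three colon-separated sections can be seen together in a small sketch, assuming x86 with GCC or Clang (add_extended is a hypothetical name; other architectures fall back to plain C). Registers named in the template use %%, while %0, %1 and %2 refer to the operands in the order they are listed. "cc" declares that the flags change; GCC already assumes this on x86, so there it only documents intent.

```c
#include <assert.h>

/* Returns x + y, computed through %eax. */
static int add_extended(int x, int y)
{
    int sum;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ ("movl %1, %%eax\n\t"   /* %eax = x                      */
             "addl %2, %%eax\n\t"   /* %eax += y                     */
             "movl %%eax, %0"       /* sum = %eax                    */
             : "=r"(sum)            /* output locations   (%0)       */
             : "r"(x), "r"(y)       /* input operands     (%1, %2)   */
             : "%eax", "cc");       /* changed registers and flags   */
#else
    sum = x + y;
#endif
    return sum;
}
```

Since %eax is listed as changed, GCC will not place any operand in %eax, and %0 is written only after both inputs have been read.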
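Using Registers Explicitly

    #include <stdio.h>

    int main()
    {
        int data1 = 10;
        int data2 = 20;
        int result;

        /* Caution: imull overwrites %ecx, which is an input-only operand.
           GCC assumes inputs are unchanged after the asm, so safer code lists
           data2 as a read-write "+c" output or computes into %eax instead. */
        __asm__ (
            "imull %%edx, %%ecx\n\t"
            "movl %%ecx, %%eax"
            : "=a"(result)
            : "d"(data1), "c"(data2)
        );
        printf("The result is %d\n", result);
        return 0;
    }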
            .section .rodata
    .LC0:
            .string "The result is %d\n"
            .text
            .globl main
            .type main, @function
    main:
            pushl %ebp
            movl %esp, %ebp
            subl $12, %esp
            movl $10, -4(%ebp)
            movl $20, -8(%ebp)
            movl -4(%ebp), %edx
            movl -8(%ebp), %ecx
    #APP
            imull %edx, %ecx
            movl %ecx, %eax
    #NO_APP
            movl %eax, -12(%ebp)
            pushl -12(%ebp)
            pushl $.LC0
            call printf
            addl $8, %esp
            movl $0, %eax
            leave
            ret
            .size main, .-main
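A corrected sketch of the explicit-register example as a reusable function (mul_explicit is a hypothetical name; x86 only, with a C fallback). The slide's template overwrites %ecx although data2 is an input-only operand, which GCC's documentation says the template must not do. This version copies %ecx into %eax and multiplies there, so the inputs stay intact.

```c
#include <assert.h>

/* a is pinned to %edx ("d"), b to %ecx ("c"), the result to %eax ("=a"). */
static int mul_explicit(int a, int b)
{
    int result;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ ("movl %%ecx, %%eax\n\t"   /* %eax = b            */
             "imull %%edx, %%eax"      /* %eax = %eax * a     */
             : "=a"(result)
             : "d"(a), "c"(b));
#else
    result = a * b;
#endif
    return result;
}
```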
Constraints
    a   Use the %eax, %ax, or %al registers.
    b   Use the %ebx, %bx, or %bl registers.
    c   Use the %ecx, %cx, or %cl registers.
    d   Use the %edx, %dx, or %dl registers.
    S   Use the %esi or %si registers.
    D   Use the %edi or %di registers.
    r   Use any available general-purpose register.
    q   Use either the %eax, %ebx, %ecx, or %edx register.
    A   Use the %eax and the %edx registers for a 64-bit value.
    f   Use a floating-point register.
    t   Use the first (top) floating-point register.
    u   Use the second floating-point register.
    m   Use the variable's memory location.
    o   Use an offsettable memory location (adding a small offset still gives a valid address).
    V   Use a memory location that is not offsettable.
    i   Use an immediate integer value.
    n   Use an immediate integer value with a known numeric value.
    g   Use any available register, memory location, or immediate integer value.
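Two constraints from the table in action, assuming x86 with GCC or Clang: "m" makes the instruction operate directly on the variable's memory location, and "i" supplies a compile-time constant that GCC prints as $5 in AT&T syntax. The + modifier (described next) marks the memory operand as both read and written. add_five_in_memory is a hypothetical name.

```c
#include <assert.h>

/* Adds 5 to value in memory, e.g. "addl $5, -4(%rbp)". */
static int add_five_in_memory(int value)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ ("addl %1, %0"
             : "+m"(value)   /* m: operate on the variable's memory slot */
             : "i"(5));      /* i: immediate integer                     */
#else
    value += 5;
#endif
    return value;
}
```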
Output Modifiers
• +  The operand can be both read from and written to.
• =  The operand can only be written to.
• %  The operand can be switched with the next operand if necessary (the two are commutative).
• &  Early clobber: the operand is written before the inline code has finished reading its inputs, so the compiler must not place it in the same register as any input operand.
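Sketches of the + and & modifiers, assuming x86 (twice_plus and accumulate are hypothetical names; other targets use plain C). In twice_plus the output is written by the first movl before input %2 is read; without &, GCC would be free to give the output and b the same register, and the result would be wrong.

```c
#include <assert.h>

/* Returns 2*a + b.  "=&r": early-clobber output. */
static int twice_plus(int a, int b)
{
    int r;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ ("movl %1, %0\n\t"   /* r = a   (written before b is read) */
             "addl %0, %0\n\t"   /* r = 2a                             */
             "addl %2, %0"       /* r = 2a + b                         */
             : "=&r"(r)
             : "r"(a), "r"(b));
#else
    r = 2 * a + b;
#endif
    return r;
}

/* "+r": acc is both an input and an output of the same operand. */
static int accumulate(int acc, int x)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ ("addl %1, %0" : "+r"(acc) : "r"(x));
#else
    acc += x;
#endif
    return acc;
}
```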
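Using Registers Placeholder

    #include <stdio.h>

    int main()
    {
        int data1 = 10;
        int data2 = 20;
        int result;

        /* %0 = result, %1 = data1, %2 = data2 (operands are numbered in order) */
        __asm__ (
            "imull %1, %2\n\t"
            "movl %2, %0"
            : "=r"(result)
            : "r"(data1), "r"(data2)
        );
        printf("The result is %d\n", result);
        return 0;
    }

The same computation can instead tie data2 to the output register, so that no input-only operand is modified:

    __asm__ (
        "imull %1, %0"
        : "=r"(result)
        : "r"(data1), "0"(data2)
    );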
            .section .rodata
    .LC0:
            .string "The result is %d\n"
            .text
            .globl main
            .type main, @function
    main:
            pushl %ebp
            movl %esp, %ebp
            subl $12, %esp
            movl $10, -4(%ebp)
            movl $20, -8(%ebp)
            movl -4(%ebp), %edx
            movl -8(%ebp), %eax
    #APP
            imull %edx, %eax
            movl %eax, %eax
    #NO_APP
            movl %eax, -12(%ebp)
            pushl -12(%ebp)
            pushl $.LC0
            call printf
            addl $8, %esp
            movl $0, %eax
            leave
            ret
            .size main, .-main
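The tied form of the placeholder example wrapped as a function, assuming x86 (mul_tied is a hypothetical name; other targets use plain C). The first form executes imull %1, %2, which writes to %2 even though data2 is an input-only operand; tying data2 to output 0 with the digit constraint "0" makes that write legitimate.

```c
#include <assert.h>

static int mul_tied(int a, int b)
{
    int result;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ ("imull %1, %0"          /* %0 = %0 * %1                  */
             : "=r"(result)          /* %0                            */
             : "r"(a), "0"(b));      /* %1, and b starts in %0's reg  */
#else
    result = a * b;
#endif
    return result;
}
```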
Digit constraints

    __asm__ ("incl %0"
             : "=a"(var)
             : "0"(var));

• %eax is used as both the input and the output variable; the digit constraint "0" specifies that the input must occupy the same location as output operand 0.
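A sketch comparing the digit constraint with the + modifier, assuming x86 (inc_tied and inc_plus are hypothetical names; other targets use plain C). Both should behave the same: "+a"(var) is shorthand for an "=a" output paired with a "0" input on the same variable.

```c
#include <assert.h>

static int inc_tied(int var)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ ("incl %0" : "=a"(var) : "0"(var));   /* digit constraint */
#else
    var += 1;
#endif
    return var;
}

static int inc_plus(int var)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ ("incl %0" : "+a"(var));              /* read-write operand */
#else
    var += 1;
#endif
    return var;
}
```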
Changed Registers/Clobber list

    #include <stdio.h>

    int main()
    {
        int data1 = 10;
        int result = 20;

        __asm__ (
            "movl %1, %%eax\n\t"
            "addl %%eax, %0"
            : "=r"(result)
            : "r"(data1), "0"(result)
            : "%eax"
        );
        printf("The result is %d\n", result);
        return 0;
    }
            movl $10, -4(%ebp)
            movl $20, -8(%ebp)
            movl -4(%ebp), %ecx
            movl -8(%ebp), %edx
    #APP
            movl %ecx, %eax
            addl %eax, %edx
    #NO_APP
            movl %edx, %eax
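Clobber List
• Listing registers in the clobber list informs gcc that the inline code uses and modifies them itself, so gcc will not assume that values it loaded into those registers are still valid after the asm statement.
• Any additional register the code changes, whether named explicitly or modified implicitly by an instruction, must be listed.
• Registers already used as input or output operands must not be listed; gcc already knows about them.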
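The clobber-list example as a function, assuming x86 (add_via_eax is a hypothetical name; other targets use plain C). Because %eax is declared clobbered, GCC will not assign %0 or %1 to %eax; "cc" is added to document the flag change made by addl.

```c
#include <assert.h>

/* Returns acc + x, routing x through %eax. */
static int add_via_eax(int x, int acc)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ ("movl %1, %%eax\n\t"
             "addl %%eax, %0"
             : "=r"(acc)
             : "r"(x), "0"(acc)
             : "%eax", "cc");
#else
    acc += x;
#endif
    return acc;
}
```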
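Using Memory

    #include <stdio.h>

    int main()
    {
        int dividend = 20;
        int divisor = 5;
        int result;

        /* divb divides %ax by the low byte of divisor: quotient in %al,
           remainder in %ah.  Copying all of %eax gives the right answer here
           only because 20 / 5 leaves no remainder; the asm also overwrites
           %eax, which is an input-only operand. */
        __asm__("divb %2\n\t"
                "movl %%eax, %0"
                : "=m"(result)
                : "a"(dividend), "m"(divisor)
        );
        printf("The result is %d\n", result);
        return 0;
    }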
            movl $20, -4(%ebp)
            movl $5, -8(%ebp)
            movl -4(%ebp), %eax
    #APP
            divb -8(%ebp)
            movl %eax, -12(%ebp)
    #NO_APP
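A corrected division sketch, assuming x86 (divide_u32 is a hypothetical name; other targets use plain C). divb divides %ax by an 8-bit operand and leaves the quotient in %al and the remainder in %ah, so copying all of %eax is correct only when the remainder is 0 (for 23 / 5 it would yield 0x0304 = 772). This version uses divl, which divides %edx:%eax by a 32-bit memory operand, and returns quotient and remainder as separate outputs.

```c
#include <assert.h>

static unsigned divide_u32(unsigned dividend, unsigned divisor,
                           unsigned *remainder)
{
    unsigned quotient, rem;
    assert(divisor != 0);                 /* divl faults on a zero divisor */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ ("divl %4"                    /* %edx:%eax / divisor           */
             : "=a"(quotient), "=d"(rem)  /* %0, %1                        */
             : "0"(dividend), "1"(0u),    /* %eax = dividend, %edx = 0     */
               "m"(divisor)               /* %4: divisor in memory         */
             : "cc");
#else
    quotient = dividend / divisor;
    rem = dividend % divisor;
#endif
    *remainder = rem;
    return quotient;
}
```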
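More on constraints

    #include <stdio.h>

    int main()
    {
        char input[30] = "This is a test message.\n";
        char output[30];
        int length = 25;            /* 24 characters plus the terminating '\0' */
        char *src = input;
        char *dst = output;

        /* rep movsb advances %esi/%edi and counts %ecx down to 0, so all three
           are read-write ("+") operands; "memory" tells gcc that the asm reads
           and writes memory it cannot see through the operand list. */
        __asm__ volatile (
            "cld\n\t"
            "rep movsb"
            : "+S"(src), "+D"(dst), "+c"(length)
            :
            : "memory"
        );
        printf("%s", output);
        return 0;
    }

    void *memcpy(void *dest, const void *src, unsigned int n)
    {
        void *d = dest;             /* keep dest unchanged for the return value */

        __asm__ volatile (
            "cld\n\t"
            "rep movsb"
            : "+c"(n), "+S"(src), "+D"(d)
            :
            : "memory"
        );
        return dest;
    }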
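A testable variant of the rep movsb copy, assuming x86 with GCC or Clang (copy_bytes is a hypothetical name; elsewhere it falls back to memcpy). Using size_t for the count lets the same code use %ecx on 32-bit x86 and %rcx on x86-64. volatile is needed because none of the outputs is used afterwards, and GCC may otherwise discard an asm whose outputs are unused.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static void *copy_bytes(void *dest, const void *src, size_t n)
{
    void *d = dest;                       /* dest itself is returned */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ volatile ("cld\n\t"           /* copy forward            */
                      "rep movsb"         /* n bytes: src -> d       */
                      : "+D"(d), "+S"(src), "+c"(n)
                      :
                      : "memory");
#else
    memcpy(d, src, n);
#endif
    return dest;
}
```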
Appendix: Common directives
• .align integer, pad
  – The .align directive causes the next data generated to be aligned modulo integer bytes. Integer must be a positive integer expression and must be a power of 2. If specified, pad is an integer byte value used for padding. The default value of pad for the text section is 0x90 (nop); for other sections, the default value of pad is zero (0).
• .ascii "string"
  – The .ascii directive places the characters in string into the object module at the current location but does not terminate the string with a null byte (\0). String must be enclosed in double quotes (") (ASCII 0x22). The .ascii directive is not valid for the .bss section.
• .bss
  – The .bss directive changes the current section to .bss.
• .bss symbol, integer
  – Define symbol in the .bss section and add integer bytes to the value of the location counter for .bss. When issued with arguments, the .bss directive does not change the current section to .bss. Integer must be positive.
• .byte byte1, byte2, ..., byteN
  – The .byte directive generates initialized bytes into the current section. The .byte directive is not valid for the .bss section. Each byte must be an 8-bit value.
• .comm name, size, alignment
  – The .comm directive declares a common symbol referenced by the identifier name; if no object file defines name, the linker allocates size bytes of uninitialized (zero-filled) storage for it. Size is measured in bytes and must be a positive integer. Name cannot be predefined. Alignment is optional. If alignment is specified, the address of name is aligned to a multiple of alignment.
• .data
  – The .data directive changes the current section to .data.
• .double float
  – The .double directive generates a double-precision floating-point constant into the current section. The .double directive is not valid for the .bss section.
• .file "string"
  – The .file directive creates a symbol table entry where string is the symbol name and STT_FILE is the symbol table type. String specifies the name of the source file associated with the object file.
Common directives (cont.)
• .float float
  – The .float directive generates a single-precision floating-point constant into the current section. The .float directive is not valid in the .bss section.
• .globl symbol1, symbol2, ..., symbolN
  – The .globl directive declares each symbol in the list to be global. Each symbol is either defined externally or defined in the input file and accessible in other files. Default bindings for the symbol are overridden. A global symbol definition in one file satisfies an undefined reference to the same global symbol in another file. Multiple definitions of a defined global symbol are not allowed. If a defined global symbol has more than one definition, an error occurs. The .globl directive only declares the symbol to be global in scope; it does not define the symbol.
• .ident "string"
  – The .ident directive creates an entry in the .comment section containing string. String is any sequence of characters, not including the double quote ("). To include the double quote character within a string, precede the double quote character with a backslash (\) (ASCII 0x5C).
• .lcomm name, size, alignment
  – The .lcomm directive allocates storage in the .bss section. The storage is referenced by the symbol name, and has a size of size bytes. Name cannot be predefined, and size must be a positive integer. If alignment is specified, the address of name is aligned to a multiple of alignment bytes. If alignment is not specified, the default alignment is 4 bytes.
• .long expression1, expression2, ..., expressionN
  – The .long directive generates a long integer (32-bit, two's complement value) for each expression into the current section. Each expression must be a 32-bit value and must evaluate to an integer value. The .long directive is not valid for the .bss section.
• .string "string"
  – The .string directive places the characters in string into the object module at the current location and terminates the string with a null byte (\0). String must be enclosed in double quotes (") (ASCII 0x22). The .string directive is not valid for the .bss section.
• .text
  – The .text directive defines the current section as .text.
• .zero expression
  – While filling a data section, the .zero directive fills the number of bytes specified by expression with zero (0).
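Several of these data directives can be exercised from C through a file-scope asm statement. A sketch, assuming an ELF target and GNU assembler syntax (.pushsection/.popsection are GNU extensions that restore the previous section afterwards); the demo_* symbols are hypothetical, and non-ELF targets fall back to plain C definitions. It contrasts .string, which appends the null byte, with .ascii, which does not.

```c
#include <assert.h>
#include <string.h>   /* strcmp, for checking the emitted strings */

#if defined(__GNUC__) && defined(__ELF__)
__asm__(
    ".pushsection .rodata\n\t"
    ".globl demo_greeting\n"
    "demo_greeting:\n\t"
    ".string \"hi\"\n\t"        /* 'h', 'i', '\0'                   */
    ".globl demo_partial\n"
    "demo_partial:\n\t"
    ".ascii \"ab\"\n\t"         /* 'a', 'b' with no terminator      */
    ".byte 0\n\t"               /* so the '\0' is added by hand     */
    ".align 4\n\t"
    ".globl demo_value\n"
    "demo_value:\n\t"
    ".long 10\n\t"              /* one 32-bit integer               */
    ".popsection");
extern const char demo_greeting[];
extern const char demo_partial[];
extern const int demo_value;
#else
static const char demo_greeting[] = "hi";
static const char demo_partial[] = "ab";
static const int demo_value = 10;
#endif
```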