THE INTEL® 8087 NUMERIC DATA PROCESSOR

John Palmer

Intel Corporation

This paper describes a new device, the Intel® 8087 Numeric Data Processor, with unprecedented speed, accuracy and capability. Its modified stack architecture and instruction set are explained and illustrative examples are included. The 8087, which conforms to the proposed IEEE Floating-Point Standard, is a coprocessor in the Intel® 8086 family. It supports seven data types: three REAL, three INTEGER and one packed BCD format, and performs all necessary numeric operations from addition to logarithmic and trigonometric functions.

1.0 INTRODUCTION

The Intel® 8087 is a high performance general purpose numeric data processor. It is a part of the Intel® 8086 family and can be used with either the 8086 or the 8088 to extend their instruction sets by over 120 numeric data manipulation operations. The 8087 is not a peripheral but a coprocessor; it monitors the instruction stream and when an 8086/8088 ESCAPE instruction is read, the 8087 takes over the bus and interprets and executes the ESCAPE instruction as one of its own instructions. This tightly coupled coprocessing interface permits the 8087 to execute numeric instructions while the 8086 executes any others. The concurrent instruction execution increases the throughput of the system. Furthermore, the 8087 is the only chip that must be added to an 8086 (8088) system to provide numeric capability that exceeds software in speed by more than a factor of 100. The 8087 is intended to be general purpose and satisfy a very wide range of needs for mathematical computation. It is fast enough for a great many scientific and statistical calculations; it is accurate enough for business and commercial computation; and it is precise enough for entirely new applications, most notably interval arithmetic [1]. The 8087 provides an unprecedented level of capability, safety and reliability with high performance and low cost and is a prime example of the almost incredible possibilities in combining software and architectural expertise with VLSI processing capability.

CH1494-4/80/0000-0174 $00.75 © 1980 IEEE

2.0 8087 OVERVIEW

The 8087 consists of a stack of registers for holding operands and results, a set of registers constituting its environment and a set of instructions. The stack is a set of 8 registers, each 80 bits wide. Associated with the stack is a three bit stack pointer, TOP, and with each stack element a two bit tag. (Both the tags and TOP physically belong to the ENVIRONMENT but will be shown with the stack.) The stack elements are numbered relative to TOP (ST(i) means the ith stack element from the top of stack) as shown below.

[Figure: the STACK, registers ST(0) through ST(7), each with SIGN, EXPONENT and SIGNIFICAND fields, and the associated TAGS]

The tag field is used to detect uninitialized stack elements and to designate special values (e.g. zero) for microcode optimization. The value represented in a register has 64 bits of precision and a range of about 10^(±4900) (15 bit exponent). A more complete description of the register values will be given in Section 3.
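The register stack described above can be sketched in a few lines of Python. This is an illustration only, not the 8087's implementation: the class name `RegisterStack`, its method names, the tag labels and the use of Python floats in place of 80 bit values are all assumptions made for the example. Only the eight registers, the three bit TOP pointer, the per-register tags and the ST(i) numbering come from the text.

```python
# Illustrative model of the 8087 register stack (all names are hypothetical).
EMPTY, VALID, ZERO = "empty", "valid", "zero"   # assumed subset of tag meanings


class RegisterStack:
    def __init__(self):
        self.regs = [0.0] * 8       # eight registers (80 bits each on the chip)
        self.tags = [EMPTY] * 8     # one tag per register
        self.top = 0                # three bit stack pointer, TOP

    def _phys(self, i):
        # ST(i) is the i-th element from the top; the index wraps modulo 8.
        return (self.top + i) % 8

    def push(self, value):
        new_top = (self.top - 1) % 8            # a push decrements TOP
        if self.tags[new_top] != EMPTY:
            raise OverflowError("stack overflow (invalid operation)")
        self.top = new_top
        self.regs[new_top] = value
        # The tag marks special values such as zero.
        self.tags[new_top] = ZERO if value == 0 else VALID

    def pop(self):
        value = self.st(0)
        self.tags[self.top] = EMPTY
        self.top = (self.top + 1) % 8
        return value

    def st(self, i):
        p = self._phys(i)
        if self.tags[p] == EMPTY:
            # The tag is what lets an uninitialized element be detected.
            raise LookupError("uninitialized stack element (invalid operation)")
        return self.regs[p]


if __name__ == "__main__":
    s = RegisterStack()
    s.push(1.5)
    s.push(0.0)
    print("TOP =", s.top, "ST(0) =", s.st(0), "ST(1) =", s.st(1))
```

Because every reference goes through `_phys`, code written against ST(i) never needs to know the physical register numbers, which is the point of numbering elements relative to TOP.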

The 8087 environment consists of seven words as illustrated below.

[Figure: the environment words: STATUS word, CONTROL WORD, TAG WORD, and the INSTRUCTION and DATA address pointers]

The STATUS word consists of the EXCEPTION flags (0-7) and the STATUS bits (8-15) where the meanings are (* indicates a field reserved for future use):

EXCEPTION FLAGS
  I : invalid operation
  D : denormalized operand
  Q : division of nonzero by zero
  O : overflow
  U : underflow
  P : inexact (precision)
  N : indicates a pending interrupt

STATUS BITS
  Z,C,A,S : condition code bits for various instructions (e.g. COMPARE)
  TOP : stack pointer
  B : indicates whether the 8087 is BUSY (used for synchronization)

The CONTROL WORD consists of EXCEPTION MASKS and CONTROL BITS. For each exception there is a mask which if reset allows an interrupt to be generated (if M = 0) but if set the interrupt is suppressed and the 8087 executes a default exception handling procedure (on chip) and continues (the procedure will be explained in Section III). The M mask is the 8087 interrupt enable/disable bit. The CONTROL BITS have the following meaning:

  PC : precision control - results are rounded to one of three precisions: Temporary Real (64 bits), Long Real (53 bits), Real (24 bits).
  RC : rounding control - results are rounded in one of four directions: unbiased round to nearest, round towards +∞, round towards -∞, round towards zero.
  IC : infinity control - there are two types of infinity arithmetic provided: affine and projective.

The TAG word contains tags describing the contents of the corresponding stack elements. The instruction and data pointers are the addresses of an instruction (and its referenced data if any) if it causes an exception that generates an interrupt.

There are four types of 8087 instructions: the CORE set, the EXTENDED set, the SPECIAL FUNCTION set and the ADMINISTRATIVE set. The core set includes load and store of the stack values and arithmetic operators: add, subtract, multiply, divide and compare. The extended set is for loading and storing three special formats (see Section 3). The special function set includes square root and transcendental function support. The administrative instructions are used for context switching and processor control. Most of the instructions will be described in more detail as the 8087 design goals are explained.

3.0 DESIGN GOALS

The 8087 is designed to achieve several major goals. First, the 8087 conforms to an improved and expanded version of Intel's standard for floating-point arithmetic [2]. Second, the 8087 provides significantly more capability than mainframe and minicomputer floating-point processors and consequently has applications beyond scientific computation. Third, the 8087 is convenient to use in assembly language and easy to generate code for in high level language. And finally the capabilities of VLSI are used to provide all this functionality with high performance and efficiency in a single device.

3.1 Floating-Point Standard

The Intel floating-point standard, called the REALMATH standard, was originally specified in 1977 [2] and implemented in several products (FPAL, SBC-310, FORTRAN-80, BASIC-80). At about that time an IEEE committee was formed to propose a floating-point standard for microprocessors. Intel was invited to participate and offered its standard for consideration.

At the time this paper was written it had become apparent that the majority of the committee had agreed on a revised and expanded version of Intel's standard [3]. The standard specifies data formats, rounding algorithms and exception handling.

The standard specifies and the 8087 supports three floating-point data types: Real (single precision), Long Real (double precision) and Temporary Real (extended precision). All formats are binary and each has a biased exponent. The values represented by the three formats are shown below.

                   REAL        LONG REAL    TEMPORARY REAL
TOTAL LENGTH       32 bits     64 bits      80 bits
EXPONENT LENGTH    8 bits      11 bits      15 bits
EXPONENT BIAS      2^7 - 1     2^10 - 1     2^14 - 1

[Figure: values represented by each format, including the encodings of INFINITY (e = 11...1, f = 0) and NOT-A-NUMBER (NAN) (e = 11...1, f ≠ 0)]

The Temporary Real format (identical to the 8087 register format) is intended to hold intermediates and to support accurate Long Real calculations. It has an explicit leading bit (i) in the significand thus allowing unnormalized arithmetic. However, the algorithms are designed so that normalized operands will always yield normalized results.

The algorithms specified by the standard require that the completely precise result of an operation be rounded to the nearest representable number, breaking ties by rounding to the nearest even number. This default mode of rounding is called "unbiased round to nearest". There are optional "directed rounding" modes that are specified to yield

1. the nearest neighbor less than or equal to the true result.

2. the nearest neighbor greater than or equal to the true result.

The 8087 provides these rounding modes as controlled by a field (RC) in the CONTROL WORD. The 8087, which does all calculations in Temporary Real format, has another field in the CONTROL word for specifying the precision to which a result is rounded (PC). Thus, the precision of results is independent of the precision of operands and, though held in Temporary Real format and benefitting from extended range, may be forced to Real, Long Real or Temporary Real. This control is provided for languages that do not allow extended precision intermediates and to allow the same code to be run under different precision settings as an aid to error estimation.

The standard also specifies that all exceptions must be detected and that an implementation should permit exception handling. The 8087 supports this by detecting six types of exceptions and by generating an interrupt if the exception is not masked. If an interrupt is generated, the interrupt procedure (exception handler) has available the exception flags, a pointer to the instruction causing the interrupt and a pointer to the datum if memory was addressed. The six exceptions, each of which has an associated "sticky" flag (once set it remains set until reset by software), are listed below.

1. I : invalid operation - this exception is signaled by stack overflow or underflow, the use of a NAN as an operand and several other cases as listed in [3]

2. D : denormalized operand - at least one operand is denormalized

3. Q : zero divisor - the dividend is finite and nonzero while the divisor is zero

4. O : overflow - the exponent of the result is too large for the destination's format

5. U : underflow - the exponent of the result is too small for the destination's format

6. P : inexact result - the delivered result is not equal to the completely precise result but has been rounded

Since the default response to overflow and zero divisor is to set the result to ∞, the 8087 supports two modes of infinity arithmetic:

1. affine - there are two infinities, one (-∞) less than all other numbers and one (+∞) greater

2. projective - there is only one infinity (the sign on ∞ is ignored) which closes the number system analogous to the point at ∞ on the Riemann sphere.

These two modes require the representation of two zeros (±0) which are "equal" in comparison and all other operations except division where +1/+0 = +∞ and +1/-0 = -∞. The mode of infinity arithmetic is determined by a field (IC) in the CONTROL word.

There are instructions that support the standard by controlling rounding, precision and infinity arithmetic and by permitting complete exception handling. These instructions load and store either the control word or the entire environment and store the exception flags. The features and instructions discussed above support the Intel floating-point (REALMATH) standard but additional capability is also desired.
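The interaction of the rounding control (RC) and precision control (PC) fields described in this section can be sketched with exact rational arithmetic. The following is a minimal sketch, not the 8087's algorithm: the function name `round_to_bits` and the mode strings are assumptions made for the example, and the exponent range is treated as unlimited, so overflow and underflow are not modeled. The `bits` argument plays the role of PC (24, 53 or 64) and `mode` the role of RC.

```python
from fractions import Fraction
import math


def round_to_bits(x, bits, mode="nearest"):
    """Round the exact value x to `bits` significant bits.

    mode: 'nearest' (ties to even), 'up' (towards +inf),
          'down' (towards -inf) or 'zero' (towards zero).
    """
    x = Fraction(x)
    if x == 0:
        return x
    # Find e such that 2**(e-1) <= |x| < 2**e.
    e = abs(x.numerator).bit_length() - x.denominator.bit_length()
    if abs(x) >= Fraction(2) ** e:
        e += 1
    ulp = Fraction(2) ** (e - bits)      # spacing of representable neighbors
    q = x / ulp                          # exact position in units of ulp
    lo, hi = math.floor(q), math.ceil(q)
    if mode == "down":
        k = lo
    elif mode == "up":
        k = hi
    elif mode == "zero":
        k = lo if x > 0 else hi
    elif mode == "nearest":
        if q - lo < hi - q:
            k = lo
        elif q - lo > hi - q:
            k = hi
        else:                            # tie: pick the even neighbor
            k = lo if lo % 2 == 0 else hi
    else:
        raise ValueError("unknown rounding mode: " + mode)
    return k * ulp


if __name__ == "__main__":
    third = Fraction(1, 3)
    for mode in ("nearest", "up", "down", "zero"):
        print(mode, float(round_to_bits(third, 24, mode)))
```

Running the same value through 'down' and 'up' brackets the true result between two neighboring representable numbers, which is the property the directed modes are specified to provide.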

3.2 Capability Extension

The 8087, by supporting the required and optional aspects of the standard and by supporting several features not mentioned by the standard, significantly extends the capabilities of the 8086 family beyond that expected from a typical floating-point processor. These extensions include additional data types, provision of exact arithmetic, support for interval arithmetic and special functions.

The 8087 addresses seven different data types using all of the 8086 addressing modes. These data types are:

1. Real (32 bits)

2. Long Real (64 bits)

3. Temporary Real (80 bits)

4. Integer Word (16 bit 2's complement)

5. Integer (32 bit 2's complement)

6. Long Integer (64 bit 2's complement)

7. Packed BCD Integer (80 bits, 18 digits and sign)

All of the data types, when used as operands, are first converted (without rounding error) to Temporary Real and the result of the operation is also returned as Temporary Real. Thus the 8087 arithmetic unit only has to work with one kind of data. When results are desired in one of the other formats, they are automatically converted to that type before they are stored in memory.

The provision of exact arithmetic is accomplished by including the inexact exception (P) along with its mask. If a rounding error is committed, the correctly rounded result is delivered and the P flag is set. If the mask (PM) is zero an interrupt is generated, otherwise execution simply continues. This permits financial accounting functions to be performed without fear of roundoff error. Exact arithmetic is also useful in doing coefficient "preconditioning" [see 4].

The support of interval arithmetic is considered one of the most important features of the 8087. As stated by W. Kahan [5]:

"No other feature would enhance safe numerical computation more than the provision of INTERVAL as a data type in FORTRAN as readily accessible as INTEGER or REAL."

This new INTERVAL data type, which the 8087 supports through the rounding modes (RC) and the signed zeros and infinities, can be represented as an ordered pair: INTERVAL, I = [a,b]. If a ≤ b then I includes all numbers between a and b; but if a > b then I includes all numbers x where x ≥ a or x ≤ b. An illustration may help clarify the concept. Consider the set of numbers as a circle with the two cases described above pictured as

[Figure: the number circle, showing the interval [a,b] for the cases a ≤ b and a > b]

Start at a and proceed clockwise until b is reached; all numbers covered belong to I. The signs on zero and infinity permit us to have open or closed intervals when zero or infinity is an end point with the sign denoting which case pertains. If an endpoint is neither zero nor infinity then the interval is always closed. A complete definition of interval arithmetic cannot be given here; however, we can list some of its uses. In addition to its obvious ability to bound rounding errors, interval arithmetic can be used to estimate the effect of noise in data, to compute confidence intervals and to do worst-case analysis.

In addition to exact and interval arithmetic, the 8087 provides several special instructions for efficient evaluation of many important mathematical functions with unprecedented accuracy. One of these instructions is square root. It overwrites the contents of the top of stack with its correctly rounded (according to RC and PC) square root. Besides being correctly rounded the square root operation is as fast as the divide instruction. Thus algorithms need not be contorted to remove square roots.

There are two instructions to aid in argument reduction for transcendental function evaluation: DECOMPOSE and REMAINDER. The decompose instruction overwrites the contents of the top of stack with the integral value of its exponent in Temporary Real format, decrements the stack pointer and loads into the new top of stack the value of the significand of the original stack top scaled between 1 and 2 (or -1 and -2 if negative). The operation is illustrated below.

[Figure: DECOMPOSE replaces the stack top A with its exponent and pushes its significand as the new TOP]

(If the original top of stack is zero then both results are zero.) The remainder instruction is for reducing arguments of periodic functions to a primary range. It calculates the exact remainder (no roundoff error) of the top two stack elements:

REM = (TOP) modulo (next-of-TOP)

The remainder is returned to the stack top and the next-of-TOP ("divisor") is not changed. Since the execution of a floating-point remainder could be very lengthy, the remainder instruction is actually a primitive: the result is either the remainder or the partial remainder after a fixed number of steps. Thus to compute a remainder requires a software loop that terminates when |(TOP)| is less than |(TOP+1)|. Even by using remainder we will not have trigonometric functions with period 2π since π cannot be exactly represented in the 8087. However, the functions will be exactly periodic with

period 2π* (where π* is the machine approximation to π) and thus will obey the identities that do not explicitly involve π.

The other instructions provided for special functions are TANGENT, ARCTANGENT, EXPONENTIAL and LOGARITHM. The tangent assumes the top of stack, X, is between zero and π/4 and returns two results as shown:

[Figure: TAN takes X at TOP and returns two results on the stack]

The arctangent works in reverse by using two arguments and returning one:

[Figure: ATAN takes two arguments, Y and X, and returns arctan(Y/X) at TOP]

The exponential instruction, which calculates 2^X - 1, assumes that 0 ≤ X ≤ 1/2 and overwrites the argument on the top of the stack with the result. The logarithm function, which computes Y * log2(X), uses two arguments and returns a single result as shown

[Figure: LOG takes Y and X (X > 0) and returns Y * log2(X) at TOP]

The error bound for all these functions is about 2 units in the last place thus allowing for Long Real arguments to be computed to Long Real accuracy. The provision of the described special functions supports the goal of increased capability.

3.3 Ease of Use

As stated above, ease of use, along with support of the standard and extended capability, is a major 8087 goal. We have made the 8087 easy and convenient for programmers and automatic code generators by providing software emulation, a deep (8 levels) internal stack of very wide precision (64 bits) and large range (10^(±4900)), optimized symmetric mixed mode arithmetic and on chip default exception handling.

The interface between the 8086 (8088) and 8087 allows for software emulation of the 8087 permitting software for the 8087 to be developed, debugged and executed on a system containing only an 8086 (8088). In order to run the developed software on an 8087 it is not necessary to recompile but only relink. To understand how one can delay the resolution of either 8087 or emulator until the link stage, it is necessary to explain the 8086-8087 interface.

The 8086 (8088) has a set of ESCAPE instructions that, in memory addressing mode, cause the 8086 to calculate the address and read the contents of that address. The 8086 ignores the word it reads and then proceeds to execute subsequent instructions. The 8087 is monitoring the same instruction stream and when it detects an ESCAPE it knows that it is being instructed to do something. It latches the opcode and if there was an address calculated the 8087 captures both the address and the datum read by the 8086. By decoding the instruction the 8087 knows how many more words it needs from memory and it increments the address and fetches data until all required data is read. The 8087 then releases the bus and begins calculating while the 8086 continues executing the instruction stream. Because of the overlapped coprocessing of the 8086-8087 it is necessary to precede 8087 instructions (ESCAPE) with a WAIT instruction in order to synchronize the two processors. In place of the WAIT, when the software emulator is to be invoked, an INTERRUPT instruction is inserted. There are some other differences between the hardware and software interfaces but they are the same length and use the same addressing mechanism. This permits a compiler to output an external reference instead of the WAIT-ESCAPE and let the LINKER fill in with either WAIT-ESCAPE or INTERRUPT depending on whether the user has an 8087 or desires to use the emulator.

In addition to software emulation to aid software development, the 8087 has an eight level stack of registers that supports the Temporary Real (80 bit) format and makes the 8087 far easier to use than other floating-point processors. All calculations are done in this extended format and as long as intermediates are kept in the stack or its equivalent memory format (if eight is not enough) then the threat of roundoff damage and risk of overflow or underflow is greatly reduced. Roundoff error is reduced because Temporary Real intermediates are more precise than Long Real data or final results by eleven guard bits. Most overflows and underflows occur on intermediate calculations and the extended range of Temporary over Long Real (10^(±4900) vs. 10^(±308)) ensures that on intermediates these exceptions need seldom, if ever, occur.

The symmetric mixed mode instruction set also contributes to ease of use. The CORE instructions, which include LOAD, STORE & POP, STORE, ADD, SUBTRACT, SUBTRACT REVERSE, MULTIPLY, DIVIDE, DIVIDE REVERSE, COMPARE, and COMPARE & POP, take one operand from the top of stack and a second operand from either memory or a stack element. There are thus two forms of CORE instructions: memory addressed and stack addressed. The memory addressed form supports four memory formats in all 8086 addressing modes:

Integer Word (16 bit 2's complement)
Integer (32 bit 2's complement)
Real (32 bit)
Long Real (64 bit)

The LOAD Integer instruction converts an integer to Temporary Real format and pushes it on the stack; the ADD Long Real instruction converts a Long Real operand to Temporary Real and adds it to the top of the stack; and the STORE Integer Word instruction converts the top of stack to a 16 bit integer and stores it in memory (without altering the contents of the stack).
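The statement that memory operands convert to Temporary Real without rounding error follows from the widths involved: a 64 bit significand can hold any integer of up to 64 significant bits, while a Long Real significand (53 bits) cannot. A minimal check of that arithmetic is sketched below; the helper name `fits_significand` is an assumption made for the example, and the exponent range is assumed to be sufficient.

```python
def fits_significand(n: int, bits: int) -> bool:
    """True if the integer n is exactly representable with a `bits`-bit
    significand (exponent range assumed sufficient)."""
    n = abs(n)
    if n == 0:
        return True
    # Trailing zero bits are absorbed by the exponent, so strip them first.
    n >>= (n & -n).bit_length() - 1
    return n.bit_length() <= bits


TEMPORARY_REAL_BITS = 64   # significand width of Temporary Real
LONG_REAL_BITS = 53        # significand width of Long Real

# Largest magnitudes of the integer formats named in the text.
extremes = {
    "Integer Word (16 bit)": 2**15,
    "Integer (32 bit)": 2**31,
    "Long Integer (64 bit)": 2**63,
    "Packed BCD (18 digits)": 10**18 - 1,
}

if __name__ == "__main__":
    for name, largest in extremes.items():
        # Every value of smaller magnitude has at most as many bits.
        print(name, "needs", largest.bit_length(), "bits; fits in",
              TEMPORARY_REAL_BITS, "bits:",
              largest.bit_length() <= TEMPORARY_REAL_BITS)
    # A Long Integer that Long Real cannot hold exactly:
    n = 2**63 - 1
    print(fits_significand(n, LONG_REAL_BITS),
          fits_significand(n, TEMPORARY_REAL_BITS))
```

The last line illustrates why the wider format matters: the same 64 bit integer that would be rounded on conversion to Long Real is held exactly in Temporary Real.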

The stack addressed form of the CORE instructions obtains the second operand from one of the stack elements instead of memory. The reference is always relative to the top of stack; thus stack element i, where i = 0, ..., 7, refers to the ith element of the stack under the top of stack. The stack addressed form has two options for the destination of the result. The result can either overwrite the top of stack or replace the contents of the ith stack element depending on the setting of the "direction" (D) bit in the instruction. If the destination is the ith stack element then depending on the setting of another bit (the "pop" (P) bit) the stack is popped or left unaltered.

The EXTENDED instruction set consists of two memory addressed types of instructions, LOAD and STORE & POP, that support three additional memory formats:

Long Integer (64 bit 2's complement)
Temporary Real (80 bit)
Packed BCD (80 bit)

The Temporary Real format is supported for extending the 8087 stack to memory when necessary; the Packed BCD format, which is a signed 18 digit integer as shown,

[Figure: Packed BCD format, sign and 18 decimal digits]

is used to aid binary-decimal conversion and COBOL type calculations; and the Long Integer format is supported for applications requiring very wide precision exact computation. Again it is important to note that conversion of these formats to Temporary Real is done with no rounding error.

Another instruction, included to make the 8087 easy to use, is in neither the CORE nor the EXTENDED set but its value is obvious. That instruction is EXCHANGE top of stack with the ith stack element. This instruction has no memory form and ignores the D and P bits.

A further user convenience in the 8087 is its on-chip default exception handling. Though it is possible to handle exceptions with software, it is often an onerous task to write, debug and maintain exception handlers. The default 8087 response to an exception is invoked by masking that exception in the CONTROL WORD. The 8087's response to masked exceptions balances safety with the utility of continued calculation. Listed below are the default responses to masked exceptions:

1. Invalid Operation - if either operand is NAN, the 8087 propagates the lexicographically larger (ignoring the sign) otherwise it generates a special NAN called INDEFINITE as the result.

2. Denormalized Operand - the operand is converted to an equivalent unnormalized representation preserving the same number of leading zeros.

3. Zero Divisor - since the dividend is nonzero the result is ±∞ with the sign set in the usual way (XOR of the signs of the operands).

4. Overflow - the result is ∞ with the sign of the overflowed result.

5. Underflow - the result is denormalized to fit the destination's format ("gradual underflow" [4]).

6. Inexact Result - the correctly rounded result is returned.

All of the features discussed above: software emulation, deep Temporary Real stack, symmetric and powerful instruction set and default exception handling, make the 8087 easy and convenient to use; but to be useful it must also be efficient.

3.4 Efficiency

Efficiency was a major goal in the design of the 8087. An extensive treatment of the internal hardware and algorithms will be given elsewhere, but a brief description will illustrate our concern for performance. The 8087's main ALU is more than 64 bits wide. This is to handle efficiently 64 bit operands with guard, round and sticky bits [6] and at least one overflow bit. Its shifter can shift right or left from 0 to 63 places in one clock cycle. This is useful for formatting, normalizing and denormalizing and for the transcendental functions. For normalizing there is hardware for detecting the position of the most significant one. Finally, there is special hardware to permit multiply, divide, remainder and square root to be calculated rapidly. Approximate speeds of the basic operations for stack operands are summarized below:

                          5 MHz
                          Microseconds
COMPARE                   5
ADD (MAGNITUDE)           10
SUBTRACT (MAGNITUDE)      16
MULTIPLY                  16, 24*
DIVIDE                    38
SQUARE ROOT               38

* shorter time if either operand was originally Real (32 bit)

The above timings apply for Real, Long Real or Temporary Real operands and results. The previously described overlapped instruction execution by the 8086 and 8087 also increases throughput. However, more important than absolute execution speeds is the stack with its internal addressing that minimizes memory referencing. There is an instruction for scaling that is much faster than multiply. For rapid context switching, the 8087 has SAVE and RESTORE instructions. The instruction set and the hardware to execute it rapidly give the 8087 very high performance without sacrificing quality.

4.0 EXAMPLES

To illustrate the capabilities of the 8087 an extensive set of programs would be very useful. We will here give two examples that should reinforce many of the points made earlier. The first example is to calculate the length of a vector. The task is conceptually simple but a reliable, robust program for the typical floating-point system is hard to produce. With the 8087 it is easy, almost automatic, to produce such a program.

    Temporary Real : SUM
    Long Real : X(I), L
    SUM := 0
    For I = 1 to N Do SUM := SUM + X(I)**2
    L := SQRT(SUM)

This program is free from intermediate overflow or underflow problems and unless N is very large its only significant rounding error is in the last instruction - where it is unavoidable but easy to analyze.

The second example demonstrates how several accumulations can be calculated efficiently in the 8087. If we have two sets of data, Xi and Yi, that we want to analyze, we very likely will want means, standard deviations and correlation coefficients. We thus want to calculate:

    Mx  = Σ Xi
    My  = Σ Yi
    Sx  = Σ Xi^2
    Sy  = Σ Yi^2
    Cxy = Σ Xi Yi

In an ordinary stack machine, the five values listed above would probably be calculated in five separate passes through the data requiring that each datum be read three times. On the 8087 the five values can be calculated in one pass through the data, requiring that each datum be read only once. The architectural feature that permits this increase in efficiency is the ability to do arithmetic with operands from any stack element. The algorithm is described below.

    STEP  ACTION
    0.    Clear five stack elements (push zero five times): Mx, My, Sx, Sy, Cxy
    1.    LOAD Xi
    2.    Add TOP (Xi) to Mx
    3.    Duplicate TOP of stack
    4.    Square TOP
    5.    Add TOP (Xi^2) to Sx and POP
    6.    LOAD Yi
    7.    Add TOP (Yi) to My
    8.    Multiply TOP (Yi) to Xi
    9.    Square TOP
    10.   Add TOP (Yi^2) to Sy and POP
    11.   Add TOP (XiYi) to Cxy and POP
    12.   Loop to Step 1

The inner loop of this program has only eleven 8087 instructions and has the same properties of reliability and robustness as the first example. It is also efficient since the minimum computation and memory addressing is done.

5.0 CONCLUSION

The Intel 8087 Numeric Data Processor, along with its design goals of meeting Intel's REALMATH standard, and providing increased capability, ease of use and performance, has been described. We have attempted to balance safety and utility and have provided an unprecedented level of capability, accuracy and reliability in a math processor.

ACKNOWLEDGEMENTS

There are a great number of people who deserve recognition for their contribution to the 8087. The initial architectural design was the joint work of the author and Bruce Ravenel, relying heavily on the advice of Professor W. Kahan. Robert Koehler made significant contributions to the systems aspects of the 8087 and Janis Baron was responsible for designing the assembly language and implementing the emulator. A great deal of credit must go to Rafi Nave and his team in Intel Israel for implementing the 8087 and to Dai-Sun Tsien for carefully reviewing and checking the implementation. Perhaps most significant of all, we acknowledge the management of Intel for being willing to commit significant resources to both implementation and promotion of a standard for reliable numeric data processing.

BIBLIOGRAPHY

1. Moore, R.E. (1979), "Methods and Applications of Interval Analysis," SIAM Studies in Applied Mathematics, SIAM, Philadelphia.

2. Palmer, J. (1977), "The Intel Standard for Floating-Point Arithmetic," Proc. COMPSAC, 107-112.

3. Coonen, J., W. Kahan, J. Palmer, T. Pittman and D. Stevenson (1979), "A Proposed Standard for Floating-Point Arithmetic," SIGNUM Newsletter, October, 1979.

4. Kahan, W., J. Palmer (1979), "On a Proposed Floating-Point Standard," SIGNUM Newsletter, October, 1979.

5. Kahan, W. (1972), "A Survey of Error Analysis," Information Processing 71, North Holland Publishing Company, 1214-1239.

6. Yohe, J. (1973), "Roundings in Floating-Point Arithmetic," IEEE Trans. Computers, Vol. C-22, No. 6, 577-586.
