Florida Institute of Technology
College of Computer and Electrical Engineering
ECE4551 Computer Architecture
Fall 2009

Semester Project Paper: The Comparative “Anatomy” and “Physiology” of Embedded Systems. By David Wurmfeld

ECE4551

Page 1 of 31

03/12/2009 03:10:58

Table of Contents

Florida Institute of Technology ... 1
Abstract ... 3
Fundamental Anatomy of an Embedded Processor ... 4
  The programmable FSM: The idea that became a computer ... 4
  ROM and RAM: memory structures making up the skeleton of an embedded processor ... 5
  The Guts of the Processor: the Program Counter, ALU and Control Unit ... 5
  Putting it all together, the big picture ... 7
Fundamental Physiology of an Embedded Processor ... 11
  The computer program: a recipe for functionality ... 11
  The Program Counter: more than meets the eye ... 11
  The Control Unit: Conductor, logic wizard and traffic cop ... 14
  “110001110011”: I dig Computer Baby Talk! ... 14
  Making Bits Work ... 15
  Instruction Encoding; how many bits do we need? ... 15
  Looking Forward: Tools for us Humans ... 16
  Temporary storage: Data Memory ... 17
  The Arithmetic Logic Unit: Workhorse of the Embedded Processor ... 18
  Memory Organization and Program Memory ... 19
  Memory Organization and Data Memory ... 21
  The other “fiddly bits”: Input – Output and additional processor functionality ... 24
Summary ... 28
Embedded Processor questions to test for understanding ... 29
Answers ... 30


Abstract

An embedded system is a difficult animal to describe. In the general interpretation of the term, an “Embedded System” refers to a dedicated computer used to accomplish a pre-defined task. The term “embedded” usually relates to the encapsulated or contained nature of the device. In the modern vernacular, however, its meaning is becoming less sharply defined. The current idea of an embedded system is expanding to mean any computer system dedicated to a specific purpose. The computer that is the autopilot on a commercial airliner is considered an “Embedded System”, as is the Windows XP powered console for a medical MRI imaging system. Today, the domain of an embedded system is almost limitless, ranging from a full-blown LINUX system deployed on a single Virtex-IV FPGA chip with a PowerPC microprocessor core and custom integrated peripherals to a 4-bit data security chip glued onto the front of a smart bank card. Indeed, today’s embedded systems may not have any “pins” to speak of; they may be pre-compiled “cores” or “software templates” of hardware architectures designed to implement and complement common computer resources. These cores are purely software in nature, describing hardware architecture using a “Hardware Description Language” or HDL, and only take on a physical manifestation when implemented within a particular ASIC or FPGA scheme. These cores are often referred to as Intellectual Property or IP. The domain of the typical embedded system, however, is dominated by single-chip microcontrollers with fewer than a dozen Input-Output (I/O) pins. To understand the scope of embedded systems, then, it becomes necessary to understand the resources available (chip/core architecture) as well as the tool chain used to exploit those resources.
This paper will endeavour to describe, using a top-down approach, the animal that is the Embedded System; its comparative “Anatomy and Physiology”, or how the architecture and behaviours differ between three different real architectures, selected by their market share.

[Figure: Microcontroller Market (in Billions of US Dollars), 2006 vs. 2007¹; two pie charts breaking revenue and market share into the 8-bit, 16-bit and 32-bit segments.]

According to WSTS², the lowly 8-bit microcontroller dominated the microcomputer chip market with monthly sales topping 250 million units per month in 2000, followed

www.emittsolutions.com Microcontroller market trend report by Emitt Solutions Inc. ExtremeTech online, http://www.extremetech.com/article2/0,2845,1156706,00.asp, Embedded Processors Part one, September 2009, Quoting WSTS, World Semiconductor Trade Statistics, http://www.wsts.org 2


by the 4-bit microcontroller at 100 million per month. The so-called “hot” processors, the 16- and 32-bit parts, barely pull in 50 million units per month. This paper will confine the domain to 8-bit and 16-bit microcontrollers, concentrating on how they compare with one another, from the 30,000-foot view down to the register level³. In addition to the “anatomy” or architecture of the embedded systems, the “physiology” or behaviour, from high-level constructs down to bits in silicon, will be outlined and compared. To synthesize the disparate facts and processes into meaningful information, the embedded systems outlined will be compared using a simple performance metric, using a “Gedanken experiment”⁴ to explore the performance of three hypothetical embedded systems.

Fundamental Anatomy of an Embedded Processor

The heart of any embedded system is the computer core driving it. It manages the data flow throughout the system, on chip⁵ and off chip. This construct constitutes a revolution in logic design. Historically, digital logic has been combined to form meaningful representations of the world. For example, an alarm system could be modelled by representing the doors and windows to be monitored as elements in the design, using registers and states to describe the behaviour when a door or window is opened at the wrong time. These so-called Finite State Machines (FSMs) were used to create the original embedded systems, with dedicated chips implementing logic functionality (NAND, NOR…), all interconnected pin to pin to accommodate the data flowing into and out of the machine. This dedicated functionality proved cumbersome for anything but complex control systems custom-designed for a single, specific task. Because such systems were difficult to design, produce and maintain, a different solution was needed.

The programmable FSM: The idea that became a computer

Early in the history of electronic devices, the idea of a re-configurable system, using the same hardware to accomplish many different tasks, was developed. This so-called “compute-or” idea first took shape in 1936 as the Z1 computer designed by Konrad Zuse⁶, the first re-configurable, or programmable, machine. Dr. Zuse is credited with designing and building the first machine to truly solve floating-point problems using binary representation⁷. With these initial architectures and the technological revolution of the transistor, the physical manifestation of a computer shrank from room-sized behemoths to refrigerator-sized boxes, down to a single board comprised of only a few chips. Like its predecessors, the computer needed all the elements of a traditional Finite State Machine, with a new twist: the ability to change states by following bit patterns; bit patterns found in configurable structures, structures not hard-wired into the design.

³ For the purposes of this paper, we take it on faith that the silicon topologies and processes used to implement registers work, and are well described in other tomes.
⁴ Thought Experiment: http://en.wikipedia.org/wiki/Thought_experiment
⁵ For the initial part of this discussion, we will refer to a chip as the fundamental embedded system building block. Later we will expand that definition to include the concept of a microcontroller “core”.
⁶ About.com: Inventors - http://inventors.about.com/library/blcoindex.htm
⁷ Technical Institute of Berlin; http://user.cs.tu-berlin.de/~zuse/Konrad_Zuse/index.html


This created the possibility of a new state machine paradigm, one of an Infinite State Machine.

ROM and RAM: memory structures making up the skeleton of an embedded processor

This new topology could “execute” a pre-configured (“programmed”) sequence of states based on the contents of two new hardware constructs: read-write volatile data memory and read-mostly program memory. The term “volatile data memory” refers to memory that will not retain its contents after the power is removed. The term “read-mostly memory” refers to memory that can be pre-configured or programmed with a sequence of bits (bits that, we will later see, are the patterns that represent computer instructions) that is persistent, or non-volatile, and will be available after the power has been removed. For the sake of brevity and tradition, non-volatile memory is referred to as ROM⁸, whereas volatile memory is referred to as RAM⁹.

The Guts of the Processor: the Program Counter, ALU and Control Unit

Keeping with the comparative anatomy theme, every computer is built using these two memory structures in one form or another. They are used to provide long- and short-term storage for data and instructions. The third element necessary to the operation of the computer is the control unit. It is a multi-function module that controls and synchronizes the flow of data from the outside to the inside, and between memory elements and the outside world. Like a policeman directing traffic, the control unit directs when and where data will move. It also records state information for use by other operations. The key to the control module is the program counter, or PC. It is a special-purpose register¹⁰ that holds the memory address of the next instruction to be executed. The width in bits of the PC determines the maximum number of instructions that can be addressed by the computer: an n-bit PC can address 2^n locations. The control module “fetches” the instruction from ROM, “pointed” to by the PC. It then translates the instruction into a sequence of control signals that route the data from and to the appropriate location.

⁸ ROM is the acronym for Read Only Memory, meaning not writable but readable. In practice, these memories are writable at least once, to configure the memory. Typically, they are implemented using FLASH technology, allowing multiple write cycles using proper programming equipment.
⁹ RAM is the acronym for Random Access Memory, which is a misnomer, as all addressable memory is random access by definition. It traditionally refers to memory that loses all data when the power is off, and is typically of a static or dynamic nature.
¹⁰ A “register” is a fixed-width, volatile memory element, used to store intermediate information. This “information” may be the next address to execute, or the flag bits used to configure the built-in A/D converter.


The next module important to the operation of the computer is known as the Arithmetic-Logic Unit or ALU. It is responsible for performing various arithmetic operations on the data, like addition or subtraction, as well as various logic operations like AND, OR, and NOT. In more sophisticated microcontrollers, the ALU may also include a “perfect exchange register”, where word-oriented operations can be carried out, like “swap bytes” (swapping the upper and lower bytes of the word) or “swap nibbles”; single-cycle operations¹¹ that make short work of bit-intensive operations. The ALU can also test data for various states for the control unit to record. All ALU operations are directed by the control unit, which is in turn directed by the instructions found in the program memory. The following structures are the building blocks of all embedded computer systems:

• Program memory – read mostly; stores instructions and constant data (data that does not change over time). Non-volatile: data is retained after the power is turned off. Typically it is organized as an addressable matrix of α × β bytes, where α is the memory address width, representing 2^α locations, and β represents 1 or 2 bytes of memory width.

• Data memory – read/write; stores the results of instructions and state interactions. Volatile: loses all data when power is turned off. Typically it is organized as an addressable matrix of α × β bytes, where α is the memory address width, representing 2^α locations, and β represents 1 or 2 bytes of memory width.

• Program counter – a special-purpose volatile memory element (usually a dedicated register) that holds the address of where the processor is in its instruction sequence, usually the address of the next instruction to be “fetched”¹² and executed. The width of the PC corresponds to the maximum number of instructions the computer can address.

• Control unit – a dedicated Finite State Machine (FSM) that takes as its inputs the instruction from the program memory, translating the bit pattern into actions manifested as synchronized control signals and states to the other modules in the computer.

• Arithmetic-Logic Unit – a dedicated FSM that takes as inputs control signals and “chunks” of data¹³ (usually whole bytes or words¹⁴) and gives as outputs the results of the operation, in similar chunks of data.

¹¹ It will be seen that embedded processors are “theme” oriented, that is, built as a motor controller, a communications controller or a sensor controller, and the architecture correspondingly includes “special functionality” (read: dedicated registers/operations) that makes those features efficient compared to doing it manually in software.
¹² The term “fetch”, normally associated with a ball and the family dog, is a good analogy in this context as the verb describing the action of retrieving the instruction from program memory. It involves looking it up, getting it physically and bringing it back.
¹³ Historically, the term “data word” refers to the fundamental width of the registers and program memory native to the microprocessor. In the domain of processors we are outlining, it is dependent on the architecture. Wikipedia: http://en.wikipedia.org/wiki/Word_%28computing%29
¹⁴


The anatomy of any embedded processor is made up of these five modules. It is the way in which these elements are arranged that ultimately describes the behaviour, or how a specific controller actually executes a program of instructions. The next section explores the fundamental “physiology”, or how it actually works, of an embedded processor.

Putting it all together, the big picture

All embedded processors we encounter have various combinations of RAM, ROM and register resources. Typically, for the domain of embedded processors we are exploring, this memory will reside physically on the silicon from which the “chip¹⁵” is constructed.

Figure 1: Photomicrograph of physical microcontroller elements on a silicon die.¹⁶

Other types of embedded processors have enough pins to support accessing memory off chip. Starting out, we will address those topologies that have memory built into the chip. The following figures illustrate three different embedded processors: the ARM-7, the Atmel 89C2051 and the Microchip PIC18F1330 8-bit microcontrollers. Let’s look first at the “anatomy” of these processors. In Figure 4, the PIC18 processor, it is easy to identify the modules we have described so far: ALU, Program Counter, Control Unit (“Instruction Decode and Control”), Program Memory and Data Memory. You will also notice there are many other modules in the processor we haven’t discussed yet but may be able to guess at the

¹⁵ The word “chip” is loosely used to mean those devices built from “chips” of silicon wafer, mounted onto a leaded carrier, providing the pins that allow connection to the circuit.
¹⁶ Image copyright © 2009 Micro Control Journal. All rights reserved. (http://www.mcjournal.com/articles/arc105/arc105.htm)


functionality. Suffice it to say they all support the movement and modification of data, which ultimately is the only purpose of any processor, embedded or not. The other processor illustrations are not so straightforward to interpret. In Figure 2, it is clear what the data and program memory are, and the ALU and program counter as well, but where is the Control Unit? It is there, just split into several blocks, each illustrating a function the control unit must accomplish, like “PC incrementer”, “program address register” and “stack pointer”. As long as you understand these functions are common to all embedded processors, it is not difficult to interpret the block diagram of any microcontroller.

Figure 2: Atmel 89C2051 Architecture¹⁷

The diagram in Figure 3 stretches the “simple block diagram” concept, but with some digging it is possible to catch the “islands of functionality”¹⁸. The CPU diagram lays out a stylized arrangement of registers, implying interconnection and the existence of a control unit connecting them all. As this is a model of the CPU core, the memory is not illustrated. The memory is laid out a little differently; both the program and data memory share the same address space, but the program counter and flag register are clearly there, along with many other, yet to be understood, specialized registers. We will get to those later; for now, the important idea to grasp is that although it may seem like these

¹⁷ Image © 2009, Atmel, http://www.atmel.com/dyn/products/product_card.asp?part_id=1938
¹⁸ “Islands of functionality” refers to an isolated group of registers, FSMs and other structures that perform a single job, like the ALU, a timer, or an A/D controller module.


three processors are dramatically different, they actually differ only in the specifics of how the various modules interact, and not so much in the modules they have.

Figure 3: The ARM-7 core architecture


Figure 4: The Microchip PIC18F1330 Architecture¹⁹

To put the embedded processor anatomy tutorial in perspective, keep in mind that all embedded processors have one simple goal of existence: in a deterministic (meaning repeatable) manner, move and modify data according to a list of instructions. So far we have outlined where data exists and what is used to access and modify it. Now that we understand what these modules are, from a generic concept to actual examples of real machines, the next big step is to understand how these modules interact. The next question to answer is: exactly how does an embedded processor execute an instruction?

¹⁹ Image © 2009, Microchip, http://www.microchip.com/wwwproducts/Devices.aspx?dDocName=en022957


Fundamental Physiology of an Embedded Processor

Now that we have outlined the structure of the building blocks of an embedded processor (memory, ALU, Control Unit, PC…) and briefly illustrated some real processors, it is time to describe their inner workings, i.e. how they behave with one another. Once the general ideas of structure (anatomy) and function (physiology) are understood, we can proceed to take a comparative look at how examples of actual embedded processors work performing similar tasks.

The computer program: a recipe for functionality

Although trivial in concept, it bears repeating: a computer “follows” a list of “instructions”, starting at the “beginning” and “executing” each instruction until the “end” of the recipe or program. From our previous glimpse into the anatomy of an embedded processor, we know the program memory (ROM) stores the program instructions in an addressable matrix of bytes or words. The Program Counter (PC) has the vague job of “knowing” which instruction to execute. It is the job of the control unit to know where to start the program (the beginning); fetch, decode and execute the instruction; and, as long as it isn’t the last instruction (the end of the program), advance to the next instruction, execute it, and so on…

The Program Counter: more than meets the eye

In Figure 5 we see our first look at a generic computer, complete with program memory, control unit and program counter. The program counter is connected to the program memory via the program memory address bus. The output of the program memory goes directly into the control unit. The program counter is also connected to the control unit. At any one moment in time, any computer is in the middle of a finite set of cycles, performing mundane tasks like:

• Calculate the address of the next instruction
• Load the address of the next instruction into the program counter
• Enable the program memory to use the address the PC is presenting
• Get the value of the memory location (fetch the instruction)

For the time being we will focus on these simple but vital tasks. Somehow, the control unit is “smart” enough to know what the next address is to fetch the next instruction. The key to understanding how something works is to “walk a mile in its shoes”, as it were; to follow it step by step as it does its job. Let’s consider a simple scenario with two questions: what actually happens when an embedded processor²⁰ is powered up, and how does the control unit orchestrate these events? As we are dealing with events that take place in time, it is traditional to illustrate these event relationships, which happen in time or in synchrony, with a “waveform” chart. A

²⁰ From this point forward, the terms “processor”, “embedded processor” and “computer” will all refer to the same thing.


well-organized chart can illustrate in a single picture what would take pages of text to describe. The following is a generic power-up sequence that could apply to almost any embedded processor. It is organized as several rows, each representing a particular signal as it changes in the time domain. The “signal” may represent an actual voltage or a logical state, for example 0V to 3.3V, or “asserted” or “not asserted”²¹. The rows of the waveform are linked in time, that is, they all start at the same time, and important events are usually labelled. In this example, the first row represents the logic condition of power being applied (the lower line illustrates the zero or off condition, and the upper line represents the high or on condition), rather than its actual voltage value(s). Important here is the idea that not all signals are valid at all times.

The first few hundred microseconds of life: the power-up timing waveform.

[Waveform chart: rows for POWER (power stable), RESET (start of reset, reset released), CLOCK (clock stable), ADDRESS, DATA/INST, and FETCH (fetch of the first instruction), aligned on a common time axis.]

Power-on sequence:
1. Power is applied to the chip (the beginning of time, as the chip sees it).
2. The reset signal is asserted, holding the chip in a reset state.
3. In the reset state, nothing happens within the computer, but the computer cycle clocks start oscillating and everything is poised, just waiting for the reset to be released. This is one of the most important times for a computer; without it, the control unit and program counter would be in unknown states²², which could cause the computer to go haywire, not knowing what state it is in, or where to go next.

²¹ It is more accurate to use the term “asserted/not asserted” to indicate the value of a particular state. “1 or 0”, or “true” or “false”, can all imply an implementation of a state. A logic 0 may be represented by anything less than 0.9 VDC in a 3.3V system, and may represent the asserted or enabled state of a processor reset signal, which would be logically “true” for its value.
²² By “unknown state”, consider what is physically happening in time when power is first applied to a transistor circuit. This all happens on the time scale of pico- and nanoseconds, but when dozens of transistors are linked together, it can take hundreds or thousands of nanoseconds to settle down into a known state.


Putting the computer in the reset condition gives the processor the time it needs to turn on, set up the physical transistors that make up the hardware to known conditions, and subsequently initialize the control unit to a known state.
4. Some time after the power is applied and stable (that is, within the operating range of the processor), the system clock(s) have started and are stable, the reset signal is released, and the control unit starts from its initial state. All this happens in a short period of time to us (2 – 5 milliseconds), a lifetime to a processor that can execute half a million instructions in a second.
5. The control unit loads a pre-defined address (processor dependent) into the program counter and fetches the first instruction from program memory. The word “fetch”¹² is often used to describe this control unit cycle, which can be summarized with the following steps:
   a. Start the fetch cycle.
   b. Using a FSM, assemble the pieces that will make up the next instruction address. In the case of the first instruction location, no calculation is necessary; it is a fixed location, so just use that address.
   c. On the appropriate system clock edge²³, logic-OR the address pieces²⁴ together and load the assembled new address into the program counter.
   d. Enable the new instruction address from the program counter onto the program memory address bus.
   e. Synchronized with the appropriate clock, and after there has been enough time for the address to be stable and valid (the last thing you want is the address changing while you are trying to read an instruction from program memory), enable the program memory output onto the instruction bus.
   f. Synchronized with the appropriate system clock, read the instruction into a holding register within the control unit.
   g. Set the next-instruction address increment value to be added to the current instruction address when the next fetch cycle starts.
   h. Fetch cycle complete; the control unit has now fetched the instruction from the program memory.
6. The control unit is ready to decode the instruction fetched, execute it and start the fetch cycle all over again.

²³ For this overview, we are playing fast and loose with the necessity of system synchronism. Assume on faith that every processor cycle is executed in time and in sync with a clock, or clocks, or portions of a clock, to ensure the data is taken or arrives where it belongs when it is valid to do so.
²⁴ The “pieces” referred to will be described in detail later; suffice it for now that the pieces may be an offset from the current location and the previous location, along with any increment pending.


[Figure 5: Control interconnections. Block diagram showing the Control Unit (CU), Program Counter (PC), ALU, Program Memory, Data Memory and an I/O Port, linked by buses.]

As you can see, there is more to the program counter than meets the eye; it is the signpost the entire computer uses to keep track of where it is in the instruction sequence. It is a lot more than just a simple placeholder; it is an integral player in the instruction fetch cycle as well as an essential element for proper program execution.

The Control Unit: Conductor, logic wizard and traffic cop

Up to this point we have glossed over many inner workings of the computer, limiting our focus to just how the computer knows what instruction to do next. Do not lose sight of the big picture: there is a program stored in instruction memory (ROM). This ROM holds the instruction sequence that is the program to be “executed”, whatever that means. The control unit is stuffing the program counter with addresses, fetching instructions and doing something with them. The next part of our discussion of embedded processor “physiology” is how the control unit “knows” what to do with the instruction it fetched from the program memory. The Control Unit is the very heart of any embedded processor. Ultimately it is responsible for knowing what instruction to fetch next, how to fetch it, how to set up for the next instruction, and how to decode and execute the instruction just fetched. It is a relatively complex FSM designed specifically to control the inner workings of the computer, according to basic cycle specifications like the fetch cycle mentioned previously, or in real time by decoding the cycle information contained within the instruction.

“110001110011”: I dig Computer Baby Talk!

So far, this tutorial has been pretty fast with the information; it is kind of like trying to drink water from a fire hose: possible, but a lot will spill out! So far, what do we know about the inner workings of a computer?


• The computer works by executing instructions in sequence.
• The program memory stores the instructions to be executed.
• The control unit, in conjunction with the program counter, can fetch instructions from program memory, decode and execute them, whatever that means.

So far, so good. Before we continue, let’s do a little Boolean algebra review. Recall that the number of permutations a particular binary number has is equal to 2^n, where n is the number of bits in the binary number. For example, a 4-digit binary number has 2^4 or 16 possible combinations. An 8-bit byte has 2^8 or 256 possible combinations. This organization is used extensively in computers to allow us to select one from many, or address one memory location from the tens of thousands of memory locations available to us. Like the ubiquitous “Apartment Number” analogy, for every memory location there is a unique address, just as there is a unique physical address or number for every apartment.

Making Bits Work

Remember the alarm example, where the finite state machine modelled the windows and doors of the house as bit locations, encoding the physical world into a digital representation of that world? Dedicated finite state machine circuits decode the binary bit locations to determine what door or window was opened. This is what is being done with the so-called “instructions”: bit patterns are being used to represent places and actions we wish the computer to access or execute. Now we can say it: when a computer “executes” an instruction, it means that particular instruction has a physical meaning associated with its unique bit pattern. That meaning is used to enable the sequence of events that is required to “execute” the meaning or command.

Instruction Encoding; how many bits do we need?

This is exactly how the control unit “knows” what to do with the instruction it fetched previously. Each instruction contains an encoded portion indicating what to do, who to do it to and what to do it with. This is a lot to ask a few bits to do. As we will see later when we compare processors, some only have 6 bits to encode instructions (what to do), enough for 64 opcodes, plus 8 bits of location or actual data information (what/who to do it with/to). That adds up to an instruction word that is 14 bits wide, implying the program memory had better be at least 14 bits wide. Time for a real example; let’s say our embedded processor has an instruction called “Add”. Its function is to cause the contents of some register (let’s call it “a”) to be added to the fixed value 0x14, with the result stored back in register a, wherever that is. The operation code (opcode[25]) for this instruction could be “110001” in binary, and the fixed (immediate) value might be 0x14[26], “00010100”. The entire instruction would be the concatenation (joining together) of the opcode and the immediate data, for a complete instruction “word” of “11000100010100”. The control unit is expert at this kind of binary “computer baby talk” and decodes the first six digits as the action to add the number represented by the last 8 digits to some register called ‘a’, wherever that is. Unfortunately, for most of us humans, it is tedious to impossible to manage lists of binary strings representing instructions and data. Some sort of help was needed for our simple minds to handle binary instructions.

[25] See definition: http://en.wikipedia.org/wiki/Opcode
[26] The traditional prefix for a hexadecimal number is the two character pair “0x”. Each hexadecimal digit is four bits wide, thus having 16 values, from 0 (0000) to F (1111). http://en.wikipedia.org/wiki/Hexadecimal
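The 6-bit-opcode, 8-bit-immediate word described above can be modelled with a few shifts and masks; the opcode value 0x31 (binary 110001) is the example from the text, not a real instruction set.

```c
/* Sketch of the 14-bit instruction word: a 6-bit opcode
 * concatenated with an 8-bit immediate value. */
unsigned encode(unsigned opcode6, unsigned imm8)
{
    return ((opcode6 & 0x3F) << 8) | (imm8 & 0xFF);
}

/* The control unit's view: split the word back into its fields. */
unsigned decode_opcode(unsigned word14) { return (word14 >> 8) & 0x3F; }
unsigned decode_imm(unsigned word14)    { return word14 & 0xFF; }
```

Encoding opcode 110001 (0x31) with immediate 0x14 gives 0x3114, i.e. the bit string “11000100010100” from the example.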

Looking Forward: Tools for us Humans:

The time has come to peek ahead and introduce the concept of mnemonics, the mapping of a human-understandable memory aid onto literal computer constructs[27]. The trick is to come up with a pseudo language of sorts, with descriptive verbs and nouns representing the operations we would like to perform. Which of the following identical statements is easier to understand?

“11000100010100” - or - “add a,#0x14”

The first string[28] of binary digits is traditionally called “machine language” (computer baby talk) and the second statement is called “assembly language”, a pseudo-English patois of suggestive verbs and nouns loosely cobbled together to garner meaning. Here is the beginning of what is called the “Tool Chain”, a very important concept in understanding how computers work. There are tools (actually applications that run on a separate development computer system) that help us translate language a human can understand into machine language a computer can execute: the actual, physical binary pattern stored in program memory. In this simple example, we would create a program using a stand-alone text editor or an editor within an IDE[29] containing, among other things, the “add” statement above, and use that human-readable text file as the input to an application called an “assembler”. The assembler translates the assembly language “source code” into the appropriate bit pattern. To complete the chain, that bit pattern is then combined with other bit patterns to form an executable bit image. This bit image is then “programmed” or “burned[30]” into the computer program memory ROM. We will be discussing the tool chain concept in more detail later. At this point in our tutorial, accept it on faith that there is indeed a way that humans can create programs that are ultimately physically manifested as bit patterns or instructions inside the processor, ready to execute when the power is turned on.

[27] Derived from Wikipedia definition: http://en.wikipedia.org/wiki/Mnemonic
[28] Be careful, this is not a binary number; it is a composite representation of opcode and data.
[29] “Integrated Development Environment”, a computer application that streamlines the creation of computer programs by integrating the editor, compiler, assembler and linker into a single user interface.
[30] “Burning” a ROM is a throwback to when physical metal fuses integral to the memory were burned away using a high current pulse, permanently setting the state for that memory location. The specific mechanisms for memory are beyond the scope of this paper; see http://www.howstuffworks.com/rom.htm/printable for more details.


Figure 6 Simple Tool Chain, Assembly to Bits in Computer
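The mnemonic-to-bits step of the tool chain can be sketched as a toy, one-instruction “assembler”; real assemblers handle symbol tables, many opcodes and linking, and the 0x31 opcode is just the example value from the text.

```c
#include <stdio.h>

/* Toy assembler: recognizes only "add a,#0xNN" and emits the
 * 14-bit machine word from the text's example encoding.
 * Returns 1 on success, 0 for an unrecognized statement. */
int assemble(const char *line, unsigned *word_out)
{
    unsigned imm;
    if (sscanf(line, "add a,#0x%x", &imm) == 1 && imm <= 0xFF) {
        *word_out = (0x31 << 8) | imm;   /* 110001 opcode + immediate */
        return 1;
    }
    return 0;
}
```

Feeding it the statement from the text, `assemble("add a,#0x14", &w)` leaves the word 0x3114 (binary 11000100010100) in `w`.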

Temporary storage: Data Memory.

So far we have explored how the control unit fetches an instruction from program memory and decodes it to perform some task. What is missing is where the control unit stores temporary data. (Remember register ‘a’?) Not everything is known when the program is compiled into machine language. For example, if the embedded system were a thermostat or an alarm controller, where does the computer store the current temperature? Where does the computer store which doors are closed? This is the job for Data Memory, a read/write volatile memory that the control unit can use to save intermediate results, or just about anything that can and does change with time. In the strictest sense, the program counter is an example of this type of memory; when the power goes away, the data is lost. Other modules rely on this kind of memory; the ALU uses temporary storage as scratch pad memory to hold intermediate results. The control unit relies on temporary data storage to keep track of the current state of the computer. Data memory is an integral part of the computer architecture, and it is necessary to understand how it interacts with the other modules we have seen so far.

[Block diagram: Program Memory, Data Memory, Control Unit (CU), Program Counter (PC), ALU and I/O Port, connected by buses]

As you can see from the illustration above, most computer architectures have at least some sort of program memory, data memory, program counter and control unit. We are almost ready to start looking at particular embedded processors. It is first necessary to understand the relationship between the program memory, data memory and dedicated volatile memory elements or registers.


Before continuing, let’s review the concept of a “register”. It is nothing more or less than an ordered set of bit(s), not unlike a memory location, that can “hold” a bit pattern. A register may be 1 or 256 bits wide. It might hold a single bit from the overflow of a binary addition, or the four-bit value that points to a portion of memory called a “file”. Whatever it holds, and however wide it is, it is a volatile memory element, usually “controlled” (set, reset, read…) by the control unit. Some embedded processors, for example, have a special dedicated register for everything. This is where the uniqueness of a processor manifests itself: how the functions and data are organized physically on the processor. Registers may be general-purpose scratch pads able to hold any value (say, the intermediate result of a logic operation) or special function registers holding binary values that correspond to the artefacts of the last instruction[31]. As embedded processors contain more and more functionality (timers, serial ports, A/D converters…) it is necessary to have volatile memory elements to keep track of all their settings and status. In some processors, there are over two dozen separate special function registers just for this purpose. Here is where the similarities end and individual architectures begin to diverge from the generic model. How does the computer organize the needs of program memory, data memory and special function registers? Keep that question close to mind as we continue our exploration of the computer’s last generic element, the ALU.
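A status-style special function register can be sketched as a byte of named flag bits; the flag names here are invented for illustration, in the spirit of (but not copied from) any real part’s register map.

```c
/* Hypothetical status register modelled as bit flags. */
enum {
    FLAG_CARRY = 1 << 0,
    FLAG_ZERO  = 1 << 1,
    FLAG_NEG   = 1 << 2
};

unsigned char set_flag(unsigned char reg, unsigned char f)   { return (unsigned char)(reg | f); }
unsigned char clear_flag(unsigned char reg, unsigned char f) { return (unsigned char)(reg & ~f); }
int test_flag(unsigned char reg, unsigned char f)            { return (reg & f) != 0; }
```

The control unit does exactly these set/clear/test operations on real flag bits after each instruction.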

The Arithmetic Logic Unit: Workhorse of the Embedded Processor.

Up to this point we have hinted that there is a module, managed by the control unit, that is used to do math and logic operations. Indeed, inside every embedded processor is an ALU that can take two or more operands, perform an operation on them (addition, subtraction, multiplication, AND/OR/NOT…), store the result in data memory or a purpose-built register, and update the artefacts associated with that operation (negative, zero, overflow, underflow, divide by zero…).

Figure 7 Simple ALU

[31] An “artefact” is a processor state that may change when an operation is performed. For example, if the result of a math operation is negative, a bit or “flag” could be set as an artefact of that operation. There are various flags or artefacts that are updated after each instruction by the control unit.


The ALU itself is a special purpose finite state machine designed to take control signals and operands and perform the function called out by the control unit. Its features include the operations it can perform, the size of the operands it can handle and the size of the results it can produce. Consider the addition of two 8-bit numbers. The sum could be larger than an 8-bit number can hold, so the ALU must be able to accommodate that possibility. The ALU may also need to provide some sort of floating point functionality (or at least mechanisms to support such operations), usually incorporated as partially hardware, partially custom math libraries for that processor.[32] The full complement of math operations takes up a lot of processor real estate, and compromises need to be made to get the maximum functionality in the minimum space with the best performance possible. It is possible to multiply or divide any two numbers using successive additions (or subtractions) but that would take a long time. Time then would be the compromise over the real estate[33] needed to have a hardware multiplier integral to the ALU.
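The 8-bit addition case above can be sketched directly: the 9th bit of the true sum becomes the carry artefact, and an all-zero result sets the zero artefact. This models the behaviour described in the text, not any particular chip’s ALU.

```c
/* Model of an 8-bit ALU add: returns the truncated 8-bit result
 * and reports the carry and zero artefacts through out-parameters. */
unsigned char add8(unsigned char a, unsigned char b, int *carry, int *zero)
{
    unsigned sum = (unsigned)a + (unsigned)b;        /* up to 9 bits */
    unsigned char result = (unsigned char)(sum & 0xFF);
    *carry = (sum > 0xFF);                           /* 9th bit set? */
    *zero  = (result == 0);
    return result;
}
```

For example, 200 + 100 = 300 does not fit in 8 bits, so the result wraps to 44 with the carry artefact set.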

Memory Organization and Program Memory:

However you slice it[34], every embedded controller operates on two types of memory: the memory that stores instructions (program memory) and the memory that temporarily holds values (data memory). In the previous real chip examples, we can see the PIC18 and AT89 both have separate, distinct memory areas, with separate data and address lines, as opposed to the R8C architecture, where the program and data memory is logically one monolithic block, with one common address and one common data bus. It is not the intention of this paper to compare and contrast the ramifications of this level of architectural choice[35]; we will however explore what the choices are and how they are used.

Recall that it is the control unit that calculates the address for the next instruction to fetch. In the case of the PIC18, this is an address that can accommodate a maximum of 8192 memory locations. As a review, how many bits are needed (the minimum number) to uniquely address every location in the program memory?[36] As it turns out, the internal modules in the PIC18 family of processors are almost identical, and while the PIC18F1330 may have a program memory depth of 8192 locations, each 16 bits wide, the “flagship” of the PIC18 family, the PIC18F8722, has up to 128 kBytes (64k words) of program memory. This common control structure is not by accident. It is to ensure instructions written for the least capable member of the family will work on the most capable member. In fact, the address latch for the program memory is a full 20 bits wide, allowing up to 2^20 or about one million memory locations (can we expect future versions of the PIC18 family with more program memory?). For our PIC18, however, we have more than enough address bits to accommodate the 8192 locations (16 kBytes) of instructions and constants[37].

As we mentioned earlier, the program counter is much more than a simple register; in this example it is almost a mini ALU in the operations it can perform to assemble the correct address for the next instruction. Keeping with the program memory theme, look at the Atmel 89C2051 memory. Although difficult to read from the simple block diagram, the literature specifies the program memory to be byte wide, with 16 address bits for a maximum of 65536 locations (64k, where 1k = 1024 locations). Our variant, the 89C2051, has 2k of program memory, or 2048 locations, each one byte wide. Consider this simple fact for a moment. Each program memory access of a PIC18F1330 processor returns 16 bits. Each program memory access of the 89C2051 returns half as much data. If both processors are running at the same speed, which one moves more data per unit time? We can’t answer that right now, but keep it in mind when we compare performance between our three embedded processors. To complete the program memory tour, the ARM-7 processor fetches 32-bit wide words from its program memory, while the PIC18’s program memory words are 16 bits wide. Very similar roles, but different approaches to how the program instructions are organized and addressed.

[32] It is mathematically possible to do any math operation using just two bits and a lot of RAM; it would just take a lot of instructions to orchestrate even a simple 16-bit integer addition. On the other hand, you could dedicate three separate registers, two 16-bit and one 32-bit, to hold the operands and the sum respectively. Controller architecture is a balance of what space you physically have and what operations can be done in software.
[33] It physically takes space in silicon to do anything, store a bit or make a control unit. Each processor designer is faced with the problem of trying to find space for everything the marketing people want in the new version. Compromises are made in performance or size (and power) when design decisions (architectures) are made.
[34] Off-hand homage to the so-called bit-slicers of old.
[35] As is often the case, the choice of one particular architecture over another has “religious” implications, with each ideology having its priests, each believing in their brand of “the truth”. More often than not, the choice of processor is cost, or number of pins, or “what chip did we use last?” or “how much do the tools cost?” and not some idealized architecture philosophy.
[36] 2^n = 8192, so n = log2(8192) = 13 bits.
[37] We include “and constants” on purpose when describing “program memory” as it is the ideal place to store values that are known when the code is assembled, and would overwhelm the limited data memory space. This convenience however comes at a cost, as we will soon see.
ECE4551

Page 20 of 31

03/12/2009 03:10:59

Figure 8: Microchip PIC18F8722

Memory Organization and Data Memory:

Previously we explored the anatomy of controllers, describing the blocks or modules that make up the controller. These blocks are indeed islands of functionality, for the most part independent of each other. That means while the control unit is “fetching” the next instruction, the ALU can be logically OR’ing two values, sending the results into a register, and the timer module (described later) can be counting down; all at the same time, all on the same chip of silicon. This ability begs to be used in a parallel fashion, and not simply in the serial “follow the recipe” concept of a computer program. The tricky part is, how does the computer keep track of all these independent operations? More on that later. Going back to our real life examples, consider the monolithic memory architecture of the ARM-7. To fetch an instruction, the control unit updates the program counter, then


places that address on the address bus. Some time later it reads the instruction from the ROM then executes it. If there are data values to be stored, it places the destination address on the same memory bus, and some time later writes it into the RAM. In this simple example, there is some “dead time”, that time between subsequent operations that could have been used doing things in parallel.

Figure 9 "Princeton" (Von Neumann's) Architecture

Figure 10 "Harvard" Architecture

Look at Figure 4, the PIC18 architecture; the program memory has its own address and data busses, distinct and separate from the data memory. This means the control unit can be building the next instruction address to be fetched while it is decoding the current instruction, and at the same time writing the results of the last ALU operation into the data memory. This level of parallelism is common to architectures that have separate program and data memory spaces. The figures above illustrate the basic memory architecture differences. The main idea to keep in mind is that in the “Princeton” architecture there is one program/data address and data bus, tying the program and data memory logically together. In the “Harvard” architecture (hmmm, I


wonder where they were invented…) the program and data memory are separate and distinct, each with its own address and data busses. Is one architecture better than another? It is a difficult question to answer without knowing the actual application. In a simple case, when the computer is reading a thermostat, checking it against a pre-set temperature and deciding if the heat should be turned on, it doesn’t take many cycles to accomplish the task, and a monolithic memory may be just fine. If more performance were required, a faster processor could be used. If, however, many functions were being handled at the same time, like reading the buttons on a USB Nintendo™ controller, processing 8 buttons as well as the angles of the left/right joysticks and sending the information back to the PlayStation in time to keep from being sliced in half by the Org you are battling, it may be better to have a microcontroller that can do as much as possible at the same time. The organization or architecture of the computer includes how the various volatile memory elements (registers) are organized and controlled. Some architectures use individual, separate registers for everything: see Figure 11. An alternative to having separate physical registers is the model used by the R8C and PIC18: the registers for the whole computer are contained in data memory as a set of registers, addressed like any other memory element and often organized as “files” or “blocks” of memory. This significantly reduces the complexity of the control unit while maintaining the flexibility of added functionality. For example, consider two processors from the same family, the PIC18F8722 and the PIC18F1330. Using the same register file architecture, it is possible to accommodate the five timers and 12 A/D modules using the same control unit that the PIC18F1330 uses to maintain 2 timers and 5 A/D modules.
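The “registers live in data memory” model described above can be sketched as fixed addresses in a data RAM array; the addresses and names below are invented for illustration and do not follow any real part’s register map.

```c
/* Data memory with special function registers at fixed addresses. */
#define DATA_MEM_SIZE 256
static unsigned char data_mem[DATA_MEM_SIZE];

enum {
    REG_W      = 0x00,   /* hypothetical working register */
    REG_STATUS = 0x03,   /* hypothetical flags register   */
    REG_PORTA  = 0x05    /* hypothetical I/O port latch   */
};

/* A register access is just a data memory access at a known address. */
void reg_write(unsigned addr, unsigned char v) { data_mem[addr & 0xFF] = v; }
unsigned char reg_read(unsigned addr)          { return data_mem[addr & 0xFF]; }
```

Because every register is just another memory cell, the same control unit and addressing logic serve a small family member and a large one; only the map grows.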


Figure 11 PPC 405 Core Programmer’s Model, copyright 2006 IBM

The other “fiddly bits”: Input – Output and additional processor functionality.

To round out the inner workings of the data memory, it is important to consider how our computer actually moves data into and out of the processor chip. In most architectures, an I/O pin is backed by a simple RAM element, usually organized as a register of individually accessible bits that sits on the data memory bus. To perform an output operation, the control unit asserts the direction control so the pin will electrically be an output pin. Figure 12 illustrates one bit of the I/O port for the PIC18. The simple port picture illustrated in Figure 4 glosses over the real work the control unit does in orchestrating an I/O operation. This is further complicated when the I/O pin can be an analogue input pin as well as a digital I/O pin.
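A simplified model of that direction-controlled pin logic: a TRIS-style direction register decides, bit by bit, whether the output latch or the external pad level appears on the pin. The register names follow PIC convention but the logic here is a generic sketch, not the actual silicon.

```c
/* One 8-bit port: direction register, output latch, external levels. */
typedef struct {
    unsigned char tris;   /* 1 = input, 0 = output, per bit */
    unsigned char lat;    /* output latch                    */
    unsigned char pad;    /* levels driven externally on input pins */
} port_t;

/* What appears on the pins: latch bits where output, pad bits where input. */
unsigned char port_pins(const port_t *p)
{
    return (unsigned char)((p->lat & ~p->tris) | (p->pad & p->tris));
}
```

With the low nibble configured as input and the high nibble as output, the pin state mixes the latch’s upper bits with the externally driven lower bits.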


Figure 12 Generic PIC18 I/O pin

Figure 13 Digital & Analogue I/O pin

In addition to input and output functionality, most microcontrollers these days have some sort of built-in timer capability. These timer modules operate independently once set and started, and provide a much-needed means to count events or elapsed time in an embedded application. Again, there is a trade-off between chip real estate and software overhead. Any timer function can be implemented in software using loops and tests, at the cost of having to execute in linear time with the program; no matter how fast a program is, it can only be doing one thing at a time. Implementing a timer in hardware, however, relieves the program of the burden of maintaining a count.
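The overflow arithmetic for such a hardware counter is easy to sketch; this is back-of-envelope math, assuming an n-bit up-counter and ignoring any preload value.

```c
/* Microseconds between overflows of an n-bit up-counter clocked at
 * clock_hz through a divide-by-prescale input stage. */
unsigned long overflow_us(unsigned bits, unsigned long clock_hz, unsigned prescale)
{
    unsigned long long counts = 1ULL << bits;    /* e.g. 65536 for 16 bits */
    return (unsigned long)(counts * prescale * 1000000ULL / clock_hz);
}
```

A 16-bit counter on a 32 MHz clock overflows every 2048 µs (about 2 ms); adding a divide-by-32 prescaler stretches that to 65536 µs (about 65 ms).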


Real world timers do more than just count; here the ARM-7 has two 32-bit timers[38], with selectable input count frequencies (multiples of the master clock) as well as input from I/O pins (to count events happening on a pin). It is a simple matter to preload a 16-bit timer[39] and count from 0x0000 to 0xffff. If the master 32 MHz clock were used as the count input, then the timer would roll over or overflow[40] every 2 or so milliseconds. If the fc32 divider is used, it would overflow roughly every 65 milliseconds. Timers, I/O ports, A/D modules, indeed most if not all computer special function modules need registers to hold configuration parameters (in the case of the timer it may be the flag bit that controls whether the timer restarts after an overflow). As we saw in the PIC18, these registers are part of the computer data memory area. In the case of the PPC405, they are individual registers, peppered all over the die. Figure 15 illustrates what is arguably the quintessence of microcontroller technology to date, the Philips LPC2114. Using what we have learned, let’s examine this “animal” closely. The key to evaluating a microcontroller is answering the fundamental structure (anatomy) questions first; then, if the structure is appropriate to the task, take a closer look at the functionality (physiology) of the beast. First the “bones”: how are the Program Counter, Control Unit and ALU arranged? How does the chip get/put data to the outside world? What other goodies are available (timers, USARTs, A/D…)? At first blush, the block diagram of the LPC2114 seems to be missing many essential elements. They are there; it just takes a little digging. This chip represents a new trend in embedded processors, that is, a common core of functionality surrounded by the I/O that makes that particular processor special. Here the core is illustrated in the block named “ARM7TDMI”. The actual “guts” of the core are illustrated in Figure 14.
The Program Counter is implemented by two elements, the “Address Incrementer” and the “Address Register”, which makes sense considering how the program counter normally functions. The other basic processor elements are present: the ALU, the Control Unit, and a data register. Some interesting additions are a 32 x 8 hardware multiplier and a barrel shifter interfaced with the ALU. These can help floating point operations by speeding up common math tasks in hardware. (Remember the real estate – performance trade-off discussion? Clearly, this chip is built with the building blocks of speed.) Another interesting element is the 31 x 32-bit register bank, a good place to hold intermediate results or condition flags; at this point in the investigation it is not clear, but it should not come as a surprise when looking at the “physiology” of this beast to discover there are such register locations in the bank.

[38] Philips LPC2124 Datasheet.
[39] Timers are characterized by how many bits are used to “count” with.
[40] A timer typically starts at some preloaded value and counts up, one bit at a time, until the maximum value is reached. Then, depending on the mode used, an “overflow” flag is set, the initial value is reloaded and the cycle starts all over again.


This cursory look at the ARM-7 core should convince you that it has the infrastructure to be a hot 32-bit processor. Now, let’s have a look at the I/O.

Figure 14 ARM-7 Core

Looking at Figure 15, we can now see where the program and data memory live: it is a monolithic block, with a single address and data bus. The I/O modules, however, have their own address/data bus, an interesting combination that should make the I/O somewhat independent of the ARM-7 core. In this initial look, it is clear this chip has been designed to accommodate almost any I/O scenario you would encounter with an embedded processor.


Figure 15 ARM-7 (Philips LPC2114) Block Diagram

Summary.

In this paper we have explored how any embedded processor is made up of the same elements, and each individual element has basically the same behaviour. These elements are mixed and matched to create the animal known as an embedded processor, and along with the software applications in the tool chain, make up a development environment the designer can use to solve real world problems using embedded processors. The heart of the embedded processor is the concept of the programmable finite state machine, that is, dedicated hardware that can reconfigure its functionality by following a set of instructions. This innovation has enabled designers to move away from directly manipulating bits in hardware to looking at problems from a modular perspective. As important as the hardware advances is the tool chain used to design the control software, the ultimate arbiter of any embedded system.


Embedded Processor questions to test for understanding:

1. What is the difference between a Finite State Machine and an embedded processor doing the same task?
2. How do you physically get the program into the embedded processor?
3. What is the first thing the embedded processor does when the power is turned on?
4. What is the difference between an 8-bit processor and a 32-bit processor?
5. What are the steps needed to translate the following text into machine code?

       int a = 5;
       int b = 3;
       if (a > b)
           a = a + b;
       else
           a = b;


Answers

1. The FSM has one task it is designed to perform; the embedded processor can be programmed to do a practically unlimited number of tasks.
2. The embedded processor is connected to a chip programmer, either in the circuit (so-called in-circuit programming) or as individual chips. The machine code file is used by the programmer to change the instruction memory on the chip.
3. At the beginning of time, all embedded processors are held in a reset state, allowing all the internal circuitry a chance to start in a known state.
4. Generally, the difference is in the internal representation of data. In an 8-bit machine, data is moved one byte at a time. In a 32-bit machine, data is moved (in the same time frame) 4 bytes at a time.
5. The text is input to a compiler that translates the human readable code (in this case “C”). The result of the compiler is assembly code, the processor-specific human readable version. Finally, that assembly code is fed into an assembler – linker tool that produces machine readable code, specific to that individual processor.


