Multi-tasking Concepts: HT, Dual Core, & Multi-Processor

If you’ve been around computers for some time, you can probably remember back to March 2000, when Intel formally introduced the gigahertz processor to the world. Touted by Intel as the "world's highest performance microprocessor for PCs," it was unbelievably speedy at the time. "Gigahertz" became part of common terminology, a number shoppers scanned for in every product review as shorthand for performance. Then processors became faster, and faster, and faster. 2 GHz? No problem. 3 GHz? Here you go. Inevitably, we closed in on a speed barrier: a point where it is no longer practical to crank up the clock, for one of many reasons. Because the demanding market makes a steady stream of newer, better processors essential, manufacturers were forced to find other improvements that would give consumers an incentive to purchase new hardware. In came an emphasis on multi-tasking.

Chances are you've heard about the new dual-core processors. The corporate giants, Intel and AMD, have shifted their emphasis to multi-tasking, with dual core as the primary driving force. However, the concept of computer multi-tasking has been with us for quite a while, and it is not limited to dual core. Multi-processor systems in servers and high-end workstations, as well as Intel's Hyper Threading Technology, have been with us for many years. Three technologies, one goal. You may wonder: what's the difference?

Let's take a step back. What exactly is multi-tasking? At the most basic level, it is the ability to do two or more things simultaneously. You are multi-tasking right now: reading this article and breathing. Likewise, in the more specific context of processors, multi-tasking is the ability of the processor to execute (or seem to execute; more on that later) more than one task at the same time.
With that ability, you can listen to your favorite soundtracks, finish your meeting presentation, and surf the web all at the same time. For most of us, it would be very hard to live without the ability to multi-task on a computer. Yet this was exactly what the first computers lacked. The original processors could only uni-task: if any peripheral needed to be accessed, the CPU had to come to a grinding halt and wait for the peripheral to respond and report back. This was excruciatingly slow, but as with all technological concepts, it improved.

Most mainstream computer users probably have a single-core, single-processor system inside the case. Yet you can still do more than one task at the same time with very little, if any, drop in speed. The processor seems to be processing two sets of code at once. Allow me to let you in on a little secret: it isn't! A typical single-core, single-processor CPU cannot process more than one stream of code at any given instant. The processor is quickly switching from one task to the next, creating an illusion that each task is being processed simultaneously. There are two basic methods by which this illusion is created: cooperative multitasking and pre-emptive multitasking.

Cooperative Multitasking

When one task is already occupying the processor, a wait line forms for the other tasks that also need the CPU. Each application is programmed so that after a certain number of cycles, it steps down and allows other tasks their processor time. However, this cooperation scheme is outdated and hampered by its limitations. Programmers were free to decide if and when their application would surrender CPU time. In a perfect world, every program would respect the other programs running alongside it, but this is not a perfect world. Even when Program A was hogging CPU cycles while Program B and Program C waited in line, there was no way to stop Program A unless it voluntarily stepped down. As a result of these shortcomings, cooperative multitasking was retired on the PC with the release of Windows 95. Apple's Macintosh OS used cooperative multitasking in every release up to Mac OS 9, largely because Apple could control many of the programs loaded onto its systems. Starting with OS X, Apple followed everyone else to the new kid on the block: pre-emptive multitasking.

Pre-emptive Multitasking

The inefficiency of cooperative multitasking left the computer industry scrambling for different ideas. Eventually, a new standard called pre-emptive multitasking took form. Here, the system has the power to halt, or "pre-empt," any code that is hogging CPU time. After forcing an interrupt, control passes to the operating system, which can hand CPU time to another task as it sees fit. Inconveniently timed interrupts are the greatest drawback of pre-emptive multitasking.
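Cooperative multitasking's voluntary hand-off is easy to sketch with Python generators, where `yield` plays the role of a program stepping down. This is only an illustrative analogy, not how a real OS scheduler works; the task names and step counts are made up.

```python
from collections import deque

def task(name, steps):
    """A well-behaved program: does one unit of work, then steps down."""
    for _ in range(steps):
        yield name  # voluntarily give up the CPU, reporting who ran

def run(tasks):
    """Round-robin scheduler: resume each task until it yields or finishes."""
    queue = deque(tasks)
    trace = []
    while queue:
        t = queue.popleft()
        try:
            trace.append(next(t))  # run the task until its next yield
            queue.append(t)        # it cooperated; put it back in line
        except StopIteration:
            pass                   # task finished; drop it
    return trace

print(run([task("A", 3), task("B", 2)]))  # → ['A', 'B', 'A', 'B', 'A']
```

Note that a task which never yields would loop forever inside `next(t)` and starve every other task, which is exactly the weakness of the cooperative scheme described above.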
But in the end, it is better that all your programs see some CPU time than that a single program runs negligibly faster. Most current systems, including Windows XP and Apple's Mac OS X, use pre-emptive multitasking.

Pre-emptive and cooperative multitasking are, as mentioned, "illusional" multitasking. There are also processors that can physically handle two streams of data simultaneously; the technologies behind them are dual/multi-processor, dual/multi-core, and simultaneous multi-threading (Intel's Hyper Threading). Before we delve into these, I want to make a few terms clear, since I will refer to them constantly:



• Thread: A sequence of tasks and instructions that the processor executes.

• Execution core: The part of the processor that actually does the processing.

• Registers: Very fast storage inside the processor core that holds the values the CPU is actively working with; they act as the work desk and hands of the processor. There are many types of registers (FPRs, GPRs, data registers, etc.), but for the purposes of this article you only need to know what they do.

• On-die/onboard cache: The very fast static RAM (SRAM) built into the processor chip. Since it is part of the physical processor, access speeds are much higher than those of normal system RAM. The onboard cache is the first pool of data the processor checks. Caches are denoted by the level at which they are accessed: level one cache is the fastest and is checked first, level two cache is second, and so on. I will use "cache" to mean the level two cache, because the level one cache is very small and, for our purposes, insignificant.

• Latency: In short, the initiation response time. In this article, latency describes the time between when data is requested and when the memory is able to send that data across the bus.

• Front Side Bus (FSB): The bus connecting the processor to the system memory. If the data the CPU is looking for is not found in the onboard cache (a "cache miss"), the processor looks to RAM. The faster the FSB, the faster the processor can fetch that data.

Simplified image of the RAM-CPU relationship

Two processors can easily chew through two streams of data, treating each as if it were the processor's only task. Each stream of data, or "thread," gets its own on-die cache, its own set of pipelines, and, most importantly, its own execution core. Today, dual/multi-processor systems are mainly seen in high-end workstations and servers, where their sheer processing power suits processor-intensive work.

To take advantage of a multi-processor configuration, the application must be able to split its work into multiple threads. A good number of these multi-threaded applications are multimedia and graphics tools. In CPU-intensive tasks such as 3D rendering and graphics editing, splitting the program into two or more threads and processing them independently would theoretically divide the processing time by the number of threads spawned or the number of processors present, whichever is smaller. Notice that I said "theoretically": this never quite happens in real life, but there is certainly a very noticeable speed increase. However, if an application is not multi-threaded, only one of the processors can be used effectively. If, for example, a dual Intel Xeon configuration at 2.8 GHz were used for a single-threaded program (such as many games out there), the setup would act as if only one 2.8 GHz Xeon were present. "Two processors always run faster than one processor" is a false statement.

Naturally, multi-processor-capable hardware is very expensive compared with normal mainstream components. Some of the costs involved in owning a multi-processor system include:

• Initial cost of the processors
• The need for specific, expensive registered memory
• A larger power supply and a higher electricity bill
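The divide-and-merge structure behind multi-threaded, CPU-intensive applications can be sketched in a few lines: split the job into independent chunks and run each on its own worker. Here `count_primes` is a hypothetical stand-in for a CPU-heavy task like rendering, and the limits and worker counts are illustrative. (CPython's GIL means plain threads won't actually speed up pure-Python number crunching; a real renderer would use native threads or multiple processes, but the structure is the same.)

```python
from concurrent.futures import ThreadPoolExecutor

def count_primes(bounds):
    """Hypothetical CPU-heavy task: count primes in [lo, hi)."""
    lo, hi = bounds
    def is_prime(n):
        return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))
    return sum(1 for n in range(lo, hi) if is_prime(n))

def parallel_count(limit, workers=2):
    """Split [0, limit) into one chunk per worker and merge the results."""
    step = limit // workers
    chunks = [(i * step, (i + 1) * step) for i in range(workers)]
    chunks[-1] = (chunks[-1][0], limit)  # last chunk absorbs the remainder
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_primes, chunks))

print(parallel_count(10_000))  # → 1229 primes below 10,000
```

The key point from the text above: if `count_primes` could not be split into independent chunks, adding a second worker (or a second processor) would buy nothing.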

Furthermore, a single server-line, multi-processor-capable processor tends to be slower than a mainstream desktop processor at an equivalent clock speed. Processors made specifically for servers and high-end workstations, such as the AMD Opterons or the Intel Xeons, tend to be conservative about performance, because servers need to be reliable above all. They frequently use registered ECC (error-correcting code) RAM, which costs more and is slower, but very rarely goes wrong. When these processors work together, however, they can provide massive processing power. There are two main types of multi-processor architecture: Symmetrical Multi-Processing and Non-Uniform Memory Architecture.

Symmetrical Multi-Processing (SMP)

SMP is the most commonly used multi-processor architecture, and it is based on a simple concept: the processors share a common front side bus from which to collect and export data. SMP is most commonly found on the x86 architecture, including the Pentium Pro P6-based processors (the Pentium II and Pentium III lines) as well as the AMD Athlon MP and select Opteron processors. Thanks to its low cost, SMP is more common in lower-end, small-scale systems than in high-end, large-scale servers and clusters.

Depiction of SMP

With a virtually symmetrical architecture, latencies remain constant between the processors. But there is a downside to SMP. Almost all modern processors run at a much higher speed than the memory they fetch data from; the onboard cache exists to compensate for that timing difference. However, cache misses are very common, because the processor works through more data than the onboard cache can hold at any given instant. As a result, the processor is bottlenecked by the speed of the front side bus or the memory, and SMP makes the bottleneck worse: because the processors share a common front side bus, the bus often becomes congested. As the saying goes, a chain is only as strong as its weakest link. The bottleneck is less evident in lower-end, small-scale systems, but large-scale systems with many processors will see lag reminiscent of a Los Angeles rush-hour traffic jam. And that is why there is NUMA.

Non-Uniform Memory Architecture (NUMA)

NUMA is widely considered the more efficient and sophisticated of the two main multi-processor architectures. Instead of SMP's shared front side bus, NUMA has two (or more) independent front side buses, along with a high-speed bus connecting the CPUs. This is made possible because each processor has its own integrated memory controller, which lets it access its own pool of RAM. The biggest advantage of this architecture is that it solves the bottlenecking issue found in SMP; latency is kept to a minimum. The downside is the expense: as always, you have to pay bigger bucks to get speed. NUMA is most often found in higher-end AMD Opteron systems.

Depiction of NUMA

Systems requiring smaller-scale multi-tasking support have now found an answer in multi-core processors. If you've been following computers to any degree, chances are you've at least heard of dual-core systems. It all started back in 2001 with the IBM POWER4 processors. Since IBM broke that ground, Intel and AMD have both caught onto the concept and expanded on it, each touting it as the technology of the future. Note that from here on I will refer to the combined dual/multi-core technology simply as dual core; multi-core will happen, but we are not there yet.

Intel Pentium-D. Image courtesy of PC Per. Notice how thick the die is.

But what exactly do dual-core systems offer? Architecturally, a dual-core processor is simply two execution cores on a single die; threads can be processed in two separate, physical execution cores in parallel. Because the two cores share some of the processor's resources, dual-core processors replicate some, but not all, of the multi-tasking ability of a true multi-processor system. Intel, AMD, and IBM's PowerPC dual-core processors make up the large majority of the dual-core processors on the market right now. Here's an architectural overview of each one.

Pentium-D/Extreme Edition-based Dual Core

In a nutshell, the Intel approach is SMP on a single chip: two cores are attached together on one processor package. The cores work independently, with no communication between them until they reach the front side bus. Each core has its own onboard cache, its own set of registers, and its own architectural state. The cores share an 800 MHz front side bus, which is one of the downsides of the architecture: for the two cores to communicate, signals must travel off the chip, down the front side bus, and back up again.

Intel Pentium-D architecture

Here are some of the amenities Intel has prepared in the Pentium-D:

• EM64T (Extended Memory 64-bit Technology): 64-bit processor support similar to AMD's AMD64. With the processor able to address memory in 64-bit mode, the added processing power of dual core helps.
• XD/EDB (Execute Disable Bit): an anti-malware feature available on all Pentium-Ds.
• EIST (Enhanced Intel SpeedStep Technology): Intel's version of "on-demand power," available on all models but the 820.

The Pentium-D processors target mainstream users and are priced somewhat reasonably; as of Q4 2005, the cheapest Pentium-D sells for around 270 USD. Many big computer builders, such as Dell and HP, have unsurprisingly jumped onto the Pentium-D bandwagon. Intel is currently working on a newer dual-core technology to replace the Pentium-D's Smithfield architecture.

The Pentium Extreme Edition, not to be confused with the Pentium 4 Extreme Edition, is essentially a Pentium-D with Intel's Hyper Threading Technology enabled, theoretically providing four logical processors. As with many of Intel's top-end models, its multiplier can be changed to manually increase the net clock speed, a method of overclocking. As you might imagine, the Pentium-EE carries a very heavy price tag. You may wonder: what is Hyper Threading? I will get to that shortly; for now, know that it is a single-core multi-tasking technology (even though I said such a thing doesn't exist). Take note: the Celeron-D, despite the "D," is a single-core processor. The "D" in both Pentium-D and Celeron-D, according to Intel, stands for "different."

AMD's Dual Core Technology

AMD decided to make the best possible use of its integrated memory controller in its dual-core design. Before launch, there were rumors that AMD would have trouble delivering a dual-core processor because of the complications involved in sharing an integrated memory controller. The dual-core AMD Opterons and Athlon 64 X2s silenced all doubts when they were released in 2005. The two cores (labeled CPU 0 and CPU 1) have their own independent level two caches. While there is only one memory controller, AMD integrated a System Request Interface and a Crossbar switch that allow the two cores to communicate with each other within the die. Requests then travel over the HyperTransport bus, AMD's special bus connecting the processor to the rest of the computer.

AMD dual-core diagram, courtesy of AMD. Currently, the highest Athlon 64 X2 processor is the 4800+.

This architectural approach is surprisingly effective. However, the Athlon 64 X2, AMD's mainstream dual-core line, is priced fairly high. AMD later brought the same dual-core concept to its high-end workstation/server product line, the Opteron: the Opteron 2xx and 8xx series boast dual-core designs very similar to the Athlon 64 X2's.

Apple: IBM's last stand

Seemingly out of nowhere, in early June 2005, Steve Jobs stood in front of an audience full of reporters and announced a radical system change for Apple, from IBM's PowerPC to Intel's line of processors. What was left unsaid was that Apple was not done with IBM yet. In late 2005,

Apple announced its PowerMac G5 line with IBM PowerPC dual-core G5 processors. As the frequently preferred platform of graphic designers and audio and video editors, Apple dove into dual core, offering parallel processing to those who would benefit from it most. Apple currently offers dual-core G5s across its PowerMacs, and two dual-core G5s on its flagship model. The IBM PowerPC 970MP, IBM's name for the dual-core G5, behaves much more like the AMD Athlon 64 X2/Opteron dual core than like the Pentium-D. Each core has its own 1 MB of level two cache, along with two Velocity Engine units, four floating-point units, and four integer units. What does it all add up to? A good deal of processing power.

Intel's Hyper Threading Technology

Developed in 2002-2003, Intel's Hyper Threading Technology ("HT Technology" for short) is a way for two threads to be processed on a single core simultaneously. It is found in all Intel Pentium 4 processors with an 800 MHz or 1066 MHz front side bus, in the 3.06 GHz Pentium 4 with a 533 MHz front side bus, and in all Pentium Extreme Edition and Pentium 4 Extreme Edition processors. It is, in a sense, "real" single-core multi-tasking, and is commonly referred to as Simultaneous Multi-Threading (SMT).

When a thread is executed on a core, it does not use all of the resources available. Modern processors are almost never one hundred percent efficient; in reality, one thread will occupy around thirty to sixty percent of the available execution units. Hyper Threading streams an additional thread into the processor, using the vacant execution resources. It is not quite as effective as dual core at processing two threads simultaneously, because under HT the overwhelming majority of resources are shared between the two executing threads.

Images courtesy of Intel

It is important to note that a Hyper Threading processor behaves as a plain single processor when only one thread is present. As soon as a second thread arrives, the processor switches from Single-Task (ST) mode to Multi-Task (MT) mode.

Intel came up with a clever way of making Hyper Threading work with the operating system; after all, every new technology needs to be supported by proper software. A processor with HT presents itself to the operating system as two standalone processors, and the operating system treats the two threads as if each had its own individual resources. The threads share the caches, but each gets its own architectural state and APIC (Advanced Programmable Interrupt Controller). Once they make it through, both threads land on the same execution core to be processed.
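You can see this from code: the operating system counts each logical processor as a CPU of its own, and that is the number most software sees. A tiny sketch using Python's standard library; note that `os.cpu_count()` reports the *logical* count, so an HT-enabled single-core Pentium 4 would report 2, and distinguishing physical cores from logical ones requires platform-specific queries or a third-party library such as psutil.

```python
import os

# os.cpu_count() counts logical processors: physical cores multiplied
# by the number of hardware threads each one exposes (2 with HT).
logical = os.cpu_count()
print(f"Logical processors visible to the OS: {logical}")
```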

Screenshot of a Pentium 4 2.8 GHz CPU with HT

Windows Task Manager sees two logical processors, and it streams two threads to what it believes are independent, fully parallel processors. The logical processor is, for all intents and purposes, treated as if it were acting entirely on its own resources, which is exactly what Intel wants the OS to do. Also notice in the screenshot that the two logical processors have similar, but not identical, workloads. This shows that one logical processor can keep working while the other stalls on, among other things, a cache miss. Because HT depends on the operating system, take note of compatibility. The following operating systems are compatible with Hyper Threading:

• MS Windows XP Home
• MS Windows XP Professional*
• MS Windows 2000 Professional**
• Linux operating systems released after Nov. 2002, including but not limited to:
  o Red Hat Linux 9 and above (Professional and Personal)
  o SuSE Linux 8.2 and above (Professional and Personal)
  o RedFlag Linux Desktop 4.0 and above
  o COSIX Linux

*Including all branches based on XP Pro, including but not limited to Tablet Edition and XP x64
**Compatible, but not optimized

The following operating systems do not support Hyper Threading:

• MS Windows Me
• MS Windows 98 SE
• MS Windows 98
• and, needless to say, MS Windows 95

To take full advantage of what HT has to offer, applications must support SMT. If only a single thread is streamed into the processor, the processor acts as if Hyper Threading didn't exist. Multi-threaded applications are generally found among the CPU-intensive ones, such as video editing, digital image editing, and rendering software. Adobe Photoshop, for example, is multi-threaded and makes good use of Intel's Hyper Threading Technology. Hypothetically, the performance boost would be twofold, since two threads are executed simultaneously. Because of the shared resources, the realistic boost averages around 20%, with most applications seeing a 5-10% gain. In the scheme of things, however, a 10% speed boost when rendering a heavy image file is a considerable improvement. In a test done by X-bit Labs, for example, HT yielded a 15.8% performance increase in 3ds Max 5 rendering (the Underwater scene) over its non-HT counterpart. That amounted to 45 seconds saved; imagine how much time you could save rendering a heavier file.
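The gap between the twofold ideal and the observed ~20% can be framed with Amdahl's law, which the article does not cite but which formalizes exactly this limit: if only a fraction p of the work effectively runs in parallel across s execution streams, the overall speedup is 1 / ((1 - p) + p / s). The 0.4 effective-parallel fraction below is an illustrative assumption, not a measured figure.

```python
def amdahl_speedup(p, s):
    """Overall speedup for effective parallel fraction p on s streams."""
    return 1.0 / ((1.0 - p) + p / s)

# A perfectly parallel workload on two real cores: the ideal 2x.
print(round(amdahl_speedup(1.0, 2), 2))  # → 2.0

# If contention for shared HT resources leaves only ~40% of the work
# effectively parallel, the model lands near the reported gains.
print(round(amdahl_speedup(0.4, 2), 2))  # → 1.25
```

The same formula explains why a single-threaded program (p = 0) sees no benefit at all from a second core, logical or physical.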

The original idea behind Hyper Threading was to alleviate the effects of a cache miss: if one thread halts on a miss, the other thread can keep working. But HT had a much broader effect. The technology put multi-tasking-capable systems within reach of many mainstream users. While dual-processor setups had been available in the form of Pentium Pro-based SMP (and later the Pentium III), the first affordable multi-processing came in the form of one efficient single core. There was no need for an expensive motherboard or a new power supply to feed two chips chewing through your monthly electricity bill. It was an open incentive for software companies to make their applications multi-threaded, setting the stage for dual-core and dual-processor systems to enter the mainstream.

A note on games: as of now, there is still a distinct lack of games that take full advantage of SMT and what Intel's HT has to offer. However, with the emergence of dual core as a universal technology, we should see more and more games that use multi-threading, and that will benefit Hyper Threading, at least to a small degree.

Some final words…

So, is this multi-tasking concept worth buying into? The mainstream, low-cost availability of all these multi-tasking tools, whether dual core, dual processor, or just an Intel Hyper-Threading-enabled processor, has challenged the reasoning behind purchasing a processor without one. Is it necessary? For most people, hardly, at least right now. But the computer giants' will to change the market will quickly transform luxury into necessity. Look out for one the next time you visit your local electronics retailer. After all, remember that at one point "640K ought to be enough for anybody," a line famously, if perhaps apocryphally, attributed to Bill Gates.

In case you're interested in further reading…

Multi-Processors:



• Hardware Central - Dual Processor Workstation: Super Highway or Dead End?
  http://hardware.earthweb.com/chips/article.php/600091
• Digit-Life - Non-Uniform Memory Architecture (NUMA): Dual Processor AMD Opteron Platform Analysis in RightMark Memory Analyzer
  http://www.digit-life.com/articles2/cpu/rmma-numa.html
• SourceForge - What is NUMA?
  http://lse.sourceforge.net/numa/faq/index.html

Dual-Core:

• Webopedia: All About Dual-Core Processors
  http://www.webopedia.com/DidYouKnow/Hardware_Software/2005/dual_core.asp
• The Inquirer: Athlon 64 and Opteron dual-core explained
  http://www.webopedia.com/DidYouKnow/Hardware_Software/2005/dual_core.asp
• Apple: PowerMac G5 Overview
  http://images.apple.com/powermac/pdf/20051018_PowerMacG5_TO.pdf

SMT/Hyper-Threading:

• Intel Hyper-Threading Technology Overview
  http://www.intel.com/business/bss/products/hyperthreading/overview.htm
• Intel Hyper-Threading Technology
  http://www.intel.com/technology/hyperthread/index.htm
• Digit-Life - Intel Hyper-Threading Technology Review
  http://www.digit-life.com/articles/pentium4xeonhyperthreading/
