Athlon-based dual-processor systems
By Stanislav Garmatyuk and Evgeniy Severenovskiy

With the release of the first Athlons AMD adopted an aggressive pricing policy. Since the chip turned out to be very good, AMD seized a large part of the PC market in a short time. Now the company is targeting the market of high-performance workstations and servers. According to AMD, even the first Athlons (Slot A) could work in dual-processor configurations. But a multiprocessor system needs an appropriate chipset, and one has appeared only recently. First the company established that the processor was stable, fast and compatible. After that it ramped up production to volumes sufficient to win a considerable part of the market. Then it released an improved version of the core - Thunderbird, and a low-end version of the processor - Duron. There is another factor: SMP systems sell in far smaller numbers than ordinary ones. At the time the first Athlons were produced, AMD had no contacts with manufacturers of SMP systems. Besides, an ordinary user is more eager for new products than a user of SMP systems.
AMD-760MP chipset
Specification:
• AMD-762 north bridge (system controller):
  o single- and dual-processor configuration support;
  o FSB 200/266 MHz;
  o up to 4 GBytes PC1600/PC2100 Registered DDR SDRAM;
  o up to 4 DIMMs;
  o AGP 2.0 with 1X/2X/4X modes support;
  o PCI 2.2 with 33/66 MHz and 32- and 64-bit modes support;
• AMD-766 south bridge (peripheral bus controller):
  o north bridge interface via the PCI bus;
  o dual-channel EIDE controller with UATA 33/66/100 support;
  o LPC bus (Low Pin Count) and SM-Bus support;
  o USB controller (4 external ports);
  o built-in IOAPIC, IRQ serialization support;
  o ACPI and Microsoft PC'99 specs compatibility.
Here we can see support not only for the usual 32-bit 33 MHz PCI bus, but also for higher-performance versions: a 64-bit bus working at 33 MHz and a 64-bit bus at 66 MHz. But the Tyan Thunder K7 we tested offered only the first of these modes: 64 bit/33 MHz. In the AMD-762 documentation the usual south bridge is replaced with a "Future Southbridge": AMD simply does not yet have a finished south bridge chip that can work in the 64 bit/66 MHz PCI mode. That is why boards supporting this mode are not produced either. It should be noted that AMD-760MP systems require so-called registered memory. In practice this means that ordinary unbuffered PC1600/PC2100 modules won't work in such boards. Besides, we found a document on AMD's site named "Incompatibilities between the AMD-762 System Controller and the VIA Technologies VT82C686B "Super South" Southbridge". Its existence means that we shouldn't expect boards combining a north bridge from AMD with a south bridge from VIA.
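To put the three PCI modes in perspective, their theoretical peak bandwidth can be estimated as bus width times clock. The sketch below is only an illustration: the function name is ours, the clocks are taken as the nominal 33.33 and 66.66 MHz, and real sustained throughput is lower because of arbitration and protocol overhead.

```python
def pci_peak_mb_per_s(width_bits: int, clock_mhz: float) -> float:
    """Theoretical peak: one transfer of `width_bits` per bus clock."""
    return width_bits / 8 * clock_mhz

# Nominal PCI clocks; the labels mirror the modes named in the text.
PCI_MODES = {
    "32 bit/33 MHz": (32, 33.33),
    "64 bit/33 MHz": (64, 33.33),
    "64 bit/66 MHz": (64, 66.66),
}

for label, (width, clock) in PCI_MODES.items():
    print(f"{label}: about {pci_peak_mb_per_s(width, clock):.0f} MBytes/s peak")
```

By this estimate the 64 bit/33 MHz mode of the Thunder K7 doubles the peak of an ordinary PCI bus, while the missing 64 bit/66 MHz mode would double it again.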
Tyan Thunder K7 - the first dual-processor platform for Socket A
This motherboard is so far the only AMD-760MP board produced in volume. That is not surprising, since Tyan developed the reference design of dual-processor boards for AMD. The board belongs to the Thunder series - the top line of Tyan motherboards, meant for workstations and servers. In fact, the Thunder K7 has everything needed to build a server or a powerful workstation. Look at the specs of the board:
• AMD-760MP chipset (AMD-762 plus AMD-766);
• two Socket A's (Socket 462);
• 4 slots for Registered DDR DIMMs;
• 5 PCI 64 bit/33 MHz slots;
• 1 AGP Pro slot;
• ATI RageXL based integrated video adapter (4 MBytes video memory);
• 2 integrated 3Com 3C920 10/100 Mbit/s LAN chips and 2 RJ45 network connectors;
• dual-channel Ultra 160 SCSI controller Adaptec 7899W, 2 68-pin LVD SCSI connectors;
• 2 IDE connectors (UATA 33/66/100) and one FDD connector;
• 4 USB connectors (2 additional on a separate bracket);
• 2 COM ports, LPT port, PS/2 connectors (keyboard and mouse);
• Winbond W83627HF based system monitoring (LPC Super I/O + Hardware Monitor);
• 8 fan connectors, three of them with rotation speed monitoring.
The Thunder K7 uses Ethernet chips from 3Com, while all other boards from Tyan are equipped with Intel 82559 chips. The board has a wealth of integrated devices, which can be disabled with jumpers if necessary. During the tests we disabled the graphics chip and one of the network chips, but they kept on heating up. It seems that voltage is supplied to them regardless of whether they are used or not, which does not look like a rational design. The Thunder K7 uses Phoenix ServerBIOS 2 Release 6.0. The BIOS Setup is not rich in options; there is only one interesting function - AMD Super Bypass (which works when only one processor is installed). Fine memory tuning and multiplier adjustment are absent from this serious board. In fact, all you can do is disable unused devices and set the FSB frequency (200/266 MHz) with jumpers. The integrated video chip is connected to the PCI bus, not to the AGP one. Server assemblers do not like AGP video cards, and the developers of the board took this into account. The memory slots are tilted at an angle of 25 degrees, which also shows that the board is intended for servers.
This arrangement is used for assembling rackmount servers in 1U/2U cases. But as you know, AMD Athlon processors, especially those working at 1000 MHz and higher, are demanding as far as cooling is concerned. Coolers able to ensure stable operation are often taller than memory modules installed the usual way. That is why such cases call for some exotic cooling solutions.
Power supply unit
Taking into account the energy consumption of the top-end Athlons, a power supply unit for multiprocessor systems must be very powerful. Today the Tyan site lists only two recommended power units, from Delta and NMB, each rated at 460 W. At peak load one AMD Athlon 1400 MHz consumes 70 W, so two take 140 W. An additional video card can take as much as 110 W. Besides, we have a SCSI controller, a network controller, a video chip, a 10,000 rpm SCSI hard disc... You can see that the system is power-hungry. A standard ATX unit of the required power doesn't suit here. Power is supplied to the Thunder K7 through two connectors - a main 24-pin one and an additional 8-pin one, through which the processors are fed from a separate source. Considering how power-hungry the top-end Athlons are, this solution looks rational. But such power units are non-standard, which is why they will cost quite a lot.
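The power arithmetic above can be written down as a rough budget. This is only a sketch using the figures quoted in the text; the function name is ours, and it ignores power supply efficiency, per-rail limits and the consumption of the devices for which the article gives no numbers.

```python
PSU_WATTS = 460             # recommended Delta/NMB units, per the text
CPU_PEAK_WATTS = 70         # one Athlon 1400 MHz at peak load, per the text
VIDEO_CARD_MAX_WATTS = 110  # upper figure quoted for an add-in video card

def remaining_headroom(psu_watts, cpu_count, cpu_watts, video_watts):
    """Watts left for everything else after CPUs and the video card."""
    known_load = cpu_count * cpu_watts + video_watts
    return psu_watts - known_load

headroom = remaining_headroom(PSU_WATTS, 2, CPU_PEAK_WATTS, VIDEO_CARD_MAX_WATTS)
print(f"Left for board, SCSI, LAN, discs and fans: {headroom} W")
```

Two processors and a video card alone account for 250 W of the 460 W, which is already at the level of an entire ordinary ATX unit of that time.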
Testing technique
Since stability and reliability are of the utmost importance for servers, the market of server systems is conservative. That is why dual Athlon systems are meant primarily for workstations, and we laid special stress on the tasks typical of such computers. We selected tests which use applications from different categories - intensive calculation tasks, work with 2D graphics, visualization and rendering of 3D graphics in professional packages, multimedia data processing, video editing. We used applications optimized for multiprocessor systems. Tests:
• Visualization and rendering of a 3D scene in 3D Studio MAX R4.
• Rendering in LightWave 6.5.
• Adobe Photoshop 5.5 - the Grand Pix test set.
• SPEC ViewPerf 6.1.2 - emulation of work in 6 applications which use professional OpenGL.
• Stress test running several copies of an archiver, each compressing a 256 MBytes file.
• Clip encoding in Windows Media Encoder 7.0.
As an additional test we installed a professional video capture card, the Pinnacle Targa 3000, into the dual-processor Athlon system and studied how the computer works as a non-linear video editing station.
Configurations of test platforms
Officially only the Athlon MP works in dual-processor configurations, but we couldn't obtain any. That is why we tested the Thunder K7 with ordinary Athlons at 1.4 GHz and 1 GHz (both intended for the 266 MHz FSB), which worked together flawlessly. The Athlon 1 GHz variant was added for comparison with the Pentium III of the same frequency. Apart from the base test platform on the Thunder K7 we were given one more dual-processor system, built on the "heavyweight" Tyan Thunder HEsl board with the ServerSet III HE-SL chipset from ServerWorks. Both boards - Thunder K7 and Thunder HEsl - have similar characteristics: two processors, up to 4 GBytes of registered memory, integrated Ethernet chip(s) and Ultra 160 SCSI controller, AGP Pro slot, 64-bit PCI bus. The prices are also close, and given the small difference in processor prices, the finished systems will cost almost the same. Besides, for high-end workstations which need large amounts of memory (2 GB and more), the ServerSet III HE-SL boards are the only choice today. New high-end boards using the AMD-760MP will have to compete against exactly such platforms. In addition, we took two more systems:
• A dual-processor Pentium III system on the VIA Apollo Pro133A (694X) chipset. This platform serves as an example of an inexpensive dual P-III system with reduced capabilities but good efficiency.
• A uni-processor system with the Pentium 4 1.7 GHz on an Intel i850 board. This platform is the leader in processor clock speed, and we are mainly interested in comparing its efficiency with the results of dual-processor systems in applications of different levels.
The configurations of the test stands are given in the table. In all systems we used 512 MBytes of memory, IBM Ultrastar 36LZX hard discs (10,000 rpm, Ultra 160 SCSI, 4 MBytes cache), and a GeForce3 based AGP accelerator (ASUS V8200 Deluxe with 64 MBytes of memory), which showed excellent results in professional OpenGL applications when working with textured and wireframe 3D models. One more possible contestant was a 440BX based dual-processor system. But dual-processor boards on the 440BX are equipped with Slot 1 connectors, and it is difficult to find a Slot 1 Pentium III for the 100 MHz bus. That is why the dual-processor 440BX system was excluded from the tests. The tests were carried out under Windows 2000 Professional with Service Pack 2, DirectX 8.0A and Detonator 12.41 drivers. For each system (except the Pentium 4) all tests were run both in uni- and in dual-processor configurations. For the dual-processor Athlon system we used the Delta DPS465AB-A power unit, for the others - an FSP Group power unit of 250 W or 300 W.
Test system configurations:

Processor | Athlon | Pentium III | Pentium III | Pentium 4
Clock speed | 1.4 GHz/1 GHz | 1 GHz | 1 GHz | 1.7 GHz
Chipset | AMD-760MP | ServerSet III HE-SL | VIA 694X (Apollo Pro 133A) | Intel i850
Motherboard | Tyan Thunder K7 (S2462UNG) | Tyan Thunder HEsl (S2567U3AN) | Tyan Tiger 200 (S2505DNGR) | AOpen AX4T
Memory | 512 MBytes Registered PC2100 DDR SDRAM | 512 MBytes Registered PC133 ECC SDRAM | 512 MBytes PC133 SDRAM | 512 MBytes PC800 RDRAM
HDD | IBM Ultrastar 36LZX (DDYS-T18350, 10000 rpm, 4 MBytes buffer) in all systems
SCSI controller | Adaptec AIC-7899W (integrated) | LSI Symbios 53C1010-66 (integrated) | Adaptec 29160N | Adaptec 29160N
Video card | ASUS V8200 Deluxe (NVIDIA GeForce3, 64 MB) in all systems
Test results
Rendering and visualization in 3D Studio MAX R4
3D Studio MAX is the most popular 3D modeling package, and its algorithms make effective use of multiprocessor systems. We conducted two tests in 3D Studio MAX R4. The first one was a final rendering of a scene measuring 800 * 600 pixels which contains 32,252 polygons and 16 light sources. The result is the time each system needed to complete the calculations.
All systems get nearly double acceleration when the second processor is installed. The Athlon 1.4 GHz is the absolute champion both in uni- and in dual-processor systems. The Athlon 1 GHz comes right after the leader, beating the Pentium 4 and both Pentium III systems. The Athlon systems look very attractive for users who work with 3D Studio MAX. The efficiency depends only on the computing power of the processor - no other subsystems affect it. Look at the two completely different Pentium III platforms (ServerSet III HE-SL and VIA 694X), which demonstrate the same performance.

The second test uses the video subsystem and measures the speed of displaying moving objects in the projection windows. Working with wireframe models is something all users of this package deal with.
The test was rather heavy: different representations of a scene of 28,688 polygons were displayed in four projection windows. The hardware OpenGL acceleration of the video card was used. Anti-aliasing was enabled for the wireframe mode. Apart from the video card, the test loads the central processor(s) and the AGP bus through which data travels to the accelerator. In this case the results achieved with one or two processors do not differ much, so we give the data for the dual-processor systems.
The Pentium 4 system outscored its nearest competitor - the dual Athlon 1.4 GHz - almost twofold, most probably because of optimization of the video card drivers and the "native" AGP implementation from Intel. The Pentium III system on the Thunder HEsl (ServerWorks chipset) lagged behind the VIA 694X by a small margin.
Rendering in LightWave 6.5
This package from NewTek is among the three most popular programs for 3D modeling and animation. We included it in the tests since its rendering algorithms differ considerably from those used in 3D Studio MAX. On all systems we rendered a scene of 5 objects containing 10,080 polygons and 3 light sources and using reflection and refraction effects. The size of the image was 1000 * 1000 pixels. In the rendering settings the number of threads was set equal to the number of processors.
The key peculiarity of LightWave is that it is not a native application for the Wintel platform (Windows/Intel) - initially this package was developed for Silicon Graphics computers. Its code is "plain x86", with no regard for the specifics of particular processors from Intel or AMD.
All processors of the same frequency showed almost identical results, with the Athlon 1.4 GHz taking first place (both in the uni- and in the dual-processor groups). The Pentium 4 comes among the last despite having the highest frequency (in applications not optimized for its architecture it doesn't show high scores).
Adobe Photoshop, Grand Pix test
Photoshop is the standard for processing raster graphics. The Grand Pix tests from Wega Distribution consist of a set of scripts for Adobe Photoshop 5.5 which exercise the full range of its capabilities. Such a test is preferable to using only several effects ("filters"). We used the first part of the Grand Pix package, which works with a 20 MBytes .psd file, i.e. all operations were carried out in RAM without accessing the hard disc.
At the same frequency the Pentium III and Athlon show almost equal results, with the Athlon being a little better both in uni- and in dual-processor configurations. The Pentium 4 1.7 GHz performs a little better than the dual Athlon 1 GHz, but loses to the uni-processor Athlon 1.4 GHz system. The dual-processor Athlon 1.4 GHz system is the absolute leader. The overall performance gain in Photoshop on dual-processor systems is not very big - some 12-14%. As for separate subtests, the second processor helps in image rotation, operations with gradients and the Blur/Sharpen filters (where the gain reaches 50%).
Dual-processor system and Photoshop - in detail
We decided to examine in detail how sensitive Adobe Photoshop is to the second processor in the system. The ITC test lab has been using the Grand Pix tests, developed by Wega Distribution, for a long time already. The tests consist of a set of scripts for Adobe Photoshop 5.5 which exercise as many of the program's capabilities as possible. With their help we can check how sensitive different Photoshop operations are to changes in the configuration. The diagrams show, for two different dual-processor systems, the performance gain with the second processor enabled. The tests are divided into subgroups which correspond to the categories of Photoshop operations; we give the results for these categories.
We can see that the situations are identical in both cases:
• Rotate, Scale, Transform operations - the gain with the second CPU enabled makes 17-18%
• Gradients, Selections, Layers - 34-35%
• Sharpen, Blur and Noise filters - 55%
• Distort and Clouds filters - 8-12%
In all other cases there is no gain (we don't count 2-3% as such). So the popular Photoshop 5.5 can be considered partially optimized for SMP (Symmetric Multiprocessing). A dual-processor system brings a gain in far from all operations, but in some of them it can be quite large. And since many graphics packages are really optimized for SMP (e.g. 3D Studio MAX, LightWave etc.), a dual-processor station for this purpose is the right solution.
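One way to read these percentages is through Amdahl's law, S = 1 / ((1 - p) + p / n), where S is the speedup on n processors and p is the share of the work that runs in parallel. The sketch below inverts the formula to estimate p for each category. It rests on our assumption, not stated in the article, that a "gain of X%" means a speedup of 1 + X/100; the function and dictionary names are illustrative.

```python
def parallel_fraction(gain_percent: float, cpus: int = 2) -> float:
    """Invert Amdahl's law S = 1 / ((1 - p) + p / cpus) to get p."""
    speedup = 1.0 + gain_percent / 100.0  # assumed meaning of "gain"
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / cpus)

# Gain ranges (percent) quoted in the text for two processors.
GAINS = {
    "Rotate, Scale, Transform": (17, 18),
    "Gradients, Selections, Layers": (34, 35),
    "Sharpen, Blur, Noise": (55, 55),
    "Distort, Clouds": (8, 12),
}

for name, (low, high) in GAINS.items():
    p_low, p_high = parallel_fraction(low), parallel_fraction(high)
    print(f"{name}: parallel share about {p_low:.2f} to {p_high:.2f}")
```

Under that assumption even the best category, the Sharpen, Blur and Noise filters, keeps roughly 70% of its work parallel, which fits the description of Photoshop 5.5 as only partially optimized for SMP.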
Windows Media Encoder 7.0
This application is taken from the Pentium 4 Application Launcher. It is meant for encoding audio and video data (we encoded an AVI file into the WMV format). Windows Media Encoder 7.0 is optimized for the SSE/SSE2 instruction sets, but it can use the second processor as well.
The Pentium 4 1.7 GHz outscored all uni-processor configurations, but it lost to the dual-processor Athlon 1.4 GHz system (by 13%). The uni-processor Athlon 1 GHz system lagged behind the uni-processor Pentium III system of the same frequency, but when the second processor was enabled, the Athlon won. In dual-processor systems the Athlon is in most cases the more rational choice than the Pentium III.
Archiving with WinAce
We started from 1 to 8 copies of WinAce one after another at intervals of 5 sec, each packing a test file of 256 MBytes (with maximum compression and maximum dictionary size). We measured the time from the start of the first process till the end of the last one. In the character of its load the test is close to an application server. For such a server the main benefit of the second processor lies not in accelerating applications optimized for SMP, but in the ability to distribute rationally among the processors the load created by a great number of single-threaded tasks run on the server by different users.
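The procedure described above can be sketched as a small harness: start several copies of a command at a fixed interval and time the span from the first start to the last exit. This is our illustration, not the script used in the tests; the function name is hypothetical, and a short Python one-liner stands in for the archiver because the article does not give the WinAce command line.

```python
import subprocess
import sys
import time

def run_staggered(command, copies, interval_s=5.0):
    """Start `copies` instances of `command`, one every `interval_s`
    seconds, and return the wall-clock seconds from the first launch
    until the last instance has exited."""
    start = time.perf_counter()
    processes = []
    for index in range(copies):
        if index > 0:
            time.sleep(interval_s)
        processes.append(subprocess.Popen(command))
    for process in processes:
        process.wait()
    return time.perf_counter() - start

if __name__ == "__main__":
    # Placeholder CPU-bound workload standing in for the archiver.
    workload = [sys.executable, "-c", "sum(range(10**6))"]
    for copies in (1, 2, 4):
        elapsed = run_staggered(workload, copies, interval_s=0.1)
        print(f"{copies} copies: {elapsed:.2f} s")
```

With a real archiver as the command and the 5 sec interval, the same loop over 1 to 8 copies would reproduce the shape of the measurement described here.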
As the number of processes grows, the time needed to complete the whole test grows linearly. When several identical processes run simultaneously, the benefit from SMP is proportional to the number of processors. The results of the Pentium III and Athlon 1 GHz are identical in uni- and dual-processor configurations. The scores of the Pentium III 1 GHz on the VIA 694X and ServerWorks HE-SL are combined, since the difference is much less than a percent. The systems on the Intel Pentium III and AMD Athlon cope excellently with a great number of simultaneously running applications. The Pentium 4 works much worse in multitasking: such a system can outperform only uni-processor configurations at 1 GHz.
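The linear growth and the proportional benefit from SMP follow from a simple idealized model: identical CPU-bound jobs shared evenly among the processors. The sketch below is such a model only; it ignores the 5 sec start offsets and any memory or disc contention, and the 100-second job length is a hypothetical value, since the article gives its timings only in the chart.

```python
def ideal_total_time(jobs: int, cpus: int, seconds_per_job: float) -> float:
    """Completion time of `jobs` identical CPU-bound jobs that are
    time-sliced evenly across `cpus` processors (idealized)."""
    return seconds_per_job * max(1.0, jobs / cpus)

SECONDS_PER_JOB = 100.0  # hypothetical single-job time, for illustration

for jobs in range(1, 9):
    one_cpu = ideal_total_time(jobs, 1, SECONDS_PER_JOB)
    two_cpus = ideal_total_time(jobs, 2, SECONDS_PER_JOB)
    print(f"{jobs} jobs: 1 CPU {one_cpu:.0f} s, 2 CPUs {two_cpus:.0f} s")
```

In this model a single job gains nothing from the second processor, while from two jobs upward the dual-processor time is half the uni-processor one.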
SPEC ViewPerf 6.1.2
Since the performance levels in most of the subtests are similar, we give the results of only the two most demonstrative tests - AWadvs-04 (Alias|Wavefront Advanced Visualizer) and DX-06 (IBM Data Explorer). The first thing that catches the eye is the considerable advantage of the Intel Pentium 4 system, which is maintained in all other subtests. NVIDIA is constantly optimizing its drivers, squeezing everything possible out of the additional instruction sets, and devoted a special press release to SSE2 support in the new Detonator 12.41.
In the AWadvs test it is clearly seen that the ServerWorks chipset has serious problems with some aspects of performance - the results of the uni-processor system are very low, and the second processor doesn't rescue the situation. Interestingly, this is the only system where the second processor helps at all. In all other cases the performance depends only on the video card and the data rate of the AGP bus, i.e. there is nothing for the second processor to do. By contrast, the VIA 694X looks quite good against the background of the ServerSet HE-SL.
In IBM Data Explorer the situation is the same, except that there is no real benefit from the second processor on any system. The Pentium 4 1.7 GHz comes first, followed by both Athlon based systems (in order of frequency). With the Pentium III, the VIA chipset performed better.
Video capture and processing
We also tested the dual-processor Athlon system in video processing. For this we installed a high-performance non-linear video editing card, the Pinnacle Targa 3000, supplied with a switching control panel ($7,000). The Targa 3000 is able to process up to 4 layers of video, or 3 layers of video and 5 graphics layers, in real time without preliminary calculations. Given this performance level, the card requires a hardware platform with a 64-bit PCI bus and a high-bandwidth disc subsystem, since the video stream can reach 40 MBps when capturing and 160 MBps when playing back. The tests were conducted on two of the four test systems (the Thunder K7 and the Thunder HEsl), since the two others do not support a 64-bit PCI bus. To each system we added 4 Seagate Barracuda 36ES 18.4 GBytes SCSI hard discs, which were combined into a striped array (RAID 0). A video fragment was digitized and recorded onto the hard discs to be edited in Adobe Premiere with effects and additional video layers overlaid. According to this test, both systems - the Thunder K7 with two Athlon 1.4 GHz and the Thunder HEsl with two PIII 1 GHz - turned out to be suitable for such tasks. The Athlon system worked stably, and its response time was much shorter. For the Thunder K7 we used the same AGP video card as in the other tests. On the Thunder HEsl, because of some peculiarities of the AGP support in the ServerSet HE-SL chipset, normal work with video is possible only with a PCI video card. As a result, the dual-processor Athlon system performs excellently in video editing tasks, causing no problems either in setting up the equipment or during operation.
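The requirement for a 64-bit PCI bus and a striped array follows from the stream rates quoted above. The sketch below compares them with theoretical PCI peaks (bus width times a nominal 33.33 MHz clock) and splits the playback stream evenly across the four discs. The helper names are ours, and sustained real-world rates are lower than these peaks.

```python
CAPTURE_MB_S = 40     # capture stream, per the text
PLAYBACK_MB_S = 160   # playback stream, per the text
DISCS_IN_STRIPE = 4   # discs in the RAID 0 array, per the text

def pci_peak_mb_s(width_bits: int, clock_mhz: float = 33.33) -> float:
    """Theoretical PCI peak: one transfer of `width_bits` per clock."""
    return width_bits / 8 * clock_mhz

def stream_fits(stream_mb_s: float, width_bits: int) -> bool:
    """True if the stream is within the bus's theoretical peak."""
    return stream_mb_s <= pci_peak_mb_s(width_bits)

def per_disc_share(stream_mb_s: float, discs: int) -> float:
    """Even split of a stream across the discs of a striped array."""
    return stream_mb_s / discs

for width in (32, 64):
    verdict = "fits" if stream_fits(PLAYBACK_MB_S, width) else "does not fit"
    print(f"Playback on {width}-bit/33 MHz PCI: {verdict}")
print(f"Per disc: {per_disc_share(PLAYBACK_MB_S, DISCS_IN_STRIPE):.0f} MBytes/s")
```

Even on paper the playback stream exceeds what a 32-bit/33 MHz bus can carry, which is why only the two boards with 64-bit slots took part in this test.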
Athlon dual-processor system
We decided to start our conclusions about the dual-processor Athlon system with our impressions. One of the major issues we tried to clarify was not performance but stability, compatibility and any stumbling blocks which could spoil the impression despite the system's excellent speed. Take, for example, installation of an operating system. This procedure must be simple. Windows 2000 installed flawlessly on the AMD-760MP system. AMD now offers on its site the AMD Drivers Pack, which detects the chipset and the operating system and installs all necessary drivers. It is all you need for correct operation of the OS with this chipset. The system didn't hang during the tests and no applications reported errors. The AMD-760MP based system worked as any good stable system must work.
Range of application
The new platform from AMD is the strongest competitor for Pentium III based systems with the same functionality, i.e. for products on ServerWorks chipsets, the Intel i440GX chipset and dual-processor systems on the i820/i840 equipped with RDRAM. A VIA 694X (Apollo Pro133A) based computer costs much less than an AMD-760MP based machine, but its functionality is much lower as well: half as much RAM, no 64-bit PCI bus etc. The Apollo Pro266 chipset is still an unconvincing product: its performance is only a bit better than that of the Apollo Pro133A, despite DDR SDRAM support. It means that you won't be able to build a cheap dual-processor system on the AMD-760MP, while building an expensive one on a VIA chipset makes no sense. The Pentium 4 is well suited for particular tasks, but their range is much narrower than the range of application of dual-processor AMD Athlon systems. As for the Pentium 4 Xeon (a server version of the processor), the AMD-760MP looks like the more acceptable solution by comparison, since the price of a system with two Pentium 4 Xeons and the corresponding amount of RDRAM remains too high.
High-end dual-processor chipsets for Socket 370 have obtained a very serious competitor - the AMD-760MP. The AMD Athlon surpasses the Pentium III in maximum frequency, which is why even if the efficiency of dual-processor systems on these processors is the same at equal clock speeds, you can assemble a faster computer on the AMD-760MP thanks to a CPU with a higher frequency. In all other respects the new AMD chipset also excels, and its single drawback - the lack of 64-bit 66 MHz PCI bus support - will be eliminated very soon.
P. S. You may ask why we didn't touch on a dual-processor Duron. We tried it, and it works. After we installed two AMD Duron 800 processors on the Tyan Thunder K7, Windows 2000 booted, saw both processors and ran all applications without problems. But does it make sense to assemble such a system now? The cost of the board and memory makes the price difference between the Athlon and the Duron almost unnoticeable, and the result is worth looking at. First, the fastest Athlon runs at about 1.5 times the frequency of the fastest Duron; secondly, because the board has no overclocking functions, the Duron will work at its rated 200 MHz FSB. It means that such a slight saving will result in a considerable loss of performance.