HCL Infosystems Ltd
HARD DISK DRIVE
Introduction
We know that the data in RAM is volatile, so it is necessary to store data on some non-volatile medium for later use. Floppy disks provide an alternative, but their capacity is limited. Hard disks were therefore developed to store large amounts of data reliably and to access that data in a short time. Hard disks therefore come under permanent, secondary storage devices. Hard disks are also referred to as fixed disks or Winchester disk drives (WDD). A hard disk requires a +5 V supply for the drive electronics and +12 V for the motors. Hard disks were invented in the 1950s. They started as large disks up to 20 inches in diameter holding just a few megabytes. They were originally called "fixed disks" or "Winchesters" (a code name used for a popular IBM product). They later became known as "hard disks" to distinguish them from "floppy disks." Hard disks have a hard platter that holds the magnetic medium, as opposed to the flexible plastic film found in tapes and floppies. At the simplest level, a hard disk is not that different from a cassette tape. Both use the same magnetic recording techniques, and both share the major benefits of magnetic storage: the magnetic medium can be easily erased and rewritten, and it will "remember" the magnetic flux patterns stored on it for many years. Let's look at the big differences between cassette tapes and hard disks:
• The magnetic recording material on a cassette tape is coated onto a thin plastic strip. In a hard disk, the magnetic recording material is layered onto a high-precision aluminum or glass disk, which is then polished to mirror smoothness.
• With a tape, you have to fast-forward or rewind to get to any particular point on the tape, which can take several minutes with a long tape. On a hard disk you can move to any point on the surface of the disk almost instantly.
• In a cassette tape deck, the read/write head touches the tape directly. In a hard disk, the read/write head "flies" over the disk, never actually touching it.
• The tape in a cassette tape deck moves over the head at about 2 inches (about 5 cm) per second. A hard disk platter can spin underneath its head at speeds up to 3,000 inches per second (about 170 MPH or 274 KPH)!
• The information on a hard disk is stored in extremely small magnetic domains compared to a cassette tape's. The size of these domains is made possible by the precision of the platter and the speed of the media.
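The speed figures above are simple unit conversions. This sketch assumes 1 mile = 63,360 inches and 1 inch = 2.54 cm; `surface_speed_in_per_s` uses the fact that a point at radius r travels one circumference (2πr) per revolution. The platter radius and RPM in the last line are hypothetical illustration values, not figures from the text, and all function names are made up for this example.

```python
import math

INCHES_PER_MILE = 63_360
METRES_PER_INCH = 0.0254


def surface_speed_in_per_s(radius_in: float, rpm: float) -> float:
    """Linear speed (inches/second) of a point radius_in inches from the spindle."""
    circumference = 2 * math.pi * radius_in   # inches travelled per revolution
    return circumference * rpm / 60           # revolutions per second x inches per revolution


def in_per_s_to_mph(v: float) -> float:
    return v * 3600 / INCHES_PER_MILE          # inches per hour -> miles per hour


def in_per_s_to_kmh(v: float) -> float:
    return v * METRES_PER_INCH * 3.6           # m/s -> km/h


# The figure quoted in the text: 3,000 inches per second.
print(round(in_per_s_to_mph(3000), 1), round(in_per_s_to_kmh(3000), 1))  # 170.5 274.3

# Hypothetical example: a point 1.8 inches from the spindle at 7,200 RPM.
print(round(surface_speed_in_per_s(1.8, 7200)))  # 1357
```

The outer tracks move faster than the inner ones, because speed grows in proportion to the radius at a fixed RPM.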
Because of these differences, a modern hard disk is able to store an amazing amount of information in a small space. A hard disk can also access any of its information in a fraction of a second. A typical desktop machine will have a hard disk with a capacity of between 10 and 40 gigabytes. Data is stored on the disk in the form of files. A file is simply a named collection of bytes. The bytes might be the ASCII codes for the characters of a text file, the instructions of a software application for the computer to execute, the records of a database, or the pixel colors of a GIF image. No matter what it contains, however, a file is simply a string of bytes. When a program running on the computer requests a file, the hard disk retrieves its bytes and sends them to the CPU. There are two ways to measure the performance of a hard disk:
• The data rate: the number of bytes per second that the drive can deliver to the CPU. Rates between 5 and 40 megabytes per second are common.
• The seek time: the amount of time between the CPU requesting a file and the first byte of the file being sent to the CPU. Times between 10 and 20 milliseconds are common.
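The two performance measures combine into a rough estimate of how long it takes to read a file: wait for the first byte (the seek time, as defined above), then stream the rest at the data rate. A minimal sketch, assuming the file is stored contiguously, the data rate holds for the whole transfer, and 1 MB = 1,000,000 bytes; the function name is hypothetical.

```python
def estimated_read_time_s(file_bytes: int, seek_ms: float, data_rate_mb_per_s: float) -> float:
    """Rough time to read a whole file: wait for the first byte, then stream the rest."""
    seek_s = seek_ms / 1000                                   # milliseconds -> seconds
    transfer_s = file_bytes / (data_rate_mb_per_s * 1_000_000)  # bytes / (bytes per second)
    return seek_s + transfer_s


# A 1 MB file on a drive with a 10 ms seek time and a 20 MB/s data rate:
print(estimated_read_time_s(1_000_000, 10, 20))  # about 0.06 seconds
```

For small files the seek time dominates the total; for large files the data rate does.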
Drive Mechanism
A hard disk is made of one or more circular platters. A platter is commonly made of aluminium. The platters are precisely machined to an extremely fine tolerance to make them flat and smooth. On each platter is laid a magnetic medium on which data is recorded; the medium is present on both surfaces of the platter. The diameter of the platters determines the size of the physical drive mechanism. Nowadays 3.5-inch platters are used. The platters are mounted on a shaft called the spindle, which connects to the spindle motor. The spindle motor is a servo-controlled DC motor. A servo-controlled motor uses feedback to maintain a constant and accurate rotation rate: a sensor in the disk drive constantly monitors how fast the drive spins and adjusts the spin rate.
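The feedback idea can be illustrated with a toy proportional control loop: at each step the measured speed is compared with the target and the motor is corrected by a fraction of the error. This is a hypothetical sketch, assuming the motor responds instantly to each correction; it is not how any particular drive's firmware works, and the function name and gain value are made up.

```python
def simulate_spindle(target_rpm: float, gain: float, steps: int, start_rpm: float = 0.0) -> list[float]:
    """Toy proportional feedback loop for a spindle motor (illustrative only)."""
    rpm = start_rpm
    history = []
    for _ in range(steps):
        error = target_rpm - rpm   # sensor reading compared with the set point
        rpm += gain * error        # correction proportional to the error
        history.append(rpm)
    return history


speeds = simulate_spindle(7200, gain=0.5, steps=40)
print(speeds[:3])   # [3600.0, 5400.0, 6300.0]: the error halves on every step
```

With a gain of 0.5 the remaining error halves each step, so the simulated speed settles very close to the 7,200 RPM target.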
Unlike floppy disks, hard disk platters are kept spinning constantly. This is necessary because the platters spin at high speed and have large inertia: if the spindle were stopped, it would take some time for the platters to get back up to the required speed, which would increase access time. Constant spinning allows immediate access to data.
Read/Write Heads
Data is written to and read from the disk using read/write heads. Since data is written on both surfaces of a platter, each platter has two read/write heads, one for each surface. Each head is flexibly connected to a rigid arm, which supports the assembly, and all the arms are linked to form a single moving unit. Because the platters spin at such high speeds (3,600 RPM and above), the head does not touch the media; it actually floats about 6 microinches above the platter surface. The head generates a magnetic field corresponding to the current fed to it. This field in turn orients the magnetic domains in the media when data is stored on the disk.
Since all the heads are connected to a common arm assembly, all heads move in unison. The head actuator is the electromechanical system that controls head movement. There are two types of actuators:
1. Open loop system: These use band-stepper technology, in which a stepper motor positions the heads as required. The heads move one step at a time, each step corresponding to one track. Since there is no feedback and the steps have a fixed length, there is a limit to the capacity of the hard disk: tracks cannot be packed closer together, because the heads could not be positioned accurately enough to read them.
2. Closed loop system: These use a ‘voice coil’ that operates like the voice coil in a loudspeaker. The controlling electronics generate a magnetic field in a coil of wire, and this field moves the heads in the resulting direction. Since there is constant feedback about the head position, the heads can be precisely positioned over the required track. The feedback allows tighter track spacing and therefore greater capacity.
When the system is switched off, the platters stop spinning and the heads would crash onto the media, which can destroy the data in that area. To prevent this, the heads are moved to an area where no data is recorded. This area is called the ‘Landing zone’. Usually the heads are moved to the landing zone, using software, before the system is switched off. This process is called ‘head parking’.
Disk Geometry
We’ll now look at how data is stored on a disk.
Track: Each platter in the hard disk has a number of concentric circles on it, extending from the outer edge toward the center. These concentric circles are called tracks. Tracks are present on both surfaces of the platter. The head moves from track to track while accessing data. Track numbering starts from 0.
Cylinder: All the corresponding tracks on the different platter surfaces form a cylinder. That is, track 0 of all surfaces taken together forms cylinder 0, and so on. Since all heads are connected together, all the heads are always positioned at the same cylinder.
Sectors: Each track on the platter is further divided into sectors. Each sector holds 512 bytes of data. Sector numbering starts from 1. The number of sectors per track depends on the encoding used to store data. With MFM (Modified Frequency Modulation), fewer sectors fit on a track; RLL (Run Length Limited) encoding allows more sectors to be put on a track. Hence the number of sectors per track varies from hard disk to hard disk.
Here is a typical hard disk drive:
It is a sealed aluminum box with controller electronics attached to one side. The electronics control the read/write mechanism and the motor that spins the platters. The electronics also assemble the magnetic domains on the drive into bytes (reading) and turn bytes into magnetic domains (writing). The electronics are all contained on a small board that detaches from the rest of the drive:
Underneath the board are the connections for the motor that spins the platters, as well as a highly filtered vent hole that lets internal and external air pressures equalize:
Removing the cover from the drive reveals an extremely simple but very precise interior:
In this picture you can see:
• The platters, which typically spin at 3,600 or 7,200 RPM when the drive is operating. These platters are manufactured to amazing tolerances and are mirror smooth (as you can see in this interesting self-portrait of the author... No easy way to avoid that, actually!)
• The arm that holds the read/write heads. This arm is controlled by the mechanism in the upper-left corner, and is able to move the heads from the hub to the edge of the drive. The arm and its movement mechanism are extremely light and fast. The arm on a typical hard disk drive can move from hub to edge and back up to 50 times per second; it is an amazing thing to watch!
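The spin speeds above translate directly into timing. One revolution takes 60,000/RPM milliseconds, and if the wanted sector is equally likely to be anywhere on the track (an assumption), the head waits half a revolution on average for it to come round. The same arithmetic gives the penalty behind sector interleave: a sector that has just been missed is almost a full revolution away. The function names are hypothetical.

```python
def revolution_time_ms(rpm: float) -> float:
    """Time for one full turn of the platters, in milliseconds."""
    return 60_000 / rpm


def average_rotational_latency_ms(rpm: float) -> float:
    """On average the wanted sector is half a turn away from the head."""
    return revolution_time_ms(rpm) / 2


for rpm in (3600, 7200):
    print(rpm, round(revolution_time_ms(rpm), 2), round(average_rotational_latency_ms(rpm), 2))
# 3600 16.67 8.33
# 7200 8.33 4.17
```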
In order to increase the amount of information the drive can store, most hard disks have multiple platters. This drive has three platters and six read-write heads:
The mechanism that moves the arms on a hard disk has to be incredibly fast and precise. It can be constructed using a high-speed linear motor.
Many drives use a "voice coil" approach - the same technique used to move the cone of a speaker on your stereo moves the arm.
Storing the Data
Data is stored on the surface of a platter in sectors and tracks. Tracks are concentric circles, and sectors are pie-shaped wedges on a track, like this:
A typical track is shown in yellow; a typical sector is shown in blue. A sector contains a fixed number of bytes, for example 256 or 512. Either at the drive or at the operating system level, sectors are often grouped together into clusters. Low-level formatting a drive establishes the tracks and sectors on the platter: the starting and ending points of each sector are written onto the platter. This prepares the drive to hold blocks of bytes. High-level formatting then writes the file-storage structures, like the file allocation table, into the sectors. This prepares the drive to hold files.
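Using the geometry terms from the Disk Geometry section (cylinders, one head per surface, sectors per track numbered from 1, 512 bytes per sector), the capacity of a drive and the position of any sector can be computed directly. The sketch below also maps a cylinder/head/sector address to a single running sector number; this linear numbering (commonly called LBA) is a standard PC convention that the text does not describe. The example geometry and the function names are hypothetical.

```python
SECTOR_BYTES = 512  # bytes per sector, as stated in the text


def chs_capacity_bytes(cylinders: int, heads: int, sectors_per_track: int) -> int:
    """Capacity = cylinders x heads (one per surface) x sectors per track x 512."""
    return cylinders * heads * sectors_per_track * SECTOR_BYTES


def chs_to_lba(c: int, h: int, s: int, heads: int, sectors_per_track: int) -> int:
    """Map a cylinder/head/sector address to a running sector number.

    Cylinders and heads count from 0, sectors from 1, so the sector is shifted
    down by one. Within a cylinder, all sectors under head 0 come first, then head 1, etc.
    """
    if not 1 <= s <= sectors_per_track:
        raise ValueError("sector numbers run from 1 to sectors_per_track")
    return (c * heads + h) * sectors_per_track + (s - 1)


# Hypothetical geometry: 1,024 cylinders, 16 heads, 63 sectors per track.
print(chs_capacity_bytes(1024, 16, 63))  # 528482304 bytes (504 MiB)
print(chs_to_lba(0, 0, 1, 16, 63))       # 0: cylinder 0, head 0, sector 1 is the first sector
print(chs_to_lba(1, 0, 1, 16, 63))       # 1008: first sector of cylinder 1
```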
The Read and Write Principle
Write Pre-compensation: The tracks near the center of the platter have a smaller circumference than the tracks at the periphery. However, the number of sectors per track is the same irrespective of where the track lies. Therefore the sectors on tracks near the center are physically smaller, which requires data to be packed more tightly. The ability of many magnetic media to hold flux transitions falls off as the transitions are packed more tightly. This produces a weaker field, which causes difficulties during reading. This problem is solved by increasing the current given to the head when it writes nearer to the center of the disk. The increased current causes stronger transitions on the disk, which produce a stronger field. This process is called ‘Write pre-compensation’ because it compensates for the fall in disk response near the center.
Sector Interleave: In DOS, data is read one sector at a time. If more than one sector is to be read, consecutive accesses have to be made. If the sectors on a track are numbered consecutively, the access time of the disk increases. This happens because the disk keeps spinning while the data read from the first sector is being transferred to memory, so by the time the transfer is finished, one or two sectors have already passed under the head.
This means that when the signal to read sector 2 comes, the head is over the 4th sector. Hence, to read sector 2, the head has to wait for nearly one complete revolution of the disk. This is repeated for every sector if the sector numbering is consecutive. This can be overcome by numbering the 4th physical sector as sector 2, the 7th physical sector as sector 3, and so forth. By doing this, when the signal to read sector 2 comes, the head is over sector 2, resulting in instant access. This reduces disk access time. This technique is called ‘Sector Interleave’. Since two physical sectors are skipped between logical sectors 1 and 2 (physical sectors 1 and 4), this scheme is said to have an interleave factor of 3:1. The primary (low-level) format procedure establishes the interleave factor by writing the logical sector numbers in the ID fields of each sector. The objective of sector interleaving is to match the order of sectors on the track to the speed at which data can be transferred, giving optimum access time and minimum idle time. The exact interleave factor should be decided by considering the CPU speed and the disk rotation speed.
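The interleave idea can be checked with a small simulation. This is a sketch, not the procedure any particular format utility uses: it assumes 17 sectors per track (a hypothetical value), that the head is over physical slot 0 at the start, and that after each sector the controller needs two sector-times before it can read again, matching the "one or two sectors" in the text. The helper names `interleave_layout` and `revolutions_to_read_track` are made up for this illustration.

```python
def interleave_layout(sectors_per_track: int, factor: int) -> list[int]:
    """Logical sector number written in each physical slot of one track."""
    slots = [0] * sectors_per_track        # 0 = slot not yet numbered
    pos = 0
    for logical in range(1, sectors_per_track + 1):
        while slots[pos] != 0:             # slot already taken: use the next free one
            pos = (pos + 1) % sectors_per_track
        slots[pos] = logical
        pos = (pos + factor) % sectors_per_track   # next logical sector is `factor` slots on
    return slots


def revolutions_to_read_track(layout: list[int], busy_sectors: int) -> float:
    """Simulate reading logical sectors 1..n in order; return revolutions taken.

    Time is measured in sector-times (one sector passing under the head).
    After reading a sector, the controller is busy for `busy_sectors`
    sector-times transferring it to memory before it can read again.
    """
    n = len(layout)
    slot_of = {logical: slot for slot, logical in enumerate(layout)}
    t = slot_of[1] + 1                          # wait for sector 1, then read it
    for logical in range(2, n + 1):
        ready = t + busy_sectors                # transfer to memory finishes
        wait = (slot_of[logical] - ready) % n   # until that slot comes round
        t = ready + wait + 1                    # read it
    return t / n


print(interleave_layout(17, 3)[:7])  # [1, 7, 13, 2, 8, 14, 3]
print(revolutions_to_read_track(interleave_layout(17, 1), busy_sectors=2))            # 17.0
print(round(revolutions_to_read_track(interleave_layout(17, 3), busy_sectors=2), 2))  # 2.88
```

In the 3:1 layout, logical sector 2 sits in the 4th physical slot and sector 3 in the 7th, as described above. With consecutive (1:1) numbering the simulated controller just misses every sector and needs 17 revolutions to read one track; with 3:1 it needs fewer than three.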
DOS Disk Structure Usage and Limitations
Boot Sequence: The first sector on the disk is Cylinder 0, Head 0, Sector 1. This is called the ‘Master Boot Sector’ or the ‘Disk Boot Sector’. This sector also contains the partition table, which is 64 bytes in length. When the system is first powered on, the master boot record is loaded into memory. The program in the Master Boot Sector looks at the partition table and finds the active partition. Each partition has its own partition boot sector. Control is passed from the Disk Boot Sector to the boot sector of the active partition. The active partition’s boot sector in turn loads the operating system in the active partition and passes control to the OS. A DOS partition has four areas: the Boot Sector, the FAT, the Root Directory Area and the Data Area. The boot sector in a DOS partition is responsible for loading DOS (IO.SYS, MSDOS.SYS and COMMAND.COM). The next few sectors store the FAT (File Allocation Table). The size of the FAT depends on the capacity of the hard disk and the version of DOS used. There are two copies of the FAT, which are identical to each other. The hard disk FAT is normally 16-bit. The next few sectors store the root directory, which holds information about the various files on the disk. The remaining sectors form the data area, which holds the contents of the files.
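The partition table mentioned above can be decoded with a few lines of code. The sketch assumes the standard PC layout of the master boot sector: the 64-byte table starts at byte offset 446 (0x1BE) and holds four 16-byte entries, and the sector ends with the signature bytes 0x55 0xAA. In each entry, byte 0 is the status (0x80 marks the active partition), byte 4 is the partition type, and bytes 8 to 15 hold the starting sector and the sector count as little-endian 32-bit numbers. The function names and the sample sector are hypothetical; the code only decodes a byte string and does not read a real disk.

```python
import struct

TABLE_OFFSET = 446          # 0x1BE: start of the 64-byte partition table
ENTRY_SIZE = 16             # 4 entries x 16 bytes = 64 bytes
SIGNATURE = b"\x55\xaa"     # last two bytes of a valid master boot sector
ENTRY_FORMAT = "<B3xB3xII"  # status, (skip CHS start), type, (skip CHS end), first sector, count


def parse_partition_table(mbr: bytes) -> list[dict]:
    """Decode the used entries of a 512-byte master boot sector's partition table."""
    if len(mbr) != 512 or mbr[510:512] != SIGNATURE:
        raise ValueError("not a valid master boot sector")
    partitions = []
    for i in range(4):
        start = TABLE_OFFSET + i * ENTRY_SIZE
        status, ptype, first_lba, count = struct.unpack(ENTRY_FORMAT, mbr[start:start + ENTRY_SIZE])
        if ptype != 0:  # type 0 = unused entry
            partitions.append({"slot": i, "active": status == 0x80, "type": ptype,
                               "first_lba": first_lba, "sectors": count})
    return partitions


def active_partition(mbr: bytes):
    """Mimic the master boot program: return the entry marked active, if any."""
    return next((p for p in parse_partition_table(mbr) if p["active"]), None)


# Hypothetical sector with one active partition of type 0x06 (FAT16) starting at sector 63.
sector = bytearray(512)
sector[TABLE_OFFSET:TABLE_OFFSET + ENTRY_SIZE] = struct.pack(ENTRY_FORMAT, 0x80, 0x06, 63, 1_000_000)
sector[510:512] = SIGNATURE
print(active_partition(bytes(sector)))
# {'slot': 0, 'active': True, 'type': 6, 'first_lba': 63, 'sectors': 1000000}
```

Because the table is 64 bytes and each entry takes 16, the master boot sector has room for only four partition entries.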
Partitioning of Hard Disk
When you partition a hard disk, you demarcate different portions of the disk so that they can be accessed as separate drives. A hard disk with only one partition is accessed as drive C. When you partition a hard disk into two DOS partitions, the first becomes the primary DOS partition and the second becomes the extended DOS partition. The extended DOS partition can in turn contain logical drives. The primary partition is accessed as drive C, and the logical drives as drive D, drive E and so on. This means that physically there is only one drive, because there is only one hard disk, but DOS has marked a certain portion of the hard disk as drive C and the remaining portion as drives D, E, etc. The computer can boot only if the primary partition is made active. Note that DOS allows only the primary partition to be made ‘active’. The primary DOS partition and each logical drive in the extended DOS partition have their own separate boot sector, FAT and directory structure.
DOS provides a utility called ‘FDISK’, which can be used to create partitions and logical drives. An important point to note is that if you alter the partition table of an existing drive, all data on the drive will be lost. Therefore be very careful when you play around with ‘FDISK’: you might inadvertently destroy the data on the disk.
Hard Disk Logical Structures and File Systems
The hard disk is, of course, a medium for storing information. Hard disks grow in size every year, and as they get larger, using them efficiently becomes more difficult. The file system is the general name given to the logical structures and software routines used to control access to the storage on a hard disk system. Operating systems use different ways of organizing and controlling access to data on the hard disk, and this choice is basically independent of the specific hardware being used: the same hard disk can be arranged in many different ways, and even in different ways in different areas of the same disk. The information in this section in fact straddles the fine line between hardware and software, a line which gets more and more blurry every year. The nature of the logical structures on the hard disk has an important influence on the performance, reliability, expandability and compatibility of your storage subsystem. This section takes a look at the logical structures on the hard disk and how they are set up and used for a typical PC installation. I begin with a discussion of different PC operating systems, and an overview of different file system types. I then go into significant detail describing the major structures and key operating details of the most common PC file system, FAT (FAT12/FAT16/VFAT/FAT32). I talk about utilities used for partitioning and formatting hard disks, and also talk a bit about disk compression (even though it is no longer nearly as important as it once was). I place special emphasis on how to organize the disk for maximum performance, while not getting bogged down in the minutiae of optimization where it will buy you little. Most of the focus in this section is on the FAT family of file systems, because these are by far the most commonly used, and also the ones with which I am most familiar.
I do mention alternative file systems, but do not go into extensive detail on them, with one exception. Recognizing the growing role of Windows NT and Windows 2000 systems, a separate, comprehensive section has been added that describes the NTFS family of file systems. If you are mostly interested in reading about NTFS, you may want to skip some of the earlier subsections that describe FAT, and skip directly to the NTFS material. Bear in mind, however, that some of the NTFS discussions build upon the descriptions of FAT, since in some ways the file systems are related. So I recommend reading the section in order, if possible.
FAT Sizes: FAT12, FAT16 and FAT32
Throughout my discussion of file systems, I have referred to the FAT family of file systems. This includes several different FAT-related file systems, as described here. The file allocation table, or FAT, stores information about the clusters on the disk in a table. There are three different varieties of this file allocation table, which vary based on the maximum size of the table. The system utility that you use to partition the disk will normally choose the correct type of FAT for the volume you are using, but sometimes you will be given a choice of which you want to use. Since each cluster has one entry in the FAT, and these entries are used to hold the cluster number of the next cluster used by the file, the size of each FAT entry is the limiting factor on how many clusters any disk volume can contain. The following are the three different FAT versions now in use:
• FAT12: The oldest type of FAT uses a 12-bit binary number to hold the cluster number. A volume formatted using FAT12 can hold a maximum of 4,086 clusters, which is 2^12 minus a few values (to allow for reserved values to be used in the FAT). FAT12 is therefore most suitable for very small volumes, and is used on floppy disks and hard disk partitions smaller than about 16 MB (the latter being rare today).
• FAT16: The FAT used for most older systems, and for small partitions on modern systems, uses a 16-bit binary number to hold cluster numbers. When you see someone refer to a "FAT" volume generically, they are usually referring to FAT16, because it is the de facto standard for hard disks, even with FAT32 now more popular than FAT16. A volume using FAT16 can hold a maximum of 65,526 clusters, which is 2^16 less a few values (again for reserved values in the FAT). FAT16 is used for hard disk volumes ranging in size from 16 MB to 2,048 MB. VFAT is a variant of FAT16.
• FAT32: The newest FAT type, FAT32, is supported by newer versions of Windows, including Windows 95's OEM SR2 release, as well as Windows 98, Windows ME and Windows 2000. FAT32 uses a 28-bit binary cluster number, not 32, because 4 of the 32 bits are "reserved". 28 bits is still enough to permit ridiculously huge volumes: FAT32 can theoretically handle volumes with over 268 million clusters, and will (theoretically) support drives up to 2 TB in size. However, to do this the size of the FAT grows very large.
Here's a summary table showing how the three types of FAT compare:
Attribute | FAT12 | FAT16 | FAT32
Used for | Floppies and very small hard disk volumes | Small to moderate-sized hard disk volumes | Medium-sized to very large hard disk volumes
Size of each FAT entry | 12 bits | 16 bits | 28 bits
Maximum number of clusters | 4,086 | 65,526 | ~268,435,456
Cluster size used | 0.5 KB to 4 KB | 2 KB to 32 KB | 4 KB to 32 KB
Maximum volume size (bytes) | 16,736,256 | 2,147,123,200 | about 2^41
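The table's limits follow from simple arithmetic, and the chain idea (each FAT entry holds the number of the next cluster of the file) is easy to sketch. The code assumes FAT32 stores each 28-bit cluster number in a 32-bit (4-byte) slot, and that FAT16 entry values 0xFFF8 to 0xFFFF mark the last cluster of a file. The FAT is modeled as a Python dict, a simplification of the real on-disk array; the sample chain and all function names are hypothetical.

```python
FAT16_END_OF_CHAIN = 0xFFF8  # FAT16 values 0xFFF8-0xFFFF mark a file's last cluster


def max_volume_bytes(max_clusters: int, cluster_bytes: int) -> int:
    """A FAT volume can be no bigger than its cluster count times its cluster size."""
    return max_clusters * cluster_bytes


def fat_table_bytes(clusters: int, entry_bits: int) -> int:
    """Size of one copy of the FAT: one entry per cluster, rounded up to whole bytes."""
    return (clusters * entry_bits + 7) // 8


def cluster_chain(fat: dict[int, int], first_cluster: int) -> list[int]:
    """Follow a file's clusters: each FAT entry holds the number of the next cluster."""
    chain = [first_cluster]
    while fat[chain[-1]] < FAT16_END_OF_CHAIN:
        if len(chain) > len(fat):
            raise ValueError("cluster chain loops back on itself")
        chain.append(fat[chain[-1]])
    return chain


# The FAT12 volume limit from the table: 4,086 clusters of 4 KB each.
print(max_volume_bytes(4_086, 4_096))  # 16736256

# Why FAT32's table "grows very large": 2^28 clusters in 32-bit slots.
print(fat_table_bytes(2**28, 32) // 2**20, "MiB per FAT copy")  # 1024 MiB per FAT copy

# Hypothetical toy FAT16 fragment: a file starts at cluster 5, continues at 6, then 9.
toy_fat = {5: 6, 6: 9, 9: 0xFFFF}
print(cluster_chain(toy_fat, 5))  # [5, 6, 9]
```

Since the DOS section notes that two identical copies of the FAT are kept, a FAT32 volume at this theoretical limit would spend about 2 GiB on FAT copies alone.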
FAT Partition Efficiency: Slack
One issue related to the FAT file system that has gained a lot more attention over the years is the concept of slack, which is the colloquial term used to refer to wasted space due to the use of clusters for storing files. This began in the mid-1990s when larger and larger hard disks began shipping with most systems. Typically, retail systems were not being divided into multiple partitions, and users began noticing that large quantities of their hard disk seemed to "disappear". In many cases this amounted to hundreds of megabytes on a disk of only 1 to 2 GB in size. When the use of FAT32 became more common, this problem was less of an issue for a while. Today, with hard disks of 40 GB or more commonplace, even FAT32 has problems with slack. Of course the space doesn't really "disappear", assuming we are not talking about lost clusters, which can make space truly unusable on a disk unless you use a scanning utility to recover it. The space is simply wasted as a result of the cluster system that FAT uses. A cluster is the minimum amount of space that can be assigned to any file. No file can use part of a cluster under the FAT file system. This means, essentially, that the amount of space a file uses on the disk is "rounded up" to an integer multiple of the cluster size. If you create a file containing exactly one byte, it will still use an entire cluster's worth of space. Then, you can expand that file in size until it reaches the maximum size of a cluster, and it will take up no additional space during that expansion. As soon as you make the file larger than what a single cluster can hold, a second cluster will be allocated, and the file's disk usage will double, even though the file only increased in size by one byte. Think of this in terms of collecting rain water in quart-sized glass bottles. Even if you collect just one ounce of water, you have to use a whole bottle. Once the bottle is in use, however, you can fill it with 31 more ounces, until it is full. Then you'll need another whole bottle to hold the 33rd ounce. Since files are always allocated whole clusters, this means that on average, the larger the cluster size of the volume, the more space that will be wasted. (When collecting rain water, it's more efficient to use smaller, cup-sized bottles instead of quart-sized ones, if minimizing the amount of storage space is a concern.) If we take a disk that has a truly random distribution of file sizes, then on average each file wastes half a cluster. (Each file uses some number of whole clusters and then a random amount of the last cluster, so on average half a cluster is wasted.) This means that if you double the cluster size of the disk, you double the amount of storage that is wasted. Storage space that is wasted in this manner, due to space left at the end of the last cluster allocated to the file, is commonly called slack. The situation is in reality usually worse than this theoretical average. The files on most hard disks don't follow a random size pattern; in fact, most files tend to be small. (Take a look in your web browser's cache directory sometime!) A hard disk that holds more small files will have far more space wasted. There are utilities that you can use to analyze the amount of wasted space on your disk volumes, such as the fantastic Partition Magic.
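The "rounding up" rule can be written down directly. A minimal sketch; the function names and the sample file sizes are hypothetical, and it follows FAT in allocating no cluster at all to an empty file.

```python
def allocated_bytes(file_bytes: int, cluster_bytes: int) -> int:
    """Disk space a file occupies: its size rounded up to whole clusters."""
    clusters = -(-file_bytes // cluster_bytes)  # ceiling division
    return clusters * cluster_bytes


def slack_bytes(file_bytes: int, cluster_bytes: int) -> int:
    """Wasted space at the end of the file's last cluster."""
    return allocated_bytes(file_bytes, cluster_bytes) - file_bytes


CLUSTER = 32 * 1024  # a 32 KiB cluster
for size in (1, CLUSTER, CLUSTER + 1):
    print(size, allocated_bytes(size, CLUSTER), slack_bytes(size, CLUSTER))
# 1 32768 32767
# 32768 32768 0
# 32769 65536 32767

# Total slack for a hypothetical set of small files at several cluster sizes.
sizes = [200, 1_500, 4_000, 12_000, 70_000]
for kib in (2, 4, 8, 16, 32):
    print(kib, "KiB clusters:", sum(slack_bytes(s, kib * 1024) for s in sizes), "bytes of slack")
```

The middle line shows the one-byte cliff described above: a file one byte larger than a cluster takes two clusters.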
It is not uncommon for very large disks that are in single partitions to waste up to 40% of their space due to slack, although 25-30% is more common. Let's take an example to illustrate the situation. Consider a hard disk volume that is using 32 KiB clusters and holds 17,000 files. If we assume that each file has half a cluster of slack, then we are wasting 16 KiB of space per file. Multiply that by 17,000 files, and we get a total of about 265 MB of slack space. If we assume that most of the files are smaller, so that on average each file has slack space of around two-thirds of a cluster instead of one-half, this jumps to 354 MB! If we were able to use a smaller cluster size for this disk, the amount of space wasted would be reduced dramatically. The table below compares the slack for various cluster sizes in this example. The more files on the disk, the worse the slack gets. To find the percentage of disk space wasted in this example, divide the slack figure by the size of the disk. Taking the 354 MB figure, if this were a (full) 1.2 GB disk using 32 KiB clusters, about 30% of that space would be slack; if the disk were 2.1 GB in size, the slack percentage would be about 17%:
Cluster Size | Sample Slack Space, 50% Cluster Slack per File | Sample Slack Space, 67% Cluster Slack per File
2 KiB | 17 MB | 22 MB
4 KiB | 33 MB | 44 MB
8 KiB | 66 MB | 89 MB
16 KiB | 133 MB | 177 MB
32 KiB | 265 MB | 354 MB
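The table can be reproduced from the example's assumptions: 17,000 files, each wasting either half or two-thirds of a cluster. This sketch treats 1 MB as 1,024 KiB, which is the convention that matches the table's figures; the constant and function names are hypothetical.

```python
FILES = 17_000  # number of files in the example volume


def sample_slack_mb(cluster_kib: int, fraction_of_cluster: float) -> float:
    """Total slack (MB, taken as 1,024 KiB) if every file wastes this fraction of a cluster."""
    slack_kib = FILES * cluster_kib * fraction_of_cluster
    return slack_kib / 1024


for kib in (2, 4, 8, 16, 32):
    half = sample_slack_mb(kib, 1 / 2)
    two_thirds = sample_slack_mb(kib, 2 / 3)
    print(f"{kib:>2} KiB clusters: {half:6.1f} MB  {two_thirds:6.1f} MB")
```

Rounded to whole megabytes, the computed values match the table, except that the 32 KiB, 50% case comes out at 265.6 MB, which the table lists as 265 MB. Dividing by disk size gives the quoted percentages, for example 354 / 1,200 is about 30%.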
As you can see, the larger the cluster size used, the more of the disk's space is wasted due to slack. Therefore, it is better to use smaller cluster sizes whenever possible. This is, unfortunately, sometimes easier said than done. The number of clusters we can use is limited by the nature of the FAT file system, and there are also performance tradeoffs in using smaller cluster sizes. Therefore, it isn't always possible to use the absolute smallest cluster size in order to maximize free space. One way that cluster sizes can be reduced is to use FAT32 instead of FAT16, as described earlier. However, on very large modern hard disks, big partitions even in FAT32 use rather hefty cluster sizes! For further information, refer to the “Hard disk basic” file.