Fragmentation - 1 December 1997
We hadn't intended to write an article on fragmentation, since these articles primarily go to people who already use Diskeeper, but so many of you have asked for an article on fragmentation that we had to write one!
We have stated before that fragmentation is the most significant factor in system performance. Here's why:
An average fragments-per-file value of 1.2 means that there are 20% more pieces of files on the disk than there are files, indicating perhaps 20% extra computer work. It should be pointed out that these numbers are merely indicators. Some files are so small that they reside entirely within the MFT. Some files are zero-length.

If only a few files are badly fragmented while the rest are contiguous, and those few fragmented files are seldom accessed, then fragmentation may have no performance impact at all. On the other hand, if your applications are accessing the fragmented files heavily, the performance impact could be much greater than 20%. You have to look further to be sure. For example, if there were 1,000 files and only one of them were ever used, but that one were fragmented into 200 pieces (20% of the total fragments on the disk), you would have a serious problem, much worse than the 20% figure would indicate.

In other words, it is not the fact that a file is fragmented that causes performance problems; it is the computer's attempts to access the file that degrade performance.
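The arithmetic above can be sketched in a few lines. This is purely illustrative; the figures are the article's own examples, not measurements from any real volume:

```python
# Back-of-envelope sketch of the "average fragments per file" metric.

def avg_fragments_per_file(fragments, files):
    """Average pieces per file: 1.0 means every file is contiguous."""
    return fragments / files

# A volume with 1,000 files in 1,200 total pieces:
avg = avg_fragments_per_file(1200, 1000)
print(f"average fragments/file: {avg:.1f}")               # 1.2
print(f"extra pieces vs. files: {(avg - 1.0) * 100:.0f}%")  # 20%

# The same average can hide a hot spot: 999 contiguous files plus one
# heavily used file in 200 pieces gives a similar-looking average...
avg = avg_fragments_per_file(999 * 1 + 200, 1000)
print(f"average fragments/file: {avg:.2f}")               # 1.20
# ...but every access to that one file pays for 200 pieces, not 1.2.
```

The point the numbers make is that the average alone cannot tell you where the cost lands; access patterns decide that.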
To explain this properly, it is first necessary to examine how files are accessed and what is going on inside the computer when files are fragmented.
What's Happening to Your Disks?
Tracks on a disk are concentric circles, divided into sectors. Files are written to groups of sectors called "clusters". Often, files are larger than one cluster, so when the first cluster is filled, writing continues into the next cluster, and the next, and so on. If there are enough contiguous clusters, the file is written in one contiguous piece. It is not fragmented. The contents of the file can be scanned from the disk in one continuous sweep merely by positioning the head over the right track and then detecting the file data as the platter spins the track past the head.
Now, suppose the file is fragmented into two parts on the same track. To access this file, the read/write head has to move into position as described above, scan the first part of the file, then suspend scanning briefly while waiting for the second part of the file to move under the head. Then the head is reactivated and the remainder of the file is scanned.
As you can see, the time needed to read the fragmented file is longer than the time needed to read the unfragmented (contiguous) file. The exact time needed is the time to rotate the entire file under the head, plus the time needed to rotate the gap under the head. A gap such as this might add a few milliseconds to the time needed to access a file. Multiple gaps would, of course, multiply the time added. The gap portion of the rotation is wasted time due solely to fragmentation. Then, on top of that, you have to add all the extra operating system overhead required to process the extra I/Os.
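A rough estimate of that gap delay can be worked out from the rotation speed. The disk geometry below uses assumed round numbers (a 7200 RPM drive), not figures from the article:

```python
# Rough estimate of rotational delay added by in-track gaps between
# fragments. RPM is an assumed typical value.

RPM = 7200
MS_PER_ROTATION = 60_000 / RPM   # ~8.33 ms for one full rotation

def gap_delay_ms(gap_fraction_of_track, gaps):
    """Extra time spent waiting for gaps to rotate past the head."""
    return gaps * gap_fraction_of_track * MS_PER_ROTATION

# One gap covering a tenth of a track adds under a millisecond...
print(f"1 gap:   {gap_delay_ms(0.10, 1):.2f} ms")
# ...but ten such gaps add several milliseconds to every read of the file:
print(f"10 gaps: {gap_delay_ms(0.10, 10):.2f} ms")
```

Milliseconds sound small, but they recur on every access to the file, and they come on top of the operating-system overhead for the extra I/Os.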
Now, what if these two fragments are on two different tracks? We have to add time for movement of the head from one track to another. This track-to-track motion is usually much more time-consuming than rotational delay, since you have to physically move the head. To make matters worse, the relatively long time it takes to move the head from the track containing the first fragment to the track containing the second fragment can cause the head to miss the beginning of the second fragment, necessitating a delay of nearly one complete rotation of the disk, waiting for the second fragment to come around again to be read. Further, this form of fragmentation is much more common than the gap form.
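The cross-track case can be sketched the same way. The seek time and rotation speed below are assumed typical values for illustration, not figures from the article:

```python
# Sketch: delay to reach a fragment that sits on a different track.

RPM = 7200
ROTATION_MS = 60_000 / RPM   # one full rotation, ~8.33 ms
AVG_SEEK_MS = 9.0            # assumed average track-to-track seek

def cross_track_fragment_cost(missed_start):
    """Delay to begin reading a fragment on another track."""
    cost = AVG_SEEK_MS
    if missed_start:
        # The seek took long enough that the fragment's start has
        # already passed the head; wait up to one full rotation for
        # it to come around again.
        cost += ROTATION_MS
    return cost

print(f"seek only:           {cross_track_fragment_cost(False):.1f} ms")
print(f"seek + missed start: {cross_track_fragment_cost(True):.1f} ms")
```

Compare this with the sub-millisecond gap delay above: the head movement itself dominates, and a missed rotation nearly doubles it again.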
But the really grim news is this: files don't always fragment into just two pieces. You might have three or four, or ten or a hundred fragments in a single file. Imagine the gymnastic maneuvers your disk heads are going through trying to collect up all the pieces of a file fragmented into 100 pieces!
On really badly fragmented files, there is another factor: The Master File Table record can only hold a limited number of pointers to file fragments. When the file gets too fragmented, you have to have a second MFT record, maybe a third, or even more. For every such file accessed, add to each I/O the overhead of reading a second (or third, or fourth, etc.) file record segment from the MFT.
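The extra MFT reads grow in steps as fragmentation rises. The per-record capacity below is a made-up placeholder purely for illustration; the real limit depends on the MFT record size and how NTFS encodes the runs:

```python
# Illustrative sketch of why badly fragmented files need extra MFT
# file record segments. RUNS_PER_MFT_RECORD is a hypothetical figure.
import math

RUNS_PER_MFT_RECORD = 50   # assumed fragment pointers per record

def mft_records_needed(fragments):
    return max(1, math.ceil(fragments / RUNS_PER_MFT_RECORD))

for frags in (1, 50, 120, 500):
    extra = mft_records_needed(frags) - 1
    print(f"{frags:4d} fragments -> {extra} extra MFT record read(s) per access")
```

Each extra record is another disk read charged to every access of the file, before a single byte of the file's actual data arrives.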
On top of all that, extra I/O requests, due to fragmentation, are added to the I/O request queue along with ordinary and needful I/O requests. The more I/O requests there are in the I/O request queue, the longer user applications have to wait for I/O to be processed. This means that fragmentation causes everyone on the system to wait longer for I/O, not just the user accessing the fragmented file.
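A toy model makes the queueing point concrete. Assuming, for simplicity, a FIFO queue and a constant service time per request (real disks are more complicated), every extra fragment-induced I/O delays every request behind it:

```python
# Toy illustration: extra I/Os from fragmentation lengthen the request
# queue, so *all* requests wait longer, not just those touching the
# fragmented file. Service time is an assumed constant.

def avg_wait_ms(requests_ahead, service_ms=10.0):
    """Wait for a new request arriving behind a FIFO queue."""
    return requests_ahead * service_ms

needed_ios = 20      # requests the workload genuinely requires
fragment_ios = 40    # extra requests caused by fragmentation

print(f"without fragmentation: {avg_wait_ms(needed_ios):.0f} ms")
print(f"with fragmentation:    {avg_wait_ms(needed_ios + fragment_ios):.0f} ms")
```

In this sketch the fragmented workload triples the wait for everyone sharing the disk, which is the multi-user effect the next paragraph describes.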
Fragmentation overhead certainly mounts up. Imagine what it is like when there are 100 users on a network, all accessing the same server, all incurring similar amounts of excess overhead.
What's Happening to Your Computer?
Now, let's take a look at what these excess motions and file access delays are doing to the computer.
Windows NT is a complicated operating system. That complexity is a good thing: it reflects the large amount of functionality built into the system, saving you and your programmers the trouble of building that functionality into your application programs, and it is part of what makes Windows NT a truly great operating system. One of those functions is providing an application with file data without the application having to locate every bit and byte of that data physically on the disk. Windows NT will do that for you.
When a file is fragmented, Windows NT does not trouble your program with the fact, it just rounds up all the data requested and passes it along. This sounds fine, and it is a helpful feature, but there is a cost. Windows NT, in directing the disk heads to all the right tracks and clusters within each track, consumes system time to do so. That's system time that would otherwise be available to your applications. Such time, not directly used for running your program, is called overhead.
What's happening to your applications while all this overhead is going on? Simple: Nothing. They wait.
The users wait, too, but they do not often wait without complaining, as computers do. They get upset, as you may have noticed.
The users wait for their applications to load, then wait for them to complete, while excess fragments of files are chased up around the disk. They wait for keyboard response while the computer is busy chasing up fragments for other programs that run between the user's keyboard commands. They wait for new files to be created, while the operating system searches for enough free space on the disk and, since the free space is also fragmented, allocates a fragment here, a fragment there, and so on. They even wait to log in, as the operating system wades through fragmented procedures and data needed by startup programs. Even backup takes longer - a lot longer - and the users suffer while backup is hogging the machine for more and more of "their" time.
Fragmentation vs. CPU Speed
A system that does a lot of number crunching but little disk I/O will not be affected much by fragmentation. But on a system that does mainly disk I/O (say a mail server), severe fragmentation can easily slow a system by 90% or more. That's much more than the difference between a 486/66 CPU and a 250MHz Pentium II!
Of course, for the vast majority of computers, the impact of fragmentation will fall somewhere in the middle of this range. In our experience, many Windows NT systems that have run for more than two months without defragmenting have, after defragmentation, at least doubled their throughput. It takes quite a large CPU upgrade to double performance.
Fragmentation vs. Memory
The amount of memory in a computer is also important to system performance; just how important depends on where you are starting from. If you have 16 megabytes of RAM, it is almost a certainty that adding more will tremendously boost performance; if you have 256 megabytes, you would most likely see no benefit from more. Raising the RAM from 32 to 96 megabytes on the author's machine, which does much memory-intensive work, almost tripled performance. We see this as the high end of the possible benefit from adding memory. The typical site, in our experience, will see about a 25% boost from doubling the RAM. Again, we generally see more performance improvement from eliminating fragmentation.
(This article was primarily excerpted from Chapter 4 of the book Fragmentation - the Condition, the Cause, the Cure, by Craig Jensen, CEO of Executive Software. It has been modified for application to Windows NT. The complete text of the book is available at this web site.)
Source:
http://www.execsoft.com/tech-support/NT-articles/article.asp?F=1997120112.htm