Cluster Sizes?

Large cluster sizes can waste space and degrade performance. Every file you store will take up at least one cluster - even if you have 1 byte of actual data in it.

The default cluster size with NTFS is 4096 bytes. On very large volumes FAT32 creates larger cluster sizes to be able to cover the entire space - because it has a limited maximum number of clusters that it can handle.
 
It would help if you would elaborate on what sort of clusters you mean.

If you mean the cluster size for a filesystem then larger clusters mean better performance and less fragmentation. A filesystem cluster is the smallest amount of data the OS will read from or write to the disk. With bigger clusters more data is transferred in one go, and disks read and write much faster when data is in sequence. There will be less fragmentation since files will be held in fewer chunks.

The downside with cluster size is wasted disk space. On average you waste half a cluster per file. So if you have a million files on a partition with 64k clusters then you waste about 32GB of disk space. If you have lots of small files then the waste, relative to the data you actually store, can be even bigger.
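
If anyone wants to put rough numbers on that waste, here is a quick Python sketch. The file counts and file sizes below are invented for illustration, and the "half a cluster per file" figure is the same rule of thumb used above:

import math

def slack_bytes(file_size, cluster_size):
    # Space allocated on disk minus the actual data in the file
    if file_size == 0:
        return 0
    allocated = math.ceil(file_size / cluster_size) * cluster_size
    return allocated - file_size

# Rule of thumb: each file wastes about half a cluster on average
n_files = 1_000_000
for cluster in (4 * 1024, 16 * 1024, 64 * 1024):
    wasted = n_files * cluster / 2
    print(f"{cluster // 1024:>2}K clusters, {n_files:,} files: ~{wasted / 2**30:.1f} GiB wasted")

# Exact slack for a few made-up file sizes on 64K clusters
for size in (1, 2_000, 70_000, 186_000):
    print(f"{size:>7}-byte file on 64K clusters: {slack_bytes(size, 64 * 1024):>6} bytes of slack")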

With NTFS, cluster sizes larger than 4k turn off some features like file compression and can cause compatibility issues.
 
Nodsu... You hit the nail on the head. I was not aware there was more than one kind of cluster. The filesystem cluster is what I was asking about and your explanation pretty much answers my question. Larger is better, but you sacrifice disk space depending on the size of the files you are working with and how many there are - if I understood you correctly.
 
Nodsu is partly right when he says "larger clusters mean better performance". Yes, actual disk reads and writes would be fewer. But first, the data transfer time for each cluster would be greater. And second, each cluster, when read, would take up a larger amount of RAM. Both of those degrade performance - particularly if you need only a few bytes on average from that larger cluster. Performance happens in RAM - not on disk. No offense to Nodsu, but following his recommendation, why not have a cluster size of 1 MB or 10 MB or even larger? Your computer would immediately grind to a halt.

Also, when you have many small files larger clusters mean your data is spread out over a larger area of disk resulting in greater average disk head movement during reads/writes. This can also degrade performance.

If you are dealing with a large number of big binary files (e.g. graphics) then larger cluster sizes may be justified. But for your average user I think, on balance, small cluster sizes (e.g. Windows default of 4096 bytes) are better. For the average user the slowest part of the system is the user himself/herself.

PS: What is EFS?
 
Has anybody tried to defrag a HDD with a cluster size larger than 4K?
What are your average file sizes? I have my HDDs set at 2K - it fragments a little faster but reads and writes are also a little faster.
I tried to set a drive up with 8K clusters and the defrag software would not run.
 
Samstoned: The issue is with your defragmentation software. I have used and defragged partitions with clusters up to 64K with no problems.

---

As for big vs small clusters, some back-of-the-envelope calculations I made for my amusement:

You will have to excuse my use of commas as decimal points... A matter of habit.

Let's have a 7200RPM drive, the average latency will be 4.2ms. Let's give it a seek time of 10ms. Let's take the media transfer rate as 70MB/s. I would assume this is a rather slowish HD of today - looked up the specs from Hitachi and made them somewhat worse. Someone more interested in all sorts of specs can correct me.

On average it takes 10+4,2=14,2 ms to seek to a position on the drive. It takes around 0,014 milliseconds to read a kilobyte worth of data once we have reached the right position. (We can read 70*1024 kilobytes per second and we invert that to get the time per kilobyte.)

So to seek and read a 4K cluster it takes on average 14,2+4*0,014=14,256 milliseconds. To do the same for a 64K cluster it would take 15,096 milliseconds. The difference is 0,84ms. Meaning that it would take on average 0,84*100/14,256=5,89% more time to read a 64K cluster.

This is the "best case" scenario for small clusters.

The average file size on this computer on my lap is 186KB (the Windows built-in defragger reports this nicely).

Best case: files are defragmented.

Time to read a contiguous 186K file 4K at a time with seek: 14,2+47*4*0,014=16,832ms
Same with 64K clusters: 14,2+3*64*0,014=16,888ms
The difference is almost unnoticeable here.

Worst case: files are completely fragmented.

If we use 64K clusters then this means three clusters and three fragments. The time to read three 64K clusters would be (by our rough average) 3*15,096=45,288ms. With 4K clusters we would get 47 clusters. And the time to read all those would be 47*14,256=670,032ms. This is almost 15 times more time.

So on my system, by very rough averages, the performance loss with 64K clusters would be nil if the drive was defragmented and if I let things go very bad I could gain orders of magnitude speed increase.
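
In case someone wants to rerun these figures with their own drive specs, here is the same model as a small Python sketch (periods instead of commas; the 10 ms seek, 4.2 ms latency and 70 MB/s rate are the assumed values from above, not measurements):

# Assumed drive: 10 ms seek, 4.2 ms average rotational latency (7200 RPM),
# 70 MB/s media transfer rate - the same figures used in the text above.
SEEK_MS = 10.0
LATENCY_MS = 4.2
MS_PER_KB = 1000 / (70 * 1024)          # ~0.014 ms to transfer one kilobyte
ACCESS_MS = SEEK_MS + LATENCY_MS        # 14.2 ms to reach a position on disk

def read_time_ms(clusters, cluster_kb, fragments):
    # Time to read `clusters` clusters of `cluster_kb` KB split into `fragments`
    # contiguous pieces, each piece costing one seek plus rotational latency.
    return fragments * ACCESS_MS + clusters * cluster_kb * MS_PER_KB

print(read_time_ms(1, 4, 1))     # one 4K cluster: ~14.26 ms
print(read_time_ms(1, 64, 1))    # one 64K cluster: ~15.09 ms
print(read_time_ms(47, 4, 1))    # contiguous 186K file in 4K clusters: ~16.82 ms
print(read_time_ms(3, 64, 1))    # contiguous 186K file in 64K clusters: ~16.88 ms
print(read_time_ms(47, 4, 47))   # fully fragmented, 4K clusters: ~670 ms
print(read_time_ms(3, 64, 3))    # fully fragmented, 64K clusters: ~45 ms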

---

I am no expert on hard drives and didn't bother to make a technically correct analysis, but it was an interesting exercise for me.
If anyone actually bothered to read this and has something to fix/add then feel free.
 
Nodsu,

First, I agree with you on defragging. I defragged my previous FAT32 system with 32 KB clusters without any problems.

Second, when I reformatted it to NTFS with 4 KB clusters I recovered over 1 GB space from about 9 GB of data - just to give everyone an idea of the amount of wasted space.

Third, those are interesting calculations and facts. I will note the facts for future use. And, perhaps I should revisit my previous assumptions.

But fourth, if you allow your disk to become "badly" fragmented cluster sizes will be the least of your problems.

Fifth, I take issue with your "Time to read a contiguous 186K file". That assumes that every time you access a file you read the entire file. In my previous post I touched on this. With a 64 KB cluster you have to seek and transfer the entire 64 KB even if you want to access just 1 byte from it. So if you are reading and processing most of a big file (as you would with, say, a graphics file) then, as I said before, I agree 64 KB would be a better cluster size. Maybe you can get authoritative average file utilization figures somewhere and recalculate. Failing that you could assume that on average half the file is actually accessed - and recalculate.

Sixth, you left RAM utilization entirely out of your considerations. When you grab 64 KB of RAM instead of 4 KB you deny 60 KB to other tasks and apps. I did not explicitly mention it previously but your average paging activity will be higher - both in terms of the number of paging I/O's and in terms of the time for each I/O. As I said performance happens in RAM. That is the most critical bottleneck I think. Do you disagree?

This is a most interesting discussion. I look forward to your reply with great anticipation, truly. :D
 
This has all been covered before, but I don't feel like searching the forums for it. For everything you ever wanted to know about cluster sizes, just go to www.ntfs.com and learn. Especially since some of the comments here seem to be coming from somewhere other than a mouth.
 
One of the cornerstones of all sorts of memory caching and optimisations in computing is the locality of data access. This means that when you access a part of memory, it is very likely that you will want to access something near that location very soon.
The optimal size of the locality window depends on the specific application of course.

Occasions when you want to read only a couple of bytes from a file and then not touch that file again in the near future are rare, and very often it is easier for programmers to read the whole file to play it safe (RAM is cheap). The most used files - system libraries (.dll) and most executables - are always loaded whole by the OS kernel.

The amount of RAM allocated to disk buffers does not depend on the cluster size. The OS will have an amount of memory for disk buffering (the amount changes according to some allocation algorithm). Cluster size only defines how many different disk locations may fit in that buffer space. If I had 64MB of disk buffers then with 64K clusters that means I could have 1024 cached disk locations. 1000 open files on a normal system is a lot.

PS
If, according to StormBringer, I have some serious misconceptions about things then I'd really appreciate it if he pointed them out to me. In private if needed.
 
I don't know how to take StormBringer's comments. They sound dismissive. If so, I don't appreciate them. I know of www.ntfs.com and I had bookmarked it long ago. It did not teach me anything about this that I didn't already know.

FYI, I am a computer consultant with over 30 years of doing systems and database design. Although most of that was on mainframes the principles of performance are the same. Only the sizes and speeds are different.

I agree with you Nodsu when you say it is rare for a user to access just a few bytes of a file. It is equally rare for a lay user (not a programmer) to access all of a large file. I have a couple of large (c. 700 MB) MS Access databases that I access and update every day. There is no way I access or can access all 700 MB every day. MS Access databases can grow up to 2 GB. That is why I suggested you recalculate assuming access to half the file. And, there is no way any program/programmer can read a whole 2 GB database into RAM even to "play it safe". System modules may be loaded whole - because they are small. They are designed to be small precisely for performance. Instead of a few large modules they have created very many more smaller modules each with limited functionality to be loaded only when needed.

Also, it is not all that rare for a user to need just a few bytes from the "next" cluster. If the user is done with info in one cluster and wants to look at even one byte in the next cluster that entire cluster has to be read in into RAM.

And, RAM is not cheap relative to disk space. High-speed cache memory is even less so. If that were not so we would not have/need hard disks, just extremely large RAMs. Further, you cannot increase RAM beyond what the PC is designed to accommodate regardless of how cheap it is.

True, 1000 simultaneously open files is a lot. But you are assuming that any open file will have just one cluster in the system buffer at a time. Not true. Usually up to four or five files are open - each with several clusters in the system buffer.

When the system needs to access a cluster it first looks in the system buffer. If it does not find it already there it will actually read that cluster from disk into the system buffer. But what happens when the system buffer is full? It will overwrite the "least recently used" cluster there. Then if the user/program wants to re-access the overwritten cluster the system has to re-read that cluster from disk. This "hidden" I/O generally increases with larger cluster sizes - as does the other "hidden" I/O i.e. pagefile reads and writes.
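
To make that buffering behaviour concrete, here is a toy LRU buffer simulation in Python. The 64 MB buffer size, the 10 GB data area and the "many small scattered hot spots" workload are all invented for illustration - a sequential workload would favour large clusters instead, as Nodsu's numbers show - but it does show how a fixed-size buffer holds fewer large clusters and therefore evicts and re-reads more often when accesses are scattered:

import random
from collections import OrderedDict

class ClusterBuffer:
    # Toy model of a system buffer that caches whole clusters and evicts
    # the least recently used one when it is full.
    def __init__(self, buffer_bytes, cluster_bytes):
        self.cluster_bytes = cluster_bytes
        self.slots = buffer_bytes // cluster_bytes   # clusters that fit at once
        self.cache = OrderedDict()                   # cluster number -> cached
        self.disk_reads = 0

    def access(self, byte_offset):
        cluster = byte_offset // self.cluster_bytes
        if cluster in self.cache:
            self.cache.move_to_end(cluster)          # hit: refresh LRU position
        else:
            self.disk_reads += 1                     # miss: read it from disk
            self.cache[cluster] = True
            if len(self.cache) > self.slots:
                self.cache.popitem(last=False)       # evict least recently used

# Workload: 5,000 small hot spots scattered over 10 GB, touched in random order.
rng = random.Random(1)
hot_spots = [rng.randrange(10 * 2**30) for _ in range(5_000)]
accesses = [rng.choice(hot_spots) for _ in range(100_000)]

for cluster_size in (4 * 1024, 64 * 1024):
    buf = ClusterBuffer(64 * 2**20, cluster_size)    # 64 MB of disk buffers
    for offset in accesses:
        buf.access(offset)
    print(f"{cluster_size // 1024:>2}K clusters: {buf.disk_reads} disk reads for 100,000 accesses")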

Experience has taught me that it is a common mistake to think of performance only in terms of the number of I/O's. You have to take the whole system - particularly RAM, buffering and paging - and the whole process into account. Generally you should "get only the data you actually need, only when you actually need it and, as far as possible, in the order that you actually need it". Bigger is not always better.

StormBringer, you're right. This is "coming from somewhere other than a mouth". It is coming from a real brain and real experience, long experience. :D

PS: When you talk about "locality of data access", etc. you are talking about prefetching. That too will in fact actually get worse with larger cluster sizes. There the system is actually reading clusters that you are "likely" to need but won't necessarily actually need.
 
Gunny said:
I agree with you Nodsu when you say it is rare for a user to access just a few bytes of a file. It is equally rare for a lay user (not a programmer) to access all of a large file. I have a couple of large (c. 700 MB) MS Access databases that I access and update every day. There is no way I access or can access all 700 MB every day. MS Access databases can grow up to 2 GB. That is why I suggested you recalculate assuming access to half the file. And, there is no way any program/programmer can read a whole 2 GB database into RAM even to "play it safe". System modules may be loaded whole - because they are small. They are designed to be small precisely for performance. Instead of a few large modules they have created very many more smaller modules each with limited functionality to be loaded only when needed.

That's why I posted the average file size on my computer. To have some notion of "large" and "small". Also, I don't see why a normal user would have several 700MB databases on her computer. BTW MS Jet database engine accesses its databases in 2KB chunks no matter what cluster size you have. I agree that setting the cluster size accordingly would be a very good idea.

Also, it is not all that rare for a user to need just a few bytes from the "next" cluster. If the user is done with info in one cluster and wants to look at even one byte in the next cluster that entire cluster has to be read in into RAM.

And with smaller clusters that is more likely to happen - 16 times more likely if we use 4K clusters instead of 64K. And every time you look up a cluster you get slammed with the HD access time, which is, as we saw, far greater than the extra time it takes to transfer a bigger cluster instead.

And, RAM is not cheap relative to disk space. High-speed cache memory is even less so. If that were not so we would not have/need hard disks, just extremely large RAMs. Further, you cannot increase RAM beyond what the PC is designed to accommodate regardless of how cheap it is.

Loading a 186KB file into a 512MB RAM is cheap.

True, 1000 simultaneously open files is a lot. But you are assuming that any open file will have just one cluster in the system buffer at a time. Not true. Usually up to four or five files are open - each with several clusters in the system buffer.

For any non-database work the several pending clusters will most likely be sequential. It will make no difference if I have one 64KB pending write or 16 sequential 4KB ones.

When the system needs to access a cluster it first looks in the system buffer. If it does not find it already there it will actually read that cluster from disk into the system buffer. But what happens when the system buffer is full? It will overwrite the "least recently used" cluster there. Then if the user/program wants to re-access the overwritten cluster the system has to re-read that cluster from disk. This "hidden" I/O generally increases with larger cluster sizes - as does the other "hidden" I/O i.e. pagefile reads and writes.

1000 pending clusters on a desktop PC is a lot. And it is very unlikely that a user would saturate the disk buffer with thousands of random requests. The drive would have to be very fragmented for that. And fragmentation is less with large clusters.

Windows memory management uses 4K pages by default, and for the swapfile a 4K cluster size is indeed optimal.

Experience has taught me that it is a common mistake to think of performance only in terms of the number of I/O's. You have to take the whole system - particularly RAM, buffering and paging - and the whole process into account. Generally you should "get only the data you actually need, only when you actually need it and, as far as possible, in the order that you actually need it". Bigger is not always better.

Any disk activity takes eons compared to any chip activity and the gap is increasing hence the less you bother the disk the faster things will be.

If we suppose that a CPU can do one instruction per clock cycle then a 3GHz one would do some 3000 "MIPS". That is 3 million instructions per millisecond. With the smallest - 1 byte - instructions that would mean churning through 3MB of linear code.
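
Putting the same assumed figures side by side in a couple of lines of Python (3 GHz, one instruction per cycle, and the 14.2 ms average access time from the earlier post):

# Assumed: 3 GHz CPU retiring one instruction per clock cycle,
# 14.2 ms average disk access time (seek + rotational latency) as above.
instructions_per_second = 3_000_000_000
disk_access_ms = 14.2

per_ms = instructions_per_second / 1000
print(f"{per_ms:,.0f} instructions per millisecond")                            # 3,000,000
print(f"{per_ms * disk_access_ms:,.0f} instructions while one seek completes")  # ~42,600,000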

PS: When you talk about "locality of data access", etc. you are talking about prefetching. That too will in fact actually get worse with larger cluster sizes. There the system is actually reading clusters that you are "likely" to need but won't necessarily actually need.

IMO it is better to play safe and grab the data just in case (that was a 5% performance loss in the worst case, remember?) than take the hit when I actually need that 4097th byte and have to go to the disk again (that would mean twice the disk access time).

I ran a performance log on my laptop overnight with one shutdown and startup, with web browsing as the activity. Average bytes per disk operation was some 10K. So for this kind of use I would have to go for 8K or 16K clusters.
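
For anyone who wants to repeat that measurement without setting up a performance log, here is a rough sketch using the third-party psutil package (assuming it is installed; its counters are the same kind of numbers the Windows performance log reports):

import time
import psutil   # third-party package: pip install psutil

def average_bytes_per_op(sample_seconds=60):
    # Average bytes transferred per disk operation over a sampling window,
    # counting reads and writes together across all physical disks.
    start = psutil.disk_io_counters()
    time.sleep(sample_seconds)
    end = psutil.disk_io_counters()
    ops = (end.read_count - start.read_count) + (end.write_count - start.write_count)
    data = (end.read_bytes - start.read_bytes) + (end.write_bytes - start.write_bytes)
    return data / ops if ops else 0.0

if __name__ == "__main__":
    avg = average_bytes_per_op(60)
    print(f"~{avg / 1024:.1f} KB per disk operation over the last minute")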
 
First of all, Nodsu, I am a "him", not a "her". Which one are you?

Virtually everyone has one or more large and small "databases". A list of your music collection or of football scores is a database. It is just that a lay user does not think of them as databases but they are databases just the same. A database is just an organized way of keeping track of a large number of things you are interested in. There is no point in getting and keeping something if you can't find it when you need it.

I have two MS Access databases. One is to track all the software I have on each of my two PC's. The other is to keep track of stocks and options for investing purposes. These are, I would claim, quite "normal".

I have to defer to your much superior knowledge of the MS Jet DBE but does it access the HD directly? If not, the OS will still read the whole cluster for it and use up a whole cluster's worth of RAM.

On this whole issue of average file size I was not taking issue with you using it in your calculations. I was taking issue with the fact that you implicitly assumed access to the whole average-sized file in your calculations.

I should have said that it is not all that rare for a user to need just a few bytes from the "last" cluster. It is no more likely to happen with a 4 KB cluster than with a 64 KB cluster.

True, in getting to the last cluster you are on average going to do 16 times as many I/O's. But that will have an impact only for a program that is racing through a significant amount of data. However, as I have said before a human user is the slowest part of the system. That is why I was careful to say "lay user (not a programmer)".

RAM is "cheap" in absolute terms but not in relative terms. And, because it is much more limited compared to disk space it is much more "expensive" in terms of performance.

Whether the "pending" clusters, i.e. the clusters already in the buffer, are sequential or not does not make any difference. A buffer of a given size will hold 16 times fewer clusters than 4 KB clusters. While the OS will have to do 16 times fewer reads to fill up the buffer it will very probably need to do many more re-reads for the overwritten "least recently used" cluster. How many more will depend on usage. The same would go for a paging file of any given size.

I fully agree that I/O takes "eons" compared to chipset activity. But the very first thing an app does is wait for RAM resources to become available just to get started. Also, when an app needs to do an I/O it has to wait for another "slow" I/O currently under way to finish. During that time the much faster chipset is not doing anything and its speed does not matter. It is those wait times in the RAM bottleneck that kill performance. Except in processor/processing-intensive cases the much faster chipset is not productively active most of the time, so its faster speed isn't really that significant. But free RAM available for apps is.

RAM is really the bottleneck that most often kills performance. I have looked at all these things and done careful detailed calculations during my 30 years in systems and database design many, many times. I still stand by what I said: "You have to take the whole system ... into account".

I did not work on it but on one consulting assignment I got to know about an app where they designed 16 KB records to reduce/minimize I/O's. This was on a mainframe with much faster clock speeds and much more RAM. Also bear in mind these were only 16 KB records, not 64 KB. The computer slowed down to a crawl because the RAM was choked up. I have worked on or known of other cases like that although none of them were quite so drastic.

And, I have found on this and other forums that most topics on performance and memory errors revolve around free/available RAM and the number of continuously running background processes - not on chipset and I/O speeds or on the number of I/O's or on how cheap/expensive RAM and Disk space are in dollar terms.

I won't belabor the prefetch issue for a measly 5% performance difference. :D

About smaller cluster sizes, you can go down to 512 bytes. I think those too will give problems to the lay user. I reason that MS chose 4 KB as the default for the lay user after careful thought and research. It would be nice to know the actual effects of smaller clusters but unfortunately I am not a man of leisure.

It has been interesting but I have already spent too many hours on this topic. I think I will end it here. God, I'm thirsty! :chef:
 
Feel free to not reply..

Gunny said:
Virtually everyone has one or more large and small "databases". A list of your music collection or of football scores is a database. It is just that a lay user does not think of them as databases but they are databases just the same. A database is just an organized way of keeping track of a large number of things you are interested in. There is no point in getting and keeping something if you can't find it when you need it.

I don't think an index of all my music would span 700MB..

I have to defer to your much superior knowledge of the MS Jet DBE but does it access the HD directly? If not, the OS will still read the whole cluster for it and use up a whole cluster's worth of RAM.

That's what I meant. If the OS clusters are the same size as the ones DBE uses then there will be no unneeded data read.

On this whole issue of average file size I was not taking issue with you using it in your calculations. I was taking issue with the fact that you implicitly assumed access to the whole average-sized file in your calculations.

The average file size means that for an average computer (I think my system is pretty "normal") the files accessed will be of reasonable size, and reading files whole into RAM for processing would be a feasible tactic for a programmer.

RAM is "cheap" in absolute terms but not in relative terms. And, because it is much more limited compared to disk space it is much more "expensive" in terms of performance.

Being able to store more data does not mean better performance. I do not need to buy 200GB of RAM to match a hard drive.

Whether the "pending" clusters, i.e. the clusters already in the buffer, are sequential or not does not make any difference. ...

Data accessed in sequence is read faster. And flushing/filling a buffer 16 times versus once requires 15 times more I/O time for the same amount of real data, remember?

I fully agree that I/O takes "eons" compared to chipset activity. But the very first thing an app does is wait for RAM resources to become available just to get started. Also, when an app needs to do an I/O it has to wait for another "slow" I/O currently under way to finish. During that time the much faster chipset is not doing anything and its speed does not matter. It is those wait times in the RAM bottleneck that kill performance. Except in processor/processing-intensive cases the much faster chipset is not productively active most of the time, so its faster speed isn't really that significant. But free RAM available for apps is.

You seem to think that at any given moment one needs to flush some buffers or swap out processes to have some RAM available.

Also, not all programs work by the trivial model of start, read data, process, write data.

RAM is really the bottleneck that most often kills performance. I have looked at all these things and done careful detailed calculations during my 30 years in systems and database design many, many times. I still stand by what I said: "You have to take the whole system ... into account".

That is true, but if a person comes to you and asks "how can I improve my disk performance", will you answer "buy more RAM"? That is salesperson speak.

I did not work on it but on one consulting assignment I got to know about an app where they designed 16 KB records to reduce/minimize I/O's. This was on a mainframe with much faster clock speeds and much more RAM. Also bear in mind these were only 16 KB records, not 64 KB. The computer slowed down to a crawl because the RAM was choked up. I have worked on or known of other cases like that although none of them were quite so drastic.

I had the impression we were talking about desktop PCs here..

About smaller cluster sizes, you can go down to 512 bytes. I think those too will give problems to the lay user. I reason that MS chose 4 KB as the default for the lay user after careful thought and research. It would be nice to know the actual effects of smaller clusters but unfortunately I am not a man of leisure.

I personally think 4KB clusters were chosen to suit the swapfile and to ward off those people who are willing to sue if they find that their HD holds less data than advertised.

I think claiming that Microsoft makes wise design decisions after very careful thinking is not a good idea BTW :p
 
Well, if I ask what caused my OS defrag utility and two good programs not to work, do I have to start a new thread?
PS: Is there a generic formula for tweaking IDE, SCSI and SATA drives?
Thank you
 
Nodsu, I think we've flogged this virtually to death. I will end this with just three last, really very last, comments.

First, I wasn't engaging in salesperson speak for more RAM. Just the opposite. I've said there is only a limited amount of RAM a PC can accommodate. You cannot add more RAM beyond that - no matter how cheap it is. That is precisely what makes RAM such a critical bottleneck.

Second, you seem to be switching between programmers/programs and lay users in your comments. I have been careful to distinguish between them. I agree, and have agreed, with the things you say are important for programmers/programs - but they are not so for lay users.

Third, I could not add all the options data to my investment database because of the 2 GB limit for MS Access databases. I am thinking of re-doing it in Oracle for that reason. I am also thinking of setting up a couple more MS Access databases to keep track of other things I'm interested in. However, I submit that having one or more formal databases and/or their large size(s) doesn't make one not "normal".

With respect, I will end this here. We now seem to be arguing for the sake of arguing and not wanting to "admit defeat". And, we are essentially going round and round the same things.

Samstoned, I thought your previous question was just a rhetorical question that seemed to argue against cluster sizes greater than 4 KB. I didn't realize it was a real question.

Yes, you should have started a separate thread, really.

But I will answer it briefly here. Generally, defraggers require a minimum amount of real free space to function. I think in most cases that minimum is 15%.

The amount of free space shown in the disk's Properties panel is a bit misleading. It does not include the space taken up by Norton Protect. If the files in Norton Protect (if you have that, or something like that, active) take up enough space then the defragger may not have enough real free space to operate.

If you have Norton Protect active, try emptying it first and then running your defragger.

I don't think your defragger not running had anything to do with cluster size. Your defragger ran with 4 KB cluster size probably because in reformatting your disk to 4 KB clusters you "emptied" Norton Protect.

If you want to take this further, perhaps you can start a new thread on it. Or maybe someone can shift the relevant posts from here to a new thread and subscribe me to it.

HTH
 
Gunny said:
Third, I could not add all the options data to my investment database because of the 2 GB limit for MS Access databases. I am thinking of re-doing it in Oracle for that reason. I am also thinking of setting up a couple more MS Access databases to keep track of other things I'm interested in. However, I submit that having one or more formal databases and/or their large size(s) doesn't make one not "normal".

MS SQL isn't so bad if you can get a copy, especially in comparison to Access. It's a pretty good implementation.

Off topic I know. Sorry.
 
Thanks Hoopajoop. I think I may have MS SQL. I have a TechNet Plus subscription but I just haven't gotten around to installing it yet. There's a lot of stuff on it (Virtual PC, Windows Server 2003, etc.) that I want to get into. But I also have Oracle10g and I want to learn that too.

So many things to do... and not enough time. They should triple the number of hours in a day. :D
 
Gunny said:
So many things to do... and not enough time. They should triple the number of hours in a day. :D

Then they'd make you work all of them... Seriously though, if you don't know SQL you'd go far by learning the standards-based version first. If you know ANSI SQL you can get a HUGE head start working with Oracle or MS guff. Almost all SQL database server implementations support ANSI. Equally, almost all have their own little unique bits.

Good luck. It should be fun getting that stuff over to a nice database. I need to spend some more time with 'ole sql.
 
Thanks, Hoopajoop, I do know SQL. It is logical and straightforward and pretty easy - at least for me. I've just never really used it much. But actually using it and PL/SQL is one of my main reasons for wanting to get into Oracle.

But that would only be the start. I want to learn how to operate Oracle and to actually design databases and apps for optimum performance - as I designed high-performance IDMS databases in the past.

I got into computers because I enjoy working on them. So if they made me work all those extra hours I very probably wouldn't mind - particularly if they paid me well to enjoy myself - as they have been doing all these years past!
 