First of all, Nodsu, I am a "him", not a "her". Which one are you?
Virtually everyone has one or more large and small "databases". A list of your music collection or of football scores is a database. It is just that a lay user does not think of them as databases, but they are databases just the same. A database is just an organized way of keeping track of a large number of things you are interested in. There is no point in getting and keeping something if you can't find it when you need it.
I have two MS Access databases. One tracks all the software I have on each of my two PCs. The other keeps track of stocks and options for investing purposes. These are, I would claim, quite "normal".
I have to defer to your much superior knowledge of the MS Jet DBE but does it access the HD directly? If not, the OS will still read the whole cluster for it and use up a whole cluster's worth of RAM.
On this whole issue of average file size I was not taking issue with you using it in your calculations. I was taking issue with the fact that you implicitly assumed access to the whole average-sized file in your calculations.
I should have said that it is not all that rare for a user to need just a few bytes from the "last" cluster. It is no more likely to happen with a 4 KB cluster than with a 64 KB cluster.
True, in getting to the last cluster you are on average going to do 16 times as many I/Os. But that will have an impact only for a program that is racing through a significant amount of data. However, as I have said before, a human user is the slowest part of the system. That is why I was careful to say "lay user (not a programmer)".
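To make the trade-off concrete, here is a back-of-the-envelope sketch (my own numbers, picked purely for illustration): reads needed to scan a file at the two cluster sizes, plus the slack wasted in the final, partly used cluster.

```python
import math

def reads_and_slack(file_bytes, cluster_bytes):
    """Clusters (i.e. reads) needed to cover a file, and the
    unused bytes left over in the last cluster."""
    clusters = math.ceil(file_bytes / cluster_bytes)
    slack = clusters * cluster_bytes - file_bytes
    return clusters, slack

file_size = 1_000_000                       # an example ~1 MB file
print(reads_and_slack(file_size, 4 * 1024))   # (245, 3520)  -> 245 reads, ~3 KB slack
print(reads_and_slack(file_size, 64 * 1024))  # (16, 48576)  -> 16 reads, ~47 KB slack
```

So the 4 KB clusters cost roughly 16 times the reads, while the 64 KB clusters waste roughly 16 times the tail-end space and RAM per file.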
RAM is "cheap" in absolute terms but not in relative terms. And, because it is much more limited compared to disk space it is much more "expensive" in terms of performance.
Whether the "pending" clusters, i.e. the clusters already in the buffer, are sequential or not does not make any difference. A buffer of a given size will hold 16 times fewer 64 KB clusters than 4 KB clusters. While the OS will have to do 16 times fewer reads to fill up the buffer, it will very probably need to do many more re-reads of clusters that were overwritten as "least recently used". How many more will depend on usage. The same goes for a paging file of any given size.
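A toy model makes the point (my own sketch; the real OS cache is far more sophisticated than a plain LRU list). With a fixed 1 MB buffer, 4 KB clusters give you 256 cache slots but 64 KB clusters give you only 16, so a program revisiting, say, 100 scattered hot spots re-reads constantly at the larger size:

```python
from collections import OrderedDict

def disk_reads(accesses, cluster_bytes, buffer_bytes):
    """Count disk reads for a sequence of byte-offset accesses
    through a fixed-size buffer of clusters with LRU eviction."""
    slots = buffer_bytes // cluster_bytes
    cache = OrderedDict()                    # cluster number -> present
    reads = 0
    for byte_offset in accesses:
        cluster = byte_offset // cluster_bytes
        if cluster in cache:
            cache.move_to_end(cluster)       # mark as recently used
        else:
            reads += 1                       # had to go to disk
            cache[cluster] = True
            if len(cache) > slots:
                cache.popitem(last=False)    # evict least recently used
    return reads

# 100 hot spots spread 1 MB apart, each visited 5 times in rotation:
offsets = [i * 1_000_000 for i in range(100)] * 5
print(disk_reads(offsets, 4 * 1024, 2**20))   # 100  (misses only on first pass)
print(disk_reads(offsets, 64 * 1024, 2**20))  # 500  (every access is a re-read)
```

With 4 KB clusters the 100 hot spots all fit in the buffer, so only the first pass hits the disk; with 64 KB clusters they overflow the 16 slots and every single access becomes a re-read.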
I fully agree that I/O takes "eons" compared to chipset activity. But the very first thing an app does is wait for RAM resources to become available just to get started. Also, when an app needs to do an I/O it has to wait for another "slow" I/O currently under way to finish. During that time the much faster chipset is not doing anything, and its speed does not matter. It is those wait times in the RAM bottleneck that kill performance. Except in processor/processing-intensive cases the much faster chipset is not productively active most of the time, so its raw speed isn't really that significant. But free RAM available for apps is.
RAM is really the bottleneck that most often kills performance. I have looked at all these things and done careful, detailed calculations many, many times during my 30 years in system and database design. I still stand by what I said: "You have to take the whole system ... into account".
I did not work on it, but on one consulting assignment I got to know about an app where they designed 16 KB records to minimize I/Os. This was on a mainframe with much faster clock speeds and much more RAM. Also bear in mind these were only 16 KB records, not 64 KB. The computer slowed down to a crawl because the RAM was choked up. I have worked on or known of other cases like that, although none of them were quite so drastic.
And I have found on this and other forums that most topics on performance and memory errors revolve around free/available RAM and the number of continuously running background processes - not around chipset and I/O speeds, the number of I/Os, or how cheap/expensive RAM and disk space are in dollar terms.
I won't belabor the prefetch issue for a measly 5% performance difference.
As for smaller cluster sizes, you can go down to 512 bytes, but I think those too would give the lay user problems. I reason that MS chose 4 KB as the default for the lay user after careful thought and research. It would be nice to know the actual effects of smaller clusters, but unfortunately I am not a man of leisure.
It has been interesting but I have already spent too many hours on this topic. I think I will end it here. God, I'm thirsty! :chef: