Recommendations for RAID 0 Stripe Size

Status
Not open for further replies.

Phantasm66

Posts: 4,909   +8
Okay, I am going to create a RAID 0 stripe set on my PC using two 40 GB hard drives. If this is successful and I feel it's beneficial, I will move to a four-disk array using another two disks. (All HDDs are identical in size, model, make, etc.)

My motherboard is an IWILL KA-266 R with an integrated RAID controller, but if I get to like this I might invest in a more expensive PCI IDE RAID controller card.

My question, to all of those with a home RAID system, is this: What stripe size did you use?

Please bear in mind that the partition created from this array is going to be used for CD and DVD rips, DivX encoding, and downloading large files to. Therefore, I am tempted to stray from the suggested notion of picking the smallest possible stripe size, because I will not have a lot of small files on the partition occupying this array.

The partition will likely contain a lot of MP3s as well, but let's remember that the average MP3 is about 4 MB or so. It's nothing like the size of the smallest stripe size available, which is 64K.

I am not really sure what decision to make, and so am asking for advice here.

It's not often that I do that, so here is your opportunity to impress me.

Here is some extra information on the problem:

Stripe Sizes
We suspect that many of you out there are interested in RAID for its performance advantage. Stripe sizes play a very important role in the performance of RAID arrays and thus it is critical to understand the concept of striping before we delve any further into RAID discussion.

As we mentioned before, stripes are blocks of a single file that are broken into smaller pieces. The stripe size, or the size that the data is broken into, is user definable and can range from 1KB to 1024KB or more. The way it works is when data is passed to the RAID controller, it is divided by the stripe size to create 1 or more blocks. These blocks are then distributed among drives in the array, leaving different pieces on different drives.

Like we discussed before, the information can be written faster because it is as if the hard drive is writing a smaller file, although it is really only writing pieces of a large file. At the same time, reading the data is faster because the blocks of data can be read off of all the drives in the array at the same time, so reading back a large file may only require the reading of two smaller files on two different hard drives at the same time.
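To make the striping mechanics above concrete, here is a toy sketch in Python. This is a simplified model, not how a real controller works (real controllers stripe at the block-device level); the stripe size and drive count are just example values:

```python
# Toy model of RAID 0 striping: split data into stripe-sized blocks
# and deal them out round-robin across the drives in the array.
def stripe(data: bytes, stripe_size: int, num_drives: int) -> list[list[bytes]]:
    """Return a per-drive list of the blocks each drive would store."""
    drives = [[] for _ in range(num_drives)]
    blocks = [data[i:i + stripe_size] for i in range(0, len(data), stripe_size)]
    for n, block in enumerate(blocks):
        drives[n % num_drives].append(block)
    return drives

# A 4KB "file" on a 2-drive array with a 2KB stripe: one block lands on each drive,
# so both drives can read/write their half simultaneously.
drives = stripe(b"x" * 4096, stripe_size=2048, num_drives=2)
print([len(d) for d in drives])  # [1, 1]
```

Each drive ends up holding one 2KB block, which is why the article says reading the 4KB file back costs about the same as reading a single 2KB file.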

There is quite a bit of debate surrounding what stripe size is best. Some claim that the smaller the stripe the better, because this ensures that no matter how small the original data is it will be distributed across the drives. Others claim that larger stripes are better since the drive is not always being taxed to write information.

To understand how a RAID card reacts to different stripe sizes, let's use the most drastic cases as examples. We will assume that there are 2 drives set up in a RAID 0 stripe array with one of two stripe sizes: a 2KB stripe or a 1024KB stripe. To demonstrate how the stripe sizes influence the reading and writing of data, we will also use two different data sizes to be written and read: a 4KB file and an 8192KB file.

On the first RAID 0 array with a 2KB stripe size, the array is happy to receive the 4KB file. When the RAID controller receives this data, it is divided into two 2KB blocks. Next, one of the 2KB blocks is written to the first disk in the array and the second 2KB block is written to the second disk in the array. This, in theory, halves the work that a single hard drive would have to do, since the hard drives in the array only have to write a single 2KB file each.

When reading back, the outcome is just as pretty. If the original 4KB file is needed, both hard drives in the array move to and read a single 2KB block to reconstruct the 4KB file. Since each hard drive works independently and simultaneously, the speed of reading the 4KB file back should be the same as reading a single 2KB file back.

This pretty picture changes into a nightmare when we try to write the 8192KB file. In this case, to write the file, the RAID controller must break it into no less than 4096 blocks, each 2KB in size. From here, the RAID card must pass pairs of the blocks to the drives in the array, wait for the drive to write the information, and then send the next 2KB blocks. This process is repeated 4096 times and the extra time required to perform the breakups, send the information in pieces, and move the drive actuator to various places on the disk all add up to an extreme bottleneck.

Reading the information back is just as painful. To recreate the 8192KB file, the RAID controller must gather information from 4096 places on each drive. Once again, moving the hard drive head to the appropriate position 4096 times is quite time consuming.

Now let's move to the same array with a 1024KB stripe size. When writing a 4KB file, the RAID array in this case does essentially nothing. Since the 4KB file is smaller than the 1024KB stripe, the RAID controller just takes the 4KB file and passes it to one of the drives in the array. The data is not split, or striped, because of the large stripe size, and therefore the performance in this instance should be identical to that of a single drive.

Reading back the file results in the same story. Since the data is only stored on one drive in our array, reading back the information from the array is just like reading back the 4KB file from a single disk.

The RAID 0 array with the 1024KB stripe size does better when it comes to the 8192KB file. Here, the 8192KB file is broken into eight blocks of 1024KB each. When writing the data, both drives in the array receive 4 blocks of the data, meaning that each drive only has the task of writing four 1024KB files. This increases the writing performance of the array, since the drives work together to write a small number of blocks. At the same time, reading back the file requires four 1024KB files to be read back from each drive. This holds a distinct advantage over reading back a single 8192KB file.

As you can see, the performance of various stripe sizes differ greatly depending on the situation. Just what stripe size should you use?
source: http://www.anandtech.com/storage/showdoc.html?i=1491&p=5
 
Originally posted by Phantasm66
My question, to all of those with a home RAID system, is this: What stripe size did you use?

Please bare in mind that the partition created from this array is going to be used for CD and DVD rips, DivX encoding, downloading large files to. Therefore, I am tempted to stray away from the suggested notion of pick the smallest stripe size possible because I will not have a lot of small files on the partition which will occupy this array.

Ok, I don't have any experience with RAID arrays, but lack of experience hasn't stopped me from giving my $.02 before, so why start now :D

Given the information you gave us, a bigger stripe size is Good[size=1]TM[/size] :)

You actually gave the answer in the text from Anandtech:

Now let's move to the same array with a 1024KB stripe size. When writing a 4KB file, the RAID array in this case does essentially nothing. Since the 4KB file is smaller than the 1024KB stripe, the RAID controller just takes the 4KB file and passes it to one of the drives in the array. The data is not split, or striped, because of the large stripe size, and therefore the performance in this instance should be identical to that of a single drive.

Reading back the file results in the same story. Since the data is only stored on one drive in our array, reading back the information from the array is just like reading back the 4KB file from a single disk.

The RAID 0 array with the 1024KB stripe size does better when it comes to the 8192KB file. Here, the 8192KB file is broken into eight blocks of 1024KB each. When writing the data, both drives in the array receive 4 blocks of the data, meaning that each drive only has the task of writing four 1024KB files. This increases the writing performance of the array, since the drives work together to write a small number of blocks. At the same time, reading back the file requires four 1024KB files to be read back from each drive. This holds a distinct advantage over reading back a single 8192KB file.

As the array will contain many large files, you'll get the best performance with the largest stripe size... Any small files will only get the performance of one drive, but as you won't have many of those files (if any), why care too much about it?

And with the average MP3 size being 4 MB, you shouldn't see any slowdown of the array...

Whereas if you set the size smaller, you'll get the same scenario as below...

This pretty picture changes into a nightmare when we try to write the 8192KB file. In this case, to write the file, the RAID controller must break it into no less than 4096 blocks, each 2KB in size. From here, the RAID card must pass pairs of the blocks to the drives in the array, wait for the drive to write the information, and then send the next 2KB blocks. This process is repeated 4096 times and the extra time required to perform the breakups, send the information in pieces, and move the drive actuator to various places on the disk all add up to an extreme bottleneck.

.02$ :)
 
thanks for your reply :)

I am leaning towards a 2MB stripe size. The average size of an MP3 is 4 MB, and most files on the partition will be big, like 640MB and such...

This differs from an OS partition or a games partition (lots of smaller files), which would be better on a stripe with a much smaller stripe size. In my case, at least for the moment, those things will remain on a non-striped HDD.

It's more for DivX encoding, perhaps video capture, MP3 ripping, and an area to store things which will be burned to disc. Mostly these are bigger files, like videos and such.

I also have a media partition which I loaded with MPEG files of music videos - this will go on the stripe as well.

Games are being moved, and as I said, no OS will be on the stripe, I think. Maybe I will create another stripe with a smaller stripe size for the games and OS later...
 
Stripe Size and Caching
When creating RAID arrays, you must consider the RAID stripe size, write-caching, and read-ahead. Stripe size is the amount of data written by the RAID controller to a disk before writing to the next disk. To maximize overall performance, configure your RAID arrays with stripe sizes that correspond to the size of the anticipated system I/O requests. In general, you want to use smaller stripe sizes for the operating system and transaction logs (8K to 16K), and larger stripe sizes for the databases (128K to 256K). The exact size depends on several factors, including the number of users and the estimated size of your database. Changing the stripe setting from the default can adversely affect your RAID configuration, including limiting the maximum number of drives you can have in a single array.
source: http://www.devx.com/premier/mgznarch/exchange/2000/02feb00/ee0100/ee0100.asp
 
But should I really be using a 2MB stripe size for my media partition??

Actually, no. I read the Anandtech article more fully and found my RAID device there. In the benchmarks it seems like the best stripe size overall is 512K.

Don't underestimate how important this is if you are making a stripe. It's possible to actually get worse performance with a poorly chosen stripe size setting.
 
Just curious, but what exactly is the difference between stripe size and block size? Stupid me, I didn't take these into account when I first made my RAID 0 array - I left everything at default, and now I have too much media, etc. to just reformat and start over. I believe I have a 64K block size and a 512K stripe size, if I'm not mistaken. Does anyone know if this is any good? I have two 60GB IBM Deskstars in RAID 0.
 
If I had answered without reading this thread I would have said 512KB, but now that I did take the time to read it, I'll recommend, *drumroll* 512KB anyway ;)

I've also just purchased a RAID 0 array; it consists of two of Maxtor's latest fluid-bearing 7200rpm hard drives @ 20GB, and I must say the thing is fast!

Though my stupid built-in Promise controller won't allow me to change the RAID stripe size; it's set by default to 64KB :(

Anyway: the only real advice I can give you is try and see - that's the only way to find an optimal stripe size, really...

Or you could read this great 200-page article about hard drives, and then you will, like me, know everything inside and out (for a day or two :D)

http://www.storagereview.com/welcom...00/ref/hdd/perf/raid/levels/techCapacity.html
 
I created the stripe today.

I can confirm that there is a very significant performance boost for certain types of functions.

It's well worth thinking about, if not for the performance boost or redundancy functions (which I did not implement), then simply for the ability to combine several HDDs into one large one.
 
Just curious, but what stripe and cluster size did you use? Also, how did you go about changing the cluster size in XP - assuming you're using XP, that is. I hear it's difficult to change the cluster size without using a 3rd-party program.
 
I used a 512 KB stripe size, since in all the benchmarks for my controller - with varying types of HDD, functions, operating systems, etc. - this was the one that won out.

The stripe is all formatted as one great big NTFS drive. I used the default cluster size of 4 KB for the volume, which is the default for an NTFS volume of this size (around 80 GB). Even though FAT32 has a theoretical limit of several TB, it seems not to like doing anything with volumes greater than 32GB, so NTFS seemed like the logical choice. Linux shall not require write access to the partition, since I don't use Linux for the functions which will utilise this array.

I used the disk administrator in Windows 2000 Server to create the partition and to format it; I did not use any built-in functions in the RAID controller to format or partition (on booting up, the stripe just appears to the OS as one big hard drive).

Here is some technical data for my stripe:

Volume STRIPE S:
Volume size = 76,340 MB
Cluster size = 4 KB
Used space = 42,225 MB
Free space = 34,114 MB
Percent free space = 44 %

Volume fragmentation
Total fragmentation = 0 %
File fragmentation = 1 %
Free space fragmentation = 0 %

File fragmentation
Total files = 28,193
Average file size = 1,932 KB
Total fragmented files = 11
Total excess fragments = 13
Average fragments per file = 1.00

Pagefile fragmentation
Pagefile size = 0 bytes
Total fragments = 0

Directory fragmentation
Total directories = 1,804
Fragmented directories = 200
Excess directory fragments = 1,004

Master File Table (MFT) fragmentation
Total MFT size = 30,023 KB
MFT record count = 30,006
Percent MFT in use = 99 %
Total MFT fragments = 2

All seems fine as far as I can tell.

I have yet to do anything serious with it, like capture video to it or rip DVDs to it, which will really let me see how much better it is than just spanning a volume across two HDDs, but so far any operations involving analysing it or writing to it are certainly faster.
 
I run a 32K stripe size on my new drives. On my last system I ran 64K stripes. Since they're different drives I have no comparison. From what I've read, you'll get the best performance with either 32K or 64K stripes. Most of your system files are actually very small, so you'd be reading a bunch of big stripes just to access a 20K DLL. Why do a 4-meg read for each icon on your desktop? You'd have to have a really messed-up system to have an MP3 fragmented into 4096 non-contiguous pieces. I have a drive volume with 650 MP3s and there is 0% fragmentation. MP3s are rather slow data streams compared to disk transfer rates. I get 90 MB/s transfer rates in read and write with the Atto benchmark.
 
Its usefulness for gaming is kind of limited, as far as I recall - I am sure I read an article once that said that. Games aren't really hard drive intensive in general; all you would really improve is loading times. Once all that stuff is loaded into RAM, it's then up to the mobo, CPU and graphics to determine game performance.

The RAID volumes I use are mainly for redundancy, and for speed. Speed with HDDs is important for functions like multimedia (video capture, or just dealing with large files in general, etc.)
 
Raid Stripe Size Formula

I may have discovered a formula for determining an appropriate stripe size.

If you go into XP disk defragmenter and click on Analyze for your array, it should return a value of the average file size.

Take this number and divide it by 2 x the number of hard drives in the array, then round DOWN to the nearest available stripe size you can choose.

Let's assume that the average file size on your hard drive is 512KB.
Divide that by 2 x the number of hard drives. In my case, I have 4 hard drives in my array, so I would divide by 8.

So, 512KB divided by 8 = 64K (the optimal stripe size for my average files).

Let's say you had 2 hard drives in the array; then, using the above formula, you would choose a 128K stripe size.
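The heuristic above can be sketched in Python. The list of selectable stripe sizes is an assumption for illustration - controllers differ in what they offer, so check yours:

```python
# Zolar1's heuristic: stripe ~= average file size / (2 x drives),
# rounded DOWN to the nearest stripe size the controller offers.
# AVAILABLE_STRIPES_KB is an assumed menu, not a real controller's list.
AVAILABLE_STRIPES_KB = [8, 16, 32, 64, 128, 256, 512, 1024, 2048]

def zolar1_stripe_kb(avg_file_kb: float, num_drives: int) -> int:
    target = avg_file_kb / (2 * num_drives)
    candidates = [s for s in AVAILABLE_STRIPES_KB if s <= target]
    # if the target is below the smallest offered size, take the smallest
    return candidates[-1] if candidates else AVAILABLE_STRIPES_KB[0]

print(zolar1_stripe_kb(512, 4))  # 64  (the 4-drive example above)
print(zolar1_stripe_kb(512, 2))  # 128 (the 2-drive example above)
```

This only reproduces the arithmetic of the post; whether the heuristic itself is sound is debated later in the thread.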

This is contingent on whether you are just storing data and seldom using it, or are actively using the data. If just storing data, then put the operating system on its own hard drive, not on the array. OS = small stripe size, data = big stripe size in your case.

Another important OVERLOOKED factor is the Cluster Size of the formatted harddrives.

In my humble opinion, I would set the cluster size to 1/2 of the stripe size when formatting. This should reduce disk fragmentation and can appreciably decrease drive maintenance.

I have done several experiments, and smaller stripe sizes for the OS can make it really fly! But with smaller stripe sizes, you also have to defragment daily (not my cup of tea). Personally, I am using a 64K stripe with a 32K cluster. My fragmentation is pretty normal compared to a single hard drive. And I wouldn't suggest going with any stripe size less than 8K - you go from extreme speed at first to a snail's pace after a few hours of heavy use without defragmenting. Highpoint can let you go lower than that, and up to 2048KB if desired.

It works quite nicely except on boot-up, but that could be the umpteen things I have running...

I would experiment further, but I wore out my XP disc and had to buy another one - lol. And Microsoft doesn't like it when you have to keep activating numerous times for your experiments.

To the guy with the MP3's and such, I suggest the following combination -

4MB average file size = 512K stripe size
Cluster size when formatting under XP should be 1/2 x 512K, but since the maximum cluster size allowed under XP is 64K, use a 64K cluster.


This should give you very nice performance :hotbounce and you shouldn't have to defragment too often. :bounce:

By all means, email me. I would love to hear how well this works for you.

Zolar1@hotmail.com

PS: anyone else's comments are welcome as well. Always trying to refine the formula.
 
OOPS! Forgot something...

My above formula is used on 'realized' drives in an array - meaning a RAID 0 'sees' all the drives, and a RAID 5 'sees' all the drives minus one. For RAID 10, use 50% of the total number of drives for the calculations.

Sorry if I confused anybody. :giddy:
 
Phantasm66 said:
Its usefulness for gaming is kind of limited, as far as I recall - I am sure I read an article once that said that. Games aren't really hard drive intensive in general; all you would really improve is loading times. Once all that stuff is loaded into RAM, it's then up to the mobo, CPU and graphics to determine game performance.
Yes and no.
High-end games such as Half-Life 2 rely on loading a great deal of config, texture and model/skin files into RAM, and that load time can be quite considerable, especially when running the game with all of the bells and whistles turned on.

I grant that the CPU, RAM timings, chipset and especially the display adapter play the biggest part in playing games, but load times can be impacted in a VERY significant way when using RAID 0.
 
further discussion of stripe size and cluster size . . .

Zolar1 said:
I may have discovered a formula for determining an appropriate stripe size.

If you go into XP disk defragmenter and click on Analyze for your array, it should return a value of the average file size.

Take this number and divide it by 2 x the number of hard drives in the array, then round DOWN to the nearest available stripe size you can choose.

Let's assume that the average file size on your hard drive is 512KB.
Divide that by 2 x the number of hard drives. In my case, I have 4 hard drives in my array, so I would divide by 8.

So, 512KB divided by 8 = 64K (the optimal stripe size for my average files).

Let's say you had 2 hard drives in the array; then, using the above formula, you would choose a 128K stripe size.

This is contingent on whether you are just storing data and seldom using it, or are actively using the data. If just storing data, then put the operating system on its own hard drive, not on the array. OS = small stripe size, data = big stripe size in your case.

Another important OVERLOOKED factor is the Cluster Size of the formatted harddrives.

In my humble opinion, I would set the cluster size to 1/2 of the stripe size when formatting. This should reduce disk fragmentation and can appreciably decrease drive maintenance.

By all means, email me. I would love to hear how well this works for you.

Zolar1@hotmail.com

PS and anyone else's comments as well too. Always trying to refine the formula.

another thread on this topic has been started at

https://www.techspot.com/vb/showthread.php?p=318183#post318183

in particular user Nodsu presents the following opinion on this issue -

Nodsu said:
This is a useless formula (from Zolar1).

First, it assumes that you actually use all your files. I would rather assume that the 80:20 rule applies here. Second, it assumes that you always want to read/write the whole file instead of parts of it. So, for this kind of überoptimising, the file size is a useless parameter. One may be able to use the size of average file access, but you would have to profile your IO habits to get that number.

Is there a utility which determines the average size of the files that are frequently used on a given partition?

I know Norton's SpeedDisk program moves frequently accessed files to the front of the disk during a defrag, but I know of no way of finding out WHICH files it labels as frequently accessed.

Nodsu said:
Knowing a bit more about how disk access works in operating systems, I would recommend another formula:

stripe size = cluster size / number of RAID-0'd drives

A filesystem cluster is the smallest amount of data a program can read from the disk (or write). Even if you request only a single byte, the OS will generously read in the whole cluster (4 kilobytes by default for NTFS). So, if you have two RAID-0 drives, a stripe size of 2K would make sure that you always have each drive processing the same amount of data.

Mind you, for this to work, the stripe size has to work out exactly. Cluster and stripe sizes are always in power-of-two multiples of 512 bytes. So if you have an odd number of hard drives in RAID-0, you cannot possibly get an optimal stripe size.

Nodsu went on to elaborate a bit further -

Nodsu said:
IMHO Zolar1 does not have a clear idea of how hard drives and filesystems work. He claims that stripe size and disk fragmentation are somehow related, which is just plain wrong. Most likely he has the cause and effect mixed up.

Fragmentation is a filesystem issue, and has absolutely nothing to do with the underlying hardware. You get exactly the same fragmentation no matter how you configure your RAID stripes underneath. Fragmentation does depend on the cluster size though, so if you follow Zolar1's advice and tie the size of clusters to the size of stripes, then you get worse fragmentation with smaller stripes because your clusters will be smaller too (by your own choice), not because the stripes are somehow affecting the filesystem itself.

If we bring filesystem configuration into the formula, assuming that you are actually going to tweak NTFS, then you should first consider your filesystem clusters and only then the RAID stripes. If you make your clusters bigger, you waste disk space, but you will get less fragmentation and more performance. Alas, any other cluster size than 4K will also deny you some NTFS features like file compression. Once you get your cluster size right, decide on the stripes.

Or, if you are not fond of maths and planning, do like I do - set the stripe size to the smallest possible value. This will make sure that no matter the cluster size, the request will always cover the maximum number of drives. Even if the disk access spans the array several times, the end result will be the same (unless you have a very, very, very stupid RAID controller/driver) - all the drives will do the same amount of work.

What I infer from Nodsu's comments is that he recommends a stripe size of 32KB and a cluster size of 64KB (the max for NTFS) if the user is not worried about wasted file space and is running RAID 0 with two HDDs.

since his advice is different than yours, I would like to see your reply.

joelwest
 