Benchmarking filesystems

Status
Not open for further replies.

Crazy

Hey,

I'm doing my internship atm and one of the projects I have to do is setting up a clustered file-system.
I'm using glusterfs, so now I want to run some benchmarks on it.

The question is, what tools are best to use and more importantly, how to use them.
For benchmarking I found bonnie++ and iozone (I'm using Fedora 8 atm). But how do I use them properly?

The thing I have to do is write many small files to the filesystem. They will get written to different storage servers.

I'm new to Linux stuff, so any help will be appreciated!
 
Hum.. man bonnie++? man iozone?

The sad truth is that there is no "proper" way to run these things, because it all depends on what and how you are benchmarking. The main guideline would be to come up with a set of benchmarks that are the same and fair for all test cases.

Also, bonnie and iozone are artificial benchmarks and therefore almost useless. You should consider setting up something real-life-like and get some useful results with a realistic workload based on some assumed deployment scenario for your filesystem.
 
The closest thing to what happens in production now would be to create a lot of new files, from 100 KB to 1 MB each, in /home/import, where I mounted the filesystem.
Timing that would make the best benchmark.

But how do I do that? I'm pretty inexperienced with Linux :(, still learning new things each day :)

The place I'm doing my internship is here http://www.nomadesk.com/
I would be testing it to see if it's doable in production.

What do you mean by "artificial benchmarks"? The thing I'm trying to test is how glusterfs scales, so writing a lot of files, very fast. I have a gigabit switch available, so bandwidth isn't an issue.

Thanks for the help :)
 
Ok, so I've made this:

Code:
clear

for ((  i = 0 ;  i <= 1000;  i++  )) do
  dd if=/dev/zero of=/home/import/1Mfile$i bs=1M count=1
done
rm -f /home/import/*

But how can I hide the output? I execute the script with "time ./<script>".
How do I suppress the output from 'dd'? I tried adding '> /dev/null' but that didn't work.
Any ideas? I'm new at this kind of stuff :p


* EDIT *

Ok, nvm. Found it.
Code:
dd if=/dev/zero of=/home/import/1kfile$i bs=1KB count=1 2>/dev/null

Had to add a '2', since dd prints its statistics to stderr (file descriptor 2) rather than stdout. Typical: I searched and tried for a while without finding it, and the moment I asked for help I found the solution :p


Have another question:
How can I get the time inside a script?
So something like this:
Code:
BEGIN TIME
for ((  i = 0 ;  i <= 1000;  i++  )) do
  dd if=/dev/zero of=/home/import/1kfile$i bs=1KB count=1 2>/dev/null
done
END TIME
echo TIME


BEGIN TIME2
for ((  i = 0 ;  i <= 10;  i++  )) do
  dd if=/dev/zero of=/home/import/1kfile$i bs=500KB count=1 2>/dev/null
done
END TIME2
echo TIME2
 
a- all benchmarks are artificial, as one can only document a controlled environment.
b- your scripting is OK, but it serializes the file I/O (just one kind of test)
c- use an outer script to launch the existing script in the background by appending '&' to the invocation
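
Point (c) could be sketched roughly as below, in bash. The names run_writer, WORKERS, FILES_PER_WORKER and TARGET_DIR are assumptions made for illustration, and the script writes to a temporary directory unless TARGET_DIR is set; on the setup discussed here it would be the GlusterFS mount (/home/import).

```bash
#!/bin/bash
# Sketch: run several writers at once instead of one serial loop.
# TARGET_DIR, WORKERS and FILES_PER_WORKER are illustrative names, not from the thread.
TARGET_DIR=${TARGET_DIR:-$(mktemp -d)}
WORKERS=${WORKERS:-4}
FILES_PER_WORKER=${FILES_PER_WORKER:-25}

# One writer: creates FILES_PER_WORKER files of 1 KiB each, tagged with its worker id.
run_writer() {
  local id=$1 i
  for (( i = 0; i < FILES_PER_WORKER; i++ )); do
    dd if=/dev/zero of="$TARGET_DIR/w${id}_file$i" bs=1K count=1 2>/dev/null
  done
}

start=$(date +%s)
for (( w = 0; w < WORKERS; w++ )); do
  run_writer "$w" &      # '&' sends each writer to the background
done
wait                     # block until every background writer has finished
end=$(date +%s)

created=$(find "$TARGET_DIR" -type f -name 'w*_file*' | wc -l)
echo "created $created files in $(( end - start )) s with $WORKERS parallel writers"
```

Running it with different WORKERS values (for example WORKERS=1 versus WORKERS=8) gives a first impression of how the filesystem copes with concurrent writers, which the serial loop cannot show.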

I suggest making bs=X and count=Y parameters so you can measure blocking factors as
well as file sizes
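
One way to parameterize this, as a minimal sketch: the function name write_files and the variable TARGET_DIR are made up for illustration, and the chosen sizes are arbitrary. Each bs/count pair produces files of the same total size (64 KiB), so only the block size differs between runs.

```bash
#!/bin/bash
# Sketch: same amount of data per file, written with different block sizes.
# write_files and TARGET_DIR are illustrative names, not from the thread.
TARGET_DIR=${TARGET_DIR:-$(mktemp -d)}

# write_files BS COUNT NFILES: create NFILES files, each BS * COUNT bytes.
write_files() {
  local bs=$1 count=$2 nfiles=$3 i
  for (( i = 0; i < nfiles; i++ )); do
    dd if=/dev/zero of="$TARGET_DIR/bs${bs}_$i" bs="$bs" count="$count" 2>/dev/null
  done
}

# Each pair below yields 64 KiB per file: 1K x 64, 4K x 16, 64K x 1.
for pair in "1K 64" "4K 16" "64K 1"; do
  set -- $pair           # split the pair into $1 (bs) and $2 (count)
  echo "bs=$1 count=$2"
  time write_files "$1" "$2" 20
done
```

Varying count while holding bs fixed would measure the effect of file size instead.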

  1. you need multiple file creation,
  2. multiple file reads (cat $fn >/dev/null)
  3. multiple in-place file updates.

AND

some means to perform these actions at random locations.
(3) above will assist here as the files are preexisting and thus move the HD arm
to each as needed.

(3) will require a program that fopens the file in mode 'r+'. This will
cause existing sectors to be overwritten in place, whereas fopen mode 'w'
truncates the old file and then just recreates it.
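
If writing a program for this is too much, dd can stand in for the 'r+' behaviour: with conv=notrunc it overwrites blocks inside an existing file instead of truncating it. This is a substitute technique, not the fopen approach itself. A minimal sketch, where TARGET_DIR and the file name update_me are assumptions for illustration:

```bash
#!/bin/bash
# Sketch: overwrite blocks inside an existing file without truncating it.
# TARGET_DIR and the file name are illustrative, not from the thread.
TARGET_DIR=${TARGET_DIR:-$(mktemp -d)}
f="$TARGET_DIR/update_me"

# Pre-existing file: 100 blocks of 1 KiB.
dd if=/dev/zero of="$f" bs=1K count=100 2>/dev/null
size_before=$(wc -c < "$f")

# Overwrite 10 randomly chosen 1 KiB blocks in place.
# conv=notrunc keeps the rest of the file; seek=N skips N output blocks first.
for (( n = 0; n < 10; n++ )); do
  block=$(( RANDOM % 100 ))
  dd if=/dev/urandom of="$f" bs=1K count=1 seek="$block" conv=notrunc 2>/dev/null
done

size_after=$(wc -c < "$f")
echo "size before: $size_before, after: $size_after"
```

The file size stays the same before and after, which is a quick check that the updates really happened in place.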
 
Well, the purpose of your system is not to be filled with files from zero, is it?
I would imagine the purpose of the system is to be full of files and handle operations on these files real fast.

So your starting point would be to put a bunch of real-looking data there. Maybe copy it from an existing system, possibly obfuscating file contents and filenames for security reasons.
After that you could simulate tons of simultaneous file accesses, writes, creations, deletions etc., whatever the real-life situation is. After you are done with the first run, format the filesystem clean, copy the same starting point data again and start over.


It's a lesser-known fact: you can do pretty much everything with for loops that you can do with normal commands, including putting time in front of them.
Code:
time for i in whatever; do
   some
   other 
   stuff
done
time for j in somethingelse; do
   some
   more
   tasks
done
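
If the elapsed time is needed in a variable (to log it or compare runs) rather than just printed, taking timestamps before and after the loop is another option. A minimal sketch, assuming GNU date (the %N nanosecond format is a GNU extension); TARGET_DIR is an illustrative name and defaults to a temporary directory:

```bash
#!/bin/bash
# Sketch: time a loop from inside the script and keep the result in a variable.
# TARGET_DIR is an illustrative name; %N (nanoseconds) needs GNU date.
TARGET_DIR=${TARGET_DIR:-$(mktemp -d)}

start=$(date +%s%N)      # nanoseconds since the epoch
for (( i = 0; i < 100; i++ )); do
  dd if=/dev/zero of="$TARGET_DIR/1kfile$i" bs=1K count=1 2>/dev/null
done
end=$(date +%s%N)

elapsed_ms=$(( (end - start) / 1000000 ))
echo "100 x 1 KiB files: ${elapsed_ms} ms"
```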
 