RAID and backup problem

Status
Not open for further replies.
I have 4 RAID5 slots that seem to be shutting down every day. This happens soon after a daily backup is performed on them, so I think the backup has something to do with it. The only fix at the moment is to manually reboot the RAID server. I looked at the event log for my RAID5 slots, and it looks like they are trying to reset themselves but the reset fails. I don't know why the RAID slots would need to be reset after a backup. The only management utility I am using for my RAID drives is Silicon Image SATARAID5 (Array Manager), ver. 1.5.

I have posted the event log entries for August 6 as an image file.

If anyone has any idea why this is happening please reply. I appreciate all your help. Thanks.

LRT
 

Attachments

  • log image.jpg (89.5 KB)
Are you certain about the timing of the disconnect? Is it AFTER the backup completes?

I could envision software taking the raid offline BEFORE the backup.
 
What are you backing up, and how? Which software and which devices are you using?

You have fakeraid, meaning that the array depends on your CPU. RAID5 means heavy load too. It could be that, for example, the compression during the backup procedure plus accessing the backup medium put too much load on the system and make the (crappy) Silicon Image driver fail.
 
I will answer both above replies.

jobeard: Daily incremental backups are scheduled every day at 6:00PM. Last night the backup started at 6:00PM and was completed at 7:47PM, and the RAID drive went down at 7:39PM. So it seems the RAID is going down during the backup.

Nodsu: I am using the default backup utility that comes with Windows Small Business Server 2003. All backups, weekly and daily, seem to run fine. So you think the load the backups put on the server CPU is causing the RAID drives to go down? That makes sense. I am the new IT guy at this small company and the only IT guy, and I don't know a lot about RAID. But from what I can see, the RAID is directly connected to the server via 4 SATA cables and an LED cable. Is this what you mean by "fakeraid"?

What can I do to prevent the RAID from going down?

Thanks.

LRT
 
By fakeraid I mean that you have an extremely cheap, simple and stupid RAID controller that doesn't actually do anything. All the "RAID" is done by the drivers (= your CPU).

You could try updating the drivers. Also, try the backup without compression (if you have the option). Are you writing the backup to the same disk? If yes, then maybe try an external drive, so the RAID wouldn't have to do both reads and writes.
 
Nodsu said:
Are you writing the backup to the same disk? If yes, then maybe try an external drive, so the RAID wouldn't have to do both reads and writes.
hopefully, it is obvious that placing a backup on the same device
that is being backed-up is an exercise in futility -- when you need it the most,
the backup would be unavailable :eek:
 
The backup is not being placed on the same device. It is being stored on a remote network storage device.

I need to know:

What would be a good setup for my RAID and server, so my RAID does not have to rely on my server's CPU and crash when it is being backed up?
 
LRT said:
The backup is not being placed on the same device. It is being stored on a remote network storage device.
great :)
What would be a good setup for my RAID and server, so my RAID does not have to rely on my server's CPU and crash when it is being backed up?
Can't tell; are you using an onboard raid controller or did you install a PCI or SCSI
controller card to get the raid feature? Without the extra card, your CPU/software
is performing the raid function; not a very good implementation :sad:
 
Mirrored/striped RAID like RAID0, RAID1, RAID1+0, RAID 0+1 are far less CPU-intensive than RAID5.
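To illustrate where the extra work comes from, here is a minimal sketch, assuming the simplest textbook RAID5 scheme in which the parity block of a stripe is the byte-wise XOR of its data blocks. The function names are made up for illustration, and this is not how the Silicon Image driver is actually written. It only shows that every RAID5 write involves a computation that mirroring and plain striping do not need, and on fakeraid that computation lands on the server CPU.

```python
from functools import reduce


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity(blocks: list[bytes]) -> bytes:
    """Parity block for one stripe: XOR of all data blocks."""
    return reduce(xor_blocks, blocks)


def rebuild(surviving: list[bytes], parity_block: bytes) -> bytes:
    """Recover the one missing data block from the survivors plus parity."""
    return reduce(xor_blocks, surviving, parity_block)


# One stripe on a 4-drive RAID5: 3 data blocks + 1 parity block (toy sizes).
stripe = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
p = parity(stripe)

# Pretend the second drive failed and rebuild its block from the rest.
recovered = rebuild([stripe[0], stripe[2]], p)
print(recovered == stripe[1])
```

With RAID1 or RAID10 a write is just copied to a second drive, so there is nothing comparable to compute.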
 
Thank you all for your help.

After messing with the RAID long enough (updating drivers, flashing the BIOS, etc.), I finally managed to make the darn thing crash, and crash for good. We couldn't even see our data anymore. After messing with it some more we finally managed to get the RAID back up (what a relief!). I cancelled last night's backup, and this morning the RAID was up and running fine. Because RAID5 does parity checking, this may have put too much load on the controller card. Also, the network attached storage drive that all the backup data was being sent to is also set up as RAID5, and therefore also does parity checking, which slows down the entire backup job and puts a lot of load on the little tiny RAID controller.

I am in the process of "slowly" moving all the data out of the RAID drives and will eventually reconfigure the RAID5 to RAID10. I believe RAID10 is a good option for us. If anyone can tell me anything about RAID10, good or bad, please let me know. THANKS AGAIN!

LRT
 
Disk space is not really a problem right now. Although we produce a lot of data (2-3GB per day), we have 12 500GB HDDs. That's about 6TB of raw space. At that rate we wouldn't need another HDD for another 6 or 7 years. With RAID10 I would use 6 of them as originals and the other 6 as mirrors. Striping across 6 HDDs would give us faster reads than striping across fewer drives. Am I right?
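As a rough check on those numbers, here is a small sketch, assuming a single array across all 12 drives, decimal gigabytes, a constant growth rate, and an empty array to start with (none of which the thread confirms). The function names are hypothetical. The point is that the 6 to 7 year figure matches the raw 6TB, while RAID10 mirrors half of it away.

```python
def usable_gb(level: str, drives: int, size_gb: float) -> float:
    """Usable capacity of one array, ignoring filesystem overhead."""
    if level == "raid5":
        return (drives - 1) * size_gb   # one drive's worth goes to parity
    if level == "raid10":
        return drives / 2 * size_gb     # half the drives hold mirror copies
    raise ValueError(f"unsupported RAID level: {level}")


def years_to_fill(capacity_gb: float, gb_per_day: float) -> float:
    """Years until the capacity is used up at a constant daily growth rate."""
    return capacity_gb / gb_per_day / 365


DRIVES, SIZE_GB = 12, 500
GROWTH_GB_PER_DAY = 2.5  # midpoint of the 2-3GB per day mentioned above

raw = DRIVES * SIZE_GB
print(f"raw:    {raw:.0f} GB, {years_to_fill(raw, GROWTH_GB_PER_DAY):.1f} years")
for level in ("raid5", "raid10"):
    cap = usable_gb(level, DRIVES, SIZE_GB)
    print(f"{level:7s} {cap:.0f} GB, {years_to_fill(cap, GROWTH_GB_PER_DAY):.1f} years")
```

Under these assumptions RAID10 leaves 3TB usable, so the time before more drives are needed is roughly halved compared with the raw figure.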

The only problem I have now is pulling all the data from my RAID drives onto the NAS drives. I need to do this so I can format the RAID drives and set up a RAID10 configuration. Pulling all this data puts a lot of strain on the controller card, which causes the RAID drive to go down (the same thing was happening with the backup jobs). So basically, I am having to move the data little by little to keep the controller happy. This is a slow process because we have a lot of data.
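The "little by little" approach could be scripted along these lines. This is only a sketch with a hypothetical helper, `copy_throttled`, that copies files in small chunks and pauses between chunks. The chunk size and pause are guesses, and whether throttling like this would actually keep this controller from dropping out is untested.

```python
import tempfile
import time
from pathlib import Path


def copy_throttled(src_dir: Path, dst_dir: Path,
                   chunk_bytes: int = 4 * 1024 * 1024,
                   pause_s: float = 0.05) -> int:
    """Copy every file under src_dir into dst_dir, sleeping after each chunk.

    Returns the total number of bytes copied.
    """
    total = 0
    files = sorted(p for p in src_dir.rglob("*") if p.is_file())
    for src in files:
        dst = dst_dir / src.relative_to(src_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        with src.open("rb") as fin, dst.open("wb") as fout:
            while chunk := fin.read(chunk_bytes):
                fout.write(chunk)
                total += len(chunk)
                time.sleep(pause_s)  # back off so the source is not read flat out
    return total


# Self-contained demo on temporary folders standing in for the RAID and the NAS.
with tempfile.TemporaryDirectory() as tmp:
    raid = Path(tmp) / "raid"
    nas = Path(tmp) / "nas"
    (raid / "projects").mkdir(parents=True)
    (raid / "readme.txt").write_bytes(b"hello")
    (raid / "projects" / "data.bin").write_bytes(b"\x00" * 1000)
    copied = copy_throttled(raid, nas, chunk_bytes=256, pause_s=0.0)
    print(copied, "bytes copied")
```

In real use the two paths would be the RAID volume and the NAS share, and the pause would need tuning by trial.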

Thanks for the help Nodsu and jobeard. Any other thoughts or comments would be appreciated.

LRT
 
What exact hardware are you using here? Some sort of a special RAID enclosure? Maybe you need a firmware update or something..
 
Update:

We got the data out of the RAID5 drives and onto a USB 2.0 external hard drive (MyBook <not related to Apple> 500GB, very cheap at BestBuy by the way). The data (350GB) moved very fast. We were previously trying to move all this data onto our NAS storage device (ReadyNAS), but that goes over our network and the data was moving VERY slowly. The ReadyNAS device is what we want to use as our backup storage device. Backup jobs were (and still are) crashing because this data does not seem to be able to move through our network at all.

As for the RAID drives, once we got all the data moved into the MyBook we reconfigured the drives for RAID10. Then we moved all the data from the MyBook into the RAID10 drives. I have to say RAID10 is performing much better than RAID5. Before, we were having problems with the RAID5 drives shutting down during the backups (read above). Now the newly configured RAID10 drives are not crashing (although the backup jobs ARE crashing) and have been up and running for the past week.

This is most definitely, in my opinion, a network problem. Am I trying to move too much data? Here is my network setup:

internet -> router -> server -> switch -> NAS and other clients

The RAID10 device is connected straight into the server. I configured the ReadyNAS, internal NIC, and switch for jumbo frames, but that didn't help much. If anyone has any ideas on how to make this data transfer move more quickly please let me know! Thanks!!
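For a sense of what "too much data" means here, this is a back-of-the-envelope sketch. The thread does not say how fast the network links are, so the link rates below are assumptions (nominal Fast Ethernet, Gigabit Ethernet and USB 2.0 signalling rates), and real transfers will be slower than these best-case figures. The function name is hypothetical.

```python
def hours_to_transfer(size_gb: float, link_mbit_s: float,
                      efficiency: float = 1.0) -> float:
    """Best-case hours to move size_gb (decimal GB) over a link.

    efficiency scales the nominal rate down for protocol overhead (1.0 = none).
    """
    bits = size_gb * 1e9 * 8
    seconds = bits / (link_mbit_s * 1e6 * efficiency)
    return seconds / 3600


DATA_GB = 350  # the amount moved to the MyBook

# Nominal rates; which of these applies to the actual network is unknown.
links = {
    "Fast Ethernet (100 Mbit/s)": 100,
    "USB 2.0 (480 Mbit/s)": 480,
    "Gigabit Ethernet (1000 Mbit/s)": 1000,
}

for name, rate in links.items():
    print(f"{name}: {hours_to_transfer(DATA_GB, rate):.1f} h at best")
```

If the measured transfer takes far longer than the best case for the slowest link in the path, the amount of data alone does not explain it.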

LRT
 
NAS backups have the advantage of placing the data offsite, where it is protected
from catastrophic site failures (i.e. of your hosts). They also provide access to assist
when rebuilding a new site from those backups.

The downside is the performance! You need to schedule for off-shift
AND to segment the backups into smaller clumps of logical data -- Corp data
segregated from User data, which might be further divided into departmental data.
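One way to turn that into separate jobs is sketched below. Note that it swaps in size-based packing on top of the logical split: folders are grouped into batches that stay under a size cap, so each job is small enough to finish off-shift. The folder names, sizes and the cap are invented for illustration, and `segment` is a hypothetical helper.

```python
def segment(folders: dict[str, float], max_gb: float) -> list[list[str]]:
    """Greedily pack folders (name -> size in GB) into batches of at most max_gb.

    Largest folders are placed first; a folder larger than max_gb gets a
    batch of its own.
    """
    batches: list[list[str]] = []
    used: list[float] = []
    by_size = sorted(folders.items(), key=lambda kv: kv[1], reverse=True)
    for name, size in by_size:
        for i, current in enumerate(used):
            if current + size <= max_gb:
                batches[i].append(name)
                used[i] += size
                break
        else:
            # No existing batch has room, so start a new one.
            batches.append([name])
            used.append(size)
    return batches


# Hypothetical layout: corporate data plus per-department user data.
example = {
    "corp": 120,
    "users/sales": 60,
    "users/engineering": 90,
    "users/admin": 30,
}
for n, batch in enumerate(segment(example, max_gb=150), start=1):
    print(f"job {n}: {batch}")
```

Each resulting batch would then be scheduled as its own backup job on a different night or time slot.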

Regardless, it MUST be able to transfer successfully; COMPLAIN to the vendor!

BTW, backing up the OS is wasteful and ineffective. You can't use that data to
create a running OS. You have to do a fresh install of the OS to even get to
the NAS to start with!

Windows is a special case due to the registry issues:

  • installed programs
  • login users

The users can be saved/restored by getting a USER HIVE ... see MS KB on the issue of HIVES vs REG files.
 