Server 2003 SP2 reboots when an incremental backup job is run

mikepbmike

Greetings!

We have a server here that keeps rebooting. It's a ProLiant ML370 G5 running Windows Server 2003 with SP2. The stop codes vary; we've seen (all beginning with 0x000000) C2, 50, D1, 24, C5, 7E, 44, 0A, and 8E.
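
For anyone matching these against the bug check reference, the short codes expand to the standard symbolic names below. This is only a minimal Python sketch: the dictionary and the helper name `describe` are made up for illustration, and the names identify each stop code, not its cause.

```python
# Standard Windows bug check names for the stop codes listed above.
STOP_CODES = {
    0x0A: "IRQL_NOT_LESS_OR_EQUAL",
    0x24: "NTFS_FILE_SYSTEM",
    0x44: "MULTIPLE_IRP_COMPLETE_REQUESTS",
    0x50: "PAGE_FAULT_IN_NONPAGED_AREA",
    0x7E: "SYSTEM_THREAD_EXCEPTION_NOT_HANDLED",
    0x8E: "KERNEL_MODE_EXCEPTION_NOT_HANDLED",
    0xC2: "BAD_POOL_CALLER",
    0xC5: "DRIVER_CORRUPTED_EXPOOL",
    0xD1: "DRIVER_IRQL_NOT_LESS_OR_EQUAL",
}


def describe(short_code: str) -> str:
    """Expand a short code such as 'C2' to its full 8-digit form plus name."""
    value = int(short_code, 16)
    name = STOP_CODES.get(value, "UNKNOWN")
    return f"0x{value:08X} {name}"


if __name__ == "__main__":
    # The codes in the order they were reported in this post.
    for code in ["C2", "50", "D1", "24", "C5", "7E", "44", "0A", "8E"]:
        print(describe(code))
```

Two of the codes (C2 and C5) refer directly to the kernel pool, which may be relevant given the Pool_Corruption followup in the dump further down.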

The reboots happen whenever we run any kind of backup *other* than a Full backup. We've tried Backup Exec 12.5 and the backup utility that comes with the OS, and the reboots occur with both, so we don't think it's Backup Exec. The reboots occur at different points in the backups, so no specific file or folder appears to be the trigger. Full backups run with no problem.

Sometimes chkdsk runs during the reboot; sometimes it doesn't. We've had messages saying "The file system structure on the disk is corrupt and unusable. Please run the chkdsk utility on the volume \Device\HarddiskVolume1." We've also had messages saying that chkdsk had run and made repairs; see below:
____________________________________________________________
Checking file system on C:
The type of the file system is NTFS.


One of your disks needs to be checked for consistency. You
may cancel the disk check, but it is strongly recommended
that you continue.
Windows will now check the disk.
Index entry {CA0CB8D8-FA7C-43B5-BC5C-0576AA521238}_2.fh of index $I30 in file 0x7d89 points to unused file 0x700c.
Deleting index entry {CA0CB8D8-FA7C-43B5-BC5C-0576AA521238}_2.fh in index $I30 of file 32137.
Index entry {CA0CB~2.FH of index $I30 in file 0x7d89 points to unused file 0x700c.
Deleting index entry {CA0CB~2.FH in index $I30 of file 32137.
Cleaning up minor inconsistencies on the drive.
Cleaning up 703 unused index entries from index $SII of file 0x9.
Cleaning up 703 unused index entries from index $SDH of file 0x9.
Cleaning up 703 unused security descriptors.
CHKDSK discovered free space marked as allocated in the
master file table (MFT) bitmap.
CHKDSK discovered free space marked as allocated in the volume bitmap.
Windows has made corrections to the file system.

71644783 KB total disk space.
26161608 KB in 40372 files.
15064 KB in 8524 indexes.
0 KB in bad sectors.
85391 KB in use by the system.
23040 KB occupied by the log file.
45382720 KB available on disk.

4096 bytes in each allocation unit.
17911195 total allocation units on disk.
11345680 allocation units available on disk.

Internal Info:
20 c5 00 00 0b bf 00 00 44 17 01 00 00 00 00 00 .......D.......
8d 00 00 00 02 00 00 00 63 08 00 00 00 00 00 00 ........c.......
08 36 c3 03 00 00 00 00 10 d4 0f 11 00 00 00 00 .6..............
8e 8d b2 13 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 4e 7a 02 30 00 00 00 00 ........Nz.0....
e0 9c f2 9e 00 00 00 00 ff ff ff ff 11 00 00 00 ................
b4 9d 00 00 00 00 00 00 00 20 c7 3c 06 00 00 00 ......... .<....

Windows has finished checking your disk.
Please wait while your computer restarts.

__________________________________________________________
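
As a sanity check on the chkdsk summary above, the reported figures can be cross-checked against each other. This is a rough Python sketch; the variable names are my own, and the numbers are copied straight from the output.

```python
# Figures copied from the chkdsk summary above (KB unless noted otherwise).
total_kb = 71644783
files_kb = 26161608
indexes_kb = 15064
bad_sectors_kb = 0
system_kb = 85391          # the 23040 KB log file is counted inside this figure
available_kb = 45382720

cluster_bytes = 4096       # "bytes in each allocation unit"
total_clusters = 17911195
free_clusters = 11345680

kb_per_cluster = cluster_bytes // 1024

# Do the usage categories plus free space add up to the reported total?
accounted_kb = files_kb + indexes_kb + bad_sectors_kb + system_kb + available_kb
categories_balance = accounted_kb == total_kb

# Does the free-cluster count agree with the "available on disk" figure?
free_space_matches = free_clusters * kb_per_cluster == available_kb

# Remainder of the volume that does not fill a whole allocation unit.
leftover_kb = total_kb - total_clusters * kb_per_cluster

print(categories_balance, free_space_matches, leftover_kb)
```

Both checks come out true and the leftover is 3 KB, so the summary is internally consistent after the repair. The categories only balance if the log file is treated as part of the "in use by the system" figure rather than added separately.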




Below is the debugger analysis of the MEMORY.DMP file from the most recent reboot:
____________________________________________________
Microsoft (R) Windows Debugger Version 6.12.0002.633 X86
Copyright (c) Microsoft Corporation. All rights reserved.


Loading Dump File [C:\Test\MEMORY.DMP]
Kernel Summary Dump File: Only kernel address space is available

Symbol search path is: C:\WINDOWS\Symbols;srv*
Executable search path is:
Windows Server 2003 Kernel Version 3790 (Service Pack 2) MP (4 procs) Free x86 compatible
Product: Server, suite: Enterprise TerminalServer SingleUserTS
Built by: 3790.srv03_sp2_gdr.100216-1301
Machine Name:
Kernel base = 0x80800000 PsLoadedModuleList = 0x808af9c8
Debug session time: Thu Oct 14 09:36:28.491 2010 (UTC - 4:00)
System Uptime: 0 days 2:27:15.625
Loading Kernel Symbols
...............................................................
............................................................
Loading User Symbols
PEB is paged out (Peb.Ldr = 7ffda00c). Type ".hh dbgerr001" for details
Loading unloaded module list
.......
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

Use !analyze -v to get detailed debugging information.

BugCheck 8E, {c0000005, 8089bcdc, b914fac8, 0}

*** ERROR: Symbol file could not be found. Defaulted to export symbols for mfehidk.sys -
Probably caused by : Pool_Corruption ( nt!ExFreePool+f )

Followup: Pool_corruption
---------

3: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

KERNEL_MODE_EXCEPTION_NOT_HANDLED (8e)
This is a very common bugcheck. Usually the exception address pinpoints
the driver/function that caused the problem. Always note this address
as well as the link date of the driver/image that contains this address.
Some common problems are exception code 0x80000003. This means a hard
coded breakpoint or assertion was hit, but this system was booted
/NODEBUG. This is not supposed to happen as developers should never have
hardcoded breakpoints in retail code, but ...
If this happens, make sure a debugger gets connected, and the
system is booted /DEBUG. This will let us see why this breakpoint is
happening.
Arguments:
Arg1: c0000005, The exception code that was not handled
Arg2: 8089bcdc, The address that the exception occurred at
Arg3: b914fac8, Trap Frame
Arg4: 00000000

Debugging Details:
------------------


EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at "0x%08lx" referenced memory at "0x%08lx". The memory could not be "%s".

FAULTING_IP:
nt!ExAllocatePoolWithTag+838
8089bcdc 8b07 mov eax,dword ptr [edi]

TRAP_FRAME: b914fac8 -- (.trap 0xffffffffb914fac8)
ErrCode = 00000000
eax=8a7bc170 ebx=8a7bb0c0 ecx=8a7bc170 edx=8a7bc170 esi=8a7bb210 edi=04c507b6
eip=8089bcdc esp=b914fb3c ebp=b914fb78 iopl=0 nv up ei pl nz na pe cy
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010207
nt!ExAllocatePoolWithTag+0x838:
8089bcdc 8b07 mov eax,dword ptr [edi] ds:0023:04c507b6=????????
Resetting default scope

DEFAULT_BUCKET_ID: DRIVER_FAULT

BUGCHECK_STR: 0x8E

PROCESS_NAME: cqmgstor.exe

CURRENT_IRQL: 0

LAST_CONTROL_TRANSFER: from 8085bba7 to 8087c4a0

STACK_TEXT:
b914f694 8085bba7 0000008e c0000005 8089bcdc nt!KeBugCheckEx+0x1b
b914fa58 808346cc b914fa74 00000000 b914fac8 nt!KiDispatchException+0x3a2
b914fac0 80834680 b914fb78 8089bcdc badb0d00 nt!CommonDispatchException+0x4a
b914fae4 8089c26e fe4c4cd0 00000000 00000000 nt!Kei386EoiHelper+0x186
b914fb78 f711eb2c 00000001 00000000 3945464d nt!ExFreePool+0xf
WARNING: Stack unwind information not available. Following frames may be wrong.
b914fb94 f711ecc7 88c35f08 b914fcf0 00000000 mfehidk+0xcb2c
b914fcdc f713162a b914fcf0 005cfe4c b914fd64 mfehidk+0xccc7
b914fd18 f713201d 85499dc8 b914fd44 b914fd4c mfehidk+0x1f62a
b914fd64 7c82860c badb0d00 005cfe34 00000000 mfehidk+0x2001d
b914fd68 badb0d00 005cfe34 00000000 00000000 0x7c82860c
b914fd6c 005cfe34 00000000 00000000 00000000 0xbadb0d00
b914fd70 00000000 00000000 00000000 00000000 0x5cfe34


STACK_COMMAND: kb

FOLLOWUP_IP:
nt!ExFreePool+f
8089c26e 5d pop ebp

SYMBOL_STACK_INDEX: 4

SYMBOL_NAME: nt!ExFreePool+f

FOLLOWUP_NAME: Pool_corruption

IMAGE_NAME: Pool_Corruption

DEBUG_FLR_IMAGE_TIMESTAMP: 0

MODULE_NAME: Pool_Corruption

FAILURE_BUCKET_ID: 0x8E_nt!ExFreePool+f

BUCKET_ID: 0x8E_nt!ExFreePool+f

Followup: Pool_corruption
---------

3: kd> lmvm Pool_Corruption
start end module name
_________________________________________________________________
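
In case it helps anyone comparing several dumps, here is a small Python sketch that pulls the bug check code and arguments out of the `BugCheck` line and lists which modules show up in a pasted `STACK_TEXT`. The function names and regular expressions are my own assumptions about the layout shown above (five hex columns followed by a symbol); they are not part of the debugger.

```python
import re

# "BugCheck 8E, {c0000005, 8089bcdc, b914fac8, 0}"
BUGCHECK_RE = re.compile(r"BugCheck\s+([0-9A-Fa-f]+),\s*\{([^}]*)\}")
# Five 8-digit hex columns (ChildEBP, RetAddr, three args), then the symbol.
FRAME_RE = re.compile(r"^(?:[0-9a-f]{8}\s+){5}(\S+)$")


def parse_bugcheck(line):
    """Return (code, [arg1..arg4]) as integers from a BugCheck summary line."""
    match = BUGCHECK_RE.search(line)
    if match is None:
        raise ValueError("not a BugCheck line")
    code = int(match.group(1), 16)
    args = [int(part.strip(), 16) for part in match.group(2).split(",")]
    return code, args


def stack_modules(stack_text):
    """Return module names in order of first appearance, skipping raw addresses."""
    modules = []
    for line in stack_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match is None:
            continue  # e.g. the "WARNING: Stack unwind..." line
        symbol = match.group(1)
        if symbol.startswith("0x"):
            continue  # unresolved address, no module name
        module = re.split(r"[!+]", symbol, maxsplit=1)[0]
        if module not in modules:
            modules.append(module)
    return modules


# A few frames copied from the STACK_TEXT above.
SAMPLE_STACK = """\
b914fae4 8089c26e fe4c4cd0 00000000 00000000 nt!Kei386EoiHelper+0x186
b914fb78 f711eb2c 00000001 00000000 3945464d nt!ExFreePool+0xf
WARNING: Stack unwind information not available. Following frames may be wrong.
b914fb94 f711ecc7 88c35f08 b914fcf0 00000000 mfehidk+0xcb2c
b914fd68 badb0d00 005cfe34 00000000 00000000 0x7c82860c
"""

if __name__ == "__main__":
    print(parse_bugcheck("BugCheck 8E, {c0000005, 8089bcdc, b914fac8, 0}"))
    print(stack_modules(SAMPLE_STACK))
```

On this dump the only module on the stack besides `nt` is `mfehidk`. Since the debugger itself warns that those frames may be wrong and flags pool corruption, that is a lead to follow up rather than proof of the culprit.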

What's really confusing is that it all works fine if a Full backup is run. Any help would be greatly appreciated. Thanks!

mikepbmike
 
Just had a reboot about 10 minutes ago. The BSOD showed BAD_POOL_CALLER with a stop error of 0x000000C2. Googled it and found: "'Stop 0x000000C2 BAD_POOL_CALLER' error message in Windows Server 2003." That article offers a hotfix, which we tried to install, but the installer reported that the service pack (SP2) was newer than the hotfix, so the hotfix didn't need to be applied. So much for that approach.
 
Problem Fixed!

Per HP, we installed the current Support Pack and rebooted the server, then installed the current firmware update and rebooted again. The server has been running without rebooting ever since. Wanted to share this in case someone else is having the same issue.

Regards,

mikePBmike
 