Win 2003 R2 crashes frequently

Status
Not open for further replies.

m0zz0

Hi,
I have a number of Windows 2003 R2 servers running. These are our build servers, and for some time now they have been crashing while builds are in progress. It looks like the ClearCase and McAfee drivers have a problem when they work together. I need help interpreting a kernel dump. My machines were configured to create kernel dumps, so I cannot post any mini dumps yet, but I have just changed the setting to create mini dumps, so I should be able to post some soon.
Right now, I am pasting the results shown by WinDbg. Could you guys please tell me what's going on here?
This is just one of the dumps, and as far as I can tell it doesn't show anything about ClearCase or McAfee. That is exactly what confuses me.

Thanks a lot,
m0zz0

Symbol search path is: SRV*c:\Symbols\*http://msdl.microsoft.com/download/symbols
Executable search path is:
Windows Server 2003 Kernel Version 3790 (Service Pack 1) MP (4 procs) Free x86 compatible
Product: Server, suite: Enterprise TerminalServer SingleUserTS
Built by: 3790.srv03_sp1_rtm.050324-1447
Kernel base = 0x80800000 PsLoadedModuleList = 0x808a6ea8
Debug session time: Fri Mar 16 02:59:46.200 2007 (GMT+9)
System Uptime: 0 days 10:09:38.531
Loading Kernel Symbols
.............................................................................................
Loading User Symbols

Loading unloaded module list
.......
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************

Use !analyze -v to get detailed debugging information.

BugCheck A, {9d7f7008, 2, 1, 80864873}

Probably caused by : memory_corruption ( nt!MiReleasePageFileSpace+55 )

Followup: MachineOwner
---------

3: kd> !analyze -v
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************

IRQL_NOT_LESS_OR_EQUAL (a)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high. This is usually
caused by drivers using improper addresses.
If a kernel debugger is available get the stack backtrace.
Arguments:
Arg1: 9d7f7008, memory referenced
Arg2: 00000002, IRQL
Arg3: 00000001, value 0 = read operation, 1 = write operation
Arg4: 80864873, address which referenced memory

Debugging Details:
------------------


WRITE_ADDRESS: 9d7f7008

CURRENT_IRQL: 2

FAULTING_IP:
nt!MiReleasePageFileSpace+55
80864873 213e and dword ptr [esi],edi

DEFAULT_BUCKET_ID: DRIVER_FAULT

BUGCHECK_STR: 0xA

PROCESS_NAME: System

TRAP_FRAME: f7912be4 -- (.trap fffffffff7912be4)
ErrCode = 00000002
eax=8c9f00c0 ebx=00000000 ecx=00000000 edx=00000000 esi=9d7f7008 edi=fffffffe
eip=80864873 esp=f7912c58 ebp=f7912c68 iopl=0 nv up ei pl zr na pe nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010246
nt!MiReleasePageFileSpace+0x55:
80864873 213e and dword ptr [esi],edi ds:0023:9d7f7008=????????
Resetting default scope

LAST_CONTROL_TRANSFER: from 80864873 to 8088bdd3

STACK_TEXT:
f7912be4 80864873 badb0d00 00000000 00000000 nt!KiTrap0E+0x2a7
f7912c68 80842017 00000020 8b010000 808ab4a8 nt!MiReleasePageFileSpace+0x55
f7912d44 80843343 8b01ec28 00000000 808ab4c0 nt!MiCleanSection+0x471
f7912d90 80843466 00000000 8cf5b440 00000000 nt!MiRemoveUnusedSegments+0x963
f7912dac 80948bb2 00000000 00000000 00000000 nt!MiDereferenceSegmentThread+0x5e
f7912ddc 8088d4d2 80843408 00000000 00000000 nt!PspSystemThreadStartup+0x2e
00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x16


STACK_COMMAND: kb

FOLLOWUP_IP:
nt!MiReleasePageFileSpace+55
80864873 213e and dword ptr [esi],edi

SYMBOL_STACK_INDEX: 1

SYMBOL_NAME: nt!MiReleasePageFileSpace+55

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: nt

DEBUG_FLR_IMAGE_TIMESTAMP: 42435b14

IMAGE_NAME: memory_corruption

FAILURE_BUCKET_ID: 0xA_W_nt!MiReleasePageFileSpace+55

BUCKET_ID: 0xA_W_nt!MiReleasePageFileSpace+55

Followup: MachineOwner
---------
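
For anyone reading along, the four values in the "BugCheck A, {...}" line map onto Arg1 to Arg4 of the !analyze -v output above. The following is only a minimal sketch of pulling them apart in Python; the function names parse_bugcheck and describe_irql_fault are made up for illustration and are not part of WinDbg.

```python
import re


def parse_bugcheck(line):
    """Parse a WinDbg summary line like 'BugCheck A, {9d7f7008, 2, 1, 80864873}'.

    Returns (bugcheck_code, [arg1, arg2, arg3, arg4]) as integers.
    """
    m = re.match(r"\s*BugCheck\s+([0-9A-Fa-f]+),\s*\{([^}]*)\}", line)
    if m is None:
        raise ValueError("not a BugCheck line: %r" % line)
    code = int(m.group(1), 16)
    args = [int(a.strip(), 16) for a in m.group(2).split(",")]
    return code, args


def describe_irql_fault(args):
    """Label the arguments of bug check 0xA the way !analyze -v prints them."""
    address, irql, operation, instruction = args
    kinds = {0: "read", 1: "write"}  # Arg3: 0 = read operation, 1 = write operation
    return {
        "memory_referenced": "0x%08x" % address,
        "irql": irql,
        "operation": kinds.get(operation, "unknown"),
        "faulting_instruction": "0x%08x" % instruction,
    }


code, args = parse_bugcheck("BugCheck A, {9d7f7008, 2, 1, 80864873}")
print(hex(code), describe_irql_fault(args))
```

For this dump, that reads as a write to 9d7f7008 at IRQL 2 from the instruction at 80864873, which matches the WRITE_ADDRESS, CURRENT_IRQL and FAULTING_IP fields above.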
 
Hi Greenflash,

Here's what I get in the Event log ..
Error code 1000007e, parameter1 c0000005, parameter2 80892182, parameter3 f7912ae4, parameter4 f79127e0.

Event ID: 1003 and Category: 102

That's all I get.
Please let me know if you need anything else.

Thanks,
m0zz0
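
The event log line above can be decoded the same way. This is a rough sketch, not an official tool: the function name parse_event_1003 is hypothetical, and the lookup tables list only the values that appear in this thread. Bug check 0x1000007E is SYSTEM_THREAD_EXCEPTION_NOT_HANDLED_M, whose first parameter is the unhandled exception code, and c0000005 is STATUS_ACCESS_VIOLATION.

```python
import re

# Only the codes seen in this thread; not a complete list.
BUGCHECK_NAMES = {
    0x0000000A: "IRQL_NOT_LESS_OR_EQUAL",
    0x1000007E: "SYSTEM_THREAD_EXCEPTION_NOT_HANDLED_M",
}
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
}


def parse_event_1003(text):
    """Extract the bug check code and its parameters from an Event ID 1003 message."""
    m = re.search(r"Error code\s+([0-9A-Fa-f]+)", text)
    if m is None:
        raise ValueError("no 'Error code' field found")
    code = int(m.group(1), 16)
    params = [int(p, 16) for p in re.findall(r"parameter\d+\s+([0-9A-Fa-f]+)", text)]
    return code, params


message = ("Error code 1000007e, parameter1 c0000005, parameter2 80892182, "
           "parameter3 f7912ae4, parameter4 f79127e0.")
code, params = parse_event_1003(message)
print(BUGCHECK_NAMES.get(code, "unknown bug check"))
print(NTSTATUS_NAMES.get(params[0], "unknown exception code"))
```

Note that this event reports a different bug check (an unhandled access violation in a system thread) from the 0xA dump posted earlier, so the two may not come from the same crash.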
 
Hi Greenflash,

Sorry, I was away for a couple of days. I have raised this with our IT guys, who handle this. Let's see what they say. Apparently we are not allowed to install whatever McAfee updates we want, so it's going to be painfully slow, I guess. But I'll keep you posted if I come across anything new.

Thanks a lot.

Sincerely,
m0zz0
 
Crashes again after applying patch

Hi Greenflash and Other members,

I got McAfee patch 15 really quickly and have applied it to some machines. This patch is supposed to fix the memory leak in McAfee's anti-virus filter driver. But I just got the same error again.

I am attaching 3 mini dumps of which the last one is after the patch was applied.

Could you guys please have a look at them?

Thanks,
m0zz0
 
Hello and welcome to Techspot.

It looks to me like you may have a RAM problem.

Go HERE and follow the instructions for testing your RAM etc.

Regards, Howard
 
Hi Howard,

I ran memtest86 on the machines some time back, and it did not report errors on any of them. I did not swap the DIMMs between slots, though. I have some Opteron and Dell PowerEdge 1955 blade servers, and the crash occurs randomly on almost all of them.
Is there anything else you suggest I look into?

Thanks,
m0zz0
 
There looks to be a problem with the win32k.sys file. The lmtn output is listed below, followed by what should be the verbose (lmvm) output, except that it isn't verbose. As win32k.sys is a Microsoft file, its verbose output should look like that of any other Microsoft file, e.g. ntkrpamp.exe (listed last). It might be worth checking whether the file is legitimate.

0: kd> lmtn
start end module name
Map win32k.sys: Image region 400:192a00 does not fit in mapping <-- This is the first time I've seen this.
bf800000 bf9d0000 win32k win32k.sys Thu Oct 06 08:37:08 2005 (434471B4)

0: kd> lmvm win32k
start end module name
bf800000 bf9d0000 win32k (deferred)
Image path: win32k.sys
Image name: win32k.sys
Timestamp: Thu Oct 06 08:37:08 2005 (434471B4)
CheckSum: 001D057D
ImageSize: 001D0000
Translations: 0000.04b0 0000.04e0 0409.04b0 0409.04e0

0: kd> lmvm nt
start end module name
80800000 80a53000 nt # (pdb symbols) c:\symbols\ntkrpamp.pdb\FEC480982D1145E696432CBBD9BC2C831\ntkrpamp.pdb
Loaded symbol image file: ntkrpamp.exe
Mapped memory image file: c:\symbols\ntkrpamp.exe\42435B14253000\ntkrpamp.exe
Image path: ntkrpamp.exe
Image name: ntkrpamp.exe
Timestamp: Fri Mar 25 09:28:04 2005 (42435B14)
CheckSum: 0023D043
ImageSize: 00253000
File version: 5.2.3790.1830
Product version: 5.2.3790.1830
File flags: 0 (Mask 3F)
File OS: 40004 NT Win32
File type: 1.0 App
File date: 00000000.00000000
Translations: 0404.04b0
CompanyName: Microsoft Corporation
ProductName: Microsoft(R) Windows(R) Operating System
InternalName: ntkrpamp.exe
OriginalFilename: ntkrpamp.exe
ProductVersion: 5.2.3790.1830
FileVersion: 5.2.3790.1830 (srv03_sp1_rtm.050324-1447)
FileDescription: NT Kernel & System
LegalCopyright: (C) Microsoft Corporation. All rights reserved.
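
As a quick sanity check on the module listings above, the hex fields can be cross-checked against each other. This is only a sketch with made-up helper names (image_size_matches, pe_timestamp_utc); it confirms that the numbers are internally consistent and says nothing about whether win32k.sys itself is legitimate.

```python
from datetime import datetime, timezone


def image_size_matches(start_hex, end_hex, image_size_hex):
    """True if (end - start) of the loaded module equals the reported ImageSize."""
    return int(end_hex, 16) - int(start_hex, 16) == int(image_size_hex, 16)


def pe_timestamp_utc(stamp_hex):
    """Decode the hex timestamp shown in parentheses (seconds since 1970-01-01 UTC)."""
    return datetime.fromtimestamp(int(stamp_hex, 16), tz=timezone.utc)


# Values copied from the lmvm output for win32k and nt.
print(image_size_matches("bf800000", "bf9d0000", "001D0000"))
print(image_size_matches("80800000", "80a53000", "00253000"))
print(pe_timestamp_utc("434471B4").isoformat())
print(pe_timestamp_utc("42435B14").isoformat())
```

The debugger prints these timestamps in the local time zone of the machine running it, so the UTC values from this sketch differ from the listed times by a whole number of hours.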
 
Hi peterdiva,

I started looking at the problem from a different perspective after you suggested this, and I found out that my machines were crashing for two reasons.

1. The ClearCase and McAfee drivers certainly have a problem with each other. Patch 15 from McAfee solves it, because McAfee no longer scans mvfs files.

2. Some of the PowerEdge 1955 blade servers were crashing because of the SAS device driver. The problem happens only on systems with more than 4 GB of RAM.
The driver version that solves this problem is Dell SAS 5/iR Integrated, SAS 5/iR Adapter, v.01.21.26.01.

All my machines have been running fine for five days now. And I have been subjecting them to more than 15 hours of tests every day.

Howard, Greenflash, thanks a bunch. This is a great forum, and I'll be hanging out here more from now on.

Thanks again,

Sincerely,
m0zz0
 