Win 2003 R2 crashes frequently

By m0zz0
Mar 15, 2007
Topic Status:
Not open for further replies.
  1. Hi,
    I have a bunch of Windows 2003 R2 servers that we use as build servers, and for some time now they have been crashing while builds are in progress. The crashes appear to be caused by the ClearCase and McAfee drivers interacting badly. I need help interpreting the kernel dumps. My machines were configured to write kernel dumps, so I cannot post any mini dumps yet, but I have just changed the setting to create mini dumps, so I should be able to post some soon.
    For now, I am pasting the results shown by WinDbg. Could you guys please tell me what's going on here?
    This is just one of the dumps, and as far as I can tell it doesn't show anything about ClearCase or McAfee, which is exactly what confuses me.

    Thanks a lot,
    m0zz0

    Symbol search path is: SRV*c:\Symbols\* http://msdl.microsoft.com/download/symbols
    Executable search path is:
    Windows Server 2003 Kernel Version 3790 (Service Pack 1) MP (4 procs) Free x86 compatible
    Product: Server, suite: Enterprise TerminalServer SingleUserTS
    Built by: 3790.srv03_sp1_rtm.050324-1447
    Kernel base = 0x80800000 PsLoadedModuleList = 0x808a6ea8
    Debug session time: Fri Mar 16 02:59:46.200 2007 (GMT+9)
    System Uptime: 0 days 10:09:38.531
    Loading Kernel Symbols
    .............................................................................................
    Loading User Symbols

    Loading unloaded module list
    .......
    *******************************************************************************
    *                                                                             *
    *                        Bugcheck Analysis                                    *
    *                                                                             *
    *******************************************************************************

    Use !analyze -v to get detailed debugging information.

    BugCheck A, {9d7f7008, 2, 1, 80864873}

    Probably caused by : memory_corruption ( nt!MiReleasePageFileSpace+55 )

    Followup: MachineOwner
    ---------

    3: kd> !analyze -v
    *******************************************************************************
    *                                                                             *
    *                        Bugcheck Analysis                                    *
    *                                                                             *
    *******************************************************************************

    IRQL_NOT_LESS_OR_EQUAL (a)
    An attempt was made to access a pageable (or completely invalid) address at an
    interrupt request level (IRQL) that is too high. This is usually
    caused by drivers using improper addresses.
    If a kernel debugger is available get the stack backtrace.
    Arguments:
    Arg1: 9d7f7008, memory referenced
    Arg2: 00000002, IRQL
    Arg3: 00000001, value 0 = read operation, 1 = write operation
    Arg4: 80864873, address which referenced memory

    Debugging Details:
    ------------------


    WRITE_ADDRESS: 9d7f7008

    CURRENT_IRQL: 2

    FAULTING_IP:
    nt!MiReleasePageFileSpace+55
    80864873 213e and dword ptr [esi],edi

    DEFAULT_BUCKET_ID: DRIVER_FAULT

    BUGCHECK_STR: 0xA

    PROCESS_NAME: System

    TRAP_FRAME: f7912be4 -- (.trap fffffffff7912be4)
    ErrCode = 00000002
    eax=8c9f00c0 ebx=00000000 ecx=00000000 edx=00000000 esi=9d7f7008 edi=fffffffe
    eip=80864873 esp=f7912c58 ebp=f7912c68 iopl=0 nv up ei pl zr na pe nc
    cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010246
    nt!MiReleasePageFileSpace+0x55:
    80864873 213e and dword ptr [esi],edi ds:0023:9d7f7008=????????
    Resetting default scope

    LAST_CONTROL_TRANSFER: from 80864873 to 8088bdd3

    STACK_TEXT:
    f7912be4 80864873 badb0d00 00000000 00000000 nt!KiTrap0E+0x2a7
    f7912c68 80842017 00000020 8b010000 808ab4a8 nt!MiReleasePageFileSpace+0x55
    f7912d44 80843343 8b01ec28 00000000 808ab4c0 nt!MiCleanSection+0x471
    f7912d90 80843466 00000000 8cf5b440 00000000 nt!MiRemoveUnusedSegments+0x963
    f7912dac 80948bb2 00000000 00000000 00000000 nt!MiDereferenceSegmentThread+0x5e
    f7912ddc 8088d4d2 80843408 00000000 00000000 nt!PspSystemThreadStartup+0x2e
    00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x16


    STACK_COMMAND: kb

    FOLLOWUP_IP:
    nt!MiReleasePageFileSpace+55
    80864873 213e and dword ptr [esi],edi

    SYMBOL_STACK_INDEX: 1

    SYMBOL_NAME: nt!MiReleasePageFileSpace+55

    FOLLOWUP_NAME: MachineOwner

    MODULE_NAME: nt

    DEBUG_FLR_IMAGE_TIMESTAMP: 42435b14

    IMAGE_NAME: memory_corruption

    FAILURE_BUCKET_ID: 0xA_W_nt!MiReleasePageFileSpace+55

    BUCKET_ID: 0xA_W_nt!MiReleasePageFileSpace+55

    Followup: MachineOwner
    ---------
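
    As a rough illustration of how to read the summary line above, here is a minimal Python sketch that splits "BugCheck A, {...}" into the code and its four arguments and restates them using the meanings given in the !analyze -v text (Arg1 = memory referenced, Arg2 = IRQL, Arg3 = 0 for read / 1 for write, Arg4 = referencing address). The function names parse_bugcheck and describe_irql_bugcheck are hypothetical helpers made up for this example; they are not part of WinDbg.

```python
import re

# Matches the summary line WinDbg prints, e.g.
# "BugCheck A, {9d7f7008, 2, 1, 80864873}"
BUGCHECK_RE = re.compile(r"BugCheck\s+([0-9A-Fa-f]+),\s*\{([^}]*)\}")


def parse_bugcheck(line):
    """Return (code, [arg1, arg2, arg3, arg4]) as integers, or None if no match."""
    match = BUGCHECK_RE.search(line)
    if match is None:
        return None
    code = int(match.group(1), 16)
    # WinDbg prints every argument in hexadecimal without a 0x prefix.
    args = [int(part.strip(), 16) for part in match.group(2).split(",")]
    return code, args


def describe_irql_bugcheck(args):
    """Restate the four arguments of bug check 0xA in plain words."""
    address, irql, access, instruction = args
    kind = "write" if access == 1 else "read"
    return (f"{kind} of {address:#010x} at IRQL {irql} "
            f"from instruction {instruction:#010x}")


if __name__ == "__main__":
    code, args = parse_bugcheck("BugCheck A, {9d7f7008, 2, 1, 80864873}")
    print(hex(code), describe_irql_bugcheck(args))
```

    Read this way, the dump records a write to 9d7f7008 at IRQL 2 from nt!MiReleasePageFileSpace+0x55, which matches the WRITE_ADDRESS, CURRENT_IRQL and FAULTING_IP fields in the output.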
  2. greenflash

    greenflash Newcomer, in training Posts: 104

    Hi there,
    Have you reviewed Event Log?
  3. m0zz0

    m0zz0 Newcomer, in training Topic Starter

    Hi Greenflash,

    Here's what I get in the Event Log:
    Error code 1000007e, parameter1 c0000005, parameter2 80892182, parameter3 f7912ae4, parameter4 f79127e0.

    Event ID: 1003 and Category: 102

    That's all I get.
    Please let me know if you need anything else.

    Thanks,
    m0zz0
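
    The event text above can be decoded the same way. The following is a minimal Python sketch, assuming the message always has the shape "Error code X, parameter1 A, parameter2 B, ..." as in this post. The helper name parse_event_1003 and the two small lookup tables are made up for this example and only cover the values that actually appear here: 1000007e is SYSTEM_THREAD_EXCEPTION_NOT_HANDLED_M and c0000005 is STATUS_ACCESS_VIOLATION.

```python
import re

# Deliberately tiny lookup tables: only the values seen in this thread.
KNOWN_BUGCHECKS = {0x1000007E: "SYSTEM_THREAD_EXCEPTION_NOT_HANDLED_M"}
KNOWN_NTSTATUS = {0xC0000005: "STATUS_ACCESS_VIOLATION"}


def parse_event_1003(text):
    """Return (error_code, [parameters]) from a bug check event message."""
    match = re.search(r"Error code\s+([0-9A-Fa-f]+)", text)
    if match is None:
        return None
    code = int(match.group(1), 16)
    # Each "parameterN <hex>" pair becomes one integer, in order.
    params = [int(value, 16)
              for value in re.findall(r"parameter\d+\s+([0-9A-Fa-f]+)", text)]
    return code, params


def describe_event(text):
    """Name the bug check and, for 0x7E-style codes, the exception in parameter1."""
    parsed = parse_event_1003(text)
    if parsed is None:
        return "not a bug check event"
    code, params = parsed
    name = KNOWN_BUGCHECKS.get(code, "unknown bug check")
    if not params:
        return f"{code:#x} ({name})"
    status = KNOWN_NTSTATUS.get(params[0], "unknown status")
    return f"{code:#x} ({name}), parameter1 {params[0]:#x} ({status})"


if __name__ == "__main__":
    event = ("Error code 1000007e, parameter1 c0000005, parameter2 80892182, "
             "parameter3 f7912ae4, parameter4 f79127e0.")
    print(describe_event(event))
```

    Note that this event records bug check 0x1000007E, while the dump pasted in the first post is bug check 0xA, so the two appear to describe different crashes.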
  4. greenflash

    greenflash Newcomer, in training Posts: 104

  5. m0zz0

    m0zz0 Newcomer, in training Topic Starter

    Hi Greenflash,

    Sorry, I was away for a couple of days. I have raised this with our IT team, who handle this. Let's see what they say. Apparently we are not allowed to install whatever McAfee updates we want, so it's going to be painfully slow, I guess. But I'll keep you posted if I come across anything new.

    Thanks a lot.

    Sincerely,
    m0zz0
  6. m0zz0

    m0zz0 Newcomer, in training Topic Starter

    Crashes again after applying patch

    Hi Greenflash and Other members,

    I got McAfee patch 15 quite quickly and have applied it to some machines. This patch is supposed to solve the memory leak problem in McAfee's anti-virus filter driver. But I just got the same error again.

    I am attaching 3 mini dumps, of which the last one was taken after the patch was applied.

    Could you guys please have a look at them?

    Thanks,
    m0zz0
  7. howard_hopkinso

    howard_hopkinso Newcomer, in training Posts: 25,948   +19

    Hello and welcome to Techspot.

    It looks to me like you may have a RAM problem.

    Go HERE and follow the instructions for testing your RAM etc.

    Regards Howard :wave: :wave:
  8. m0zz0

    m0zz0 Newcomer, in training Topic Starter

    Hi Howard,

    I ran memtest86 on the machines some time back, and it did not report any errors on any of them. I did not swap the DIMMs between slots, though. I have some Opteron and Dell PowerEdge 1955 blade servers, and the crash occurs randomly on almost all of them.
    Is there anything else you suggest I look into?

    Thanks,
    m0zz0
  9. peterdiva

    peterdiva TechSpot Ambassador Posts: 1,202

    There looks to be a problem with the win32k.sys file. The lmtn output is listed below, followed by the lmvm output, which should be verbose but isn't. As it's a Microsoft file, the verbose output should look the same as for any other Microsoft file, e.g. ntkrpamp.exe (listed last). It might be worth checking whether the file is legitimate.

    0: kd> lmtn
    start end module name
    Map win32k.sys: Image region 400:192a00 does not fit in mapping <-- This is the first time I've seen this.
    bf800000 bf9d0000 win32k win32k.sys Thu Oct 06 08:37:08 2005 (434471B4)

    0: kd> lmvm win32k
    start end module name
    bf800000 bf9d0000 win32k (deferred)
    Image path: win32k.sys
    Image name: win32k.sys
    Timestamp: Thu Oct 06 08:37:08 2005 (434471B4)
    CheckSum: 001D057D
    ImageSize: 001D0000
    Translations: 0000.04b0 0000.04e0 0409.04b0 0409.04e0

    0: kd> lmvm nt
    start end module name
    80800000 80a53000 nt # (pdb symbols) c:\symbols\ntkrpamp.pdb\FEC480982D1145E696432CBBD9BC2C831\ntkrpamp.pdb
    Loaded symbol image file: ntkrpamp.exe
    Mapped memory image file: c:\symbols\ntkrpamp.exe\42435B14253000\ntkrpamp.exe
    Image path: ntkrpamp.exe
    Image name: ntkrpamp.exe
    Timestamp: Fri Mar 25 09:28:04 2005 (42435B14)
    CheckSum: 0023D043
    ImageSize: 00253000
    File version: 5.2.3790.1830
    Product version: 5.2.3790.1830
    File flags: 0 (Mask 3F)
    File OS: 40004 NT Win32
    File type: 1.0 App
    File date: 00000000.00000000
    Translations: 0404.04b0
    CompanyName: Microsoft Corporation
    ProductName: Microsoft(R) Windows(R) Operating System
    InternalName: ntkrpamp.exe
    OriginalFilename: ntkrpamp.exe
    ProductVersion: 5.2.3790.1830
    FileVersion: 5.2.3790.1830 (srv03_sp1_rtm.050324-1447)
    FileDescription: NT Kernel & System
    LegalCopyright: (C) Microsoft Corporation. All rights reserved.
  10. m0zz0

    m0zz0 Newcomer, in training Topic Starter

    Hi peterdiva,

    I started looking at the problem from a different perspective after you suggested this, and I found out that my machines were crashing for two reasons.

    1. The ClearCase and McAfee drivers certainly do have a problem with each other. Patch 15 from McAfee solves it, as McAfee no longer scans mvfs files.

    2. Some of the PowerEdge 1955 blade servers were crashing because of the SAS device driver. The problem happens only on systems with more than 4 GB of RAM.
    The driver version which solves this problem is Dell SAS 5/iR Integrated, SAS 5/iR Adapter, v.01.21.26.01.

    All my machines have been running fine for five days now. And I have been subjecting them to more than 15 hours of tests every day.

    Howard, greenflash, thanks a bunch. This is a great forum, and I'll be hanging out here more from now on.

    Thanks again,

    Sincerely,
    m0zz0