Help With Diagnosing Server Crash

metalmania (Member; joined Nov 9, 2010; 2 messages; location: UK, Durham)

Hi All,
Would anyone be able to help me diagnose my Server 2008 disruptive shutdowns please?
This has been happening for the last month and occurs between 1 and 4 times a day!

My server is a leased dedicated server; I have remote access (and KVM access when enabled).
The provider (Fasthosts) recently swapped the chassis after memory diagnostics and disk tests seemed to pass OK.
Initially the server was NOT providing ANY minidump, so I had nothing to work with. However, I upgraded to SP2 yesterday, and this morning the server crashed twice and provided a minidump on both occasions.

Here is the server spec:
(msinfo32 says it's a Fujitsu D2812-A2)
BIOS Phoenix Technologies Ltd 6.00 r1.20.2812.A2
Intel Core2 Quad CPU Q8400 2.66 GHz
8GB RAM
Windows Server Standard SP2 X64
Network : Intel 82567LM-3 Gigabit Network
Disk: WDC WD3000HLFS-01G6U1 ATA x 2
RAID-1

I have MailEnable Pro and MySQL 5.1.52 installed, and the server runs multiple ASP websites, including some quite large ecommerce sites. All the sites and software (excluding MailEnable) have been migrated onto this new machine from a Windows 2003 server which has been running fine for the last few years.

When the server crashes it has a tendency to corrupt some of the MySQL tables/indexes especially where autonumbered columns are the primary key.

I downloaded the minidumps from the server to my own PC (Windows 7 Pro) and ran WinDbg. The results (from both minidumps) say:
Probably caused by : PCIIDEX.SYS ( PCIIDEX!BmSetup+6b )

This is the first time I have analysed a minidump and I am not sure I have done everything correctly although I followed the instructions from Major Geeks Forum

I have attached the minidumps in the zip file.
They can also be downloaded from here: minidumps
Could someone tell me, firstly, whether I have analysed them correctly and, hopefully, what I need to do to fix the problem?

Many Thanks.
 
The server crashed twice again this morning, the second time while I was using WinDbg on the server to analyse the first minidump!
Below is the output, so I am pretty sure it is being caused by PCIIDEX.sys. However, I have no idea how to solve this; any thoughts appreciated.


Microsoft (R) Windows Debugger Version 6.12.0002.633 AMD64
Copyright (c) Microsoft Corporation. All rights reserved.


Loading Dump File [C:\Windows\Minidump\Mini111010-02.dmp]
Mini Kernel Dump File: Only registers and stack trace are available

Symbol search path is: SRV*c:\symbols*http://msdl.microsoft.com/download/symbols
Executable search path is:
Windows Server 2008/Windows Vista Kernel Version 6002 (Service Pack 2) MP (4 procs) Free x64
Product: Server, suite: TerminalServer SingleUserTS
Built by: 6002.18267.amd64fre.vistasp2_gdr.100608-0458
Machine Name:
Kernel base = 0xfffff800`01a15000 PsLoadedModuleList = 0xfffff800`01bd9dd0
Debug session time: Wed Nov 10 07:46:46.773 2010 (UTC + 0:00)
System Uptime: 0 days 4:34:11.634
Loading Kernel Symbols
...............................................................
................................................................
..
Loading User Symbols
Loading unloaded module list
.....
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************

Use !analyze -v to get detailed debugging information.

BugCheck A, {74000, 2, 0, fffff80001a71390}

Probably caused by : PCIIDEX.SYS ( PCIIDEX!BmSetup+6b )

Followup: MachineOwner
---------

1: kd> !analyze -v
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************

IRQL_NOT_LESS_OR_EQUAL (a)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high. This is usually
caused by drivers using improper addresses.
If a kernel debugger is available get the stack backtrace.
Arguments:
Arg1: 0000000000074000, memory referenced
Arg2: 0000000000000002, IRQL
Arg3: 0000000000000000, bitfield :
bit 0 : value 0 = read operation, 1 = write operation
bit 3 : value 0 = not an execute operation, 1 = execute operation (only on chips which support this level of status)
Arg4: fffff80001a71390, address which referenced memory

Debugging Details:
------------------


READ_ADDRESS: GetPointerFromAddress: unable to read from fffff80001c3c080
0000000000074000

CURRENT_IRQL: 2

FAULTING_IP:
nt!RtlCopyMemoryNonTemporal+40
fffff800`01a71390 4c8b0c11 mov r9,qword ptr [rcx+rdx]

CUSTOMER_CRASH_COUNT: 2

DEFAULT_BUCKET_ID: DRIVER_FAULT_SERVER_MINIDUMP

BUGCHECK_STR: 0xA

PROCESS_NAME: System

TRAP_FRAME: fffffa600171b620 -- (.trap 0xfffffa600171b620)
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=0000000000001000 rbx=0000000000000000 rcx=fffffa6001f25000
rdx=0000059ffe14f000 rsi=0000000000000000 rdi=0000000000000000
rip=fffff80001a71390 rsp=fffffa600171b7b8 rbp=0000000000000002
r8=0000000000000000 r9=0000000000000000 r10=fffffa800670e290
r11=fffffa600171b7a0 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0 nv up ei pl nz na po nc
nt!RtlCopyMemoryNonTemporal+0x40:
fffff800`01a71390 4c8b0c11 mov r9,qword ptr [rcx+rdx] ds:9c38:00000000`00074000=????????????????
Resetting default scope

LAST_CONTROL_TRANSFER: from fffff80001a6f26e to fffff80001a6f4d0

STACK_TEXT:
fffffa60`0171b4d8 fffff800`01a6f26e : 00000000`0000000a 00000000`00074000 00000000`00000002 00000000`00000000 : nt!KeBugCheckEx
fffffa60`0171b4e0 fffff800`01a6e14b : 00000000`00000000 fffffa80`0cd69c38 00000000`00000001 00000000`00074000 : nt!KiBugCheckDispatch+0x6e
fffffa60`0171b620 fffff800`01a71390 : fffff800`01f2eca9 00000000`00074000 00000000`00000002 fffffa80`08a3a0f0 : nt!KiPageFault+0x20b
fffffa60`0171b7b8 fffff800`01f2eca9 : 00000000`00074000 00000000`00000002 fffffa80`08a3a0f0 fffff800`01a78c7a : nt!RtlCopyMemoryNonTemporal+0x40
fffffa60`0171b7c0 fffff800`01f2e423 : fffffa80`0670e290 fffffa80`0670e200 00000000`00000000 fffffa80`08a3a0f0 : hal!HalpDmaSyncMapBuffers+0x1b1
fffffa60`0171b870 fffff800`01f31399 : fffffa80`0c89dc68 fffffa80`0670e290 fffffa80`08a3a0f0 fffffa80`73506d00 : hal!HalpDmaMapScatterTransfer+0xa3
fffffa60`0171b8c0 fffff800`01f31312 : fffffa80`0c89dc68 fffffa80`0c89dc60 00000000`00001000 00000000`00000001 : hal!HalpMapTransfer+0x79
fffffa60`0171b940 fffff800`01f3080f : 00000000`00000000 fffff800`01f2de45 00000000`00000000 00000000`00000001 : hal!IoMapTransfer+0x8e
fffffa60`0171b980 fffff800`01f30fdd : fffffa80`06701840 fffffa80`0670e290 fffffa80`0670e201 00000000`00000000 : hal!HalpAllocateAdapterCallback+0xc7
fffffa60`0171ba20 fffff800`01f305df : fffffa80`0c89dc20 00000000`00001000 fffffa80`0670e290 fffffa80`08a3a0f0 : hal!HalAllocateAdapterChannel+0x101
fffffa60`0171ba60 fffffa60`00c770d3 : fffffa80`0c89db80 fffffa60`00c7712c fffffa80`000000a0 00000000`00075000 : hal!HalBuildScatterGatherList+0x2f3
fffffa60`0171bad0 fffffa60`00ca951a : fffffa80`0c89db80 fffffa80`0c89db80 fffffa80`0673f1a0 fffffa60`00ca2901 : PCIIDEX!BmSetup+0x6b
fffffa60`0171bb30 fffffa60`00ca873c : fffffa80`067404e8 fffffa80`067eb1b0 00000000`00000002 fffffa60`00c77199 : ataport!IdeDispatchChannelRequest+0x106
fffffa60`0171bb60 fffffa60`00ca9e26 : 00000000`00000001 00000000`00000000 fffffa80`0c89db80 00000000`00000000 : ataport!IdeStartChannelRequest+0xd8
fffffa60`0171bbb0 fffffa60`00ca9991 : fffffa80`0673f1a0 00000000`00000000 fffffa60`005ef580 00000000`00000001 : ataport!IdeProcessCompletedRequests+0x316
fffffa60`0171bc60 fffff800`01a72ee7 : fffffa80`0673f118 00000000`00000000 00000000`00000000 fffffa60`005ef580 : ataport!IdePortCompletionDpc+0x15d
fffffa60`0171bd10 fffff800`01a738d2 : fffffa60`00ca9834 fffffa60`005ec180 00000000`00000000 fffffa60`005f5d40 : nt!KiRetireDpcList+0x117
fffffa60`0171bd80 fffff800`01c40860 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiIdleLoop+0x62
fffffa60`0171bdb0 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!zzz_AsmCodeRange_End+0x4


STACK_COMMAND: kb

FOLLOWUP_IP:
PCIIDEX!BmSetup+6b
fffffa60`00c770d3 85c0 test eax,eax

SYMBOL_STACK_INDEX: b

SYMBOL_NAME: PCIIDEX!BmSetup+6b

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: PCIIDEX

IMAGE_NAME: PCIIDEX.SYS

DEBUG_FLR_IMAGE_TIMESTAMP: 49e02bde

FAILURE_BUCKET_ID: X64_0xA_PCIIDEX!BmSetup+6b

BUCKET_ID: X64_0xA_PCIIDEX!BmSetup+6b

Followup: MachineOwner
---------
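
As a cross-check of the numbers in the !analyze -v output above, here is a minimal Python sketch. Every value is copied from the dump. The helper first_non_core_frame is a hypothetical name, and the rule it encodes (blame the first stack frame outside nt and hal) is an assumption that simplifies whatever WinDbg really does.

```python
# Values copied from the !analyze -v output above.
MASK64 = (1 << 64) - 1

rcx = 0xFFFFFA6001F25000          # from the trap frame
rdx = 0x0000059FFE14F000          # from the trap frame
arg1_memory_referenced = 0x74000  # bugcheck Arg1
arg4_faulting_ip = 0xFFFFF80001A71390  # bugcheck Arg4
kernel_base = 0xFFFFF80001A15000

# Faulting instruction: mov r9, qword ptr [rcx+rdx].
# 64-bit address arithmetic wraps around, so mask the sum.
effective_address = (rcx + rdx) & MASK64

# Offset of the faulting instruction from the kernel image base.
ip_offset = arg4_faulting_ip - kernel_base

# Symbols from STACK_TEXT, top of stack first (offsets dropped).
stack_symbols = [
    "nt!KeBugCheckEx",
    "nt!KiBugCheckDispatch",
    "nt!KiPageFault",
    "nt!RtlCopyMemoryNonTemporal",
    "hal!HalpDmaSyncMapBuffers",
    "hal!HalpDmaMapScatterTransfer",
    "hal!HalpMapTransfer",
    "hal!IoMapTransfer",
    "hal!HalpAllocateAdapterCallback",
    "hal!HalAllocateAdapterChannel",
    "hal!HalBuildScatterGatherList",
    "PCIIDEX!BmSetup",
    "ataport!IdeDispatchChannelRequest",
]


def first_non_core_frame(symbols, core_modules=("nt", "hal")):
    """Return (index, symbol) of the first frame outside the core modules.

    Assumption: this mimics, very roughly, how a frame gets blamed.
    """
    for index, symbol in enumerate(symbols):
        module = symbol.split("!", 1)[0]
        if module not in core_modules:
            return index, symbol
    return None


print(hex(effective_address))
print(hex(ip_offset))
print(first_non_core_frame(stack_symbols))
```

If the sketch is right, the address in Arg1 is simply rcx+rdx wrapped to 64 bits, and the first frame outside nt and hal sits at index 0xb, which agrees with the SYMBOL_STACK_INDEX: b line in the output.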
 
I really hope someone can help, as I have the same server and I am also getting random crashes. I do not know how to debug, but I used the WhoCrashed application and got this output.


On Tue 22/03/2011 15:28:15 GMT your computer crashed
crash dump file: C:\Windows\Minidump\Mini032211-04.dmp
This was probably caused by the following module: ntoskrnl.exe (nt+0x5A490)
Bugcheck code: 0xA (0xFFFFFA6009998000, 0x2, 0x0, 0xFFFFF80001A65390)
Error: IRQL_NOT_LESS_OR_EQUAL
file path: C:\Windows\system32\ntoskrnl.exe
product: Microsoft Windows Operating System
company: Microsoft Corporation
description: NT Kernel & System
Bug check description: This indicates that Microsoft Windows or a kernel-mode driver accessed paged memory at DISPATCH_LEVEL or above.
This appears to be a typical software driver bug and is not likely to be caused by a hardware problem.
The crash took place in the Windows kernel. Possibly this problem is caused by another driver which cannot be identified at this time.


On Tue 22/03/2011 15:28:15 GMT your computer crashed
crash dump file: C:\Windows\memory.dmp
This was probably caused by the following module: hal.dll (hal!HalMakeBeep+0x19CD)
Bugcheck code: 0xA (0xFFFFFA6009998000, 0x2, 0x0, 0xFFFFF80001A65390)
Error: IRQL_NOT_LESS_OR_EQUAL
file path: C:\Windows\system32\hal.dll
product: Microsoft Windows Operating System
company: Microsoft Corporation
description: Hardware Abstraction Layer DLL
Bug check description: This indicates that Microsoft Windows or a kernel-mode driver accessed paged memory at DISPATCH_LEVEL or above.
This appears to be a typical software driver bug and is not likely to be caused by a hardware problem.
The crash took place in a standard Microsoft module. Your system configuration may be incorrect. Possibly this problem is caused by another driver on your system which cannot be identified at this time.

I have uploaded today's 4 crash minidumps here: http://www.mediafire.com/?aelcn8p8z16l1hi
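
To compare this WhoCrashed report with the WinDbg output posted earlier in the thread, the two bugcheck lines can be parsed into the same shape. This is only a rough sketch: the function names parse_bugcheck and describe are made up for illustration, and the regular expression is assumed to cover just the two line formats seen in this thread.

```python
import re

# One line in each format, copied from the posts in this thread.
WINDBG_LINE = "BugCheck A, {74000, 2, 0, fffff80001a71390}"
WHOCRASHED_LINE = (
    "Bugcheck code: 0xA (0xFFFFFA6009998000, 0x2, 0x0, 0xFFFFF80001A65390)"
)

# Assumption: only these two layouts need to be handled.
BUGCHECK_RE = re.compile(
    r"bugcheck(?: code:)?\s+(?:0x)?([0-9a-f]+),?\s*[({]([^)}]*)[)}]",
    re.IGNORECASE,
)


def parse_bugcheck(line):
    """Return (code, [arg1, arg2, arg3, arg4]) as integers."""
    match = BUGCHECK_RE.search(line)
    if match is None:
        raise ValueError("not a bugcheck line: " + line)
    code = int(match.group(1), 16)
    # int(..., 16) accepts hex text with or without a 0x prefix.
    args = [int(part.strip(), 16) for part in match.group(2).split(",")]
    return code, args


def describe(code, args):
    """Label the arguments the way the 0xA description in WinDbg does."""
    return {
        "code": code,
        "memory_referenced": args[0],
        "irql": args[1],
        "write": bool(args[2] & 0x1),    # bit 0: 0 = read, 1 = write
        "execute": bool(args[2] & 0x8),  # bit 3: 1 = execute operation
        "faulting_ip": args[3],
    }


for line in (WINDBG_LINE, WHOCRASHED_LINE):
    info = describe(*parse_bugcheck(line))
    print(hex(info["code"]), info["irql"], info["write"], info["execute"])
```

Both lines decode to bugcheck 0xA with a read at IRQL 2, so the two machines appear to be hitting the same class of failure, although that alone does not prove the cause is the same.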
 
Hi there,

It's never easy to diagnose a bluescreen.

As I see it, you could have one of two problems: RAM or a driver.

What I don't understand is why Windows tells you there's something wrong with HAL.dll, which is a DLL that works with hardware. In other words, it looks like your RAM has a problem. I'm going to read your dumps, but please take note of the circumstances in which this happens. Is it completely random?

I'll post again once I'm finished with the dumps.
 
Well this is interesting...

The DLL called every time is HAL.DLL, and the process involved is ntoskrnl (obviously).

If you don't see ANY relation between the crashes, you should start the MEMTEST test:

http://www.memtest.org/#downiso

During this test your system will not be usable. Run the test AT LEAST 3 times. Don't use the integrated test in Windows. I don't think your problem is a driver, although honestly it could be. While memtest is running (it takes hours to complete), go to the server manufacturer's website and look for updated drivers for everything.

Hope this helps.

Let me know.
 
I just read the specifications for your "server". It's not a server!

The Fujitsu D2812-A2 is a workstation; if you look for drivers, there are only ones for Win XP and Vista. Anyway, keep trying what I suggested before.
 
Well yeah, it's a workstation with a quad core that we are running Server 2008 on. I had to install the unsupported drivers, but anyway I managed to get around to learning how to debug the system crashes.

PCIIDEX.SYS is the faulting driver which causes the blue screens.

(Probably caused by : PCIIDEX.SYS ( PCIIDEX!BmSetup+6b ))

That led me to a possible fix:


http://social.technet.microsoft.com...n/thread/bd99af07-b640-43ae-a7da-03176df28d48

So basically we need to do this:

First, do the regedit:

http://support.microsoft.com/kb/922976

I will be trying it tonight :) Thanks for your input. My first test was the memory, as I have had similar problems before on other machines. It passes memtest.
 