For the love of God... Help me!

  • Thread starter: Michichael
"Michichael" <Michichael@discussions.microsoft.com> wrote in message
news:542451A7-178A-4CEB-ADAA-37F64699CF27@microsoft.com...
> Anyway, little update. Got another PAGE_FAULT_IN_NONPAGED_AREA from
> dxgkrnl.sys, tried calling Microsoft. The rep effectively repeated a mantra.
> This time it was trying to play Counterstrike: Source.
> So this isn't just limited to Halo... BSODs so far today (got on it 3
> hours ago...): 6.


I guess you're running the latest ForceWare drivers (163.69 as of today). I'm
also assuming you've trawled the knowledge base at Nvidia, read their online
help files, etc. and not found any solutions yet. So we'll go straight to the
heavy-duty troubleshooting. You can take two approaches, which are not
entirely mutually exclusive.

Firstly, take the system back to a minimal baseline configuration that works.
From there, incrementally add or adjust configuration items, one by one, to
bring the system back in line with its present configuration. After each
adjustment, exercise the system in a way which should reproduce the error
(i.e. play Halo2, I guess). The point at which the problem starts to
reappear will give you a good clue to where its cause may lie.

Here's what you'd do in a serious industrial setting. Since it's a home
machine, you might choose to be a bit less disciplined, although each
departure reduces the fidelity of the exercise (and may turn it into a
waste of time if you get too cavalier). Much patience is required.

- first, back up your user data

- remove as many peripheral devices as you can: printers, cameras, sound
cards, scanners etc. We want the CPU, memory, graphics card and one hard
disk; that's all.

- if the machine has been overclocked, take *everything* back to the default
factory settings: CPU, bus, graphics processor, the lot.

- re-install Vista from scratch, from original media, reformatting the hard
disk and avoiding any third party drivers during the installation process
(only use the Microsoft-supplied drivers).

- you now have a very plain, vanilla installation of Vista. Performance
might be less than what you'd like but our goal here is stability, not
performance! (not yet, anyway).

- install Halo2, as the tool with which to exercise the system.

- reproduce the problem scenario, e.g. play Halo2 for more than 30 minutes,
and verify that it does not crash (this is the painful part: you need to play
games for at least 30 minutes :-)

If it still crashes in this very vanilla environment, then you have a
fundamental hardware problem with your machine. It needs to be examined by a
skilled computer technician. I mean someone with a certificate in
electronics engineering, or similar, who can use an oscilloscope, logic
probes etc - not just a PC enthusiast who reads Maximum PC (fine publication
that it is).

Assuming that Halo2 does run okay, start changing your config back to how it
was. It is very important you only change one thing at a time, and then test
after each change. For example:
- confirm the new, clean install of Vista runs okay.
- then run Windows Update to patch your machine to the current revision
level. Test again.
- run DXDiag and export a report of your settings ("Save all Information"),
for future reference and comparison.
- next, install the current Nvidia-supplied ForceWare drivers. Test again.
- next, re-attach your peripheral devices, one by one. Exercise the system
in between each, to verify the system continues to run normally. You might
need to spread this over a few days.
- install any additional vendor-supplied drivers for your various devices.
Test system again.
- install your normal user applications, e.g. Office, Photoshop, etc. Avoid
installing any apps which install kernel-mode drivers; we want to stick to
user-mode stuff, for now.
- exercise the system. This is a fairly good baseline: a plain installation
of Vista, Nvidia-supplied graphics drivers and general user apps. Hopefully
the system is still stable, at this point.
- now install any apps which include kernel mode drivers. Test the system
again.
- assuming you want to return to an overclocked configuration, you can start
overclocking again, now. But, don't leap straight to the maximum overclock -
just ramp up the CPU a little bit, and then test. Then increase a little bit
more, and test the system. Then change your memory timing settings, if
that's what you wish ... but again, don't go straight to an aggressive
setting, just moderate - and test the system again.
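The one-change-at-a-time loop above can be sketched as a small Python helper. This is only an illustration under stated assumptions: `apply_change` and `system_is_stable` are hypothetical callables standing in for the manual steps (installing a driver, re-attaching a device, playing Halo2 for 30+ minutes), not any real tool.

```python
from typing import Callable, List, Optional, Sequence


def first_failing_change(
    changes: Sequence[str],
    apply_change: Callable[[str], None],   # hypothetical: performs one manual step
    system_is_stable: Callable[[], bool],  # hypothetical: e.g. 30+ minutes of Halo2
) -> Optional[str]:
    """Apply changes one at a time, testing after each.

    Returns the first change after which the test fails, or None if the
    system stays stable through the whole list.
    """
    # Check the clean baseline first; a failure here points at hardware.
    if not system_is_stable():
        raise RuntimeError("clean baseline is unstable: suspect hardware")
    for change in changes:
        apply_change(change)        # exactly one change per iteration
        if not system_is_stable():
            return change           # the last change made is the prime suspect
    return None


if __name__ == "__main__":
    plan = ["Windows Update", "ForceWare drivers", "printer", "Office"]
    applied: List[str] = []
    # Simulated test: pretend the system goes unstable once the drivers go in.
    culprit = first_failing_change(
        plan, applied.append, lambda: "ForceWare drivers" not in applied
    )
    print(culprit)  # ForceWare drivers
```

Stopping at the first failure, rather than applying everything and testing once at the end, is what lets you pin the problem on a single change.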

At some point, the system may start to fail. Observe the last change you
made to the system. If possible, roll back that change (e.g. uninstall the
driver, decrease the OC setting, etc.) and check that the system returns to
stability. Be aware that not all changes are cleanly reversible: some are
one-way, so even uninstalling the change won't return the system to a
working state. If that's what you encounter, you may need to repeat the
whole loop, stopping just short of the point where problems appeared last
time round.

This approach is empirical, and draws on the traditions of root cause
analysis (in the precise engineering sense, not the loose vernacular sense
of "root cause").

The second approach is to be analytically diagnostic: get a memory dump of
the crash, and analyse it.

For this, you need to install the Windows Debugging Tools. You can download
these from here:
http://www.microsoft.com/whdc/devtools/debugging/default.mspx

You'll also need a symbol path, so WinDBG can find the debug symbols on
Microsoft's public symbol server. In Control Panel, System, Advanced System
Settings, Environment Variables, define an environment variable called
"_NT_SYMBOL_PATH" (with an initial underscore). Assign it the value
"srv*C:\Symbols*http://msdl.microsoft.com/download/symbols". This tells
debuggers on your system to download symbols from
http://msdl.microsoft.com/download/symbols and store them in a directory on
your hard disk called "C:\Symbols". If you run the SET command at a command
prompt, you should see this in the output:

_NT_SYMBOL_PATH=srv*C:\Symbols*http://msdl.microsoft.com/download/symbols

For more background on configuring the Debug Tools, see:
http://www.microsoft.com/whdc/devtools/debugging/debugstart.mspx
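If you'd rather script the check than eyeball the SET output, here is a minimal Python sketch. It builds the `srv*<local cache>*<server>` value described above and pulls the cache/server pairs out of an existing `_NT_SYMBOL_PATH` string. It assumes that multiple entries are separated by semicolons. The helper names are made up for illustration.

```python
import os
from typing import List, Tuple

SYMBOL_SERVER = "http://msdl.microsoft.com/download/symbols"


def make_symbol_path(cache_dir: str, server: str = SYMBOL_SERVER) -> str:
    """Build a 'srv*<local cache>*<server>' entry."""
    return f"srv*{cache_dir}*{server}"


def srv_entries(symbol_path: str) -> List[Tuple[str, str]]:
    """Return (cache, server) pairs for every srv* entry in a symbol path."""
    pairs = []
    for entry in symbol_path.split(";"):      # entries are ';'-separated
        parts = entry.strip().split("*", 2)   # 'srv', cache, server
        if len(parts) == 3 and parts[0].lower() == "srv":
            pairs.append((parts[1], parts[2]))
    return pairs


if __name__ == "__main__":
    current = os.environ.get("_NT_SYMBOL_PATH", "")
    print("expected:", make_symbol_path(r"C:\Symbols"))
    print("found:   ", srv_entries(current) or "no srv* entries")
```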

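If you're lucky, the system will still have the dump files from your
previous crashes. Application crash dumps collected by Windows Error Reporting
are stored in a location like
"C:\Users\<username>\AppData\Local\Temp\WER90FE.tmp\minidump.mdmp", where
the "90FE" part of the path will vary for each crash. Note that AppData is
normally a hidden directory. For STOP errors (blue screens), the default
Startup and Recovery paths are %SystemRoot%\Minidump for kernel minidumps
and %SystemRoot%\MEMORY.DMP for a larger kernel memory dump.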

If you have no minidump.mdmp files currently on your system, go to Control
Panel, System, Advanced System Settings, Startup and Recovery, Settings, and
configure a specific location for your memory dumps in the "System Failure"
box (eg C:\Dumps, or similar).
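To see which dumps you already have, a short Python sketch can check the usual places. It assumes the WER pattern quoted above and the default %SystemRoot%\Minidump and %SystemRoot%\MEMORY.DMP locations. The function name is hypothetical.

```python
import os
from pathlib import Path
from typing import List


def find_dump_files(user_temp: Path, system_root: Path) -> List[Path]:
    """Collect crash dumps from the usual locations, newest first."""
    found: List[Path] = []
    # User-mode dumps from Windows Error Reporting (pattern taken from the post).
    found += user_temp.glob("WER*.tmp/minidump.mdmp")
    # Kernel minidumps written after a STOP error (default location).
    found += (system_root / "Minidump").glob("*.dmp")
    # Larger kernel memory dump, if one was written.
    full_dump = system_root / "MEMORY.DMP"
    if full_dump.is_file():
        found.append(full_dump)
    return sorted(found, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    temp = Path(os.environ.get("TEMP", "."))
    windows = Path(os.environ.get("SystemRoot", r"C:\Windows"))
    for dump in find_dump_files(temp, windows):
        print(dump)
```

If you pointed Startup and Recovery at a custom folder such as C:\Dumps, add a glob for that folder as well.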

A full debug of a memory dump is a complex task, which requires extensive
specialised knowledge. Fortunately, some of this knowledge has been
automated in WinDBG's "analyze" command.
- Run WinDBG from the Windows Debugging Tools in the Start menu
- go to the File menu, Open Crash Dump, to open one of the minidump.mdmp or
MEMORY.DMP files on your machine.
- when the dump file is opened, WinDBG will display a message similar to the
following (you'll have a different exception code):


Microsoft (R) Windows Debugger Version 6.6.0007.5
Copyright (c) Microsoft Corporation. All rights reserved.


Loading Dump File
[C:\Users\Someguy\AppData\Local\Temp\WER90FE.tmp\minidump.mdmp]
User Mini Dump File: Only registers, stack and portions of memory are
available

Windows Vista Version 6000 MP (2 procs) Free x64
Product: WinNt, suite: SingleUserTS
Debug session time: Sun Aug 5 15:55:28.000 2007 (GMT+10)
System Uptime: 0 days 18:45:04.965
Process Uptime: 0 days 0:00:07.000
Symbol search path is:
srv*C:\Symbols*http://msdl.microsoft.com/download/symbols
Executable search path is:
.............
This dump file has an exception of interest stored in it.
The stored exception information can be accessed via .ecxr.
(a84.1014): Access violation - code c0000005 (first/second chance not
available)
00000000`002e4a30 c3 ret
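The banner above says "User Mini Dump File", which is what you get from an application crash. A blue screen produces a kernel dump instead. A quick way to tell which kind of file you have is to look at its first bytes. The sketch below assumes the standard signatures: `MDMP` for user-mode minidumps, and `PAGEDU64` (64-bit) or `PAGEDUMP` (32-bit) for kernel dumps. It is a rough triage aid, not a replacement for opening the file in WinDBG.

```python
import sys
from pathlib import Path


def dump_kind(path: Path) -> str:
    """Classify a crash dump by the signature in its first eight bytes."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head[:4] == b"MDMP":
        return "user-mode minidump"
    if head == b"PAGEDU64":
        return "kernel dump (64-bit)"
    if head == b"PAGEDUMP":
        return "kernel dump (32-bit)"
    return "unknown"


if __name__ == "__main__":
    for name in sys.argv[1:]:
        print(name, "->", dump_kind(Path(name)))
```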


Now, at the command line at the bottom of the WinDBG window, enter the
command "!analyze -v". That's an exclamation mark, followed by the word
"analyze" spelt in the American fashion with a "z", then a space, and a
dash, and a lower-case v.

WinDBG will chug away for a minute or two; you will also see some network
activity as it downloads the debug symbols from the symbol server. It will
then display a diagnostic report, making a reasonable guess at the faulty
module. To give any troubleshooting a big head start, include this report
in any problem reports to Nvidia, Microsoft, newsgroup forums etc.
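If you have a pile of dumps, you can also run the same analysis without the GUI. The Debugging Tools include console debuggers: kd.exe for kernel dumps and cdb.exe for user-mode dumps. Both accept `-z` to open a dump, `-y` for the symbol path, `-logo` to write a log file and `-c` to run commands on startup. The Python sketch below only assembles that command line. The file paths in it are hypothetical, and it assumes the debugger is on your PATH.

```python
import shutil
import subprocess
from typing import List


def analyze_command(debugger: str, dump_file: str, symbol_path: str,
                    log_file: str) -> List[str]:
    """Build a debugger command line that runs !analyze -v and then quits."""
    return [
        debugger,                # kd.exe (kernel dump) or cdb.exe (user dump)
        "-z", dump_file,         # crash dump to open
        "-y", symbol_path,       # same value as _NT_SYMBOL_PATH
        "-logo", log_file,       # save the whole session, report included
        "-c", "!analyze -v; q",  # analyse, then quit
    ]


if __name__ == "__main__":
    cmd = analyze_command(
        "kd.exe",
        r"C:\Windows\Minidump\Mini092307-01.dmp",  # hypothetical file name
        r"srv*C:\Symbols*http://msdl.microsoft.com/download/symbols",
        r"C:\Dumps\analyze.txt",                   # hypothetical log path
    )
    if shutil.which(cmd[0]):
        subprocess.run(cmd, check=False)
    else:
        print("debugger not found; command would be:", cmd)
```

The log file holds the same report you'd otherwise copy out of the WinDBG window.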

Dxgkrnl.sys, which is referred to in several of your crashes, is one half of a
port/miniport driver pair. That's to say, DirectX graphics has two main kernel
components: a bunch of functionality which is common to all vendors, and so is
written once for everyone by Microsoft (that's dxgkrnl.sys), and a
vendor-supplied miniport driver, which contains the functionality specific to
each vendor's hardware (for Nvidia, this will be nvlddmkm.sys). As with all
port/miniport designs, there is an unusually close symbiosis between the
Microsoft-supplied and vendor-supplied drivers, so a crash in one can easily be
caused by a problem in the other. The fact you're seeing crashes in
dxgkrnl.sys is interesting, but ... this is some of the most heavily exercised
code out there. Every Vista machine is hammering this driver all day, every
day. There could certainly be many as-yet undiscovered bugs in this driver! But
if the cause of the crashes is a bug in dxgkrnl.sys, there would need to be
some fairly unusual condition on your machine which is exposing the bug, since
it doesn't seem to occur with anything like the same frequency on most other
machines. Isolating that unusual condition may also give you a workaround,
even if there isn't a hotfix (yet) for the bug.

PAGE_FAULT_IN_NONPAGED_AREA (STOP 0x50) can often be caused by faulty
hardware. But since you're seeing a combination of STOP 0x50 and STOP 0x3B
(SYSTEM_SERVICE_EXCEPTION), I think it's more likely to be a buggy driver
passing bad data as it makes the transition from User Mode to Kernel Mode
(graphics drivers are especially prone to this). That's the "System Service"
referred to in the STOP message (i.e. not a "service" as in a process
controlled by the Windows Service Control Manager, but a "system service call"
made to the operating system to request a kernel function).
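For quick reference, here is a small Python lookup for the STOP codes that come up in this thread. It turns the hex code shown on the blue screen into its symbolic name. The table covers only these codes, and the helper name is made up.

```python
BUG_CHECK_NAMES = {
    0x3B: "SYSTEM_SERVICE_EXCEPTION",
    0x50: "PAGE_FAULT_IN_NONPAGED_AREA",
    0x7E: "SYSTEM_THREAD_EXCEPTION_NOT_HANDLED",
}


def describe_stop(code_text: str) -> str:
    """Turn a STOP code such as '0x00000050' into its symbolic name."""
    code = int(code_text, 16)  # int() accepts the '0x' prefix with base 16
    return BUG_CHECK_NAMES.get(code, f"unknown bug check 0x{code:X}")


if __name__ == "__main__":
    print(describe_stop("0x00000050"))  # PAGE_FAULT_IN_NONPAGED_AREA
    print(describe_stop("0x3B"))        # SYSTEM_SERVICE_EXCEPTION
```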

Obviously there's a lot of work here ... but if there's no ready-to-go answer
to your problem, this is the way I'd tackle it. Other folks may have
additional ideas.

Good luck,
--
Andrew McLaren
amclar (at) optusnet dot com dot au
 
Alright, I'll give that a shot tomorrow. I also just installed
http://support.microsoft.com/kb/940105.

I will note though, this *is* a vanilla install of Vista. So far the only
things on it are the updates which stopped it crashing on bootup in the first
place, and Source/Halo2/AVG. All settings are at defaults. No additional
programs have been installed beyond that. As far as hardware goes, this is a
virgin system: all brand-new parts except for the RAM, which will be
replaced come Tuesday. I will give the debug utility a shot though.

I always go through the methodical "change one thing, test, change another
thing, test" procedure, which is why this one is frustrating me so. I can't
reliably replicate the error, it just appears at random. Nothing in the
Application, Security, or other Event logs shows me the problem. I'll keep
you posted, however. Since there's no documented fix for this that is easy to
find, I'll keep a log of what I do and, when we do get it nailed, post it as
an aid to others.

 
"Michichael" <Michichael@discussions.microsoft.com> wrote ...
> Alright, I'll give that a shot tomorrow. I also just installed
> http://support.microsoft.com/kb/940105.
>
> I will note though, this *is* a vanilla install of Vista.


Are you running with the Microsoft-supplied video drivers, or Nvidia
Forceware drivers?

I'd start with vanilla-vanilla ... so vanilla, you can taste it! :-) So
vanilla, Homer Hudson knocks on your door and asks to buy your PC. As I
say: if you have a truly plain, out-of-the-box installation of Vista, with
no 3rd party drivers or updates, and the system is routinely crashing, then
I'd think you have a hardware problem.

I've worked on many cases where the customer said "No no, everything is
completely plain". Then I found a 3rd party driver (or the like) in the
dump's call stack. When I confront the customer about it, they say "Oh but
*that* driver doesn't matter, it never causes any problems". Uh-huh ... so,
what's it doing in the memory dump, then??? Basically I just don't trust
anyone, anymore (sad ... but true).

Let us know how it goes.

--
Andrew McLaren
amclar (at) optusnet dot com dot au
 
"Seven" <Seven@linux.sux> wrote in message
news:O9UEDPj$HHA.3548@TK2MSFTNGP06.phx.gbl...
> "Adam Albright" <AA@ABC.net> wrote in message
> news:tuodf3leeet3oijunc3rccrf97d9laeru5@4ax.com...
>> On Sun, 23 Sep 2007 14:38:01 -0700, Michichael
>> <Michichael@discussions.microsoft.com> wrote:
>>
>>>I'll give it a shot when the new RAM comes. But the current RAM is set to
>>>its timings (4-4-4-12).
>>>
>>>Also, I dunno what did it, but that KB patch, which is supposedly only for
>>>multiple GPU (SLI) usage.
>>>
>>>http://support.microsoft.com/kb/936710
>>>
>>>This has reduced the load on my GPU so it only runs 65-70 under load instead
>>>of up to 80. And I haven't crashed yet. hmmm...

>>
>> I'm assuming you mean the chip on your graphic card? Intense gaming
>> can cause heat build up. A lot of people know they can overclock their
>> CPU, I'll wager a lot don't know you can also overclock your graphic
>> card. Of course proceed only if you're willing to assume the risk.
>>
>> Have you tried this?
>>
>> http://forums.firingsquad.com/firingsquad/board/message?board.id=hardware&message.id=70995
>>
>>

>
> Yeah moron.
> Most gamers who know their memory timings don't know you can OC the GPU !
> And if he is monitoring his GPU temps, I'll wager he understands.


I'll wager that the OP has forgotten that different programs use different
features in the GPU, and just because it will run one program indefinitely
while overclocked doesn't mean it will run another, even at the same
temperature.
You run stuff out of spec and you accept that odd things may, no make that,
will happen.
No manufacturer of hardware or software is going to be interested in fixing
problems caused by users running stuff out of spec, with the possible
exception of a few PC box shifters who will do anything for a bit of cash.
 
"Michichael" <Michichael@discussions.microsoft.com> wrote in message
news:75B42862-8A4C-44EF-85E2-167302292D4A@microsoft.com...
> Hard to do when you have a full time job as a Systems Administrator and
> your hours are the exact same as theirs! =P
>
> But yeah I'm getting sick of Vista really fast. Can't even get my music to
> play from my speakers, and voice from my headset in Halo 2. But that's
> going in another thread in a few moments. Had another
> PAGE_FAULT_IN_NONPAGED_AREA error, as well as a SYSTEM_SERVICE_EXCEPTION
> when I EXITED counterstrike source.... Hmmm... -.-
>
> Now my GPU is reading at most 81C at really intensive scenes of Halo 2, so
> I was worried it'd be a heat issue, but it just doesn't seem to click, as
> I've played it at upwards of 100C without any issues. The card is just
> designed for higher heat.


The card may have bigger fans and heat sinks, but there is still a limit on
how hot any bit of the *silicon* can get and still perform as the software
expects.
Measuring heatsink temps is at best a guide to the total power in the chip,
not to that in any particular bit of the chip, so local temperatures can
be much higher than what you measure. These can cause the hardware to do
things the software can't cope with (do ATI still have the reset-GPU catch-all
in their drivers? Their drivers failed so often they had a watchdog
reset the GPU if it timed out). Even a minor change in a bit of driver code
can affect the heat distribution on a GPU. If you have it working the way
you like it, then don't change things, as there is a good chance something
will change, probably for the worse. Even updating the graphics drivers,
never mind changing the whole OS, causes overclockers problems.

> Since both the mobo and the card are from XFX, who's tech
> support is great, I'm going to grill their tech support team about the
> issues.


As it's probably a hardware problem, that would be a good idea.
As they supply cards intended to be operated out of spec, they may have a
returns policy so you can get a new bit of hardware that behaves in a
different way when out of spec... if you are lucky this will work for you, but
I doubt they guarantee it. Also, there's a good chance it's the RAM causing the
problem, so you need to run some intensive RAM tests to eliminate that too
(that will take days to do properly, BTW).


> Specially since Microsoft would rather make you pay for support in
> getting it to work =P


Unless you get the problem with all the hardware running in spec, there isn't
much point in talking to M$... they have no way to fix it.
They certainly don't test Windows with all games running on all out-of-spec
systems.
 
I will note I'm using the defaults for everything. 0 overclock involved. Why
overclock an already unstable system?

"dennis@home" wrote:

>
> "Seven" <Seven@linux.sux> wrote in message
> news:O9UEDPj$HHA.3548@TK2MSFTNGP06.phx.gbl...
> > "Adam Albright" <AA@ABC.net> wrote in message
> > news:tuodf3leeet3oijunc3rccrf97d9laeru5@4ax.com...
> >> On Sun, 23 Sep 2007 14:38:01 -0700, Michichael
> >> <Michichael@discussions.microsoft.com> wrote:
> >>
> >>>I'll give it a shot when the new RAM comes. But the current RAM is set to
> >>>it's timings (4-4-4-12).
> >>>
> >>>Also, I dunno what did it, but that KB patch, which is supposedly only
> >>>for
> >>>multiple GPU (SLI) usage.
> >>>
> >>>http://support.microsoft.com/kb/936710
> >>>
> >>>This has reduced the load on my GPU so it only runs 65-70 under load
> >>>instead
> >>>of up to 80. And I haven't crashed yet. hmmm...
> >>
> >> I'm assuming you mean the chip on your graphic card? Intense gaming
> >> can cause heat build up. Lot of people know they can overclock their
> >> CPU, I'll wager a lot don't know you can also overclock your graphic
> >> card. Of course proceed only if you're willing to assume the risk.
> >>
> >> Have you tried this?
> >>
> >> http://forums.firingsquad.com/firingsquad/board/message?board.id=hardware&message.id=70995
> >>
> >>

> >
> > Yeah moron.
> > Most gamers who know their memory timings don't know you can OC the GPU !
> > And if he is monitoring his GPU temps, I'll wager he understands.

>
> I'll wager that the OP has forgotten that different programs use different
> features in the GPU and just because it will run one program indefinitely
> while over clocked doesn't mean it will run another even at the same
> temperature.
> You run stuff out of spec and you accept that odd things may, no make that,
> will happen.
> No manufacturer of hardware or software is going to be interested in fixing
> problems caused by users using stuff out of spec with the possible exception
> of a few PC box shifters who will do anything for a bit of cash.
>
>
 
Well, we've narrowed it down to a driver issue. Here's where we stand:

I did about 4 hours of extensive testing in a 64-bit XP installation. No
issues with any game.

In Vista, about 2 hours into Halo 2, Bioshock or even Counterstrike, the
dxgkrnl.sys error pops up:

0x0000007E (0XFFFFFFFFC0000005, 0XFFFFF98004BDB81A, 0XFFFFF9800D941838,
0XFFFFF9800D941210)

dxgkrnl.sys FFFFF98004BDB81A base at FFFFF98004B22000, Datestamp 46a9480b.

That's an example of one of them.
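For a bug check 0x7E (SYSTEM_THREAD_EXCEPTION_NOT_HANDLED), the four parameters are:

- the exception code
- the address where the exception occurred
- the exception record address
- the context record address

The Python sketch below unpacks the values quoted above. The first parameter is 0xC0000005 (an access violation), sign-extended to 64 bits. The faulting address sits 0xB981A bytes into dxgkrnl.sys. The "Datestamp" is treated as the module's link timestamp in seconds since 1970 (UTC). The dictionary keys and helper names are illustrative labels only.

```python
from datetime import datetime, timezone
from typing import Dict, Tuple

STOP_7E_PARAMS: Tuple[int, int, int, int] = (
    0xFFFFFFFFC0000005,  # exception code (sign-extended)
    0xFFFFF98004BDB81A,  # address where the exception occurred
    0xFFFFF9800D941838,  # exception record address
    0xFFFFF9800D941210,  # context record address
)
DXGKRNL_BASE = 0xFFFFF98004B22000
DXGKRNL_DATESTAMP = "46a9480b"


def decode_stop_7e(params: Tuple[int, int, int, int]) -> Dict[str, int]:
    """Label the four 0x7E parameters."""
    code, address, exception_record, context_record = params
    return {
        "exception_code": code & 0xFFFFFFFF,  # drop the sign extension
        "exception_address": address,
        "exception_record": exception_record,
        "context_record": context_record,
    }


def module_offset(address: int, base: int) -> int:
    """Offset of an address inside a module loaded at 'base'."""
    return address - base


def link_time(datestamp_hex: str) -> datetime:
    """Convert a hex module datestamp to a UTC date and time."""
    return datetime.fromtimestamp(int(datestamp_hex, 16), tz=timezone.utc)


if __name__ == "__main__":
    info = decode_stop_7e(STOP_7E_PARAMS)
    offset = module_offset(info["exception_address"], DXGKRNL_BASE)
    print(f"exception code 0x{info['exception_code']:08X}")  # 0xC0000005
    print(f"fault at dxgkrnl.sys+0x{offset:X}")               # dxgkrnl.sys+0xB981A
    print(link_time(DXGKRNL_DATESTAMP).isoformat())           # 2007-07-27T01:19:07+00:00
```

The load address of a driver can differ between boots. For a given driver build, the module-relative offset is the more useful figure to compare across dumps or to quote in a report.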

I've contacted Microsoft and they're going to help me fix it.

At this point, I'm inclined to believe one of two things: that DX10 is
inherently unstable, or that my Nvidia 8800 GTX is faulty under the DX10
runtime and I should RMA it. I'll keep you folks posted on the
resolution!

 
Michichael <Michichael@discussions.microsoft.com>
Whilst having a limited moment of clarity in an otherwise cloudy
existence the drunkard typed
news:22CE22B4-00F1-44A4-9846-F935AE324753@microsoft.com:

> Well we've narrowed it down to a driver issue. Here's where we stand:
>
> I did about 4 hours of extensive testing in a 64 bit XP installation.
> No issues with any game.
>
> Vista, about 2 hours into Halo 2 or Bioshock, or even Counterstrike,
> the dxgkrnl.sys error pops up:
>
> 0x0000007E (0XFFFFFFFFC0000005, 0XFFFFF98004BDB81A,
> 0XFFFFF9800D941838, 0XFFFFF9800D941210)
>
> dxgkrnl.sys FFFFF98004BDB81A base at FFFFF98004B22000, Datestamp
> 46a9480b.
>
> That's an example of one of them.
>
> I've contacted Microsoft and they're going to help me fix it.
>
> At this point, I'm inclined to believe one of two issues: That DX10 is
> inherently unstable, or the fact that my NVidia 8800 GTX is faulty in
> the DX10 runtime, and I should RMA it. I'll keep you folks posted on
> the resolution!
>
> "Andrew McLaren" wrote:
>
>> "Michichael" <Michichael@discussions.microsoft.com> wrote ...
>> > Alright, I'll give that a shot tomorrow. I also just installed
>> > http://support.microsoft.com/kb/940105.
>> >
>> > I will note though, this *is* a vanilla install of Vista.

>>
>> Are you running with the Microsoft-supplied video drivers, or Nvidia
>> Forceware drivers?
>>
>> I'd start with vanilla-vanilla ... so vanilla, you can taste it! :-)
>> So vanilla, Homer Hudson knocks on your door and asks to buy your PC.
>> As I say: if you have a truly plain, out-of-the-box installation of
>> Vista, with no 3rd party drivers or updates, and the system is
>> routinely crashing, then I'd think you have a hardware problem.
>>
>> I've worked on many cases where the customer said "No no, everything
>> is completely plain". Then I found a 3rd party driver (or the like)
>> in the dump's call stack. When I confront the customer about it, they
>> say "Oh but *that* driver doesn't matter, it never causes any
>> problems". Uh-huh ... so, what's it doing in the memory dump, then???
>> Basically I just don't trust anyone, anymore (sad ... but true).
>>
>> Let us know how it goes.
>>
>> --
>> Andrew McLaren
>> amclar (at) optusnet dot com dot au
>>
>>

>


<this is where the post goes>
 
"Michichael" <Michichael@discussions.microsoft.com> wrote ...
> Well we've narrowed it down to a driver issue. Here's where we stand:
> I've contacted Microsoft and they're going to help me fix it.


Outstanding. Thanks for the update.

BTW (and just out of curiosity, not questioning your methodological rigour)
were you running dxgkrnl.sys in conjunction with the Forceware driver from
the Nvidia website, or with the plain, Microsoft-supplied driver?

> At this point, I'm inclined to believe one of two issues: That DX10 is
> inherently unstable,


I'm hoping DX10 is not "inherently" unstable. Maybe it just needs a bit of
real-world buffing, to knock the last of the rough edges off. If Microsoft
need to go and redesign a whole graphics architecture, it'd be painful for
all of us.

Mind you ... in my world, computers are used for numerical analysis, complex
stochastic algorithms, and databases. Computers are *not* meant for playing
games, or looking at pictures! :-)

> DX10 runtime, and I should RMA it. I'll keep you folks posted on the
> resolution!


I'll be interested to hear what happens ...
--
Andrew McLaren
amclar (at) optusnet dot com dot au
 
I had to do a complete wipe and reinstall of Vista. I was running it with the
base drivers provided by Windows Update, no sound card, nothin'.

I will also note, it's still possible it's the combination of old RAM and
DX10 requiring extensive resources. The new modules will be installed today,
so we'll see.

Also, I noticed something odd. The 8800 GTX has 768 MB of GDDR3 RAM. However,
dxdiag reports 1499 MB.

XP 64-bit doesn't show this, though. Odd.... I remember that you can disable
the page file under XP, but if you have 4 GB of RAM, can you also disable it
in Vista? Or does Vista still require it for memory dumps?


"Andrew McLaren" wrote:

> "Michichael" <Michichael@discussions.microsoft.com> wrote ...
> > Well we've narrowed it down to a driver issue. Here's where we stand:
> > I've contacted Microsoft and they're going to help me fix it.

>
> Outstanding. Thanks for the update.
>
> BTW (and just out of curiosity, not questioning your methodological rigour)
> were you running dxgkrnl.sys in conjunction with the Forceware driver from
> the Nvidia website, or with the plain, Microsoft-supplied driver?
>
> > At this point, I'm inclined to believe one of two issues: That DX10 is
> > inherently unstable,

>
> I'm hoping DX10 is not "inherently" unstable. Maybe it just needs a bit of
> real-world buffing, to knock the last of the rough edges off. If Microsoft
> need to go and redesign a whole graphics architecture, it'd be painful for
> all of us.
>
> Mind you ... in my world, computers are used for numerical analysis, complex
> stochastic algorithms, and databases. Computers are *not* meant for playing
> games, or looking at pictures! :-)
>
> > DX10 runtime, and I should RMA it. I'll keep you folks posted on the
> > resolution!

>
> I'll be interested to hear what happens ...
> --
> Andrew McLaren
> amclar (at) optusnet dot com dot au
>
>
 
"Michichael" <Michichael@discussions.microsoft.com> wrote...
>I had to do a complete wipe/reinstall of Vista. I was running it from the
> base Windows Update provided drivers, no sound card, nothin.


Interesting. Hmmm. Okay, thanks - that's good to know.

> Also, I noticed something odd. The 8800 GTX has 768 MB of GDDR3 RAM.
> However, dxdiag reports 1499 MB.


Prior to the Windows Display Driver Model (WDDM) introduced in Vista, old
display drivers could generally only use the real, dedicated memory on the
graphics card. In WDDM however, graphics memory is virtualised, just like
main system memory. For main system memory, the virtual backing store is the
page file. For graphics memory, the virtual backing store is system memory.
So DXDiag is correctly reporting the memory available to the WDDM driver,
which is the sum of the real, semiconductor memory on the graphics card,
plus a portion of the global RAM available to the system. On XP, only the
dedicated graphics memory on the graphics card is available to the graphics
driver. On Vista, the graphics driver can use all the dedicated graphics
memory plus, if necessary, a slice of the total system memory sitting on the
motherboard. So DXDiag on Vista reports a higher figure. DXDiag is reporting
correctly, in both cases.
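To put numbers on that: if dxdiag's figure on Vista is the card's dedicated memory plus the slice of system RAM the WDDM driver may share, then the shared slice is simply the difference. A back-of-the-envelope sketch, assuming exactly that relationship; the function name is hypothetical, and how Windows decides the size of the shared slice is covered in the paper below, not computed here.

```python
def shared_system_memory_mb(dxdiag_total_mb, dedicated_mb):
    """Assumes dxdiag total = dedicated VRAM + shared system RAM (WDDM on Vista)."""
    shared = dxdiag_total_mb - dedicated_mb
    if shared < 0:
        raise ValueError("dxdiag total is smaller than the card's dedicated memory")
    return shared


# 8800 GTX: 768 MB on the card, 1499 MB reported by dxdiag.
print(shared_system_memory_mb(1499, 768))  # 731
```

So roughly 731 MB of the reported figure would be system RAM the driver is allowed to borrow, not extra memory on the card.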

Note that the WDDM driver does not naively "steal" this RAM from your
system. There are quite elaborate algorithms to ensure fair sharing of
memory, balanced with best overall system performance. You can read all
about it in this paper from Microsoft:
http://download.microsoft.com/download/9/c/5/9c5b2167-8017-4bae-9fde-d599bac8184a/GraphicsMemory.doc

> It does not show this under XP 64-bit, however. Odd.... I remember that
> you can disable the page file under the XP OS, but can you also, if you
> have 4G of RAM, disable it in Vista? Or does it still require it for
> memory dumping.


Well, whether to have a page file or not is really a different discussion,
not relevant to your graphics memory. Except that, I guess, if the system was
running a WDDM driver and was really under heavy load, it's possible that
both global system memory, *and* graphics memory, would end up swapping
pages of memory to disk. But the system doesn't page graphics memory to disk
under normal conditions - see the paper for details of the allocation
mechanisms used here. Not easily summarised in a few words.

I was fortunate to have the chance to discuss the page file question with
Microsoft guys, at a couple of conferences, etc ... I mean, technical guys
such as Lou Perazzoli, Bruce Worthington and Adrian Marinescu - not just the
marketing clowns with great hair and khaki chinos. I doubt I could fully
reproduce the subtlety of their thinking! But the conclusions were always
clear and unanimous. NT (including NT, 2000, XP, 2003, Vista and Server
2008) was designed from the ground up as a virtual memory operating system.
The page file isn't just an afterthought, to work around limited real
memory. For example, many of the copy-on-write algorithms used to load EXE
and DLL files when a process starts make intelligent use of the page
file, if it is there. They have performance-tuned the crap out of the page
replacement mechanisms for nearly 20 years now (starting in 1988). Despite
the urban myths which circulate in some PC enthusiast forums, NT does *not*
lose performance by naively swapping pages to disk unnecessarily. Even if
you have 4GB of RAM and 32-bit Windows, the NT kernel can find useful uses
for the page file. Even if your committed memory never even approaches the
limit of your semiconductor memory, NT can find good uses for the page file.
If you're really worried about disk space, drop the file size to 256MB or
the like, but do not remove the page file altogether. That seems to be the
consensus of the cognoscenti.

And as you note, a page file on the boot disk is required, if you want to
get a memory dump.

--
Andrew McLaren
amclar (at) optusnet dot com dot au
 
In an effort to keep y'all informed, I'm CC'ing my tech support logs here:

"Righto. Newegg dropped the ball: either my RAM was stolen off my front
porch, or they shipped it somewhere other than here. So my new RAM is dead
in the water right now, and I can't test that theory. At the moment, the
X-Fi card is sitting next to me; I've completely removed it from my system,
including the hidden device drivers left behind. X-Fi doesn't exist as far
as this system is concerned. After about an hour of testing, it crashed
again:

STOP 0x0000007E (0xFFFFFFFFC0000005, 0xFFFFF98004BDB81A, 0xFFFFF9800D941838,
0xFFFFF9800D941210)
dxgkrnl.sys FFFFF98004BDB81A base at FFFFF98004B22000

I'm now installing the DX10 SDK as directed, and will keep you informed. I
will note, however, that at the time of this crash I had experienced
artifacts and shearing in the game shortly before it completely failed.
Temperature 85*C. Ambient 31*C (sensor on the other side of the card,
sitting right on top of the circuit board). Will post again shortly."


--
Michichael
 
Well, the new SDK did not help. Crashed after about 10 min of play;
ambient temperature 30*C, GPU 81*C. Same error code.


--
Michichael
 
And here's the analysis:

> PAGE_FAULT_IN_NONPAGED_AREA (50)
> Invalid system memory was referenced. This cannot be protected by
> try-except,
> it must be protected by a Probe. Typically the address is just plain
> bad or it
> is pointing at freed memory.
> Arguments:
> Arg1: ffffb88002ef4400, memory referenced.
> Arg2: 0000000000000000, value 0 = read operation, 1 = write operation.
> Arg3: fffff980032c9997, If non-zero, the instruction address which
> referenced the bad memory
> address.
> Arg4: 0000000000000007, (reserved)
>
> Debugging Details:
> ------------------
>
> ***** Debugger could not find nt in module list, module list might be
> corrupt, error 0x80070057.
>
>
> READ_ADDRESS: unable to get nt!MmSpecialPoolStart
> unable to get nt!MmSpecialPoolEnd
> unable to get nt!MmPoolCodeStart
> unable to get nt!MmPoolCodeEnd
> ffffb88002ef4400
>
> FAULTING_IP:
> +fffff980032c9997
> fffff980`032c9997 ?? ???
>
> MM_INTERNAL_CODE: 7
>
> DEFAULT_BUCKET_ID: VISTA_DRIVER_FAULT
>
> BUGCHECK_STR: 0x50
>
> LAST_CONTROL_TRANSFER: from 0000000000000000 to fffff960000914ad
>
> STACK_TEXT:
> fffff980`209086d0 00000000`00000000 : 00000000`00000000
> 00000000`00000000 00000000`00000000 00000000`00000000 :
> 0xfffff960`000914ad
>
>
> STACK_COMMAND: kb
>
> SYMBOL_NAME: ANALYSIS_INCONCLUSIVE
>
> FOLLOWUP_NAME: MachineOwner
>
> MODULE_NAME: Unknown_Module
>
> IMAGE_NAME: Unknown_Image
>
> DEBUG_FLR_IMAGE_TIMESTAMP: 0
>
> BUCKET_ID: CORRUPT_MODULELIST
>
> Followup: MachineOwner


Now I'm no engineer, but maybe, just maybe, this is pointing to a
problem with my RAM?
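The Arguments block above can be read mechanically. A minimal sketch, assuming the text layout printed by `!analyze -v` here and the argument meanings the analysis itself states for PAGE_FAULT_IN_NONPAGED_AREA (0x50): Arg1 is the address referenced, Arg2 is 0 for a read or 1 for a write, Arg3 is the faulting instruction address. The function names are hypothetical, not part of any debugger API.

```python
import re

# "Arg1: ffffb88002ef4400, ..." -> (1, "ffffb88002ef4400"); x64 backticks tolerated.
ARG_RE = re.compile(r"Arg(\d):\s*([0-9a-fA-F`]+)")


def parse_bugcheck_args(text):
    """Pull Arg1..Arg4 out of !analyze -v output as integers."""
    return {int(num): int(value.replace("`", ""), 16) for num, value in ARG_RE.findall(text)}


def describe_0x50(args):
    """Label 0x50 arguments the way the analysis text describes them."""
    return {
        "address_referenced": hex(args[1]),
        "operation": {0: "read", 1: "write"}.get(args[2], f"other ({args[2]})"),
        "faulting_instruction": hex(args[3]) if args[3] else None,
    }


SAMPLE_ARGS = """Arg1: ffffb88002ef4400, memory referenced.
Arg2: 0000000000000000, value 0 = read operation, 1 = write operation.
Arg3: fffff980032c9997, If non-zero, the instruction address which
Arg4: 0000000000000007, (reserved)"""

print(describe_0x50(parse_bugcheck_args(SAMPLE_ARGS)))
```

On its own this only restates the dump: a read from an invalid kernel address. It can't say whether bad RAM or a bad driver put that address there.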


--
Michichael
 
>> ***** Debugger could not find nt in module list, module list might be
>> corrupt, error 0x80070057.
>> STACK_TEXT:
>> fffff980`209086d0 00000000`00000000 : 00000000`00000000
>> 00000000`00000000 00000000`00000000 00000000`00000000 :
>> 0xfffff960`000914ad
>> SYMBOL_NAME: ANALYSIS_INCONCLUSIVE
>> BUCKET_ID: CORRUPT_MODULELIST

>
> Now I'm no engineer, but maybe, just maybe, this is pointing to a
> problem with my RAM?


"Module", as used here, refers to a loadable module: an EXE, DLL, SYS
etc. file. The stack in the dump is completely trashed, so either there was
stack corruption, or the memory hardware didn't preserve the right data and
it evaporated.

You'd probably need to try walking the stack in the dump to see if it looked
like it had been corrupted by something (i.e. something overwrote the stack
area with data, apparently all 0s). That is certainly feasible for an
experienced debugger user, but far beyond what can be achieved via newsgroup
support.
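To see what "apparently all 0s" means in the STACK_TEXT above, here is a crude heuristic, not a substitute for walking the stack in a real debugger. It assumes the single-frame x64 `kb` layout shown in the dump (Child-SP, RetAddr, four arguments, call site printed as a raw address) and flags the frame as zero-filled when everything between the stack pointer and the call site is 0. The function names are hypothetical.

```python
import re

# Matches x64 debugger words such as fffff980`209086d0, with an optional 0x prefix.
WORD_RE = re.compile(r"\b(?:0x)?([0-9a-fA-F]{8})`([0-9a-fA-F]{8})\b")


def stack_words(stack_text):
    """Return every 64-bit value in a STACK_TEXT block, in order."""
    return [int(hi + lo, 16) for hi, lo in WORD_RE.findall(stack_text)]


def frame_is_zeroed(stack_text):
    """True if RetAddr and all args are zero, i.e. only Child-SP and call site survive."""
    words = stack_words(stack_text)
    middle = words[1:-1]  # drop Child-SP (first) and the raw call-site address (last)
    return bool(middle) and all(w == 0 for w in middle)


STACK_TEXT_SAMPLE = """fffff980`209086d0 00000000`00000000 : 00000000`00000000
00000000`00000000 00000000`00000000 00000000`00000000 :
0xfffff960`000914ad"""

print(frame_is_zeroed(STACK_TEXT_SAMPLE))
```

A zeroed return address is also why the debugger reports `LAST_CONTROL_TRANSFER: from 0000000000000000` and gives up with ANALYSIS_INCONCLUSIVE.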

You can give your RAM a pretty good workout by using the built-in memory
test. Boot from the Vista DVD, go to the repair options and choose the memory
test. Somewhere in the memory test dialog, there's an "Advanced" option where
you can crank up the number of passes to pretty punishing levels. That might
help isolate any hardware issues. Of course, there are also many 3rd party
memory test tools.

Apart from that ... it's probably up to the Microsoft guys to study your
dump, now.

--
Andrew McLaren
amclar (at) optusnet dot com dot au
 
Yeah, after reading through it after another crash, I noticed it
wasn't exactly working. I'm really starting to think this is either heat- or
RAM-related now, because with the X-Fi card OUT, my video card's
operating temperature jumped 10*C (it's idling at 65-70*C, load 80-85*C)
and I'm seeing more frequent crashes. I completely uninstalled
everything to do with the card, as well as many of the other
"Unconnected" devices in Device Manager, just to rule them out.
I'm getting concerned, though, that it's citing USBPORT.SYS
as a cause now too.


--
Michichael
 
"Michichael" <Michichael.2xir78@no-mx.forums.net> wrote in message
news:Michichael.2xir78@no-mx.forums.net...
>
> Yeah, after reading through it after another crash I noticed it
> wasn't exactly working. I'm really starting to think this is either heat
> or RAM related now, because with the X-Fi card OUT, my video card's
> operating temperature jumped 10*C (It's idling at 65-70*C, load 80-85*C)
> and I'm seeing more frequent crashes. I completely uninstalled
> everything to do with the card, as well as many of the other
> "Unconnected" devices in the device manager, just to rule them out.
> Though I'm getting concerned with the fact that it's citing USBPORt.sys
> as a cause now too.


Have you checked that the fans are the correct way around?
I ask this because I was evaluating a very expensive server once and the
disks were getting >55C, which is just too hot.
They sent the engineers out, and eventually the designers, as it was a brand
new design.
I was poking around inside with a temp probe while they were there and
noticed one of the fans was backwards. The result: not a lot of airflow, even
though there were no fan alarms. It turns out that they were building them
all incorrectly, so it's just as well it was found early. It pays to check the
simple stuff first.
 
Yup. The two 250mm fans are blowing cool air straight into the case from
the side, and I have a 120mm fan in the front as an exhaust. I'm going to
flip it and make it an intake, so all exhaust would go through the
passive vents on the sides, as well as out the back of the case.


--
Michichael
 
"Michichael" <Michichael.2xjfi6@no-mx.forums.net> wrote in message
news:Michichael.2xjfi6@no-mx.forums.net...
>
> Yup. The two 250mm fans are blowing cool air straight into the case from
> the side, I have a 120mm fan in the front as an exhaust, I'm going to
> flip it and make it an intake, so all exhaust would go through the
> passive vents on the sides, as well as out the back of the case.


You have to be careful if you have exhaust and inlet fans to make sure the
air just doesn't flow straight in one and out the other by the shortest
path.
 
Alrighty, this issue has been resolved.

It was the RAM. The X-Fi card is installed and working just fine, and
everything is functional. Just finished Halo 2!

Video card peaks at 90*C operating temperature. No artifacts or errors.

Stats:

E6850 Core 2 Duo - 3.0 GHz
Peltier Cooler Processor Heatsink - 39*C
Windtunnel Case - CLIO
Creative Fatal1ty X-Fi card
2x2GB Mushkin DDR2 800 4-3-3-10 @ 2.2 V
WD Raptor 150 GB HDD
Maxtor 500 GB HDD
Ageia PhysX Card
Thermaltake 850W SLI PSU
XFX NVIDIA 680i SLI Motherboard
XFX NVIDIA 8800 GTX, 768MB GDDR3 Video Card
Vista 64-bit Ultimate (Mostly Working)

Thanks, everyone, for your help.


--
Michichael
 