The end of overclocking?

NVIDIA had an issue on their hands with their flagship 7900 GTX retail cards consistently freezing and locking — we finally have the answer and NVIDIA’s response.

If you happened to have purchased one of the first 7900 GTX video cards that came overclocked out of the box, or if you have simply seen the many threads in different forums such as this one in our forum, you might have heard about some random freezing and lockup issues.

Through our own testing, and with the help of many users who contributed additional information, we confirmed the freeze-up issue.  Basically, a game or benchmark would ‘freeze’ for anywhere between 10 and 60 seconds, then ‘unfreeze’ and return to normal playing conditions.  It was an odd issue that I hadn’t really seen before, and we spent time with new drivers, new power supplies and new motherboards trying to fix it.

The only thing that seemed to work, though, was lowering the clocks back down to the reference speeds NVIDIA announced: 650 MHz core, 800 MHz memory and 700 MHz vertex clock.  Yes, in this case the vertex shaders are running slightly faster than the majority of the GPU, including the pixel shaders.  It turns out that overclocking was one of the main instigators of this problem.

Well, today NVIDIA officially got back to me with an answer to all my inquiries on the subject.  The short of it is: the GPUs on the cards that were having problems didn’t have the headroom necessary for the vendors’ overclocked specifications.  The two most prominent problem areas were the vertex clock (which, remember, runs 50 MHz faster than the pixel clock) and the memory clock.  These GPU subsystems were running far enough out of spec that the chips were having physical issues with stability, causing the random ‘freezes’ we were seeing.  NVIDIA didn’t specify whether the memory chips or the memory subsystem on the GPU were the culprit for the memory problems, but my instincts tell me that the memory subsystem is to blame, just as the vertex shading subsystem is.


NVIDIA’s 7900 GTX Reference Card

It would seem that G71 in its initial revision doesn’t quite have the headroom that G70 had at its launch.  Vendors went ahead with overclocks on their cards and sent them out the door for sale.  I know for a fact that NVIDIA didn’t give the vendors very much time at all to get boards ready for launch day, and with all of NVIDIA’s insistence on ‘hard launches’ the pressure must have been extreme from the green machine to get parts out, and get them out NOW. 

Interestingly, these problems were seen across the whole range of factory overclocks, from cards that were overclocked by as much as 40 MHz on the core and 80 MHz on the memory down to cards that were only overclocked by 20 MHz on each clock.  Obviously the architecture, or the yield of the parts that were being used, has some strict frequency restrictions.
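To put those overclocks in perspective, here is a quick back-of-the-envelope sketch in Python of how far over reference each case sits.  The reference clocks are the ones NVIDIA announced; the function name and the way the cases are laid out are purely my own illustration, not anything from NVIDIA or the vendors.

```python
# Reference clocks NVIDIA announced for the 7900 GTX, in MHz.
REFERENCE_MHZ = {"core": 650, "memory": 800}


def overclock_percent(domain, delta_mhz):
    """Return a factory overclock as a percentage of the reference clock."""
    return 100.0 * delta_mhz / REFERENCE_MHZ[domain]


# The two ends of the range of problem cards described above
# (labels are hypothetical, the MHz deltas are from the article).
cases = {
    "aggressive": {"core": 40, "memory": 80},
    "modest": {"core": 20, "memory": 20},
}

for name, deltas in cases.items():
    for domain, delta in deltas.items():
        pct = overclock_percent(domain, delta)
        print(f"{name:10s} {domain:6s} +{delta} MHz = {pct:.1f}% over reference")
```

Even the modest cards work out to only about 3% over reference on the core, which gives a sense of how thin the margins on these early chips appear to be.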

NVIDIA was adamant in telling us that all the board vendors are going to fix any cards in the hands of end users that show these problems.  If you have one such card, contact the manufacturer and get your support process started ASAP.  The vendors, and NVIDIA, are being much stricter at the fab where G71 is being manufactured, and NVIDIA tells me the new chips going out will be able to run at the speeds the vendors will sell at.  That might mean lower clock speeds from vendors (back to stock?  Gasp!) or maybe only higher yielding chips will be selected for those cards.

The biggest concern here is the QA programs of the add-in card vendors.  There isn’t just one vendor that had issues: we saw cards from EVGA, BFG and XFX exhibit these same problems on our testing bench and in end user reports.  Obviously, these are stock-overclocked cards, so the vendors SHOULD have been testing these cards for stability at the overclocked speeds.  But that didn’t happen, at least not to the standards that we expect of them.  If I can see these ‘freezes’ on my first runs of 3DMark06 that I use for basic stability testing, and end users see it on their computers immediately after purchase, something in the quality assurance process stinks.

UPDATE (4/10/06 @ 6:10pm): After speaking with a representative of BFG Technologies, they are telling me that even though there have been a handful of reports of this problem on their 7900 GTX OC models, the number is no more than is usual for a new SKU released into the market.  It is quite possible that since BFG’s clock rates were more modest than other AICs, there will be fewer instances of this issue on their product.  Something to definitely keep in mind.

UPDATE (4/11/06 @ 9:10pm): XFX just contacted me today to also report that their support rates are no different with the 7900 GTX products than with any other product launch.  From the forum posts I have seen, the EVGA SuperClocked cards are the most common problem cards.

UPDATE (4/11/06 @ 9:20pm): More information is coming in here that is quite interesting.  It would seem that SOME of the vendors that distributed overclocked cards decided to manually edit their BIOS so that the vertex engine was NOT clocked 50 MHz faster than the pixel engine.  While this would change performance numbers just a smidge, with the main instigator of the “freezing” issue being the vertex clock, the change is important.  Even more interesting is that no one really told us this before; they didn’t want to say they had to downclock a portion of the card in order to overclock it.  My guess is you’ll be seeing all the OC’d cards from all the AICs resorting to this fix in the near future.
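To make the effect of that BIOS edit concrete, here is a small Python sketch.  It assumes the stock 50 MHz vertex offset simply carries over when a vendor raises the core clock, which is my reading of the situation and not something NVIDIA has spelled out.  The 690 MHz core is the reference 650 MHz plus the 40 MHz overclock mentioned earlier, and the function name is made up for illustration.

```python
# At reference, the vertex clock runs 50 MHz above the 650 MHz core clock.
STOCK_VERTEX_OFFSET_MHZ = 50
REFERENCE_CORE_MHZ = 650


def vertex_clock(core_mhz, keep_offset=True):
    """Vertex clock for a given core clock, with or without the stock offset.

    Assumption: the offset is a fixed +50 MHz on top of whatever core
    clock the card's BIOS sets.
    """
    return core_mhz + (STOCK_VERTEX_OFFSET_MHZ if keep_offset else 0)


reference_vertex = vertex_clock(REFERENCE_CORE_MHZ)        # stock card
oc_core = REFERENCE_CORE_MHZ + 40                          # +40 MHz factory overclock
oc_vertex_default = vertex_clock(oc_core)                  # offset left in place
oc_vertex_edited = vertex_clock(oc_core, keep_offset=False)  # offset removed in BIOS

print(f"reference vertex clock:        {reference_vertex} MHz")
print(f"overclocked, offset kept:      {oc_vertex_default} MHz")
print(f"overclocked, offset removed:   {oc_vertex_edited} MHz")
```

Under that assumption, a card with the offset removed ends up with its vertex engine running below the 700 MHz reference vertex clock even though the core is overclocked, which would explain why the vendors weren’t eager to advertise the change.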

I have several 7900 GTX GPUs here with testing completed or nearly completed and reviews ready, but they have been held back due to this issue.  I was not going to review and recommend a product that had these kinds of problems.  Expect those articles soon now that the issue seems to be resolved.

The official quote from NVIDIA:

In working with our board partners we discovered the cause of the random slowdown and temporary lockup problems experienced by various users of certain overclocked 7900 GTX graphics boards. Essentially, it was a case where the core and/or memory clocks were driven a bit too high, and the overclocking margins weren’t available on those specific boards. For users who experience the problems, the graphics card vendors will work with those individual users to fix the problem. We have learned from our board partners that the situation is now under control.

I surely hope that all of NVIDIA’s and ATI’s AIC partners take a good, hard look at this situation and remedy their QA testing methods so this doesn’t happen again.  Sending bad product out to thousands of gamers is an easy way to screw up the vendors’ reputations, and NVIDIA’s.  Maybe ATI’s previous ‘no overclocking’ stance made the most sense after all.  And maybe getting the product completely 100% right is better than getting it out 100% on time.

To share your thoughts on this issue, or if you have a retail 7900 GTX and want to share your results, head here to our forum thread on the subject.
