Josh’s Thoughts on P100, 16FF+, and Pricing

So, we finally get to see big Pascal.  Frankly, it is a bigger chip than I was expecting for the first round of TSMC’s 16FF+ process.  At 610 mm², it is approaching the maximum size, not only for this particular process, but also for fitting on an interposer alongside four HBM2 stacks.  The maximum interposer size is around 830 mm², and a 4 GB HBM2 stack has a slightly larger footprint than the 1 GB HBM1 stacks we saw on AMD’s Fury lineup.  It is going to be a tight fit on that interposer, and integrating all of these chips will push the physical tolerances of this technology.
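To put those numbers together, here is a quick back-of-the-envelope sketch (in Python) of the area budget, using only the figures above and ignoring packing geometry, keep-out zones, and routing; it simply asks how much interposer area is left per HBM2 stack once the GPU die is placed.

```python
# Rough interposer area budget using the figures quoted above.
GPU_DIE_MM2 = 610          # GP100 die area
INTERPOSER_MM2 = 830       # approximate maximum interposer size
NUM_HBM2_STACKS = 4

remaining = INTERPOSER_MM2 - GPU_DIE_MM2
per_stack = remaining / NUM_HBM2_STACKS
print(f"{remaining} mm^2 left for memory, ~{per_stack:.0f} mm^2 per HBM2 stack")
# Roughly 55 mm^2 per stack is a very small budget, which is why the fit is so tight.
```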

The 5 miracles of St. Jen-Hsun…

TSMC has not been offering this particular process node for very long, though a few chips have already used it.  The P100 is one of NVIDIA’s first products on the node, and with the rest of the Pascal lineup arriving this summer, it will not be the only one for long.  Those other Pascal chips will be smaller and easier to fabricate than this 610 mm² monster.  I cannot even begin to imagine what the yields will be like on this chip, as it combines a new process, a new architecture, and NVIDIA’s first push into both 16 nm and FinFETs.  It would not shock me to hear of yields below 20% per wafer.  NVIDIA does mitigate this by disabling portions of the P100 chip to meet yield and bin targets, though.
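For a sense of how quickly yield falls off with die size, here is a minimal sketch using a simple Poisson die-yield model; the defect density is purely my own assumption for an early 16FF+ ramp, since TSMC publishes no such figure.

```python
import math

# Poisson die-yield model: Y = exp(-A * D0)
DIE_AREA_CM2 = 6.10     # 610 mm^2 expressed in cm^2
DEFECT_DENSITY = 0.30   # assumed defects per cm^2 (early-ramp guess, not a TSMC figure)

yield_estimate = math.exp(-DIE_AREA_CM2 * DEFECT_DENSITY)
print(f"Estimated fraction of fully working dies: {yield_estimate:.1%}")  # ~16%
# Salvaging partially defective dies by disabling SMs is what pulls usable yield back up.
```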

The other Pascal parts will likely use GDDR5/GDDR5X as their memory technology.  The reasoning behind this is both cost and the availability of HBM2.  Samsung is already producing HBM2, but we do not know exactly how much is flowing out of those fabs.  SK Hynix, AMD’s original partner in developing HBM, will start production in Q3 of this year.  HBM2 is going to be fairly scarce until that second supplier is up and running.  Initially we will see HBM2 reach 16 GB on high-end cards, but it can go to 32 GB in the future.  For the time being, the P100 looks to be the only GPU from NVIDIA that will embrace HBM2, at least until, perhaps, Q4 of this year?
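For reference, the 16 GB and 32 GB figures fall straight out of the stack configuration, assuming four stacks built from 8 Gb (1 GB) HBM2 DRAM dies:

```python
# Capacity per card as a function of stack height (4-Hi stacks today, 8-Hi later).
STACKS_PER_CARD = 4
GB_PER_DIE = 1                      # one 8 Gb HBM2 DRAM die

for dies_per_stack in (4, 8):
    total_gb = STACKS_PER_CARD * dies_per_stack * GB_PER_DIE
    print(f"{dies_per_stack}-Hi stacks: {total_gb} GB per card")
# 4-Hi: 16 GB, 8-Hi: 32 GB
```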

What does this mean for consumers?  It means that, while we get a glimpse of this marvel, consumers will not be getting their hands on it anytime soon.  Yields are, again, likely very poor on this chip, but it is aimed at a market with tremendous margins.  The DGX-1 is the “Deep Learning” machine that NVIDIA will be selling later this summer.  It contains two Xeon processors, a bunch of SSDs, and eight P100-based devices.  The cost of the DGX-1?  $129,000 US.  That is a lot of money.  My guess is that the components outside of the P100 cards come in at around $9,000, which leaves $120,000 to be spread across those eight P100 cards, or a clean $15,000 per card.  When we consider that a fully processed bulk wafer of GPUs runs between $5,000 and $10,000 US, depending on the process and steps needed, the margin on each card is going to be tremendous.  Who cares about yields if you have a product that companies and universities are willing to spend that kind of cash on?
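The per-card math is simple enough to show explicitly; keep in mind that the $9,000 figure for the non-GPU components is my own guess, not a published bill of materials.

```python
# Implied per-card price inside the DGX-1, based on the estimates above.
DGX1_PRICE_USD = 129_000
NON_GPU_ESTIMATE_USD = 9_000    # guessed cost of Xeons, SSDs, chassis, etc.
NUM_P100_CARDS = 8

per_card = (DGX1_PRICE_USD - NON_GPU_ESTIMATE_USD) / NUM_P100_CARDS
print(f"Implied price per P100 card: ${per_card:,.0f}")  # $15,000
# Even at $5,000-$10,000 per fully processed wafer and poor yields,
# that per-card price leaves room for enormous margins.
```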

This is most definitely not a consumer-level card.  Perhaps in early 2017, when NVIDIA’s HPC partners release their own servers based on P100, we will finally see consumer-grade versions of this hardware in the $1,000 to $1,500 range that we see today with Titan products.  The other Pascal products will be picking up the slack in the meantime, and, considering that AMD has not shown or hinted at a large GPU like P100, NVIDIA will have plenty of smaller parts with which to compete.

TSMC is pushing out 16FF+ wafers as fast as they can, but yields are another story altogether with such a large, complex die.

The specifications of this chip are very, very impressive: 15+ billion transistors, 15 MB of register space, 4 MB of L2 cache, high-speed interconnects throughout the design, and NVLink to communicate with other GPUs.  FP64 received much-needed attention, and the compute throughput at FP16, FP32, and FP64 is amazing for a single-chip solution.  This is a compute monster that is very much aimed at the high-margin market of HPC installations.
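Those throughput figures follow directly from the announced configuration; a quick sketch using the 3,584 enabled FP32 cores (56 SMs) and the 1480 MHz boost clock, with FP64 at half rate and FP16 at double rate as NVIDIA described:

```python
# Peak throughput from the announced P100 configuration.
CUDA_CORES = 3584     # 56 SMs x 64 FP32 cores
BOOST_GHZ = 1.480

fp32_tflops = 2 * CUDA_CORES * BOOST_GHZ / 1000   # FMA counts as two operations
print(f"FP32: {fp32_tflops:.1f} TFLOPS")          # ~10.6
print(f"FP16: {fp32_tflops * 2:.1f} TFLOPS")      # ~21.2 (packed half precision at 2x rate)
print(f"FP64: {fp32_tflops / 2:.1f} TFLOPS")      # ~5.3  (half rate)
```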

I have no idea how many of these cards NVIDIA will sell at this price, but their research has likely pinpointed exactly how many they expect to move.  If they can move several hundred units, the chip will pay for itself.  They will also retain market share in the face of AMD’s Radeon Technologies Group and its solid products.

AMD's Fury may have been the first GPU to feature an interposer and HBM memory, but the P100 will quickly outclass that product. If high-end enthusiasts can even get hold of it…

Perhaps the most interesting aspect of this design and the process technology is that it can clock up to 1320 MHz and Boost up to 1480 MHz.  A 15 billion transistor chip that is 610 mm², built on a new process, and hitting these numbers, all while retaining a 300 watt TDP, bodes well for NVIDIA’s smaller chips being fabricated at TSMC.  Expect higher CUDA core counts on these smaller parts compared to the current GTX 960s, 970s, and 980s, but with much higher clock speeds that push performance well beyond what we have today, even with the GTX 980 Ti; a rough sketch of that cores-versus-clock tradeoff is below.  P100 looks to be a fascinating product, but consumers will have to look on from afar for the time being.
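As an illustration of that tradeoff, peak FP32 throughput scales with cores times clock.  The GTX 980 Ti numbers below are its stock specifications; the smaller Pascal configuration is purely hypothetical on my part, not a leaked spec.

```python
# Peak FP32 throughput: 2 ops (FMA) x cores x clock.
def peak_tflops(cores: int, ghz: float) -> float:
    return 2 * cores * ghz / 1000

print(f"GTX 980 Ti (2816 cores @ ~1.075 GHz): {peak_tflops(2816, 1.075):.1f} TFLOPS")
# Hypothetical smaller Pascal part: fewer cores than a 980 Ti, but much higher clocks.
print(f"Hypothetical Pascal (2560 cores @ 1.5 GHz): {peak_tflops(2560, 1.5):.1f} TFLOPS")
```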
