GP100, the “Big Pascal” chip that was announced at GTC, will be coming to PCIe for enterprise and supercomputer customers in Q4 2016. Previously, it had only been announced with NVIDIA's proprietary NVLink interconnect. In fact, they also gave themselves some lead time with their first-party DGX-1 system, which retails for $129,000 USD, although we expect that was more for yield reasons. Josh calculated that each GPU in that system is worth more than the full wafer that its die was manufactured on.
This brings us to the PCIe versions. Interestingly, they have been down-binned from the NVLink version. The boost clock has been dropped to 1300 MHz, from 1480 MHz, although that is matched with a slightly lower TDP (250W versus the NVLink version's 300W). This lowers FP16 performance to 18.7 TFLOPS, down from 21.2; FP32 performance to 9.3 TFLOPS, down from 10.6; and FP64 performance to 4.7 TFLOPS, down from 5.3. This is where we get to the question: did NVIDIA reduce the clocks to hit a 250W TDP and stay compatible with the passive cooling that previous Tesla cards use, or were the clocks dropped to increase yield?
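As a rough sanity check on those figures, the peak throughput can be estimated from the boost clock alone. The sketch below assumes details the numbers above do not spell out: 3584 FP32 CUDA cores on the Tesla P100, one fused multiply-add (two FLOPs) per core per clock, and FP16 and FP64 running at 2x and 1/2 the FP32 rate, respectively. The function name `peak_tflops` is just for illustration.

```python
# Assumptions (not taken from the figures above): Tesla P100 exposes 3584 FP32
# CUDA cores, and each core retires one fused multiply-add (2 FLOPs) per clock.
FP32_CORES = 3584
FLOPS_PER_CORE_PER_CLOCK = 2

# Assumed throughput relative to FP32 on GP100: FP16 at 2x, FP64 at 1/2.
RATE = {"FP16": 2.0, "FP32": 1.0, "FP64": 0.5}


def peak_tflops(boost_mhz: float, precision: str) -> float:
    """Theoretical peak throughput in TFLOPS at the given boost clock."""
    flops = FP32_CORES * FLOPS_PER_CORE_PER_CLOCK * boost_mhz * 1e6 * RATE[precision]
    return flops / 1e12


if __name__ == "__main__":
    for name, mhz in (("NVLink", 1480), ("PCIe", 1300)):
        row = ", ".join(f"{p} {peak_tflops(mhz, p):.2f}" for p in RATE)
        print(f"{name} @ {mhz} MHz: {row}")
    # Every precision scales by the same factor: the ratio of the boost clocks.
    print(f"PCIe / NVLink clock ratio: {1300 / 1480:.3f}")
```

Under these assumptions, the estimates land within about 0.1 TFLOPS of the quoted figures, which suggests the whole performance drop comes from the roughly 12% lower boost clock rather than from any disabled compute units.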
They are also providing a 12GB version of the PCIe Tesla P100. I didn't realize that GPU vendors could selectively disable HBM2 stacks, but NVIDIA disabled 4GB of memory, which also dropped the bus width to 3072-bit. You would expect a design this regular to prefer dividing work in powers of two but, now that we know they can disable a stack, it makes me wonder why they did. Again, my first reaction is to question GP100 yield, but you wouldn't think that the HBM2 interface, being such a small part of the die, would let them reclaim many chips by disabling a chunk of it, right? That is, unless the HBM2 stacks themselves have yield issues, which would be interesting.
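The 12GB and 3072-bit figures line up if one whole stack is switched off. Here is a minimal sketch of that arithmetic, assuming the full card carries four HBM2 stacks of 4GB each and that every stack has a 1024-bit interface (the per-stack width comes from the HBM2 standard, not from NVIDIA's announcement). The `memory_config` helper is hypothetical.

```python
# Assumptions: a fully enabled card has four HBM2 stacks, each contributing
# 4 GB of capacity and a 1024-bit slice of the memory bus.
BITS_PER_STACK = 1024
GB_PER_STACK = 4
TOTAL_STACKS = 4


def memory_config(active_stacks: int, gb_per_stack: int = GB_PER_STACK) -> tuple[int, int]:
    """Return (capacity in GB, bus width in bits) for a number of active stacks."""
    if not 0 <= active_stacks <= TOTAL_STACKS:
        raise ValueError(f"active_stacks must be between 0 and {TOTAL_STACKS}")
    return active_stacks * gb_per_stack, active_stacks * BITS_PER_STACK


if __name__ == "__main__":
    for stacks in (TOTAL_STACKS, TOTAL_STACKS - 1):
        capacity_gb, bus_bits = memory_config(stacks)
        print(f"{stacks} stacks: {capacity_gb} GB, {bus_bits}-bit bus")
```

Because capacity and bus width both scale with the stack count, disabling one of four stacks removes a quarter of each, which is why the 12GB card cannot keep a power-of-two bus width.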
There is also still no word on a 32GB version. Samsung claimed the memory technology, 8GB stacks of HBM2, would be ready for products in Q4 2016 or early 2017. We'll need to wait and see where, when, and why it will appear.