Pat holding up the Larrabee wafer. The dies, even from this far away, are large and quite visible. Photo courtesy of Hardware.FR.
There are a few things that we have been bandying back and forth among the staff of PCPer, and I thought this might be a good place to go over them. I am betting that Larrabee is a fully custom part which should run at relatively high clockspeeds. Since it is composed of one fixed function portion (the texture unit), the ringbus architecture to transmit data, and a handful of fully custom X86 cores each attached to its own specialized vector unit, custom layout in this case is much simpler than if someone attempted a fully custom design with non-repeating parts. If we assume that the Larrabee group has produced an X86 compliant core along the lines of the Atom, it would be around 25 square mm on the 45 nm process. If we further theorize that the vector unit attached to each processor is around the same size (since it is a specialized vector unit, it would not necessarily have to be large to produce some strong numbers), then that gives approximately 50 square mm for each functional unit. If the die is around 600 square mm in total, then there would be room for a grand total of 12 cores in the first generation Larrabee product. Now, if the vector portion were slightly larger, then we would obviously have to cut that number down to 8 or 10 units.
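The back-of-the-envelope math above can be sketched in a few lines of Python. This is only an illustration of the guesswork, not anything from Intel: the function name `estimate_core_count` is made up for this example, the 25 and 600 square mm figures are the estimates from the paragraph above, and the 35 and 50 square mm vector unit sizes are assumed values picked to show how the "8 or 10 units" cases could come about. It also spends the entire die on cores and vector units, ignoring the texture unit, ringbus, and caches, so treat the results as upper bounds.

```python
def estimate_core_count(die_area_mm2, core_area_mm2, vector_area_mm2):
    """Estimate how many core + vector unit pairs fit on a die.

    Assumes the whole die is spent on these pairs, so the result is
    an upper bound rather than a realistic floorplan.
    """
    unit_area_mm2 = core_area_mm2 + vector_area_mm2
    return die_area_mm2 // unit_area_mm2


DIE_AREA_MM2 = 600   # guessed total die size from the article
CORE_AREA_MM2 = 25   # Atom-like X86 core on 45 nm, per the article

# Vector unit the same size as the core (the article's base case),
# then two larger, purely hypothetical vector unit sizes.
for vector_area_mm2 in (25, 35, 50):
    cores = estimate_core_count(DIE_AREA_MM2, CORE_AREA_MM2, vector_area_mm2)
    print(f"vector unit {vector_area_mm2} sq mm -> {cores} cores")
```

With these inputs the base case works out to 12 cores, and the two larger vector unit guesses give 10 and 8.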
Power will always be an issue with ICs: the larger the die and the more transistors it holds, the higher the power consumption and heat production of a part. While using a fully custom design will mitigate some of the unattractive side effects of integrated circuit design, it is not a panacea. Fully custom takes a lot of time to do right, and it is still very manpower intensive vs. the standard cell place and route that has been commonplace until recently. So, while Intel could do fully custom, and probably hit between 1.5 and 2 GHz with such a design, their overall processing power could be low compared to current and upcoming solutions from AMD and NVIDIA. The reason for this is that so much of the transistor budget has been used by the X86 cores (which are poor units when doing pure rendering) as well as by extensive L1 and L2 caches.
The fear that many of us hold is that the first Larrabee product will match up against the midrange products from AMD and NVIDIA in terms of gaming performance. Utilizing a 600 square mm die for a product that will compete against products in the $99 to $159 price range could be a rather… expensive endeavor for Intel. While Intel will be dipping its toe into the gaming market, we feel that they will probably be pushing their Larrabee parts more into the HPC market, where having a dozen X86 cores attached to programmable vector units all in one (relatively) small package pulling some 200 watts is considered “a damn good product”. It is highly doubtful that Intel will make a splash in the gaming market, but their Larrabee architecture could be a very troublesome entry for competitors in the GPGPU marketplace, which is only now starting to really expand.
It is good that we have proof that Larrabee is in fact out there, and that Intel is working on physical prototypes as we speak. There is a lot of compiler work to be done on Intel’s side to make it an efficient architecture when competing with more traditional designs, whose fast fixed function units still do a goodly percentage of the work required by rendering. But the physical part most definitely exists, and we expect to see it introduced in late 2009.