Intel’s Architecture Day was held yesterday and brought announcements of three new technologies. Intel shared details of a new 3D stacking technology for logic chips, a brand new CPU architecture for desktop and server, and some surprising developments on the iGPU front. Oh, and they mentioned that whole discrete GPU thing…
3D Stacking for Logic Chips
First we have Foveros, a new 3D packaging technology that follows Intel’s previous EMIB (Embedded Multi-die Interconnect Bridge) 2D packaging technology and enables die-stacking of high-performance logic chips for the first time.
“Foveros paves the way for devices and systems combining high-performance, high-density and low-power silicon process technologies. Foveros is expected to extend die stacking beyond traditional passive interposers and stacked memory to high-performance logic, such as CPU, graphics and AI processors for the first time.”
Foveros will allow for a new “chiplet” paradigm, as “I/O, SRAM, and power delivery circuits can be fabricated in a base die and high-performance logic chiplets are stacked on top”. This new approach would permit design elements to be “mixed and matched”, and allow new device form-factors to be realized as products can be broken up into these smaller chiplets.
The first range of products using this technology are expected to launch in the second half of 2019, beginning with a product that Intel states “will combine a high-performance 10nm compute-stacked chiplet with a low-power 22FFL base die,” which Intel says “will enable the combination of world-class performance and power efficiency in a small form factor”.
Intel Sunny Cove Processors – Coming Late 2019
Next up is the announcement of a brand new CPU architecture with Sunny Cove, which will be the basis of Intel’s next generation Core and Xeon processors in 2019. No mention of 10nm was made, so it is unclear if Intel’s planned transition from 14nm is happening with this launch (the last Xeon roadmap showed a 10 nm transition with "Ice Lake" in 2020).
Intel states that Sunny Cove is “designed to increase performance per clock and power efficiency for general purpose computing tasks” with new features included “to accelerate special purpose computing tasks like AI and cryptography”.
Intel provided this list of Sunny Cove’s features:
- Enhanced microarchitecture to execute more operations in parallel.
- New algorithms to reduce latency.
- Increased size of key buffers and caches to optimize data-centric workloads.
- Architectural extensions for specific use cases and algorithms. For example, new performance-boosting instructions for cryptography, such as vector AES and SHA-NI, and other critical use cases like compression and decompression.
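Instructions like SHA-NI accelerate the SHA-1/SHA-256 round functions in hardware, and libraries such as OpenSSL (which backs Python’s hashlib) dispatch to them transparently when the CPU reports support. As a quick illustration of the primitive being accelerated (standard-library code, not anything Intel showed):

```python
import hashlib

# SHA-256 of the classic "abc" test vector from FIPS 180-4.
# On CPUs with SHA-NI, an OpenSSL-backed hashlib runs these rounds with
# the dedicated instructions; the digest is identical either way.
digest = hashlib.sha256(b"abc").hexdigest()
print(digest)
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```

The speedup from such extensions is invisible at the API level; the same code simply runs faster on silicon that implements the rounds natively.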
Integrated Graphics with 2x Performance
Intel slide image via ComputerBase
Intel did reveal next-gen graphics, though it was a new generation of the company’s integrated graphics announced at the event. The update is nonetheless significant, with the upcoming Gen11 integrated GPU “expected to double the computing performance-per-clock compared to Intel Gen9 graphics” thanks to a huge increase in Execution Units, from 24 EUs with Gen9 to 64 EUs with Gen11. This will provide “>1 TFLOPS performance capability”, according to Intel, who states that the new Gen11 graphics are also expected to feature advanced media encode/decode, supporting “4K video streams and 8K content creation in constrained power envelopes”.
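The “>1 TFLOPS” figure is consistent with a back-of-the-envelope calculation, assuming each EU retains Gen9’s 16 FP32 FLOPs per clock (two SIMD-4 FMA units per EU) and a clock around 1 GHz; both are our assumptions here, not figures Intel stated:

```python
def peak_tflops(eus: int, flops_per_eu_per_clock: int, clock_ghz: float) -> float:
    """Peak single-precision throughput in TFLOPS."""
    return eus * flops_per_eu_per_clock * clock_ghz / 1000.0

# Gen11: 64 EUs, 16 FP32 FLOPs/clock/EU (assumed, as on Gen9), ~1 GHz
print(peak_tflops(64, 16, 1.0))  # 1.024
# Gen9 GT2 at the same assumptions: 24 EUs
print(peak_tflops(24, 16, 1.0))  # 0.384
```

At those assumptions, the 24-to-64 EU jump alone accounts for the claimed performance doubling and then some, before any per-clock improvements.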
And finally, though hardly a footnote, the new Gen11 graphics will feature Intel Adaptive Sync technology, which was a rumored feature of upcoming discrete GPU products from Intel.
Discrete GPUs?
And now for that little part about discrete graphics: At the event Intel simply “reaffirmed its plan to introduce a discrete graphics processor by 2020”. Nothing new here, and this obviously means that we won’t be seeing a new discrete GPU from Intel in 2019 – though the beefed-up Gen11 graphics should provide a much needed boost to Intel’s graphics offering when Sunny Cove launches “late next year”.
So it’s passive silicon interposers giving way to active interposers, with logic included in the interposer, which just becomes another logic die in addition to hosting the fabric/traces. Both AMD and Nvidia have also released papers with similar active interposer IP in their designs.
As far as Intel’s graphics are concerned, I’d like to see shader core, TMU, and ROP counts included, just as AMD and Nvidia provide that information for their respective GPU SKUs. ROPs and raster operations are probably of more interest to gamers, along with tessellation resources.
Both AMD and Intel had better start thinking about tensor cores and AI-based denoising and filtering as well, since professional graphics/3D software is beginning to make use of those features in software on GPUs that lack hardware AI cores. That’s AI running on the GPU’s shader cores, which is not going to be as efficient as a trained network running on dedicated tensor cores for things like AI-assisted denoising, AI upscaling, and AI-based graphics filter effects that Adobe and other professional graphics packages are adopting. All the flagship phone SoCs now include NPUs/tensor cores and dedicated DSP/other IP, so maybe that specialized processor IP needs to begin appearing in some x86-based mobile offerings for laptops and tablets.
And now the chiplet wars begin!
The die stacking looks almost exactly like the active interposer that AMD has talked about for a long time. Basically, they would make the I/O chip an active interposer and place the CPU chiplets on top of it. I would have thought that would be a product a few years out for Intel, but they may have significantly accelerated their plans given what AMD has been doing.
I don’t know if this really has that much of an advantage over AMD’s Rome implementation, though. It would mean a smaller package, but how important is that? They could run higher bandwidth from the I/O die to the CPU chiplets, but CPUs can only consume so much bandwidth, and the external DDR interface may be more limiting than the interconnect. The interposer also adds a size limitation, and it is more expensive. AMD may go that route if the performance benefits are there. When we get to Zen 3 or Zen 4, Intel and AMD may have very similar implementations again.
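The point about the DDR interface being the ceiling is easy to sanity-check. Assuming eight channels of DDR4-3200 with a 64-bit bus per channel (the configuration Rome is expected to support; an assumption here, not a figure from the article), peak external memory bandwidth works out as:

```python
def ddr_bandwidth_gbs(channels: int, mega_transfers: int, bus_bytes: int = 8) -> float:
    """Peak DDR bandwidth in GB/s: channels x MT/s x bytes per transfer (8 = 64-bit bus)."""
    return channels * mega_transfers * bus_bytes / 1000.0

# 8 channels of DDR4-3200, 64-bit bus per channel
print(ddr_bandwidth_gbs(8, 3200))  # 204.8 GB/s
```

Any die-to-die link comfortably wider than that ceiling buys little for memory-bound workloads, which is the commenter’s point.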
Both AMD and Intel are going to face some serious competition from ARM going forward. ARM processors may be able to deliver much better compute per watt than x86, which is a very important metric when you are running an entire building full of servers. Either way, hopefully the Intel monopoly is in its last days. The Nvidia monopoly is still, mostly, going strong though.
“The interposer also adds a size limitation” – no, you can splice interposers and make a larger interposer out of two smaller ones to get around that size limitation.
And everyone will be going toward active interposers, starting with silicon interposers that host the entire interconnect fabric’s traces; the interconnect’s logic circuitry will be on the interposer as well, with the processor dies/chiplets on top. Hopefully APUs will be getting some HBM2 or eDRAM die/chiplet love also.
Everything is going modular, and dies using different fab process nodes can be utilized on the same module. AMD’s research paper for an exascale APU already mentioned active interposers and 3D-stacked HBM instead of simply 2.5D HBM stacking.
ARM processors have been using PoP (Package on Package) IP for several years, and that’s the way the entire industry is heading for desktop/laptop usage too, in addition to phones/tablets, where that IP is old hat by now.
AMD even has patent filings for placing FPGAs on the HBM stacks along with the DRAM dies for some in-memory compute on future offerings. So AMD could potentially offer FPGAs programmed to act as tensor cores/ray-tracing cores via FPGA dies, with the FPGAs able to be reprogrammed if better algorithms become available.
All this modular die IP means that processor makers will be able to offer improvements without having to redesign the whole module/system, whether that module-based system is an APU, a GPU, or another processor.
All this big/little CISC x86 core stuff that Intel has announced will not stop Windows-on-ARM devices from entering the market, because even the big ARM cores use less energy than the small x86 cores, and the x86 ISA still takes more transistors to implement in silicon than any RISC ISA design. More transistors equates to more power used; there’s no getting around the laws of physics on that matter.
The custom ARM SoCs also come with dedicated NPUs and DSP/other processor IP in addition to graphics, so Intel will have to match that as well.
Once Apple begins offering custom ARM-based laptops, there will be even less of a market for x86-based devices, and most mobile phone/tablet devices are not x86-based anyway. At least AMD has a mothballed custom ARM (K12) design that it can tape out if needed, should that ARM market begin to take more sales away from x86 in laptop form-factor devices.