CPU Design and Kepler in Your Pocket

The processor design of the Tegra K1 is very similar to that of the Tegra 4 in its primary form. There are four A15 cores running at a maximum clock speed of 2.3 GHz, plus a fifth A15 core at a maximum of 1.0 GHz (I’m told it typically runs in the 500 MHz range) that is used during idle and low performance scenarios to save on power.

The engineers have built on three key areas to improve the SoC. First, and most importantly, the chip benefits from the experience that NVIDIA gained from using the A15 cores with the Tegra 4. As with any microprocessor design, you improve performance significantly the second time around.

Tegra K1 also benefits from the updated 28HPM process node from TSMC that combines the benefits of higher performance and lower power silicon.  Finally, the latest revision of the Cortex-A15 “r3” was used with the K1 which adds ARM-developed and integrated architectural power reductions. 

All three of these components combine to allow the Tegra K1 to run about 30% “better” than Tegra 4. As with all mobile processors, you can target two different but equally important metrics when analyzing performance. K1 will see 1.4x the performance at the same power draw as Tegra 4, or use about half the power to reach the same performance level as Tegra 4.
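To make the two claims easier to compare, here is a minimal sketch that turns them into performance-per-watt ratios. It assumes a normalized Tegra 4 baseline of 1.0 performance at 1.0 power; those units, the function name and the variable names are mine for illustration and are not figures or terms from NVIDIA.

```python
# Hypothetical, normalized units: Tegra 4 is defined as performance 1.0
# at power 1.0. Only the 1.4x and "about half" figures come from the
# claims above; everything else is an illustrative assumption.

def perf_per_watt(performance, power):
    """Return efficiency as performance divided by power."""
    return performance / power

tegra4 = perf_per_watt(1.0, 1.0)

# Claim 1: 1.4x the performance at the same power draw.
k1_same_power = perf_per_watt(1.4, 1.0)

# Claim 2: the same performance at about half the power.
k1_same_perf = perf_per_watt(1.0, 0.5)

efficiency_gain_same_power = k1_same_power / tegra4
efficiency_gain_same_perf = k1_same_perf / tegra4

print(f"Efficiency gain at equal power:       {efficiency_gain_same_power:.1f}x")
print(f"Efficiency gain at equal performance: {efficiency_gain_same_perf:.1f}x")
```

The two ratios differ because they describe two different points on the power/performance curve. Power generally rises faster than performance as clocks and voltage go up, so backing off to Tegra 4's performance level saves proportionally more than holding the power budget fixed.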

NVIDIA did offer up a comparison of Octane performance and power draw on Tegra K1 against Qualcomm’s S800 SoC with Krait 400 CPU cores, and the results look impressive. The single white point on the graph represents the Apple A7 SoC with the Cyclone CPU core, which nearly matches the performance of Tegra K1 at 2 watts of power draw, more than likely the maximum level for each part. Without a complete power/performance curve, the comparison to Apple’s product will have to wait until we have a K1 product in our hands.

All of this data was presented on the 32-bit, A15-based version of the Tegra K1. The 64-bit version that includes the NVIDIA Denver cores will obviously change the game pretty dramatically. From the information provided at NVIDIA's press conference, the Denver cores run 200 MHz faster at the maximum clock rate and the CPU has a 7-way superscalar design, which should make it very efficient. We don't yet have architectural details or even performance estimates of Denver; that analysis will have to wait until later in the year.

Kepler in Your Pocket

By far the most impressive part of Tegra K1 is the implementation of a full Kepler SMX on a chip that will be running well under 2 watts. While it has long been NVIDIA's plan to merge its primary GPU architectures across mobile and discrete, this choice did not come without some risk. When the company was building the first Tegra part, it basically had to place a bet on where the world of mobile technology would be in 2015. NVIDIA could have continued to evolve and change the initial GPU IP that was used in Tegra 1, adding feature support and increasing the required die area to improve overall GPU performance, but instead it opted to position a “merge point” with Kepler in 2014. The team at NVIDIA saw that they were within reach of the discontinuity point we are seeing today with Tegra K1, but in truth they had to suffer through the first iterations of Tegra GPU designs that they knew were inferior to the design coming with Kepler.

Going forward, all future GPU architectures will be built and designed with mobile integration in mind.  NVIDIA’s Jonah Alben, SVP of GPU Engineering, described it as being part of the “bones” of the design team.  Very early in the design of Kepler, NVIDIA’s key minds decided that this was the direction for the company – and this isn’t without its own risks.  Maxwell, NVIDIA’s upcoming architecture due in desktop systems this year, will be the first design that was truly and completely built with Tegra as one of the targets.

If they can make it work, NVIDIA’s graphics IP will be scaling from milliwatts to megawatts and from phones to HPC server racks.  Alben seems confident that this does not require a sacrifice to the high end in favor of the low end – “it can be built perfectly for everyone” was a common thread amongst discussions.  If you consider the benefits that the GeForce line has had since the power efficiency improvements of Kepler were introduced, it is easy to see how this might actually work out in favor of each market segment.
