NVIDIA’s Tegra X1
NVIDIA has released the latest Tegra featuring Maxwell GPU tech
NVIDIA seems to like being on a one-year cycle with their latest Tegra products. Many years ago we were introduced to the Tegra 2, and the year after that the Tegra 3, and the year after that the Tegra 4. Well, NVIDIA did spice up their naming scheme to get away from the numbers (not to mention the potential stigma of how many of those products actually made an impact in the industry). Last year's entry was the Tegra K1, based on the Kepler graphics technology. These products were interesting due to the use of the very latest, cutting-edge graphics technology in a mobile/low-power format. The 64-bit variant of the Tegra K1 used two “Denver” cores that were actually designed by NVIDIA.
While technically interesting, the Tegra K1 series has made about the same impact as the previous versions. The Nexus 9 was the biggest win for NVIDIA with these parts, and we have heard of a smattering of automotive companies using the Tegra K1 in their products. NVIDIA uses the Tegra K1 in their latest Shield tablet, but they do not typically release data regarding the number of products sold. The Tegra K1 looks to be the most successful product since the original Tegra 2, but the question of how well these parts actually sold looms over the entire brand.
So why the history lesson? Well, we have to see where NVIDIA has been to get a good idea of where they are heading next. Today, NVIDIA is introducing the latest Tegra product, and it is going in a slightly different direction than what many had expected.
The reference board with 4 GB of LPDDR4.
Maxwell is the latest GPU architecture from NVIDIA, first introduced with the well-received GTX 750 Ti. This was the first product based on the Maxwell architecture, and it provided a significant improvement in overall efficiency when it came to performance and power scaling. The architecture also added a few new features to the mix: new AA methods, new floating point formats, and HDMI 2.0 support. NVIDIA followed up the GTX 750 Ti with the GTX 970 and GTX 980 graphics cards. These have proven to be outstanding performers in the market and show off the extent of power efficiency that NVIDIA has designed into their latest products.
The Tegra X1 integrates the latest Maxwell architecture into the ARM ecosystem. Two full SMMs (Maxwell streaming multiprocessors) of 128 CUDA cores each power the graphics engine, for a grand total of 256 CUDA cores. This is attached to 16 full ROPs, so there is plenty of pixel painting power. NVIDIA is claiming that the X1 can provide up to 1 TFlop of performance in the 4 watt TDP range. The architecture provides support for OpenGL ES 3.1, OpenGL 4.5, DirectX 12, the Android Extension Pack (AEP), and CUDA 6.0.
The two SMMs will also provide more tessellation power than the single SMX of the K1. GPGPU applications will also see up to the 1 TFlop range of performance with FP16, and around 500 GFlops in single precision (FP32) applications. This last bit will become much more important later on when NVIDIA goes into some of their programs in the automotive sector that the X1 is aimed at.
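For readers who want to see where figures like these can come from, here is a back-of-the-envelope sketch. It is not NVIDIA's math: the 1 GHz clock, the two operations per core per clock (one fused multiply-add), and the idea that FP16 runs two packed values per lane are all assumptions for illustration. NVIDIA has not announced clock speeds. Only the SMM and CUDA core counts come from the announcement.

```python
# Figures taken from the article.
SMM_COUNT = 2
CUDA_CORES_PER_SMM = 128

# Assumptions for illustration only (not announced by NVIDIA).
ASSUMED_CLOCK_GHZ = 1.0        # hypothetical GPU clock
FLOPS_PER_CORE_PER_CLOCK = 2   # one fused multiply-add counted as 2 operations
FP16_PACKING_FACTOR = 2        # two FP16 values processed per lane per clock


def peak_gflops(cores, clock_ghz, packing=1):
    """Theoretical peak throughput in GFlops for a given core count and clock."""
    return cores * FLOPS_PER_CORE_PER_CLOCK * packing * clock_ghz


total_cores = SMM_COUNT * CUDA_CORES_PER_SMM
base_gflops = peak_gflops(total_cores, ASSUMED_CLOCK_GHZ)
fp16_gflops = peak_gflops(total_cores, ASSUMED_CLOCK_GHZ, FP16_PACKING_FACTOR)

print(f"CUDA cores:        {total_cores}")
print(f"Full-width peak:   {base_gflops:.0f} GFlops")
print(f"Packed FP16 peak:  {fp16_gflops:.0f} GFlops")
```

Under those assumptions the numbers land at 512 and 1024 GFlops, which lines up with the roughly 500 GFlops and 1 TFlop figures quoted above. A different clock would scale both results linearly.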
One area where the X1 might gain some traction is its 4K support. It has a built-in 4K decoder that handles 4K at 60 fps. With 4K support becoming much more common, this is a logical advance that could gain some extra customers for the X1. Entertainment, information, and electronic signage are obviously the targets here, and it looks to be one of the few ARM-based chips out there that supports 4K at 60 fps. The decoder supports 4K H.265 and VP9 formats at the full 60 fps, including 10-bit color depth for 4K H.265. The encoder supports 4K at 30 fps with H.264, H.265, and VP8 formats.
The upgrades to graphics and 4K support are nice, but what about the CPU portion of the X1? One would expect a couple more Denver-based cores to make their way into the latest Tegra. That is not the case in this particular implementation. NVIDIA decided to go with the Cortex A53 and Cortex A57 designs for this product. The Tegra X1 has eight cores in total; four are A53 units while the other four are A57. These are 64-bit cores and provide good overall performance compared to the previous 32-bit Cortex implementations. This is not to say that Denver cores may not eventually make their way into the Tegra X1, as we saw with the Tegra K1, but for now the 4+4 implementation using the ARM-designed cores is what we are getting. NVIDIA claims that, with the expertise gained from the years they offered 4+1 cores with Tegra 3 and 4, they have implemented a more efficient 4+4 setup than their competitors (such as the Samsung Exynos 5433).
NVIDIA is utilizing TSMC’s 20 nm planar process for production of these parts. 20 nm planar is an effective solution for smaller, low-power devices and has already proven itself with Apple’s A8 SoC, which is used in the latest iPhone 6 products. When asked about fab space for 20 nm, NVIDIA replied that “there is enough” for them to produce what they want. They also have other options for manufacturing that they are considering, but obviously do not want to talk about.
Over 1200 solder balls comprise this BGA. The chip substrate is around 1.5 x 1.5 cm.
The chip communicates with LPDDR3 and LPDDR4 memory devices through a 64 bit connection. With the fastest LPDDR4 on the market, the chip will see memory bandwidth of up to 25 GB/sec. What is interesting to me is that the chip features over 1200 BGA connections on the back of what is a very tiny chip. The chip has more input and output features than I am describing here, but those are aimed at multiple cameras in an automotive environment.
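The 25 GB/sec figure is easy to sanity check with some quick arithmetic. The sketch below assumes that the "fastest LPDDR4 on the market" means a 3200 MT/s data rate; that rate is my assumption, not something NVIDIA stated. Only the 64-bit bus width comes from the announcement.

```python
# Figure taken from the article.
BUS_WIDTH_BITS = 64

# Assumption for illustration: LPDDR4 at 3200 mega-transfers per second.
ASSUMED_TRANSFER_RATE_MT_S = 3200


def peak_bandwidth_gb_s(bus_width_bits, transfer_rate_mt_s):
    """Theoretical peak memory bandwidth in GB/s (decimal gigabytes)."""
    bytes_per_transfer = bus_width_bits / 8
    # MT/s times bytes per transfer gives MB/s; divide by 1000 for GB/s.
    return bytes_per_transfer * transfer_rate_mt_s / 1000


lpddr4_bandwidth = peak_bandwidth_gb_s(BUS_WIDTH_BITS, ASSUMED_TRANSFER_RATE_MT_S)
print(f"Peak LPDDR4 bandwidth: {lpddr4_bandwidth:.1f} GB/s")
```

With those inputs the peak works out to 25.6 GB/s, in line with the quoted number. This is a theoretical ceiling; sustained bandwidth in real workloads will be lower.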
The reference board sports a heat dissipation unit that mimics the thermal properties of a tablet device.
The Tegra X1 is a major upgrade from the previous K1. It features more cores, more performance, and far greater efficiency. While it is aimed at the same TDP area as the previous chip, it will provide many new features as well as double the graphics and CPU performance of its predecessor. NVIDIA has not announced the speeds of these chips, but overall it looks to be in the same general area in terms of clocks. Where Tegra X1 really excels is the amount of work it can do per clock, and how efficiently (in terms of power) it can accomplish that.
Follow all of our coverage of the show at https://pcper.com/ces!