NVIDIA’s Tegra X1
NVIDIA has released the latest Tegra featuring Maxwell GPU tech
NVIDIA seems to like being on a one year cycle with their Tegra products. Many years ago we were introduced to the Tegra 2, the Tegra 3 followed a year later, and the Tegra 4 a year after that. NVIDIA did eventually spice up their naming scheme to get away from the numbers (not to mention the potential stigma of how few of those products actually made an impact in the industry). Last year's entry was the Tegra K1, based on the Kepler graphics technology. These products were interesting due to the use of the very latest, cutting edge graphics technology in a mobile/low power format. The 64 bit variant of the Tegra K1 used two “Denver” cores that were designed in-house by NVIDIA.
While technically interesting, the Tegra K1 series has made about the same impact as the previous versions. The Nexus 9 was the biggest win for NVIDIA with these parts, and we have heard of a smattering of automotive companies using the Tegra K1 in their designs. NVIDIA also uses the Tegra K1 in their latest Shield tablet, but they do not typically release data regarding the number of units sold. The Tegra K1 looks to be the most successful product since the original Tegra 2, but the question of how well it actually sold looms over the entire brand.
So why the history lesson? Well, we have to see where NVIDIA has been to get a good idea of where they are heading next. Today, NVIDIA is introducing the latest Tegra product, and it is going in a slightly different direction than what many had expected.
The reference board with 4 GB of LPDDR4.
Tegra X1
Maxwell is the latest GPU architecture from NVIDIA, and it was introduced with the well-received GTX 750 Ti. That first Maxwell product provided a significant improvement in overall efficiency when it came to performance and power scaling. The architecture also added a few new features to the mix, including new AA methods, additional floating point formats, and HDMI 2.0 support. NVIDIA followed up the GTX 750 Ti with the GTX 970 and GTX 980 graphics cards. These have proven to be outstanding performers in the market and show off the extent of power efficiency that NVIDIA has designed into their latest products.
The Tegra X1 integrates the latest Maxwell architecture into the ARM ecosystem. Two full SMMs (Maxwell streaming multiprocessors) of 128 CUDA cores each power the graphics engine, for a grand total of 256 CUDA cores. This is attached to 16 full ROPs, so there is plenty of pixel painting power. NVIDIA is claiming that the X1 can provide up to 1 TFLOP of performance in the 4 watt TDP range. The architecture provides support for OpenGL ES 3.1, OpenGL 4.5, DirectX 12, AEP, and CUDA 6.0.
The two SMMs will also provide more tessellation power than the single SMX of the K1. GPGPU applications will see up to the 1 TFLOP range of performance with FP16, and around 500 GFLOPS in single precision (FP32) applications. This will become much more important later on when NVIDIA goes into some of the automotive programs that the X1 is aimed at.
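To make the FP16 point concrete, here is a minimal sketch of the paired half-precision math that lets a Maxwell part roughly double its FP32 rate. This assumes a CUDA toolkit new enough to ship cuda_fp16.h (the half2 intrinsics arrived in toolkits later than the CUDA 6.0 cited above) and a device of compute capability 5.3 or higher; the kernel and buffer names are purely illustrative.

```cuda
// Hypothetical FP16 "axpy" kernel: each __half2 packs two FP16 values, so a
// single __hfma2 performs two multiply-adds at once -- which is where the
// ~1 TFLOP FP16 figure comes from relative to ~0.5 TFLOP of FP32 on the
// same 256 CUDA cores.
#include <cuda_fp16.h>

__global__ void axpy_fp16(int n_pairs, __half2 a, const __half2 *x, __half2 *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n_pairs)
        y[i] = __hfma2(a, x[i], y[i]);  // y = a * x + y, two elements per lane
}
```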
One area where the X1 might find some traction is 4K support. It has a built-in decoder that provides 4K 60 fps playback. With 4K becoming much more common, this is a logical advance that could gain some extra customers for the X1. Entertainment, information, and electronic signage are obviously what this is aimed at, and it looks to be one of the few ARM based chips out there that supports 4K at 60 fps. The unit decodes 4K H.265 and VP9 content at the full 60 fps, and it also supports 10 bit color depth with H.265. The encoder handles 4K 30 fps with the H.264, H.265, and VP8 formats.
The upgrades to the graphics and 4K support are nice, but what about the CPU portion of the X1? One would expect a couple more Denver based cores making their way into the latest Tegra, but that is not the case in this particular implementation. NVIDIA decided to go with the Cortex A53 and Cortex A57 designs for this product. The Tegra X1 comprises eight total cores; four are Cortex A53 units while the other four are Cortex A57. These are 64 bit cores and provide good overall performance compared to the previous, 32 bit Cortex implementations. This is not to say that Denver cores won't eventually make their way into the Tegra X1 line, as we saw happen with the Tegra K1, but for now the 4+4 implementation using the ARM designed cores is what we are getting. NVIDIA claims that, with the expertise gained from the years they offered 4+1 cores with Tegra 3 and 4, they have implemented a more efficient 4+4 setup than their competitors (such as the Samsung Exynos 5433).
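For software, the 4+4 arrangement simply shows up as eight logical CPUs on a Linux-based OS, and latency-sensitive work can be steered toward the A57 cluster explicitly. The sketch below uses the standard sched_setaffinity() call; the assumption that the big cores are numbered 4-7 is mine and varies by board and kernel, so check /sys/devices/system/cpu before relying on it.

```c
/* Sketch: pin the calling thread to an assumed "big" A57 core on a 4+4 part.
 * The CPU numbering is an assumption -- it differs between boards/kernels. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(4, &set);               /* first core of the assumed A57 cluster */

    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    puts("Pinned to CPU 4; run the latency-sensitive work from here.");
    return 0;
}
```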
NVIDIA is utilizing TSMC’s 20 nm planar process for production of these parts. 20 nm planar is an effective solution for smaller, low power devices and has already proven itself with Apple’s A8 SOC used in the latest iPhone 6 products. When asked about fab space for 20 nm, NVIDIA replied that “there is enough” for them to produce what they want. They also have other manufacturing options that they are considering, but obviously do not want to talk about.
Over 1200 solder balls comprise this BGA. The chip substrate is around 1.5 x 1.5 cm.
The chip communicates with LPDDR3 and LPDDR4 memory devices through a 64 bit connection. With the fastest LPDDR4 on the market, the chip will see memory bandwidth of up to 25 GB/sec. What is interesting to me is that the chip features over 1200 BGA connections on the back of what is a very tiny package. The chip has more input and output features than I am describing here, but many of those are aimed at supporting multiple cameras in an automotive environment.
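As a rough sanity check on that figure, and assuming the top LPDDR4 speed grade of the time (3200 MT/s): a 64 bit interface moves 3200 × 10⁶ transfers/sec × 8 bytes ≈ 25.6 GB/sec, which lines up with the bandwidth NVIDIA is quoting.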
The reference board sports a heat dissipation unit that mimics the thermal properties of a tablet device.
The Tegra X1 is a major upgrade from the previous K1. It features more cores, more performance, and far greater efficiency. While it is aimed at the same TDP area as the previous chip, it will provide many new features as well as double the graphics and CPU performance of its predecessor. NVIDIA has not announced the speeds of these chips, but overall it looks to be in the same general area in terms of clocks. Where Tegra X1 really excels is the amount of work it can do per clock, and how efficiently (in terms of power) it can accomplish that.
PC Perspective's CES 2015 coverage is sponsored by Logitech.
Follow all of our coverage of the show at https://pcper.com/ces!
Before I read the rest of this article, I just gotta say, BRAVO MR WALRATH, that was just the right amount of juicy suspenseful lead in before the “Click here to continue”. Caught my attention and then bam, no idea what it is, click here to find out. Bad Ass.
“Today, NVIDIA is introducing the latest Tegra product, and it is going in a slightly different direction than what many had expected.”
That’s as good as “Which 1 of the 2 major ketchup brands is giving your kids cancer? The answer might surprise you, tonight at 11”
Ok, so now I’ve read it, very interesting part it seems. I could see this guy packed into an Android based dedicated video playback system, either a laptop with a 4K screen or a mini PC style box. This would even make a kick ass chrome-box or chrome-top or whatever they call their desktop model.
Question tho, it says it has DX 12 support. Why? Is there any, ANY, instance past, present, or future where you can run any DirectX program on any ARM CPU? Is there any reason for this or is it simply an “it’s part of the architecture, can’t get it out so might as well mention it” kinda thing?
What problem does DirectX have with ARM? Windows Phones have been using Qualcomm ARM chips for a long time. Also, there was WinRT. Did you think WinRT used something other than DirectX?
I would imagine that DX 12 will be used on this chip with Windows 10 and/or Windows Phone 10.
The ARM reference design A57/A53 cores have less than half of the Denver core’s performance (7+ IPC), so 4 Denver cores would have as much compute performance as 8 of the A57/A53 cores (which are rated at around 3 IPC per core). So what is up with the whole Project Denver and Nvidia’s custom core? Will Apple be the only one using a custom ARM core in new products this year, or will Nvidia be tweaking Denver for some product in 2016, alongside the expected arrival of AMD’s custom ARMv8 ISA based APUs?
My guess is that there were just too many bugs with the first iteration of Denver, which isn’t unexpected since it’s the first gen of a brand new superscalar design. Plus it’s a new area for Nvidia; they’re usually a GPU guy implementing stock ARM cores, and now they have to actually design the cores, and I don’t think it was as similar to GPU cores as they’d hoped.
I don’t think they’re going to abandon Denver, I just think it needs a fair bit more refinement.
From what I read at Anandtech, it’s the 20nm process node, and it’s currently not ready for Denver. I wish that when companies release their new products they would release the processor’s/SOC’s data sheet immediately after the product is initially unveiled; it would save much confusion regarding the product. The Denver cores have over twice the IPC of the ARM A57/A53 reference design cores, so it would be a serious mistake for Nvidia to get out of the custom extra wide order superscalar Denver core design. The Apple A7/A8/A8X is another extra wide order superscalar custom design that likewise has twice the IPC performance of the reference ARM A57/A53 cores. AMD will be introducing its custom ARMv8 ISA based cores in 2016, and then there will be more competition; hopefully AMD’s custom ARMv8 ISA based APU will be able to execute at least 6 IPC per core and offer graphics that compete with both Nvidia and Imagination Technologies’ PowerVR GPUs.
Loved the tech. Hated the direction that Nvidia has decided to take it.
They’ve too much cash on hand, and too little focus on their core business. The way good companies go tits up! Microsoft scenario in spades!
It might take 20 years, but I see their direction.
Those that don’t know history, are bound to repeat it.
Nvidia needs to focus on the consumer for CES, and keep the auto stuff for the auto shows. The Apple A8 fits in a phone; the A8X is a tablet SOC. The GXA6850 (the GPU in the A8X, as AnandTech describes it) has 256 FP32 ALUs and performs 512 FP32 FLOPs/clock, 1024 FP16 FLOPs/clock, 16 pixels/clock (ROPs), and 16 texels/clock. So Nvidia’s FP16-only approach will not be as wide as the FP in Kepler. Is Nvidia having power usage issues with its fabrication process, and will a more mature 20nm node work some of those power usage issues out and allow Nvidia to reintroduce FP32? Man, the presentation was sloppy for Nvidia this time around, and all that car talk was not what most were expecting at a consumer electronics show!
Have you actually ever been to a CES? About 1/5 of the entire show is filled with cars and car tech.
I was just talking to a friend about how, when they first announced Tegra, they did so with a demonstration with Audi and talked about integrating the chip into cars, and how nobody cared back then. It seems like there are still people that don’t get it.
They are obviously still developing technology on their GPU side for PCs, and these mobile chips are obviously going to be very useful in tablets and other mobile devices. I think the reason they spent so much time demonstrating a lot of what they discussed during their press conference is because they were trying to highlight uses that might not be immediately obvious.
I think there also needs to be an understanding that in order for nVidia to survive as a company they need to evolve, and they are working their way into more markets. If you honestly think nVidia can just get by making PC GPUs you are absolutely insane. That just isn’t a long term solution. Instead of just making GPUs, nVidia is now also working on monitor tech with GSYNC, mobile chips for primarily tablets (but potentially cell phones) with Tegra, cloud rendering/video streaming with GRID, and now autos with all the crazy shit they are doing with Tegra.
Autos are a MASSIVE market, WAY bigger than PC gaming. Some of the stuff about rendering an odometer with different textures is pretty meaningless, but the reality of true self driving cars is going to be here much sooner than people may realize.
No one thinks Nvidia can just get by on their gaming business alone; it’s the venue, CES. There are auto shows for that type of introduction, but CES is for consumer electronics. This is more of Nvidia’s marketing trying to use the CES buzz to plug its car products; pitch that to the auto executives. Now Nvidia’s customers are wondering about the Denver core based products. Autos are a commodity parts, price competitive business, and expect competition from Imagination Technologies and their PowerVR/MIPS based systems in the auto and other markets. Hey, GPU accelerators for server systems are big too, and they have their associated trade shows, and I’m sure Nvidia is there, but cars are not on consumers’ lists of every year or two updates like tablets and phones. Self driving cars belong at the self driving car trade shows, especially when it comes to their embedded computing systems. But keep your eyes out for Big Blue; they have actual hardware based neural net processors, which use even less power than vector processors programmed to perform neural net type functions in software. Nvidia better clarify Denver’s status ASAP, or cause more confusion in the consumer market for its products.
I guess there is just a difference in the types of consumers. I understand that the average consumer doesn’t buy a new car every couple of years, but you could say the same about $1000+ TVs. I actually know for a fact that my brother and his wife have bought more new cars in the last 4 years than they have new TVs or PCs. Yet we obviously agree that TVs and PCs are absolutely a major part of CES.
And you can scoff about auto news coming into CES, but it seems fair that if a product like a car is becoming increasingly dependent on electronics, then CES is a venue for a company to talk about what those electronics actually mean for the consumer.
We also see products like washers/dryers, home security systems, fridges, ovens/stoves, etc. that companies will show off at CES as well, and that is all well and good. It doesn’t do anybody any good to be so rigid about what is and isn’t allowed to be shown at CES just because there are other conferences. If you followed that logic you could just as easily say that nobody should show off phones because they should do that at Mobile World Congress, but that would be silly.
Did consumers really care about Denver? For the most part they don’t even know what SoC is inside their phones/tablets. But more cores? More RAM? That might pique their interest, because OEMs have played that part up to market their products. So it doesn’t matter if Nvidia discloses more info about Denver to the public. For most consumers that is not important. For tech geeks? Maybe, but not the general consumer.
Can I say, Mr. Walrath, that you look remarkably good without hair on your head. It gives you that senior military look; I feel like I should be saluting when you talk.
Can I also say I think we have seen how Facebook/Oculus is going to power the CV1. 4K @ 60fps is roughly 3K @ 90fps.
OK, John Carmack will be able to write close to the metal so that Oculus can push the display to a 4K Crescent Bay. But given the need to ramp production, keep costs down, produce high grade audio and all the other things a VR headset needs to supply, and get it out later this year, I suspect we are looking at minimum a Note 4 display (as used in Crescent Bay) and an NVIDIA X1 – and that will give a seriously kick arse headset – one that doesn’t require its customers to go out and buy SLI’d GPUs to run it!
I suspect they will take a leaf out of the console makers’ books and do what the consoles failed to do this time: loss-lead for a couple of iterations using the Facebook billions as they grow the market and the brand name.
I hope you all slept well and thanks for the coverage and saving me money!
Well, if you watch the CES video from 5:50 you can see and hear 10 watts for power consumption! This article mentions 4 watts. https://twitter.com/Siriq111/status/552091164539494400
It will depend on the form factor and cooling. Most tablets will be around 4 watt TDP while the car solutions could be in the 10 watt range.
Something looks really odd.
At 10 watts for an embedded car solution you probably won’t need 4K support, and for mobile parts, which could use the 4K support, the 10 watts kills it for battery use.
What is the product aimed at exactly?
My guess is that the car solutions are using GPGPU heavily, which can benefit from pushing the TDP up quite a bit.
Great article Josh!
JHH said 10w+ during the presentation.
If it was 4w+ he would have mentioned Super Computer in a “phone” a billion times over, like Dr. Evil.
Just repeating what I was told at a meeting prior to the X1 release.
I think it’s amazing this thing is putting out 1TF of performance when 2 years ago PS4 was released with only 1.8TF. Who’s to say…next year Nvidia may have a product that fits in phones/tablets like this that is faster than PS4/XB1.
Standard quad core ARM CPUs ruined this SoC for me. There really isn’t a single program I run that benefits from more than 2 cores, and most are heavily dependent on single threaded performance. Very little true multitasking is ever done on a phone as well. Using larger, more powerful dual cores is the proper way to power a phone SoC. Most everything relies much more on fast single thread performance, so what mostly matters is how fast you can make 1 core complete instructions, and a dual core is all you need: 1 core running the background data gathering for your email and other real time apps, and 1 core running your currently in use program.
The Denver and Apple A8 are proof that stronger dual cores do outperform weaker quad cores, and since TDP is a terribly scarce resource, dual core CPUs are always going to be more powerful per core than a quad core CPU.
This new X1 SoC will actually be a downgrade in CPU performance, compared to the Denver K1, for the vast majority of software that is single thread performance bottlenecked, unless the GPU side is so massively more efficient that they were able to give the CPU a much higher portion of the TDP. In all likelihood the efficiencies gained on the GPU side went into making the GPU much faster for the same TDP.
Such a disappointment, as it would have been a huge upgrade with faster 20nm revised Denver dual cores plus the 256 Maxwell cores vs 192 Kepler cores. Nvidia is sooooo close to getting it right. Once Nvidia comes out with Pascal and stacked DRAM they will be in an amazing position to deliver an insane performing SoC. With the RAM built right into it, eliminating the external RAM chip, they free up lots of mobo space, which enables phones to have 2 NAND chips instead of 1. Since 128GB is the max size one of these chips is offered in, this side effect of increased space on the mobo from the stacked DRAM allows phones to have 256GB internal space as the new maximum, and don’t forget about the increased memory bandwidth as well. Or they can just shrink the overall size of the mobo and fill up the gained internal volume with more battery.
Let’s just pray Nvidia comes to their senses and continues to enhance and upgrade their dual core Denver architecture to go along with Pascal and stacked DRAM. Nvidia could very well be propelled into first place with the highest performing mobile SoC if they do this right.
Denver is not going anywhere. There were hints that we would eventually see a Denver based X1 later on. Using the A53 and A57 cores allowed NV to get this product to market faster since they didn't have to do as much verification on certain process nodes (ARM does a lot of that work before they license out the cores). Your point about really fast dual cores for a lot of cellphone and tablet applications is pretty spot on. I think the sweet spot for tablets might be 4 cores of Denver.