During GTC 2014 NVIDIA launched the Tegra K1, a new mobile SoC that contains a powerful Kepler-based GPU. Initial processors (and the resultant design wins such as the Acer Chromebook 13 and Xiaomi Mi Pad) utilized four ARM Cortex-A15 cores for the CPU side of things, but later this year NVIDIA is deploying a variant of the Tegra K1 SoC that switches out the four A15 cores for two custom (NVIDIA developed) Denver CPU cores.
Today at the Hot Chips conference, NVIDIA revealed most of the juicy details on those new custom cores announced in January which will be used in devices later this year.
The custom 64-bit Denver CPU cores use a 7-way superscalar design and run a custom instruction set. Denver is a wide but in-order architecture that can issue up to seven operations per clock cycle. NVIDIA uses on-the-fly binary translation to convert ARMv8 instructions to its custom microcode before execution. A software layer backed by a 128MB cache implements the Dynamic Code Optimization technology: the processor examines the ARM code, converts and optimizes it into the custom instruction set, and caches the translated microcode of frequently used code paths (the cache can be bypassed for infrequently executed code). Using the wider execution engine and Dynamic Code Optimization (which is transparent to ARM developers and does not require updated applications), NVIDIA touts the dual Denver core Tegra K1 as being at least as powerful as the quad- and octo-core packing competition.
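The hot-code caching idea behind Dynamic Code Optimization can be sketched in a toy form. This is purely illustrative (the class, threshold, and "microcode" strings are invented for the example, not NVIDIA's implementation): a translator counts how often each guest code block runs and only caches translations for blocks that turn out to be hot, while cold code takes the bypass path.

```python
# Toy sketch of DCO-style hot-code caching (illustrative only; all names
# and the threshold are assumptions, not NVIDIA's actual design).

HOT_THRESHOLD = 3  # assumed: cache a block's translation once it has run this often

class DCOTranslator:
    def __init__(self):
        self.exec_counts = {}        # guest block address -> times executed
        self.translation_cache = {}  # guest block address -> cached "microcode"

    def run_block(self, address, guest_block):
        """Translate and 'execute' one guest (ARM-like) block."""
        # Fast path: frequently run code reuses its cached translation.
        if address in self.translation_cache:
            return self.translation_cache[address]

        self.exec_counts[address] = self.exec_counts.get(address, 0) + 1
        # Stand-in for real binary translation to native microcode.
        translated = [f"uop:{insn}" for insn in guest_block]

        # Cold code is executed without caching (the "bypass" path);
        # hot code gets its translation stored for future runs.
        if self.exec_counts[address] >= HOT_THRESHOLD:
            self.translation_cache[address] = translated
        return translated

dco = DCOTranslator()
for _ in range(5):
    dco.run_block(0x1000, ["add", "ldr", "str"])
print(0x1000 in dco.translation_cache)  # loop body became hot and was cached
```

The payoff is the same as described above: the translation cost is paid once for hot loops, and subsequent iterations hit the cache.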
Further, NVIDIA has claimed that at peak throughput (and in specific situations where application code and DCO can take full advantage of the 7-way execution engine) the Denver-based mobile SoC handily outpaces Intel’s Bay Trail, Apple’s A7 Cyclone, and Qualcomm’s Krait 400 CPU cores. In the results of a synthetic benchmark test provided to The Tech Report, the Denver cores were even challenging Intel’s Haswell-based Celeron 2955U processor. Keeping in mind that these are NVIDIA-provided numbers and likely the best results one can expect, Denver is still quite a bit more capable than existing cores. (Note that the Haswell chips would likely pull much farther ahead when presented with applications that cannot be easily executed in-order with limited instruction parallelism.)
NVIDIA is ratcheting up mobile CPU performance with its Denver cores, but it is also aiming for an efficient chip and has implemented several power-saving tweaks. Beyond the decision to go with an in-order execution engine (with DCO hopefully mostly making up for that), the beefy Denver cores reportedly feature low-latency power state transitions (e.g. between active and idle states), power gating, dynamic voltage, and dynamic clock scaling. The company claims that “Denver's performance will rival some mainstream PC-class CPUs at significantly reduced power consumption.” In real terms, swapping the Tegra K1's quad-core A15 design for two Denver cores should not result in significantly lower battery life. The two K1 variants are said to be pin compatible, so OEMs and developers can easily bring upgraded models with the faster Denver cores to market.
For those curious, in the Tegra K1 the two Denver cores (clocked at up to 2.5GHz) share a 16-way L2 cache, and each has its own 128KB instruction and 64KB data L1 caches. The 128MB Dynamic Code Optimization cache resides in system memory.
Denver is the first (custom) 64-bit ARM processor for Android (with Apple’s A7 being the first 64-bit smartphone chip), and NVIDIA is working on supporting the next generation Android OS known as Android L.
The dual Denver core Tegra K1 is coming later this year and I am excited to see how it performs. The current K1 chip already has a powerful fully CUDA compliant Kepler-based GPU which has enabled awesome projects such as computer vision and even prototype self-driving cars. With the new Kepler GPU and Denver CPU pairing, I’m looking forward to seeing how NVIDIA’s latest chip is put to work and the kinds of devices it enables.
Are you excited for the new Tegra K1 SoC with NVIDIA’s first fully custom cores?
In-order execution. Well, I guess you can’t have everything on the first try.
PS: a few typos,
and if I am not wrong, Krait is Qualcomm’s, not Samsung’s.
Whoops! Fixed, thanks :).
I have to admire NVIDIA’s ambitiousness. Custom ISA and on-the-fly binary translation is something that’s hard to do well, but very flexible. I’m sure NVIDIA isn’t just aiming at doing ARM emulation on this.
x86 binary translation? 🙂 If only! Speaking of x86, I'd love to see NVIDIA with an x86 license.
Intel and AMD will hate that. That was probably the reason Intel stopped licensing them for Intel-compatible chipsets and never gave them a license for x86. They knew that Nvidia could become a real threat, and it is clear now that they were right.
Anybody thinking of Desktop Steam OS machines with Full Nvidia hardware in them? (Nvidia SoCs, Nvidia mobos, Nvidia Geforce cards connected with NVlink maybe?)
Heh, my P5N-E SLI (RIP) knew the pain of Intel compatible chipsets (it was all downhill from there from what I remember :-/ heh).
I think NVIDIA-powered Steam streaming targets are totally possible; for them to be native Steam clients, though, more games would need to be ported not only to Linux but to the ARM architecture, which doesn't seem likely. At least the Tegra K1 has full OpenGL support, so there's that…
It’s more likely than you might think. Nvidia bends over backwards to help developers to promote their technology. Nvidia will probably port commonly used APIs to not only support ARM but be optimized for Denver. In addition, they will throw hardware at developers, provide assistance with any ARM-related issues, and provide marketing to games that support their platform. The focus will be on Android short term, but if Steam OS becomes somewhat popular we will see an Nvidia Steam Machine.
I wouldn’t mind being proven wrong here. Who knows, The Way Steam Machines Are Meant To Be Built initiative could pop up tomorrow 🙂
Nvidia with a Power8 license, watch out Xeon!
Sadly NVIDIA and Intel have an agreement which specifically prevents NVIDIA from offering x86 emulation. Though that might end at some point.
Still, apart from the obvious route of using its own ISA as-is (and I’m sure they have an ART compiler targeting it directly for Android L), think for example about Power architecture emulation, where NVIDIA could provide backwards compatibility for older consoles. This could open the door to a next-gen Nintendo console.
Why would Nvidia need to invest any resources in redesigning Denver’s microarchitecture to run the POWER8 ISA when they could just license the already powerful reference designs from OpenPOWER/IBM? The POWER8 reference designs are the ones that can eat Xeon’s lunch: 12 cores, 8 threads per core, and an even wider superscalar design, so there is no need to improve POWER8’s superscalar execution resources the way Nvidia and Apple have had to do with their custom ARMv8 ISA designs (Denver and Cyclone, respectively). IBM has the CPU architecture engineers, and they have been designing CPUs since before single-chip integrated CPUs existed. The POWER8 reference designs are up for ARM-style licensing, and Nvidia is already integrating its GPUs with POWER8 for IBM’s HPC/Watson systems on a mezzanine module. Nvidia could take the base POWER8 reference design, trim some of the modular on-die functional blocks, and create desktop/laptop variants with Maxwell or Pascal graphics, offering at first a standard socketed solution, or from the get-go taking its mezzanine module approach with NVLink (derived from CAPI) to produce some powerful enthusiast SKUs. I seriously doubt that Apple or Nvidia could vastly improve POWER8’s already powerful design, and why would they want to when they could just license the reference POWER8, integrate some of their own and third-party IP, and have a chip fab fabricate the devices at cost, no middleman involved? In Apple’s case that means cheaper silicon; in Nvidia’s case, entry into the desktop/laptop market with Nvidia-brand CPUs/SoCs. Nvidia is already there with its custom ARM variants for the netbook/chromebook/tablet market.
Dedicated GPUs are going away, so Nvidia needs to do something different. Most lower-end laptops are shipping with integrated graphics only, but IGPs are moving up the product stack. Once Intel, Nvidia, and AMD start stacking memory chips in the CPU/GPU/APU, this will provide faster memory bandwidth than external memory used on graphics cards. There will be little reason to use a dedicated GPU at this point. In fact, a dedicated GPU may actually hurt performance due to the inefficiencies of having them in separate packages with separate memory spaces.
Intel has high-performance CPUs, but their GPUs are still lacking; they have the resources to catch up though. AMD has both good (enough) performance CPUs and good performance GPUs, but their CPUs seem to be held back by process tech currently. Nvidia has the GPU, but lacked a CPU until now. Since this is ARM-based I don’t know if we will see these outside of tablets and such. They are doing emulation with run-time compilation though, so they may be able to emulate AMD64 easily technologically, but then they would need an x86 license. x86 compatibility may be less necessary than it once was given recent developments.
Yep, I tend to agree with your points. I do hope that those rumors of Carrizo APUs using stacked HBM memory are true though I am concerned that they may /still/ not be utilizing a truly shared memory space.
If we can get to a point where the CPU and GPU can access and manipulate the same data set over a high-bandwidth, low-latency link (e.g. on-package memory), compute and HSA get a huge advantage. Even though a dedicated GPU may have multiple times the processing power of an IGP, the overall compute time and performance may actually favor the APU, because in some workloads the memory copy operations back and forth severely limit/hurt performance.
With that said, I don't think I'm ready to give up my dedicated graphics card for an IGP just yet :-).
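The copy-penalty argument above can be sketched in a toy form (the function names and the list-based "transfers" are invented for illustration, not real GPU APIs): a discrete device with its own memory space pays a transfer in each direction around every operation, while a shared-memory design touches the data in place.

```python
# Toy illustration of unified vs. discrete memory spaces (assumptions only;
# Python lists stand in for host/device buffers, copies stand in for PCIe
# transfers). The results are identical; only the copy overhead differs.

def scale_shared_memory(data, factor):
    """Unified-memory style: operate on the buffer in place, no copies."""
    for i in range(len(data)):
        data[i] *= factor
    return data

def scale_discrete_gpu(data, factor):
    """Discrete-GPU style: copy in, compute, copy back out."""
    device_copy = list(data)            # simulated host -> device transfer
    for i in range(len(device_copy)):
        device_copy[i] *= factor
    return list(device_copy)            # simulated device -> host transfer

buf = [1.0, 2.0, 3.0]
assert scale_shared_memory(list(buf), 2) == scale_discrete_gpu(buf, 2)
```

When the compute per byte is small, the two simulated transfers dominate, which is exactly why a weaker IGP sharing memory with the CPU can win on such workloads.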
I would take an IGP if they connect 8 GB of GDDR5 directly to it and put it in a laptop. This would be like a PS4, so it would probably handle everything I would want to use it for.
Unlike a PS4, games would not be optimized nearly as efficiently.
Have you heard any more news about the PowerVR Wizard, and have you had the chance to ask Nvidia about the future of dedicated ray tracing hardware for GPUs, not just for gaming but for rendering (professional graphics)? Also, does Nvidia see its Denver cores, or a future variant, made into a many-core, lower-cost alternative to Intel’s Xeon Phi brand for HPC or professional graphics uses? Hardware ray tracing done on the GPU may, in the future, negate the need for server-grade CPUs, with professional ray tracing done entirely on the GPU.
3 CPU vendors would be awesome though… no one would dare milk the consumer with shitty products then.
Only need Intel to start making GPUs then to have a 3-vendor clash for GPUs too! ^^
What do you mean 3? Hell, the ARM vendors number in the hundreds, with at least 6 or more holding top-tier ARM architecture licenses in addition to Nvidia. Nvidia could very well license POWER8; Nvidia is a member of the OpenPOWER Foundation, and the first non-IBM POWER8 SKUs are expected to begin arriving in 2015. If you want to see some powerful server chips running Nvidia GPUs, wait until the POWER8s start arriving with Nvidia’s GPUs on the mezzanine module, surrounded by stacked memory and NVLink (CAPI derived) at up to 1TB/s or more of bandwidth. Some of this tech is bound for the consumer market, and POWER8 is an ISA Nvidia could do things with in workstations, along with ARM for the mobile market.
Binary translation with dynamic code optimization in a software layer with a system-memory cache… sounds a lot like Transmeta’s Code Morphing Software. That worked out well for Transmeta….
“Denver is the first (custom) 64-bit ARM processor for Android (with Apple’s A7 being the first 64-bit smartphone chip), and NVIDIA is working on supporting the next generation Android OS known as Android L.”
There is no such thing as a CPU running the ARMv8 ISA that is intrinsically bound to any OS: iOS, Android, or otherwise. Did you mean the first (custom) 64-bit ARM processor to target the Android-based market? If so, Nvidia is missing out on a wider customer base and walling itself off from the laptop/netbook market; there will be full Linux distros running on these SKUs. Nvidia, do not go all-in with any one OS. Others will produce custom ARMv8 silicon that can run any OS; don’t go iOS-style closed with the K1.
The K1 and the custom ARMv8 ecosystem are interesting, and hopefully AMD’s answer will come soon. Thanks to Nvidia, desktop-level driver support has been brought to the mobile market, a much more significant development for mobile than mere Android capability, which any ISA has. Any ISA can run the Linux kernel, and the Android VM does not make a functioning OS without a kernel to hold its hand! That 128MB DCO cache, where does it physically reside? Is it on package and just sharing a main memory address space, or is it in system memory and off package? If it’s in memory, then hopefully in the next Tegra iteration it can be moved on-module, with larger stacked memory for the DCO, a larger GPU, and a 4-Denver-core laptop variant; the IPC is certainly there. I can’t wait for the benchmarks, single-threaded and above, and maybe some Blender and GIMP benchmarks after the Debian folks get a branch working on this SoC. Full desktop driver support for OpenGL etc. is a must, and with that Kepler graphics, some Cycles rendering on a tablet/chromebook form factor under full Linux, naturally close to the metal.
I mean in the sense that it is the first 64-bit ARM processor to support Android. Other operating systems supporting the K1 are possible, at least from a hardware standpoint, if not a business-decision or software-support one.
Nvidia missing out? Android is the number one OS on the planet! So I am sure they can live with it.
Android needs the Linux kernel, or Android is useless. Full Linux is where the creation will happen, not on Google’s metrics-gathering toy store runtime. The development platforms for the K1 all use full Linux, and the SDKs run under full Linux. Android is number one for consumption, but creation requires full Linux! So far Wintel has a full-OS-based tablet, but at an overinflated price. A Steam OS based K1 tablet will be made!
Nvidia is milking the mobile market now; just wait for multiple Shield tablets.