AMD will deliver its latest round of APUs (Kaveri) on January 14th. These processors, built on a 28nm process, will combine the Steamroller architecture on the CPU with HSA-compliant Graphics Core Next (GCN) cores on the GPU. Together they are expected to bring 856 GFLOPs of computational performance.
Thomas Ryan at SemiAccurate, however, remembers that AMD expected over a TeraFLOP.
Of course Kaveri has been a troubled chip for AMD. At this point Kaveri is over a year late and most of that delay is due to a series of internal issues at AMD rather than technical problems. But now with the knowledge that Kaveri missed AMD’s internal performance targets by about 20 percent it’s hard to be very positive about AMD’s next big-core APU.
The problem comes from a reduction in the clock rate AMD expected back in February 2012. Steamroller was expected to reach 4 GHz but that has been slightly reduced to 3.7 GHz; this is obviously a small impact from a compute standpoint (a loss of just under 10 GFLOPs). The GPU, on the other hand, was cut from 900 MHz down to 720 MHz; its performance was reduced by a whole 25% (Update: 20%. Accidentally divided by 720 instead of 900). Using AMD's formula for calculating FLOP performance, Kaveri's 856 GFLOP rating corresponds to an 18% reduction from the original 1050 GFLOP target.
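As far as we can tell, AMD's peak figure is simply execution units × FLOPs per clock × clock speed, summed over the CPU and the GPU. The sketch below assumes four Steamroller cores at 8 single-precision FLOPs per clock and 512 GCN shaders at 2 FLOPs per clock (one fused multiply-add). Those counts are not stated in this post, but under those assumptions the arithmetic reproduces the 1050 and 856 GFLOP figures, the "just under 10 GFLOPs" CPU loss, and the 20% (not 25%) GPU cut.

```python
# Assumed unit counts (not stated in the post); they reproduce AMD's quoted totals.
CPU_CORES = 4
CPU_FLOPS_PER_CLOCK = 8   # single precision, per core
GPU_SHADERS = 512
GPU_FLOPS_PER_CLOCK = 2   # one fused multiply-add per shader per clock


def peak_gflops(cpu_ghz: float, gpu_mhz: float) -> float:
    """Peak GFLOPs as units x FLOPs/clock x clock, summed over CPU and GPU."""
    cpu = CPU_CORES * CPU_FLOPS_PER_CLOCK * cpu_ghz
    gpu = GPU_SHADERS * GPU_FLOPS_PER_CLOCK * gpu_mhz / 1000.0  # MHz -> GHz
    return cpu + gpu


def percent_drop(old: float, new: float) -> float:
    """Reduction relative to the ORIGINAL value: divide by old, not by new."""
    return (old - new) / old * 100.0


target = peak_gflops(4.0, 900)    # original plan: 4 GHz CPU, 900 MHz GPU
shipping = peak_gflops(3.7, 720)  # shipping clocks: 3.7 GHz CPU, 720 MHz GPU
cpu_loss = CPU_CORES * CPU_FLOPS_PER_CLOCK * (4.0 - 3.7)

print(f"target {target:.1f} GFLOPs, shipping {shipping:.1f} GFLOPs")
print(f"CPU loss: {cpu_loss:.1f} GFLOPs")
print(f"GPU clock drop: {percent_drop(900, 720):.0f}%")
print(f"total drop: {percent_drop(target, shipping):.0f}%")
```

Dividing 180 by 720 instead of 900 is exactly what turns the 20% GPU cut into 25%, which is why `percent_drop` always divides by the original value.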
But, personally, I am still positive about Kaveri.
The introduction of HSA features into mainstream x86 processors has begun. The ability to share memory between the CPU and the GPU could be a big deal, especially for tasks such as AI and physics. AI in particular interests me (although I am by no means an expert) because it is a mixture of branching and parallel instructions. The HSA model could, potentially, operate on the data with whichever architecture makes sense. Currently, synchronizing CPU and GPU memory is very costly; you could easily spend most of your processing time budget waiting for memory transfers.
856 GFLOPs is a definite reduction from 1050 GFLOPs. Still, if Kaveri (and the APUs that follow it) can effectively nullify the latencies involved in GPGPU work, that is a lot of usable compute: for comparison, an Intel Ivy Bridge-E Core i7 4960X has a floating-point throughput of roughly 160 GFLOPs.
And before you say it: Yes, I know, Ivy Bridge-E can be paired with fast discrete graphics. This combination is ideal for easily separated tasks such as when the CPU prepares a frame and then a GPU draws it; you get the best of both worlds if both can keep working.
But what if your workload is a horrific mish-mash of back-and-forth serial and parallel? That is where AMD might have an edge.
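To put the memory-transfer argument in rough numbers, here is a toy cost model for a workload that bounces back and forth between serial and parallel steps. Every figure in it is a hypothetical assumption (an 8 GB/s effective copy bandwidth, 0.5 ms of compute per step, one 1080p RGBA frame copied per handoff), not a Kaveri measurement, and it treats a shared-memory handoff as free, which ignores cache-coherency overhead.

```python
# Toy cost model; all constants are hypothetical, not Kaveri measurements.
BUS_BYTES_PER_SEC = 8e9          # assumed effective CPU<->GPU copy bandwidth
FRAME_BYTES = 1920 * 1080 * 4    # one 1080p RGBA8 buffer: 8,294,400 bytes
POINTER_BYTES = 8                # what a shared-memory handoff passes instead


def copy_seconds(nbytes: int, bandwidth: float = BUS_BYTES_PER_SEC) -> float:
    """Time to move nbytes across the bus, ignoring fixed per-transfer latency."""
    return nbytes / bandwidth


def workload_seconds(handoffs: int, compute_s_per_step: float,
                     bytes_per_handoff: int, shared_memory: bool) -> float:
    """Total time for a workload that alternates serial and parallel steps.

    Without shared memory, every handoff copies the working set across the bus.
    With shared memory (the HSA model), a handoff only passes a pointer,
    so this simplified model charges no copy time for it.
    """
    per_handoff_copy = 0.0 if shared_memory else copy_seconds(bytes_per_handoff)
    return handoffs * (compute_s_per_step + per_handoff_copy)


if __name__ == "__main__":
    split = workload_seconds(100, 0.0005, FRAME_BYTES, shared_memory=False)
    shared = workload_seconds(100, 0.0005, FRAME_BYTES, shared_memory=True)
    print(f"separate memory: {split * 1000:.1f} ms")
    print(f"shared memory:   {shared * 1000:.1f} ms")
    print(f"per handoff: {FRAME_BYTES} bytes copied vs {POINTER_BYTES} bytes passed")
```

With these made-up inputs, copying turns 50 ms of compute into roughly 154 ms, so about two-thirds of the time goes to transfers. The point is the shape of the trade-off, not the exact numbers.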
Can’t you recover at least some of this performance with a stable overclock?
It sure is one hell of a lot easier (hUMA) to pass a 64-bit pointer than to move a whole frame buffer’s worth of memory to and fro, and a whole bunch more power efficient!
-Jebediah Springfield-
Or particle data… sound files… geometry… images from a video file… textures… etc.
Is it possible that those GFLOPs values do not take turbo into consideration? I know, I know: AMD would include Turbo Core figures to show the product in the best light possible, but 3.7 GHz is awfully low for a turbo clock (even though 28nm bulk may cause a frequency regression). There are also rumors that 3.8 GHz versions were being evaluated, and there is at least a 3.4 GHz base / 3.8 GHz turbo engineering sample (ES) in the wild.
Either GF failed to impress again, or something different is going on. Maybe AMD is going for power efficiency? Or there is indeed a turbo that hasn’t been factored in yet.
There is something fishy here. I hope Kaveri is not going to disappoint on that front.
Recently AMD was bashed for its “up to 1 GHz” statement regarding the R9 290X, so I suppose they would not advertise “up to 1050 GFLOPs” now 🙂 856 GFLOPs is probably the base figure.
Sure, peak performance of 850 GFLOPs or 1 TFLOP is not the point. The big picture is the efficiency of the HSA architecture, which has yet to be proven in real applications. Nevertheless, I would expect Kaveri to be able to reach the 1 TFLOP barrier, as AMD has already produced a 2 TFLOP APU for the PS4.
“from 900MHz down to 720 MHz; its performance was reduced by a whole 25%.”
Although we can sense the drooling, the correct value is 20 percent, not 25:
900 – 720 = 180
180 / 900 = 0.20 = 20 percent (less)
720 is 900 less 20 percent (not 25)
OR
720 / 900 = 0.80 = 80 percent
720 is 80 percent of 900
720 is 900 less 20 percent
Arithmetic is soooooo hard.
But hyperbole isn't! Bigger numbers mean bigger hyperbole!
Yep. I screwed up. Got the 18% reduction from 1050 GFLOPs to 856 GFLOPs correct though.