Cache Structures, Complexes, and More

The L1/L2/L3 cache structure has also changed from previous architectures.  This is not surprising, as these caches are key to good overall performance and throughput on any architecture.  An effective cache system can also improve energy efficiency: fewer cycles are wasted waiting on main memory, and less power is spent making those accesses.  Each core features 96 KB of L1, divided into a 64 KB 4-way instruction cache and a 32 KB 8-way data cache.  Each core also has 512 KB of private 8-way L2.  The L2 in turn connects to a large and fast 8 MB L3 cache that is shared between four cores.  The caches can all transmit up to 32 bytes per clock.  In a CPU with 8 cores, the two L3 caches appear to feature a fast interconnect so that data accesses between cores in the different modules (cache reads, writes, scrubbing, etc.) do not impose a significant bottleneck.
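To make those figures a little more concrete, here is a rough Python sketch that turns the quoted sizes and associativities into set counts and a peak per-link bandwidth number.  The 64-byte cache line is an assumption (the line size is not stated above), the 3 GHz clock is simply the speed of the public demo, and the class and function names are made up for illustration.  L3 is left out of the set calculation because its associativity is not given here.

```python
from dataclasses import dataclass

KB = 1024
LINE_BYTES = 64  # ASSUMPTION: cache line size, not stated in the article


@dataclass(frozen=True)
class Cache:
    """Hypothetical helper describing one cache level."""
    name: str
    size_bytes: int
    ways: int

    def sets(self, line_bytes: int = LINE_BYTES) -> int:
        # sets = capacity / (associativity * line size)
        return self.size_bytes // (self.ways * line_bytes)


# Per-core figures quoted in the article
L1I = Cache("L1 instruction", 64 * KB, 4)
L1D = Cache("L1 data", 32 * KB, 8)
L2 = Cache("L2", 512 * KB, 8)


def peak_bandwidth_gb_s(bytes_per_clock: int, clock_ghz: float) -> float:
    """Peak transfer rate in GB/s (1 GB = 10**9 bytes) for one cache link."""
    return bytes_per_clock * clock_ghz


if __name__ == "__main__":
    for cache in (L1I, L1D, L2):
        print(f"{cache.name}: {cache.sets()} sets of {cache.ways} ways")
    # 32 bytes per clock at the 3 GHz demo clock
    print(f"Peak per-link bandwidth: {peak_bandwidth_gb_s(32, 3.0):.0f} GB/s")
```

These are theoretical peaks under the stated assumptions; sustained numbers on real silicon will depend on latency, queue depths, and final clockspeeds.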

AMD seems to have rebranded “modules” from the Bulldozer generation to “Complexes” with Zen.  Each complex is composed of four cores, their private L2 caches, and the larger 8 MB L3 cache.  AMD considers the cache structure “mostly exclusive”, but it appears as though the contents of the lower two caches are replicated in the larger 8 MB L3.  This lowers the overall effective cache size but makes accesses simpler.  When Core 0 needs to access data written to Core 1’s L2 cache, it merely has to reach out to the L3 cache and the address where the contents of Core 1’s L2 are stored.  Latency to each slice of L3 is nearly identical for each core.
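The effect on effective cache size can be bounded with some quick arithmetic.  The sketch below compares the two extremes for a single complex: an L3 that holds copies of everything in the four L2 caches, versus a strictly exclusive L3.  A “mostly exclusive” design would land somewhere in between.  L1 is ignored for simplicity, and the function name is hypothetical.

```python
KB = 1024
MB = 1024 * KB

CORES_PER_COMPLEX = 4
L2_PER_CORE = 512 * KB
L3_PER_COMPLEX = 8 * MB


def effective_capacity(l3_holds_l2_copies: bool) -> int:
    """Unique bytes one complex can cache in L2 + L3 (L1 ignored).

    Hypothetical helper covering only the two bounding cases.
    """
    l2_total = CORES_PER_COMPLEX * L2_PER_CORE
    if l3_holds_l2_copies:
        # Every L2 line is duplicated in L3, so L3 bounds the unique data.
        return L3_PER_COMPLEX
    # Strictly exclusive: L2 and L3 never hold the same line.
    return L3_PER_COMPLEX + l2_total


if __name__ == "__main__":
    replicated = effective_capacity(True) // MB
    exclusive = effective_capacity(False) // MB
    print(f"L2 contents replicated in L3: {replicated} MB per complex")
    print(f"Strictly exclusive L3:        {exclusive} MB per complex")
```

In other words, full replication would give up at most 2 MB of unique capacity per complex in exchange for the simpler lookup path described above.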

AMD implements their own style of SMT with Zen to optimize performance by interleaving instructions to fill up what would usually be empty cycles and bubbles in the pipeline.  I am unaware of the differences between AMD and Intel’s respective implementations, but AMD has gone into more depth about how resources inside each core are shared and partitioned out for the two individual threads.

AMD is introducing instructions in Zen that were not previously present in their product stack.  Two of these are not yet supported by Intel (we do not know whether Intel will embrace them or ignore them as it did FMA4).

Zen is a large departure for AMD considering where they are coming from with the Bulldozer architecture.  Previously my best guess was that it would land at an IPC level matching that of the Intel i7 3000 series.  It now appears that AMD is closer to the latest generation of Intel parts, but we cannot be sure until third-party reviewers have product in hand.  AMD has embraced IPC while still focusing on multi-threaded capabilities.  This rebalancing of priorities has allowed AMD to offer good potential performance across many applications.  The architecture scales in terms of both power efficiency and the number of “complexes” per die, which should allow AMD to apply this core technology from low-end APUs up to high-end server chips (four core/eight thread to 16 core/32 thread).

On paper the architecture looks like a successful one.  There are a lot of good ideas and features packed into a design that is also flexible.  It could very well be a true successor to the Athlon 64 and give Intel a run for its money.  There are hurdles standing in AMD’s way, though.  AMD still relies upon GLOBALFOUNDRIES for the majority of their fabrication.  GF has not had the best track record in getting new, next-generation processes off the ground in a timely manner.  AMD currently uses GF’s 14nm LPP process for their Polaris GPUs and yields have been characterized as “good”.  Not “great” or “outstanding”, but “good”.  GF is still improving their lines, but AMD also has contingencies in place to tap Samsung for wafer starts if yields and bins from GF are not good enough.

The first public demonstrations had the 8 core Zen CPU running at 3 GHz.  Lisa Su said that this is not the final clockspeed and that it will be increased for the retail products.  We do not know how much more they can squeeze out of these chips and this process line by the end of the year, but we can assume that production wafer starts have already commenced.  I do not expect miracles here, but as time goes on the foundry engineers can adjust properties of production to hopefully squeeze out more clockspeed over the next four months.  A Zen CPU running at 3.5 GHz with a 3.8 GHz boost would be a solid competitor to Intel’s Broadwell-E line.  While we will certainly see only limited numbers of Zen late this year, AMD looks to go full bore on production and have plenty of product available in 2017.  While we do not know if Zen will be a runaway success for AMD, we do know that it is a big step forward from their previous architecture.
