While he was the director of research and development at Fairchild Semiconductor, Gordon E. Moore predicted that the number of components in an integrated circuit would double every year. Later, he would revise this time-step to every two years; you will occasionally hear people cite eighteen months too, but I am not sure who derived that number. A few years after the prediction, he went on to found Intel with Robert Noyce, and the company now spends tens of billions of dollars annually to keep up with the prophecy.
It has worked out for the most part, but we have been running into physical limits over the last few years. One major issue is that, with process technology dipping into the single- and low double-digit nanometers, we are running out of physical atoms to manipulate. The spacing between silicon atoms in a solid at room temperature is about 0.5 nm, so a 14 nm product has features spanning only about 28 atoms, give or take a few for rounding.
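To put numbers on that back-of-the-envelope estimate, here is a minimal sketch; the 0.5 nm spacing is the rough figure used above rather than a precise lattice constant, and node names are treated loosely as feature widths.

    # Rough atoms-per-feature estimate, matching the paragraph above.
    # Assumption: ~0.5 nm between silicon atoms; node names treated loosely
    # as feature widths, which is only approximately true in practice.
    ATOM_SPACING_NM = 0.5

    def atoms_across(feature_nm):
        """Approximate number of silicon atoms spanning a feature of the given width."""
        return feature_nm / ATOM_SPACING_NM

    for node in (28, 22, 14, 10, 7):
        print(f"{node} nm feature: ~{atoms_across(node):.0f} atoms across")
    # 14 nm works out to roughly 28 atoms, as noted above.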
Josh has a good editorial that discusses the implications of this, with a focus on GPUs.
It has been a good fifty years since the start of Moore's Law, and humanity has been developing plans to cope with the eventual end of silicon lithography process shrinks. We will probably transition to smaller atoms and molecules first, and later consider alternative technologies like photonic crystals, which route light at frequencies in the hundreds of terahertz through a series of waveguides that make up an integrated circuit. Another interesting thought: will these technologies fall in line with Moore's Law in some way?
Interesting with Moore's law: it was not written up in research until the mid-'60s, in Electronics, and even the data in that paper only starts at 1959.
I only spent a half hour on research, so not much time, but I did not find any quick information to support Moore's law starting in 1953.
So I hope someone can come to the rescue of this poor researcher and show me the way of truth.
50 years old today means it had to start in 1965; remember, it's 2015, not 2003.
Damn. I’m really late for work then.
Interesting. I'm 59; guess I counted backwards... happens in old age, and from not being able to drink beer at work.
Well, there go the affordable planar die shrinks. It looks like going to high-density libraries can eke out some extra space at whatever smaller planar nodes remain; maybe 7nm is the affordable limit. FinFET is a start in the Z axis, giving a few more atoms to a transistor that is becoming smaller in the X and Y to the point of running out of atoms. With memory already starting to be stacked, it looks like that will be the only way for logic circuitry too, so maybe the only way to go is up for greater density, or CPU dies will have to become larger and fewer will fit on a wafer. I guess the processor's cache memory will be the first part of the CPU die to start getting stacked, before the other parts start being designed stacked as well; I'll bet there will be a lot of die stacking of cache/other memory on the top levels of the CPU, or on interposer modules, to free up more die space for CPU logic. Moving away from silicon and the silicon process is going to cost trillions, as all of the old processes will have to be changed, and it took decades and trillions to get the silicon processes where they are today.
That's basically what Moore's law/observation was about in the first place: the economics of providing that doubling of circuit counts over a period of time, and the continued economic viability of keeping up the pace of miniaturization. Even though Moore's paper did not explicitly state the exact time period (18-24 months), the mathematics that Moore used was all that was needed, whatever the exact number came to be. Now that the process shrinks are running out of atoms at the planar nodes, 3D stacking will have to be used to keep getting value out of the cost-efficient silicon process infrastructure; those new materials processes, and optical, are going to be very expensive.
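For what it's worth, the doubling math itself is a one-liner; here is a minimal sketch, where the two-year period and the unit baseline are illustrative assumptions rather than anything fixed by Moore's paper.

    # Compounding model of Moore's observation: component count doubles every
    # `period_years`. The two-year period and the baseline count are
    # illustrative assumptions for this sketch, not figures from Moore's paper.
    def components(baseline_count, years_elapsed, period_years=2.0):
        return baseline_count * 2 ** (years_elapsed / period_years)

    print(components(1, 50))        # 2**25, about a 33-million-fold increase over 50 years
    print(components(1, 50, 1.5))   # roughly 1.1e10 if you assume the eighteen-month figure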
“I guess the processor's cache memory will be the first part of the CPU die to start getting stacked, before the other parts start being designed stacked as well; I'll bet there will be a lot of die stacking of cache/other memory on the top levels of the CPU, or on interposer modules, to free up more die space for CPU logic.”
Intel's dual-core Broadwell CPU at 14 nm is 82 square mm, and this includes a GPU, system agent, memory controllers, and PCIe controllers. The GPU appears to take more than half of the die. When I tried to figure out the size of an individual core from a picture, I came out with around 12 square mm, and this included the L2 cache and a 2 MB L3 slice. Just google Broadwell die images. For reference, a high-end GPU is 400 to 600 square mm. We do not need to free up more die space for CPU logic. Single-threaded, non-SIMD performance does not scale easily; you can throw a lot more hardware at it and not achieve much of any performance improvement. This is why we have 8-core CPUs in the consumer space now.
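Taking those estimates at face value, the two cores plus their cache come out to roughly 30 percent of the die; here is a quick sketch, using only the rough eyeballed-from-die-photo numbers quoted above rather than official figures.

    # Rough die-area split from the estimates above: an ~82 mm^2 dual-core
    # Broadwell with ~12 mm^2 per core (including L2 and an L3 slice).
    # These are eyeballed estimates, not official figures.
    die_mm2 = 82.0
    core_mm2 = 12.0
    cores = 2

    cpu_fraction = cores * core_mm2 / die_mm2
    print(f"CPU cores + cache: ~{cpu_fraction:.0%} of the die")              # ~29%
    print(f"Everything else (GPU, system agent, I/O): ~{1 - cpu_fraction:.0%}")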
What do we need more single-threaded CPU performance for? Things like video encode/decode perform much better on GPUs or specialized hardware. Integrating the GPU is obviously happening, and we are also using the available die space to integrate more special-purpose hardware like video encoders. Stacked memory isn't there to make more room for CPU logic; it does allow you to integrate the GPU and still have GPU-like memory bandwidth, though.
If you want higher single-threaded performance, then the extra execution pipelines per core are going to need more transistors, as each additional execution port/pipeline, larger reorder buffer, or better prediction logic requires more transistors. Do not discount extra cores, now that the Vulkan and DX12 graphics APIs can make use of all the processor cores/threads available. Stacked memory (HBM) sits on interposers, and yes, the processor's cache memory takes up space on the CPU die; if the cache can be stacked on top of the die and connected by a wide, high-bandwidth bus using through-silicon vias, then there will be more space on the CPU's die for logic, or for more graphics resources. I personally think that silicon interposers will be used, and the CPU and GPU may be fabbed separately, with the CPU and GPU connected through extremely wide buses (4096-bit+) to each other and to stacked HBM/other memory.
This will allow for higher yields and not make the whole die useless if the CPU or the GPU has defects. The extra-wide bus, with the CPU placed right next to the GPU on an interposer, will allow for plenty of low-latency, high-bandwidth communication between the CPU, GPU, and stacked HBM memory. The extra-wide buses that will be appearing on some GPUs will allow for lower-clocked, more power-efficient memory controllers while still providing more than enough bandwidth. Expect to see silicon interposers and lots of stacking in the future, simply because large monolithic dies are more prone to defects, and fabbing the components separately and placing them right next to each other on an interposer will be the more economically viable solution while providing the same bandwidth as if the components were on a single larger monolithic die.
Interposers with multi-thousand-bit-wide buses will be more common for CPUs (connected to HBM) and for GPUs/HBM, and expect average bus widths to only increase in the future as stacked HBM takes over the role of hosting the OS/essential code and DDR4 RAM is relegated to more secondary uses, like staging large datasets. For each doubling of bus width, the memory controller's clock could be halved and still provide the same effective bandwidth.
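A toy calculation of that width-versus-clock tradeoff; the widths, clocks, and double-data-rate assumption below are made-up illustrative numbers, not specs of any real part.

    # Effective bandwidth scales with bus_width * clock, so doubling the width
    # lets you halve the memory controller clock for the same throughput.
    # The widths, clocks, and double-data-rate assumption are illustrative only.
    def bandwidth_gb_s(bus_width_bits, clock_mhz, transfers_per_clock=2):
        return bus_width_bits / 8 * clock_mhz * 1e6 * transfers_per_clock / 1e9

    print(bandwidth_gb_s(1024, 1000))  # 256.0 GB/s
    print(bandwidth_gb_s(4096, 250))   # 256.0 GB/s -- 4x the width at 1/4 the clock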
Those specialized encoders/decoders could also start to be fabbed separately and connected via interposers, or most likely be included on the GPU die. Expect the use of interposers to become more common as CPU designs begin to offer more cores with more per-core execution pipelines (for higher single-threaded performance), but also expect solutions like Soft Machines' VISC virtual-core architecture to be licensed by the makers of CPUs/SoCs.
“Silicon proven VISC™ architecture delivers 3-4x IPC advantage on single and multi-threaded applications without software changes, resulting in ~2-4x performance/watt advantage” (1)

(1) Soft-Machines-VISCtm-Architecture-Tech-Briefing-vF.pdf
Extra execution pipelines do not increase performance for single-threaded code. There is limited ILP to extract, and you quickly make other parts of the system the bottleneck. CPU cores have gotten tiny. Broadwell is 82 mm² at 14 nm, with the CPU portion only taking about 20 to 25 mm². An Nvidia Titan X is 600 mm². If you look at a die photo with the individual units labeled, besides cache, the largest amount of area is taken by the FP units. FP takes a lot of die area, and this is compounded by these being wide vector units (MMX, SSE, AVX, etc.). If you were to pull out the vector units and supporting hardware, the core size would be ridiculously small. This is why AMD can sell an 8-core (with each 2-core module sharing an FP unit) for around $150. These processors already have a large out-of-order window. Increasing out-of-order resources or speculative execution causes power consumption to explode with little performance gain.
Bottom line, extra die area is not needed for CPU cores. Most of the extra die area will be taken up by GPU resources, since GPUs are massively parallel and can actually take advantage of the die area. Stacked memory will mostly be there to support the GPU, not the CPU. Most stacked memory technology is DRAM, which allows for high bandwidth but is still not low latency. Since it is not low latency, it cannot directly replace an on-die SRAM CPU cache.
It would be interesting if they could pull the vector units back out of the CPU and replace them with a simple scalar FP unit. If you have an on-die GPU, it has a lot more processing power than CPU vector units. I don't know if it is reasonable to pass vector CPU code off to the GPU, but it may be possible for HSA systems (AMD's Heterogeneous System Architecture).
Also, do not assume that interposers will be cheap. They add several manufacturing steps and a lot of opportunities for things to fail (micro-solder bumps, stacked vias, etc.). There can also be issues with economic feasibility. If you design a GPU made to work with multiple other GPUs, or a multi-chip "single GPU", will these chips still be usable for the mid-range or lower-end markets? There is a good chance that such a design would only be useful for extreme high-end parts and would not be high enough volume to be economically feasible. IBM has made MCMs (multi-chip modules) for its high-end processors for a long time, but these are ridiculously expensive.
Stacking chips may work okay for low-power devices in smartphones, but it would be an issue for a CPU, and a serious issue for a GPU using a couple hundred watts. The memory chips would act as an insulator and block heat dissipation.
There could be a lot of different issues. I have to wonder how big of an issue power density is becoming. Your average processor has been consuming around 80 W or so for a long time. Intel has pushed clock speeds up very high again (4 GHz), but all of that power is being used in a tiny area of the die. GPUs may have similar issues; although they generally have a much larger die, the power density may be similar or even higher due to high-density libraries. This is a separate issue from the total amount of power. We have giant air coolers that can dissipate a lot of heat, but these do not provide the same temperature differential that a water cooler does. At too high a power density, you would need to use water cooling, or even active cooling at some point, even if the total power consumption is not that high.
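As a rough sketch of the comparison being made here: these are die-level averages only, the wattages and areas are the estimates floated in this thread rather than measured values, and local hot spots in either die can be considerably higher.

    # Die-level average power density from the rough numbers in this thread.
    # Wattages and areas are illustrative estimates; real hot-spot densities
    # within either die can be considerably higher than these averages.
    def power_density_w_per_mm2(watts, area_mm2):
        return watts / area_mm2

    print(power_density_w_per_mm2(80, 25))    # ~3.2 W/mm^2 if ~80 W lands in ~25 mm^2 of CPU cores
    print(power_density_w_per_mm2(250, 600))  # ~0.4 W/mm^2 averaged over a 600 mm^2, 250 W GPU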
In addition to transistor size limits, we may be pushing up against the actual limits of photolithography. They have done all kinds of things to get around the fact that they are trying to make structures much smaller than the wavelength of light used in the process. Correct me if I am wrong, but I believe they are still using 193 nm ultraviolet light for a "28 nm" process (or "14 nm" in Intel's case). This was predicted to be a problem a long time ago, and they have always found ways around it. This can't go on forever, and alternatives like EUV or X-ray lithography do not seem to be working. At some point they will not be able to produce high enough yields for it to be economically feasible.
“Moore's Law” may not have slowed down much as far as the latest and greatest is concerned, but demand has definitely slowed. I know a lot of people using pretty old hardware for gaming. I still mostly use an extremely old MacBook Pro from 2006: a 2.16 GHz Core 2 Duo. It still works fine for web surfing and playing YouTube video. The main limitations for doing anything else are 3 GB of RAM and 128 MB of Radeon X1600 graphics. I do have an SSD in it, though; without that, it might get pretty sluggish, especially when getting close to the memory limit.
What all this “Moore's law” hype really means is that, going forward, you can expect 5-year CPU cycles and 2-year GPU cycles to be the norm for any improvements higher than single-digit percentages.
Wellllllll, it needs to be said that “Moore's Law” became self-fulfilling, a goal and nothing more. I'm glad, because without Moore we probably wouldn't have advanced at this speed, but it's a declaration of goals, not an observed law of nature.