Everything, literally everything, we design today is defined by its power consumption, said Chris Malachowsky, an NVIDIA co-founder, fellow, and senior VP of research. Malachowsky spoke yesterday at a luncheon during the ICCAD conference held this week in San Jose, California. At the low end of the system spectrum, mobile devices are defined by how much you can do with a watt. At the high end, supercomputers and supercomputer performance are now defined by how much electricity you can afford. Malachowsky joked that supercomputers now use so much electricity that the local power company is giving them away for free when you sign up for a 2-year service contract, just as carriers subsidize mobile phone handsets here in the US. That’s funny enough to hurt.
Coincidentally, NVIDIA makes chips that go into both wireless handsets at the low end (the company’s Tegra series of processors) and supercomputers at the high end (the company’s Tesla series of GPU—graphics processing unit—processing chips). In between are the original NVIDIA products, the GeForce and QUADRO series of graphics chips and boards. Here’s a graphic of the NVIDIA product line from Malachowsky’s talk:
Tegra mobile application processors go into mobile handsets that cost roughly $100 but deliver about 2x the CPU performance and 4x the GPU performance of a PC that sold 10 years ago for about $3200. Here are the specs for comparison:
The quest for performance isn’t going to stop in the handset space so NVIDIA has a roadmap for future processors. The product currently on the market is the Tegra 2 and NVIDIA has already previewed the next step up, code-named Kal-El (Superman’s original name on Krypton). (Note: for more information on Kal-El, see my blog posts “Processor Wars: NVIDIA reveals a phantom fifth ARM Cortex-A9 processor core in Kal-El mobile processor IC. Guess why it’s there?” and “Friday Video: Why do you need four ARM Cortex-A9 processor cores in a mobile processor SoC?”)
Here’s an NVIDIA roadmap for Tegra processors from Malachowsky’s talk:
Note that NVIDIA likes to use superhero alter-ego names for future Tegra processors.
One thing that’s not progressing quickly on the mobile handset front is battery capacity. Batteries are just not getting better as fast as we’re adding transistors to silicon die thanks to Moore’s Law. As a result, the Tegra processors, like all mobile application processors, are constrained by the amount of power available in a handset.
On the supercomputer front, NVIDIA Tesla GPU chips already power three of the five fastest supercomputers in the world: the Tianhe-1A, the Titan (evolved from the x86 Jaguar), and the Nebulae. These supercomputers use a lot of processors, as you can see from this image:
NVIDIA chips are in supercomputers at all because researchers and students recognized that NVIDIA’s evolving line of graphics chips contained a lot of parallel processing power. If certain tough math problems and algorithms could be re-expressed to look like problems of drawing and shading triangles, then GPUs could be pressed into service for these other sorts of problems. This conceptual leap resulted in the development of the NVIDIA Tesla line of supercomputing GPU chips.
However, supercomputers are also constrained by power. The limit is not how much power is available (it takes megawatts to run a supercomputer) but how much power is affordable. And don’t forget, for every megawatt needed to power the supercomputer, you need a comparable amount of power to cool it.
Even the US Department of Energy (DOE) is concerned. It recently put out a Request for Information (RFI) to find out how we might build a 1-Exaflop (an Exaflop is a billion Gigaflops) supercomputer that “only” consumes 20 MW (!!!). On the current commercial trajectory, with no extra DOE help, we will eventually be able to build an Exaflop supercomputer, but it will consume four or five times that much power, said Malachowsky.
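To get a feel for what the DOE target implies, here is a back-of-the-envelope sketch in Python. It uses only the figures quoted above (1 Exaflop, 20 MW, and the "four or five times" multiplier). The function and constant names are mine, not anything from the talk or the RFI.

```python
# Back-of-the-envelope arithmetic for the DOE target quoted above.
# All names here are illustrative; only the 1-Exaflop, 20 MW, and
# "four or five times" figures come from the article.

EXAFLOP = 1e18          # floating-point operations per second
POWER_BUDGET_W = 20e6   # 20 MW expressed in watts


def flops_per_watt(flops: float, watts: float) -> float:
    """Sustained operations per second delivered for each watt consumed."""
    return flops / watts


def joules_per_flop(flops: float, watts: float) -> float:
    """Average energy available for each floating-point operation."""
    return watts / flops


target = flops_per_watt(EXAFLOP, POWER_BUDGET_W)
budget = joules_per_flop(EXAFLOP, POWER_BUDGET_W)
print(f"DOE target: {target / 1e9:.1f} Gigaflops per watt")
print(f"Energy budget: {budget * 1e12:.1f} picojoules per operation")

# The "current commercial trajectory" case: same performance, 4x or 5x the power.
for factor in (4, 5):
    watts = POWER_BUDGET_W * factor
    efficiency = flops_per_watt(EXAFLOP, watts)
    print(f"{factor}x power ({watts / 1e6:.0f} MW): "
          f"{efficiency / 1e9:.1f} Gigaflops per watt")
```

In other words, the 20 MW goal works out to about 50 Gigaflops per watt, or roughly 20 picojoules for every operation, including all the data movement that goes with it.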
Why do we need an Exaflop supercomputer? Because simulation has replaced the wet lab, said Malachowsky. Science, all science, needs simulation and the more the better. The more the faster. As Malachowsky said in his talk, science needs 1000x more computing (but without 1000x the power consumption) because simulation or “computational science” has become the third pillar of science.
(Note: Theory and Experimentation are the first two pillars. Yeah, I didn’t know that either, but I have it on the authority of the President’s Information Technology Advisory Committee.)
And get this: no more lazy, lazy processor or system architects. The “process fairies” aren’t working as hard as they used to, said Malachowsky. Oh sure, they’re still bringing us 2x the transistor count with each new IC process step just like Gordon Moore promised way back in 1965. Sure, the process fairies are keeping that promise. But poor Dennard. His observation about power and speed scaling with lithographic geometry—that’s dead. It died at 90nm. Party’s over.
So what? Here’s what. We’re going to have to rethink our approaches to getting more processing performance using less power. Scaling is out and here’s the graphic proof from Malachowsky’s talk:
Without architectural innovation, the average annual rate of processor performance improvement appears to be dropping from 52% to 20%. Architecture is in and we’ve got to get smarter because the process fairies aren’t working as hard as they used to.
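The gap between 52% and 20% per year looks modest until you compound it. Here is a small Python sketch that does the compounding and asks how long each rate would take to deliver the 1000x that Malachowsky says science needs. The two rates and the 1000x goal come from the talk as reported above; the function names and the choice of time horizons are my own.

```python
import math

# Annual improvement rates quoted in the article.
HISTORICAL_RATE = 0.52   # 52% per year
SLOWER_RATE = 0.20       # 20% per year


def cumulative_gain(annual_rate: float, years: float) -> float:
    """Total performance multiplier after compounding for `years` years."""
    return (1.0 + annual_rate) ** years


def years_to_reach(multiplier: float, annual_rate: float) -> float:
    """Years of compounding needed to reach a given performance multiplier."""
    return math.log(multiplier) / math.log(1.0 + annual_rate)


for years in (5, 10):
    fast = cumulative_gain(HISTORICAL_RATE, years)
    slow = cumulative_gain(SLOWER_RATE, years)
    print(f"After {years} years: {fast:.1f}x at 52% vs {slow:.1f}x at 20%")

for rate in (HISTORICAL_RATE, SLOWER_RATE):
    print(f"1000x at {rate:.0%} per year takes "
          f"{years_to_reach(1000, rate):.1f} years")
```

Over a decade, that is the difference between roughly a 66x gain and roughly a 6x gain. At the slower rate, 1000x is nearly four decades away instead of about a decade and a half.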
What can we do? Well, one approach is already evident in the design of the multicore NVIDIA Kal-El mobile application processor. The Kal-El chip contains five ARM Cortex-A9 processor cores. Although all five cores are architecturally similar, one of them is synthesized for low-power operation. The other four, which are identical to one another, are synthesized for maximum performance and consequently draw more power. When the Kal-El chip has a lot of work to do, one or more of the high-performance cores is operating. When there’s just a little work to do, the operating system transfers the workload to the low-power core and shuts down all four of the high-performance cores. The Android OS already knows how to do this.
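Here is a toy Python sketch of that switching policy, just to make the idea concrete. It is emphatically not NVIDIA’s or Android’s actual algorithm. The load metric, the 0.25 threshold, the rule that the low-power core is off whenever a fast core is on, and every name in the code are assumptions of mine. Only the core counts and the basic "light load goes to the low-power core" behavior come from the description above.

```python
import math
from dataclasses import dataclass

NUM_FAST_CORES = 4          # four high-performance cores, per the article
LOW_LOAD_THRESHOLD = 0.25   # hypothetical cutoff, in units of one fast core


@dataclass(frozen=True)
class CoreState:
    """Which cores are powered on (illustrative model only)."""
    low_power_core_on: bool
    fast_cores_on: int


def select_cores(load: float) -> CoreState:
    """Pick a core configuration for a given load.

    `load` is a hypothetical measure of demanded work, expressed as a
    multiple of what one high-performance core can deliver.
    """
    if load < 0:
        raise ValueError("load must be non-negative")
    if load <= LOW_LOAD_THRESHOLD:
        # Light load: run on the low-power core, all fast cores go dark.
        return CoreState(low_power_core_on=True, fast_cores_on=0)
    # Heavier load: wake just enough fast cores, capped at four.
    # Assumption: the low-power core is shut off in this mode.
    needed = min(NUM_FAST_CORES, math.ceil(load))
    return CoreState(low_power_core_on=False, fast_cores_on=needed)


for load in (0.1, 0.8, 2.3, 10.0):
    print(load, select_cores(load))
```

A real governor would also need hysteresis so that the chip doesn’t thrash between configurations when the load hovers near a threshold, but the basic shape is the same: match the powered-on silicon to the work at hand.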
Kal-El’s low-power “companion” ARM Cortex-A9 core is an example of an emerging SoC design style called “dark silicon.” Fortunately, dark silicon is much easier to understand than dark matter or dark energy. Dark silicon simply describes sections of an SoC that are shut down and powered off. In earlier days when there weren’t enough transistors to go around, letting a piece of silicon go dark was unthinkable. In fact, we loaded up a processor with as much work as it could do and perhaps even a little more if we needed to push things. Dark silicon? Fugetaboutit. But now in the multicore era, we’re getting quite used to the idea.
However, dark silicon isn’t going to save us by itself. We need to get smarter about what we do inside of a single core as well, said Malachowsky. We’re going to have to get smart about the energy cost of everything we do inside of a processor core. Malachowsky didn’t directly explain what this means but he did provide a clue.
Here’s a table from Malachowsky’s presentation that shows the energy cost of typical processor transactions:
Note the pattern in this table. The energy costs for moving operands around on chip are comparable to those for performing a computation. This ratio actually gets worse for data movement as lithographic scaling progresses because gates get smaller but the average wire length and cross-sectional resistance get larger. The energy cost for moving an operand on or off chip is higher still. It takes power to wiggle those printed-circuit board traces.
As I said, it was just a clue.
One of the most interesting parts of Malachowsky’s talk for me was where the funding for this architectural research will come from. I would never have guessed.
Video games.
That would be the two middle NVIDIA product lines shown in the first image in this blog post: GeForce and QUADRO. It seems that the video gaming market is pretty big, about $35 billion per year. That’s bigger than the movie market (and way bigger than EDA). Hard-core gamers will apparently pay handsomely for architectural advances as long as those advances let them shoot faster.
So when we cure cancer, you can thank a gamer. Meanwhile, give some thought to Malachowsky’s words. There are a lot of really sharp ideas for designers of low-power systems in this presentation.






