I attended DATE (Design, Automation and Test in Europe) this month in Grenoble and was fascinated by Steve Furber’s keynote, titled “Biologically-inspired massively-parallel architectures—computing beyond a million processors.” Furber’s introductory remarks clearly laid out what has happened to the energy cost per executed instruction over the past 60 years, and what’s likely to happen in the future. Strike that: make it “what’s got to happen.” In case you didn’t know, Furber was the principal designer of the original ARM processor, back when “ARM” stood for “Acorn RISC Machine.” Acorn was a leading UK personal computer maker, and in the early 1980s it decided it needed its own microprocessor. The rest, as they say, is history. Acorn is gone. ARM is here, big time.
But back to Furber. Today, he’s the ICL Professor of Computer Engineering in the School of Computer Science at Manchester University, UK, and his CV sports a long list of impressive achievements. Let’s just say he’s been busy since leaving ARM. These days, he and his group at Manchester University are building digital hardware that emulates biological brain functions; in essence, digital analogs of neural networks. Now, electronic neural networks aren’t something new. I can remember discussing them when I was a college freshman. That was 1971. Not new. Not recent.
The Manchester University team is developing an SoC with a “massively” parallel network of eighteen ARM 968 RISC processors, all interconnected through a Silistix self-clocked network on chip (NoC). Furber had a hand in the early development of this NoC, also at Manchester University. (See, he’s been busy, like I said.) The project is called SpiNNaker (http://apt.cs.man.ac.uk/projects/SpiNNaker/).
Now, there’s a reason for repeatedly emphasizing Furber’s connections to Manchester University, and he discussed it in his keynote. Any serious discussion of the history of computing must include the Manchester Small-Scale Experimental Machine, nicknamed “Baby,” which was the first electronic stored-program digital computer to run a program. Baby executed its first program in June 1948. ENIAC, developed at the Moore School of Electrical Engineering at the University of Pennsylvania and usually called the first fully electronic computer, was operational two years before the Manchester Baby. But ENIAC was initially programmed by physically rewiring it with patch cables and setting switches. ENIAC was later modified to run programs held in its function tables, but the Manchester Baby was the first machine to run a program stored in its own electronic memory.
When operational, the Manchester Baby executed roughly 800 instructions per second. That was a heck of a lot faster than the mechanical calculators and punched-card equipment of the day, but it’s laughably slow compared to today’s processors. (Even the Intel 4004, the world’s first commercial microprocessor, introduced in 1971, executed up to about 92,000 instructions per second.) More to the point for the purposes of this blog, the Manchester Baby consumed approximately 5 joules of energy to execute each instruction.
Fast forward to today and those ARM 968 processors in the SpiNNaker chip. An ARM 968 core executes roughly 20 million instructions per second while dissipating 10^-10 joules (0.1 nanojoule) per instruction. In other words, the energy needed to execute a machine instruction has dropped by a factor of about 50 billion (5 J ÷ 10^-10 J) in roughly 60 years.
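For readers who want to check the arithmetic, here is a minimal Python sketch that uses only the two per-instruction energy figures quoted above and a rounded 60-year span. The yearly factor and halving time are my own derived quantities, and they assume, purely for illustration, that the improvement compounded steadily every year, which it certainly did not in practice.

```python
import math

# Figures quoted in the text.
BABY_JOULES_PER_INSTR = 5.0        # Manchester Baby, 1948
ARM968_JOULES_PER_INSTR = 1e-10    # ARM 968 core in the SpiNNaker chip
YEARS = 60                         # rounded span used in the text

def improvement_factor(old_j: float, new_j: float) -> float:
    """Ratio of old to new energy per instruction."""
    return old_j / new_j

def annual_rate(factor: float, years: float) -> float:
    """Constant yearly improvement that compounds to `factor` over `years`."""
    return factor ** (1.0 / years)

def halving_time_years(factor: float, years: float) -> float:
    """Years for energy per instruction to halve, assuming steady compounding."""
    return years * math.log(2) / math.log(factor)

factor = improvement_factor(BABY_JOULES_PER_INSTR, ARM968_JOULES_PER_INSTR)
print(f"improvement factor: {factor:.1e}")                             # 5.0e+10
print(f"yearly factor: {annual_rate(factor, YEARS):.2f}")              # 1.51
print(f"halving time: {halving_time_years(factor, YEARS):.1f} years")  # 1.7
```

Under that steady-compounding assumption, a factor of 50 billion in 60 years works out to roughly a 1.5× improvement every year, or a halving of the energy per instruction about every 20 months.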
Now the old, worn comparison usually asks you to consider what the world would be like today if automobile manufacturers had improved the energy consumption of their products by a factor of 50 billion in 60 years. That’s not the point here.
Furber’s point is this: if the energy cost per instruction had not improved by such a huge amount since 1948, this world would be a very different place. There would be no cell phones, no iPads, no personal computers, no personal music players, and very few embedded systems of any sort. These would simply be impractical for reasons of all three “P”s: price, performance, and power.
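A quick way to see why is to multiply energy per instruction by instruction rate, which gives power. The sketch below uses only figures from this post (5 J and 10^-10 J per instruction, 800 instructions per second for the Baby, eighteen cores at roughly 20 million instructions per second each) and ignores memory, I/O, and everything else on a real chip, so treat the results as order-of-magnitude illustrations only.

```python
# Power (watts) = energy per instruction (joules) x instructions per second.
BABY_J = 5.0                      # Manchester Baby, joules per instruction
ARM968_J = 1e-10                  # ARM 968, joules per instruction
BABY_RATE = 800                   # Baby, instructions per second
CORES = 18                        # ARM 968 cores per SpiNNaker chip
INSTR_PER_SEC_PER_CORE = 20e6     # ARM 968, instructions per second per core

def power_watts(joules_per_instr: float, instr_per_sec: float) -> float:
    """Average power needed to sustain a given instruction rate."""
    return joules_per_instr * instr_per_sec

chip_rate = CORES * INSTR_PER_SEC_PER_CORE    # 3.6e8 instructions per second

print(power_watts(BABY_J, BABY_RATE))         # the Baby itself: about 4 kW
print(power_watts(ARM968_J, chip_rate))       # SpiNNaker-class chip: about 0.036 W
print(power_watts(BABY_J, chip_rate))         # same workload at Baby energy: about 1.8e9 W
```

Running one chip’s worth of instructions at 1948 energy costs would take on the order of a couple of gigawatts, which is the “power” P in a nutshell.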
We have relied almost exclusively on Moore’s Law to get to this point.
That ride’s over.
At today’s bleeding-edge IC process node, 28nm, we’re working at nearly atomic dimensions. Some layers are only a handful of atoms thick. A transistor’s channel now holds so few dopant atoms that their number and placement no longer average out statistically; each transistor gets its own random sample. The resulting on-chip parametric variability is becoming a very real problem that forces physical designers to use ever-larger guard bands. Speed and power gains are shrinking from one IC generation to the next. We have arrived at the point of rapidly diminishing returns, and we’re clearly not getting another factor of 50 billion improvement in the energy needed to execute a machine instruction.
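To make the dopant point concrete, here is a rough Python sketch that treats the dopant count in a channel as a Poisson random variable, a common first-order model of random dopant fluctuation. The doping concentration (10^18 atoms/cm³) and both box-shaped channel geometries are hypothetical numbers chosen for illustration, not data for any real process.

```python
import math

NM_TO_CM = 1e-7  # 1 nm = 1e-7 cm

def expected_dopants(conc_per_cm3: float, length_nm: float,
                     width_nm: float, depth_nm: float) -> float:
    """Mean number of dopant atoms in a box-shaped channel region."""
    volume_cm3 = (length_nm * NM_TO_CM) * (width_nm * NM_TO_CM) * (depth_nm * NM_TO_CM)
    return conc_per_cm3 * volume_cm3

def relative_spread(mean_count: float) -> float:
    """Poisson model: std dev = sqrt(mean), so std dev / mean = 1 / sqrt(mean)."""
    return 1.0 / math.sqrt(mean_count)

# Hypothetical geometries, both doped at a hypothetical 1e18 atoms/cm^3.
for label, dims in [("large (250x250x50 nm)", (250, 250, 50)),
                    ("small (28x28x10 nm)", (28, 28, 10))]:
    n = expected_dopants(1e18, *dims)
    print(f"{label}: ~{n:.0f} dopants, spread ~{relative_spread(n):.1%}")
```

With these assumed numbers, the larger channel holds roughly 3,100 dopants and varies by about 2% from device to device, while the smaller one holds fewer than 10 and varies by more than 30%. That is the sense in which the dopants “no longer operate statistically.”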
Yet the guidepost pointing to lower-power operation is frustratingly close and familiar. It sits between your ears. We have chosen to design processors that execute one (or perhaps a few) instructions at a time, but at a very high rate. The higher the better. The brain takes an entirely different approach. It’s a highly parallel machine, where “parallel” means a lot more than 18 processors. The brain contains approximately 10^11 neurons connected by some 10^15 synapses. The neurons are the brain’s processors, and the synaptic connections are its memory and programming.
Neurons are very simple and very slow processors, but there are a lot of them working in parallel.
The entire human brain runs on roughly 20W, less than a typical desktop PC processor, and it runs at something like 100Hz. A PC processor certainly gets a lot of processing done on comparable power, but that output is a drop in the bucket compared to the brain’s audio and visual processing abilities, let alone its capacity for abstract thought. And a conventional processor clocked at 100Hz would get almost nothing done. Our programming models cannot currently accommodate brain-style processing. We do not yet understand parallelism on the brain’s scale.
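The “slow but many” trade-off comes down to simple multiplication: aggregate rate equals number of units times per-unit rate. The Python sketch below uses only figures from this post (10^11 neurons, 10^15 synapses, about 100Hz, eighteen cores at about 20 million instructions per second, and the keynote’s “million processors”). A neuron update and a machine instruction are not equivalent units of work, so the ratios contrast the two design styles and say nothing about relative computing power.

```python
# Aggregate event rate = number of units x events per second per unit.
NEURONS = 1e11
SYNAPSES = 1e15
NEURON_RATE_HZ = 100.0      # rough brain "clock" rate from the text
CORES = 18                  # ARM 968 cores per SpiNNaker chip
CORE_RATE = 20e6            # instructions per second per ARM 968 core

def aggregate_rate(units: float, rate_per_unit: float) -> float:
    """Total events per second from many units working in parallel."""
    return units * rate_per_unit

brain = aggregate_rate(NEURONS, NEURON_RATE_HZ)    # 1e13 neuron updates/s
chip = aggregate_rate(CORES, CORE_RATE)            # 3.6e8 instructions/s
million = aggregate_rate(1e6, CORE_RATE)           # "a million processors"

print(f"one core vs one neuron: {CORE_RATE / NEURON_RATE_HZ:.0e}x faster")  # 2e+05x
print(f"brain vs one chip, aggregate: {brain / chip:.0e}x")                 # 3e+04x
print(f"a million cores, aggregate: {million:.0e} per second")              # 2e+13
print(f"average synapses per neuron: {SYNAPSES / NEURONS:.0e}")             # 1e+04
```

Each core is about 200,000 times faster than a neuron, yet the brain’s aggregate event rate still exceeds a whole 18-core chip by a factor of tens of thousands. Only at the scale of a million processors does the raw event count reach the same order of magnitude.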
In addition, our processing systems are remarkably intolerant of failure. In most embedded designs, the microprocessor is a single point of failure. The exceptions include majority-voting avionics systems, where a single-point failure usually means death, so we go “massively” parallel with three processors.
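For illustration, here is a minimal Python sketch of the two-out-of-three majority voting mentioned above, together with the textbook reliability formula for triple modular redundancy. It assumes the three processors fail independently and that the voter itself never fails; `majority_vote` and `tmr_reliability` are hypothetical helper names, not any avionics API.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence

def majority_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value reported by a strict majority of channels, else None.
    Expects a non-empty sequence of channel outputs."""
    value, count = Counter(outputs).most_common(1)[0]
    return value if count * 2 > len(outputs) else None

def tmr_reliability(r: float) -> float:
    """Probability that a 2-of-3 system works, if each processor independently
    works with probability r and the voter is perfect:
    all three work (r^3) + exactly two work (3 r^2 (1 - r)) = 3r^2 - 2r^3."""
    return 3 * r**2 - 2 * r**3

print(majority_vote([42, 42, 17]))   # 42: one faulty channel is outvoted
print(majority_vote([1, 2, 3]))      # None: no majority
print(tmr_reliability(0.99))         # about 0.9997, vs 0.99 for one processor
```

Note the limits of the scheme: tolerating one failure costs three processors, the voter itself remains a single point of failure, and if each processor is less than 50% reliable, the voted system is actually worse than a single unit.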
The brain, however, is very tolerant of failure. Our brains lose neurons all the time. In fact, some of us hurry that process a bit by regularly drinking alcohol and killing off a few extra neurons a day. So what? When you’ve got 10^11 neurons, you’re not going to miss a few, and of course the brain doesn’t.
The goal of the SpiNNaker project is to create an early parallel platform that will let brain researchers study a machine that digitally emulates the mechanisms the brain uses to process a wide range of sensory data, to control an incredibly complex system of muscles and organs, to deal with the complex issues of written and spoken language, and to make huge leaps in abstract thought. SpiNNaker will not deliver another factor-of-50-billion leap, but perhaps it will start us down the right path, now that we’ve managed to come this far.