2011 was a great year for low-power design. I don’t think I can remember a year as good to low-power designers. I thought I’d devote this blog to a review of some major developments in 2011 that made low-power designers’ lives easier. In fact, there’s so much to talk about that I’m splitting this blog post in two. In the first half, I’ll write about significant developments in standard silicon offerings including microcontrollers, embedded application processors, and FPGAs. In part B, I’ll discuss some of the year’s most significant developments in design at the silicon level and the implications for people who design ASICs, SoCs, and ASSPs. It truly was a bountiful year.
If there ever was a year for microcontroller advancement, this was it. Every major microcontroller vendor had something new and exciting on the low-power front. There were so many developments that I can only hit the highlights:
In August, ARM’s Alan Rampon wrote a blog post listing 17 microcontroller vendors that were offering a broad range of low-power devices based on various ARM Cortex-M series processor cores. The vendor list includes:
- Analog Devices (Cortex-M3)
- Atmel (Cortex-M3)
- Broadcom (Cortex-M3)
- Cypress Semiconductor (Cortex-M3)
- Dust Networks (Cortex-M3)
- Ember (Cortex-M3)
- Energy Micro (Cortex-M0, M3)
- Freescale Semiconductor (Cortex-M4)
- Fujitsu (Cortex-M3)
- Holtek (Cortex-M3)
- Nuvoton (Cortex-M0)
- NXP (Cortex-M0, M3, M4)
- ON Semiconductor (Cortex-M3)
- Samsung (Cortex-M0, M3)
- STMicroelectronics (Cortex-M3)
- Texas Instruments (Cortex-M3)
- Toshiba (Cortex-M3)
That list is probably somewhat dated already, but you get the idea. The proliferation of low-power microcontrollers greatly accelerated during 2011. One development that really sticks in my mind (because it’s recent) is the start of shipments of the NXP Semiconductors LPC4350, which packs an ARM Cortex-M4 and an ARM Cortex-M0 into one microcontroller that costs less than $4 in quantities of 10,000.
This microcontroller is at the forefront of a new wave of processor design called “asymmetric multiprocessing” and there’s a real “wave-of-the-future” look to this development. (See “More news on the asymmetric processing SoC front”)
The microprocessor is 40 years old (last month!) and silicon microprocessor implementations have really advanced over those four decades while many of our design memes have not. In particular, I’m thinking of the meme that says “processors are expensive, so layer as many tasks as possible on a processor to save money.” The net effect of this meme is to make us develop increasingly complex multitasking schemes in an attempt to get processor utilization up to 80% or 90% or perhaps even 95%.
Now any engineer can tell you that when you load any component to near 100%, you have just sent an engraved, gold-plated invitation to Murphy, asking for an audience. In other words, something will go wrong. You won’t always get the latency you expected. You won’t always get the bandwidth you need.
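To put rough numbers on that invitation to Murphy, here is a minimal sketch that assumes the processor behaves like a textbook single-server (M/M/1) queue. That model is my assumption for illustration, not a measurement of any real system, and the function name and the 1 ms task time are hypothetical. Under that model, mean response time is the service time divided by (1 − utilization).

```python
def mean_response_time(service_time_s, utilization):
    """Mean response time for an M/M/1 queue (an assumed, idealized model).

    service_time_s: average time to run one task on an idle processor.
    utilization:    fraction of time the processor is busy, 0 <= u < 1.
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in the range [0, 1)")
    return service_time_s / (1.0 - utilization)


# Hypothetical task that needs 1 ms of processor time when nothing else runs.
SERVICE_TIME_S = 0.001

for u in (0.50, 0.80, 0.90, 0.95, 0.99):
    t_ms = mean_response_time(SERVICE_TIME_S, u) * 1000.0
    print(f"utilization {u:.0%}: mean response time {t_ms:.1f} ms")
```

The exact figures depend entirely on the assumed model, but the shape of the curve is the point: the last few percent of utilization cost far more latency than the first fifty.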
So you’d better ask yourself: Are complex multitasking systems really worth the effort when I can get two 32-bit microprocessor cores in one device for less than $4? You’d better be serious coming up with that answer. I believe that asymmetric multiprocessing will remake all of design, including low-power design, during this coming decade.
Asymmetric multiprocessor design wasn’t the only innovation that loomed in 2011. Xilinx finally announced the first four members of its new Zynq 7000 EPP (Extensible Processing Platform) family, which fuses a processor complex containing two ARM Cortex-A9 processor cores with an FPGA fabric.
Now assembling systems with microprocessors and FPGAs isn’t new. In fact, putting processor cores and FPGA fabrics onto the same piece of silicon isn’t particularly new either. However, doing it right? That is new. And this development fits into the low-power design world because putting the processor complex and the FPGA fabric on chip with a massive on-chip interconnect between the two cuts interface power significantly by reducing the interconnect frequency. You don’t need GHz interconnect clock rates when you have thousands of wires for parallel interconnect.
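The width-versus-clock trade is simple arithmetic: for a fixed bandwidth, the required clock rate falls in proportion to the number of parallel wires. Here is a small sketch of that relationship. The 64 Gbit/s target and the bus widths are hypothetical numbers I picked for illustration, not Zynq specifications.

```python
def required_clock_hz(bandwidth_bits_per_s, bus_width_bits):
    """Clock rate needed to move a given bandwidth over a parallel bus,
    assuming one transfer per wire per clock cycle."""
    if bus_width_bits <= 0:
        raise ValueError("bus width must be positive")
    return bandwidth_bits_per_s / bus_width_bits


# Hypothetical target: 64 Gbit/s between the processor complex and the fabric.
TARGET_BANDWIDTH = 64e9

for width in (16, 64, 512, 2048):
    clock_mhz = required_clock_hz(TARGET_BANDWIDTH, width) / 1e6
    print(f"{width:5d} wires -> {clock_mhz:8.2f} MHz")
```

Only on-chip interconnect makes the wide end of that table practical, because thousands of signals between two packages would swamp the pin count.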
2.5D IC Assembly
Speaking of Xilinx, the company started shipping engineering samples of the Virtex-7 2000T FPGAs to customers last month and this too is a low-power design story. The story is completely told in this graphic:
The Xilinx Virtex-7 2000T is a very large FPGA with two million logic elements. But it’s not a monolithic piece of silicon. Rather, the Virtex-7 2000T consists of four “identical” FPGA tiles, each with half a million logic elements (and a ton of other stuff). The FPGA tiles are mounted on a silicon interposer, which establishes more than 10,000 connections between each tile (56,000 connections in total). The silicon interposer is a fascinating piece of technology. It’s a 65nm IC with four layers of metal on each side of the die and no transistors. It’s a silicon circuit board that must be made in a wafer fab. In this case, TSMC owns the fab. The interposer is as large as the stepper reticle will allow. The advantage here is that each FPGA tile is a quarter of the size of the interposer, and die yield falls off roughly exponentially as die area grows. The smaller the die, the better the yield percentage. So 2.5D assembly makes a lot of sense in several different ways.
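The yield argument can be sketched with the simple Poisson yield model, Y = exp(−A × D), where A is die area and D is defect density. This is one common textbook model, not necessarily the one Xilinx or TSMC uses, and the area and defect-density figures below are hypothetical values chosen only to show the trend.

```python
import math


def poisson_yield(area_cm2, defects_per_cm2):
    """Fraction of good die under the simple Poisson yield model."""
    return math.exp(-area_cm2 * defects_per_cm2)


# Hypothetical numbers, for illustration only.
MONOLITHIC_AREA_CM2 = 6.0   # one big die holding all the logic
TILE_AREA_CM2 = MONOLITHIC_AREA_CM2 / 4.0  # one of four tiles
DEFECT_DENSITY = 0.5        # defects per square centimeter

mono_yield = poisson_yield(MONOLITHIC_AREA_CM2, DEFECT_DENSITY)
tile_yield = poisson_yield(TILE_AREA_CM2, DEFECT_DENSITY)

# Silicon area you must fabricate per good device, assuming tiles are
# tested before assembly so that only good tiles reach the interposer.
mono_area_per_good = MONOLITHIC_AREA_CM2 / mono_yield
tiled_area_per_good = 4.0 * TILE_AREA_CM2 / tile_yield

print(f"monolithic die yield: {mono_yield:.1%}")
print(f"single tile yield:    {tile_yield:.1%}")
print(f"silicon per good monolithic device: {mono_area_per_good:.1f} cm^2")
print(f"silicon per good four-tile device:  {tiled_area_per_good:.1f} cm^2")
```

The sketch ignores the cost and yield of the interposer and of the assembly step itself, so treat it as an illustration of why smaller die help rather than as a full cost model.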
The 2.5D IC assembly-with-interposer approach taken to create the Xilinx Virtex-7 2000T allows the FPGA tiles to use lower-power I/O drivers because these drivers only drive short, closely controlled traces between adjacent tiles. That system-design knowledge saves power. Although the Xilinx Virtex-7 2000T uses four identical die fabricated with a 28nm process technology to realize the active elements, 2.5D IC assembly permits heterogeneous die assembly as well, as shown in this image from Xilinx:
As you can see, 2.5D IC assembly allows designers the freedom to intermix die from radically different IC technologies such as logic, memory (DRAM, Flash, SRAM, etc.), analog, and RF. It’s a pc-board-like technology but on a much smaller scale. The resulting 2.5D device may well be better optimized and cost less than it might if the design team attempted to place everything on one monolithic die. That’s a topic I’ll take up in Part B of this blog entry.