Will Compaan’s HotSpot Parallelizer technology take us to the promised land of parallel computing?

In connection with my just-written blog entry on the massively parallel SpiNNaker project (see below), I want to relate some information about another meeting I had last March at the DATE (Design, Automation and Test in Europe) conference in Grenoble, France. I met with Compaan (www.compaandesign.com) and got a presentation on the company’s HotSpot Parallelization technology.

Here’s how it works. You start with application code written in C. You add pragmas around known code hotspots to switch on Compaan’s HotSpot Parallelizer and to switch it off. You discover these hotspots using regular code-analysis techniques already used for other sorts of software-specific optimizations. So far, nothing new here.

Then you submit the code to the Compaan HotSpot Parallelizer for analysis. The Parallelizer analyzes the code and creates a Kahn Process Network (KPN, http://en.wikipedia.org/wiki/Kahn_process_networks) that consists of many independently executable processes and the communications linkages needed to pass data between these processes. What you end up with is several independent C programs that can be compiled and run on one processor, run on several processors, or run on some mix of processors and hardware built using a C-to-hardware compiler. Here’s a picture of the process:

The advantage of this approach is that it’s entirely automatic once you mark the hotspots with pragmas. To use this approach, your design will need to consist of deterministic, independent processes. Parallelization consists of creating a Kahn Process Network and then generating C code for the various independent programs. That generated code must include all of the inter-program communications needed to operate the KPN. Inter-program communications take place through FIFOs, which might be real hardware FIFOs or, more likely, FIFOs implemented in shared memory.
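
To make the KPN idea concrete, here is a minimal sketch in Python (not Compaan's generated C, and not its tool flow). Three independent processes talk only through FIFOs and block on reads. The process names, the end-of-stream marker, and the FIFO depth are all assumptions made for illustration. A textbook KPN uses unbounded FIFOs; the bounded queues here stand in for the shared-memory FIFOs a real system would more likely use.

```python
import queue
import threading

SENTINEL = None  # end-of-stream marker (an assumption of this sketch)


def producer(out_fifo, values):
    """Source process: pushes each value into its output FIFO."""
    for v in values:
        out_fifo.put(v)          # blocks while the bounded FIFO is full
    out_fifo.put(SENTINEL)


def square(in_fifo, out_fifo):
    """Middle process: reads a token, squares it, writes it downstream."""
    while True:
        v = in_fifo.get()        # blocking read, the defining KPN rule
        if v is SENTINEL:
            out_fifo.put(SENTINEL)
            return
        out_fifo.put(v * v)


def accumulate(in_fifo, result):
    """Sink process: sums everything it receives."""
    total = 0
    while True:
        v = in_fifo.get()
        if v is SENTINEL:
            result.append(total)
            return
        total += v


def run_network(values, fifo_depth=4):
    """Wire the three processes together and run them to completion."""
    a = queue.Queue(maxsize=fifo_depth)
    b = queue.Queue(maxsize=fifo_depth)
    result = []
    procs = [
        threading.Thread(target=producer, args=(a, values)),
        threading.Thread(target=square, args=(a, b)),
        threading.Thread(target=accumulate, args=(b, result)),
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return result[0]


print(run_network(range(10)))
```

Because each process touches only its own FIFOs, the result does not depend on how the threads are scheduled or on the FIFO depth. That determinism is what lets the same network be mapped onto one processor, several processors, or hardware.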

You can do this by hand in a simple system. In a complex system, you’ll want all the automation you can muster because otherwise the complexity will kill you, your team, and your project.

Once you have the several C programs that constitute the KPN, you can decide where each will execute. Some might execute on the same processor. That’s convenient because the inter-program communication is simple and takes place in the processor’s memory space. However, you’ll get no acceleration by running everything on one processor. In fact, you’ll likely slow things down with the added inter-program communications overhead. So, you might choose a multicore processor. Compaan’s HotSpot Parallelizer would seem to be a fast way to accelerate code execution for multicore designs. You might also wish to take some of the C programs in the KPN and transform them into hardware to maximize the acceleration potential. It’s your choice, based on cost/performance tradeoffs that are familiar to every System Realization team.

Compaan’s got some case histories that are certain to interest you. Just ask them to share.

Now the reason I’m writing about this product in my Low-PowerDesign blog is because you must use parallelization to drop power consumption. Stacking every possible task on one multi-GHz processor is not going to result in low-power operation and we should all know that by now. However, there are naysayers who tremble and say “we don’t know how to code for parallel execution.” Well, Compaan’s HotSpot Parallelizer apparently does.

Posted in Design, Low-Power, Multicore

Think Globally, Act in Parallel. What can you do with one million ARM cores acting in parallel and how do you get there?

Professor Steve Furber’s SpiNNaker project is in the news again. I wrote about Furber’s massively parallel brain-emulation project back on March 30 after listening to his keynote at this year’s DATE (Design Automation and Test Europe) conference in Grenoble, France. (See “The incredible vanishing power of a machine instruction. Is this the way to the brain?”) Furber’s DATE keynote title says it all: “Biologically-inspired massively-parallel architectures—computing beyond a million processors.” Furber and his team are referencing nature to help them tackle the really hard processing problems we need to solve in the future through massively parallel, brain-like computing. Brain-like computing—go slow, go wide, go massively parallel—seems to offer a proven, low-power approach to solving some of these big computational problems.

The SpiNNaker project is again in the news at EETimes Europe (see “A million ARM cores to host brain simulator”) and the idea of harnessing one million ARM processor cores is certainly a big idea. It excites me. However, we’re still at the humble beginnings of the project.

The SpiNNaker project’s first test chip harnesses 18 ARM9 cores on one 130nm chip manufactured by UMC in Taiwan. This is a 100M-transistor chip and, like most many-processor SoCs, the SpiNNaker SoC mostly consists of memory. The memory needs to be close to the processors for speed and for low-power consumption and there are 55 32Kbyte SRAM blocks on the SpiNNaker die. That’s 14 million bits of SRAM and, frankly speaking, that’s really not very much SRAM. Eighteen processors isn’t really a large number of processors either when your stated goal is one million.

The ARM processors on the SpiNNaker chip use packet communications to emulate the electrical spike communications that occur among the neurons in human and animal brains. From a hardware perspective, I think it’s easy to conceive of a system-level design like this and even conceptually scaling the design to a million connected ARM9 processors isn’t really hard, as long as you don’t try to enumerate all of the processors in your mind. However, with 18 processors per chip, you’ll need approximately 55,600 chips to build an interconnected network of one million processors. That’s still a mighty big box of hardware. More on that in a bit.
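
The back-of-the-envelope numbers above are easy to check. The short sketch below only redoes the arithmetic from the figures quoted in the text (55 SRAM blocks of 32 Kbytes, 18 cores per chip, a one-million-core target). The function names are made up for illustration.

```python
import math

SRAM_BLOCKS = 55
BYTES_PER_BLOCK = 32 * 1024      # 32 Kbytes per SRAM block
CORES_PER_CHIP = 18
TARGET_CORES = 1_000_000


def sram_bits(blocks=SRAM_BLOCKS, bytes_per_block=BYTES_PER_BLOCK):
    """Total on-chip SRAM in bits."""
    return blocks * bytes_per_block * 8


def chips_needed(target=TARGET_CORES, per_chip=CORES_PER_CHIP):
    """Whole chips required to reach the target core count."""
    return math.ceil(target / per_chip)


print(sram_bits())     # about 14.4 million bits
print(chips_needed())  # roughly 55,600 chips
```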

The rub is that we really don’t have many good ideas for programming such a massively parallel system. The SpiNNaker project seems to be mostly a hardware endeavor with the explicitly stated intent of developing a hardware testbed for brain researchers who will use SpiNNaker systems for studying various theories of brain function. Presumably, we’ll learn more about massively parallel programming by working with these systems and no doubt we will. As Furber says in a quote published in the EETimes Europe article, “We don’t know how the brain works as an information-processing system, and we do need to find out. We hope that our machine will enable significant progress towards achieving this understanding.”

Each SpiNNaker chip in the current design is bundled with a 166MHz, 1Gbit DDR SDRAM and packaged in a 300-pin BGA package. But we’re not going to be building million-processor testbeds with 18 processors per packaged chip. I’m almost absolutely, positively certain about that. This first SpiNNaker prototype just doesn’t scale to one million processors very easily. So the question is, how to get there?

Well, possible clues to answer that question can be found in two recent blogs that I wrote on the EDA360 Insider blog. First, Samsung has just announced successful tapeout of a 20nm test chip incorporating an ARM Cortex-M0 processor core. (See “Samsung 20nm test chip includes ARM Cortex-M0 processor core. How many will fit on the head of a pin?”) Now an ARM Cortex-M0 processor is not as powerful as an ARM9 processor, but then it’s not supposed to be. It’s designed for control-oriented applications and its 3-stage execution pipeline isn’t designed to get maximum speed from any given process technology. However, we’re building a system that emulates a brain that operates at a few hundred Hertz (that’s Hertz, not kilohertz, megahertz, or gigahertz) so I really don’t think the clock speed is all that critical when you’re talking about a million processors. The ARM Cortex-M0 processor core is still a 32-bit RISC processor and I am guessing with a high degree of confidence that it’s fully up to the task of executing the required electrical-spike calculations, albeit not quite as quickly as an ARM9 processor.

What’s interesting about a 12-to-14Kgate ARM Cortex-M0 processor implemented in 20nm process technology is that my calculations suggest that more than half a million ARM Cortex-M0 processors would fit on a chip the size of an Intel “Tukwila” Itanium processor (OK, that’s a big chip, but it’s a commercial one) and that calculation is based on the published number for the area required by an ARM Cortex-M0 implemented in 90nm process technology, not 20nm. Now there’s a lot of slop in this calculation. First, there’s the disparity of using 90nm numbers instead of 20nm numbers. Then there’s the disparity caused by putting no memory at all into the calculation. I just mentally tiled processors edge to edge. Ditto, there’s no on-chip interconnect.

So you probably won’t get half a million ARM Cortex-M0 processor cores on one 20nm chip. But you might get 100,000 or 200,000 ARM Cortex-M0 processor cores on a chip along with an interesting amount of memory and the required interconnect. Now we’re talking about only a handful of chips to get to one million processors. We’re talking about a tabletop box. Now we’re getting into the realm of the feasible for million-processor systems.

The second related blog entry I recently wrote in EDA360 Insider that also bears on this very interesting endeavor was about an announcement from Imec, a global research company. Just days ago, Imec announced that it and its partners successfully assembled a custom logic chip with two DRAMs in a stacked 3D configuration. (See “3D Thursday: IMEC prototypes 3D chip stack, finds some thermal surprises”.) This 3D stacked-chip prototype allowed Imec to test out some process ideas for manufacturing 3D stacked chip assemblies and to make some critical thermal tests to verify thermal models that will be so necessary when 3D assembly goes mass market. The 3D chip stack uses copper-tin micro-bumps and compression bonding for the electrical and mechanical assembly of the chip stack and you can see photos of the assembled stack below.

Here’s a photo of the overall chip stack:

And here’s a close-up of the edge of the chip stack to show the three stacked die.

The 3D stack’s base chip is approximately 750µm thick. The two top components in the chip stack are each 25µm thick. There’s more technical info in the referenced EDA360 Insider blog.

I am convinced that 3D stacking of logic and RAM chips will be absolutely essential to developing massively parallel, low-power systems like the ones envisioned by the SpiNNaker project. First, the only way to feed data and instructions to massively parallel processing chips is through large amounts of on-chip memory and through high-bandwidth, low-energy channels connected to large off-chip memories. 3D assembly techniques permit both Wide I/O and high-speed serial I/O channels to work most effectively and at minimal energy levels and I expect to see rapid adoption of 3D assembly—even and perhaps especially in high-volume, cost-sensitive applications such as mobile phone handsets—in the next few years. This is precisely the sort of manufacturing technology we require to think seriously about million-processor systems.

Now all we need to do is figure out how to program them.

Posted in ARM, CMOS, Design, DRAM, Low-Power, Networking, SDRAM, SOC, SRAM

Cadence’s Qi Wang discusses the use of good methodology for low-power, advanced IC designs

You can read Qi Wang’s writeup of a paper on low-power IC design presented by Global Unichip’s Alex Kuo here.

Posted in Low-Power

Richard Goering discusses the low-power aspects of 40nm and 28nm design with Global Unichip’s Alex Kuo

Cadence blogger and long-time EDA editor Richard Goering spent some time at the recent DAC event in San Diego discussing the finer points of 40nm and 28nm design with Global Unichip’s Alex Kuo. Among the interesting tidbits from the interview are the increased use of IP blocks and how that complicates clock trees, the use of DVFS (dynamic voltage and frequency scaling), and how low-power description formats help reduce power consumption during design. You’ll find the interview here.

Posted in Low-Power

Need to cut IP power? (Who doesn’t?) “Press here” says Calypto

All SoCs are built with IP blocks. Some of those are legacy IP blocks. Some are purchased from other vendors. Some are developed in-house. All of them draw power—static and dynamic power. At nanometer lithographies, the way to cut static power is through circuit tricks like high-Vt transistors and by powering down entire blocks when not needed. The way to cut dynamic power within an IP block is to stop clocking anything that doesn’t need to be clocked. Designers can gate clocks during the development of an IP design but what about existing IP blocks? Some can be retrofitted with clock gating but the ease of that exercise depends on how familiar the IP designer is with that IP block and how well documented the block is.

Face it, most IP blocks aren’t that well documented. You may never know enough about the internals of a purchased IP block to fiddle with its clocking. Legacy IP blocks may have been long abandoned by their designers who have gone off to other tasks, other companies, other planes of existence. Even a block you’ve designed yourself may have scrolled off your own internal memory window long ago.

Designers everywhere have a common solution for these sorts of problems. “Give me a tool to do this” they demand from EDA vendors. “I just want to push the button.”

Usually, that’s easier said than done. Calypto’s got a tool you can try, however. It’s called PowerPro and comes in two flavors: CG and MG. The CG flavor is based on the company’s SLEC sequential logic equivalency checker. That’s a tool that checks to see if modified IP block “A prime” works the same as original IP block “A.” It’s a general-purpose EDA tool with a variety of uses and one of those uses is for comparing an IP block’s function before and after clock gating.

Calypto’s PowerPro CG encapsulates the SLEC EDA tool to produce a “done for you” tool that can automatically insert clock gating into an IP design. It also checks to make sure the IP block’s behavior doesn’t change as a result of the added clock gating. Usually the insertion process takes 4 to 8 hours, according to Calypto CEO Doug Aitelli, who spoke to me about the product at DAC 2011 in San Diego. What do you get for this overnight run? Usually a 10% to 30% reduction in dynamic power, said Aitelli. Sometimes as much as 60%. Not bad for “pushing the button,” I’d say.
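
To see why clock gating saves dynamic power without changing behavior, consider this toy Python model of a single enabled register. It is only an illustration of the concept, not how PowerPro or SLEC work internally. A real equivalence checker proves the two versions match formally rather than simulating one stimulus, and the function and signal names here are invented.

```python
def simulate(inputs, enables, gated):
    """Simulate one register over a sequence of cycles.

    Returns (output_trace, clock_edges_seen_by_register).
    Ungated: the register is clocked every cycle and a feedback mux
    holds the old value when the enable is low.
    Gated: the clock edge reaches the register only when enable is high.
    """
    q = 0
    edges = 0
    trace = []
    for d, en in zip(inputs, enables):
        if gated:
            if en:
                edges += 1
                q = d
        else:
            edges += 1
            q = d if en else q
        trace.append(q)
    return trace, edges


data = [5, 7, 7, 9, 1, 1, 1, 3]
enable = [1, 0, 0, 1, 0, 0, 0, 1]

plain_trace, plain_edges = simulate(data, enable, gated=False)
gated_trace, gated_edges = simulate(data, enable, gated=True)

print(plain_trace == gated_trace)   # same observable behavior
print(plain_edges, gated_edges)     # far fewer clock edges when gated
```

In this stimulus the gated register sees 3 clock edges instead of 8 while producing an identical output trace. Fewer clock edges at the register is where the dynamic-power saving comes from.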

There’s another flavor of PowerPro called PowerPro MG. Nope, it’s not named for a cute little British sports car; “MG” stands for “memory gating.” We tend to forget that today’s SoCs are more than half memory measured by die area. Usually SRAM. We sort of allude to this fact when we talk about MPSoCs (multiple-processor SoCs). With each of those processors comes a boatload of on-chip SRAM for fast execution. However, we don’t seem to explicitly call out the memory. We tend to ignore it. I guess MMSoC (mostly memory SoC) doesn’t have the same cachet as MPSoC in our processor-centric world.

However, if more than half of an SoC is SRAM, it makes sense to pay some attention to reducing the power consumption of an SoC’s on-chip SRAM blocks. That’s what Calypto’s PowerPro MG does. It can automatically add memory gating to an SoC design by evaluating the design’s behavior across many cycles.

It also goes a step further. Many SRAM blocks for SoCs now have a sleep mode where the memory’s operating power can be reduced by shutting down peripheral circuitry such as address decoders and sense amps while keeping the memory storage array alive. According to Calypto’s Aitelli, most SoC designers find these sleep modes too hard to use, so they simply don’t use them. They don’t have the time. But those sleep modes are still there just waiting to be used. PowerPro MG will add the necessary sleep/wake-up state machine to exploit this little-used memory feature. Push the button, save power.
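
A sleep/wake-up controller of the kind described can be pictured as a small state machine. The Python sketch below is a hypothetical model, not PowerPro MG's generated logic. The three states, the idle threshold, and the wake-up latency are assumptions chosen to show the idea.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    SLEEP = "sleep"
    WAKING = "waking"


def run_sleep_controller(accesses, idle_threshold=3, wake_latency=2):
    """Return the memory's state at the end of each cycle.

    accesses: one truthy/falsy value per cycle (is the memory requested?).
    idle_threshold: idle cycles before the periphery is put to sleep.
    wake_latency: cycles (at least 1) spent waking after sleep.
    """
    state = State.ACTIVE
    idle = 0
    wake_left = 0
    trace = []
    for access in accesses:
        if state is State.ACTIVE:
            if access:
                idle = 0
            else:
                idle += 1
                if idle >= idle_threshold:
                    state = State.SLEEP      # array retained, periphery off
        elif state is State.SLEEP:
            if access:
                state = State.WAKING         # the request stalls while waking
                wake_left = wake_latency
        else:  # State.WAKING
            wake_left -= 1
            if wake_left == 0:
                state = State.ACTIVE
                idle = 0
        trace.append(state)
    return trace


trace = run_sleep_controller([1, 0, 0, 0, 0, 1, 1, 1, 1])
print([s.value for s in trace])
```

The tradeoff this exposes is the one designers reportedly find hard to manage by hand. Every cycle spent asleep saves power, but each wake-up costs latency, so the threshold has to suit the access pattern.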

Just a story from a chance meeting at DAC. Par for the course. There’s always something new to learn, something new to try.

To read my blog on the Low-Power Report Card Panel at DAC, click here.

Posted in Clock Gating, CMOS, Design, EDA, Low-Power, SRAM

IBM Researchers Develop Planar, Monolithic, 1-Transistor Graphene IC—Make Graphene Party Like It’s 1959

This week in Science Magazine, IBM researchers published an article documenting the first graphene IC built using recognizable IC processing techniques. The simple 1-transistor, 2-inductor monolithic circuit operates as an RF mixer with a useful operating frequency of 10GHz. The operating speed is not especially impressive. The lithographic geometries are also unimpressive: a 550nm FET gate length is a process node that dates back well more than a decade for silicon IC processing, even if the researchers used e-beam lithography to draw the patterns.

IBM Graphene chip 

What is impressive? It’s the entire package. The thing that differentiated Fairchild Semiconductor’s planar IC concept from Kilby’s IC concept at TI back in 1959 was that the planar IC process could build circuits of increasing, arbitrary complexity using highly automated lithographic printing techniques. It was the beginning of mass production for electronics.

That’s what’s different about this process as discussed in the latest issue of Science Magazine. IBM’s researchers started with a silicon carbide wafer. They grew a two- or three-layer graphene film on the silicon face of the SiC wafer using high-temperature epitaxy. They then patterned the FET gate using PMMA (a transparent acrylic plastic commonly used for e-beam and nanoimprint lithographic processing) and hydrogen silsesquioxane (HSQ, a high-resolution e-beam photoresist) which they exposed with an electron beam. (The authors admit they could also have used more conventional optical lithography for the geometries used in this experiment.) Researchers removed the excess graphene with an oxygen plasma etch. The FET’s gate dielectric is aluminum oxide, since silicon oxides aren’t normally found in this process. The inductors are patterned aluminum. The entire 3-element circuit operates similarly to an RF mixer built from more conventional silicon counterparts.

Don’t expect to see graphene ICs rolling off the production lines this year or next. That’s not what this demonstration is about. What this exercise proves is that you can indeed make graphene ICs with processing techniques familiar to anyone in the silicon IC manufacturing business. You can also make graphene FETs that operate at interesting frequencies using fairly large geometries by 21st-century standards even though graphene doesn’t have a natural band gap. As IBM’s press release points out, these same researchers have built graphene FETs with much higher operating frequencies using smaller gate lengths, but these earlier experiments did not employ assembly techniques resembling those in common use today for making silicon ICs. Now there’s an initial manufacturing process for mass production of graphene ICs. Now, it gets interesting.

Posted in Graphene

The Return of Heathkit (in spirit) at Maker Faire

I visited the most recent edition of Maker Faire last month and from what I can see, the event just keeps rolling along. I think it’s exciting to be in the company of people who love to make things. Anything. There is indeed a special joy in creating something all your own. We all experience this joy as kids and some of us stay enchanted for all of our lives.

One of the paths leading to becoming a maker used to involve the construction of Heathkits. The Heath Company was the most successful of a large number of electronic-kit vendors that emerged in the 1950s and 1960s. The list of kit vendors also included Eico and Lafayette Radio Electronics. Radio Shack even had a line of kits packaged in red and clear plastic boxes that doubled as perfboards.

The Heath Company started as an aircraft manufacturer in 1912 and eventually sold aircraft kits. However, after World War II there were a lot of government surplus electronic components available. Heath bought tons of these parts and developed a growing number of kits from them. (Tektronix started in the ‘scope business in a very similar manner.) Before starting college, I’d built two Heathkit oscilloscopes (one tube-based, one transistorized), a couple of FETVOMs, a transistor portable tester, a signal generator, a bench power supply, an RF generator, and a Heathkit microwave oven. I was fully on my way to becoming a maker.

Heathkit exited the electronic kit business in 1992, a victim of the rise of surface-mount technology (can’t hand-solder much of anything if you can’t see or hold it) and the explosion of advanced consumer electronics. Factories in Asia produce huge quantities of electronic products for unbelievably low prices. With the end of Heathkits (most of the other kit vendors faded to oblivion even earlier), one great path to becoming an electronic maker died.

Now it’s the 21st century and there’s a new maker movement afoot, abetted in part by Make Magazine and the associated Maker Faire. The Faire is now held in many locations, but the original site is in San Mateo. That’s where I spent a pleasant afternoon last month.

Lost in Space B2 Robot

And what did I find? Many crazy things including a hand-built replica of the robot from Lost in Space (update: you too can have one of these robots, click here), giant singing Tesla coils, imaginative giant pedal-powered contraptions that are related to bicycles in the way that dinosaurs are related to birds, a Stirling engine exhibit brought by my friend Doug Conner, a Chevy Volt to drive (I did take a test drive), about a zillion Steampunkers in costume, and the spirit of Heathkit.

The company that appears to be resurrecting the spirit of Heathkit is Sparkfun from Boulder, Colorado. It’s been a lot of fun to watch this company evolve. Sparkfun now produces a huge range of simple and interesting electronic kits that are easy to build and that do fun things like making noise and flashing lights. They also offer more sophisticated kits such as RF point-to-point and network data links, microcontroller boards like the Arduino, and interface “Shields” for the Arduino boards. The company recently sponsored its third annual autonomous vehicle competition in Boulder. They have a lot of fun and I think they’re excellent marketers. Which is why they were at the Maker Faire.

Sparkfun brought its kits, but their booth at the Maker Faire was far from a simple vendor’s booth. They set up about 30 soldering stations so that people could buy a kit, solder it up, and get it working on the spot. There was a line of people waiting to sit at a station, so you could see this was a fabulously successful idea. But more than that, you only needed to look at the clientele sitting at the soldering stations to get a quick shot of optimism. Who was sitting there soldering?

Kids.

Sparkfun Soldering Stations

Lots and lots of kids, all intently focused on building their kit. Some were working with parents. Some were working in teams of two and three. Parents from high-tech Silicon Valley were teaching their kids to solder. Sparkfun team members were explaining the secrets of keeping a soldering tip clean. It looked like everyone was having a great time.

Sparkfun Vending Machine

Sparkfun also had another fun sort of detail in its large booth: a vending machine set up to vend entire electronic kits in the bright red cardboard boxes that are as much a part of the company’s brand as the box sporting the Heathkit logo was to the Heath Company.

When you see the kind of activity I saw at the Maker Faire, you really can’t help but get inspired by something: the energy of the people, the variety of projects, the sheer scale of the audience. Think about visiting the Maker Faire next year. I guarantee you’ll see something to strike your fancy.

(To see Paul Rako’s blog entry on the Maker Faire, click here. He’s got some great photos from the event in his blog.)

Posted in Low-Power

The DDR4 SDRAM spec and SoC design. What do we know now?

DDR4 SDRAM is coming. JEDEC may not have released the final spec yet but Samsung made the first DDR4 memory chip announcement in January of this year—a 2133MHz device built with a 30nm process technology—and Hynix followed suit in April by announcing a 2400MHz device, also built with a 30nm process technology. Cadence announced a complete DDR4 IP package for SoC designers the same month. (See: “Memory to processors: ‘Without me, you’re nothing.’ DDR4 is on the way.”) Nanya “sort of announced” a DDR4 memory device when it appeared in their most recent quarterly report. So there’s visible momentum for the DDR4 specification already even if JEDEC has yet to roll it out.

At today’s EETimes Virtual SoC event Marc Greenberg from Cadence pulled back the veil on DDR4 a bit more. Here’s what he had to say.

First, even though we don’t have a final specification, some details are public. DDR4 SDRAMs will have double the maximum capacity of DDR3 SDRAMs. They’ll also have twice the maximum clock frequency. Like DDR3 SDRAMs, DDR4 SDRAMs will have an 8n prefetch (important for cache-line-filling operations) but a DDR4 memory controller must alternate or rotate between SDRAM bank groups for maximum SDRAM performance. That’s a new restriction.

The DDR4 I/O voltage has been reduced to 1.2V—DDR3 SDRAMs use 1.5V—so you can expect that the DDR4 SDRAMs will consume less power and energy than DDR3 SDRAMs simply from the lower operating voltage and from the more advanced process technology. However, Greenberg warned that some systems might not realize such savings due to architectural issues. In addition, DDR4 SDRAMs will not use stub-series terminated logic drivers. Instead, they’ll use pseudo-open drain (POD) drivers with Vdd terminations. DDR4 memories also have new features to improve signal integrity. They’ll use data-bit inversion (DBI, more on that below), on-chip parity detection for the command/address bus, and CRC error detection for the data.

Because of the higher maximum clock rate, DDR4 memories may permit a pin-count reduction for some SoC designs. How? At double the clock rate, SoC designs can get the same data bandwidth with 16 data bits clocked at 1600MHz (3.2 Gtransfers/sec) as DDR3 designs get with 32 data bits and an 800MHz clock rate (1.6 Gtransfers/sec). However, there’s a design caveat or two. First, SPB (silicon, package, board) design for DDR4-3200 SDRAM is going to be considerably harder than for DDR3-1600 SDRAM. In addition, most memory experts predict that designs with multiple DDR4 DIMMs on each memory channel will stop working reliably (or at all) at data transfer rates considerably below the 3.2 Gtransfers/sec maximum. Similarly, DIMMs with multiple memory ranks on the board may also fail before the data transfer rate reaches 3.2 Gtransfers/sec.
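
The pin-count argument is just peak-bandwidth arithmetic, sketched below in Python. The 32-bit DDR3 interface width is an assumption used to make the comparison concrete, and the function name is invented. Real sustained bandwidth will be lower than these peak figures.

```python
def peak_bandwidth_bytes_per_sec(data_bits, clock_mhz):
    """Peak bandwidth of a double-data-rate interface.

    Two transfers happen per clock cycle, one on each clock edge.
    """
    transfers_per_sec = 2 * clock_mhz * 1_000_000
    return (data_bits // 8) * transfers_per_sec


ddr4_narrow = peak_bandwidth_bytes_per_sec(data_bits=16, clock_mhz=1600)
ddr3_wide = peak_bandwidth_bytes_per_sec(data_bits=32, clock_mhz=800)

print(ddr4_narrow, ddr3_wide)    # both 6.4 Gbytes/sec peak
print(ddr4_narrow == ddr3_wide)
```

Half the data pins at twice the transfer rate gives the same peak number, which is why the signal-integrity caveats matter so much. The saving only materializes if the board can actually run at the higher rate.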

There are a couple of possible solutions to these DDR4 signal-integrity challenges. The first and simplest solution is to allow only one DIMM slot per DDR4 memory channel and allow only single-rank DDR4 DIMMs. The problem with this solution is that it increases the number of SoC memory channels for a given memory capacity and thus drives up the SoC’s pin count, cost, and board-level real estate.

No one likes any of those consequences. Not at all. So an alternative solution is the use of load-reduced DIMMs (LRDIMMs) as shown in the following figure.

Greenberg - LRDIMM 

Now you may be familiar with RDIMMs (registered DIMMs) used in servers. RDIMMs have an extra register chip soldered to the board that stores and buffers the address/control information from the memory controller and distributes that information to the memory chips on the DIMM. LRDIMMs also buffer the data lines to present a single load to the memory controller even when multiple memory ranks are soldered to the DIMM. RDIMMs and LRDIMMs increase memory latency, so the DDR4 controller must be able to understand and accommodate this kind of buffering.

Finally, in the what-we-know category, DDR4 SDRAMs will stay with the 8n prefetch used for DDR3 memories but they will add an extra level of multiplexing so that the memory controller must manage traffic to and from the SDRAM even more carefully than before to extract maximum performance from the device.

Here is where we leave the known DDR4 world and enter into the realm of conjecture.

Although there are no public details on how DDR4 SDRAM’s extra multiplexing level works, GDDR5 memory already employs a bank-grouping scheme with an extra level of multiplexing. GDDR5 memory adds new command timings that differ depending on whether successive commands address the same or different bank groups. These extra timings mean that a DDR4-optimized memory controller must be a bit more complex than the controller used for DDR3 memories. The controller needs better command scheduling and it must deal even more efficiently with high-priority memory commands. The Cadence DDR4 memory controller that was just introduced last month has several new features to accommodate the new complexities of the upcoming DDR4 memory protocol, said Greenberg.
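
The effect of bank-group-dependent timings can be shown with a toy model. In the Python sketch below, the two gap values are placeholders picked only for illustration. They are not GDDR5 or DDR4 timing parameters, and the function name is invented.

```python
def last_issue_cycle(bank_groups, same_group_gap=6, diff_group_gap=4):
    """Cycle at which the final command in the sequence can be issued.

    bank_groups: list of bank-group indices, one per command, in order.
    Back-to-back commands to the same bank group must wait longer than
    commands that alternate between groups (placeholder gap values).
    """
    cycle = 0
    for prev, cur in zip(bank_groups, bank_groups[1:]):
        cycle += same_group_gap if prev == cur else diff_group_gap
    return cycle


print(last_issue_cycle([0, 0, 0, 0]))   # all commands hit one bank group
print(last_issue_cycle([0, 1, 0, 1]))   # controller alternates bank groups
```

With these made-up numbers, alternating between groups finishes in 12 cycles against 18 for the same-group stream. That gap is the reason a controller would want to reorder commands so that successive accesses rotate among bank groups.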

Here’s a table of enhancements made to the controller’s command queue to accommodate DDR4 requirements and maximize memory-subsystem performance:

Greenberg - DDR4 command queue table

One key feature here is a new command-prioritizing scheme that prioritizes DDR4 commands when they enter the command queue (like the DDR3 version of this controller) and then reprioritizes the commands when they’re about to exit from the queue, to be issued to the DDR4 memory. That part’s new. This new feature allows high-priority commands to go straight to the head of the command queue when they’re received, but the controller can delay the command’s exit from the queue (and the issue of that command to the memory) until the target DDR4 memory page and bank are ready to accept that command. This capability reduces the impact of high-priority commands and helps to maximize memory bandwidth and throughput.
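
Here is one way to picture the two-stage prioritization, as a Python sketch. It is a hypothetical model and not the Cadence controller's actual design. The command format, the bank-readiness table, and the one-command-per-cycle issue rule are all assumptions.

```python
def issue_order(commands, ready_at):
    """Return the issue schedule as a list of (cycle, priority, bank).

    commands: list of (priority, bank) in arrival order; a lower
              priority number means more urgent.
    ready_at: dict mapping bank -> first cycle that bank can accept a
              command (banks not listed are ready from cycle 0).
    """
    # Entry stage: urgent commands move to the head of the queue,
    # with arrival order breaking ties.
    indexed = sorted(enumerate(commands), key=lambda item: (item[1][0], item[0]))
    pending = [cmd for _, cmd in indexed]

    issued = []
    cycle = 0
    while pending:
        # Exit stage: issue the most urgent command whose bank is ready,
        # instead of stalling behind a command whose bank is still busy.
        for i, (prio, bank) in enumerate(pending):
            if ready_at.get(bank, 0) <= cycle:
                issued.append((cycle, prio, bank))
                del pending[i]
                break
        cycle += 1
    return issued


cmds = [(1, "B"), (0, "A"), (1, "C")]
print(issue_order(cmds, ready_at={"A": 2}))
```

In this example the urgent command to bank A reaches the head of the queue at once, but bank A is busy until cycle 2. The two ordinary commands are issued in cycles 0 and 1 and the urgent one goes out at cycle 2. No cycle is wasted, whereas a strict head-of-queue policy would have stalled everything behind it.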

Another new controller feature is support for DBI. The following figure illustrates the problem:

Greenberg - DBI 

The left side of this figure shows four consecutive data transfers. In the first transfer, all of the data bits are “1.” In the second transfer, they’re all “0.” As a result, all data bits change state from the first transfer to the next. This is a bad thing, especially at multi-GHz transfer rates. Charging and discharging the capacitance of every data line at once, at high speed, creates a problem called simultaneous switching output (SSO), which stresses the DRAM’s power-distribution system on the chip, in the package, and on the board. The next transfer shows a transition from all zeroes to all ones except for one data bit. Because of capacitive coupling, the data lines making the zero-to-one transition become aggressors that try to induce that lone holdout bit to also make the transition even though it does not want to do so. The fourth transfer exhibits a similar problem. All of the bits make a transition but one bit steadfastly wants to make the transition in the opposite direction. Again, it’s up against a number of aggressors.

The way to solve this problem, implemented in GDDR5, is to add a DBI bit. The right side of the figure shown above illustrates the same state transitions, but with the addition of a DBI bit. When asserted, the DBI bit indicates that the data bus should be inverted. The inversion state can change from transfer to transfer and it is changed to minimize the number of data-bit state changes and thus minimize the I/O switching current and the number of aggressor bits from one transfer to the next. Again, this is how it’s done for GDDR5 memory. The DBI method used for DDR4 SDRAM is not yet public.
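
The transition-minimizing idea described above can be sketched in a few lines of Python. This models only the principle in the text: invert the byte whenever that reduces the number of data lines that change. The 8-bit lane width, the more-than-half threshold, and the function names are assumptions. It should not be read as the exact GDDR5 rule, and the DDR4 method was not public at the time of writing.

```python
def popcount8(x):
    """Number of 1 bits in the low 8 bits of x."""
    return bin(x & 0xFF).count("1")


def dbi_encode(data_bytes, initial_bus=0x00):
    """Return a list of (wire_byte, dbi_bit) for each transfer."""
    bus = initial_bus
    encoded = []
    for byte in data_bytes:
        if popcount8(bus ^ byte) > 4:      # more than half the lines would flip
            wire, dbi = byte ^ 0xFF, 1     # send the inverted byte instead
        else:
            wire, dbi = byte, 0
        encoded.append((wire, dbi))
        bus = wire
    return encoded


def dbi_decode(encoded):
    """Recover the original bytes at the receiver."""
    return [wire ^ 0xFF if dbi else wire for wire, dbi in encoded]


def data_line_toggles(wire_bytes, initial_bus=0x00):
    """Total data-line state changes across the sequence (DBI line excluded)."""
    bus, total = initial_bus, 0
    for w in wire_bytes:
        total += popcount8(bus ^ w)
        bus = w
    return total


data = [0xFF, 0x00, 0xFE, 0x01]            # worst-case style pattern
enc = dbi_encode(data)
print(dbi_decode(enc) == data)
print(data_line_toggles(data), data_line_toggles([w for w, _ in enc]))
```

For this pattern the eight data lines toggle 31 times without DBI and once with it, at the cost of one extra signal that itself toggles on each transfer here. The receiver recovers the original bytes by re-inverting any transfer flagged by the DBI bit.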

 

With these and other changes to the memory interface specification, SoC designers will need a new tool set to add DDR4 memory interfaces to their designs. That’s why Cadence has introduced a DDR4 IP package and design kit now—because SoC designers preparing early designs that incorporate DDR4 memory need to start now. The Cadence DDR4 offerings include a DDR4-enhanced memory controller (based on the existing, configurable SDRAM controller Cadence obtained when it purchased Denali Software last year), hard and soft DDR4 PHYs, design kits for board- and package-level DDR4 design, and verification IP and memory models for DDR4 memory.

 

Over the next several years, we will see DDR4 SDRAM gradually enter and then take over the SDRAM memory market. It’s happened with the DDR, DDR2, and DDR3 SDRAM generations and there’s little reason to believe this won’t happen with DDR4 as well. Greenberg said to expect to see designs from early adopters who need the maximum possible memory-subsystem performance in 2013, early majority adoption for high-performance (but not the most bleeding-edge) designs in 2014, majority adoption in desktop and laptop PCs in 2015, and then pretty much total market penetration all the way down to low-cost devices by 2016.

 

Time to get going, isn’t it?

Posted in DDR4, DRAM, Low-Power, SOC

Sometimes, all you need is tape: 3M Uniformity Tape solves problem of uneven LED edge lighting

We’ve all seen LCD displays with LED backlights that provide non-uniform illumination. That’s one reason why fluorescent backlights have held on for as long as they have. It’s not like anyone enjoys designing in the high-voltage power supply for the fluorescent bulbs. It’s because the light is truly better. One way to get more uniform lighting from LEDs is to use more LEDs with tighter spacing between them. However, there are both cost and power penalties for solving uneven illumination problems with this approach.

 

3M now offers an alternative solution to this problem: 3M Uniformity Tape. It’s a clear tape with adhesive backing that sticks along the edge of an LCD. The tape has a spatially uniform optical pattern embossed on it that spreads the light from each LED, making the resulting edge lighting more uniform. The montage image below shows the results on a large flat-panel display.

 

3M Uniformity tape

 

The top image in the montage shows flat-panel LCDs without (left) and with (right) the 3M Uniformity Tape applied. The second (middle) image shows an enlargement of the bottom edge of the LCD panel without the tape. This image shows the “head-lighting” effect. You can see bright cones where the LEDs are and darker areas between the LEDs.

The bottom image of the montage shows the same display with the 3M Uniformity Tape applied. The edge lighting is far more uniform with no increase in the number of LEDs and no increase in edge-lighting power.

Posted in Low-Power

Future cars: The word from GM at IDC’s Smart Technology World conference

Last month, IDC ran a really interesting conference called Smart Technology World. One unusual presentation was about energy use for transportation, given by Byron Shaw, Managing Director of Advanced Technology at General Motors of Silicon Valley. That’s right, GM has a toehold in Silicon Valley. Far, far away from Detroit. Shaw gave such an interesting presentation that I’m going to review it here in depth.

Shaw started with two photo montages. The first was filled with images of GM’s famously styled vehicles from the 1950s and 1960s: Cadillacs, Corvettes, GTOs, and (gulp) Buicks. These images are from the days when cars were rolling metal sculptures on basically similar chassis.

 

GM Cars of the mid 20th century

 

Then came the dark ages for GM, said Shaw. The time when GM was “stylistically challenged.” Shaw displayed another photo montage of GM’s stylistic creations of the 1970s and 1980s. To save you from feeling slightly ill, I’ll not reproduce that montage here. You really don’t need to remember those days.

 

Finally, said Shaw, we come to today and the future. It is and will be a world quite different from that of the 1950s and 1960s that spawned the land yachts and muscle cars of the mid 20th century. By 2030, said Shaw, 60% of the world’s population will live in cities and 80% of the world’s wealth will be concentrated in those cities. As a result, you can expect even more traffic congestion than we have today. (Although it’s hard to envision what’s more congested than the gridlock we often see in today’s large cities.) Parking will also become an even bigger problem.

 

Although petroleum products supply 35% of the world’s energy needs at the moment, they supply 96% of the world’s transportation energy needs. And petroleum is not a renewable resource. Although we actually have more proven or estimated reserves today than in 1980, as the oil companies like to point out, we also use petroleum far faster today than 30 years ago. (Somehow, that part of reality doesn’t get emphasized as much as the first part about the reserves.)

 

The rise of the middle class in emerging markets coupled with unchecked expansion of transportation as it exists today means that we must reinvent “personal mobility” for the 21st century, continued Shaw. The challenges are many including energy, safety, congestion, materials, and manufacturing.

 

Electricity is the number one candidate for the energy needs of future vehicles. However, the energy needs of all vehicles are not equal. Heavy hauling and long-distance transport are likely to continue to depend on chemical energy storage: petroleum products and alternative fuels such as ethanol, biodiesel, compressed natural gas, and liquefied petroleum gas. The lighter the vehicle load and the shorter the distance traveled, the more likely that electricity can supply all or most of a vehicle’s energy needs. Candidate energy technologies to supply this electricity include fuel cells, energy recovery, and batteries.

 

Electrical power is an attractive alternative to chemical energy because of its relative energy efficiency, said Shaw. The energy efficiency of internal-combustion engines isn’t great and it’s not going to get a whole lot better. By throwing a ton of technology at existing internal combustion engines and power trains—such as cam phasing, cylinder deactivation, flexible valve actuation, stratified-charge ignition, friction reduction, stop/start augmentation, and regenerative braking—the best that GM’s technologists hope to achieve for internal-combustion engines is an additional 20-30% efficiency improvement.

 

Stored electrical power is more attractive, except for those pesky batteries. GM has been working on battery technology for transportation since at least 1997, back in the days of the EV-1 electric vehicle. There are lots of alternative battery chemistries to investigate and all have problems scaling to automotive sales volumes at commercially viable prices, said Shaw.

 

One way to improve energy efficiency for personal transport is to make the vehicle lighter. A lot lighter. However, light cars don’t withstand collisions as well and therefore subject passengers to more injury in a crash. So you have to design cars that won’t crash, said Shaw. It seems so logical, the way he said it. However, to make cars that won’t crash, you need drivers that don’t get distracted or fall asleep.

 

You need autonomous vehicles.

 

Although autonomous vehicles have been a favorite science-fiction topic since the 1940s and 1950s, the track record for autonomous vehicles isn’t great—so far. Today, some high-end cars can park themselves but only a few experimental vehicles can drive themselves. Here’s a list of technologies you need to create a self-driving vehicle:

 

  • Collision Avoidance (Steering)
  • Vehicle-to-Vehicle Communication
  • Vehicle-to-Infrastructure Communication
  • Steer-by-Wire
  • Lane Keeping
  • Forward Collision Avoidance (Braking)
  • Driver Performance Monitor
  • Lane Sensing/Warning
  • Active Roll Control
  • Forward Collision Warning
  • Adaptive Cruise Control
  • Vision Enhancement
  • Near Obstacle Detection
  • Electronic Stability Control
  • Adaptive Variable-Effort Steering
  • Semi-Active Suspension
  • Traction Control
  • Anti-Lock Braking Systems

 

That’s a lot of technologies and we’re currently about halfway up the list with commercial technologies.

 

Now there are vehicles that drive themselves and do so successfully. GM’s “Boss” is one such example and it won the 2007 DARPA Urban Challenge. However, take a look at the vehicle to get an idea of just how many sensors “Boss” needed to be the boss of the road:

 

Autonomous GM Boss, winner of the 2007 DARPA Urban Challenge

 

There are a lot of sensors on the roof of that car. That’s not to say that we can’t add sensors to vehicles sold commercially. Ultrasonic distance sensors are quickly becoming commonplace (look for the little donuts in the rear plastic car bumpers) and rear-view video cameras appear to be on their way to becoming mandatory in the near future because we Boomers can’t turn our necks like we used to. But a self-driving vehicle needs 360-degree sensing capability, as shown below:

 

Autonomous vehicle sensors 

That’s a lot of sensors that will require significant cost reduction to become commercially attractive for high-volume automotive applications. That’s also a lot of opportunity.

 

Next, Shaw turned to the congestion problem. If you’re making vehicles that are light to save energy and autonomous to prevent accidents, then you can make the vehicles small too. GM has done exactly that with the experimental EN-V program conducted jointly with the 2-wheeled electric scooter company Segway. The resulting EN-V vehicle prototypes are enclosed, overgrown Segway scooters.

 

EN-V

 

Although they look different, the EN-V vehicles are all based on the same propulsion skateboard: a 2-wheeled platform based on now-familiar Segway technology. One difference between the Segway scooters and the EN-V skateboard platforms (besides scale) is that the EN-V platforms have a sliding mechanism that projects a couple of caster wheels so that the vehicle can rest in a stable position without needing power to balance the car when parked. By the way, EN-Vs can drop you off and then park themselves. They can come back to pick you up too.

 

EN-V skateboard 

Finally, once you’re reducing the size of the car and making it “crash-proof,” you can start to consider a raft of alternative materials. This isn’t a new quest. Shaw showed two pie charts depicting the material used in a vehicle in 1977 versus today. In 1977, low-carbon ferrous materials and high-strength steel made up 74% of the vehicle by weight. Today, that’s down to 63%. More advanced materials can reduce the weight of a vehicle by another 35-60% compared to steel, said Shaw.

 

Vehicle Materials 

Towards the end of his talk, Shaw projected an image showing all of the applications for electronics in the modern vehicle. It’s a long, long list as you can see:

 

Vehicle Electronics

 

In the end, Shaw made it clear that future personal vehicles would be at least as much about technology as they are about sculpture. But residing in Silicon Valley, he would say that, wouldn’t he?

Posted in Low-Power