Multicore server, PC, and embedded designs push memory power, drive use of advanced DDR3 SDRAMs

July 2, 2010 on 9:32 pm | In DRAM, Design, Green Design, Low-Power, SDRAM | 4 Comments

Systems designers try all sorts of methods to reduce system power consumption. For years, we’ve relied on circuit tricks and have been reducing logic supply levels from the 5V power supplies that were so common from the 1970s and throughout the 1980s to the 1V levels we now employ with today’s advanced logic chips. Memory supply voltages have dropped as well. For example, the original DDR SDRAMs had a 2.5V supply voltage and DDR2 SDRAM employs a 1.8V supply voltage. That’s nearly double today’s SOC, processor, and microcontroller core voltages. The reason for this lag in supply-voltage reduction is that memory vendors prefer to stay in the economic sweet spot for IC lithography, whereas logic designs prefer to stay on or near the bleeding edge. Consequently, memory’s share of a system’s power-consumption pie has been rising, and there really hasn’t been much attention paid to reducing memory power consumption. The advent of DDR3 SDRAM provides another opportunity to cut memory power through further reductions in memory supply voltage. By coupling DDR3 with advanced process technology, Samsung has attained a supply voltage of 1.35V for its 40nm DDR3 SDRAMs. According to Samsung, this drop in memory supply voltage can produce a 38% cut in server power consumption.

 

Performance isn’t really the engine that drives DDR3 adoption. The real driver is bandwidth and there are two design trends that force the quest for ever-increasing amounts of memory bandwidth. The first such design trend is the wholesale adoption of homogeneous and heterogeneous multicore architectures. As an industry, we’ve embraced the use of multiple processor cores as a solution to the death of Dennard scaling. Although most people attribute the increase in operating frequency and the decrease in per-transistor power consumption through lithographic shrinks to Moore’s Law, which Gordon Moore codified in an article he published in 1965 while working at Fairchild Semiconductor, that attribution is not factually correct. Moore simply predicted that the number of transistors on a chip would grow exponentially over time as lithographies shrank. It was IBM’s Robert Dennard who observed in 1974 that lithographic advances in IC manufacturing also consistently produced faster transistors that consumed less power. For decades, we’ve used Dennard scaling to produce faster and faster processors (while attributing the improvements to Moore’s Law).

 

The semiconductor industry has poured billions of dollars into keeping Moore’s Law alive, but Dennard scaling died at 90nm. We continue to get more transistors on a chip with each advance in IC lithographic scaling, but the transistors no longer get appreciably faster, so the MHz wars have ended. Worse, pushing transistors to their performance limit now produces leaky transistors that dissipate as much power when off as when on. We now recognize that the way to get more performance is to use the transistor bounty to increase the number of processors and to distribute the workload across these processors without striving for multi-GHz clock rates.

 

With all of these on-chip processors executing code and accessing data on a multicore chip, system designers must find a way to make large amounts of inexpensive memory available to these processors. For the last decade, the most cost-effective way to provide a system with large amounts of low-cost memory has been the SDRAM. The classic system design teams a multicore processor or SOC with one or more SDRAM channels. As memory bandwidth needs have risen, both the SDRAMs’ per-channel transfer rate and the number of SDRAM channels have increased. DDR transfer rates have now reached and exceeded 1600 Mtransfers/sec, and it’s not uncommon to find server processors with three SDRAM channels, for example. Because of the constant thirst for memory bandwidth, DDR3 SDRAM sales exceeded DDR2 SDRAM sales beginning with the first quarter of 2010, according to the leading SDRAM vendor Samsung, and the company expects DDR2’s share of SDRAM market sales to drop below 20% by the end of the year.

 

When you move that much data between a processor and memory, you’re likely to dissipate a considerable amount of power, and indeed, memory power consumption has been on the rise. Lowering memory power consumption can substantially lower system-level power consumption. For example, Samsung states that 40nm, 2-Gbit DDR3 SDRAMs running from a 1.35V supply can cut a server’s memory power consumption by 80% compared to the equivalent number of storage bits implemented with 60nm, 1-Gbit DDR2 SDRAMs running at 1.8V, and by 38% compared to equal-sized memory arrays built from 60nm, 1-Gbit DDR3 SDRAMs running at 1.5V.
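To see how much of that saving the supply voltage alone could account for, here is a rough first-order sketch, assuming dynamic power scales as C·V²·f with capacitance and frequency held constant. It ignores density, refresh, background current, and I/O termination, so it is an illustration, not a reproduction of Samsung’s measurements.

```python
# First-order CMOS dynamic-power scaling: P_dyn is proportional to C * V**2 * f.
# Assumption: C and f are unchanged, so only the supply-voltage term matters.
# Density, refresh, background current, and termination are NOT modeled.

def relative_dynamic_power(v_new: float, v_old: float) -> float:
    """Return P_new / P_old when only the supply voltage changes."""
    return (v_new / v_old) ** 2


for label, v_old in (("1.8V baseline", 1.8), ("1.5V baseline", 1.5)):
    ratio = relative_dynamic_power(1.35, v_old)
    print(f"1.35V vs {label}: {ratio:.2f}x power, {1 - ratio:.0%} saving")
```

Voltage alone accounts for roughly 44% and 19% against the 1.8V and 1.5V baselines. The rest of the quoted savings presumably comes from the denser 40nm, 2-Gbit parts needing fewer devices for the same capacity, which this sketch does not model.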

As a result, according to Samsung’s measurements, 40nm, 2-Gbit DDR3 SDRAMs running at 1.35V can cut power by an astonishing 38% at the system level for servers. To put that into economic perspective, says Samsung, the use of 1.35V DDR3 SDRAMs in a server can save 2564 kilowatt-hours per year. Samsung estimates that there will be 32 million servers operating in data centers worldwide by the end of this year. If they all were equipped with 1.35V DDR3 memory, the annual power consumption would be reduced by 82 terawatt-hours, worth an estimated $28 billion. That kind of money gets any data-center manager’s attention.
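Samsung’s fleet-wide numbers can be checked with simple arithmetic. The sketch below multiplies the quoted per-server saving by the quoted server count and backs out the electricity price implied by the $28 billion figure; all three inputs are Samsung’s estimates as reported above.

```python
# Reproduce Samsung's data-center arithmetic from the figures quoted above.
KWH_SAVED_PER_SERVER_PER_YEAR = 2564  # Samsung's per-server estimate
SERVERS_WORLDWIDE = 32_000_000        # Samsung's end-of-year server estimate
DOLLARS_SAVED = 28e9                  # Samsung's annual dollar estimate

total_kwh = KWH_SAVED_PER_SERVER_PER_YEAR * SERVERS_WORLDWIDE
total_twh = total_kwh / 1e9                 # 1 TWh = 1e9 kWh
implied_price = DOLLARS_SAVED / total_kwh   # $/kWh implied by the estimate

print(f"{total_twh:.1f} TWh/year saved")
print(f"implied price: ${implied_price:.2f}/kWh")
```

The energy figure checks out at about 82 TWh. The implied rate of roughly $0.34/kWh is well above typical commercial electricity prices, so the dollar figure likely folds in costs beyond raw energy; the numbers quoted here don’t say which.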

The same sort of energy savings apply to any multicore system whether it’s a server, a PC, or an embedded system based on a heterogeneous multicore processor design.

SPMT engulfs LPDDR2 standard, making adoption a no-brainer. Meanwhile Marvell jumps on the bandwagon.

June 7, 2010 on 9:00 am | In DRAM, Design, LPDDR2, Low-Power, SDRAM, SOC | No Comments

An insidious power problem has slowly crept up on embedded-system designers. While most of us were firmly focused on the power dissipation of our ever-expanding logic designs with their increasing number of processor cores, we mostly ignored the huge leaps in power consumption caused by the rapid growth in memory size and big jumps in memory-access speeds and memory bandwidth. To cut memory costs, most high-end mobile and embedded designs today employ one high-bandwidth SDRAM device or array to satisfy all of a system’s memory requirements. Yet we think very little about the power impact of hooking big DDR SDRAMs up to our SOCs and ASICs, and these SDRAMs run at clock rates measured in hundreds of MHz or GHz, at transfer rates that are double the clock rate. It takes some real power to sling bits between a processor and SDRAM at transfer rates approaching or exceeding 1 Gtransfers/sec. Even though falling SDRAM supply and I/O voltages have kept memory power somewhat in check (only somewhat), the wide DDR2 and DDR3 memory interfaces that deliver the highest bandwidths may now consume Watts of power. Watts! This simply cannot stand.

Not coincidentally, that’s the position of the SPMT (Serial Port Memory Technology) Consortium, which has been developing a low-power, high-performance memory interface for mobile and embedded applications. The low-power aspect arises primarily from SPMT’s use of low-voltage differential signaling (LVDS), which transfers information using 150 mV differential signal swings instead of single-ended, ground-referenced signal swings of more than a volt. The high-performance aspect arises from the use of multi-Gbits/sec transfer rates per SPMT data lane.

But there’s been a big, ugly fly in the SPMT ointment. Memory vendors know that more than 80% of all DRAMs go into PCs and servers, and they stick with memory designs (and memory interfaces in particular) that best suit the needs of PC and server designers. Today, that means DDR2 memory, which is the mainstream DRAM technology, but the industry is quickly switching to DDR3. DDR4 is not yet defined, but it too is a rapidly approaching memory-interface specification that will most assuredly “fix” the problems we have with DDR3. These PC- and server-centric, high-speed parallel SDRAM interfaces burn a lot of power to deliver high bandwidth, which creates the niche opportunity that the SPMT Consortium has been trying to fill for mobile and embedded designs. Unfortunately, DDR memory has such a huge presence in the DRAM arena that there’s been little chance for any other interface approach to take hold.

Until now.

Today, the SPMT Consortium announced a major revision to the SPMT standard that may well spell the difference between an interesting technical exercise and an immensely successful new memory-interface standard. Previously, the SPMT specification multiplexed read/write commands and the data on the same unidirectional LVDS lanes. Doing so somewhat reduced the throughput on the data lines but it also reduced the memory pin count because SPMT memory didn’t need separate control/address (CA) lines. The reduced pin count was considered a major benefit that reduced the cost of packaged SPMT memory devices. The new SPMT specification, which completely supersedes the prior specification, does away with this control/address/data multiplexing in favor of using the same CA signal and pin definitions that LPDDR2 memory uses to carry control and address signaling.

This is a significant and important change to the SPMT spec because LPDDR2 is already poised to take over the mobile and embedded design spaces. (See LPDDR2: The new mainstream memory for embedded and mobile applications? on Denali Software’s Memory Report blog.) Further, four pairs of unidirectional SPMT data lanes now precisely overlap the 16 bidirectional data lines of a x16 LPDDR2 memory, making it possible to build one memory chip that can support both LPDDR2 and SPMT protocols using the same set of pins. What that means is that with only a few changes to the memory controller and memory PHY, an SOC or embedded processor can accommodate both LPDDR2 and SPMT memory using exactly the same set of interface pins. It also means that SDRAMs designed to the new SPMT specification can be used as LPDDR2 SDRAMs, ensuring a ready market when commercial SPMT SDRAMs first appear near the end of 2011, assuming things go according to the SPMT Consortium’s current plans.

So where’s the power advantage? It kicks in after the required SDRAM transfer rate hits a critical level. For example, the SPMT Consortium’s data estimates that a x32 LPDDR2 memory interface operating at 400MHz dissipates about 180mW while providing 3.2 Gbytes/sec of peak data throughput over 32 data lines (800 Mbits/sec/pin) and 360mW at a peak data throughput of 6.4 Gbytes/sec over 64 data lines. (Regular old DDR2 and DDR3 SDRAM interfaces would consume a lot more power than this.) By contrast, the SPMT interface dissipates 180mW while transferring 6.4 Gbytes/sec over eight data lanes (8 Gbits/sec/lane) and 360mW when transferring 12.8 Gbytes/sec over 16 data lanes. So the SPMT interface appears to be about twice as power efficient as the LPDDR2 interface at higher data rates, which LPDDR2 memory can’t attain without resorting to a very wide data bus and using several memory devices in the bargain. However, the LPDDR2 parallel interface has a power advantage over the SPMT serial interface at lower transfer rates. So LPDDR2 memory might suffice for today’s embedded and mobile applications and might also suffice for low-activity modes in future applications.
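A convenient way to compare the SPMT Consortium’s figures is energy per transferred bit (power divided by bit rate). This sketch uses only the numbers quoted above, treats them as peak rates, and assumes 1 Gbyte = 10⁹ bytes.

```python
# Energy per transferred bit = power / bit rate, using the SPMT Consortium's
# estimates quoted above. These are peak-throughput (best-case) figures.

def picojoules_per_bit(power_mw: float, gbytes_per_s: float) -> float:
    bits_per_s = gbytes_per_s * 8e9          # assumes 1 Gbyte = 1e9 bytes
    joules_per_bit = (power_mw * 1e-3) / bits_per_s
    return joules_per_bit * 1e12             # convert J to pJ


cases = {
    "LPDDR2 x32 @ 3.2 GB/s": (180, 3.2),
    "LPDDR2 x64 @ 6.4 GB/s": (360, 6.4),
    "SPMT 8-lane @ 6.4 GB/s": (180, 6.4),
    "SPMT 16-lane @ 12.8 GB/s": (360, 12.8),
}
for name, (mw, gbs) in cases.items():
    print(f"{name}: {picojoules_per_bit(mw, gbs):.2f} pJ/bit")
```

LPDDR2 comes out near 7 pJ/bit at both widths and SPMT near 3.5 pJ/bit, which is the roughly 2x efficiency advantage described above. These are peak-rate numbers; at low utilization, SPMT’s fixed DLL power erodes that advantage.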

The graph below, supplied by SPMT, tells the story. The graph shows that at low data rates, LPDDR2 memory dissipates less power than SPMT memory, largely because of the DLL integrated into SPMT memory. (DLLs consume non-negligible amounts of power, and although DDR2 and DDR3 memories incorporate DLLs, LPDDR2 memory does not.) So the SPMT Consortium has done something very smart and has developed an integrated mode-switching mechanism called SerialSwitch, which allows an SDRAM controller to programmably shift an SPMT memory between its LPDDR2 and SPMT serial interface modes using a control register built into the memory device.

 

 Memory Crossover

 

Mobile phone vendors and other embedded/mobile system designers know that video will be heavily used in many future products and they also know that memory transfer-rate and bandwidth requirements will only go up as a result. SPMT’s SerialSwitch mechanism provides a way for one memory device to support both low- and high-bandwidth operating modes with an appropriate level of power consumption depending on a system’s instantaneous bandwidth requirements. By definition, all commercial SPMT memories will incorporate the SerialSwitch feature. The following figure shows how the SPMT SerialSwitch mechanism works.

 

SerialSwitch

 

During period Tg, the figure shows SPMT memory operating as a x16 LPDDR2 memory. Note that the data lines (DQ/HS) employ full-voltage, single-ended signaling in this mode. During Tg, the memory’s DLL is off, which saves power. At the beginning of period Th, the system determines that more bandwidth is or soon will be needed, so it directs the memory controller to send a command to the memory to spin up the DLL in preparation for switching to SPMT serial mode. That process takes 5 to 10 microseconds. During this time, the memory continues to operate as an LPDDR2 memory, so the DLL spin-up time is hidden and doesn’t interfere with system operation, although power consumption will rise. Once the SPMT memory’s DLL has spun up, at time Ti, the system’s memory controller commands the SPMT memory to switch to serial communications mode. This transition takes a maximum of 10 clock cycles. After that, during period Tj in the figure, the memory operates in SPMT serial-communications mode. Note that the data lines have switched to LVDS signaling, as shown in the figure. LVDS signaling reduces the memory interface’s power consumption. At some later time, depending on system requirements, the memory controller can power down the memory (shown at time Tk) or switch it back to LPDDR2 mode (shown in the period that follows the power-down period starting at Tk). Don’t be misled by the figure, by the way: SPMT memory need not pass through the power-down mode to switch from SPMT-serial communications to LPDDR2 mode.
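The SerialSwitch sequence above is essentially a small state machine. The following is a hypothetical Python model of it: the class, state, and method names are invented for illustration, the timing constants come from the figures quoted in this post, and the choice to wait out the worst-case 10-microsecond DLL spin-up (rather than poll a lock indicator) is an assumption, not something taken from the SPMT specification.

```python
from enum import Enum, auto

# Hypothetical model of the SerialSwitch sequence described above.
# Class, state, and method names are illustrative, not from the SPMT spec.
DLL_SPINUP_US_MAX = 10.0       # post: DLL spin-up takes 5 to 10 microseconds
SERIAL_SWITCH_CYCLES_MAX = 10  # post: mode switch takes at most 10 clocks


class Mode(Enum):
    LPDDR2 = auto()         # single-ended DQ, DLL off (period Tg)
    LPDDR2_DLL_ON = auto()  # still serving LPDDR2 traffic while DLL spins up (Th)
    SERIAL = auto()         # LVDS SPMT serial mode (Tj)
    POWER_DOWN = auto()     # power-down (Tk)


class SpmtDeviceModel:
    def __init__(self) -> None:
        self.mode = Mode.LPDDR2
        self.dll_elapsed_us = 0.0

    def start_dll(self) -> None:
        if self.mode is not Mode.LPDDR2:
            raise RuntimeError("DLL spin-up is requested from LPDDR2 mode")
        self.mode = Mode.LPDDR2_DLL_ON
        self.dll_elapsed_us = 0.0

    def advance_us(self, us: float) -> None:
        # Time passes; LPDDR2 accesses continue, hiding the DLL spin-up.
        if self.mode is Mode.LPDDR2_DLL_ON:
            self.dll_elapsed_us += us

    def dll_ready(self) -> bool:
        # Assumption: treat the DLL as ready only after the worst-case time.
        return self.dll_elapsed_us >= DLL_SPINUP_US_MAX

    def switch_to_serial(self) -> int:
        if self.mode is not Mode.LPDDR2_DLL_ON or not self.dll_ready():
            raise RuntimeError("DLL must be running and ready before the switch")
        self.mode = Mode.SERIAL
        return SERIAL_SWITCH_CYCLES_MAX  # worst-case switch latency, in clocks

    def switch_to_lpddr2(self) -> None:
        # Direct path back from serial mode (no power-down required) or a
        # wake from power-down. Assumption: the DLL is turned off again.
        if self.mode in (Mode.SERIAL, Mode.POWER_DOWN):
            self.mode = Mode.LPDDR2
            self.dll_elapsed_us = 0.0

    def power_down(self) -> None:
        self.mode = Mode.POWER_DOWN


dev = SpmtDeviceModel()
dev.start_dll()
dev.advance_us(5.0)
early = dev.dll_ready()      # False: worst-case spin-up not yet elapsed
dev.advance_us(5.0)
cycles = dev.switch_to_serial()
dev.switch_to_lpddr2()       # straight back to LPDDR2, skipping power-down
```

Modeling the spin-up as a separate LPDDR2_DLL_ON state captures the key property described above: the memory keeps serving LPDDR2 traffic while the DLL comes up, so the only visible cost of the switch itself is a handful of clock cycles.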

Systems can use SPMT memory in LPDDR2 mode at boot time and whenever the system is operating in a mode with low memory-bandwidth requirements. The system can quickly switch to the LVDS SPMT-serial mode whenever it requires higher memory data rates—for example when video is activated, when multiple operating modes are in use simultaneously, or when multiple processors are running in a multicore device. The SPMT Consortium estimates that the optimum crossover point between LPDDR2 and SPMT serial interface data rates for a x16/8-lane LPDDR2/SPMT-serial memory device is around 1.6 Gbytes/sec based on energy considerations.
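A controller needs some policy for deciding when to flip modes. Here is a hypothetical sketch built around the Consortium’s roughly 1.6 Gbytes/sec crossover. The hysteresis band is an assumed value, added to keep the memory from thrashing between modes when demand hovers near the crossover; nothing here comes from the SPMT specification.

```python
# Hypothetical mode-selection policy around the ~1.6 GB/s crossover the
# SPMT Consortium estimates for a x16/8-lane device. The hysteresis band
# is an assumption, not part of any spec.
CROSSOVER_GBPS = 1.6
HYSTERESIS_GBPS = 0.2  # assumed guard band to avoid rapid mode toggling


def choose_mode(current_mode: str, demand_gbytes_per_s: float) -> str:
    """Return "lpddr2" or "serial" for the next control interval."""
    if current_mode == "lpddr2" and demand_gbytes_per_s > CROSSOVER_GBPS + HYSTERESIS_GBPS:
        return "serial"
    if current_mode == "serial" and demand_gbytes_per_s < CROSSOVER_GBPS - HYSTERESIS_GBPS:
        return "lpddr2"
    return current_mode  # inside the band: keep the current mode


mode = "lpddr2"
for demand in (0.5, 1.7, 2.4, 1.5, 1.2):
    mode = choose_mode(mode, demand)
    print(f"{demand:.1f} GB/s -> {mode}")
```

Because the DLL needs 5 to 10 microseconds to spin up, a real controller would presumably act on predicted rather than measured demand, starting the DLL before serial mode is actually needed.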

By subsuming the LPDDR2 standard and making SPMT memories fully superset-compatible with LPDDR2 memories, I think the SPMT Consortium has significantly raised the likelihood of adoption when commercial SPMT memories finally appear late next year. I also think the likelihood of such memories appearing is pretty high, considering that the top two DRAM vendors, Samsung and Hynix, are members of the SPMT Consortium. Together, Samsung and Hynix have a bit more than half of the overall DRAM market according to the latest stats from the DRAMeXchange (http://j.mp/aNaNiY).

On the embedded-processor side of the equation, Marvell has announced that it too has joined the consortium, which further improves SPMT’s chances of success. In fact, Marvell supplied a canned quote for the SPMT Consortium’s press release containing one of the strongest statements I’ve seen in such press releases, so I am suspending my usual cynicism about such quotes and reproducing it here:

“Today’s mobile DRAM technology is geared to support the bandwidth needs of single core processors. As devices evolve to integrate multi-core CPU, multi shader 3D graphic engines at multi-GigaHertz speeds, it’s clear that DRAM will be the single performance bottleneck, especially for handheld systems where power budget is a major constraint,” said Dr. Sehat Sutardja, chairman, president and chief executive officer at Marvell. “Marvell is joining the SPMT Consortium to actively promote Serial Port Memory Technology as an industry standard and address the immediate needs of the industry. We encourage other companies active in the sector to join us in our mission.”

Strong backing like this from a market maker like Marvell can only help SPMT’s cause. Whether or not SPMT actually reaches critical mass is something that we’ll all be watching as events unfold in the hotly competitive memory arena over the next 18 to 24 months.

Xilinx redefines the high-end microcontroller with its ARM-based Extensible Processing Platform – Part 1

May 1, 2010 on 7:10 pm | In DRAM, Design, FPGA, Low-Power, SOC | No Comments

Last week at the Embedded Systems Conference (ESC) held in San Jose, California, Xilinx disclosed additional information about its upcoming Extensible Processing Platform (EPP), which I previously discussed in a February 1 blog entry written just after RTECC (the Real Time Embedded Computing Conference; see Designing Low-Power Systems with FPGAs, Part 2). This past week at a press conference, Xilinx’s Senior VP of Worldwide Marketing and Business Development, Vin Ratford, again spoke of the upcoming processor-centric devices Xilinx plans to introduce next year, but this time he provided far more detail. As promised, the devices fuse features of a high-end microcontroller (hard-core implementations of a 32-bit processor, memory, and I/O) with an FPGA fabric. But wait, you say, haven’t both Xilinx and Altera (and other FPGA vendors) tried this before? Yes, they have, with uninspiring results. However, I submit that Xilinx’s EPP is substantially different and that it stands a very good chance of capturing significant market share from microcontrollers and from discrete processors. It may also be very attractive to design teams considering the development of certain types of SOCs. Consequently, the Xilinx EPP family may well become the family of high-volume parts Xilinx wants to have in its product catalog. Ratford provided so much information in his ESC announcement that I’ll need multiple blog entries to cover it all. In this first entry, I’ll describe what Xilinx’s EPP is and cover some of the thinking behind the architecture; in the second entry, I’ll describe some case studies that illustrate why this component family might be very attractive for a certain class of embedded product: it promises lower parts count, lower cost, and higher performance with lower power consumption. Please understand that Xilinx stopped short of announcing actual products. Ratford described an architecture that will be used to produce a product family, with actual products starting to appear next year.

 There are two major components to Xilinx’s EPP: a hard-wired, high-end, microcontroller-like block and a connected FPGA fabric based on Xilinx’s 28nm unified FPGA logic-cell design as shown in the diagram below.

 

Xilinx EPP Block Diagram


 

 

First, let’s look at the hard-wired portion. It’s well known that processors don’t run very fast when implemented with FPGAs. The reason mostly revolves around the wiring congestion associated with the large register files of 32-bit RISC processors. Wiring congestion translates into “slow” and you can figure on giving up 50-75% or more of the processor’s maximum clock rate in a given process technology when comparing a synthesized ASIC implementation against a synthesized FPGA implementation. Hand optimization can reclaim some of that speed but if you’re planning on using a standard processor architecture anyway, it makes perfect sense to implement the processor on the FPGA as a hard core using a standard ASIC synthesis flow. That way, you get the full speed of the IC process technology along with the full logic density and therefore a much lower silicon cost.

Xilinx has chosen ARM’s Cortex-A9 32-bit RISC processor core for the EPP but has gone a step further by implementing a dual-core version of this processor. That choice immediately puts the Xilinx EPP family at the high end of the microcontroller spectrum. First, there are two 32-bit processor cores. Second, a Cortex-A9 processor can run at 2 GHz in TSMC’s 40nm, high-performance process technology. That’s one fast processor, much faster than many embedded applications require. A dual-core version, as employed in Xilinx’s EPP family, delivers even more aggregate performance.

In choosing a standard processor core from ARM’s extremely successful stable of processors, Xilinx has plugged directly into a broad community of embedded software developers. In other words, choosing the widely used ARM architecture telegraphs Xilinx’s recognition that embedded software development is now the largest and most expensive part of any high-end embedded project. In many such projects, software developers often outnumber hardware developers by 10:1. In announcing the EPP, Xilinx shows that it fully recognizes the need to make the software development team happy first. The company’s selection of an ARM processor core also leverages the associated large and familiar development-tool set, the good selection of operating systems, and the extended ecosystem that goes with the ARM architecture’s large and growing market dominance in the embedded space. All of these factors make the ARM processor very attractive to embedded development teams.

To the dual-core ARM Cortex-A9 processor, Xilinx has added a number of hard-core peripherals including SRAM caches, timers, interrupt controllers, switches, memory controllers, and commonly used I/O peripherals certain to be useful for many high-end embedded designs. Because these additional blocks are all hard-core implementations, they too take little room on the chip and consume much less power than they’d need if implemented in an FPGA fabric. Note that the EPP chips will contain enough SRAM for caches and small scratchpads; however, bulk memory, generally implemented with DRAM, will be off-chip. Consequently, the EPP architecture includes hard-core DRAM controllers to manage off-chip memory. Ratford’s talk at ESC did not elaborate on the type of memory the on-chip controller can handle; however, DDR2, DDR3, or both would probably be a good guess, considering the high-end nature of the EPP family. The targeted applications will need a lot of memory, and DDR2 and DDR3 DRAM are now the best choices in terms of cost/bit.

The key to the software-friendly approach Xilinx is taking with the EPP is that the architecture boots code upon power-up just like a microcontroller. Only then is the FPGA fabric configured. This approach makes the EPP look very familiar to software developers who are not at all comfortable with writing code for a fluid, amorphous system that’s not well-defined when power comes up. The FPGA vendors spent a lot of money on reconfigurable architectures learning this lesson. In addition, HLL compilers don’t much care for undefined hardware either; undefined hardware just doesn’t fit the standard software-programming models. So the implementation of a complete, hard-wired microcontroller within the EPP cuts out a lot of that old unfamiliar strangeness associated with previous attempts to marry hard processor cores and FPGA fabrics.

Speaking of the FPGA fabric, Xilinx will be using the unified 28nm FPGA fabric in the EPP. Xilinx developed this fabric for its next-generation Spartan and Virtex FPGAs. (If you want more details about this FPGA fabric, take a look at the White Paper here.) According to Ratford, Xilinx’s Virtex and Spartan FPGAs will both employ this fabric, which is the first time that Xilinx has used the same FPGA fabric for its high-performance and its low-cost FPGA product families. Using the same fabric for the two Xilinx FPGA product lines and for the EPP means that Xilinx need only develop one set of hardware-design tools for the 28nm node, and it also means that hardware designers only need to learn one set of tools as well.

The EPP’s hard-core embedded microcontroller communicates with the on-chip FPGA fabric using ARM’s newly announced AMBA 4/AXI bus. Ratford said at RTECC and repeated at ESC that Xilinx worked with ARM to develop a version of this new bus specifically for FPGA use, but he has not provided details. The diagram of the EPP Ratford projected (reproduced above) shows multiple buses connecting the EPP’s hard-core embedded microcontroller and the on-chip FPGA fabric. Although Ratford provided no additional details, I plan to write a third blog entry discussing possible ways of optimally connecting the processor cores to the FPGA fabric. In the next installment of this blog, I’ll discuss some specific case studies Ratford covered in his ESC presentation that show how the EPP can reduce the parts count, cost, and power consumption of high-end embedded systems.

(You can find a White Paper describing the Xilinx EPP here.)

The Surprising Popularity Rise of On-Chip Memory

November 8, 2009 on 4:53 pm | In CMOS, DRAM, Design, Low-Power, SOC | No Comments

I attended the 7th International SOC Conference in Newport Beach last week, and several of the speakers addressed issues relating to SOC and system power. One of these speakers was Bob Madge, Director of Technology Marketing at LSI Corp (formerly LSI Logic). In case you didn’t know, LSI has been evolving its business from its original focus on developing ASICs and SOCs for customers to a focus on programmable ASSPs (application-specific standard products) and custom silicon specifically aimed at the networking and storage markets. Madge’s first slide explained the reasoning: storage capacity is projected to grow 49% per year and network traffic 42% per year. Good growth numbers for a business to target.

To deliver competitive parts, LSI stays on top of IC design and manufacturing trends. One trend that caught LSI and the semiconductor industry by surprise has been the rapid growth in on-chip memory use. On-chip memory makes sense for two reasons. First and foremost, it provides better performance at lower power than off-chip memory: putting memory on the chip along with the logic circuitry eliminates two sets of off-chip drivers and receivers, which reduces the power consumed by each memory transaction. Second, on-chip logic can communicate with on-chip memory over extremely wide memory interfaces, because pin count is not an issue if you stay on the chip. A wide memory interface reduces the number of transfers needed to move a given amount of data, and the resulting lower transfer rates cut power as well.
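The transfer-count argument is easy to quantify. This sketch counts the transfers needed to move one buffer over buses of different widths; the 4-Kbyte buffer and the bus widths are hypothetical examples, not LSI figures.

```python
# Number of bus transfers needed to move a block of data over interfaces of
# different widths. Block size and widths are hypothetical examples.

def transfers_needed(block_bytes: int, bus_width_bits: int) -> int:
    bytes_per_transfer = bus_width_bits // 8
    return -(-block_bytes // bytes_per_transfer)  # ceiling division


BLOCK = 4096  # a 4-Kbyte buffer
for width in (32, 128, 512):  # e.g. an off-chip x32 bus vs. wide on-chip buses
    print(f"{width:4d}-bit bus: {transfers_needed(BLOCK, width)} transfers")
```

At a fixed bandwidth, the 512-bit bus can run at 1/16 the transfer rate of the 32-bit bus, which is where the power saving comes from.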

However, merging logic and memory on one piece of silicon has always presented design and manufacturing issues. Bulk, high-volume, high-capacity memory manufacturing processes differ from logic manufacturing processes because the two processes must optimize different parameters. Memory processes emphasize low-cost manufacturing and tend to have fewer metal layers than logic processes, which emphasize speed and on-chip connectivity. “Frequency, density, and power are always a challenge,” said Madge.

For example:

  • Today’s network routers use 400-Mbit buffers. Switches need 512 Mbits of storage or more. In the future, said Madge, these devices will need as much as 1 Gbit of on-chip memory in multiple configurations.
  • IP controllers used in network storage applications currently use 60 to 100 Mbits of cache memory. In the future, these devices will need 200 Mbits of memory or more.
  • Media processors currently use 60 to 80 Mbits of memory running at 500 MHz. Future needs will be on the order of 100 to 200 Mbits of memory running at 600 to 700 MHz.

All of these examples demonstrate the coming challenges for fast, dense, on-chip memory.

LSI is looking at embedded (on-chip) DRAM and the use of 3D, through-silicon via technology for chip-to-chip stacking as ways of increasing the amount of on-chip memory. The company is doing this because it sees a continued and rapid rise in the amount of on-chip memory needed for its networking and storage chips.

Embedded DRAM cuts power because it uses a 1T (one-transistor) cell, which obviously improves density over a 4T or 6T static RAM cell. Beyond density, embedded DRAM also reduces static and dynamic power consumption because its fewer transistors use less power and leak less current than the greater number of transistors required to build the same amount of SRAM.
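A rough count shows the scale of the transistor reduction. The sketch counts only storage-cell transistors for the 1 Gbit of on-chip memory Madge projects for future switches (taking 1 Gbit as 2³⁰ bits), and it ignores decoders, sense amplifiers, and the DRAM cell’s storage capacitor. It also leaves out DRAM refresh power, which eats into the savings.

```python
# Rough transistor counts for the storage array alone (no decoders, sense
# amps, or DRAM storage capacitors), illustrating the effect of a 1T cell.
CELL_TRANSISTORS = {"6T SRAM": 6, "4T SRAM": 4, "1T eDRAM": 1}


def array_transistors(bits: int, cell: str) -> int:
    return bits * CELL_TRANSISTORS[cell]


ONE_GBIT = 2**30  # assumption: 1 Gbit = 2**30 bits
for cell in CELL_TRANSISTORS:
    count = array_transistors(ONE_GBIT, cell)
    print(f"{cell}: {count / 1e9:.2f} billion transistors")
```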

LSI is also investigating other power-saving features that become possible when you move memory onto the logic chip including a sleep mode for the memory, dual power rails, and low-voltage operation. However, said Madge, the biggest benefit appears to be a move to embedded DRAM because of the huge reduction in transistor counts.

State-of-the-Art in Low-Power Memory: Denali’s MemCon

June 30, 2009 on 4:06 pm | In DRAM, Flash, LPDDR, LPDDR2, Low-Power, SDRAM | No Comments

Need gobs of cheap RAM? Need it to operate at the lowest possible power? This blog’s for you.

I attended Denali’s ninth annual MemCon conference a few days ago. It was three days of intensive discussion about the state of the art in DRAM and Flash memory, the two mainstay memory technologies in use today. Surprisingly, NAND Flash memory is now the low-cost leader in terms of cost per bit, having passed DRAM a few years ago. However, DRAM remains the mainstay memory for the vast majority of designs, and DDR SDRAM now rules as it becomes easier and easier to find microcontrollers and FPGAs with direct DDR interfaces and DDR controller and PHY IP for SOCs.

Memory power consumption as a percentage of system power consumption has grown with the rapid growth of memory-array size in all sorts of systems. A real eye opener at MemCon 09 was a chart on the power consumption of memory in server systems, where the large server memory arrays consume as much as 40% of the system power and the processor now consumes a mere 28%. Why is that important? It’s important because big server users like Google pay tens of millions of dollars each year in electrical power costs to run and to cool their server farms and 40% of a few tens of millions of dollars is, well, tens of millions of dollars.

Note that the current share-of-power percentages for servers don’t make processor power consumption unimportant (28% is still a big number), but the clear message is that server designers must now be far more concerned with memory power consumption because it’s a big part of the power puzzle. As embedded designs adopt large DDR memory DIMMs for bulk memory, the same sort of situation applies. Embedded designers must also be aware of the way their DRAM choices affect system power.

Marc Greenberg, Denali’s Director of Technical Marketing, gave a 2-hour tutorial on low-power DDR SDRAM on the first day of MemCon09. He threw up one slide that does a terrific job of putting all of the low-power SDRAM parts in perspective:

Low-Power DDR Selection Criteria


This slide shows the optimum type of SDRAM to use based on your design’s memory-capacity and speed requirements. I like this slide a lot because it helps you to pick from the wide array of DDR types and speeds. However, it seems that your selection job is about to become a lot simpler. Look what happens to the chart when you add in LPDDR2 memory:

Low-Power DDR Selection Criteria with LPDDR2


LPDDR2 memory delivers the low-power goods by operating the SDRAM’s memory core and I/O at 1.2V, which is what you need to do to substantially cut memory power these days. Several manufacturers have announced LPDDR2 parts with I/O speeds to 400MHz/DDR800, and spec sheets for these parts are beginning to appear on DRAM vendor Web sites. LPDDR2 vendors with announced parts include Elpida, Hynix, Micron, and Nanya. Note that existing LPDDR1 vendors could also create parts that operate at 1.2V for similar power savings, and that some of the soon-to-be-seen DDR3 parts may operate at 1.35V, which would qualify them as low-power DRAMs.

In addition, there’s a spec for LPDDR2 non-volatile memory (LPDDR2-NVM) to allow LPDDR2 DRAM and Flash to be intermixed. The big advantage of LPDDR2 Flash is its very low standby power, but Flash memory exhibits both read and write wear-out failure, so DRAM isn’t yet obsolete and you’ll likely need both memory types in your system design. The LPDDR2-NVM spec allows for I/O speeds to 533MHz/DDR1066 operation, but Greenberg says that the initial LPDDR2-NVM parts are likely to be slower than the maximum.
