Architectural Issues for Power Gating

This article discusses some of the architectural issues involved in implementing power-gating designs. In particular, it addresses the issues of partitioning, hierarchy and multiple power-gated domains.

By Michael Keating (Synopsys, Synopsys Fellow), David Flynn (ARM, ARM Fellow), Robert Aitken (ARM, ARM Fellow), Alan Gibbons (Synopsys, Principal Engineer), and Kaijian Shi (Synopsys, Principal Consultant)

A scalable approach to chip architecture is valuable since a system-on-chip design today often becomes a component in an even larger chip in a subsequent product generation.

Hierarchy and Power Gating

Figure 1: Power Gating Example

To support this portability, module boundaries must be enforced at the power domain level. That is, a given module should belong to a single power domain, not split across several domains. Some tools and flows support RTL process by RTL process assignment to power domains, but this leads to much more complicated implementation and analysis. Clean visibility of the boundaries of a power-gated block is key to having a clean, top-down implementation and verification flow.

Although one can, in theory, nest power-gated modules arbitrarily within power-gated subsystems, which are in turn nested on a shared switched power rail, there are considerable benefits in not creating multiple levels of power switching fabric. Power gating is intrusive: each level of switching adds some voltage drop and degrades performance. Cascading multiple voltage drops can lead to unacceptable increases in delay.

Even if the design is represented as hierarchical at the architectural level, the implementation is improved if this is mapped onto a single level of power gating at implementation. Consider the example shown in Figure 1. The CPU conceptually has all the core logic power gated, and within it a number of functional units that can each be powered down independently—a Multiply-Accumulate and a Vector Floating Point unit in this case. The modes of operation in Figure 1 are shown in table form in Table 1.


Table 1: Modes of Operation for the Example in Figure 1

Cache   CPU     MAC     VFP     Power State
(OFF)   (OFF)   -       -       Shutdown (Cache cleaned, VDDCPU off)
ON      OFF     -       -       Deep Sleep (Cache preserved)
ON      ON      OFF     OFF     Normal Operation
ON      ON      ON      OFF     DSP workload
ON      ON      OFF     ON      Graphics workload
ON      ON      ON      ON      Intensive multimedia mode

From an implementation standpoint the switching fabric is flattened as shown in Figure 2. There is never a case when the MAC or VFP functional units are switched on without the CPU core also being powered. So the switch control semantics are adjusted to AND the control terms rather than cascade the switch elements. The power mode table now includes explicit control of the nested power-gated functional units (Table 2).

Figure 2: Flattened Switching Network
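
As a minimal sketch of ANDing the control terms, the function below derives the enables of a flat switching fabric from hierarchical requests. The function and signal names (flat_switch_enables, cpu_on, mac_req, vfp_req) are hypothetical and chosen only for illustration; a real power controller would be implemented in hardware.

```python
def flat_switch_enables(cpu_on: bool, mac_req: bool, vfp_req: bool) -> dict:
    """Derive flat switch enables by ANDing each nested request with the CPU term.

    All names are illustrative assumptions, not taken from any tool or library.
    """
    return {
        "CPU": cpu_on,
        # A nested unit's switch closes only when the CPU core is also powered,
        # so no switch element has to be cascaded behind another one.
        "MAC": cpu_on and mac_req,
        "VFP": cpu_on and vfp_req,
    }


if __name__ == "__main__":
    # A MAC request while the CPU is off leaves the MAC switch open.
    print(flat_switch_enables(cpu_on=False, mac_req=True, vfp_req=False))
    print(flat_switch_enables(cpu_on=True, mac_req=True, vfp_req=False))
```

Because the CPU term gates every nested enable, the functional units can never be switched on while the core is off, which is the property the flattened fabric relies on.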

Recommendations:

  • Map power-gated regions to explicit module boundaries.
  • When partitioning a hierarchical power-gating design, ensure that the power-gating control terms can be mapped back to a flat switching fabric.

Pitfalls:

  • Avoid control signals passing through power-gated or power-down regions to other power regions that are not hierarchically switched with the first region.
  • Avoid excessively fine power-gating granularity unless absolutely required for aggressive leakage power management. Every interface adds implementation and verification challenges and complicates system-level production test.
  • Avoid a power-gating system of more than one or two levels.

Table 2: Power Modes with Explicit Control of the Nested Functional Units

Cache   CPU     MAC     VFP     Power State
(OFF)   (OFF)   (OFF)   (OFF)   Shutdown (Cache cleaned, VDDCPU off)
ON      OFF     OFF     OFF     Deep Sleep (Cache preserved)
ON      ON      OFF     OFF     Normal Operation
ON      ON      ON      OFF     DSP workload
ON      ON      OFF     ON      Graphics workload
ON      ON      ON      ON      Intensive multimedia mode
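
One way to sanity-check a flat power mode table such as Table 2 is to encode its rows and confirm that no state turns on the MAC or VFP without the CPU. The sketch below assumes a simple tuple encoding; the names TABLE_2 and nested_unit_violations are hypothetical.

```python
# Rows of Table 2 encoded as (Cache, CPU, MAC, VFP); True means ON.
# The encoding is an assumption made for this sketch.
TABLE_2 = {
    "Shutdown": (False, False, False, False),
    "Deep Sleep": (True, False, False, False),
    "Normal Operation": (True, True, False, False),
    "DSP workload": (True, True, True, False),
    "Graphics workload": (True, True, False, True),
    "Intensive multimedia mode": (True, True, True, True),
}


def nested_unit_violations(table: dict) -> list:
    """Return the states in which MAC or VFP is ON while the CPU is OFF."""
    bad = []
    for state, (_cache, cpu, mac, vfp) in table.items():
        if (mac or vfp) and not cpu:
            bad.append(state)
    return bad


if __name__ == "__main__":
    # An empty list means every state can be mapped onto the flat fabric.
    print(nested_unit_violations(TABLE_2))
```

A check of this kind is cheap to run at the architecture stage, before the control terms are committed to RTL.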

Power Networks and Their Control

In the design of a processor-based SoC the CPU system may well introduce a number of power networks:

  • An independent power rail to the entire cached CPU subsystem—this allows the CPU to be completely turned off for long-term “sleep” modes of operation.
  • A power-gated supply to the CPU logic to support short-term leakage savings modes where the cache memory can be left retained but all the leaky standard cell logic turned off locally.
  • Optionally, some form of always-on retention power supply from the non-power-gated rail. This is needed to support state-retention registers in the standard cell portion of the design.
  • An always-on supply to provide power to the isolation cells.
  • A non-power-gated supply for the power-gating controller and for the buffers on all the power control signals: the power switch controls, the retention controls and the isolation controls.
  • An SoC-level always-on supply to control the external rail switching handshake with the power supply.
Figure 3: Power Network Control
Figure 4: External Power Rail Switching

Figure 3 illustrates the power networks with independent “VDDCPU” and always-on “VDDSOC” with a common VSS ground connection; in this example, the power-gated standard cell area has a non-gated state retention supply shown to indicate an active supply rail within a power-gated region.

External Power Rail Switching
External power rail switching (Figure 4) offers the best long-term leakage power savings—but introduces a significant turn-on delay to allow voltage regulation to stabilize and settle within specification.

Only a few voltage rails can typically be externally switched; every power supply incurs (external) regulator cost and area on the circuit board—including inductors and capacitors required to implement switched mode power supplies. Every power rail also requires on-chip power distribution that costs area and complicates the power planning and physical floor-planning. Most SoCs already have at least three power rails:

  • I/O power (at least one of 1.8/2.5/3.3V, and perhaps several depending on the application)
  • “Always-on” SoC core rail (technology-dependent logic and internal memory power rail)
  • Clean analog power supply rail to PLLs
  • An optional “keep-alive” voltage supply to the real-time clock

Adding more than two or three externally switched power rails adds significant complexity and cost to the end product.

Typically a shared ground/VSS connection approach to the chip and board works best for external power rail switching. Although there are typically independent VSS pins for both the I/O pad-ring and the chip core to de-couple output simultaneous switching activity from the logic and memory, these are typically grounded on the circuit board into a shared “0-volt” ground plane. Treating any other power supplies as switched positive supplies relative to the common ground minimizes complexities when adding power gating.

External power rail switching incurs significant delays on wake-up events, on the order of tens of microseconds to milliseconds or even longer. Much faster supply switching times are not necessarily desirable. The in-rush currents needed to recharge all the capacitive nodes in the powered-down subsystem inject noise into other (powered) regions of the chip. The resulting “ground-bounce” in a shared ground system can introduce problems that are hard to quantify until very late in the implementation and analysis phases of the design flow.

Translating such latencies into clock cycles at RTL level is not simple. Normally the clocks should be suppressed until a switched power rail is stable and within specified tolerance. For a design operating in the hundreds of MHz region, this may be the equivalent of tens of thousands of clock cycles. The actual delays are highly dependent on the power supply technology (which may have to be multi-sourced in production).
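
The arithmetic behind that estimate is simply latency multiplied by clock frequency. The sketch below works in whole microseconds and whole MHz so the product is exact; the function name and the sample figures (100 µs and 1 ms at 200 MHz) are illustrative assumptions, not values from a specific power supply.

```python
def wakeup_cycles(latency_us: int, clock_mhz: int) -> int:
    """Clock cycles that elapse during a rail wake-up.

    1 microsecond at 1 MHz is exactly one cycle, so the product of the
    two integer arguments is the cycle count.
    """
    return latency_us * clock_mhz


if __name__ == "__main__":
    # Hypothetical figures: a 200 MHz design with two different rail settling times.
    print(wakeup_cycles(100, 200))    # 100 us settling time -> 20000 cycles
    print(wakeup_cycles(1000, 200))   # 1 ms settling time   -> 200000 cycles
```

Since the real settling time depends on the regulator, a wake-up counter sized this way would normally be made programmable rather than fixed.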

Separate power rails become a necessity when one introduces dynamic voltage scaling. It may also be highly desirable to give large banks of memory their own supply, which may be switched to intermediate RAM retention operating conditions, for example.

Recommendations:

  • Minimize the number of external switched independent power rails—each one must be justified from an end-product requirement given the associated additional power supply real-estate costs and on-chip power distribution.
  • With external switched rails, it is best to switch (positive) supply rails and retain a common ground.
  • In systems implementing voltage scaling, an independent rail must be provided for each voltage scaled region.

Pitfalls:

  • Design for significant external power rail switching times: wake-up latencies of tens or hundreds of thousands of clock cycles must be factored in, and they will depend on the external PSU specifications.
  • Although multiple rails appear elegant from a system design perspective, they introduce verification and deployment challenges in production. Independent supply rails have independent voltage control regulators, and independent rails can exhibit vastly different load regulation characteristics when active, wait-stated or halted compared to logic powered at interfaces.

On-Chip Power Gating
On-chip power gating is much faster than off-chip power rail gating. And the smaller the power-gated region, the faster power can be gated on and off. The current required to power up a small power-gated region is much less than that required for a large block. But time must be budgeted to manage the minimization of power gating transients and noise injection as seen by other logic and memory.

Therefore it is realistic to see power gating in terms of a few clock cycles for very small regions and tens or even hundreds of clock cycles for more significant gate counts. Turning on a number of small power-gated regions at the same time is no better than powering up a large block and may lead to a much more complex power controller.

Power gating has an impact on both performance and area, due to the nature of the switching transistor fabric. These limitations will impact system architecture and design objectives.

Recommendations:

  • Design for technology-dependent power-gating times: tens or hundreds of clock cycle latencies may need to be factored into wake-up times dependent on the area switched and the switching fabric control characteristics.
  • Design for “wait-states” across boundaries where there are dynamically power-gated functional units such that the implementation-dependent delay times can be safely managed and latency constraints set.
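
The wait-state recommendation can be illustrated with a toy cycle-level model: a request to a powered-down unit triggers power-up and is stalled until the unit is ready. This is only a behavioral sketch; the class GatedUnit, the helper cycles_until_accepted and the wake-up latency are hypothetical.

```python
class GatedUnit:
    """Toy model of a power-gated unit that stalls requests until it is awake."""

    def __init__(self, wake_cycles: int):
        self.wake_cycles = wake_cycles  # assumed power-up latency in clock cycles
        self.remaining = None           # None = powered down, 0 = ready

    def tick(self, request: bool) -> bool:
        """Advance one clock cycle; return True if the request is accepted."""
        if not request:
            return False
        if self.remaining is None:      # first request starts the power-up
            self.remaining = self.wake_cycles
        if self.remaining > 0:          # still waking: this cycle is a wait state
            self.remaining -= 1
            return False
        return True


def cycles_until_accepted(unit: GatedUnit, limit: int = 10_000) -> int:
    """Hold the request high and count the wait states before acceptance."""
    for cycle in range(limit):
        if unit.tick(True):
            return cycle
    raise TimeoutError("unit never became ready")


if __name__ == "__main__":
    print(cycles_until_accepted(GatedUnit(wake_cycles=3)))
```

Keeping the latency as a parameter of the interface, rather than a constant assumed by the requester, is what allows the implementation-dependent delay to change without breaking the protocol.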

Pitfalls:

  • Every power-gated rail introduces verification and test challenges so the number of power-gated regions needs to be carefully justified and factored into project timescales.

Power State Tables and Always-On Regions

Figure 5: Buffering Inter-Domain Signals
When dealing with multiple power-gated power domains, power routing can become complex. In particular, the concept of “always on” becomes less clear. Figure 5 shows three power domains, each of which is power gated. If power domain B is always on, then there is no problem. But if domain B is turned off while domains A and C are powered up, then there is a problem: the outputs from A to C are corrupted because the buffer in B is powered down. In this case, we would have to route power from some other “always on” supply to the buffer in B. We could use either the isolation supply in A (since it stays on even when A is powered down) or the supply from C.

On the other hand, if we know that whenever B is powered down, then C is also powered down, we do not have to provide a special supply to B. In this case, we consider B to be “relatively always on”, that is, always on relative to domain C. Thus, we can end up with some fairly complicated power routing rules depending on the power-gating relationships among different blocks. UPF provides a succinct way for system architects to communicate these power-gating dependency rules to the implementation tools.
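
The “relatively always on” rule amounts to a check over the legal power states: B needs no special supply if B is on in every state in which C is on. The sketch below assumes each state is a dictionary mapping a domain name to True (on) or False (off); the function names are hypothetical, and this is not UPF syntax.

```python
def relatively_always_on(states: list, through: str, receiver: str) -> bool:
    """True if domain `through` is ON in every state where `receiver` is ON."""
    return all(state[through] for state in states if state[receiver])


def needs_always_on_supply(states: list, through: str, receiver: str) -> bool:
    """A buffer in `through` needs another supply if the rule above fails."""
    return not relatively_always_on(states, through, receiver)


if __name__ == "__main__":
    # Hypothetical legal states for the three domains of Figure 5.
    safe_states = [
        {"A": True, "B": True, "C": True},
        {"A": True, "B": True, "C": False},
        {"A": False, "B": False, "C": False},
    ]
    # Adding a state with B off while A and C are on breaks the rule.
    unsafe_states = safe_states + [{"A": True, "B": False, "C": True}]

    print(needs_always_on_supply(safe_states, "B", "C"))
    print(needs_always_on_supply(unsafe_states, "B", "C"))
```

Running the same check for every buffered inter-domain path gives an early view of which buffers need to be powered from another supply.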

The create_pst and add_pst_state commands allow us to create a power state table that can be used to specify the relationships between different power supply nets.

This article originally appeared as Chapter 6 in Low Power Methodology Manual for System-on-Chip Design (New York: Springer, 2007). Copyright © 2007 by Synopsys, Inc. & ARM Limited. Reprinted by permission of the authors. The Low Power Methodology Manual (LPMM) is a comprehensive and practical guide to managing power in system-on-chip designs, critical to designers using 90-nanometer and below technology. This book is a must-read for anyone designing, or getting ready to design, SoCs for low-power applications.

Synopsys, Inc.
Mountain View, CA
(650) 584-5000
www.synopsys.com/lpmm

ARM Inc.
Sunnyvale, CA
(408) 734-5600
www.arm.com/lpmm
