Reducing Power in Video-Intensive Portable Applications
Sequential clock gating during RTL design selectively disables clocks to eliminate unnecessary switching activity, significantly reducing dynamic power consumption in computationally intensive applications such as image processing.
With the advent of 3G wireless and increasingly advanced integrated circuit (IC) technology, portable consumer devices today are capable of delivering amazingly rich and varied video content. A growing number of mobile products engage users in compelling visual experiences such as live video, interactive gaming, interactive maps, web browsing and high definition photography. And, whether in a basic cell phone or full-featured PDA, these capabilities now play a significant role in a product’s appeal and, thus, success in the marketplace.
The power-hungry image processors that enable these applications, however, also present a challenge for the portable system designer. Image processing demands intensive computation for extended periods of time. For example, a mobile device processing video must decompress, decode, scale and apply graphic filters on each incoming frame. It then repeats these computations at a rate of 30 times per second to stream content to the mobile display. It’s a substantial workload, with a voracious appetite for power.
Despite the consumer’s seemingly insatiable desire for video-rich applications, minimizing power consumption and maximizing battery life remain absolute requirements for portable mobile devices. Navigating these conflicting objectives requires a holistic, comprehensive approach to saving power. Portable designers must deploy power-saving techniques throughout the design flow to meet both standby and active power requirements.
Components of Power
Power optimization in mobile devices starts with an understanding of both static and dynamic power components. Static power is the main contributor to standby power dissipation and results from transistor leakage current. Increased transistor leakage current is a by-product of shrinking device geometries.
Fortunately for designers, leakage can be addressed automatically during synthesis by using multi-threshold-voltage cells. During synthesis, cells are selected based on power and performance tradeoffs. Cells with a higher threshold voltage leak less current but switch more slowly than cells with a lower threshold voltage. On paths that are not timing critical, the slower, high-threshold cells are used to decrease static power dissipation. As long as the technology library offers a selection of cells with different thresholds, static power is optimized automatically by low-power implementation tools, and the process is largely transparent to the designer.
Active power, also referred to as dynamic power, is the consequence of device activity that can be reduced by eliminating unnecessary switching activity. Register transfer level (RTL) clock gating is the most common technique for reducing dynamic power. Clock gating selectively disables clocks to eliminate unnecessary switching activity, significantly reducing dynamic power consumption in computationally intensive applications such as image processing.
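To make the link between switching activity and dynamic power concrete, the short Python sketch below applies the widely used first-order CMOS estimate, in which dynamic power is the product of activity factor, switched capacitance, supply voltage squared and clock frequency. The function name and every numeric value are illustrative assumptions, not figures from any real design or from the tools discussed in this article.

```python
def dynamic_power_watts(activity, capacitance_farads, vdd_volts, freq_hz):
    """First-order CMOS dynamic power estimate: P = alpha * C * Vdd^2 * f."""
    return activity * capacitance_farads * vdd_volts ** 2 * freq_hz


# Hypothetical block: 1 nF of switched capacitance, 1.0 V supply, 200 MHz clock.
ungated = dynamic_power_watts(0.20, 1e-9, 1.0, 200e6)

# Assumption for illustration: clock gating halves the average switching activity.
gated = dynamic_power_watts(0.10, 1e-9, 1.0, 200e6)

savings_fraction = 1 - gated / ungated
print(f"ungated: {ungated * 1e3:.1f} mW, gated: {gated * 1e3:.1f} mW, "
      f"savings: {savings_fraction:.0%}")
```

Because the estimate is linear in the activity factor, any switching that clock gating removes translates directly into a proportional reduction in dynamic power for the affected logic.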
Clock Gating Techniques
Figure 1: Comparing combinational and sequential clock gating
There are two types of clock gating, combinational and sequential (Figure 1). Combinational clock gating is a straightforward substitution of conditional statements in the RTL code with clock-gating cells inserted into the clock path of registers. Low-power RTL synthesis tools automatically identify and insert combinational clock gating based on pattern matching. The extent to which low-power synthesis tools can apply clock gating is limited by how the RTL is coded. Low-power synthesis tools identify clock gating opportunities by finding explicit “IF conditions” prior to assignment statements in the RTL code. However, these tools are not capable of analyzing and determining where and when “IF conditions” can be added to the RTL code to reduce power. It is up to the RTL designer to place these “IF conditions” in their code so low-power synthesis can translate them into power saving clock gates.
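The substitution described above can be mimicked with a small behavioral model. The Python sketch below is not RTL and does not represent any synthesis tool's output; it simply simulates one register whose assignment is guarded by an "IF condition", once with a free-running clock plus a hold path and once with a gated clock. The function name and stimulus values are hypothetical.

```python
def run_register(stimulus, gated):
    """Simulate one register guarded by an enable ("IF condition").

    stimulus is a list of (enable, data) pairs, one per clock cycle.
    Returns (q_trace, clock_pulses), where clock_pulses counts how many
    clock edges actually reach the register.
    """
    q = 0
    q_trace = []
    clock_pulses = 0
    for enable, data in stimulus:
        if gated:
            # Clock-gating cell: the register is clocked only when enable is high.
            if enable:
                clock_pulses += 1
                q = data
        else:
            # Ungated: clocked every cycle; q is recirculated when enable is low.
            clock_pulses += 1
            q = data if enable else q
        q_trace.append(q)
    return q_trace, clock_pulses


stimulus = [(1, 5), (0, 9), (0, 3), (1, 7), (0, 2)]
plain_q, plain_pulses = run_register(stimulus, gated=False)
gated_q, gated_pulses = run_register(stimulus, gated=True)
print(plain_q == gated_q, plain_pulses, gated_pulses)  # prints: True 5 2
```

Both versions produce the same register values in every cycle; only the number of clock edges reaching the register changes. That cycle-by-cycle match at each register is what makes combinational clock gating a direct translation.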
Sequential clock gating is a more powerful optimization technique with proven ability to reduce power in computationally intensive applications. Unlike combinational clock gating, sequential clock gating is not a simple translation. It involves sequential analysis of design behavior. Sequential analysis looks at circuit functionality over multiple design states and cycles to identify unnecessary switching activity such as unused data computations in pipelines. It then determines the enable logic condition that eliminates unnecessary switching.
For example, when a register output is being held in the current cycle, sequential analysis can determine the logic condition to disable the switching in combinational logic and registers that generated the data in the previous cycle. This sequential relationship can be propagated backward and forward across many cycles. RTL designs contain a multitude of sequential relationships that can be exploited to reduce switching activity and, hence, optimize register, memory, clock and combinational logic power.
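The hold relationship in this example can be sketched with a toy two-stage pipeline in Python. This is a behavioral illustration under stated assumptions, not the algorithm of any particular tool: the pipeline structure, the `valid` signal, the doubling "computation" and all names are hypothetical. Stage 2 loads only when the delayed valid bit is high, so any value stage 1 computes in a cycle where `valid` is low is never observed, and stage 1 can be gated with `valid`.

```python
def run_pipeline(stimulus, sequential_gating):
    """Simulate a two-stage pipeline: r1 <= 2 * data, r2 <= r1 when v1 is set.

    stimulus is a list of (valid, data) pairs, one per clock cycle.
    Returns (output_trace, r1_loads), where r1_loads counts the cycles in
    which the first-stage register is actually clocked.
    """
    r1, v1, r2 = 0, 0, 0
    r1_loads = 0
    output_trace = []
    for valid, data in stimulus:
        # Stage 2 holds its value unless the previous cycle carried valid data.
        next_r2 = r1 if v1 else r2
        if sequential_gating and not valid:
            # Stage 2 will hold next cycle, so this stage-1 result would be
            # discarded: gate the clock and skip the computation.
            next_r1 = r1
        else:
            next_r1 = data * 2
            r1_loads += 1
        # Synchronous update of all registers at the clock edge.
        r1, v1, r2 = next_r1, valid, next_r2
        output_trace.append(r2)
    return output_trace, r1_loads


stimulus = [(1, 3), (0, 8), (0, 4), (1, 6), (0, 1), (0, 0)]
ref_out, ref_loads = run_pipeline(stimulus, sequential_gating=False)
opt_out, opt_loads = run_pipeline(stimulus, sequential_gating=True)
print(ref_out == opt_out, ref_loads, opt_loads)  # prints: True 6 2
```

The observable output is identical in both runs, but the internal register `r1` no longer matches cycle by cycle. A comparison that pairs up registers one-to-one would therefore flag a mismatch even though the behavior seen at the output is unchanged. Comparing two simulation traces, as done here, shows agreement only for this one stimulus and is not a proof of equivalence.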
The potential power savings from sequential clock gating are significant, particularly in applications such as video processing designs that support multiple video formats and data-dependent algorithms. Depending on the incoming video stream, only part of the decode computation and filtering calculations is needed. This provides ample opportunities for clock gating if the designer understands the sequential relationships in their design.
Deploying Clock Gating
Figure 2: A typical manual RTL clock gating design flow
Three key elements of implementing a clock gating methodology are: thoroughly identifying all clock-gating opportunities; accurately creating the RTL code that implements those opportunities; and effectively verifying the clock-gated RTL code to ensure it retains the original functionality.
Until recently, the common approach to clock gating has been for designers to examine their code for clock gating opportunities and to manually add IF conditions to their RTL code. A typical clock-gating design flow includes manually adding clock gating optimizations, using RTL power estimation to gauge the effectiveness of those optimizations, and then running simulation regressions to verify no functionality has been broken. Eventually the RTL code is synthesized and gate-level power estimations are available to determine if more clock gating is required (Figure 2).
There are many inefficiencies in this approach. Because of the vast number of clock-gating opportunities and the complexity of the enable conditions, designers can spend considerable time investigating and modifying RTL code. Since not all clock-gating optimizations will result in a net power savings and some may have a negative impact on timing and area, a trial and error approach is required to create the lowest power design. This manual effort is labor intensive, extends the development cycle and puts additional demands on RTL simulation. Since clock gating optimizations cannot be verified with traditional combinational equivalency checking tools, simulation testbenches and assertions must be developed to verify that the new clock gating has not disturbed the original functionality.
The natural solution to this dilemma is design automation.
Adding Power Optimization Tools into the RTL Design Flow
Figure 3: Automated RTL power optimization
When adopting new power optimization tools, design teams are well-advised to keep several key considerations in mind. First, it is important that the power savings from a new tool be complementary and cumulative to those of existing power optimization tools. Second, to avoid long learning curves and additional script development, new tools should accept standard file formats and fit into existing design flows. Third, it’s important that new tools provide a comprehensive solution so as not to create additional problems elsewhere in the design flow. For example, automating the identification of clock-gating opportunities is interesting, but it can have a negative impact on productivity unless the tool generates optimized RTL code and comprehensively verifies that the original functionality has not been changed.
Commercially available power optimization tools now automate the identification, insertion and verification of sequential clock gating in RTL designs (Figure 3). Incorporating automated sequential clock gating and verification brings power saving benefits to video-intensive portable applications.
Such capabilities are embodied in PowerPro CG (for clock gating) and SLEC CG from Calypto Design Systems Inc. (Santa Clara, Calif.). PowerPro CG is an automated RTL power optimization solution proven to deliver up to 60% power savings with little or no impact on area and performance. SLEC CG is a sequential equivalence checker that comprehensively verifies sequential clock gating optimizations. These tools complement an existing low-power design flow by analyzing sequential behavior in RTL designs and identifying clock-gating opportunities beyond those already present. They fit into existing design flows by reading in standard, synthesizable Verilog and VHDL RTL code.
Additionally, the generated power-optimized RTL code is identical to the original RTL design except for the added clock-gating enable logic. The power-optimized RTL code is then comprehensively verified using sequential logic equivalence checking to ensure no functional changes are introduced. The power-optimized RTL design flows directly into low-power synthesis to take advantage of downstream power optimization capabilities. Relieving the designer of tedious design analysis, manual RTL recoding and time-consuming simulation improves both productivity and power savings.
Power optimization is a critical requirement for computation-intensive, low-power portable applications. By integrating power optimization techniques, such as sequential clock gating, into existing design flows, designers can significantly reduce power while actively processing video. With today’s automated sequential clock gating solutions, designers can save additional power and improve design productivity over their current manual clock gating methods.
Calypto Design Systems
Santa Clara, CA
This article first appeared in the October/November issue of Portable Design. Reprinted with permission.