More on Mentor’s Catapult C from John Cooley and Other Designers

December 18, 2009 on 11:15 pm | In Design, EDA, SOC | No Comments

Earlier this month, I wrote about Mentor’s C-to-gates synthesis tool Catapult C and low-power design. The EDA industry’s self-appointed gadfly and uber-user John Cooley has just written an extensive blog posting about Catapult C complete with detailed comments from several of his reader/users. These comments and Cooley’s conclusions are very, very interesting for people in the ESL space, as well as anyone involved in chip design, so I thought I’d highlight some of Cooley’s conclusions.

First, Cooley quotes EDA analyst Gary Smith’s published numbers to quantify Catapult C’s lead in the high-level synthesis arena. He then uses the anecdotal evidence of the large number of comments (both good and bad) that his reader/users make about Catapult C relative to the other high-level synthesis tools to conclude that there do seem to be more IC designers using Catapult C than competing tools.

Cooley then hands the microphone over to his designer/readers for comments. One thing that really strikes me about these comments is the number of people who want to use C++ to describe hardware. Now C++ compilers have a tough enough time creating streamlined object code out of C++ descriptions. C++ allows such a high level of abstract description that algorithmic descriptions more resemble poetry than precise engineering-style descriptions. My opinion is that expecting any and all C++ descriptions to result in efficient hardware is a bit of a reach. No matter how good the compiler is, C++ descriptions can be so abstract that it can be tremendously difficult to infer any sort of efficient hardware design from such descriptions. The likelihood of developing mind-reading compilers in the near future seem mighty slim to me.

Other designer/readers seem to share my concerns. One engineer who sent a comment to Cooley and who preferred to remain anonymous wrote: “I remain concerned that quality-of-results derived from designs developed in ANSI C/C++ will not compare well to hand-coded RTL for our design area (hardware accelerators for broadband communications), regardless of claimed market share of Catapult C.”

Now don’t make something out of this skepticism (mine and “anonymous”) that’s not there. When logic synthesis first appeared in the early 1980s, it too “suffered” from a quality-of-results issue. The earliest logic-synthesis tools could not generate gate-level designs that were as efficient as manually-created designs by even moderately good human logic designers just as C compilers could not initially generate assembly code that was as efficient as code written by a good human codesmith. However, two things happed to make this issue become a non-issue.

First, the tools simply got better. Adoption of Verilog and VHDL as description languages helped to standardize the sea of HDL slopping over the EDA bucket back in the 1980s. Standardization gave compiler designers a focused target and channeled their creative energy into building better synthesis tools rather than creating ever-more-elegant description languages.

Second, Moore’s Law made irrelevant the difference between 10 and 100 gates or even between 100 and 1000 gates. At some point, we stopped counting gates just as we’d previously stopped counting polygons and transistors. We don’t really know how many gates there are on a chip any more and what’s more, we not longer care. Not really. Because it’s square millimeters of silicon that actually costs money, not gates. So today we use square millimeters and then use a fudge factor to estimate the number of gates represented by the square-millimeter metric.

Perhaps something like that will happen with high-level synthesis. The jury’s still out.

Laser Spike Annealing of Nickel in Nanometer CMOS ICs Cuts Leakage 10x

December 6, 2009 on 8:22 pm | In CMOS, Design, EDA, Green Design, Low-Power, SOC | No Comments

One of the sad facts of life for nanometer silicon has been the rise of leakage current as device geometries shrink. At 65nm, CMOS leakage currents roughly equal operating currents, making it virtually impossible to reduce overall operating current by more than half. I’ve long thought this was the result of low-Vt transistors that can never fully turn off, a consequence of the drive to recover speed that’s lost when supply voltages are cut to reduce operating power. Turns out there’s another culprit: nickel contamination that occurs when nickel atoms drift away from the nickel-silicide interface layer used to improve the connectivity of metal inter-layer contact plugs. The nickel atoms drift during the annealing process, which is used to drive the deposited nickel atoms into the transistors’ source and drain contact pads. The first of two annealing cycles drives the metallic nickel atoms into the silicon source and drain pads creating Ni2Si silicide. A second, higher-temperature annealing process converts the Ni2Si into NiSi, which has lower resistance and thus provides good electrical connectivity between the contact pad and the metal interconnect plug.

It turns out that the current “soak” annealing (which lasts for tens of seconds) processes allow the nickel atoms to drift far afield. Like beach sand in your bathing suit, the nickel gets into places you’d rather not have it. The drifting nickel atoms seem to have an affinity for silicon lattice discontinuities, which can be found at the outside ends of the transistor where source and drain diffusions meet the isolation trenches and in long, narrow voids that run from the source and drain regions towards and into the FET channel. Both of these hiding places cause leakage because the metallic nickel conducts electricity where there should be insulator or semiconductor material. Nickel at the ends of the transistor causes substrate leakage and nickel atoms in the channel naturally cause channel leakage.

Applied Materials and European semiconductor research powerhouse IMEC have jointly developed a laser-annealing process with one-millisecond duration instead of taking tens of seconds. As a result, the diffusing nickel doesn’t have time to drift into these unwanted places during the second annealing step that generates NiSi. Applied Materials described a similar laser-spike annealing process back in 2004 (see article here), but reportedly achieved only a 3-4% leakage reduction back then. This latest development appears to be a refinement of that earlier technique. The two companies will be presenting their findings at this week’s IEDM conference in Baltimore, Maryland.

IMEC and Applied Materials will indeed have pulled a rabbit out of the hat if this laser-spike annealing process plus the application of appropriate transistor-design rules result in cutting leakage currents by 90% for nanometer CMOS. Leakage-driven power loss has become a significant problem for advanced IC design and had appeared to be insurmountable, even with the addition of high-K and metal-gate processing. Now, it appears there’s a real solution with the best of all possible implications for system and logic designers: they don’t need to learn anything new. They can leave this fix to the design tools and to the process engineers and once again skirt the system-level and architectural issues of low-power design.

C-to-Gates Synthesis and Low-Power Design

December 4, 2009 on 1:42 pm | In Design, Low-Power, SOC | No Comments

One of the many “pushbutton” design-automation tools that chip designers have sought is a “C-to-Gates” tool that would allow the automated development of hardware from algorithmic descriptions written in the C programming language.

The place to start almost any system design is with the most fundamental aspect of system design: algorithm development. Systems are collections of independently and dependently operating algorithms. For example, a DVD player uses one algorithm to decompress the combined media stream, another to decode the resulting video stream, and yet another algorithm to decode the resulting audio stream. All systems are based on the execution of one or more algorithms.

Most algorithm development begins and ends with C. One exception to this rule is MATLAB from The Mathworks, which is very popular with many algorithm developers. However, there are ways to get MATLAB to produce C from MATLAB-based algorithms as well.

Given the close association between algorithm development and the C language, it’s only natural to want a tool that automatically converts C descriptions to hardware netlists. However, early attempts to create such tools didn’t meet with much commercial success, largely because the quality of results (number of gates, operational speed, and resulting power consumption) compared poorly to manual RTL-generation design techniques.

The tools have been getting better over the years, leading to this recent announcement by Mentor Graphics:

“Mentor Graphics Corp. (NASDAQ: MENT), today announced that Fujitsu Kyushu Network Technologies Limited (Fujitsu QNET) has chosen the Catapult C Synthesis tool for use in its design tool environment to implement complex algorithms in hardware that were previously processed by a processor implemented on LSI. Fujitsu’s growing expertise with the Catapult C tool is also a key enabler in the expansion of their design services business.

Fujitsu QNET was able to dramatically cut power consumption by using the Catapult C Synthesis tool to create a dedicated hardware accelerator for mobile voice processing algorithm versus running in software. The resulting silicon implementation yielded a reduction in power consumption of 83%. This was made possible by the ability of the Catapult C tool to find the optimal trade-off between power, performance and area, in this case, implementing a design satisfying voice performance requirements while running at a lower clock frequency than the previous implementation using a processor.”

Based on this announcement, it would seem that C-to-gates tools, at least Mentor’s Catapult C, are getting closer to reality. In fact, the above announcement would lead you to believe that such tools are here and production-ready today. Indeed, Mentor’s description of the Catapult C synthesis tool appears mighty attractive:

“Catapult C Synthesis reduces design time and verification effort. When writing pure C++, designers focus on the functional intent of their application. Timing and architectural information is abstracted away from the source description. With fewer details in the model, testbench development is also simplified.

Implementation of specific details are automatically added during the synthesis process, eliminating error-prone manual interventions and resulting in RTL designs correct by construction. Debug of the resulting RTL is in turn eliminated, further reducing the overall verification effort.

The Catapult C automated verification environment allows any RTL implementation of a C++ model to be verified using the original C++ testbench. This eliminates the need to write pin-level interfacing and bit-timed RTL environments to verify the RTL blocks created by Catapult before moving to system integration.”

There’s a caveat or two to remember, however. First, there’s good C code and bad C code whether you’re writing code to run on a processor or code that’s to be synthesized into gates. In the case of C-to-gates synthesis, good code signals the design intent as clearly as possible so that the synthesis tool needs to infer as little as possible. Machine inference is the second caveat. Every detail that Catapult C—or any synthesis tool for that matter—must infer is a design detail that you didn’t put into the design.

Conventional RTL-driven logic synthesis makes such inferences all the time and, over the years, designers have gotten savvy to the kinds of inferences that will be made by their logic-synthesis tools and have compensated by adapting their code-writing styles when writing hardware descriptions in the Verilog and VHDL hardware description languages. However, C has largely been used as a sequential algorithm description tool to create software that runs on single processors and use patterns by engineers reflect that long history. In addition, C describes algorithms at a higher level of abstraction than descriptions written in hardware description languages. As a result, there’s always more to infer from a C algorithm description just due to the higher abstraction level.

So eye that 83% power-reduction that Fujitsu QNET achieved for that voice-processing algorithm with envy. Just remember that engineering isn’t as simple as pushing a button.

Powered by WordPress with Pool theme design by Borja Fernandez.
Entries and comments feeds. Valid XHTML and CSS. ^Top^