Configurable Processors—Boon or Bane?
Is configurability the cure-all for volatile portable designs, or would you be better advised to go with what you know?
When it comes to portable designs, it’s hard to say which is moving faster, the market or the technology. Each year we see scores of new cell phone models introduced, most of which will disappear within six months—and all of which are the end product of a design process that started 12-18 months earlier. During that time audio, video and RF standards have continued to evolve and consumer tastes have remained on spin cycle. The chances that all of your original design decisions will turn out to be on the mark a year or two later are vanishingly small.
These are the concerns that configurable processors attempt to address. Up to the time you tape out your SoC you can optimize it to handle that recently standardized video codec, making informed tradeoffs between power, performance and price. However, does the vendor’s toolset support the custom instructions you just created, or would you do better to stick with a more standard and proven configuration? As with everything else, it depends.
The Configurable Camp Makes Its Case
Sumit Gupta, Product Marketing Manager at Tensilica, makes the case for configurability. “The advantage of a configurable processor over high-MHz processors is that it gives you the performance you need without giving up on area,” according to Gupta. “Higher speed always comes at the cost of higher area. Higher speed means a longer pipeline, more flops, more branch prediction, fancier memory architectures and out-of-order logic.” With a configurable processor, the argument goes, you create an area-optimized solution, which equates to lower power and lower cost.
Tensilica’s main configurable offering is the Xtensa 7 processor (Figure 1), a configurable, extensible and synthesizable 32-bit RISC processor core intended for high-performance, low-power applications such as embedded control and digital signal processing. Xtensa features a 5-stage pipeline, 16/24-bit instruction encoding with modeless switching and tool support for a wide range of designer-configurable options. Xtensa LX2 adds RTL-equivalent bandwidth and is aimed at more data-intensive applications.
Figure 1: Tensilica Xtensa 7 processor architecture
“Tensilica’s Xtensa processor architecture was designed from the start for use in ASICs,” explains Tensilica’s Technology Evangelist Steve Leibson. “As a result, the Xtensa architecture uses the good RISC features that improve performance, such as 3-operand instructions, pipelining for single-cycle instruction execution, and a load/store architecture, but it has decidedly non-RISC features as well. Most notable of these is the Xtensa ISA, which includes native 16- and 24-bit instructions, which reduces code footprint, reduces the number of instruction fetches needed to execute a program, and therefore reduces both cost and execution time when compared to the original 32-bit instruction sets of the ARM and MIPS architectures.”
“Because the 16- and 24-bit instructions are native,” continued Leibson, “there’s no mode switching between them. The processor can automatically determine the length of each instruction and therefore doesn’t switch modes between them. In addition, the Xtensa processor’s instruction set includes some merged instructions that perform more than one operation per instruction, such as the EXTUI instruction, which performs a shift-and-mask operation in one clock cycle.”
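What a merged shift-and-mask operation computes can be sketched in software. The following is a minimal emulation for illustration, not Tensilica's definition of EXTUI: the function name `extract_field`, its parameter names and the range checks are all assumptions.

```python
def extract_field(value: int, shift: int, width: int) -> int:
    """Extract an unsigned bit field from a 32-bit word.

    Equivalent to a right shift followed by a mask, which a merged
    instruction can perform as a single operation.
    The argument limits below are assumptions made for this sketch.
    """
    if not 0 <= shift <= 31:
        raise ValueError("shift must be in 0..31")
    if width < 1 or shift + width > 32:
        raise ValueError("field must lie inside the 32-bit word")
    word = value & 0xFFFFFFFF      # model a 32-bit register
    mask = (1 << width) - 1        # 'width' low-order one bits
    return (word >> shift) & mask  # shift right, then mask


# Pull the 8-bit field that starts at bit 8 out of a packed word.
field = extract_field(0x12345678, shift=8, width=8)
print(hex(field))
```

On a processor without such an instruction, the same extraction typically takes a separate shift and a separate AND.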
ARC International has also staked its claim in the configurable processor arena. In ARC’s case the focus is on complete audio and video subsystem cores. According to Bill Jackson, ARC’s VP of Marketing, “We have a series of product offerings starting with traditional 32-bit RISC CPUs, which is the history of ARC. It's a relatively traditional 32-bit RISC but it’s configurable; that is, the licensee of the IP can change things in it to suit their needs. For example you can add a new instruction to the instruction set to do something that is unique to your application. We provide the tools and the software necessary to enable all of that as part of our core offering.”
“The next step up from that is subsystems (Figure 2),” continued Jackson. “Those are preconfigured systems that are set for a specific purpose. For example we do an audio subsystem called the ARC Sound 210; that is a processor that is configured with specific extensions in it for audio processing. On top of that we license codecs that do processing of different audio types, for example MP3. So we sell both audio and video subsystems. And then finally we have post-processing software that runs on top of all that; it does enhancement of processed audio.”
Figure 2: ARC VRaptor video subsystem architecture
To what extent are ARC processors configurable? Jackson: “The user can configure the register set; they can configure the cache sizes; they can configure instructions, and there are several bus options that they can configure. You can take things out. We provide a tool called ARChitect that enables you to define and build these things, so it's not like you are editing RTL.”
What advantage does the ARC architecture offer? Jackson: “We have a considerable range of technology that allows someone to tailor hardware to be exactly what they need so they don’t have to spend any more power or silicon area or design time on features that they don't care about, unlike the vast majority of our competitors where if you buy their core from them it is what it is and you're stuck with the whole thing. That's one of our advantages over the likes of ARM and MIPS.”
ARM and MIPS See It Differently
Needless to say, ARM and MIPS have a different take on things. While admitting the advantages of configurability, both claim they have been mischaracterized as having fixed architectures when in fact they’re both highly configurable.
Richard York, Director of Product Marketing for microprocessors at ARM, takes particular umbrage at characterizing ARM’s architecture as ‘fixed’. “People sometimes think of ARM as having a fixed architecture and other companies as having a configurable architecture,” complains York. “Although at the core instruction set level that's true, at the system level, particularly around the memory systems and the interrupt controllers and the memory interfaces, that hasn't been true for ARM for quite a while. Cortex-M3 (Figure 3), our microcontroller core, has something like 10 discrete configurability options, and with some of those options you can choose 10 to 20 different settings. So there are something like 266 million discrete versions that you can configure. The number of interrupts, the interrupt controller, what your debug looks like, how many breakpoints and watchpoints: all of these things we configure a huge amount on our products.”
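The size of York's figure follows from multiplication: when options are independent, the number of distinct builds is the product of the number of choices for each. A minimal sketch, with option names loosely drawn from his quote and choice counts that are entirely hypothetical (the article gives no breakdown of ARM's real figures):

```python
from math import prod

# Hypothetical choice counts per configurable option; NOT ARM's real figures.
HYPOTHETICAL_OPTIONS = {
    "interrupt_count": 20,
    "interrupt_priority_bits": 6,
    "breakpoints": 4,
    "watchpoints": 3,
    "debug_level": 4,
    "trace": 2,
    "memory_protection": 2,
}


def count_configurations(options: dict[str, int]) -> int:
    """Return the number of distinct builds, assuming independent options."""
    if any(n < 1 for n in options.values()):
        raise ValueError("every option needs at least one choice")
    return prod(options.values())


total = count_configurations(HYPOTHETICAL_OPTIONS)
print(f"{len(HYPOTHETICAL_OPTIONS)} options -> {total:,} configurations")
```

The count grows quickly: ten independent options with seven choices each would already give 7^10, or roughly 282 million, combinations, the same order of magnitude as the number York cites.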
Figure 3: ARM Cortex-M3 processor architecture
“Our architecture has evolved,” continued York. “We're not purely RISC anymore. From the beginning we've been a bit pragmatic about having a variable-length instruction set, which some people think is just anathema to the microprocessor world; and frankly, variable length might make the processor more complex. But if I can shave 10% off my code size, that can save me tens of cents and can make the difference between a successful product and an unsuccessful one. We've now standardized on a variable-length instruction set, which means that a compiler can pick freely between 16-bit and 32-bit instructions. So a customer isn't stuck with having to double the amount of flash memory based on their choice of microprocessor. That's a wrong direction to have gone.”
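York's code-size arithmetic can be sketched as follows. This is an illustration under simplifying assumptions: the instruction count stays the same in both encodings, a fixed-width encoding spends 4 bytes on every instruction, and a mixed encoding spends 2 bytes on each instruction that fits a 16-bit form. The function names and the instruction counts are hypothetical.

```python
def code_size_bytes(short_count: int, long_count: int) -> int:
    """Size of a program mixing 16-bit (2-byte) and 32-bit (4-byte) instructions."""
    if short_count < 0 or long_count < 0:
        raise ValueError("instruction counts must be non-negative")
    return 2 * short_count + 4 * long_count


def bytes_saved(short_count: int, long_count: int) -> int:
    """Bytes saved versus encoding every instruction in 32 bits."""
    fixed = 4 * (short_count + long_count)
    return fixed - code_size_bytes(short_count, long_count)


# Hypothetical program: 10,000 instructions, 2,000 of which fit a 16-bit form.
mixed = code_size_bytes(2_000, 8_000)
saved = bytes_saved(2_000, 8_000)
print(f"mixed: {mixed} bytes, saved: {saved} bytes")
```

Under these assumptions each 16-bit instruction saves 2 bytes, so a program in which one instruction in five uses the short form comes out 10% smaller than its fixed 32-bit equivalent. Real compilers may need extra instructions when they choose short encodings, so actual savings will differ.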
York plays the ecosystem card in response to Tensilica and ARC. “At one level the benefits of the architecture have nothing to do with the architecture itself but everything to do with the ecosystem around it. I often find when I present to customers that one of the reasons we win designs against our competition is because they want a choice of which compiler to use, which RTOS, which debugger and which ICE vendor. If they go with ARC or Tensilica, they're typically locked into one or two vendors, and that makes them nervous. So in some sense it's not the features of the architecture that's the important thing, it’s support for the architecture.”
Jack Brown, VP of marketing at MIPS, takes a similar tack. “Our products have a large degree of configurability, and that goes for all of our products—from the 4KE family through 24K/34K/74K (Figure 4) and multicore 1004K. Those products have scratchpad RAM, which is very fast; and no-wait-state RAM, as an alternative to cache. You can have various sizes of caches or no cache. You can determine things about the MMU, whether it’s a TLB or an FMT for map translations, and you can determine the size of the TLBs. You can add user-defined instructions; you can add a custom coprocessor. We have EJTAG and program-trace debug capabilities. Each of these standard cores has between 10 and 20 functional areas that have a large degree of configurability. So before I even start considering adding custom instructions I can optimize how much area and how many gates I’m putting down for a specific application.”
Figure 4: MIPS 74K core architecture
In addition to choosing between processor options, you can add custom instructions to MIPS processors. “Custom instructions can be attached in one of two ways,” explains Brown. “There are user-defined instructions, and those allow you to make use of the arithmetic logic unit in some of the cores; in other cores you can just attach directly to the instruction pipeline. But now you can leverage the machine’s ability to fetch instructions, and you provide custom logic for the decoded implementation of instructions.”
“The other way to add instructions is through COP2 or the coprocessor,” continued Brown. “If I'm doing just a few instructions, say I want to make a VoIP application really scream, maybe I add one or two instructions. If I'm trying to do a graphics interface to make it run faster, I might want to go through the graphics processor and then I can use the COP2 interface, which allows me to save and restore user states; it's a heavier duty way to add instructions if you're adding say dozens of instructions. So there's a light way and a heavy-duty way. Both of these are supported by our compilers and they're also supported by our MIPSsim instruction set simulator that MIPS provides, so that lets the customer do this in a simulation SystemC environment or in RTL.”
Are You Really Configurable?
Christian Heidarson is principal semiconductor research analyst with Gartner Hong Kong, specializing in microprocessors and embedded processor cores. He offered some perspective as well as context about MIPS, ARM, ARC and Tensilica.
“Configurable processors tend to be most appreciated when you're looking more for a hybrid MPU/DSP solution, which ARC and Tensilica can offer with their configurable options,” explained Heidarson. “Now, with the Cortex-M3, ARM is making a huge push with Thumb-2 into deeply embedded controllers, so ARM will be competing more with ARC and Tensilica.”
“ARM’s and MIPS’ strengths lie in their standard instruction sets; they do offer standard extensions for multimedia processing, for security and other areas,” continued Heidarson. “Then they do have some custom areas, which really helps them to reply to customers who are interested in specific applications and who are concerned that general-purpose instructions might fall short. But most of ARM’s and MIPS’ customers will be using either their standard ISA or one of their standard extensions.”
Are ARM’s and MIPS’ processors configurable in the same sense—or to the same extent—that ARC’s and Tensilica’s are? It depends on how you define ‘configurable’. Both ARM and MIPS claim their cores are configurable because you have any number of options that you can select before committing to silicon. ARC and Tensilica counter that their ISAs are far more extensible and thus capable of generating cores that are better optimized to a specific application; and their DSP capability enables flexible post-silicon software control. ARM and MIPS reply that their cores are more proven and their respective ecosystems are far broader, as is their tool support.
Heidarson doesn’t see configurability as the key bone of contention between the two camps. “I think today ARM and MIPS are not so worried about the threat of configurable processors; rather it's the small size and efficiency of the ARC and Tensilica solutions in controller applications. Tensilica, in addition, has these extremely sophisticated toolsets for configuring your processor and setting them up in multiprocessor configurations.”
At the highest level the configurability argument is already over, since the major processor core vendors now all claim that their cores are configurable. To what extent they are—and how well their products will fit into your particular design and tool flow—is up to you to determine.
This article originally appeared in the January 2009 issue of Portable Design. Reprinted with permission.