New Tensilica IP Cores Targeted to Dataplane and Signal Processing Functions Such as Imaging, Communications and Networking, Boost Data Bandwidth 4X
Santa Clara, Calif. USA – March 28, 2011 – Tensilica, Inc. today announced that it is extending its leadership in IP (intellectual property) cores for compute-intensive dataplane and DSP (digital signal processor) functions such as imaging, video, networking and baseband wired/wireless communications. Any application that requires extensive data processing will significantly benefit from the breakthrough features – particularly those that quadruple data bandwidth – built into Tensilica’s new Xtensa LX4 dataplane processor (DPU) for SOCs (system-on-chips).
The new Xtensa LX4 DPU supports wider local data memory bandwidth of up to 1024 bits per cycle, wider VLIW (very long instruction word) instructions up to 128 bits for increased parallel processing, and a cache memory prefetch option that boosts overall performance for systems with long off-chip memory latency. Tensilica is already using many of these capabilities in its recently introduced ConnX BBE64 DSP for LTE Advanced communications.
“The strength of Tensilica’s DPUs is the ability to combine control and digital signal processing functions in cores that can be optimized to provide 10x to 100x performance improvement compared to a standard RISC or DSP core,” stated Steve Roddy, Tensilica’s vice president of marketing and business development. “Now, with Xtensa LX4, Tensilica offers IP cores that range from an ultra-small programmable DPU as exemplified by a 1 GigaMAC DSP in 0.01 mm² (in 28 nm process technology) up to the ConnX BBE 64-128, the world’s highest performance licensable DSP IP core with over 100 GigaMAC per second performance.”
Wider Data Fetch for Higher Bandwidth
Tensilica’s Xtensa LX4 DPU has four times the local data memory bandwidth of the Xtensa LX3 DPU, with up to two 512-bit load/store operations per cycle. Designers can now easily create super-wide SIMD (single instruction multiple data) DSPs that pump more data into more MAC (multiply accumulate) units each clock cycle for extremely fast performance. This makes Xtensa LX4 DPUs ideal for wired and wireless baseband processing, video pre- and post-processing, image signal processing, and various network packet processing functions.
This enhanced local memory bandwidth is in addition to Tensilica’s existing customizable local port and queue interfaces that provide unlimited point-to-point data and control signal bandwidth. Tensilica now offers both the unique Port/Queue interfaces, which allow connections between Xtensa DPUs and other system blocks just like traditional RTL block interconnection, and the new ultra-high bandwidth local memory connections.
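To make the bandwidth figure concrete: two 512-bit load/store operations per cycle add up to 1024 bits per cycle. The short C sketch below works through that arithmetic and includes a plain scalar multiply-accumulate loop of the kind a wide SIMD DSP would process many lanes at a time. The 16-bit sample width and all function names are assumptions chosen for illustration; they do not come from Tensilica documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Figures taken from the announcement: two 512-bit load/store
   operations per cycle. */
enum { LOAD_STORE_OPS_PER_CYCLE = 2, BITS_PER_LOAD_STORE = 512 };

/* Peak local data memory bits moved per cycle (2 x 512). */
unsigned peak_bits_per_cycle(void)
{
    return (unsigned)LOAD_STORE_OPS_PER_CYCLE * (unsigned)BITS_PER_LOAD_STORE;
}

/* Number of operands of the given width that fit into one cycle's
   worth of bandwidth. The element width is the caller's assumption. */
unsigned operands_per_cycle(unsigned element_bits)
{
    assert(element_bits > 0);
    return peak_bits_per_cycle() / element_bits;
}

/* Scalar reference for a multiply-accumulate (MAC) kernel.
   Each iteration performs one MAC; a SIMD DSP would perform
   several of these per cycle on neighboring elements. */
int64_t dot_product_i16(const int16_t *a, const int16_t *b, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; ++i) {
        acc += (int32_t)a[i] * (int32_t)b[i];
    }
    return acc;
}
```

With these definitions, `peak_bits_per_cycle()` returns 1024 and `operands_per_cycle(16)` returns 64, meaning up to 64 16-bit samples could be loaded or stored per cycle at peak under this assumption.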
Wider Instructions for Increased Parallel Processing
With Xtensa LX4, Tensilica doubles the maximum width of its Flexible Length Instruction eXtensions (FLIX) instructions from 64 to 128 bits. This allows the execution of twice the number of independent operations per clock cycle. Every wide FLIX instruction is seamlessly intermixed with the shorter base Xtensa instruction set, so there is no mode switch penalty when using FLIX.
With FLIX, the Xtensa LX4 DPU can deliver the ultra-high-performance characteristics of a specialty VLIW processor with smaller code size than competing VLIW DSPs. Tensilica’s Xtensa C/C++ compiler automatically extracts parallelism from source code and bundles multiple operations into single FLIX instructions. An Xtensa LX4 DPU with wide FLIX instructions running parallel operations at a low clock frequency can often match the performance of larger, higher-MHz non-VLIW cores while consuming far less energy to complete the same task.
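As an illustration of the kind of source-level parallelism a VLIW compiler can exploit, the C sketch below multiplies two arrays of complex numbers. The four multiplies in each iteration do not depend on one another, so a compiler is free to place them in parallel instruction slots. Whether and how a given Xtensa configuration bundles them depends on the FLIX slots the designer has defined; the function name and data layout here are assumptions for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Element-wise complex multiply: out[i] = x[i] * y[i].
   Real and imaginary parts are kept in separate arrays. */
void complex_multiply_i16(const int16_t *x_re, const int16_t *x_im,
                          const int16_t *y_re, const int16_t *y_im,
                          int64_t *out_re, int64_t *out_im, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        /* Four independent multiplies: none uses another's result.
           A 16-bit by 16-bit product always fits in 32 bits. */
        int32_t rr = (int32_t)x_re[i] * y_re[i];
        int32_t ii = (int32_t)x_im[i] * y_im[i];
        int32_t ri = (int32_t)x_re[i] * y_im[i];
        int32_t ir = (int32_t)x_im[i] * y_re[i];

        /* Two independent combines, widened to avoid overflow. */
        out_re[i] = (int64_t)rr - ii;
        out_im[i] = (int64_t)ri + ir;
    }
}
```

Writing the products as separate local variables is only for readability; it makes the independent operations visible to the reader in the same way a scheduler would see them.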
Prefetch Reduces Cycle Counts
The new data prefetch option reduces cycle counts in long-latency designs by fetching data from system memory ahead of its use. The data is then ready and waiting when the application code needs it, which cuts the cycles the DPU would otherwise spend stalled on memory. The benefits are greatest when streaming data from contiguous memory locations. Prefetch is a much simpler approach to memory access optimization than adding a separate DMA (Direct Memory Access) engine, which requires additional software programming and application code tuning.
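The access pattern matters here. The C sketch below contrasts a streaming loop over contiguous memory, the case described above as benefiting most, with a strided loop whose successive addresses are further apart. It only illustrates the two patterns; it does not model the Xtensa prefetcher, and the function names are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Streaming access: the address advances by one element per
   iteration, so the next data needed is always adjacent. */
int64_t sum_contiguous(const int32_t *data, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; ++i) {
        acc += data[i];
    }
    return acc;
}

/* Strided access: visits every stride-th element. Successive
   addresses are stride elements apart, so a prefetcher that
   fetches ahead sequentially is likely to help less. */
int64_t sum_strided(const int32_t *data, size_t n, size_t stride)
{
    int64_t acc = 0;
    if (stride == 0) {
        return 0; /* a zero stride would never advance */
    }
    size_t i = 0;
    while (i < n) {
        acc += data[i];
        if (stride >= n - i) {
            break; /* next index would be past the end */
        }
        i += stride;
    }
    return acc;
}
```

Neither function contains any prefetch instructions, which matches the point of the paragraph above: the option works on ordinary application code without extra programming.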
Key to Success: Automation
Tensilica provides tools that automate not only the creation of the DPU hardware, but also the creation of the matching comprehensive software development tool set. Because the underlying base Xtensa instruction set is never changed, designers can access Tensilica’s robust ecosystem of third-party applications software and development tools even after heavily customizing the Xtensa DPU.
Customizable Xtensa DPUs are compatible with major operating systems, debug probes and ICE (in-circuit emulator) solutions, and come with an automatically generated, complete software development toolchain including an advanced integrated development environment based on the Eclipse framework, a world-class compiler, a cycle-accurate SystemC-compatible instruction set simulator, and the full industry standard GNU toolchain.
New with this release is Tensilica’s Vectorization Assistant tool, a first-of-its-kind tool that suggests ways developers can improve compiler vectorization of their C code when running on SIMD (single instruction multiple data) DSPs. The Vectorization Assistant explains what is preventing further vectorization so the software developer can improve the C source code to take advantage of the DPU’s parallel execution units.
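One common obstacle to vectorization in C generally is possible pointer aliasing: if the compiler cannot prove that the output and input arrays do not overlap, it may have to keep the loop scalar. The sketch below shows the same loop with and without the C99 `restrict` qualifier. It is a generic C example and an assumption about the kind of source change involved; the announcement does not say which specific issues the Vectorization Assistant reports.

```c
#include <stddef.h>

/* Without restrict, the compiler must allow for dst overlapping
   src, which can prevent it from processing several elements
   at once. */
void scale_may_alias(float *dst, const float *src, float k, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        dst[i] = src[i] * k;
    }
}

/* With restrict, the caller promises that dst and src do not
   overlap, which removes that obstacle. The results are the
   same whenever the promise holds. */
void scale_no_alias(float *restrict dst, const float *restrict src,
                    float k, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        dst[i] = src[i] * k;
    }
}
```

The `restrict` version must only be called with non-overlapping buffers; passing overlapping ones is undefined behavior in C.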
Availability and Performance
The Xtensa LX4 DPU is available now from Tensilica. The base Xtensa LX4 DPU can reach speeds of over 1 GHz in 45 nm process technology (45GS) with an area of just 0.044 mm².