Generation-jumping 2.5D Xilinx Virtex-7 2000T FPGA delivers 1,954,560 logic cells, consumes only 20W

Xilinx announced today that it is shipping Virtex-7 2000T FPGAs to customers. This is one monster FPGA. Its 6.8 billion transistors deliver 1,954,560 logic cells, 21.55 Mbits of distributed SRAM, 2160 DSP slices, 46,512 Kbits of block RAM, four PCIe ports, 36 GTX serial transceivers running at 12.5Gbps, and 1200 user I/O pins. All in about 20W (!!!). The only fly in the ointment, if you want to call it that, is that no one on this planet can make this FPGA as a monolithic device. The Virtex-7 2000T FPGA is a 2.5D assembly that combines four FPGA tiles on a silicon interposer. The interposer provides 10,000 connections between each of the four FPGA tiles.

Here’s an exploded diagram of the FPGA assembly:

I’ll be covering the 2.5D/3D assembly aspects of this new FPGA in more detail on my EDA360 Insider blog this coming Thursday (for 3D Thursday), but I want to discuss the low-power aspects of this device here, now.

The obvious contributor to the Virtex-7 2000T FPGA’s power profile is the use of the 28nm TSMC HPL (high-performance, low-power) high-K, metal-gate (HKMG) process technology. Xilinx chose TSMC’s 28 HPL process over TSMC’s 28 HP and 28 LP process technologies to make all of its 7 series FPGA family members (Virtex-7, Kintex-7, and Artix-7). Instead of letting us know this fact and leaving it there, Xilinx published a White Paper that goes into great detail about the decision (“Lowering Power at 28nm with Xilinx 7 Series FPGAs”).

Xilinx considered all three of the TSMC 28nm process technologies for the 7 series FPGA families, but the company quickly locked onto the two HKMG processes (HP and HPL) as being the “best” for FPGA design. Because Xilinx wanted to use just one process technology to cover all of the planned 7 series FPGA families, from high-performance to low-power, HKMG promised the best mix of performance and leakage for the company’s unified approach to designing all of them. TSMC’s 28 LP process uses PolySiON (polysilicon/silicon oxy-nitride) gate insulation and is best suited for designs that require less performance than FPGAs. The PolySiON 28 LP process produces transistors that are about 13% slower than those produced in the 28 HPL and 28 HP processes (for the types of transistors Xilinx would be using to build its 7 series FPGAs) while exhibiting more than twice the leakage. The advantage of the 28 LP process is that it’s less expensive.

Eliminating the 28 LP process as a possibility left the choice between TSMC’s 28 HPL and 28 HP processes. Both processes can produce equally fast transistors, but the 28 HP process produces transistors with about twice the leakage of the 28 HPL process for the types of transistors Xilinx would be using to build its 7 series FPGAs. According to the White Paper, TSMC’s 28 HP process is better suited to GPU and CPU designs that require the ultimate performance and that have the power budget (~100W) to achieve that performance. The maximum Xilinx 7 series FPGA power budget is 40W, so the company selected TSMC’s 28 HPL process technology. The largest device comes in well under that ceiling: in a demo last week, Xilinx showed the Virtex-7 2000T FPGA simultaneously running 3600 copies of its 8-bit PicoBlaze processor core, delivering 180,000 MIPS while consuming 20W. That’s a really impressive amount of computation for the power.
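
To put that demo in per-core and per-watt terms, here’s a quick back-of-the-envelope calculation in Python. It uses only the three figures quoted above and assumes, for simplicity, that the full 20W is charged to the processor cores. That overstates the per-core power, because the device total covers everything else on the chip as well. The variable names are my own.

```python
# Back-of-the-envelope efficiency figures for the demo described above.
# Inputs are the numbers quoted in the text; variable names are illustrative.
CORES = 3600          # 8-bit soft processor cores running at once
TOTAL_MIPS = 180_000  # aggregate throughput quoted for the demo
POWER_WATTS = 20      # quoted device power during the demo

mips_per_core = TOTAL_MIPS / CORES
mips_per_watt = TOTAL_MIPS / POWER_WATTS

# Assumption: all 20W is charged to the cores, so this is an upper bound.
milliwatts_per_core = POWER_WATTS / CORES * 1000

print(f"{mips_per_core:.0f} MIPS per core")
print(f"{mips_per_watt:.0f} MIPS per watt")
print(f"{milliwatts_per_core:.2f} mW per core (upper bound)")
```

That works out to 50 MIPS per core and 9,000 MIPS per watt.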

However, it’s the system implications of the Virtex-7 2000T FPGA’s huge capacity that really have an impact on system power consumption. Here is an example diagram that Xilinx used to showcase the power-consumption advantage of the Virtex-7 2000T FPGA:

In this image, Xilinx claims that one Virtex-7 2000T FPGA delivers the equivalent capability of four “competitive” FPGAs, each with nearly 1M logic cells. Xilinx came to this conclusion based on characteristics other than logic-cell count and I’m not going to tell you that this diagram shows an apples-to-apples comparison. That’s not my point in using this diagram.

For my purposes, the diagram shows four smaller FPGAs tied together with many high-speed differential pairs. Each composite FPGA-to-FPGA link burns 8W. That’s 32W total used just for chip-to-chip communications. To me, that’s the secret low-power sauce that the 2.5D assembly approach of the Xilinx Virtex-7 2000T has. The extremely wide I/O provided by the silicon interposer drastically reduces the power consumption of tile-to-tile communications, and this power consumption is included in the device’s overall power consumption number: 20W. In other words, the entire Virtex-7 2000T consumes less than the four-chip design spends on chip-to-chip links alone. That means the FPGA power consumption is essentially “free” if you look through the wrong end of the telescope.
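
The arithmetic behind that comparison is simple enough to sketch in a few lines of Python. The count of four links is my inference from the 8W-per-link and 32W-total figures rather than something stated separately, and the 20W figure is the whole-device number quoted for the Virtex-7 2000T.

```python
# Chip-to-chip link power in the four-FPGA design vs. the whole 2.5D device.
WATTS_PER_LINK = 8     # quoted power per composite FPGA-to-FPGA link
TOTAL_LINK_WATTS = 32  # quoted total for chip-to-chip communications
DEVICE_WATTS = 20      # quoted total power for the Virtex-7 2000T

# Assumption: the link count is inferred from the two quoted figures.
link_count = TOTAL_LINK_WATTS // WATTS_PER_LINK

# How far the link power alone exceeds the entire single-device power.
excess_watts = TOTAL_LINK_WATTS - DEVICE_WATTS
ratio = TOTAL_LINK_WATTS / DEVICE_WATTS

print(f"{link_count} links, {TOTAL_LINK_WATTS}W for links alone")
print(f"Links exceed the whole device by {excess_watts}W ({ratio:.1f}x)")
```

By these figures, the links in the multi-chip design draw 12W more than the entire 2.5D device, before any of the four FPGAs does useful work.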

Where might this reduction in power consumption come in handy? At the announcement, Xilinx VP of FPGA Development and Silicon Technology Liam Madden discussed two example cases relevant to this discussion.

The first example involved a customer looking at developing a large ASIC for a communications application. The defining performance characteristic was the need to handle a terabit/sec aggregate data rate using an estimated 20M gates for the logic. The power budget was about 30W. The customer knew that it wanted some amount of programmability and was therefore considering a 3-chip solution with one ASIC and two FPGAs. The estimated time to develop this design was three years and the estimated power consumption was 70W. Way out of the ballpark. One Xilinx Virtex-7 2000T filled the bill and beat the power budget by 30%.
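
Here’s a rough sketch of those numbers in Python. The budget, the three-chip estimate, and the 30% margin come from the example above. The implied power for the single-FPGA design is derived from that margin rather than quoted directly, so treat it as an estimate.

```python
# Power figures for the communications example described above.
BUDGET_WATTS = 30      # customer's power budget
THREE_CHIP_WATTS = 70  # estimated power for one ASIC plus two FPGAs
MARGIN_PERCENT = 30    # how far the single FPGA beat the budget

# How badly the three-chip design would have overshot the budget.
three_chip_overrun = THREE_CHIP_WATTS / BUDGET_WATTS

# Assumption: "beat the budget by 30%" means 30% below the 30W budget.
implied_fpga_watts = BUDGET_WATTS * (100 - MARGIN_PERCENT) / 100

print(f"Three-chip design: {three_chip_overrun:.2f}x the budget")
print(f"Single FPGA (implied): {implied_fpga_watts:.0f}W")
```

The implied figure of about 21W lines up with the roughly 20W quoted for the device, while the three-chip design would have needed more than twice the budget.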

The second example involved the replacement of six chips with one device. One Xilinx Virtex-7 FPGA swallowed all six devices and delivered 5x the performance of the existing multi-chip system while consuming 1/7 of the power.
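
Combining those two ratios gives the performance-per-watt improvement. This is just my own arithmetic on the quoted 5x and 1/7 figures, not a number Xilinx stated.

```python
from fractions import Fraction

# Ratios quoted for the six-chip replacement example.
PERFORMANCE_GAIN = Fraction(5)  # 5x the performance of the old system
POWER_RATIO = Fraction(1, 7)    # 1/7 of the old system's power

# Performance per watt improves by the gain divided by the power ratio.
perf_per_watt_gain = PERFORMANCE_GAIN / POWER_RATIO

print(f"Performance per watt improves by {perf_per_watt_gain}x")
```

That’s a 35x improvement in performance per watt, if both quoted ratios hold at the same operating point.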

Here’s the before picture:

And here’s the “after” picture showing the function of all six chips compiled into one Xilinx Virtex-7 2000T FPGA:

Clearly, 2.5D and 3D assembly are going to have a major influence on the way we design low-power systems in the future.
