Hawaii GPU Architecture
The Radeon R9 290X is based on the GPU codenamed Hawaii. While it is a new GPU design in AMD’s line-up, it is still based on the Graphics Core Next architecture (GCN), which debuted in the Radeon HD 7000 series. The GPU has been significantly beefed up versus previous-gen products, however, and is one of the largest pieces of silicon to come out of AMD in quite a while.
AMD Hawaii GPU Block Diagram
The R9 290 series GPU (Hawaii) comprises up to 44 compute units with a total of 2,816 IEEE 754-2008 compliant shaders, 176 texture units, and 64 ROPs. The GPU has four geometry processors (2x the Radeon HD 7970) and can output 64 pixels per clock. The GPU also has 1MB of L2 cache on board and features a wide 512-bit GDDR5 memory interface, versus the 384-bit interface on AMD’s previous-gen high-end parts.
The R9 290 series GPU features roughly 6.2 billion transistors and is manufactured using TSMC’s 28nm process node. Its die size is about 438mm², which is approximately 24% larger than the Radeon HD 7970 (Tahiti), which came in at around 352mm².
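As a quick sanity check on the figures quoted above, the short Python sketch below works out the per-compute-unit resources and the size of the jump from Tahiti. All inputs are the numbers stated in the text; the helper names are ours, chosen for illustration only.

```python
# Figures quoted in the text above.
COMPUTE_UNITS = 44
SHADERS = 2816
TEXTURE_UNITS = 176
HAWAII_DIE_MM2 = 438
TAHITI_DIE_MM2 = 352
HAWAII_BUS_BITS = 512
TAHITI_BUS_BITS = 384


def per_compute_unit(total, units=COMPUTE_UNITS):
    """Return how many of a given resource each compute unit holds."""
    return total / units


def percent_larger(new, old):
    """Return how much larger `new` is than `old`, in percent."""
    return (new / old - 1) * 100


if __name__ == "__main__":
    print("Shaders per CU:      ", per_compute_unit(SHADERS))
    print("Texture units per CU:", per_compute_unit(TEXTURE_UNITS))
    print("Die area increase:    %.1f%%" % percent_larger(HAWAII_DIE_MM2, TAHITI_DIE_MM2))
    print("Bus width increase:   %.1f%%" % percent_larger(HAWAII_BUS_BITS, TAHITI_BUS_BITS))
```

The numbers work out to 64 shaders and 4 texture units per compute unit, a die roughly 24% larger than Tahiti, and a memory bus a third wider.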
In addition to being larger and offering more shaders, a wider memory bus, and increased geometry throughput, the R9 290 series GPU also sports a number of new features, namely TrueAudio support, a new bridge-less CrossFire engine, a more flexible display output configuration, and enhanced PowerTune capabilities designed to wring as much performance out of the GPU as possible.
If you didn’t see our original coverage during the AMD webcast from Hawaii last month, TrueAudio is a new positional and 3D spatial audio engine that will be available on the R9 290X, R9 290, and the R7 260X. To enable TrueAudio, AMD incorporated DSPs into the GPUs and worked with audio middleware providers like Firelight Technologies (FMOD) and Audiokinetic (Wwise) to deliver better positional audio through a programmable audio pipeline that resides on the GPU. Those audio pipelines consist of multiple audio-optimized DSP cores, which support the Tensilica HiFi2-EP instruction set. There are also 32 KB instruction and data caches and 8 KB of scratch RAM used for fast local operations.
The R9 290X’s bridge-less CrossFire mode comes by way of a new hardware DMA engine that resides inside the CrossFire compositing block on the GPU. The previous-gen bridged CrossFire implementations are bandwidth-limited and cannot transfer images larger than 4MP. This new mode, however, is designed for UltraHD 4K resolutions, though it may be able to scale even higher (information out of AMD wasn’t clear). The hardware DMA engine allows for direct communication between GPUs over PCI Express, with no external connector necessary. And though it technically consumes PCIe bandwidth, there is no real-world performance penalty versus the previous-gen implementation, since graphics cards aren’t typically bandwidth-starved at the slot anyway.
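To put the 4MP figure in context, the sketch below counts the pixels in a few common resolutions and compares them with that limit. It assumes a megapixel means one million pixels and treats 4MP as a hard cutoff; the text gives only the rounded figure, so take results near the boundary with a grain of salt. The function names are hypothetical.

```python
# Limit quoted in the text for bridged CrossFire, in megapixels.
BRIDGE_LIMIT_MP = 4.0


def megapixels(width, height):
    """Pixel count of one frame, in millions of pixels (assumed definition)."""
    return width * height / 1_000_000


def exceeds_bridge_limit(width, height, limit_mp=BRIDGE_LIMIT_MP):
    """True if a single frame is larger than the quoted bridge limit."""
    return megapixels(width, height) > limit_mp


if __name__ == "__main__":
    for name, (w, h) in {
        "1080p": (1920, 1080),
        "1440p": (2560, 1440),
        "UltraHD 4K": (3840, 2160),
    }.items():
        print(f"{name}: {megapixels(w, h):.2f} MP, "
              f"over limit: {exceeds_bridge_limit(w, h)}")
```

An UltraHD 4K frame is about 8.3MP, roughly double the quoted limit, which is why the bridged approach runs out of headroom at that resolution.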
Eyefinity -- DisplayPort No Longer Required
Radeon R9 series graphics cards will also feature a more flexible display output configuration. Whereas Eyefinity used to require that at least one monitor be attached via DisplayPort, the Radeon R9 290X can use whatever combination of display outputs the user desires; DisplayPort is supported, but not required. Up to six monitors can be connected to a single card, though a DisplayPort MST hub will be required for more than four monitors.
AMD has also implemented new PowerTune-related features into the Radeon R9 series, which leverage a new second-generation Serial VID (SVI2) interface and dedicated telemetry with 20Mbps of bandwidth, voltage switching times on the order of ~10μs, 6.25mV voltage step granularity, and 255 voltage steps between 0.00V and 1.55V.
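To illustrate what 6.25mV granularity means in practice, here is a minimal sketch that snaps a requested voltage to the nearest step within the quoted 0.00V to 1.55V range. It is plain arithmetic on the figures above, not the actual SVI2 wire encoding, and the helper name is hypothetical. Values are kept in microvolts so the math stays in exact integers.

```python
# Figures quoted in the text, expressed in microvolts.
STEP_UV = 6_250        # 6.25mV step granularity
MIN_UV = 0             # 0.00V
MAX_UV = 1_550_000     # 1.55V


def snap_to_step(requested_uv):
    """Clamp a requested voltage to the range, then round to the nearest step.

    Hypothetical helper for illustration; not AMD's implementation.
    """
    clamped = max(MIN_UV, min(MAX_UV, requested_uv))
    steps = (clamped + STEP_UV // 2) // STEP_UV  # round half up
    return steps * STEP_UV


if __name__ == "__main__":
    for request in (1_203_000, 1_204_000, 2_000_000):
        print(request, "->", snap_to_step(request))
```

A request for 1.203V lands on 1.200V, while 1.204V rounds up to the next step at 1.20625V. Anything above the range is held at 1.55V.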
Previously, a pre-determined power target was used to determine the peak boost clocks of a GPU. If a given workload wasn’t fully utilizing available board power and environmental conditions and temperatures were acceptable, the GPU’s voltage and frequency would be boosted to take advantage of any spare power. The R9 series’ new PowerTune features work in a similar manner, but instead of relying on a strict power target, they factor in actual GPU temperature as well as power targets when determining peak boost frequencies and voltages.
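The general idea can be sketched as a simple control step: raise the clock while both temperature and power are under their targets, and back off otherwise. This is a toy model of the behavior described above, not AMD's actual algorithm. The class, the function, and every number in the example limits are hypothetical and chosen only to make the sketch runnable.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    """Targets and clock range for the toy model (all values hypothetical)."""
    temp_target_c: float
    power_target_w: float
    min_clock_mhz: int
    max_clock_mhz: int
    step_mhz: int


def next_clock(current_mhz, temp_c, power_w, limits):
    """Return the clock for the next interval given current telemetry.

    Steps up when there is both thermal and power headroom,
    steps down otherwise, and always stays within the clock range.
    """
    has_headroom = (temp_c < limits.temp_target_c
                    and power_w < limits.power_target_w)
    delta = limits.step_mhz if has_headroom else -limits.step_mhz
    candidate = current_mhz + delta
    return max(limits.min_clock_mhz, min(limits.max_clock_mhz, candidate))


# Illustrative values only; these are NOT specifications from the article.
EXAMPLE_LIMITS = Limits(temp_target_c=90, power_target_w=200,
                        min_clock_mhz=700, max_clock_mhz=1000, step_mhz=10)

if __name__ == "__main__":
    print(next_clock(900, temp_c=80, power_w=150, limits=EXAMPLE_LIMITS))
    print(next_clock(900, temp_c=92, power_w=150, limits=EXAMPLE_LIMITS))
```

In this model a cool, lightly loaded GPU creeps up toward its maximum clock, while one that crosses either target steps back down. A real controller would also adjust voltage alongside frequency and react far faster than a simple step per interval suggests.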