In-Depth Analysis Explains Llano's Bandwidth Sensitivity

AMD's Llano has now debuted in both mobile and desktop flavors to generally strong approval. While its CPU performance significantly lags Intel's, Llano's GPU (officially labeled the Radeon HD 6550D) steamrolls Sandy Bridge's integrated graphics in virtually every gaming benchmark. The two chips take markedly different approaches to CPU/GPU communication, and those differences have a very real impact on performance results.

Over at Real World Technologies, David Kanter has compared and contrasted the differences in how Sandy Bridge and Llano manage CPU/GPU intercommunication. As he writes: "The most novel and interesting part of Llano is not the CPU or the GPU. Both of those components were re-used, specifically to avoid any complexity and the associated risks. The software and physical integration is the key to Fusion, and the area where AMD focused the most energy."

One of Llano's peculiarities is the GPU's dependence on main memory bandwidth. Desktop software tends to be far more sensitive to latency than to bandwidth, and the integrated memory controllers both AMD and Intel now use have largely negated the benefit of low-latency RAM as well. In short, RAM speed isn't nearly as important to real-world desktop performance as it used to be...except for Llano. Gaming benchmarks show Llano's GPU performance improving by as much as 25 percent when DDR3-1333 is swapped out for DDR3-1866, even at relatively high resolutions.

Kanter's article sheds light on why. The image below compares Sandy Bridge and Llano's interconnects.



There's a major difference between the two. Llano's CPU communicates with the GPU via main memory. Total available bandwidth across the dual-channel memory interface is 29.8GB/s with DDR3-1866 and just 21.3GB/s with DDR3-1333. Sandy Bridge's CPU and GPU communicate via a 256-bit, four-wire ring bus clocked at 3.4GHz. Maximum bandwidth is nearly 400GB/s. Real-world bandwidth is nowhere near peak, but with the peak that high it doesn't need to be.
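
For readers who want to check the arithmetic, here is a rough Python sketch of where the main memory figures come from. It assumes a dual-channel configuration with 64-bit (8-byte) channels and the nominal DDR3 transfer rates; the function names are ours, not Kanter's. The ring bus line only works out the throughput of a single 256-bit link at 3.4GHz. How that adds up to the aggregate figure depends on the number of ring stops, which isn't stated here, so the sketch doesn't attempt it.

```python
# Back-of-the-envelope peak bandwidth figures (theoretical, not measured).
# Assumptions: dual-channel memory, 64-bit (8-byte) channels, nominal DDR3 rates.

def ddr3_peak_gbs(mega_transfers_per_s, channels=2, bytes_per_transfer=8):
    """Theoretical peak main memory bandwidth in GB/s."""
    return mega_transfers_per_s * 1e6 * bytes_per_transfer * channels / 1e9

def ring_link_gbs(width_bits=256, clock_ghz=3.4):
    """Peak throughput of a single ring link in GB/s, one transfer per clock."""
    return (width_bits / 8) * clock_ghz

DDR3_1333 = 4000 / 3  # nominal 1333.33 MT/s
DDR3_1866 = 5600 / 3  # nominal 1866.67 MT/s

slow = ddr3_peak_gbs(DDR3_1333)
fast = ddr3_peak_gbs(DDR3_1866)

print(f"DDR3-1333, dual channel: {slow:.1f} GB/s")
print(f"DDR3-1866, dual channel: {fast:.1f} GB/s")
print(f"Bandwidth increase:      {(fast / slow - 1) * 100:.0f} percent")
print(f"One 256-bit ring link:   {ring_link_gbs():.1f} GB/s")
```

The faster memory offers roughly 40 percent more peak bandwidth. A gaming gain of up to 25 percent from that swap fits a GPU that is heavily, though not entirely, bound by memory bandwidth.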

The difference boils down to this: Sandy Bridge's CPU and GPU are better integrated than Llano's. Intel's decision to share the L3 cache between CPU and GPU allows for bi-directional communication across a low-latency/high-bandwidth link. Llano, in contrast, more closely resembles an on-package implementation of an old-style, Northbridge-integrated GPU. GPU RAM (512MB) is allocated out of main memory and it's much easier for the CPU to communicate with the GPU than vice versa.

Implications For The Future:

Llano's GPU could scarcely be better positioned. Not only does it flatten the current competition, there are multiple ways AMD can increase its performance. The next generation of Llano processors will likely utilize an improved, more thoroughly integrated CPU/GPU communication system. The company could also improve performance by integrating a dedicated memory buffer (Kanter suggests AMD might utilize 3D chip stacking technology to offer as much as 128-256MB of on-die RAM). Regardless of any interconnect improvements, the Bulldozer-based Llano parts due in 2012 will be based on AMD's more recent Cayman GPUs rather than the older Cypress.

AMD's recent Fusion Developer Summit made it clear that the company is developing APUs that are much more tightly coupled. Intel may have a lead in this respect, but we expect AMD to close the gap in relatively short order. Llano's mainstream positioning may make it a touch boring from an enthusiast perspective, but it has a vital role to play on AMD's road to greater profitability.

We recommend anyone interested in a Llano-based system eye the cost of faster DDR3 carefully. It may be worth investing in faster RAM, provided the cost difference isn't too high.