AMD tries to play catch-up to Nvidia on performance, but sticks to its efficiency mantra with the new Radeon HD 6900 series GPUs
When AMD launched the Radeon HD 6800 series back in October, we suggested that some people might be confused by the naming scheme. The Radeon HD 6870 and 6850, code-named Barts, were actually midrange GPUs, not replacements for AMD’s high-end Radeon HD 5870.
Now AMD is releasing its high-end GPU, code-named Cayman, in two versions. The top model, which AMD is calling its “enthusiast” GPU, is the Radeon HD 6970, while the Radeon HD 6950 slots in between the 6870 and the 6970. What’s more, the HD 6970 doesn’t replace the dual-GPU Radeon HD 5970. If all that seems confusing once again, it probably is. It’s best to treat AMD’s current naming scheme as unrelated to its old one, despite the similarities.
Cayman is based on the same general architecture as Barts, but is an entirely different chip, with features and tweaks all its own. Let’s take a look at some of the key features of the new GPU.
- Dual Graphics Engine, including twin tessellation paths. Cayman has up to 24 of what AMD calls SIMD engines, versus 12 to 14 on the HD 6800 GPUs, along with up to 96 texture units. AMD estimates tessellation performance at roughly three times that of the HD 5870.
Twin graphics engines essentially double throughput over the 6870.
- The graphics engine is based on a VLIW4 thread processor architecture. AMD eliminated the old “T-unit”, which was dedicated to transcendental operations. This allowed for a 10% improvement in performance per mm², as well as simpler scheduling.
The Cayman core has been streamlined into a VLIW4 engine, which simplifies scheduling and improves space usage on the chip.
- The ROPs have been redesigned and streamlined; compared to the HD 5870’s, they are now 2x faster on 16-bit integer formats and 2-4x faster on 32-bit floating-point formats.
- To improve GPU compute performance, AMD implemented asynchronous dispatch: the GPU can execute multiple compute kernels at the same time, each with its own command queue and virtual address space. Flow control has also been improved, and double-precision floating point performance has been boosted to about 25% of single-precision performance.
GPU compute improvements should make Cayman a better general purpose processor.
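To see why dropping the fifth slot can pay off, here is a toy model of VLIW bundle packing in Python. It assumes a shader’s instruction stream can be summarized as a list of dependency levels, each holding some number of mutually independent operations; the `levels` values are hypothetical, and the model ignores the transcendental ops the T-unit used to handle, register ports, and everything else AMD’s real shader compiler deals with.

```python
import math


def bundles_needed(levels: list[int], width: int) -> int:
    """Count VLIW bundles (issue steps) for a toy instruction stream.

    Each entry in `levels` is the number of mutually independent ops ready at
    one step; ops in later levels depend on earlier ones, so they can't share
    a bundle with them. A level with n ops therefore needs ceil(n / width).
    """
    return sum(math.ceil(n / width) for n in levels)


def slot_utilization(levels: list[int], width: int) -> float:
    """Fraction of ALU slots doing useful work across all issued bundles."""
    return sum(levels) / (bundles_needed(levels, width) * width)


# Hypothetical stream that rarely has more than four independent ops per step.
levels = [4, 3, 4, 2, 4, 3]

for width in (5, 4):  # VLIW5 (with T-unit) vs. VLIW4
    print(f"VLIW{width}: {bundles_needed(levels, width)} bundles, "
          f"{slot_utilization(levels, width):.0%} slot utilization")
```

For this made-up stream both widths need the same number of bundles, but the four-wide design leaves fewer slots idle. When a level has more than four independent ops, the narrower bundle costs an extra step; that is the trade-off this model highlights.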
New Anti-Aliasing Modes
Given the additional rendering horsepower, AMD is adding new anti-aliasing modes to its Catalyst Control Center. These AA modes work with all Radeon HD 6000 series cards.
Enhanced Quality Anti-Aliasing
The first is something AMD is calling “Enhanced Quality Anti-Aliasing”, or EQAA. It’s similar in concept to Nvidia’s Coverage Sample Anti-Aliasing (CSAA). The new EQAA modes use up to 16 coverage samples per pixel, and the number of color samples and coverage samples can be set independently. The net result is better image quality within the same memory footprint. EQAA is compatible with the other forms of AA that AMD supports.
AMD’s EQAA supports customizable color and coverage samples within a pixel.
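The idea of decoupling coverage samples from stored colors can be sketched in Python. This is a simplified model built on our own assumptions, not AMD’s hardware format: each pixel stores a few full colors, each coverage sample stores only a small index into that list, and the resolve step weights each stored color by how many coverage samples point at it. The 4-byte color size and the index packing in the hypothetical `bytes_per_pixel` helper are illustrative guesses.

```python
from collections import Counter

Color = tuple[float, float, float]


def resolve_pixel(coverage: list[int], colors: list[Color]) -> Color:
    """Average the stored colors, weighted by how many coverage samples use each.

    coverage[i] is the index (into `colors`) of the fragment covering sample i.
    """
    counts = Counter(coverage)
    n = len(coverage)
    return tuple(
        sum(colors[idx][ch] * k for idx, k in counts.items()) / n
        for ch in range(3)
    )


def bytes_per_pixel(color_samples: int, coverage_samples: int,
                    bytes_per_color: int = 4) -> int:
    """Hypothetical storage cost: full colors plus packed per-sample indices."""
    index_bits = max(1, (color_samples - 1).bit_length())
    index_bytes = (coverage_samples * index_bits + 7) // 8
    return color_samples * bytes_per_color + index_bytes


# An edge pixel: 5 of 8 coverage samples hit a red triangle, 3 hit blue sky.
red, blue = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)
print(resolve_pixel([0, 0, 0, 0, 0, 1, 1, 1], [red, blue]))  # (0.625, 0.0, 0.375)

# Hypothetical sizes: 4 stored colors + 16 coverage samples vs. 16 full colors.
print(bytes_per_pixel(4, 16), "bytes vs.", 16 * 4, "bytes")  # 20 bytes vs. 64 bytes
```

In this model the edge gets a 16-step blend from coverage alone, while storage grows mainly with the number of full colors kept, which is the trade-off the independent color and coverage settings expose.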
EQAA can be enabled in the Catalyst Control Center to improve image quality on top of whatever AA mode is already set.
Enabling EQAA within Catalyst Control Center.
Using EQAA carries a relatively minor performance hit: in our quick testing, frame rates with 4xAA plus EQAA were within 5% of those with 4xAA alone.
Morphological Anti-Aliasing
The second new mode, morphological AA (MAA), is a different beast altogether. It’s a post-processing technique that uses DirectCompute to deliver full-scene anti-aliasing. It’s similar to edge AA in concept, but it can detect all edges, and it works with DX9 through DX11. It even works with games that don’t normally support AA.
The cool thing about MAA is that it works even with transparency.
Currently, morphological AA is enabled through the Catalyst Control Center, though future games may support it directly. If you use MAA together with 4x anti-aliasing, expect a performance hit of about 10-20%. In return, you’ll get image quality on par with supersampling AA, with better performance than supersampling.
Morphological AA delivers image quality similar to supersampling, but without the performance hit of supersampling.
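Because MAA works on the finished frame rather than on geometry, it doesn’t matter how an edge was produced, which is why it can handle transparency and games with no AA support. The Python sketch below is a heavily simplified stand-in written for illustration, not AMD’s algorithm: it only finds sharp luminance steps between neighboring pixels and blends across them. Real morphological AA goes further, classifying the shape of each edge and computing coverage-based blend weights; the `threshold` and `weight` values here are arbitrary assumptions.

```python
Image = list[list[float]]  # grayscale luminance, 0.0 (black) to 1.0 (white)


def edge_blend(img: Image, threshold: float = 0.25, weight: float = 0.25) -> Image:
    """Simplified post-process pass: soften sharp steps between neighbors.

    Only the final image is inspected, so it behaves the same no matter how
    an edge was rendered (geometry, alpha-tested foliage, shader effects).
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            # Compare each pixel with its right and bottom neighbor.
            for dy, dx in ((0, 1), (1, 0)):
                ny, nx = y + dy, x + dx
                if ny >= h or nx >= w:
                    continue
                diff = img[ny][nx] - img[y][x]
                if abs(diff) > threshold:  # big enough jump: treat as an edge
                    # Pull both pixels part of the way toward each other.
                    out[y][x] += weight * diff
                    out[ny][nx] -= weight * diff
    return out


# A hard vertical edge between black and white columns.
frame = [[0.0, 0.0, 1.0, 1.0] for _ in range(2)]
for row in edge_blend(frame):
    print(row)  # [0.0, 0.25, 0.75, 1.0]
```

Note that the filter never sees triangles or AA sample data, only pixels, which is also why its cost scales with resolution rather than scene complexity.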
Controlling Power Usage
One of the issues that GPU companies wrestle with is power usage at full throttle. The problem is that some GPU loads, like Furmark, can hammer a graphics card, and the card needs to be designed to deliver the power and cooling levels needed for such loads. But Furmark, Perlin Noise and other artificial benchmarks can impose a greater load on a GPU than most games.
Even within a game, the GPU runs at maximum load only in brief bursts; most of the time, it’s running at less than full throttle. As a result, cards are generally overengineered to handle thermal and power loads that occur only rarely, and then only briefly.
To manage those loads, AMD is building in a feature called PowerTune, which AMD’s David Baumann calls “inverse Turbo.” The GPU caps power draw at a set maximum level and won’t let the chip exceed it. That means momentary peak performance may be constrained slightly. However, it also means the graphics card can ship with higher default clock speeds.
AMD will also let users adjust the power limit by up to plus or minus 20%, either to minimize power draw and noise or to maximize performance. These controls will be built into the AMD Overdrive section of the Catalyst Control Center, so users can set their own balance between maximum thermal power and performance.
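The PowerTune idea, including the plus or minus 20% slider, can be captured in a toy Python model. Everything numeric here is an assumption for illustration: board power is treated as proportional to an activity factor times engine clock (ignoring voltage, leakage and temperature), and the 0.25 W/MHz coefficient, 800MHz clock and 180W cap are made-up values, not Cayman’s specifications. The function name `powertune_clock` is hypothetical, and this is not AMD’s actual control algorithm.

```python
def powertune_clock(activity: float, slider_pct: float = 0.0,
                    max_clock_mhz: float = 800.0,
                    watts_per_mhz: float = 0.25,
                    power_cap_w: float = 180.0) -> float:
    """Return the engine clock a power-capped GPU would run at (toy model).

    activity: 0.0-1.0, how hard the workload drives the chip (hypothetical).
    slider_pct: user power-limit adjustment, limited to the +/-20% range.
    All default values are made up for illustration.
    """
    if not -20.0 <= slider_pct <= 20.0:
        raise ValueError("power limit slider only goes from -20% to +20%")
    cap_w = power_cap_w * (1.0 + slider_pct / 100.0)
    estimated_w = activity * watts_per_mhz * max_clock_mhz
    if estimated_w <= cap_w:
        return max_clock_mhz  # under the cap: full speed, no throttling
    # Over the cap: lower the clock until estimated power equals the cap.
    return cap_w / (activity * watts_per_mhz)


for name, activity in (("typical game", 0.7), ("Furmark-style load", 1.0)):
    for slider in (-20, 0, 20):
        print(f"{name:>18}, {slider:+d}%: {powertune_clock(activity, slider):.0f} MHz")
```

In this model only the Furmark-style load is throttled at the default setting, which mirrors the point above: the cap bites on artificial worst cases, so everyday clocks can be set higher. Raising the slider lifts the cap; lowering it trades performance on heavy loads for lower power and noise.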
AMD will ship Cayman in two versions, the Radeon HD 6970 and the HD 6950. We’re including the feature set for the Radeon HD 6870 as a baseline for reference.
| Features | Radeon HD 6870 | Radeon HD 6950 | Radeon HD 6970 |
|---|---|---|---|
| GDDR5 Memory Clock | 1050MHz | 1250MHz | 1375MHz |
| GDDR5 Frame Buffer | 1GB | 2GB | 2GB |
The Radeon HD 6970 is aimed at the same market segment as Nvidia’s recently released GTX 570, while the Radeon HD 6950 occupies the price point currently held by the GTX 470, which is being phased out.
Now that we’ve seen the feature set, let’s look at performance across a variety of games. We’ve tweaked our standard benchmark list, replacing DiRT2, HAWX and the Far Cry 2 Action benchmark with F1 2010, HAWX 2 and Metro 2033, respectively. We compared performance against Nvidia’s GTX 570 and GTX 580 cards.
It’s worth talking a bit about drivers. Cayman and Barts are new architectures, and their drivers are likely still immature, whereas Nvidia’s new GTX 570 and 580 are really the same Fermi architecture we’ve seen for the past year. So we’ll likely see steeper gains from new drivers for AMD’s new cards over the next few months; that’s been the trend with past releases of new architectures. Still, it’s probably not wise to drop nearly $400 on a graphics card in the hope that better performance will come later.
Our test bed is a 3.33GHz Core i7-975 Extreme Edition in an Asus P6X58D Premium motherboard with 6GB of DDR3/1333 and an 850TX Corsair PSU. The OS is 64-bit Windows Ultimate. All games are run at 1920×1200 with 4x AA.
Performance Results (DX10)
Across three different DirectX 10 games, the Radeon HD 6970 generally struggles to keep up. While Cayman ekes out a narrow win over the GTX 570 in Crysis, it’s 10 fps behind Nvidia’s latest card in Just Cause 2 and falls slightly behind in the Far Cry 2 (Ranch Long) test.
That’s not to say performance is poor, mind you. Performance is actually quite good, but if both cards are in the same price range, the GTX 570 might be a better choice.
Performance Results (DX11)
We’ve emphasized DirectX 11 in this round of benchmarks, because both Nvidia and AMD are touting DX11 performance. AMD’s marketing is pushing the 6000 series as the first of the “second generation” DirectX 11 architectures. Let’s see how the new progeny from AMD’s graphics division fares in a variety of DX11 tests.
First up is a pair of synthetic tests, 3DMark 11 and Unigine Heaven 2.0. Both are DirectX 11 benchmarks that hammer GPUs, though Unigine tends to focus narrowly on tessellation performance, while Futuremark’s 3DMark 11 pushes a broader DX11 performance envelope.
The heavy tessellation in Heaven’s Extreme mode gives Nvidia’s cards a big edge. But in 3DMark 11, which uses a wide array of DX11 features, Cayman handles itself pretty well. The HD 6950 is almost a match for the GTX 570, while the HD 6970 beats it by a wide margin. Of course, Nvidia’s pricey GTX 580 continues to rule the single-GPU roost.
DX11 Game Benchmark Results
We’ve dropped the older DiRT2 benchmark in favor of the newer F1 2010 test. Interestingly, F1 2010 doesn’t use hardware tessellation. But it does use DirectCompute shaders to accelerate Gaussian blurs for effects like lens flares and HDR (high dynamic range) bloom. In addition, the game uses HDR lighting with full FP16 floating point formats. F1 2010 also uses additional Shader Model 5.0 features to render more realistic soft shadows.
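The game’s actual shaders aren’t described here, but blurs like these are commonly built as a separable Gaussian: one horizontal pass and one vertical pass with the same 1D kernel, which cuts the work per pixel from (2r+1)² samples to 2(2r+1) for a kernel of radius r. Here’s a CPU-side Python sketch of that technique, assuming a grayscale image stored as lists of floats and clamp-to-edge sampling; a DirectCompute version would run the same passes across many GPU threads in parallel.

```python
import math

Image = list[list[float]]


def gaussian_kernel(radius: int, sigma: float) -> list[float]:
    """1D Gaussian weights for offsets -radius..radius, normalized to sum to 1."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_1d(line: list[float], kernel: list[float]) -> list[float]:
    """Convolve one row or column with the kernel, clamping at the borders."""
    r = len(kernel) // 2
    n = len(line)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - r, 0), n - 1)  # clamp-to-edge sampling
            acc += w * line[j]
        out.append(acc)
    return out


def separable_blur(img: Image, kernel: list[float]) -> Image:
    """Horizontal pass over rows, then vertical pass over columns."""
    rows = [blur_1d(row, kernel) for row in img]
    cols = [blur_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


# A single bright pixel (think of a highlight feeding a bloom pass) spreads out.
frame = [[0.0] * 5 for _ in range(5)]
frame[2][2] = 1.0
for row in separable_blur(frame, gaussian_kernel(radius=1, sigma=1.0)):
    print(" ".join(f"{v:.3f}" for v in row))
```

Each pass is independent per row or column, which is what makes this kind of filter a good fit for compute shaders.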
As we can see, the GTX 570 and the Radeon HD 6970 trade wins, so we’ll call it a wash. If AMD’s intent all along was to position Cayman against the GTX 570, that positioning holds up mostly in DirectX 11. The Radeon HD 6950 does reasonably well, but it’s clearly a more modest part, as befits its lower cost.
Let’s take a quick look at power draw. The Radeon HD 6970 has one 8-pin and one 6-pin power connector, while the HD 6950 has two 6-pin connectors.
Despite this, the Radeon HD 6970 still consumes slightly less power than Nvidia’s GTX 570 in our full-throttle test, while the HD 6950 practically sips power. Idle power draw is pretty much a wash, however, with minimal differences between the two Cayman cards and the GTX 570.
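The connector layout sets a ceiling on how much power each board can draw while staying within the PCI Express specifications: up to 75W from the x16 slot, 75W per 6-pin connector and 150W per 8-pin connector. A quick Python tally (the helper name `max_board_power` is our own):

```python
# Per-source power limits from the PCI Express specifications (watts).
SLOT_W = 75
CONNECTOR_W = {"6-pin": 75, "8-pin": 150}


def max_board_power(connectors: list[str]) -> int:
    """Upper bound on in-spec board power: slot plus auxiliary connectors."""
    return SLOT_W + sum(CONNECTOR_W[c] for c in connectors)


print("HD 6970:", max_board_power(["8-pin", "6-pin"]), "W")  # HD 6970: 300 W
print("HD 6950:", max_board_power(["6-pin", "6-pin"]), "W")  # HD 6950: 225 W
```

These figures are in-spec ceilings, not measured consumption, which is why the 6970 can carry a 300W-capable connector setup yet still draw less than the GTX 570 in practice.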
Final Thoughts: Needs Improvement
Cayman looks like an interesting architecture. AMD’s approach of simply “doubling everything” scores some significant performance gains over the HD 6800 series, but seems like an uncharacteristically brute-force approach for a company that’s been priding itself on its focus on efficiency.
That said, Cayman looks to be priced right. The reference HD 6970’s suggested online retail price of $369 is in the same ballpark as Nvidia’s GTX 570, albeit slightly above Nvidia’s reference price. The HD 6950 is only a little slower, and you may see models under $300, which puts it about $50 above a Radeon HD 6870. For that premium, our testing indicates the HD 6950 is likely to be notably faster in shader-heavy games.
If anything is holding Cayman back, it’s having only 32 ROPs, the same number as the HD 6800s. That means older games that lean more heavily on texture effects and use less shader code will likely run only marginally faster than on the 6870. Going forward, though, Cayman is a more robust solution than the 6800s. Still, Nvidia holds the single-GPU crown with the GTX 580. Will AMD’s dual-GPU card, code-named “Antilles”, give AMD the overall performance crown? We’ll just have to wait and see.