"America did not win the Cold War by mistreating or killing communists."
- David Ochmanek & Lowell Schwartz
More pages: 1 2
EDRAM
Saturday, May 29, 2010 | Permalink

It's funny how, as technology develops, what was originally a good idea can sometimes become a bad one. EDRAM for video cards is such a thing. AFAIK no video card for the PC ever had it, but it's been used occasionally in game consoles, most recently in the Xbox 360. I recall back when Bitboys were trying to enter the PC video card industry with EDRAM-based designs. Unfortunately they never succeeded and none of their products ever saw the light of day. Had they been able to produce one at the time though, chances are it would have worked very well. However, the way frames are rendered today is very different from the DX7 era, which makes EDRAM far less ideal now. Back in the DX7 days you'd probably render to the backbuffer almost all the time, and given no shaders and as many ROPs as there were pipelines, the biggest bottleneck was generally bandwidth to the backbuffer. That's hardly the case anymore. Today shaders are long, bottlenecks are usually ALU or texture fetches, and even if you end up limited at the backend you're normally not bandwidth bound but ROP bound.

Having worked a fair amount with the Xbox 360 over the last 2.5 years, I find that the EDRAM mostly stands in the way and rarely provides any benefit. Unfortunately, you can't just render "normally" to a buffer in memory if you'd prefer to; you always have to render to the EDRAM. Once you're done with your rendering, the results have to be resolved to video memory. So even if we assume we're never ROP bound and the EDRAM gets to shine with its awesome bandwidth, it really wouldn't buy us much. Each render target operation is immediately followed by a resolve operation copying the data to video memory, and during this copying phase the GPU is busy just copying rather than rendering. If the rendering had targeted a video memory buffer to begin with, those writes to memory would be nicely interleaved with the rendering work the GPU does and no resolve would be necessary; once the rendering is done, all that's needed is to flush whatever data is residing in the backend cache to memory and you're done.
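To illustrate where the GPU spends its time in the two models, here's a minimal sketch; every type and function in it is a hypothetical stub standing in for the real Xbox 360 API, so only the ordering of the work matters:

// Contrast: the EDRAM render/resolve pattern vs. rendering straight to memory.
#include <cstdio>

struct Surface {};                       // stand-in for a color or depth surface
Surface edramColor, edramDepth, vramColor, vramDepth;

void SetRenderTarget(Surface&, Surface&) {}                           // bind targets (stub)
void DrawScene()            { puts("draw: shading + backend writes"); }
void ResolveToVideoMemory() { puts("resolve: GPU busy copying EDRAM -> VRAM"); }
void FlushBackendCaches()   { puts("flush: write out whatever is still cached"); }

// Xbox 360 style: every pass targets EDRAM and must be copied out afterwards.
void RenderPassWithEDRAM()
{
    SetRenderTarget(edramColor, edramDepth);   // rendering can only target EDRAM
    DrawScene();
    ResolveToVideoMemory();                    // a pure copy pass follows every target
}

// The model argued for above: backend writes land in video memory as the scene
// is drawn, interleaved with shading, so no copy pass is needed at the end.
void RenderPassDirectToMemory()
{
    SetRenderTarget(vramColor, vramDepth);     // writes go straight to memory
    DrawScene();
    FlushBackendCaches();                      // only a cache flush remains
}

int main()
{
    RenderPassWithEDRAM();
    RenderPassDirectToMemory();
}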

Sadly, it's not just that it doesn't provide as much benefit as it might look on paper; it also alters the rendering model we're all familiar with and adds a bunch of new restrictions. Because EDRAM is still quite expensive in hardware, it's not something we get an awful lot of. The Xbox 360 has 10MB. But if you render at the typical 1280x720 resolution with 2xAA, you need about 14MB for the color and depth buffers. This is generally solved by "tiling", which means you render for instance the top of the screen first and resolve, then the bottom, and resolve. The DirectX 9 for Xbox helps out a bit here by letting you do this fairly automatically: you enter a tiling section of the rendering, which is then submitted to the hardware twice, or however many times is necessary depending on how many tiles your render target configuration requires. Sounds fine, huh? Well, until you want to squeeze another render target operation into that stream. Say you want to apply some SSAO. You need the complete depth buffer for the opaque stuff, and you want to apply the SSAO before rendering the transparent stuff. SSAO can be quite expensive, so instead of taking oodles of samples you probably want to take a few, blur the result at half-res, and then apply that. Well, to blur you need to switch render targets, which breaks the model. For everything to work you'd need to first resolve everything, do your SSAO passes, then copy that back to EDRAM and enter a new tiling section. That's of course way too costly, so nobody bothers doing that kind of thing; instead people just try to live with the limitations imposed. So one may attempt to just resolve the depth buffer without switching render targets and then apply the SSAO in one pass. Unfortunately, not even this is ideal. The problem is that when the top tile reaches this code, only the top tile has been rendered, so the depth buffer texture it samples is incomplete. When it samples the neighborhood around pixels close to the tile edge, it samples across the edge and gets data from the previous frame. This often results in visible seams when in motion. It's also common to copy the backbuffer for refraction effects, and in many games on the Xbox 360 you'll see visible seams when traveling on water for this very reason.
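To put numbers on that, here's a quick back-of-the-envelope check of the 14MB figure and the tile count, assuming 32-bit color and 32-bit depth per sample (the exact formats are my assumption, not something stated above):

// Footprint of a 720p 2xAA color + depth buffer versus 10MB of EDRAM.
#include <cstdio>
#include <cmath>

int main()
{
    const double width = 1280, height = 720;
    const double samplesPerPixel = 2;                 // 2x MSAA
    const double bytesPerSample  = 4 /*color*/ + 4 /*depth*/;

    double bytes   = width * height * samplesPerPixel * bytesPerSample;
    double mb      = bytes / (1024.0 * 1024.0);
    double edramMB = 10.0;                            // Xbox 360 EDRAM size
    int    tiles   = (int)std::ceil(mb / edramMB);    // tiling passes needed

    printf("720p 2xAA color+depth: %.1f MB -> %d tiles in %.0f MB of EDRAM\n",
           mb, tiles, edramMB);                       // ~14.1 MB -> 2 tiles
}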

For the next generation of consoles, chances are we'll want 1080p, full HDR, at least 4xMSAA, and many will probably want additional buffers for deferred rendering or other techniques. I don't think it will be possible to embed enough EDRAM to fit all of that for many games. So if you're designing a future console now and are thinking of using EDRAM, please don't. Or at least let us render directly to memory. Or, if you really want it, only let the EDRAM work as a large cache or something.
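For a rough sense of scale, here's the same back-of-the-envelope math for that kind of frame, assuming FP16 HDR color and 32-bit depth (the formats are my assumption, and any extra deferred-rendering targets would come on top of this):

// Footprint of a 1080p 4xMSAA FP16-color + depth frame, before any G-buffer.
#include <cstdio>

int main()
{
    const double width = 1920, height = 1080;
    const double samplesPerPixel = 4;                        // 4x MSAA
    const double bytesPerSample  = 8 /*FP16 color*/ + 4 /*depth*/;

    double mb = width * height * samplesPerPixel * bytesPerSample
              / (1024.0 * 1024.0);
    printf("1080p 4xMSAA FP16 color + depth: %.0f MB\n", mb); // ~95 MB
}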

vince
Saturday, May 29, 2010

EDRAM truly is a pain - certainly my least favorite thing about the 360 (the relatively small amount of L2 for 3 cores might be #2).

One option is just not to do MSAA and instead do anti-aliasing in a post process. If you're using floating-point render targets for any given stage this may be the better option on the PS3 and in D3D9 anyway (for those of us still stuck supporting D3D9).

If you go this route a lot of other things become simpler -- for example you can do a depth prepass, keep a full depth buffer in EDRAM at all times, but do tiling for just your color targets, which can be useful if you need MRTs for a g-buffer, for example.

Filip
Sunday, May 30, 2010

Oh, you're so right... the pain of "tiling", render-resolve logic and other stuff. I've been saying for years now that EDRAM is a pain in the ass and a bad idea, but people usually don't agree.
The only good thing is the fill rate with big particles and similar stuff, but I'd still prefer writing to main RAM directly (and having some more shader transistors instead of EDRAM).

But who could have predicted that deferred rendering was going to become mainstream?

valoh
Sunday, May 30, 2010

Imo the main design fail is that the EDRAM is GPU write only. Imagine you could directly sample from EDRAM, then the design would really shine or at least would make you forget the tiling pain. Do the image processing completely in EDRAM without touching the main bus and use it as large texture cache for texture heavy batches. Ok, and you would want hw resolves independent of the GPU, so that you can resolve and render in parallel.

Dan
Monday, May 31, 2010

I heard the 360 had certain shader instructions to write stuff directly to main RAM. Is that true? Couldn't it be abused for this purpose?

jon w
Monday, May 31, 2010

720p? 540p with 2xAA for the win!
(Yeah, I'm looking at you, Gears...)

Humus
Monday, May 31, 2010

Filip: Well, even with big particles the main bottleneck is going to be ROPs anyway, so spend those transistors on a few more ROPs instead please.

Dan: I don't know about main RAM, I'd have to double check, but you can do arbitrary writes to video memory at least. We used that feature in JC2 for a depth buffer conversion pass to trade performance for memory. It was very tricky to transform from one swizzled surface format to another differently swizzled one while keeping both reads and writes fairly linear in memory; if you don't, performance nearly grinds to a halt. It took me a week to get it working, but in the end I ended up very close to the theoretical max bandwidth.

Reader
Sunday, June 6, 2010

> 720p? 540p with 2xAA for the win!
> (Yeah, I'm looking at you, Gears...)

What are you talking about? Every Gears of War game has been 720p with 2x MSAA. Maybe you are thinking of Halo?

mark
Saturday, July 10, 2010

PS3 MLAA: you don't want MSAA, you want MLAA.
http://www.eurogamer.net/articles/digitalfoundry-saboteur-aa-blog-entry

With MLAA the PS3 is actually beating the Xbox 360's EDRAM, and not by a small bit (look at God of War 3, and the upcoming GT5).
They all use MLAA running on a single SPE.
