"We use the word 'politics' to describe the process so well: 'Poli' in Latin meaning 'many' and 'tics' meaning 'bloodsucking creatures'."

EDRAM
Saturday, May 29, 2010 | Permalink

It's funny how sometimes, as technology develops, what was originally a good idea can become a bad one. EDRAM for video cards is such a thing. AFAIK no video card for the PC ever had it, but it's been used occasionally in some game consoles, most recently in the Xbox 360. However, I recall back when Bitboys were trying to enter the PC video card industry using EDRAM-based designs. Unfortunately they never succeeded and none of their products ever saw the light of day. Had they been able to produce one at the time though, chances are it would have worked very well. However, how people render frames today is very different from the DX7 era, which makes EDRAM far less ideal today. Back in the DX7 days you'd probably render to the backbuffer almost all the time, and given no shaders and as many ROPs as there were pipelines, the biggest bottleneck was generally bandwidth to the backbuffer. That's hardly the case anymore. Today shaders are long, bottlenecks are usually ALU or texture fetches, and even if you end up being limited at the backend you're normally not bandwidth bound but ROP bound.

Having worked a fair amount with the Xbox 360 for the last 2.5 years, I find that EDRAM is mostly standing in the way and rarely provides any benefit. Unfortunately, you can't just render "normally" to a buffer in memory if you so prefer, nope, you always have to render to the EDRAM. Once you're done with your rendering, the results have to be resolved to video memory. So even if we assume we are never ROP bound and EDRAM gets to shine with its awesome bandwidth, it really would not buy us much. Each render target operation is immediately followed by a resolve operation copying the data to video memory. During this copying phase the GPU is busy just copying rather than rendering. If the rendering was targeting a video memory buffer to begin with, those writes to memory would be nicely interleaved with the rendering work that the GPU does and no resolve would be necessary, so once the rendering is done all that's needed is to flush whatever data is residing in the backend cache to memory and you're done.
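To make that argument a bit more concrete, here is a toy cost model in Python. It is only a sketch: the function names are made up for illustration, the numbers are placeholders rather than measurements, and it assumes the simplest possible behavior (the resolve fully serializes after rendering, while direct memory writes overlap perfectly with shading).

```python
# Toy timing model for one render target pass.
# Every number used below is a hypothetical placeholder, not a measurement.


def edram_pass_ms(render_ms: float, resolve_ms: float) -> float:
    """Render into EDRAM, then stall while the result is copied out to memory."""
    return render_ms + resolve_ms


def direct_pass_ms(render_ms: float, write_ms: float) -> float:
    """Render straight to memory.

    Writes are assumed to overlap with shading, so the pass takes as long
    as the slower of the two.
    """
    return max(render_ms, write_ms)


if __name__ == "__main__":
    render_ms = 2.0   # hypothetical shading cost of the pass
    resolve_ms = 0.5  # hypothetical cost of the EDRAM -> memory copy
    write_ms = 0.5    # hypothetical cost of writing the same pixels while rendering

    print("EDRAM + resolve:", edram_pass_ms(render_ms, resolve_ms), "ms")
    print("direct to memory:", direct_pass_ms(render_ms, write_ms), "ms")
```

As long as the pass is not bound by memory writes, the direct path costs just the rendering time in this model, while the EDRAM path always pays the resolve on top.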

Sadly it's not just that it doesn't really provide as much of a benefit as it might look on paper, but it also alters the rendering model that we are all familiar with and adds a bunch of new restrictions. Because EDRAM is still quite expensive in hardware, it's not something we get an awful lot of. The Xbox 360 has 10MB. But if you render to the typical 1280x720 resolution with 2xAA, that's 14MB needed for the color and depth buffer. So this is generally solved by "tiling", which means you render for instance to the top of the screen first, then resolve, and then the bottom, and resolve. DirectX 9 for the Xbox helps out a bit here by letting you do this stuff quite automatically: you enter a tiling section of the rendering, which is then submitted twice to the hardware, or however many times necessary depending on how many tiles are required for your render target configuration.

Sounds fine, huh? Well, until you want to squeeze in another render target operation somewhere in that stream. Say you want to apply some SSAO. You need the complete depth buffer for the opaque stuff, and then you apply it before rendering the transparent stuff. SSAO can be quite expensive, so instead of using oodles of samples you probably want to take a few, then blur the result to half-res, and then apply that. Well, to blur you need to switch render target, which breaks the model. In order for everything to work you need to first resolve everything, do your SSAO passes, then copy that back to EDRAM again and enter a new tiling section. This is of course way too costly, so nobody bothers doing that kind of stuff, but instead just tries to live with the limitations imposed.

So one may attempt to just resolve the depth buffer without switching render target and then apply SSAO in one pass. Unfortunately, not even this is ideal. The problem is that when the top tile enters this code only the top tile has been rendered, so the depth buffer texture it will use for the effect is incomplete. So when it samples the neighborhood around pixels close to the edge it will sample over the edge and get data from the previous frame. This often results in visible seams when in motion. The same goes for the backbuffer, which is commonly copied for refraction effects. In many games on the Xbox 360 you'll see visible seams when traveling on water for this reason.
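The seam problem can be illustrated with a small Python sketch. This is a deliberately simplified one-dimensional model, not actual Xbox 360 code: the function name `stale_rows`, the two-tile split, and the kernel radius are all assumptions made for the example. The resolved texture in memory starts out holding last frame's data, only the top tile's rows get updated, and then each row of the top tile samples a neighborhood of `radius` rows around itself.

```python
def stale_rows(height: int, tile_rows: int, radius: int) -> list[int]:
    """Return the rows of the top tile whose sampling kernel reads old data.

    height    -- total number of rows in the render target
    tile_rows -- number of rows covered by the top tile
    radius    -- how many rows above and below each pixel the effect samples
    """
    PREV, CURR = 0, 1

    # The resolved texture in memory still holds the previous frame everywhere.
    texture = [PREV] * height

    # The top tile has been rendered and resolved: only its rows are current.
    for row in range(tile_rows):
        texture[row] = CURR

    affected = []
    for row in range(tile_rows):
        lo = max(0, row - radius)
        hi = min(height - 1, row + radius)
        # If any sampled row is still from the previous frame, this row
        # mixes data from two different frames.
        if any(texture[r] == PREV for r in range(lo, hi + 1)):
            affected.append(row)
    return affected


if __name__ == "__main__":
    # 720 rows split into two tiles of 360, with a 4-row sampling radius.
    print(stale_rows(720, 360, 4))
```

In this model the rows right above the tile boundary are the ones that end up mixing frames, which is where the seam shows up. By the time the bottom tile runs, the top tile has already been resolved, so its neighborhood is complete.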

For the next-generation consoles, chances are we'll want 1080p, full HDR, at least 4xMSAA, and many will probably want additional buffers for deferred rendering or other techniques. I don't think it will be possible to embed enough EDRAM to fit it all for many games, so if you're designing a future console now and are thinking of using EDRAM, please don't. Or at least let us render directly to memory. Or only let the EDRAM work as a large cache or something if you really want it.
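The arithmetic behind the 14MB figure, and what it looks like for a configuration like the one above, can be sketched in a few lines of Python. The byte sizes are assumptions: 4 bytes of color plus 4 bytes of depth/stencil per sample for the 720p case (which reproduces the 14MB), and 8 bytes of color (an FP16 RGBA format as a stand-in for "full HDR") plus 4 bytes of depth for the 1080p case. The tile count is a plain ceiling division that ignores any hardware alignment rules for tile sizes, and the function names are made up for the example.

```python
import math

EDRAM_BYTES = 10 * 1024 * 1024  # the Xbox 360's 10MB, taken here as 10 MiB


def render_target_bytes(width: int, height: int, samples: int,
                        color_bytes: int, depth_bytes: int) -> int:
    """Bytes needed for one multisampled color buffer plus its depth buffer."""
    return width * height * samples * (color_bytes + depth_bytes)


def tiles_needed(total_bytes: int, edram_bytes: int = EDRAM_BYTES) -> int:
    """Minimum number of tiles if each tile must fit in EDRAM (no alignment rules)."""
    return math.ceil(total_bytes / edram_bytes)


if __name__ == "__main__":
    configs = {
        # name: (width, height, samples, color bytes, depth bytes) -- assumed formats
        "720p, 2xAA, 32-bit color": (1280, 720, 2, 4, 4),
        "1080p, 4xMSAA, FP16 color": (1920, 1080, 4, 8, 4),
    }
    for name, cfg in configs.items():
        total = render_target_bytes(*cfg)
        mib = total / (1024 * 1024)
        print(f"{name}: {mib:.1f} MiB, {tiles_needed(total)} tiles of 10 MiB")
```

Under these assumptions the 720p case needs 14,745,600 bytes (two tiles), while the 1080p case needs 99,532,800 bytes, roughly 95MB, before counting any additional buffers for deferred rendering.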
