"In the beginning the Universe was created. This has made a lot of people very angry and been widely regarded as a bad move."
- Douglas Adams
More pages: 1 2
Performance
Wednesday, July 22, 2009 | Permalink

The key to getting good performance is properly measuring it. This is something that every game developer should know. Far from every academic figure understands this though. When a paper is published on a particular rendering techniques it's important to me as a game developer to get a good picture of what the cost of using this technique is. It bothers me that the performance part of the majority of techical papers are poorly done. Take for instance this paper on SSDO. This paper isn't necessarily worse than the rest, but just an example of common bad practice.

First of all, FPS is useless. Gamers like them, they often show up in benchmarks and so on, but game developers generally talk in milliseconds. Why? Because frames per second by definition includes everything in the frame, even things that are not core components of the rendering technique. For SSDO, which is a post-process technique, any time the GPU needs for buffer clears and main scene rendering etc is irrelevant. But it's part of the fps number, and you don't know how much of the fps number it is. Therefore a claimed overhead of 2.4% of SSDO vs. SSAO, for instance, is meaningless. The SSDO part may actually be twice as costly as SSAO, but large overhead in other parts of the scene rendering could hide this fact. For instance if the GPU needs 12 milliseconds to complete main scene rendering, and 0.5ms to complete SSAO, that'll give you 12.5ms in total for a framerate of 80 fps. If SSDO increases the cost to 1ms, which would then be twice that of SSAO, the total frame runs in 13ms for a framerate of 77 fps. 4% higher cost doesn't seem bad, but it's 0.5ms vs. 1ms that's relevant to me, which would be twice as expensive and thus paint a totally different picture of the cost/quality of it versus SSAO. Not that these numbers are likely to be true for SSDO though. This paper was kind enough to mention a "pure rendering" number (124fps at 1600x1200), which is rare to see, so at least in this can we can reverse engineer some actual useful numbers from the useless fps table, although only for the 1600x1200 resolution.

So main scene rendering is 1000/124 = 8.1ms, which is definitely not an insignificant number.
With SSAO it's 1000/24.8ms = 40.3ms
With SSDO it's 1000/24.2ms = 41.3ms

So the cost of SSAO is 40.3ms - 8.1ms = 33.2ms, and for SSDO it's 41.3ms - 8.1ms = 34.2ms
Therefore, SSDO is actually 3% more costly than SSAO, rather than 2.4%. Still pretty decent, if you don't consider that their SSAO implementation obviously is much slower than any SSAO implementation ever used in any game. 33.2ms means it doesn't matter how fast the rest of the scene is, you'll still never go above 30fps. Clearly not useful in games.

Another reason that FPS is a poor measure is that it skews the perception of what's expensive and not. If you have a game running at 20fps and you add an additional 1ms worth of work, you're now running at 19.6fps, which may not look like you really affected performance all that much. If you have a game running at 100fps and add the exact same 1ms workload you're now running at 90.9fps, a substantially larger decrease in performance both in absolute fps numbers and as a percentage.

Percentage numbers btw are also useless generally speaking. When viewed in isolation to compare two techniques and you compare the actual time cost of that technique only, it can be a valid comparison. Like how many percent costlier SSDO is vs. SSAO (when eliminating the rest of the rendering cost from the equation). But when comparing optimizations or the cost of adding new features it's misleading. Assume you have a game running at 50fps, which is a frame time of 20ms. You find an optimization that eliminates 4ms. You're now running at 16ms (62.5fps). The framerate increased 25%, and the frame time went down 20%. Already there it's a bit unclear what number to quote. Gamers would probably quote 25% and game developers -20% (unless the game developers wants to brag, which is not entirely an unlikely scenario ). Now lets say the next day another developer finds another optimization that shaves off another 4ms. The game now runs at 12ms (83.3fps). The framerate went up 33% and the frame time was reduced by 25%. By looking at the numbers it would appear as the second developer did a better optimization. However, they are both equally good since they shaved off the same amount of work, and had the second developer done his optimization first he would instead have gotten the lower score.

Another thing to note is that it's never valid to average framerate numbers. If you render three frames at 4ms, 4ms and 100ms you get an average framerate of (1000/4 + 1000/4 + 1000/100) / 3 = 170fps. If you render at 10ms, 10ms and 10ms you get an average framerate of 1000/10 = 100fps. Not only does the latter complete in just 30ms versus 108ms, but it also did so hitch free, yet gets a lower average framerate.

The only really useful number is time taken. Framerate can be mentioned as a convenience, but any performance chapter in technical papers should write their results in milliseconds (or microseconds if you so prefer). There are really no excuses for using fps. These days there are excellent tools for measuring performance, so if the researcher would just run his little app through PIX he could separate the relevant cost from everything else and provide the reader with accurate numbers.

Name

Comment

Enter the code below



Humus
Saturday, July 25, 2009

Overlord, I've implemented profiling support in Framework4. In OpenGL I'm using GL_EXT_timer_query and timestamp queries in DirectX. I guess it could still be useful to make a demo of it. I'll add that to my ever expanding TODO list.

Rob L.
Sunday, July 26, 2009

Hi,

I just checked my card (s.b.) and it doesn't seem to support GL_EXT_timer_query. However, it supports GL_AMD_performance_monitor, which at the first glace over the description looks like it has some more functionality.

My card's OpenGL driver info:
ATI Radeon HD 4850
OpenGL version: 2.1.8673
CCC version: June 2009

What do you guys think about that? Do you have any tips and tricks to share?

Regards,

Rob

More pages: 1 2