New dynamic branching demo
Thursday, July 1, 2004

Here's a demo that does dynamic branching, without the need for pixel shader 3.0, and still receives the huge performance boost that pixel shader 3.0 dynamic branching supposedly gives when utilized in a similar fashion.


Since it seems people have lost the ability to understand smilies, I removed the first line of this post. I couldn't have imagined that people would feel so seriously offended by it, and I certainly didn't foresee the amount of crap I would receive for it.

2004-07-04: Did a minor update to the demo to work around the performance drop issue on nVidia cards. The demo now lets you choose between doing a full stencil clear and simply zeroing the stencil for surviving fragments. The former method seems to be required for this technique to see any performance gains at all on nVidia hardware, while both methods run fast on ATI cards, the latter at a higher speed though. To get maximum performance, the demo defaults to zeroing on ATI and to a full stencil clear on everything else. I don't know if that assumption holds true for other vendors though.
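
A minimal sketch of how such a default could be picked, not the demo's actual code. It assumes the vendor string comes from the graphics API (for example `glGetString(GL_VENDOR)` under OpenGL) and that ATI drivers report a string beginning with "ATI". The enum and function names are hypothetical.

```cpp
#include <cassert>
#include <string_view>

// How the stencil buffer is reset between passes (hypothetical names).
enum class StencilReset {
    FullClear,     // clear the entire stencil buffer
    ZeroSurvivors  // zero the stencil only for fragments that survive the test
};

// Default described in the update: zeroing on ATI, full clear for everyone else.
// Assumption: ATI's vendor string starts with "ATI". A substring search would be
// wrong here, since "NVIDIA Corporation" contains "ati" when case is ignored.
StencilReset defaultStencilReset(std::string_view vendor) {
    const bool isATI = vendor.substr(0, 3) == "ATI";
    return isATI ? StencilReset::ZeroSurvivors : StencilReset::FullClear;
}

// Human-readable label, e.g. for an options menu.
const char* toString(StencilReset mode) {
    switch (mode) {
        case StencilReset::FullClear:     return "full stencil clear";
        case StencilReset::ZeroSurvivors: return "zero surviving fragments";
    }
    assert(false && "unhandled StencilReset value");
    return "unknown";
}
```

Since the right choice for other vendors is unknown, the safer option (full clear) is the fallback, and the user can still override it.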




Saturday, July 10, 2004

Yes, I use the latest version of your program (with full stencil clear disabled), DirectX 9.0c and Detonator 61.21. You wrote that there is no performance gain on nVidia hardware when full stencil clear is on. I think you mean it is neither faster nor slower. This is also the case for me (20 fps when dynamic branching is on compared to 21 when off). So it should be a lot faster (probably 40+ fps) on those other nVidia cards because of the saved shader power.

Getting this to work would be very important for me because I am currently experimenting with a soft shadows algorithm based on your latest SelfShadowBump demo, which could get nearly as fast as vanilla shadow mapping (right now it's only half the fps).

Saturday, July 10, 2004

There are never any performance guarantees whatsoever, regardless of what you do, so that's not even an argument.

Have you tried with full stencil clear enabled? On nVidia hardware you must use full stencil clear for dynamic branching to be faster.

Monday, July 12, 2004

Yes, but the performance was about as weak (21 fps or so). I have also tried different Detonator versions and DirectX 9.0b. Can you tell me what the fps are (under the different combinations) on the nVidia hardware you have tested?

Tuesday, July 13, 2004

I haven't tested any cards myself, but I've heard reports of speedups similar to those on ATI cards. I'm not sure whether this is also true for the 5x00 series though.

Tuesday, July 13, 2004


Be careful. I've seen people explicitly relying on features that are not classified as generally supported, or are even undocumented. Early stencil test is only one of them.

Tuesday, July 13, 2004

This demo doesn't work on a GeForce 6800 Ultra.

Wednesday, July 14, 2004

The output is the same regardless, so who cares? It's just a performance option. There's no guarantee that hardware supports early z rejection either, but developers still do a depth-only pass. Same thing.
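
A toy CPU-side model of that argument, not code from the demo: the stencil mask decides which pixels get written, and early rejection only changes how often the expensive shader runs. All names here are hypothetical, and `expensiveShade` is just a stand-in for a costly lighting shader.

```cpp
#include <cstddef>
#include <vector>

struct PassResult {
    std::vector<int> color;      // final per-pixel output
    std::size_t shaderRuns = 0;  // how many times the expensive shader executed
};

// Stand-in for an expensive lighting shader (hypothetical).
int expensiveShade(std::size_t pixel) {
    return static_cast<int>(pixel) * 2 + 1;
}

// Shade only the pixels whose stencil value is non-zero.
// earlyReject = true:  the test runs before the shader, so masked pixels cost nothing.
// earlyReject = false: the shader runs first and masked results are thrown away.
PassResult lightingPass(const std::vector<unsigned char>& stencil, bool earlyReject) {
    PassResult result;
    result.color.assign(stencil.size(), 0);
    for (std::size_t i = 0; i < stencil.size(); ++i) {
        const bool pass = stencil[i] != 0;
        if (earlyReject && !pass) {
            continue;  // rejected before any shading work is done
        }
        const int shaded = expensiveShade(i);
        ++result.shaderRuns;
        if (pass) {
            result.color[i] = shaded;  // late test: only passing pixels are written
        }
    }
    return result;
}
```

Running `lightingPass` on the same mask with `earlyReject` set to `true` and to `false` gives identical `color` vectors; only `shaderRuns` differs. Hardware without early rejection loses the speedup, not correctness.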

Sunday, August 1, 2004

I thought you might be interested: I gave this technique a try in my own code... And the results were fairly interesting...

I mostly use a Q3 bsp loader to test the lighting system... The lighting itself is fairly simple, per-pixel with specular, with very heavily optimized stencil shadows. No fancy things like parallax.

I was expecting a measurable speed boost. I only managed this in one case...

In chiroptera, I managed to get a boost of around 15% when standing in a random corner looking at a wall with my spotlight.. but this was the only case.
In every other place in the map, I'd get a slight performance hit...

On a small simple map with no normal lights (just spotlight) I could break even, but not see a boost..

And on my mega-test, running nv15, the extra triangles required really hurt things.. Standing at the far corner, with nearly 300 lights on screen, with stencil shadows, the polygon count was around 2 million for the scene... Using this technique it jumps by another half million triangles, which naturally cut the frame rate by about a third (Radeon 9500 Pro, 640x480: ~6 fps goes down to ~4 fps).

So I'm wondering, in this demo, for each light do you simply render the entire scene again? I can understand this giving a noticeable performance boost, but it would seem that in a real-world case, where there is already a lot of culling going on, the extra triangles simply shift the bottleneck...

If I were to pull out my old parallax shaders (which used 5 layers), then there might be a noticeable difference.. I just don't know if I still have them, since it's been over a year since I scrapped them for being too slow...


Still, it's a nice demo.

have fun.
