Framework 3 (Last updated: October 11, 2016)
Framework 2 (Last updated: October 8, 2006)
Framework (Last updated: October 8, 2006)
Libraries (Last updated: September 16, 2004)
Really old framework (Last updated: September 16, 2004)
Dynamic branching 3
Tuesday, April 25, 2006 | Permalink
DynamicBranching3.zip (1.1 MB)
This demo illustrates the benefit of dynamic branching in a per-pixel lighting scenario. On the F1 dialog you can toggle dynamic branching, as well as shadows and single/multi pass. Branching is used in several places in the shader. It skips past instructions if the pixel is outside the light range, is backfacing the light, or is in shadow.
This demo should run on Radeon 9500 and up and GeForce FX and up. Only cards supporting dynamic branching will see a performance benefit from enabling this path, that is, X1300 and up and possibly also the GeForce 6 series.
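The early-outs described above can be illustrated on the CPU. This is a minimal Python sketch, not the demo's shader: the function name `shade_pixel`, the linear attenuation, the single diffuse term and the boolean `in_shadow` input are all assumptions made for illustration. It returns the light contribution plus a flag telling whether the expensive lighting math actually ran.

```python
import math

def shade_pixel(pos, normal, light_pos, light_radius, in_shadow, branching=True):
    """Light one pixel with one point light.

    Returns (intensity, did_full_work). All names and the lighting model
    are illustrative assumptions, not taken from the demo's shader.
    """
    to_light = [l - p for l, p in zip(light_pos, pos)]
    dist_sq = sum(c * c for c in to_light)
    n_dot_l = sum(n * c for n, c in zip(normal, to_light))  # unnormalized

    if branching:
        # Each test skips the remaining instructions for this pixel.
        if dist_sq >= light_radius * light_radius:
            return 0.0, False  # outside the light range
        if n_dot_l <= 0.0:
            return 0.0, False  # backfacing the light
        if in_shadow:
            return 0.0, False  # in shadow

    # "Expensive" path: attenuation, diffuse term and shadow mask.
    dist = math.sqrt(dist_sq)
    atten = max(0.0, 1.0 - dist / light_radius)
    diffuse = max(0.0, n_dot_l / dist) if dist > 0.0 else 0.0
    shadow = 0.0 if in_shadow else 1.0
    return atten * diffuse * shadow, True
```

With `branching=False` the result is the same, but every pixel pays for the full path, which is the cost the branching path avoids.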
Monday, April 3, 2006 | Permalink
Hair.zip (925 KB)
Pixel shader 2.0
This demo illustrates a hair simulation using R2VB. The hair consists of a large number of strands. Each strand is a line in a render target, and each node in the strand is represented by a pixel in the render target. The simulation is similar to a typical cloth simulation, except springs only connect in one direction, along the strands. The first node of every strand (the leftmost column of pixels) is locked to the moving balloon head. The rest of the hair will follow it wherever the head is bouncing. Collision is computed against the head and the floor.
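A CPU-side sketch of one strand may help picture the simulation. The text does not say which integration scheme the demo uses, so Verlet integration with constraint relaxation is an assumption here, as are the name `step_strand`, the 2D positions and all parameters. Collision against the head and the floor is left out.

```python
import math

def step_strand(pos, prev, root, rest_len, gravity=(0.0, -9.8),
                dt=1.0 / 60.0, iterations=8):
    """Advance one 2D strand by one time step.

    pos/prev are lists of (x, y) for the current and previous frame.
    Returns (new_pos, new_prev). Verlet integration and constraint
    relaxation are assumptions; the demo's actual scheme is not described.
    """
    new = [root]  # node 0 is locked to the head
    for p, q in zip(pos[1:], prev[1:]):
        new.append(tuple(2.0 * a - b + g * dt * dt
                         for a, b, g in zip(p, q, gravity)))

    # Springs connect only along the strand: node i-1 <-> node i.
    for _ in range(iterations):
        for i in range(1, len(new)):
            ax, ay = new[i - 1]
            bx, by = new[i]
            dx, dy = bx - ax, by - ay
            d = math.hypot(dx, dy)
            if d == 0.0:
                continue
            corr = (d - rest_len) / d
            if i == 1:
                # The root cannot move, so the child takes the full correction.
                new[i] = (bx - dx * corr, by - dy * corr)
            else:
                new[i - 1] = (ax + 0.5 * dx * corr, ay + 0.5 * dy * corr)
                new[i] = (bx - 0.5 * dx * corr, by - 0.5 * dy * corr)
    return new, list(pos)
```

In the demo each node is a pixel and each strand a line in the render target, so the loop over nodes corresponds to what a pixel shader would do for every pixel in parallel.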
Currently the demo draws the hair as lines. This has the effect that the hair will look thinner in bigger resolutions or when the head is close up. An easy fix for that would have been to scale the line width as appropriate. Unfortunately, D3D doesn't expose any way to draw wide lines. With additional work a shader could have expanded the lines into a triangle strip with adjustable width, but I was too lazy to do that.
This demo should run on Radeon 9500 and up.
Infinite Terrain II
Thursday, March 9, 2006 | Permalink
InfiniteTerrainII.zip (939 KB)
Required: R2VB or VTF, Pixel shader 2.0
Recommended: R2VB
A bit over three years ago I made the Infinite Terrain demo, which generated new terrain on the CPU as needed and uploaded it to a vertex buffer. With R2VB there's now a possibility to generate terrain on the GPU, and that's what this demo does. First a pixel shader generates a noise function representing the height into a texture. This texture is then sampled in a second pass to generate the normals. The normal and height are written to an RGBA16F render target which is later used as a vertex buffer. If half floats are not supported in the vertex declaration RGBA32F is used instead, and if R2VB is not supported VTF is used instead.
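The two passes can be sketched on the CPU as follows. This is only an illustration under stated assumptions: `height_fn` stands in for the noise pixel shader, the normals are derived with central differences (the text only says the height texture is sampled in a second pass), and `build_vertices` and its parameters are hypothetical names. The output tuples mirror the normal plus height layout written to the render target.

```python
import math

def build_vertices(height_fn, x0, z0, size, spacing=1.0):
    """Return size*size tuples (nx, ny, nz, height), row by row.

    height_fn(x, z) stands in for the noise pixel shader; the central
    difference used for the normals is an assumption.
    """
    # Pass 1: evaluate heights, with a one-texel border for the differences.
    h = [[height_fn(x0 + i * spacing, z0 + j * spacing)
          for i in range(-1, size + 1)]
         for j in range(-1, size + 1)]

    # Pass 2: sample the height "texture" to derive a normal per vertex.
    verts = []
    for j in range(1, size + 1):
        for i in range(1, size + 1):
            dx = h[j][i + 1] - h[j][i - 1]
            dz = h[j + 1][i] - h[j - 1][i]
            # Normal of y = h(x, z) is (-dh/dx, 1, -dh/dz), scaled by 2*spacing.
            nx, ny, nz = -dx, 2.0 * spacing, -dz
            inv = 1.0 / math.sqrt(nx * nx + ny * ny + nz * nz)
            verts.append((nx * inv, ny * inv, nz * inv, h[j][i]))
    return verts
```

Because the height function depends only on world position, new terrain can be generated for any `(x0, z0)` origin without storing anything between calls.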
This demo should run on Radeon 9500 and up and GeForce 6 series and up.
Monday, February 6, 2006 | Permalink
In the debate of MSAA vs SSAA most people have now accepted multisampling as the victor, at least in relation to the performance impact. Still, there are a few who love supersampling for its higher quality and its ability to reduce aliasing not just on the edges but also inside the surfaces, whether it occurs due to a badly applied texture filter or high frequency components in shader output. Some have argued that in the era of shaders, and the aliasing that some of them bring, supersampling will make a comeback. I'll give them half a point for that; however, the future still belongs to multisampling when it comes to driver/hardware side antialiasing. Global supersampling is just too expensive to be an option in most cases. That doesn't mean supersampling will never be applied. In fact, I believe it will be used more frequently in the future, but it will be implemented on the application side, and probably directly in the shader. This is what this demo shows.
The advantage of supersampling in the shader is that it gives the developer fine-grained control over where to apply it, to what degree and what sample positions to use, rather than just providing a global switch. In this demo there's one somewhat aliasing-prone bumpmap on the floor. The aliasing shows up in the specular lighting, which is a fairly common scenario. So the app supersamples this particular material and nothing more. The walls are not supersampled, neither is the skybox and certainly not the GUI. Furthermore, the shader only supersamples the specular lighting. The diffuse lighting does not have a problem with aliasing for this bumpmap, so it's not supersampled, nor is the lightmap, base material and so on. Additionally, dynamic branching can be used to shave off even more of the work.
This fine-grained selection of course means the performance loss is significantly less than with regular supersampling. A driver side 4 sample implementation would normally run at around 1/4 of the original speed or even less, and it uses a good deal more memory for the framebuffer. This application side supersampling doesn't use any extra memory and is able to keep up with 77% of the original speed on my system.
How does it work? It's implemented using gradients. Remember that the gradient returns the difference between neighboring fragments in a fragment quad. The texture coordinate plus the gradient in x would land on the texture coordinate for the neighboring pixel to the right of it if it's one of the left column pixels in the quad. The math doesn't match perfectly in the other direction of course, but the discrepancy is normally not a concern. Multiplying the gradient with sample positions in the [-0.5, 0.5] range gives you texture coordinates for the samples, which you can then input to any function to compute whatever you want to supersample.
For the dynamic branching I compute the gradient of the center sample's specular. If there's a big change in the specular value between neighboring pixels the supersampling will kick in, otherwise it will make do with just the center sample.
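The gradient-based sample placement and the branch on the specular gradient can be sketched on the CPU like this. It is an illustration under assumptions: `f` is any function of the texture coordinate (standing in for the specular computation), the gradients `ddx` and `ddy` are passed in explicitly since there is no fragment quad on the CPU, and the names, the threshold test and the rotated-grid pattern are hypothetical choices rather than the demo's.

```python
def supersample(f, uv, ddx, ddy, offsets, threshold=None):
    """Average f over sub-pixel sample positions around uv.

    f(u, v) is whatever is being supersampled (here: a specular term).
    ddx/ddy are the texture coordinate gradients, i.e. the change in uv
    to the next pixel in x and y. Returns (value, samples_taken), where
    samples_taken counts only the evaluations charged to this pixel.
    """
    center = f(*uv)
    samples_taken = 1
    if threshold is not None:
        # Emulates taking the gradient of the center sample's specular.
        # On the GPU the neighbors' values already exist within the quad.
        right = f(uv[0] + ddx[0], uv[1] + ddx[1])
        down = f(uv[0] + ddy[0], uv[1] + ddy[1])
        if abs(right - center) + abs(down - center) < threshold:
            return center, samples_taken  # the center sample will do
    total = 0.0
    for ox, oy in offsets:  # offsets lie in the [-0.5, 0.5] range
        u = uv[0] + ox * ddx[0] + oy * ddy[0]
        v = uv[1] + ox * ddx[1] + oy * ddy[1]
        total += f(u, v)
        samples_taken += 1
    return total / len(offsets), samples_taken

# A 4-sample rotated-grid pattern (an illustrative choice).
ROTATED_GRID_4 = [(-0.375, -0.125), (0.125, -0.375),
                  (0.375, 0.125), (-0.125, 0.375)]
```

Passing `threshold=None` corresponds to running without dynamic branching: every pixel takes all the samples regardless of how smooth the specular term is.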
This demo should run on Radeon X1300 and up and GeForce 6 series and up.
Use the 1-6 keys to select the amount of supersampling. The view wobbles a bit by default to show the aliasing a bit more clearly. To disable this, press 0. To toggle the use of dynamic branching, use the 9 key.
Monday, December 12, 2005 | Permalink
GameEngine2.zip (5.8 MB)
GameEngine2.7z (5.0 MB)
This demo shows a fairly simple way to handle large game worlds. The main problem with large game worlds, as opposed to the regular tech demos I normally do, is that you have way too much data to throw it all on the GPU. Drawing everything is not an option.
There are loads of different techniques out there to handle this: portals, BSPs, PVS etc. In this demo I chose to go with an exceptionally simple technique, namely to split the scene into axis aligned cubes. Then in a preprocessing step I compute, for each cube, what other cubes it can see. Only points in open space are considered. The preprocessing step is quite expensive. It took about an hour and a half to preprocess the scene in this demo. On the other hand, it's extremely simple to evaluate at runtime and makes checking visibility for dynamic objects easy. Just check what cube(s) it's in and check whether that cube is visible from the cube of the camera. It's a conservative check which could include some hidden objects of course, but that's true for most techniques. I don't know if this technique generally performs better or worse than other techniques, but it works well enough, and it has the most important attribute: it's scalable. It's not the size of the scene that matters, but the amount of simultaneously visible lights and surfaces. The cubes also allow you to cut down on the amount of drawn surfaces for a certain light. Only the cubes that are within light radius (or light frustum for spotlights) and are visible from both the light and the camera need to be drawn.
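The runtime side of the cube scheme is small enough to sketch. Everything here is hypothetical naming: `visible_from` is assumed to be the preprocessed table mapping each cube to the set of cubes it can see (including itself), cubes are identified by integer grid coordinates, and only point lights with a radius are handled.

```python
import math

def cube_of(point, cube_size):
    """Integer (x, y, z) coordinates of the cube containing a point."""
    return tuple(int(math.floor(c / cube_size)) for c in point)

def object_visible(visible_from, camera_cube, object_cubes):
    """Conservative test: visible if any cube the object touches can be seen."""
    return any(c in visible_from[camera_cube] for c in object_cubes)

def cube_in_light_range(cube, cube_size, light_pos, radius):
    """True if the cube's box comes within `radius` of the light position."""
    dist_sq = 0.0
    for idx, p in zip(cube, light_pos):
        lo, hi = idx * cube_size, (idx + 1) * cube_size
        nearest = min(max(p, lo), hi)  # closest point of the box on this axis
        dist_sq += (p - nearest) ** 2
    return dist_sq <= radius * radius

def cubes_for_light(visible_from, camera_cube, light_pos, radius, cube_size):
    """Cubes to draw for a light: seen by camera and light, and in range."""
    light_cube = cube_of(light_pos, cube_size)
    both = visible_from[camera_cube] & visible_from[light_cube]
    return {c for c in both
            if cube_in_light_range(c, cube_size, light_pos, radius)}
```

All the expensive work lives in building `visible_from` offline, so the per-frame cost is a few set lookups per object and per light.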
Each cube also stores the maximum visible distance. This way the Z range can be as tightly packed as possible by using a far plane only as distant as needed. The advantage of this is that it improves Z precision, but perhaps more importantly, improves efficiency of HyperZ. Speaking of HyperZ, by drawing the cubes in smaller to larger distance order, we get a very good front-to-back draw order in the initial Z-pass, which gives a healthy performance increase.
There are more optimizations possible using only these cube-to-cube visibility checks, and not everything is implemented. You may find a few TODOs in the source, which I didn't bother to finish for this release. Maybe I'll add them later on.
The disadvantage of this technique is that with conventional draw calls you either have to produce an index array dynamically or split things into a lot of draw calls even for the same light/material combination. Fortunately, OpenGL has a glMultiDrawElementsEXT call that can render a range of subsets of the index buffer in a single call. This way the number of draw calls normally sticks to 80-150.
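To make the multi-draw idea concrete, here is a sketch of how the per-call arrays might be assembled. It makes no GL calls; it only builds the list of counts and the list of byte offsets that a call like glMultiDrawElementsEXT takes. The layout is an assumption: `ranges` is a hypothetical table giving, for one light/material combination, the (first index, index count) of each cube's slice of the index buffer. Merging adjacent slices is an extra not mentioned in the text.

```python
def build_multidraw(ranges, visible_cubes, index_size=4):
    """Build (counts, byte_offsets) for the visible cubes.

    ranges maps cube -> (first_index, index_count) within one shared
    index buffer. index_size is the size of one index in bytes.
    """
    runs = []
    cubes = [c for c in visible_cubes if c in ranges]
    for cube in sorted(cubes, key=lambda c: ranges[c][0]):
        first, count = ranges[cube]
        if count == 0:
            continue  # this cube has no surfaces for this material
        if runs and runs[-1][0] + runs[-1][1] == first:
            runs[-1][1] += count  # contiguous with the previous run: merge
        else:
            runs.append([first, count])
    counts = [count for _, count in runs]
    byte_offsets = [first * index_size for first, _ in runs]
    return counts, byte_offsets
```

The two lists have one entry per run, so a whole set of visible cubes for a light/material combination goes out in a single call instead of one call per cube.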
This demo should run on Radeon 9500 and up, and GeForce FX 5200 and up. You need to have OpenAL installed to run this demo. You can get it here
Note: The first time you run this demo there will be about 10s delay (depending on your CPU speed) as it precomputes some data (I didn't want to bloat the download). For all runs after that it will use cached results and start much quicker.
Monday, September 5, 2005 | Permalink
HDR.zip (2.9 MB)
HDR.7z (2.4 MB)
This is an HDR (high dynamic range) rendering demo, complete with the mandatory butter-on-my-glasses blur effect.
The main scene is first rendered to an RGBA16F texture. For the blur effect it's downsampled and converted to a fixed point format. This texture is then blurred in several steps, with each step sampling at twice the size of the previous step. For the HDR assets, which are actually only the skybox, an RGBE format is used. The RGB components are stored as DXT1 and E as L16. This cuts down the storage space (and download size!) and memory bandwidth requirement to less than 1/3 of the cost of RGBA16F with no perceivable loss of quality. This also allows for filtering all the way back to R300.
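The "less than 1/3" figure can be checked with a little arithmetic. The sketch below assumes the E texture has the same resolution as the RGB texture, which the text does not state; the per-format sizes are the standard ones (DXT1 packs a 4x4 block into 64 bits, L16 is one 16-bit channel, RGBA16F is four 16-bit channels).

```python
def rgbe_bits_per_pixel():
    """Bits per pixel for RGB as DXT1 plus E as L16 at equal resolution."""
    dxt1 = 64 / 16  # a 4x4 block in 64 bits -> 4 bits per pixel
    l16 = 16        # one 16-bit luminance channel holding E
    return dxt1 + l16

RGBA16F_BITS_PER_PIXEL = 4 * 16  # four half-float channels

ratio = rgbe_bits_per_pixel() / RGBA16F_BITS_PER_PIXEL
print(ratio)  # 0.3125
```

So 20 bits per pixel against 64, a ratio of 0.3125, which is indeed just under 1/3.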
This demo should run on Radeon 9500 and up and GeForce 6800 and up.
Alpha to coverage
Thursday, June 23, 2005 | Permalink
AlphaToCoverage.zip (921 KB)
One of the weaknesses of multisampling compared to supersampling is that it doesn't work too well with alpha testing, a technique that unfortunately many games still use as a replacement for real geometry. The effect is that the edges created by alpha testing aren't antialiased. The proper solution is of course to alpha blend, but that means the transparent or masked objects need to be sorted in back-to-front order, which can be costly and inconvenient. But there's another solution that doesn't need depth sorting and properly antialiases alpha masked surfaces, namely alpha-to-coverage. This works by sampling the alpha and interpreting it as how much of the pixel it covers; the result is then dithered and distributed to an appropriate number of multisample samples. So if you're using 6x multisampling and the incoming fragment's alpha is 0.5 it will be deemed to cover three samples, which will then receive the fragment data. When the multisample buffer is resolved this means it will be blended with the background, which will have been written to the remaining samples. It is a bit of a hack but actually works very well in practice. In fact, it often works better than supersampling, since it's using the alpha value directly rather than checking against a number of thresholded alpha values, and thus doesn't have the flicker and discontinuity problem that often occurs even with supersampling when the texture is minified a couple of mipmap levels.
When magnifying the texture it results in blurrier edges though, which is also the case with alpha blending. To solve that problem this demo also implements a technique that boosts the alpha contrast around 0.5 when the texture is magnified so that the [0, 1] range of alpha values spans the width of a pixel. To figure out how much the texture is magnified another texture is looked up with a texture coordinate that's multiplied with the size of the base texture. Each mipmap level contains the size of that mipmap level. So if the texel of the base texture is 20 pixels wide in screenspace, contrast is boosted 20x. This makes the edges as sharp as with alpha testing, but they look properly antialiased.
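The numbers in this description can be worked through in a short sketch. It is an approximation under assumptions: real hardware dithers the coverage, which is ignored here, and the contrast boost is written as a linear scale around 0.5 followed by a clamp, which matches the description but is not taken from the demo's shader. All function names are hypothetical.

```python
def covered_samples(alpha, num_samples):
    """Number of multisample samples a fragment with this alpha covers."""
    return max(0, min(num_samples, round(alpha * num_samples)))

def resolve(fragment, background, covered, num_samples):
    """Resolved pixel value when `covered` samples hold the fragment."""
    return (covered * fragment
            + (num_samples - covered) * background) / num_samples

def boost_alpha(alpha, magnification):
    """Scale alpha contrast around 0.5 by the magnification factor.

    magnification is the width of one base texel in screen pixels;
    values below 1 (minification) leave alpha untouched.
    """
    scale = max(1.0, magnification)
    return max(0.0, min(1.0, (alpha - 0.5) * scale + 0.5))

print(covered_samples(0.5, 6))    # 3
print(resolve(1.0, 0.0, 3, 6))    # 0.5
```

With a texel 20 pixels wide, an alpha of 0.52 is boosted to roughly 0.9, so the full [0, 1] transition happens within about one pixel instead of being smeared across twenty.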
To compare the results to alpha testing you can toggle between the two methods on the F1 menu.
Sunday, June 19, 2005 | Permalink
HSRBench.zip (201 KB)
This is a small benchmarking utility to show the benefits of HyperZ and similar techniques. It's basically a replacement for the HSR test in my old "GL_EXT_reme" benchmark, and the simple test is pretty much the same. I got a report though that GL_EXT_reme didn't work on some newer nVidia drivers, and I can't debug it without one of their cards, so I thought it might be a good time to write a new one. The HSR test was basically the only test that I still find interesting though, and the simple texturing isn't the most interesting thing these days, so I added a complex shader mode too that runs a fairly long shader. This is where the benefits of early Z culling techniques really show through, so I thought it would be interesting to show that too.
I also added a number of configuration options, so you can select what draw mode to run and the amount of overdraw. Results are appended to the result.log file in the app's directory.
And of course, the disclaimer: I am an employee of ATI and I'm not trying to hide that fact. Keep that in mind and take this benchmark for what it's worth. I believe it to be a good and valid synthetic benchmark though, and haven't even tested it on any other cards than my own, so I don't even know which IHV will come out ahead in it. It's actually more interesting for comparing a card against itself through the different tests to see the efficiency of early Z culling hardware, rather than comparing IHVs against each other. Like most of my other work, it's open source, so you can judge it that way too.
This is also the first application I've released that's based on a new framework I've been working on for a while. So you can see there's a new "Framework 3" link.