Humus - News

More pages: 1 ... 4 5 6 7 8 9 10 11 12 13 14 ... 21 ... 31 ... 41 ... 48

New gallery
Saturday, September 4, 2010 | Permalink

These are pictures from our trip to Tenerife, where I engaged my fiance.

[ 2 comments | Last comment by Humus (2010-09-06 14:49:48) ]

More D3D v-table hacking
Friday, August 6, 2010 | Permalink

The trick in the previous post is as implied by its title and noted in the comments not going to give much return on the time invested in terms of performance, although it might give you a deeper understanding of the underlying mechanisms and costs involved in calling virtual functions. So it's more like an academic exercise, but nonetheless a cool thing to play around with.

It's also a prerequisite to the v-table hack I'm going to present in this blog entry, which on the other hand actually has some practical use.

Sometimes it's desirable to get a callback from the device when certain calls are made. Given that virtual calls are dispatched dynamically with the v-table it's possible to reroute the call somewhere else with a little hacking. Normally the v-table is located in read-only memory and thus will not change. And once a class has been instantiated its v-table pointer should not change for the life-time of the object, unless you have a nasty memory overwrite bug, or just hacking around like we're going to do.

While we can't modify the v-table contents we can easily overwrite the v-table pointer and set up our own custom v-table. The original v-table is readable, so the first thing we want to do is just copy the original v-table to our new one.

void **v_table = *(void ***) ctx;
// Copy v-table
memcpy(new_vtable, v_table, sizeof(new_vtable));
// Replace v-table pointer
*(void **) ctx = new_vtable;

The new_vtable would be declared like this:

void *new_vtable[115];

This array will of course have to be kept for as long as you keep the device context object alive, so a good choice is to put it next to the device context pointer, or for quick hacks, just make it static. Don't put it on the stack, it won't work. The number 115 is the number of virtual functions in ID3D11DeviceContext. This number could be found by simply counting them in the header, including all virtual functions inherited from base classes. Or the easy way, just find the last function in the header, make a call to that and place a breakpoint on it. When you hit the breakpoint, just switch to disassembly view and check what offset into the v-table it's using. For ID3D11DeviceContext that's FinishCommandList(), which you'll find it's at offset 0x1C8, leading to an index of 0x1C8 / sizeof(void *) which is 114, hence the v-table needs 115 elements.

If you run this code everything should just run as usual with no noticable change since we replace the v-table with an identical one. Now let's say we want to count how many DrawIndexed() calls we're making every frame. We could then just override the DrawIndexed() function with our own substitute and point the corresponding v_table entry to it.

uint g_DrawCount = 0;
void STDMETHODCALLTYPE DrawIndexedCallback(ID3D11DeviceContext *ctx, UINT IndexCount, UINT StartIndexLocation, INT BaseVertexLocation)
{
    ++g_DrawCount;
    MyDrawIndexed(ctx, IndexCount, StartIndexLocation, BaseVertexLocation);
    //ctx->DrawIndexed(IndexCount, StartIndexLocation, BaseVertexLocation);
}

...

new_vtable[12] = &DrawIndexedCallback;

Now every call made to DrawIndexed() will end up in DrawIndexedCallback instead. After incrementing our counter it would be nice to call the original DrawIndexed() function as well, so that things are actually drawn too. Notice that I'm using the function pointer approach from the previous blog post to call the original function. Why am I not using the code that's commented away? It would intuitively seem like a good idea, but actually since we rerouted the call, that would now call right back to DrawIndexedCallback() instead of the original code, so we'll be recursing infinitely, leading to a stack overflow in Debug and a hang in Release builds where the compiler is smart enough to realize it doesn't need to fetch and push arguments back on the stack but can just jump to DrawIndexed() directly.

So why use this instead of just using proper API design and hide the gory details of D3D calls under a thin interface? Just make your own DrawIndexed function which does what this DrawIndexedCallback() does and simply call that instead, right? Absolutely. I would most certainly recommend that over this hack whenever possible. However, what if you're using a third party library which is making calls on the D3D device directly? Using this trick you can track or even alter the behavior of the third party library. In fact, during the development of Just Cause 2 there was one instance where I had to resort to this trick. Using PIX I had concluded that a third party library was probably not working optimally, but I wanted to be sure that I'm not making any false assumptions, the library could after all be smarter than you think. So I overrided a few D3D functions to alter the behavior and could conclude that I was in fact right. After this we of course communicated with the third party and they were very helpful and the problem was promptly resolved. Naturally none of the v-table overriding code was left in the shipping product or even entered into source control, but it was nice to have that backdoor to override the library and prove my theories right.

I recommend this trick strictly as a development hack. I would argue against shipping any such code in a final product. The main reason is that while it'll probably work, there are no guarantees. While the size of the v-table is known for ID3D11DeviceContext, it's not known for any deriving classes that the D3D runtime is not exposing directly. They could in fact have more virtual functions than the base class, and theorethically those could be called from any of it's functions it shares with ID3D11DeviceContext, and such a call would crash with our replaced v-table since it would just grab whatever random pointer happens to be in the memory after the custom v-table. In practice, I've had no such problems with D3D device contexts though, but there's nothing saying that it wouldn't break when Windows 8 or 9 ships with a brand new implementation of the runtime components. To play it "safe" one could of course just make the custom v-table bigger and copy more stuff into it. Make it sufficiently large, like 256, and it's probably safe for the entire lifetime of the DX11 API including all future OSes' runtime implementations, but personally I would avoid shipping such code to the public.

[ 5 comments | Last comment by r2d2Proton (2010-08-20 00:05:57) ]

How to cut your D3D call cost by a tiny immeasurable fraction
Wednesday, August 4, 2010 | Permalink

One difference between D3D and OpenGL is that the former is using an object-oriented API. All API calls are virtual rather than being plain C calls like in OpenGL. The main advantage of this is of course flexibility. The runtime can easily provide many different implementations and hand you back any one depending on your device creation call parameters. The obvious example of that would be the debug and retail runtime. I suppose the D3DCREATE_PUREDEVICE in DX9 also handed you a different implemention than the standard functions. It's of course faster to have a D3D runtime function that's trimmed down rather than have the same function and look at IsDebug and IsPure booleans. The disadvantage of having virtual functions is that dispatching virtual function calls comes with a bit of overhead.

One thing to note though is that once you've looked up the actual address for a virtual function, the actual function call is no different than calling a non-virtual function. In fact, the only thing different from a plain C function or static member function is that you pass the this pointer as well. Consider the following D3D11 call:

virtual void STDMETHODCALLTYPE DrawIndexed(UINT IndexCount, UINT StartIndexLocation, INT BaseVertexLocation);

We can declare the equivalent C-style function pointer type like this:
typedef void (STDMETHODCALLTYPE *DrawIndexedFunc)(ID3D11DeviceContext *ctx, UINT IndexCount, UINT StartIndexLocation, INT BaseVertexLocation);

Then we can create a function pointer like so:

DrawIndexedFunc MyDrawIndexed;

And the call to ID3D11DeviceContext:: DrawIndexed(...) can be done with MyDrawIndexed(context, ...) provided that MyDrawIndexed has been loaded with the correct function pointer. Very straight-forward. So how do we find the function pointer? Virtual functions are looked up through a v-table, which is essentially a list of function pointers for all the virtual functions in the class. When a class which has virtual functions is created a pointer to a static v-table will be stored in the object. The C++ standard doesn't require any particular memory layout, or even require a v-table at all to solve the virtual function dispatch problem, so this code will be highly unportable. But if you're coding DirectX you're only building for Windows anyway and chances are you're using the MSVC compiler. In that case the v-table pointer will be the very first member of the class. Other compilers may do it differently. From what I gather it's common for Unix compilers to put the v-table pointer at the end of the class instead.

Given a ID3D11DeviceContext pointer, let's call it "ctx", the first thing we need to do it grab its v-table:

void **v_table = *(void ***) ctx;

This somewhat cryptic code basically just grabs the first 4 bytes (8 bytes on x64) out of the memory ctx points to, which will be the v_table pointer. Now we just need to know which entry in the table represents DrawIndexed. The hard way is to look in the the D3D headers. DrawIndexed is the 6th member declared in ID3D11DeviceContext, however it also inherits from ID3D11DeviceChild which has 4 virtual functions and which in turn inherits from IUnknown which has 3. So it's the 13th function, or should be at the index 12. So we can find the pointer like this:

DrawIndexedFunc MyDrawIndexed = (DrawIndexedFunc) (v_table[12]);

The easy way to figure this out is to just set a breakpoint at a regular DrawIndexed call and switch to disassembly view to see what code the compiler generated. It could for instance look like this:

mov eax, dword ptr [esi]
mov ecx, dword ptr [eax]
mov edx, dword ptr [ecx+30h]
push 0
push 0
push 1Eh
push eax
call edx
mov eax, dword ptr [esi]

Here esi points to the class which holds "ctx". So first it grabs the ctx pointer, then grabs the v-table from it and on third line looks up the function address at offset 0x30 in the v-table. 0x30 / sizeof(void *) is 12, so there's your index. The following three lines pushes the arguments to the function on the stack in reverse order, and then the this pointer. The this pointer, which in this case is "ctx", was fetched to eax on the first line.

Now what happens if we make this call through MyDrawIndexed? Well, this:

mov ecx, dword ptr [esi]
push 0
push 0
push 1Eh
push ecx
call dword ptr [MyDrawIndexed]

That's two instructions less. Woot!

Also note that the first call was daisy chaining the fetches. Two of those indirections were removed. It should be noted however that for this to work, the MyDrawIndexed variable must either be a static member function or a global variable. In other words, its address should be resolvable at compile time. If you only have one device context this should be no problem. If you are using multiple contexts, for instance for threaded rendering, you may not want to rely on both contexts having the same function pointers in its v-tables, although this is likely to be true if they were created with the same parameters. You could in that case simply store the function pointer next to the device context in whatever encapsulating class you have, like my "Context" class I referred to earlier. While this is not as optimal, it still cuts down some work:

mov ecx,dword ptr [esi]
mov edx,dword ptr [esi+4]
push 0
push 0
push 1Eh
push ecx
call edx

This is one instruction longer, although still one shorter than the initial code. The most important thing though is that this code still only has one level of indirection, whereas the original one has three.

I should also mention that C++ has some fancy syntax for pointers to C++ member functions. The underlying mechanism for how those work is somewhat different from how standard C functions work. However, using a static or global function pointer the actual code generated with that is the same as with a regular function pointer. If you put it next to "ctx" though it will generate one more instruction, or the same as the original virtual call. It's still only one level of indirection though, so it's still better. The actual function pointer appears to have a sizeof() of 16. I don't know what the unused bytes are for, only the last 8 are actually used in the call. The advantage of using C++ function pointers though is that you can assign to it by name instead of figuring out the v_table index, so it creates somewhat prettier code.

typedef void (STDMETHODCALLTYPE ID3D11DeviceContext::*DrawIndexedFunc)(UINT IndexCount, UINT StartIndexLocation, INT BaseVertexLocation);

DrawIndexedFunc MyDrawIndexed = &ID3D11DeviceContext:: DrawIndexed;

And then the fancy calling syntax:

(ctx->*MyDrawIndexed)(...);

So what does all this messing around actually gain you? Performance-wise probably somewhere between infinitesimal and nothing. The number of cycles spent inside the DrawIndexed call probably far outweights any slight gain in calling it. In fact, if you set a breakpoint and step inside the function you will find that you're stepping over a quite large number of instructions before you return. You'll also notice that DrawIndexed in turn calls a few other virtual functions under the hood. If anything, you gain insight into the underlying mechanisms of virtual function calls. Plus of course that messing with v-tables is a lot of fun.

[ 10 comments | Last comment by Humus (2010-08-06 19:13:00) ]

Latest Steam survey
Sunday, August 1, 2010 | Permalink

Highlights this month:
- Windows 7 64bit is now the most popular OS. Windows XP's decline is accelerating. All 32bit versions dropped and all 64bit versions increased their share, even WinXP 64.
- DX11 GPU adoption is accelerating, although still only 9.26% vs. 74.16% for DX10. AMD continue to dominate the DX11 market.

[ 0 comments ]

Yet another gallery
Sunday, August 1, 2010 | Permalink

More pictures! I present you with the Just Cause 2 release party, pics from Riga, and more.

[ 0 comments ]

OpenGL 4.1
Friday, July 30, 2010 | Permalink

I'm a few days late on this news, but earlier this week OpenGL 4.1 was released. There was a time when I had doubts in the future of OpenGL, but in the last couple of years the API has really made a lot of progress, so much so that I sometimes feel I can't keep up with it. I would say that by now it's back to where it used to be, namely a competent alternative to DirectX that's also cross-platform. Some feature missing perhaps, but at least as often there are bonus features not available in DirectX.

[ 6 comments | Last comment by Micke (2010-11-17 17:14:42) ]

Another gallery
Wednesday, July 28, 2010 | Permalink

Trying to catch up with time here, so I've uploaded another gallery of old pictures. With this I'm only about 5 months behind the times.

The time covered by this gallery includes when I met my fiance, so I think she's in like half the pictures

, but what the heck, she's pretty so why not?

I'll probably add another couple of galleries soon to get in sync with the times.

[ 2 comments | Last comment by Josef (2010-07-28 21:40:24) ]

New gallery
Sunday, July 25, 2010 | Permalink

I've uploaded a "new" photo gallery. I've been lagging with my photo updates, so the pictures are only about a year old, from last summer. But better late than never.

[ 0 comments ]

More pages: 1 ... 4 5 6 7 8 9 10 11 12 13 14 ... 21 ... 31 ... 41 ... 48


General
	News 3D Pictures Textures Articles Cool stuff
Other
	FAQ About Humus
Other 3D sites
	OpenGL.org GameDev.net Beyond3D Rage3D
Programming info
	OpenGL extension registry NeHe