"A fool and his money get a lot of publicity."
- Al Bernstein
More pages: 1 2
Whoha, I found a compiler bug
Thursday, February 18, 2010 | Permalink

I gave Visual Studio 2010 RC a few quick runs and while doing so I noticed that it doesn't always generate optimal code with the StringHash code I mentioned recently. Often it behaves just like you expect and outputs the constant directly, but it happens that it generates some clearly suboptimal results. This is a call to Context::SetConstantBuffer():

mov eax, dword ptr [ebx+9D0h]
push eax
push ecx
mov eax, esp
mov dword ptr [eax], 0
mov dword ptr [eax], 54h
mov dword ptr [eax], 541514h
mov dword ptr [eax], 29C53055h
mov dword ptr [eax], 77DBE55Eh
mov dword ptr [eax], 647B726Bh
mov dword ptr [eax], 2CCC28C8h
mov dword ptr [eax], 2F060985h
mov dword ptr [eax], 9C015834h
mov dword ptr [eax], 0BC88B513h
mov dword ptr [eax], 1AB79019h
mov dword ptr [eax], 23457696h
mov dword ptr [eax], 24AE2F4Ch
mov dword ptr [eax], 3629A415h
mov dword ptr [eax], 0F8546197h
mov dword ptr [eax], 7E5B047Ch
mov dword ptr [eax], 1CE21AF8h
mov dword ptr [eax], 369CA37Ah
mov dword ptr [eax], 14063B6Fh
mov dword ptr [eax], 28F7A0BFh
push esi
mov dword ptr [eax], 0B5AF8F68h
call Context::SetConstantBuffer (404750h)

Clearly there's a lot of superfluous writes that should have been optimized away. At first I thought this was a new bug in Visual Studio 2010, so I tried it in Visual Studio 2008 and it did the same thing. It's unclear why this happens, or what triggers this behavior, but I just noticed it happening in a certain location in my program.

So I rewrote the StringHash() constructor in this way, which appears to fix the problem:

StringHash(const char (&str)[4])
{
    uint32 hash0 = 0;
    uint32 hash1 = hash0 * 65599 + str[0];
    uint32 hash2 = hash1 * 65599 + str[1];
    m_Hash = hash2 * 65599 + str[2];
}

Name

Comment

Enter the code below



FMoreira
Thursday, February 18, 2010

that assembly code results from a release build?

I've been doing some performances tests using both C++ and Assembly (SIMD) and some times the compiler ends up doing something closer to what you've just described. when I compile in release mode it generates correct code!
I also noticed that 64bits that the code that VS2010 RC generates is much more faster than the 32bits code. Even more faster than hand-tuned assembly...

these conclusions were all taken from some performances tests that I did with 4x4 matrices operations.

FMoreira
Thursday, February 18, 2010

"I also noticed that 64bits that the code that VS2010 RC generates is much more faster than the 32bits code."

should be read: I also noticed that the 64bits code that VS2010 RC generates is much more faster than the 32bits code.

Arseny Kapoulkine
Friday, February 19, 2010

First rule of member function optimization: never access (r/w) your class members in an inner loop unless you must. This is a hidden indirection, therefore it contains implicit aliasing and god knows what else. In some situations like this, compiler can produce good code, but the probability of failure is higher.

You can change your ctor to operate as before, but store all results in a local, i.e.

uint32 hash = 0;
hash = hash * 65599 + str[0];
... etc
m_Hash = hash;

It will likely solve the problem.

Martin Vilcans
Friday, February 19, 2010

At least it doesn't generate invalid code, which has happened to me at least once in Visual C++ 2.0. (Way back.)

Humus
Friday, February 19, 2010

FMoreira: Yes, that's a Release build. Debug builds generate function calls building the StringHash class and everything. That 64bit is faster could very well be due to double the amount of registers available, so far fewer reads/writes to the stack. I did some SIMD optimizations at work which dramatically improved our performance, which mostly was about keeping things in registers and avoid the stack.

Arseny Kapoulkine: Yes, that's true. Although I would have expected that the compiler would do a final pass over the generated code and eliminate sequential writes to the same address.
Using a single local variable worked too, so I'll use that instead since it looks better (and is less prone to copy-paste errors).

Martin Vilcans: Yeah. It was just a coincidence that I noticed this, because the code was working. I was looking at other stuff when I suddenly saw that chunk of code above the code I was really looking at.

Magnus F.
Saturday, February 20, 2010

Hi,
will the new framework4 abstract OpenGL 3.x(2) and D3D11? If so, i would really like a look at your abstraction, because i am working on a similar project right now and i always get the "if i want to have this (very rare) feature, how can i abstract it for both" illness .

keep up the excellent work!
M.

Keldor
Tuesday, February 23, 2010

That looks a lot like the code OpenCL generates for my kernels... :-P

Anonymous
Friday, March 5, 2010

Did you report it to Microsoft? For VS2010, they seem to be very responsive to bug reports, and even if this particular one might not get fixed in the release, I'm pretty sure they would try to fix it in the SP1.

More pages: 1 2