"Be not overcome with evil, but overcome evil with good."
- Romans 12:21
More pages: 1 2 3
MetaBalls demo updated
Wednesday, October 19, 2005 | Permalink

I recently got myself a new dual-core Athlon64 3800+, so naturally I had to try taking advantage of the extra processing power. Most of my demos are more GPU-limited than CPU-limited, but MetaBalls is an exception to that rule, so I added threading to it to improve performance. Not everything could be parallelized though, so the gain is fairly moderate, about 15-20%. Another reason for the moderate increase is that the bottleneck apparently shifted from computation to cache/memory. The gain is larger, about 25%, when running the slower FPU path instead of the 3DNow path.
This CPU also supports SSE3, so naturally I threw in an SSE3 path as well. 3DNow is still a tiny bit faster though.
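
As a rough illustration of the kind of threading described above, here is a minimal sketch in modern C++ that splits the evaluation of a metaball field grid across worker threads by z-slice. It is not the demo's actual code: the `Ball` struct, `fieldAt`, `evalGrid`, the r²/d² field function, the grid layout, and the use of `std::thread` are all assumptions made for the example.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

struct Ball { float x, y, z, r; };  // hypothetical layout: center and radius

// Field value at one point: sum of r^2 / distance^2 over all balls.
// The small epsilon avoids a division by zero at a ball's center.
float fieldAt(const std::vector<Ball>& balls, float x, float y, float z) {
    float sum = 0.0f;
    for (const Ball& b : balls) {
        float dx = x - b.x, dy = y - b.y, dz = z - b.z;
        sum += (b.r * b.r) / (dx * dx + dy * dy + dz * dz + 1e-6f);
    }
    return sum;
}

// Fill an n*n*n grid over [0,1)^3, giving each thread a contiguous
// range of z-slices so no two threads write to the same cell.
std::vector<float> evalGrid(const std::vector<Ball>& balls, int n, int numThreads) {
    std::vector<float> grid(static_cast<std::size_t>(n) * n * n);
    auto worker = [&](int z0, int z1) {
        for (int z = z0; z < z1; ++z)
            for (int y = 0; y < n; ++y)
                for (int x = 0; x < n; ++x)
                    grid[(static_cast<std::size_t>(z) * n + y) * n + x] =
                        fieldAt(balls, x / float(n), y / float(n), z / float(n));
    };
    std::vector<std::thread> pool;
    for (int t = 0; t < numThreads; ++t)
        pool.emplace_back(worker, n * t / numThreads, n * (t + 1) / numThreads);
    for (std::thread& th : pool) th.join();
    return grid;
}
```

Only the grid evaluation is parallel here. Anything that has to run serially afterwards (for example, building the mesh from the grid) limits the overall speedup, which fits the moderate gains reported above.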

Björn
Thursday, October 20, 2005

Hey nice! Congratulations on your new CPU! I want one of those myself *drool*

Anyhow, has this anything to say on my "now-outdated-underpar-performing" A64 3400+?

Cheers!

Anonymous
Friday, October 21, 2005

"Anyhow, has this anything to say on my 'now-outdated-underpar-performing' A64 3400+?"
-Björn

Your A64 3400+ is an AMD Sempron, not an Athlon 64. It was never on par, so it can't have changed to underpar status.

Sunray
Friday, October 21, 2005

The coolest would be to compute them entirely on the GPU (via raytracing).

Looks like this: http://jmb.mine.nu/~kma/x/metaballs_xvid.avi

You know who!
Friday, October 21, 2005

Hello hello! I will write in English this time, hope you understand. The thing is that this message is to Humus, if you didn't know?! :-P

Whooohooo!!!
Maybe I will go and see Howard Jones in Umeå!!!!!
Hug to you Humus and take care of yourself.

N

Nitro
Friday, October 21, 2005

Very interesting piece of code. I get about a 13% speed increase on my 2.8 GHz P4 HT. But what's weird is that the FPU path is faster than SSE (117 fps to 104 with 1 thread and 132/117 with 2 threads). Is the new VS2005 optimising that much? I recompiled the demo on VS2003 and the results are rather slower. 1 thread FPU/SSE: 54/97, 2 threads even slower: 43/93. Does anybody know why that is?
Btw the old demo ran 80 fps on FPU and 115 on SSE.

Humus
Friday, October 21, 2005

Hello to N! Så jo kän spik inglich?

Nitro,
yeah, I was a bit surprised how much VC2005 could optimize the FPU path. There's a new compiler option where you can set FPU to "fast", which ignores some of the IEEE rules and so on, much like -ffast-math in GCC. It boosted performance quite a bit. I also made some algorithmic improvements. I now bake in the radius in a preprocessing step, and I also refactored the math so that the division is deferred to a single division at the end. I didn't try doing the same in the other paths, but they have fast RCP instructions anyway, so I don't think it's going to improve performance a lot, if at all.
It was interesting to see it improve performance almost as much on HT too. I tried it on my work laptop and saw a pretty decent increase there too. But that further suggests that cache/memory is the bottleneck, rather than computation power.
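
One way to read the two algorithmic changes mentioned above, sketched in C++ under stated assumptions: the field is taken to be a sum of r²/d² terms, "baking in the radius" is taken to mean precomputing r² once per ball, and "deferring the division" is taken to mean carrying the running sum as a numerator/denominator pair so only one division happens per sample. The names `Ball`, `BakedBall`, `bake`, `fieldNaive` and `fieldDeferred` are hypothetical, and the demo's actual formula may differ.

```cpp
#include <cstddef>
#include <vector>

struct Ball { float x, y, z, r; };        // hypothetical input layout
struct BakedBall { float x, y, z, r2; };  // r2 = r*r, computed once up front

// Preprocessing step: square each radius so the inner loop never has to.
std::vector<BakedBall> bake(const std::vector<Ball>& balls) {
    std::vector<BakedBall> out;
    out.reserve(balls.size());
    for (const Ball& b : balls) out.push_back({b.x, b.y, b.z, b.r * b.r});
    return out;
}

// Reference version: one multiply for r*r and one division per ball.
float fieldNaive(const std::vector<Ball>& balls, float x, float y, float z) {
    float sum = 0.0f;
    for (const Ball& b : balls) {
        float dx = x - b.x, dy = y - b.y, dz = z - b.z;
        sum += (b.r * b.r) / (dx * dx + dy * dy + dz * dz);
    }
    return sum;
}

// Deferred version: keep the sum as num/den and use
//   num/den + r2/d2 = (num*d2 + r2*den) / (den*d2),
// so the loop contains only multiplies and adds.
float fieldDeferred(const std::vector<BakedBall>& balls, float x, float y, float z) {
    float num = 0.0f, den = 1.0f;
    for (const BakedBall& b : balls) {
        float dx = x - b.x, dy = y - b.y, dz = z - b.z;
        float d2 = dx * dx + dy * dy + dz * dz;
        num = num * d2 + b.r2 * den;
        den *= d2;
    }
    return num / den;  // the single division
}
```

The denominator is a product of squared distances, so with many balls it can overflow or underflow in single precision. A real implementation would need to keep the ball count small or rescale along the way.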

Nitro
Saturday, October 22, 2005

Thanks for the explanation. I had all optimizations turned off when I recompiled it, so now it runs at 75 fps on FPU and 113 on SSE (1 thread), and 122 on SSE (2 threads). FPU with 2 threads is still slower, at 53 fps. Maybe it's just because I only have HT. I need a new CPU.
Btw Humus, I wonder how fast the demo runs on your new Athlon64.

eXile
Saturday, October 22, 2005

Nice to see that, Humus! It seems that dual-core processors are the future... but they are too expensive for me right now.

Will there be more indoor demos anytime soon? Perhaps radiosity or realtime ambient occlusion could be a nice challenge for you.
