"The hardest thing to understand in the world is the income tax."
- Albert Einstein
Vacation
Monday, July 5, 2010 | Permalink

Tonight my fiancée and I will get on the plane for China, so for the next two weeks this blog is likely to be silent. In the meantime, check out my article "Making it large, beautiful, fast and consistent – Lessons learned developing Just Cause 2" in GPU Pro. I just got my copy today, so I guess I'll have some reading on the flight.

[ 5 comments | Last comment by Humus (2010-07-23 17:57:09) ]

GPU vs. CPU
Wednesday, June 23, 2010 | Permalink

Via Rage3D I found this Nvidia blog post, which I found somewhat amusing. Although after a brief look at the actual paper I give Intel a bit more credit than Nvidia's spin on it does. Still, even then, the Intel paper concludes that the previous generation GPU is 2.5x faster on average.

Anyway, I find the GPU vs. CPU war not so interesting, because my prediction is that we still need to have both paradigms around. No model is going to "win", so I don't think Intel needs to be so defensive, nor do I believe in Nvidia's prediction that "the piece of hardware that runs sequential code will shrink to a tiny dot swimming in an ocean of ALUs" (I forgot the exact wording, but something like that). I don't believe in Nvidia's prediction because of Amdahl's law. At least when speaking of games, there will always be some sort of critical path through the game update code where each step needs input from previous steps. So just slapping on more cores will not make things much faster, and switching to the Larrabee model for CPUs is likely to make things slower even if you get an order of magnitude more raw throughput power. I believe the model for future CPUs is something like what the PS3 has, with one main CPU and 6 smaller throughput oriented SPUs. Even in the future we will need at least one, but preferably two or three cores optimized for quickly crunching through sequential code. Then a larger number of tiny throughput oriented cores next to it for parallel but fairly independent tasks. Then the GPU for graphics and a number of other embarrassingly parallel tasks. I don't think the GPU and CPU will meet anytime soon, although with more and more programmable GPUs and then stuff like Fusion I could imagine that the GPU and the SPUs might merge at some point, but I'm not convinced of that yet.
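
To put rough numbers on the Amdahl's law argument, here is a small sketch. The function name is made up for this example, and the 75% parallel fraction used below is an illustrative assumption, not a measurement from any game.

```cpp
// Amdahl's law: overall speedup on `cores` cores when only a fraction
// `parallelFraction` of the work (0..1) can be spread across them.
// The remaining (1 - parallelFraction) is the sequential critical path.
double AmdahlSpeedup(double parallelFraction, int cores)
{
    const double sequentialPart = 1.0 - parallelFraction;
    const double parallelPart   = parallelFraction / cores;
    return 1.0 / (sequentialPart + parallelPart);
}
```

With an assumed 75% of the frame being parallelizable, AmdahlSpeedup(0.75, 4) is about 2.3x, and no number of cores can push it past 4x, because the sequential quarter of the work never gets any shorter. That is why a few cores that are fast at sequential code keep mattering.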

[ 11 comments | Last comment by Nuninho1980 (2010-08-04 13:39:03) ]

Dealing with uninitialized memory
Friday, June 18, 2010 | Permalink

A common source of nondeterministic bugs that are hard to find is uninitialized variables. In a local scope this is typically detected by the compiler, but for member variables in a class you're typically on your own. You forget to initialize a bool and in 255 cases out of 256 you get 'true', so the code might appear to work if that's a reasonable initial value. Until your bool ends up at a memory address with a zero in that byte.

One way to deal with this problem is to simply initialize all memory to zero first thing in the constructor. Unfortunately, that adds a runtime cost and code size cost. I figured someone must have attacked this problem before, and I'm sure someone did, but googling on it I haven't found much about it. So I came up with my own approach, which I'll be adding to Framework4. Basically it's a small class and a set of macros to simplify stuff. In my class definition I just tag the range of variables I want to check, typically all of them.

class MyClass
{
public:
    MyClass();
    ...
private:
    CLASS_BEGIN()
    // Variables go here
    CLASS_END()
};

This inserts a start and end tag into the class, which are basically two uint32.
In the constructor I do this:

MyClass::MyClass() : CLASS_INIT()
{
    ...
}

CLASS_INIT() will initialize all memory between the tags to 0xBAADC0DE, before any other initialization happens.

At some point where I expect everything to be set up, for instance at the end of the constructor or maybe after some call to Init() or whatever, I simply add CLASS_VERIFY(), which will check that no memory is left as 0xBAADC0DE. It will also check the start and end tags to make sure they have not been touched, which detects common out-of-range writes as well.

Adding this to my simple test app I found a whole bunch of variables I didn't initialize. Most of them were simply unused though. I can only imagine what kind of bugs could be avoided if something like this were added to the main system classes in a large-scale project. And best of all, this comes at no extra cost at runtime because in final builds those macros are defined to nothing.
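
A minimal sketch of how macros like these could be put together, covering only the checking build. This is not the Framework4 code: only the four macro names come from the post, while BeginTag, EndTag, VerifyRange, the m_beginTag/m_endTag members, the tag values and the two example classes are all hypothetical. The fill value is written 0xBAADC0DE, with a zero, so that it is a valid hex literal. The trick leans on how typical compilers lay out members rather than on anything the C++ standard guarantees, and a 32-bit word made up entirely of padding would be reported as a false positive.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical names and values; only the CLASS_* macro names are from the post.
static const uint32_t kFillPattern = 0xBAADC0DE; // marks memory nobody has written yet
static const uint32_t kBeginTag    = 0xBE61BE61; // arbitrary guard values
static const uint32_t kEndTag      = 0xE4D0E4D0;

// Trivial type, so default-initialization leaves the value set by BeginTag alone.
struct EndTag { uint32_t value; };

struct BeginTag
{
    uint32_t value;

    // Members are constructed in declaration order, so this runs before any
    // variable declared between CLASS_BEGIN() and CLASS_END().
    explicit BeginTag(EndTag* end) : value(kBeginTag)
    {
        unsigned char* p    = reinterpret_cast<unsigned char*>(this + 1);
        unsigned char* last = reinterpret_cast<unsigned char*>(end);
        for (; p + sizeof(kFillPattern) <= last; p += sizeof(kFillPattern))
            std::memcpy(p, &kFillPattern, sizeof(kFillPattern));
        end->value = kEndTag;
    }
};

// Returns true if both tags are intact and no 32-bit word in between
// still holds the fill pattern.
inline bool VerifyRange(const BeginTag& begin, const EndTag& end)
{
    if (begin.value != kBeginTag || end.value != kEndTag)
        return false; // a tag was overwritten, likely an out-of-range write

    const unsigned char* p    = reinterpret_cast<const unsigned char*>(&begin + 1);
    const unsigned char* last = reinterpret_cast<const unsigned char*>(&end);
    for (; p + sizeof(uint32_t) <= last; p += sizeof(uint32_t))
    {
        uint32_t word;
        std::memcpy(&word, p, sizeof(word)); // memcpy avoids aliasing problems
        if (word == kFillPattern)
            return false; // never written since CLASS_INIT()
    }
    return true;
}

#define CLASS_BEGIN()  BeginTag m_beginTag;
#define CLASS_END()    EndTag   m_endTag;
#define CLASS_INIT()   m_beginTag(&m_endTag)
#define CLASS_VERIFY() VerifyRange(m_beginTag, m_endTag)

// Example class that forgets to initialize m_speed.
class Forgetful
{
public:
    Forgetful() : CLASS_INIT(), m_health(100) {}
    bool Verify() const { return CLASS_VERIFY(); }
private:
    CLASS_BEGIN()
    int   m_health;
    float m_speed;
    CLASS_END()
};

// Example class that initializes everything between the tags.
class Careful
{
public:
    Careful() : CLASS_INIT(), m_health(100), m_speed(1.5f) {}
    bool Verify() const { return CLASS_VERIFY(); }
private:
    CLASS_BEGIN()
    int   m_health;
    float m_speed;
    CLASS_END()
};
```

In this sketch CLASS_VERIFY() returns a bool so that the caller decides what to do with it, for instance wrapping it in an assert at the end of the constructor. Forgetful().Verify() reports false because m_speed still holds the fill pattern, while Careful().Verify() reports true.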

[ 21 comments | Last comment by niko (2010-11-12 03:00:35) ]

New DirectX SDK
Wednesday, June 9, 2010 | Permalink

The June 2010 DirectX SDK has just been released. I haven't given it a try yet, but there appear to be a few nice things in it, though not so much big news. This release also drops MSVC 2005 support while adding MSVC 2010 support.

[ 1 comments | Last comment by fmoreira (2010-06-12 22:55:34) ]

Latest Steam hardware survey
Saturday, June 5, 2010 | Permalink

I've touched on the Steam hardware survey a couple of times before, mostly related to JC2. This month's survey has another very interesting piece of data. Steam has only been available for Mac less than a month, yet MacOS now represents 8.46% of the market. That's an enormous accomplishment for the Steam system to bring on so many Mac users in such a short time! The number of Mac users vs. PC users is already pretty much in balance with the current Mac vs. PC market in general. My next wish is for Valve to expand the system to Linux as well. I don't think a lot of AAA titles would come to Linux anytime soon, but a lot of indie developers would love to explore that market. It may not be large enough to be worth the effort for AAA titles, but for indie games it would certainly be. As someone once explained, being a big fish in a small pond is not so bad.

It's also worth noting that Windows XP keeps dropping. The drop looks a bit disproportionate because MacOS jumped in and stole 8.46% of the market, making all Windows versions drop their percentage, but the XP drop is large while other versions have modest drops. If we look at the Windows world only, XP now represents 36.5%, with Windows 7 beating it at 39.2% and Vista at 23.6%.

[ 0 comments ]

About crunch mode
Wednesday, June 2, 2010 | Permalink

When I was at ATI/AMD a couple of years ago I was working 7.5 hour days. During my more than three years there I can only remember working overtime once. This was due to an ISV needing urgent help with what they thought was a driver bug, just an hour or so before the end of the day on a Friday, and they needed it fixed before the weekend. Turned out it was their fault, but anyway, that was the only time I stayed in the office a couple of hours extra. When I decided to switch to the game industry I worried quite a lot about things like overtime, because you know, I want to have a life too, and the game industry has a really poor record in this area. One of the main reasons I joined Avalanche Studios in particular was because they have an expressed policy basically stating that "overtime is a failure of the management". During my time here I can say that overtime has been rare. Yes, it has happened a few times, in relation to important deliveries. But generally it has been in moderate amounts and has only lasted a couple of weeks at most, and with employees able to plan their overtime freely. As we get paid for the overtime work, or alternatively can swap it for vacation at a later time, and the company also provides overtime food in crunch time, it's not much of an issue when it does happen.

However, I keep hearing the situation is not as rosy in many other places. I believe the situation is better among Swedish developers than elsewhere, but I suppose that might have more to do with local laws than a better mentality. I think the situation has improved a lot though, especially after the famous "EA Spouse" story, but I still think there's an immaturity in the industry. Somehow game developers are perceived "from the top" as a bunch of 20 year old nerds that are all single and probably stay up all night anyway. Well, I'm 30. Many of my co-workers are married, have kids, and I'd say a majority are in a stable relationship with someone who might care if they can share dinner at home tonight.

I saw this article today named Why Crunch Mode Doesn't Work: 6 Lessons. I knew that productivity drops the longer people work. What I didn't know was that the common 40-hour work week was not some kind of arbitrary standard, but the result of research and studies done a century ago. Turns out this is the sweet spot for getting the highest output, at least for industrial jobs. Increase the day to 10 hours and you not only get less done per hour, but less in absolute terms. This raised a question though. Should a manager strive to get the highest possible output per employee, or the highest output per invested dollar? If the former, the 8-hour day is likely optimal, but if the latter, maybe going down to 7 hours or so would make more sense, because of increased productivity. Say you get perhaps 95% of the output at only 87.5% of the cost. Come to think of it, the 7.5-hour day at ATI was probably not just an arbitrary "bonus half hour free" to be nice to employees, but probably someone had a deeper thought there. Because this was only for office type employees, while people working in production had the typical 8-hour days. It was probably optimized for different variables by some clever dude.
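
The output-per-dollar reasoning above is simple arithmetic, sketched here under two assumptions: cost scales linearly with hours worked (which is where 87.5% for a 7-hour day comes from), and the 95% output figure is the guess from the text rather than data. The function name is made up for this example.

```cpp
// Output per unit of cost, relative to a baseline working day.
// relativeOutput: output as a fraction of what the baseline day yields.
// Assumes cost is proportional to hours worked.
double RelativeOutputPerCost(double relativeOutput, double hoursPerDay,
                             double baselineHours = 8.0)
{
    const double relativeCost = hoursPerDay / baselineHours; // 7 / 8 = 0.875
    return relativeOutput / relativeCost;
}
```

RelativeOutputPerCost(0.95, 7.0) comes out at roughly 1.086, so under these assumptions the 7-hour day would give about 8.6% more output per dollar, even though each employee produces 5% less in total.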

[ 5 comments | Last comment by Rob L. (2010-06-07 15:28:20) ]

EDRAM
Saturday, May 29, 2010 | Permalink

It's funny how sometimes as technology develops what was originally a good idea can become a bad one. EDRAM for video cards is such a thing. AFAIK no video card for the PC ever had it, but it's been used occasionally in some game consoles, most recently in the Xbox 360. However, I recall back when Bitboys were trying to enter the PC video card industry using EDRAM based designs. Unfortunately they never succeeded and none of their products ever saw the light of day. Had they been able to produce one at the time though, chances are it would have worked very well. However, how people render the frames today and back in the DX7 era is very different, which makes EDRAM far less ideal today. Back in the DX7 days you'd probably render to the backbuffer almost all the time, and given no shaders and as many ROPs as there were pipelines the biggest bottleneck was generally bandwidth to the backbuffer. That's hardly the case anymore. Today shaders are long, bottlenecks are usually ALU or texture fetches, and even if you end up being limited at the backend you're normally not bandwidth bound but ROP bound.

Having worked a fair amount with the Xbox 360 the last 2.5 years I find that EDRAM is mostly standing in the way, and rarely providing any benefit. Unfortunately, you can't just render "normally" to a buffer in memory if you so prefer, nope, you always have to render to the EDRAM. Once you're done with your rendering the results have to be resolved to the video memory. So even if we assume we are never ROP bound and EDRAM gets to shine with its awesome bandwidth, it really would not buy us much. Each render target operation is immediately followed by a resolve operation copying the data to video memory. During this copying phase the GPU is busy just copying rather than rendering. If the rendering was targeting a video memory buffer to begin with, those writes to memory would be nicely interleaved with the rendering work that the GPU does and no resolve would be necessary, so once the rendering is done all that's needed is to flush whatever data is residing in the backend cache to memory and you're done.

Sadly it's not just that it doesn't really provide as much of a benefit as it might look on paper, but it also alters the rendering model that we are all familiar with and adds a bunch of new restrictions. Because EDRAM is still quite expensive in hardware it's not something we get an awful lot of. The Xbox 360 has 10MB. But if you render to the typical 1280x720 resolution with 2xAA, that's 14MB needed for the color and depth buffer. So this is generally solved by "tiling", which means you render for instance to the top of the screen first, then resolve, and then the bottom, and resolve. The DirectX9 for Xbox helps out a bit here by letting you do this stuff quite automatically by entering a tiling section of the rendering, which is then submitted twice to the hardware, or however many times necessary depending on how many tiles are required for your render target configuration. Sounds fine huh? Well, until you want to squeeze in another render target operation somewhere in that stream. Say you want to apply some SSAO. You need the complete depth buffer for opaque stuff and then apply it before rendering transparent stuff. SSAO can be quite expensive, so instead of using oodles of samples you probably want to take a few, then blur the result to half-res, and then apply that. Well, to blur you need to switch render target, which breaks the model. In order for everything to work you need to first resolve everything, do your SSAO passes, then copy that back to EDRAM again and enter a new tiling section. This is of course way too costly so nobody bothers doing that kind of stuff, but instead just tries to live with the limitations imposed. So one may attempt to just resolve the depth buffer without switching render target and then apply SSAO in one pass. Unfortunately, not even this is ideal. The problem is that when the top tile enters this code only the top tile has been rendered, so the depth buffer texture it will use for the effect is incomplete. So when it samples the neighborhood around pixels close to the edge it will sample over the edge and get data from the previous frame. This often results in visible seams when in motion. The same goes for the backbuffer copies commonly used for refraction effects. In many games on the Xbox 360 you'll see visible seams when traveling on water for this reason.
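
The 14MB figure and the need for two tiles can be checked with a few lines. This is a rough sketch assuming a 32-bit color buffer and a 32-bit depth/stencil buffer, which matches the number quoted above; the function names are made up for this example, and real tile sizes are subject to hardware alignment rules that are ignored here.

```cpp
#include <cstdint>

// 10MB of EDRAM, as on the Xbox 360.
static const uint64_t kEdramBytes = 10ull * 1024 * 1024;

// Bytes needed for a multisampled color + depth render target pair.
uint64_t RenderTargetBytes(uint32_t width, uint32_t height, uint32_t samples,
                           uint32_t colorBytesPerSample, uint32_t depthBytesPerSample)
{
    return uint64_t(width) * height * samples *
           (colorBytesPerSample + depthBytesPerSample);
}

// Minimum number of tiles if each tile has to fit in EDRAM (rounds up).
uint32_t TilesNeeded(uint64_t targetBytes, uint64_t edramBytes)
{
    return uint32_t((targetBytes + edramBytes - 1) / edramBytes);
}
```

RenderTargetBytes(1280, 720, 2, 4, 4) is 14,745,600 bytes, or about 14MB, so TilesNeeded returns 2 with 10MB of EDRAM. Without AA the same target is about 7MB and fits in a single tile.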

For the next generation consoles chances are we want 1080p, full HDR, at least 4xMSAA, and many will probably want additional buffers for deferred rendering or other techniques. I don't think it will be possible to embed enough EDRAM to fit it all for many games, so if you're designing a future console now and are thinking of using EDRAM, please don't. Or at least let us render directly to memory. Or only let the EDRAM work as a large cache or something if you really want it.

[ 12 comments | Last comment by mark (2010-07-27 23:27:20) ]

Pac-man turns 30!
Friday, May 21, 2010 | Permalink

Happy anniversary Pac-man!

[ 3 comments | Last comment by fmoreira (2010-05-22 17:10:40) ]
