"Projects promoting programming in natural language are intrinsically doomed to fail."
- Edsger Dijkstra
More pages: 1 ... 11 ... 13 14 15 16 17 18 19 20 21 22 23 ... 31 ... 41 ... 48
Random google result
Saturday, March 28, 2009 | Permalink

Why hasn't Nvidia marketed this product more?

[ 4 comments | Last comment by ABC (2009-06-01 08:26:45) ]

OpenGL 3.1 released
Wednesday, March 25, 2009 | Permalink

Press Release

[ 5 comments | Last comment by Overlord (2009-03-27 20:31:16) ]

New DirectX SDK
Tuesday, March 24, 2009 | Permalink

The March 2009 release of the DirectX SDK is out.
Download here

[ 4 comments | Last comment by Humus (2009-03-25 21:33:49) ]

For dust thou art ...
Sunday, March 22, 2009 | Permalink

... and unto dust shalt thou return.

In loving memory (1GB) of my Radeon HD 3870x2 who passed away this afternoon.

After suffering experimental code, beta SDKs and many driver resets for a long time, it finally couldn't handle it anymore. It walked many code paths that no one walked before.

It was 1.2 years old.
Rest in peace.

[ 12 comments | Last comment by Humus (2009-04-04 17:42:02) ]

A couple of benchmarks
Saturday, March 21, 2009 | Permalink

So I put my SSE vector class to the test to see if it would give any actual performance improvement over the standard C++ implementation I've used in the past. I set up a test case with an array of 16 million random float4 vectors, which I multiplied with a matrix and stored into a result array of the same size.

First I tested the different implementations against each other: the code compiled to standard FPU code, then with MSVC's /arch:SSE2 option enabled, which makes the compiler emit SSE2 code instead of FPU code most of the time (although mostly scalar instructions), and finally my own implementation using SSE intrinsics. This is the time it took to complete the task:

FPU: 328ms
SSE2: 275ms
Intrinsics: 177ms

That's a decent performance gain. I figured there could be some additional gain from unrolling the loop and processing four vectors per iteration.

Unroll: 165ms

Quite a small gain, so I figured I'm probably memory bandwidth bound rather than computation limited. So I added prefetching and streaming stores just to see how that affected performance.

Prefetch: 164ms
Stream: 134ms
Prefetch + Stream: 128ms

Final code runs 2.56x faster than the original. Not too bad.

[ 6 comments | Last comment by Paul (2011-02-01 21:48:29) ]

Ubuntu rocks!
Thursday, March 19, 2009 | Permalink

It's been a while since I last tried Linux, for two reasons. The first is the OpenGL debacle, which has made this API much less interesting to me, and with all my demos written in DX10 there hasn't been much need to ever boot into Linux. The other reason is that for a long time AMD didn't provide any drivers for my particular GPU, a Radeon HD 3870x2. Oddly enough, the release notes for the Linux drivers still say that the HD 3870x2 is not supported, although it clearly worked for me now, so I suppose this is just a documentation error. How long it's been working I have no idea.

Anyway, so I wanted to try some stuff in Linux recently, primarily the SSE intrinsics. In the past I've used Gentoo, which is more of a power user distribution, but I couldn't be bothered to use that again since it's quite a lot of work to set it up, and besides, it isn't updated as much as it used to be these days. Ubuntu on the other hand seems to be the most trendy distribution now, so I decided to give it a try. I have to say I'm quite impressed. It gave me a quite "Windows-like" experience. First I installed it on my laptop. It installed without problems, and once I was in the OS it notified me about updates, which it downloaded and installed for me. Drivers for all my hardware were set up automatically and everything just worked out of the box. It even asked me whether it should install the proprietary drivers for my video card and set that up automatically for me as well. Once I got down to compiling stuff and needed libraries, I was able to find and install everything I needed with a few searches in Adept.

In the past I would have said that the big problem for Linux is that it's an OS for geeks made by geeks, and that for instance my mom, who has big enough trouble with Windows, would not be able to use it. And frankly, even if you're a geek, who wants to hack around in a bunch of config files anyway? I don't know how much of this is Ubuntu vs. Gentoo, or just the general progress of Linux in the last year or so, but after the experience I had I would now be comfortable recommending Linux to anyone. If you have never worked with either Windows or Linux before and you're starting entirely from scratch, I don't think it would be any harder to get started on Linux than on Windows.

Anyway, this was on my laptop. I installed it there primarily because I didn't think drivers for my HD 3870x2 existed, because that's what the driver release notes say anyway, whereas the Mobility HD 3650 should be fine. Given this positive experience I decided to give it a try on my desktop machine as well, especially since I didn't want to have to copy files back and forth between my computers. Just for convenience I decided to use Wubi, which basically is an Ubuntu installer you run from Windows. It really can't be simpler than that. Download an exe, double-click and off you go. Once it's done you have a fully working, complete Linux installation. Not a virtual machine like "Linux inside Windows" or anything like that, but a standard OS you boot into, except its file system is a large file on your Windows drive. And if you don't like it you can also uninstall it like any other application. With Wubi, any form of inconvenience or danger of changing the partitions on your hard drive is eliminated. There's really no excuse for not giving Linux a try anymore. So I gave it a shot, and it worked fine, and to my surprise it even installed drivers for my GPU, which I thought didn't exist. And they worked fine too. The Wubi installation seems to differ from the regular installation in a few ways though. I found I had to change a few settings here and there, like changing from single-click to double-click for opening files and so on. Also, the packages Adept knows about seem to differ. The standard installation had all the dev packages I needed, whereas I had to resort to typing apt-get on the command line to install some packages in the Wubi installation. I suppose Adept has some sort of index of packages it knows about and the Wubi installation only includes those a normal user would use, or something like that. I haven't used Ubuntu or Adept before, so I really don't know. Other than those minor annoyances, the Wubi installation worked really well too.

Two thumbs up for the Linux community.

[ 4 comments | Last comment by Humus (2009-03-21 00:03:14) ]

DPPS (or why don't they ever get SSE right?)
Monday, March 16, 2009 | Permalink

In my work on creating a new framework I've come to my vector class, so I decided to make use of SSE. I figure SSE3 is mainstream now, so that's what I'm going to use as the baseline, with optional SSE4 support in case I ever need the extra performance, enabled with a USE_SSE4 #define.

Now, SSE is an instruction set that was large to begin with and has grown a lot with every new revision:
SSE: 70 instructions
SSE2: 144 instructions
SSE3: 13 instructions
SSSE3: 32 instructions
SSE4: 54 instructions
SSE4a: 4 instructions
SSE5: 170 instructions (not in any CPUs on the market yet)

Why all these instructions? Well, perhaps because they can't seem to get things right from the start, so new instructions are needed to overcome old limitations. There are loads of very specialized instructions, while arguably very generic and useful instructions have long been missing. A dot product instruction should've been in the first SSE revision, or at the very least a horizontal add. We got that in SSE3, finally. Yay! Only 6 years after 3DNow had that feature. As the name would make you believe, 3DNow was very useful for anything related to 3D math already in its first revision, despite its limited instruction set of only 21 instructions (although to be fair it shared registers with MMX and thus didn't need to add new instructions for stuff that could already be done with MMX instructions).

So why this rant? Well, DPPS is an instruction that would at first make you think Intel finally got something really right about SSE. Maybe they had listened to a game developer for once. We finally have a dot product instruction. Yay! To their credit, it's more flexible than I ever expected such an instruction to be. But it disturbs me that, instead of making it perfect, they had to screw up one detail, which drastically reduces the usefulness of this instruction. The instruction comes with an immediate 8-bit operand, which is a 4-bit read mask and a 4-bit write mask. The read mask is done right. It selects what components to use in the dot product, so you can easily make a three or two component dot product, or even use XZ for instance for computing distance in the XZ plane. The write mask on the other hand is not really a write mask. Instead of simply selecting what components you want to write the result to, you select what components get the result, and the rest are set to zero. Why oh why, Intel? Why would I want to write zero to the remaining components? Couldn't you have let me preserve the values in the register instead? If I wanted them as zero I could have first cleared the register and then done the DPPS. Had the DPPS instruction had a real write mask, we could've implemented a matrix-vector multiplication in four instructions. Now I have to write to different registers and then merge them with OR operations, which in addition to wasting precious registers also adds up to 7 instructions in total instead of 4, which ironically is the same number of instructions you needed in SSE3 to do the same thing. Aaargghh!!!!

[ 9 comments | Last comment by JKL (2009-04-07 23:15:21) ]

OMG, Multi-Threading is Easier Than Networking
Sunday, March 15, 2009 | Permalink

A friend sent me a link to this Intel paper today:
OMG, Multi-Threading is Easier Than Networking

If you think the title is somewhat odd for a technical paper, wait until you read the actual paper. It contains cats. What more can I say?

Content-wise the paper is an excellent read for anyone who wants to get started with multi-threaded programming. I have to say that the lack of academic mumbo jumbo is really refreshing, which alone made the paper well worth the time to read, even though it didn't contain a lot of unique information. I wish more papers were written like this: short and to the point, with lots of info that's instantly useful, unlike a lot of the academic papers that seem to be written primarily with the intent of making the author look good.

[ 1 comment | Last comment by Jackis (2009-03-16 12:05:30) ]
