"I don't see the logic of rejecting data just because they seem incredible."
- Fred Hoyle

Rules of optimization
Monday, July 2, 2018

So I tweeted a thing that got retweeted, liked and commented on a bit more than my usual stuff, so I thought I should expand a bit on my thoughts on it.

Rules of optimization:
1) Design for performance from day 1
2) Profile often
3) Be vigilant on performance regressions
4) Understand the data
5) Understand the HW
6) Help the compiler
7) Verify your assumptions
8) Performance is everyone's responsibility
https://t.co/CwwJffnHGz

— Emil Persson (@_Humus_) June 27, 2018


Basically, Programming Wisdom (which, by the way, is often a great source of actual programming wisdom) posted a quote suggesting, more or less, that there’s never a good time to think about performance. Even experts should defer it until later! This is way worse advice than your usual “premature optimization is the root of all evil” tirade. The premature optimization quote at least addresses something that can be a bit of a problem, i.e. that programmers go ahead and obfuscate code in order to make it faster, without even looking at whether the code had any performance problem to begin with or verifying that the new code actually is any faster, and in the process introduce bugs and reduce readability. Yes, that’s a real problem and poor engineering practice. And that’s what Knuth was going on about in that quote. But the often omitted continuation is equally important: “Yet we should not pass up our opportunities in that critical 3%.” What he was saying is that you should profile your code first, see where your bottlenecks are, and then work to optimize those parts. That’s good advice and I agree.

But I would go a step further than Knuth’s famous quote. If you work in an environment where you have a decent performance culture, you may in fact be able to just fire up the profiler, find your top 3% functions (or top 3% shaders), optimize those and have a shippable product. But if performance work has been neglected for most of the product development cycle, chances are that when you finally fire up the profiler it’ll be a uniformly slow mess with deep systemic issues. Fixing your top 3% functions may only lead to a minor observable performance gain, or none at all. You may find that your product won’t be able to reach a shippable state without large-scale redesign, delays, budget overruns or cutting features. You don’t want to put your entire team on a 6-month crunch to salvage a fundamentally broken product when you had hoped a small performance push would suffice.

What you need is a performance culture. Understand that performance is a core feature of your product. Poor performance is poor user experience, or in the case of games, possibly unplayable and unshippable. When you design new systems, you need to think about performance from the start. Yes, you can hack away at prototypes and proof-of-concept implementations without being overly concerned about micro-optimizations. You can run with it for a while and get a feel for how things hold up. It’s fine, you don’t have all the answers at this point. But put some thought into what sort of workload it should be able to handle once it goes into production. Does it run fine with 100 items? How about 1,000? A million? Is it conceivable that there will be a million items? Don’t just integrate a prototype into mainline without first having thought about the scale it will run at eventually. The idea isn’t that you should optimize everything down to the last SIMD instruction and last byte of memory in the first pass. But you should prepare your solution to be able to operate at the intended scale. If the code will only ever need a handful of objects, don’t obsess over performance. Will there be hundreds? See if you can simplify your math. Maybe group things sensibly. Will there be thousands? You should probably make sure it operates in batches, think about memory consumption, access patterns and bandwidth, and perhaps separate active and inactive sets, or use hierarchical subdivision. Tens of thousands? Time to think about threading perhaps? Millions? Start counting cycles, and you essentially can’t afford cache misses anymore.
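To make "separate active and inactive sets" a bit more concrete, here is a minimal C++ sketch. The `Particle` and `ParticlePool` types and their members are hypothetical names made up for this illustration, not code from any real engine. The idea is simply to keep the live items packed at the front of one array, so the per-frame update is a single linear pass over data that is actually in use.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical particle type, for illustration only.
struct Particle {
    float position;
    float velocity;
    float life;  // seconds remaining; <= 0 means the particle has expired
};

// Keeps active particles packed in [0, activeCount) so the update loop
// never has to test or skip over inactive slots.
class ParticlePool {
public:
    explicit ParticlePool(std::size_t capacity) : particles_(capacity) {}

    // Activates one particle if there is room; returns false when the pool is full.
    bool spawn(float position, float velocity, float life) {
        if (activeCount_ == particles_.size()) return false;
        particles_[activeCount_++] = Particle{position, velocity, life};
        return true;
    }

    // Updates every active particle once. Expired particles are swapped
    // into the inactive tail of the array.
    void update(float dt) {
        std::size_t i = 0;
        while (i < activeCount_) {
            Particle& p = particles_[i];
            p.position += p.velocity * dt;
            p.life -= dt;
            if (p.life <= 0.0f) {
                // Swap with the last active particle and shrink the active range.
                // i is not advanced: the swapped-in particle still needs its update.
                std::swap(p, particles_[--activeCount_]);
            } else {
                ++i;
            }
        }
    }

    std::size_t activeCount() const { return activeCount_; }
    const Particle& at(std::size_t i) const { return particles_[i]; }

private:
    std::vector<Particle> particles_;
    std::size_t activeCount_ = 0;
};
```

The swap does not preserve the order of the particles, which is usually fine for this kind of data. Whether this layout actually wins for a given workload is something to check in the profiler rather than assume.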

So that’s item 1 and I guess also item 8 on my list, and a pinch of 4 and 5 I guess. Yeah, the list isn’t sorted. My tweet wasn’t carefully crafted, but a spur of the moment frustration over terrible advice. Now, it’s also true that most of the time you work with existing code rather than writing new code, and most days you’re probably spending more time debugging, maintaining and fixing code than optimizing. The idea isn’t that performance should override all these other concerns, but it’s a concern that is of equal value to the rest. You wouldn’t advise anyone to defer bug fixing to the end of the product cycle, and neither should you defer performance work until the end. And just like you couldn’t have a single engineer on the team be “the bug guy” who gets to fix all issues while the rest of the team implements features, neither is it reasonable to have a single engineer be “the performance guy” who gets to fix all performance issues.

Performance is everyone’s responsibility and it needs to be part of the process along the way. And that leads to item 3: you need to take performance regressions seriously. If performance unexpectedly drops, it should be a top priority to investigate and fix. And to catch performance regressions you should have systems in place for detecting them, such as automated tests with daily performance graphs. But engineers on the team should also profile often enough to have a good idea in their head of the overall performance characteristics of their game or application as a whole, and of their own systems in particular, so that whenever it looks different from the usual they know that either something broke or there’s a team member that needs an extra pat on their back for the awesome performance work.
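As a rough illustration of what an automated regression check could look like, here is a small C++ sketch. The function names, the use of the median, and the tolerance parameter are all assumptions made for this example, not a description of any particular team's system. It compares the latest timing samples against a stored baseline and flags a regression when the median gets worse by more than a chosen fraction.

```cpp
#include <algorithm>
#include <vector>

// Median of a set of timing samples in milliseconds. Takes a copy because
// std::nth_element reorders its input. Assumes samples is non-empty; for an
// even number of samples this returns the upper of the two middle values.
double medianMs(std::vector<double> samples) {
    auto mid = samples.begin() + samples.size() / 2;
    std::nth_element(samples.begin(), mid, samples.end());
    return *mid;
}

// Returns true when the latest run's median is slower than the baseline
// median by more than `tolerance`, given as a fraction (0.05 means 5%).
bool isRegression(const std::vector<double>& baselineMs,
                  const std::vector<double>& latestMs,
                  double tolerance) {
    return medianMs(latestMs) > medianMs(baselineMs) * (1.0 + tolerance);
}
```

The median is used here only because it is less sensitive to a single noisy frame than the mean. A real setup would also have to decide how the baseline gets updated and how much run-to-run noise to tolerate.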

Naturally, what priority you assign to performance may vary depending on what you are developing, how critical performance is and what your current state is. Not every piece of software needs a lot of performance work. Most of my tweet should be interpreted in the context of a team within an organization, and my perspective comes from a rendering engineer working in game development. In my line of work performance is absolutely crucial. I need to crunch through up to tens of thousands of items 60 times per second, and I need to shade millions of pixels in milliseconds. If you build applications for yourself to run on your own servers, you may consider just throwing hardware at the problem. But that doesn’t work in game development, and probably not for most lines of business. The software is not for us and the hardware isn’t ours. We can throw hardware at internal tools, like build servers, to optimize our own processes, but customer-facing products must run on the customers’ machines. If our PS4 SKU doesn’t run satisfactorily on a PS4, it’s not going to ship until it does.

That’s not to say that performance work is always needed. When I write my personal code I don’t necessarily spend a lot of time tuning performance. At least not more than necessary, or should I say, not any more than I enjoy doing. There’s time for just brute-forcing something. If I just want to generate a lightmap for a scene I can just write some dumb code that throws loads of rays into the scene until all shadows are smooth and nice. It’s fine if it takes 30 minutes if it only needs to run once. So unless I can optimize the code to be much faster in less than 30 minutes, it is probably more productive to just let it churn away on poorly optimized code. In practice though, to be honest, chances are that I’ll run it tens of times before I’m satisfied with the results.

Also, I do agree that if you are working with a system that is currently buggy, it’s not time to optimize it. But don’t defer performance work just because there are bugs elsewhere in the code. It’s fine to optimize the occlusion culling system while there are glitches in the character animation, but you should of course first make sure the culling logic is correct and you aren’t falsely culling objects before you spend any time tuning it.

So items 1, 2, 3 and 8 mostly concern performance culture. I would say those are the most important points. The rest are mostly practical optimization tips for when you actually sit down to do performance work. I don’t have a lot to add about 4 and 5, but I do want to say something about 6, helping the compiler. Yes, compilers can be super smart and employ some really clever tricks to make your code faster. But it’s important to know that the compiler often has its hands tied in a number of ways that make it unable to optimize in the ways you expect. Many so-called zero-cost abstractions can turn into very costly abstractions when you stack several layers of them. Or when you run in Debug. A compiler is limited not just by the semantics of the language, which can constrain it in surprising ways sometimes, but also by time. A compiler may try an optimization, which would’ve been successful if it went all the way, but after a number of iterations bail out because it doesn’t seem to be converging, and then just generate the code instead of the inline constant you hoped would result. Verifying your assumptions is key. Look at the actual assembly code. Does it look like the compiler generated the code you thought it would? Did the zero-cost abstraction actually generate zero instructions? Did it fail to inline code that you wanted inlined? Did it inline code that’s pointless to inline?
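One well-known example of language semantics tying the compiler's hands is pointer aliasing. The C++ sketch below uses two hypothetical functions, `scaleAll` and `scaleAllLocal`, written for this illustration. In the first, the compiler generally has to assume that a write to `out[i]` might change `*scale`, since both are reached through `float` pointers, so it may reload the value on every iteration and may be more hesitant to vectorize. Copying the value into a local first takes that possibility off the table.

```cpp
#include <cstddef>

// The compiler cannot prove that `out` never overlaps `*scale`, so it may
// have to reload *scale after every store to out[i].
void scaleAll(float* out, const float* in, const float* scale, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = in[i] * *scale;
    }
}

// Reading the value into a local once makes it clear that it cannot change
// inside the loop. The result is the same as scaleAll whenever `out` does
// not overlap `*scale`.
void scaleAllLocal(float* out, const float* in, const float* scale, std::size_t n) {
    const float s = *scale;
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = in[i] * s;
    }
}
```

Whether the two versions actually compile to different code depends on the compiler, the optimization level and what it can see at the call site, which is exactly why it is worth looking at the generated assembly (for example with the `-S` flag in GCC or Clang) instead of trusting either version blindly.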

There's a lot more to be said about practical optimization, but what I really wanted to say is that performance matters, and neglecting it until the end of the development cycle is a recipe for disaster.
