Thursday 4 May 2017

Compiler investigations

As an impatient programmer, I've always been interested in compile times. I believe fast iteration makes a huge difference to my productivity, though I guess we all have different workflows and adapt to them. So I've been playing around with my relatively small-but-perfectly-formed WeaselTech codebase. It's simple and handles model viewing, sound, input, file loading, etc. - most of the stuff you typically need to build a game.

Firstly, I don't consider the compile times to be onerous for this project - I actively maintain it to do The Good Things - but it's a reasonably sized project and I'm curious how much of what I've been told about compile times over the years is actually true. The results should also inform decisions on future projects.

tl;dr I managed to get my compile times down from around 13.1 seconds to 2.1 seconds - an 84% reduction, or roughly 6x faster. If you're too lazy - or have better things to do - there's a convenient summary at the bottom of the page.



Here's some thoughts:

Most of the information I 'know' about compile times comes from hearsay and information garnered over multiple years - I've put a few of those links at the bottom of the page. Few people *think* they've got time to test this in the real world. In reality, I think we don't have time *not* to.


Testing Notes


  • My project is set up in a traditional way - no unity files - just a lot of individual .cpp and .h files in the project.
  • After each change, I do a 'Rebuild' and do four runs, discarding the slowest (usually the first) result to ensure warm caches and to smooth out 'random' CPU spikes.
  • Code is kept on a HDD (as opposed to an SSD) though previous tests for this showed me that once the disk cache is warm the compilation time is the same. More on that to come.
  • Builds are done with MSVC 2015 and timed using MSVC's built-in build timer.
  • I'm compiling the debug build. This is usually considerably faster than optimised builds.
  • I'm using /MP on the command-line to compile using multiple cores (more on this later).
  • I'm using an increasingly dated 4-core i5-2500K @ 3.3GHz.
  • I'm doing a rebuild on just my 'Tech' project. So these timings do not account for linker time!

Removing '#include <everythingandthkitchensink.h>'


It makes sense that the larger the file, the longer it's going to take to compile. For that reason I've always reasoned that most of the time is spent parsing massive #include networks so I addressed that first.

I have a reasonably tidy codebase but there were a few areas of improvement I'd seen before. I spent a fair amount of time doing the following:
  • I tried removing dead code (old platforms, deprecated libraries, etc.) from the most commonly included headers, which made up to a 500ms difference. I'm sure these will grow again over time, but housekeeping is good.
  • I moved some inline function definitions (for example for Vec3, Vec4) from their class's header file to a separate .inl file. As these headers are parsed so many times, this should reduce the amount of work the preprocessor and compiler have to do. It did, and I got around another 500ms back. Progress. (There's a rough sketch of this split a little further down.)
    • This works particularly well for template classes (HashTables, custom Arrays, etc.). As you'd expect, template syntax takes longer to parse.
  • I had a very small number of system includes in commonly included files (math.h included from maths.h being the most notable). This probably saved about 0.1ms as in my case maths.h wasn't being included much.
  • I noted that old-fashioned header guards seem to be doing their job (both inside the .h file itself and as guards around #includes of it elsewhere). In practice, I can't see performance degrade from #including the same file twice.
    • I'm using header guards for historical reasons, as some of the code was written a long time ago ... I've not compared with #pragma once as I won't live long enough to try everything.
  • Excluded some files from the build (d3d9 files in a d3d12 project aren't needed...). That's an easy win.
Removing unused headers didn't seem to make the huge difference I was expecting, though there's a good argument that this is because my relatively small library is quite well behaved about what it includes compared to an older codebase.
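
As an illustration of the .inl split mentioned above, here's a rough sketch (the Vec3 shown is illustrative rather than my actual maths code, and the .inl is only included in translation units that actually use these functions):

    // Vec3.h - declarations only, guarded the old-fashioned way; cheap to include everywhere
    #ifndef VEC3_H
    #define VEC3_H

    struct Vec3
    {
        float x, y, z;

        Vec3  operator+(const Vec3& rhs) const;  // defined in Vec3.inl
        float Dot(const Vec3& rhs) const;        // defined in Vec3.inl
    };

    #endif // VEC3_H

    // Vec3.inl - inline definitions, pulled in only where the implementations are needed
    inline Vec3 Vec3::operator+(const Vec3& rhs) const
    {
        return Vec3{ x + rhs.x, y + rhs.y, z + rhs.z };
    }

    inline float Vec3::Dot(const Vec3& rhs) const
    {
        return x * rhs.x + y * rhs.y + z * rhs.z;
    }

Any .cpp that only needs the type (to store a Vec3 or pass it around) includes Vec3.h; the handful that do actual maths include Vec3.inl as well.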

I did try *adding* <windows.h> to every compilation unit via my common ForceInclude.h header (as that's the worst thing I could think of), and that did increase compile times - from about 13s to nearly 39s. So overloading the preprocessor is still a big issue, as you'd expect - and, if you didn't know, keeping big system includes out of your headers is essential!
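
For the curious, the 'worst case' experiment looked roughly like this - a sketch rather than my real ForceInclude.h, which is fed to every translation unit via MSVC's /FI (forced include) option:

    // ForceInclude.h - force-included into every compilation unit (cl ... /FI ForceInclude.h)
    #ifndef FORCEINCLUDE_H
    #define FORCEINCLUDE_H

    // The deliberately-bad experiment: dragging <windows.h> into every .cpp
    // took the full rebuild from about 13s to nearly 39s.
    #include <windows.h>

    #endif // FORCEINCLUDE_H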


After all these changes, compile time had gone from 13.1s to around 11s. That's a >16% speedup. For anyone on a large codebase who spends most of their time watching a compile, that's a huge result already for under a day's worth of work. Based on other projects I've worked on, I'd expect much larger wins in many or most larger codebases.

If you're a manager then why would you not want all your programmers to be 16% more productive?

Compiler Settings


It seems that the MSVC compiler (I'm using 2015 here) defaults to /ZI (Edit and Continue). I'll admit to not having used that feature in a long time. I noted that disabling it saved about 1s of compile time.

Jeez.

I thought playing with the process count could help - perhaps running more than 4 compiler processes via /MP (my PC only has four cores) - but no, 4 seemed to be the fastest. There's a great article on parallel builds in the links at the bottom of the page.
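
If you're looking at the raw compiler options, the two settings discussed above boil down to something like this (illustrative command lines rather than my exact project configuration; /ZI is the Edit and Continue debug format, /Zi the plain PDB format, and /MP4 runs up to four compiler processes):

    cl /MP4 /ZI /c *.cpp     (the default-ish debug setup: Edit and Continue debug info)
    cl /MP4 /Zi /c *.cpp     (the same build with Edit and Continue disabled - about 1s faster here)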

I'd expect changing other compiler settings could yield further gains. I've also seen talk that MSVC 2017 is a different beast - in particular, the optimising compiler is far slower but generates significantly faster code in some cases.


Computer Setup


One thing I've always suspected is that computer setup accounts for a lot of problems. I closed Thunderbird, Chrome, the excellent Everything, Steam, the GOG client, AutoHotKey and anything else that could feasibly (or not) be stopped while the build was running.

I saved maybe 0.5 seconds on this. Time is now 10.5s, down from the original 13.1s.

I do run a lot of tabs in Chrome, so I suspect that's one of the significant hogs.

I don't run standalone virus-checking software on this test PC, but I tried disabling Windows Defender - the default Microsoft virus checker.

I saved another 2.5 seconds! Jeez!

Obviously turning everything off isn't entirely practical for everyone, but it's worth knowing the knock-on effect. I'd be interested to see how this scales on a large codebase. I'd also be interested in knowing which virus-checking software is the largest resource hog on compile times, because this is clearly a significant time sink!

At this point the build time is around 8.3 seconds. That's roughly a 37% improvement.



EDIT: 05-05-2017: It was pointed out in a thread on GameDev.net that I hadn't added the source code directory as an exclusion in Defender. I've since tried this and it does indeed get back some, but not all, of the performance. From a time of 12.2s (I was unable to test in exactly the same way) this went down to 11.7s with the project directory excluded, vs 9.7s with Defender Real-Time Scanning completely off. I suspect it's still scanning cl.exe and likely various .dll files in the system directories (which you don't want to exclude as well). It's definitely worth doing, though.

SSD vs HDD


So I said earlier I'm set up on a HDD. Many years ago I remember testing pushing the code to a RAM Drive. This was back in the days when spinning disks didn't spin so fast - and even then, the difference was at best negligible. So for this project I've always kept it on a HDD (at the start the compile times were only 13s so it's hardly a massive time-sink).

The reason for this is that Windows maintains an in-memory disk cache of recently loaded files. In effect, if you've recently loaded main.cpp from disk, the next time you need it Windows won't touch the disk at all - it'll just pull it from main memory.

I thought I'd re-test the theory so I moved everything to a really quite fast SSD.

Nothing's changed. Full-rebuild times with a warm cache are nearly identical.

The major difference - and the reason you should keep (particularly large) codebases on a fast SSD - is that data which has fallen out of the cache is still fast to load. So if you compile the game and then run it, evicting your disk cache, the penalty of reloading source files from disk is much lower. It's also worth noting that having more memory than you ever think you'll need helps Windows keep a larger disk cache. RAM is cheap. Buy lots.

One other thing I thought I'd try was a Windows junction point via the Sysinternals Junction tool. I noted that going through a junction point that pointed at the SSD was notably slower than having the code directly on the SSD, or even on the HDD - by nearly 0.5s, or about 4% at this point. I've worked on multiple projects where the project directory is mapped to, say, 'Z:' via subst or junction points.

I would, however, suggest junction points if your SSD isn't large enough for the whole project and you want to store the bulky data on a slow, spinny disk - so long as your game doesn't read a lot of small files.
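
For reference, creating that kind of link is a one-liner (a sketch with made-up paths - junction.exe is the Sysinternals tool, and mklink /J is the built-in cmd equivalent). In line with the advice above, the code stays directly on the SSD and only the data directory is redirected out to the HDD:

    junction C:\dev\MyGame\Data E:\hdd\MyGameData
    mklink /J C:\dev\MyGame\Data E:\hdd\MyGameData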


Clang


Hey! Why not try another compiler? Well, I did. MSVC 2015 has an option to compile with Clang.

It took a few hours to get Clang to compile my code - possibly something of interest for a different post. Anyway, after removing some code that didn't seem necessary and ignoring a few warnings and errors (I'm only interested in compile time and happy to overlook the code not actually being testable), I got a build going.

Clang compilation time was much slower in comparison: the 13s compile under MSVC went up to nearly 22.5s. I didn't follow this path much further, though it's quite possible there are compiler settings that could improve the situation.

I should also note that on previous projects using Clang, the compile times were not a problem. Indeed, the link in particular was far faster than MSVC. As compilation can be parallelised using tools like Incredibuild, FASTBuild or SN-DBS but linking cannot, I find that incremental changes are usually faster to test with Clang. That's a very important thing - just not really what I'm testing right now.

I should also point out that Clang's diagnostics are a lot better - it helped me find a few genuine issues in the code (variable aliasing, template inconsistencies). I'm still tempted to use the compiler regardless, as it's easy to argue those are more important things than just the time taken to compile.


Unity Builds


This is where the massive performance gains come...

So - confused about what was actually taking the majority of the time - I timed the compilation of an empty C++ file ... 70-90ms (and around 180ms with Windows Defender enabled).

WHAT?

A rough estimate is that I've got 300 .cpp files to compile, at around 80ms each. Divided by four cores, that's 6 seconds just to start and stop the compiler for each compilation unit. My best compile time at this point was around 8.3 seconds ... so 72% of the compile time is spent just starting and stopping the compiler (and more if you're running virus-checking software)!

So I set up some batch files to gather everything into 19 unity .cpp files. It took a little work to write the batch files - and then to separate out the platforms (and some files that were there that shouldn't have been). I guess it took under an hour though.
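
Each generated unity file is nothing magic - it's just a .cpp that #includes the real .cpp files. A trimmed-down sketch (the file names here are illustrative rather than my actual list):

    // Unity_Graphics.cpp - generated by the batch file; don't edit by hand.
    // Headers shared by these files are now parsed once rather than once per source file,
    // and cl.exe starts up once instead of once per source file.
    #include "GraphicsDevice.cpp"
    #include "GraphicsTexture.cpp"
    #include "GraphicsShader.cpp"
    #include "GraphicsBuffer.cpp"
    // ...and so on for the rest of the graphics .cpp files

The individual .cpp files are then excluded from the build so each one is only compiled once, via its unity file.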

Compile time: 2.1 seconds! Wow.

So I mentioned before that the header guards were doing their job - I guess with a Unity.cpp file, most of the improved compile speed comes from not having to parse the same headers over-and-over-and-over again, rather than just from the saving on compiler start-up.

Anecdotally, one of the graphics source files (which includes lots of things like <windows.h>, <d3d12.h>, <dxgi.h>, etc.) takes about 525ms to compile in isolation. The entire unity graphics project of over 100 files now takes 1021ms.

So: Microsoft: Can you not keep a copy of cl.exe active on each core for the duration of the compile please?

There are many other reasons to use unity builds - runtime performance, playing more nicely with Incredibuild, etc. - but I'd never quite realised they'd make such a difference to compile times on a single machine.


PCH files

These are something I hate. I've seen massive speed improvements with them in the past, though, and sometimes the pain is worth the work.

Using PCH files feels like opening Pandora's box. I've seen projects where everyone just adds all their header files to pch.h. True, a full recompile is much faster - but then every time any header file changes it triggers a full rebuild. That's not a performance improvement. So care needs to be taken over which headers to include.

It's something I'd like to give a try though (with carefully chosen headers - even just the system ones). Frankly, I don't think it's worth combining them with unity files for my project, as it's going to be relatively hard to notice changes to a 2.1s compile time. Maybe this is a good topic for the next blog post?
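
If I do try it, the minimal version would look something like this - a sketch of the standard MSVC pattern rather than anything I've actually set up yet, with pch.h limited to stable system headers, built once via /Yc and consumed everywhere else via /Yu:

    // pch.h - only slow-changing system headers; project headers stay out of here
    #ifndef PCH_H
    #define PCH_H
    #include <windows.h>
    #include <d3d12.h>
    #include <dxgi.h>
    #include <math.h>
    #endif // PCH_H

    // pch.cpp - compiled with /Yc"pch.h" to create the .pch file;
    // every other .cpp is compiled with /Yu"pch.h" and starts with #include "pch.h"
    #include "pch.h"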


Summary:



  • Use Unity builds if full rebuild speed is your primary concern. 
    • Though with larger unity files the iteration time could be a problem. If you alter one file you will end up recompiling many.
    • The startup time of cl.exe looks to be the overhead. It takes 70-90ms to compile an empty file (180ms with Windows Defender enabled).
  • SSDs probably won't make a massive difference when your disk cache is warm. When it's cold (and running the game before a rebuild is a very common case), expect significant improvements.
    • Windows junction points (via tools like Sysinternals Junction) can be useful if you can't fit everything on your SSD. But junction TO the HDD, not back from one - there's a cost to going through the junction.
  • Potentially disable Windows Defender. You (probably) shouldn't be looking at porn at work anyway.
  • Turn off all the other apps you can.
  • Consider your project settings. Disable 'Edit And Continue' from your debug format.
  • Stop #including massive header files from your header files. You're being silly and upsetting your coworkers.
  • For very commonly included headers consider moving the 'inline' implementations into a .inl file. I think this is an option that would scale very well.
  • Templates compile slower than other code. Who knew?
  • Um. Use Incredibuild. Or FASTBuild. Or something.

Most of these probably don't come as too much of a surprise. But I spent a day or so measuring them all.


Future Work?


There are a few things I'd still like to try. Precompiled headers are a strong option (mentioned above).

Though not a direct replacement for full-solution rebuilds, there are various flavours of runtime-compiled C++ plugins/SDKs to try: Runtime Compiled C++ by Doug Binks seems strong, and I got a demo of it recently (thanks). Recode also looks promising. I'd also be very interested in a library that did just-in-time C++ compilation at a reasonable speed. Microsoft has also updated its 'Edit and Continue' feature in MSVC 2015, but I've not checked it out recently. I should.

I'd like to trial FASTBuild. I guess primarily it's a network build tool ... so it'll build your code on someone else's machine, much like Incredibuild (without the need to pay £££s for a licence). I'm more interested, however, in its ability to pull already-compiled results for .cpp files from a network cache ... so if you're working in a large (or even small) studio and you all grab the same verified changelist from source control in the morning, only the first person actually has to wait for each file to compile. Everyone else just pulls the result from the network cache. This is great (and works - check the website for stats), but previous experience was that it took a little while to set up. I'd really like this for shader compilation, but I previously couldn't get dependencies working with fxc.

It's also worth mentioning a few caveats:

  • I'm only testing MSVC 2015 building x64 code for PC. Many other compilers and platforms exist.
  • I'm only building a library, not a game, and the pattern of includes changes a lot between the two. In a library you can expect files to include each other less, with fewer circular dependencies and re-included headers. A game may #include a lot more, as a game component may draw on multiple features (sound, graphics, controller input) at once.
  • I'm using a limited subset of C++, roughly similar to the Orthodox C++ guidelines, which I like a lot - but that also probably helps compile times.
  • Perhaps I could have tried testing a published open-source game or engine?

I'd be interested in hearing others' findings and/or other tips and tricks.

Thanks for reading, let me know if any of this helps.


Some useful links:


Nice article on compilation speed
http://www.drdobbs.com/cpp/c-compilation-speed/228701711

Nice article on C++ compilation times 
https://mikaelsitruk.wordpress.com/2010/08/11/speeding-up-cpp-compilation/

Great article on making MSVC compiles fast through parallel compilation
https://randomascii.wordpress.com/2014/03/22/make-vc-compiles-fast-through-parallel-compilation/

Runtime Compiled C++ 
https://github.com/RuntimeCompiledCPlusPlus/RuntimeCompiledCPlusPlus

Recode - code-on-the-fly 
http://www.indefiant.com/#

Incredibuild - the standard distributed compiler 
https://www.incredibuild.com/

FASTBuild - a distributed compiler 
http://fastbuild.org/docs/home.html








5 comments:

  1. It might be interesting to make a tool that found the AST for each source file, then looked at the subset of the files included to see what portion of the AST from them is needed. Acting on that and re-organising might break programmer friendly split of header file naming, but it could potentially be a good step past "include what you use" in terms of identifying the header files where much smaller subsets are actually required. Not as good as actually having dependencies only recompile when changes affect the final AST for the source file in question, but could be an additional tool that makes recommendations - changes made based on that would just be normal source code, so would still build under whatever C++ compiler/build system was in use.

    Replies
    1. Yeah interesting idea.

      Similarly, the approach of compiling a whole file and its dependents from fresh every time is unnecessary. If we knew which part of which file - or even which function(s) - changed and needed recompiling ... that could be done on a low-priority thread, ready for 'Build' being pressed...

      Adrian

  2. You got a spelling error in #include <everythingandthkitchensink.h>, thus the spell checker, which connects to MS to submit spelling errors under the "data statistic usage agreement", eats all your measured time.


    Just kidding. Nice work!

  3. Compile times are most important when working on the significant (big) parts of the code base. As the years have gone by I have realised that new versions of Windows itself have a really devastating effect on compile times.

    Replies
    1. Interesting you say that. I obviously detected some non-trivial time to compile a single, empty file so it's quite possible at least some of that is in the OS and file system. Have you any figures (or even further impressions) on the difference between Windows versions?


      Thanks,

      Adrian
