
Recently I noticed that a few files in my C++ codebase, which were small and included few headers, took longer to compile than larger files with more headers. I had no intuition for what affects compile time or whether there was anything I could do about it besides including less.

This post goes over the experiments I tried, with a takeaway at the end. Each round poses a question, shows the code I repeated thousands of times, and reports how long clang (16.0.6) and gcc (13.2.1) took to compile it (-g -march=native -c). Repeating something thousands of times obviously isn't realistic, but it may give you an idea of the upper bound on cost if you use a lot of something. You can follow along with the source here. Results are in milliseconds, with clang as the first set and gcc as the second.
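To make the setup concrete, here is a minimal sketch of how such a repeated test file could be produced. It is not the author's generator: the function name make_repeated_source and the convention of using '#' as a placeholder for the repetition index are assumptions made purely for illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not taken from the post's source): builds the text of
// a translation unit by repeating `pattern` `count` times. Every '#' in the
// pattern is replaced with the repetition index so each name is unique.
std::string make_repeated_source(const std::string& pattern, std::size_t count) {
    std::string out;
    for (std::size_t i = 0; i < count; ++i) {
        const std::string index = std::to_string(i);
        for (char c : pattern) {
            if (c == '#') {
                out += index;   // substitute the placeholder
            } else {
                out += c;       // copy everything else verbatim
            }
        }
        out += '\n';            // one repetition per line
    }
    return out;
}
```

Calling make_repeated_source("struct StructType# { int a; };", 250000), writing the result to a .cpp file, and timing the compiler on it with the flags above would approximate one cell of the Round 1 results. Since '#' is the placeholder, this sketch can't generate preprocessor directives.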

Round 1: What happens if we define an empty type? Will enums, structs, and classes be the same? How much slower will they be if they have 1 member?

Here's the code I compared. These were repeated 250K times.

enum EnumType0 { };                     enum EnumType0 { EnumVal0 };
struct StructType0 { };                 struct StructType0 { int a; };
class ClassType0 { public: };           class ClassType0 { public: int a; };

An empty body and a body with 1 item take significantly different amounts of time. Perhaps this could slow down your build if you have 100K structs with 1+ members. struct and class took roughly the same amount of time, and enums were almost twice as fast as structs on clang but only roughly 1.5x to 1.8x faster on GCC.

          No Item   1 Item   GCC No Item   GCC 1 Item
Enum          470      720           850         1300
Struct        930     1450          1540         2040
Class         980     1500          1550         2060

Round 2: What if we declare instead of define? Would this be faster than an empty-body definition? Do you think you'd care if there were 100K of these? We'll check function and variable declarations as well. Plain enums aren't legal to forward declare (that requires a fixed underlying type, as with enum class), so they're left out.

struct StructType0;                     int func0();
class ClassType0;                       extern int var0;

They're significantly faster than even an empty-body definition. A function declaration is roughly twice as slow as a struct declaration. To my surprise, variables are faster than functions in clang. GCC is faster in this test overall, except for variable declarations, which are slower than both its own function declarations and clang's variable declarations.

250K times   clang    gcc
Struct         400    280
Class          400    280
Func           740    620
Var            620    715
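As a reminder of when a declaration alone is enough, here is a small sketch. The names Widget, widget_area, and widget_area_ref are made up for illustration and don't come from the test files. Code that only mentions a type by pointer or reference can get by with the forward declaration; the full definition is needed only where members are actually touched.

```cpp
// Forward declaration only: the compiler learns the name, not the layout.
struct Widget;

// These declarations use Widget only by pointer or reference, so the
// incomplete type is sufficient. This part could live in a header.
int widget_area(const Widget* w);
int widget_area_ref(const Widget& w);

// The full definition is needed only where members are accessed,
// which would typically be a single source file.
struct Widget { int width; int height; };

int widget_area(const Widget* w) { return w->width * w->height; }
int widget_area_ref(const Widget& w) { return widget_area(&w); }
```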

Round 3: Instead of 1 member inside enum/struct/class what if we have 5? Does enum class improve speed?

enum class EnumType0 { Val0, Val1, Val2, Val3, Val4,};
enum       EnumType0 { EnumType0_Val0, EnumType0_Val1, ... EnumType0_Val4,};
struct     S0        { int var0, var1, var2, var3, var4; };
class      S0        { public: int var0, var1, var2, var3, var4; };
template<class T> class S0 { public: T var0, var1, var2, var3, var4; };

An enum class is faster than a plain enum in clang but over twice as slow in GCC. We ran this 50K times instead of 250K. On clang a struct takes 12.7ms per thousand, and real classes would likely be slower since they usually come with methods. Perhaps you don't want to define many of these if you don't need to.

50K times with 5 members   clang    gcc
Enum                         400    230
Enum Class                   355    590
Struct                       635    790
Class                        640    790
Template                     815    580

Round 4: Do parameters matter? What about the type? We wouldn't change our API because of the results but I am curious what the results will be.

struct Dummy { int a, b, c; }; //In each of the below

int func0();
int func0(int a);
int func0(int a, int b, int c, int d, int e);
int func0(double a, double b, double c, double d, double e);
int func0(Dummy*a, Dummy*b, Dummy*c, Dummy*d, Dummy*e);
int func0(Dummy a, Dummy b, Dummy c, Dummy d, Dummy e);

inline int func0() { return 123; }
inline int func0(int a) { return 123; }
inline int func0(int a, int b, int c, int d, int e) { return 123; }
inline int func0(double a, double b, double c, double d, double e) { return 123; }
inline int func0(Dummy*a, Dummy*b, Dummy*c, Dummy*d, Dummy*e) { return 123; }
inline int func0(Dummy a, Dummy b, Dummy c, Dummy d, Dummy e) { return 123; }

GCC is faster than clang here but both are fairly fast. Declaring is nearly twice as fast as defining. You may want to keep definitions out of headers unless you intend for the optimizer to inline them (the compilation unit needs to see the definition).

25K (not 250K)   clang Decl   clang Inline   gcc Decl   gcc Inline
0 param                  93            228         75          205
1 param                 120            280         95          240
5 params                230            435        195          370
doubles                 230            430        195          370
Struct*                 350            580        250          455
Struct                  310            540        230          410
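The declare-versus-define split might look like the sketch below. It is shown as one file for brevity; the file names in the comments and the functions sum_dummy and twice are hypothetical. The idea is that every file including the header pays only for the declaration, while the one function that is meant to be inlined keeps its definition visible.

```cpp
// ---- what would live in a header (hypothetical dummy.h) ----
struct Dummy { int a, b, c; };

// Declaration only: cheap for every translation unit that includes this.
int sum_dummy(const Dummy& d);

// A small function we do want the optimizer to inline, so its definition
// stays in the header where every caller can see it.
inline int twice(int x) { return 2 * x; }

// ---- what would live in one source file (hypothetical dummy.cpp) ----
// The definition is parsed and compiled once, not once per includer.
int sum_dummy(const Dummy& d) { return d.a + d.b + d.c; }
```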

Round 5: A definition using inline/static/inline static/template. inline static should work the same as static but let's see what happens.

                int func0() ;
                int func0() { return 123; }
inline          int func0() { return 123; }
static          int func0() { return 123; }
inline static   int func0() { return 123; }
template<class T> T func0() { return 123; }

There are a few surprises here. The biggest one for me is a function template (that is never called) is faster than an inline function (that is never called). I was also surprised that gcc (appears to) generate code for a static function that is never called. I assumed an inline definition never used would be almost the same as a declaration but function declarations are more than twice as fast. As mentioned in the last round, you may want to stick to function declaration unless you want the optimizer to inline that function.

                clang    gcc
Decl               93     74
Plain            2885   4890
Inline            228    206
Static            234   4800
Inline Static     228    190
Template          210    180

Round 6: Does compile time grow linearly with function declarations? What happens when we double and 10x the amount?

int func0(int p1, int p2, int p3);

Yes, it's linear.

        clang    gcc
25K       180    150
50K       340    288
250K     1580   1570
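One way to check the linearity claim is to divide each total by the number of declarations; if growth is linear, the per-declaration cost should stay roughly constant. The helper name micros_per_decl below is made up for this sketch, and the inputs are the Round 6 figures.

```cpp
// Average compile cost of one declaration in microseconds, given the
// total time in milliseconds and the number of declarations.
constexpr double micros_per_decl(double total_ms, double count) {
    return total_ms * 1000.0 / count;
}

// clang: 180 ms, 340 ms, 1580 ms
constexpr double clang_25k  = micros_per_decl(180.0, 25000.0);    // 7.2
constexpr double clang_50k  = micros_per_decl(340.0, 50000.0);    // 6.8
constexpr double clang_250k = micros_per_decl(1580.0, 250000.0);  // 6.32

// gcc: 150 ms, 288 ms, 1570 ms
constexpr double gcc_25k    = micros_per_decl(150.0, 25000.0);    // 6.0
constexpr double gcc_50k    = micros_per_decl(288.0, 50000.0);    // 5.76
constexpr double gcc_250k   = micros_per_decl(1570.0, 250000.0);  // 6.28
```

Both compilers stay in the 6 to 7 microsecond range per declaration across a 10x increase in count, which is consistent with linear scaling.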

Round 7: Does overloading a function make it slower?

//25K of this
int func0(int p1, int p2, int p3);

//Comparing to 5K with 5 overloads
int func0(int p1, int p2, int p3);
int func0(int p1, double p2, int p3);
int func0(int p1, int p2, double p3);
int func0(double p1, double p2, int p3);
int func0(int p1, double p2, double p3);

No, not noticeably

         clang    gcc
1x25K      177    151
5x5K       186    155

Round 8: Many function calls. What happens if we make 25K function calls? We'll spread it across 250 functions calling 4 functions 25 times. We'll compare a declaration, a definition, a template, and a function with 4 overloads.

struct Test { }; //In all of the below

int funcI(Test*p1, int p2, int p3);
int funcL(Test*p1, int p2, long p3);
int funcUI(Test*p1, int p2, unsigned int p3);
int funcUL(Test*p1, int p2, unsigned long p3);
int func0() {
	Test t;
	int v;
	v = funcI(&t, 0, 1);
	v = funcL(&t, 0, 1L);
	v = funcUI(&t, 0, 1U);
	v = funcUL(&t, 0, 1UL);
	v = funcI(&t, 1, 2);
	...

int funcI(Test*p1, int p2, int p3) { return 0; }
int funcL(Test*p1, int p2, long p3) { return 0; }
int funcUI(Test*p1, int p2, unsigned int p3) { return 0; }
int funcUL(Test*p1, int p2, unsigned long p3) { return 0; }
//Function call same as above

template<class T> int func(Test*p1, int p2, T p3) { return 0; }
int func0() {
	Test t;
	int v;
	v = func(&t, 0, 1);
	v = func(&t, 0, 1L);
	v = func(&t, 0, 1U);
	v = func(&t, 0, 1UL);
	v = func(&t, 1, 2);
	...

int func(Test*p1, int p2, int p3) { return 0; }
int func(Test*p1, int p2, long p3) { return 0; }
int func(Test*p1, int p2, unsigned int p3) { return 0; }
int func(Test*p1, int p2, unsigned long p3) { return 0; }
int func0() {
	Test t;
	int v;
	v = func(&t, 0, 1);
	v = func(&t, 0, 1L);
	v = func(&t, 0, 1U);
	v = func(&t, 0, 1UL);
	v = func(&t, 1, 2);
	...

GCC is slower than clang for function calls (debug build). How the function is written doesn't seem to matter much, although in GCC calling a function that is only declared is slower than the rest when you have 25K function calls. It's not a significant difference.

            clang    gcc
Decl         1220   1700
Def          1220   1550
Template     1280   1550
Overloads    1300   1550

Round 9: Last Round. What happens if we call a function inside a struct, a class and a class template? We'll pass in 2 explicit parameters since this is a hidden third parameter.

struct Test { inline int func(int p2, int p3) { return 0; } };
int func0() {
	Test t;
	int v;
	v = t.func(0, 1);
	...

class Test { public: inline int func(int p2, int p3) { return 0; } };
int func0() {
	Test t;
	int v;
	v = t.func(0, 1);
	...
	
template<class T>
class Test { public: inline int func(int p2, T p3) { return 0; } };
int func0() {
	Test<int> t;
	int v;
	v = t.func(0, 1);
	...

Well, to my surprise, it's all the same. I'm not sure why this is slightly faster than the previous round.

           clang    gcc
struct      1150   1510
class       1150   1510
template    1150   1510
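The hidden parameter can be made visible with a small sketch. The member base and the helpers func_free and call_with_explicit_object are hypothetical additions for illustration; the benchmark's functions just return 0. A member function taking two explicit arguments has the same shape as a free function taking the object pointer plus those two arguments, which is why this round lines up with the three-parameter functions of Round 8.

```cpp
#include <functional>

struct Test {
    int base = 10;  // hypothetical member, only here so `this` is used
    // Two explicit parameters; `this` is passed implicitly as a third.
    int func(int p2, int p3) { return base + p2 + p3; }
};

// Free-function equivalent with the object pointer spelled out.
int func_free(Test* p1, int p2, int p3) { return p1->base + p2 + p3; }

// std::mem_fn wraps the member function so the object must be passed
// explicitly as the first argument, exposing the hidden parameter.
int call_with_explicit_object(Test& t) {
    return std::mem_fn(&Test::func)(&t, 0, 1);
}
```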

Takeaway: Prefer declaring a function over defining it in a header when you don't need it to be inlined, even if you never call it. Declaring is a lot faster than defining, function declarations scale linearly, and overloads aren't slow. To my surprise, templates aren't slow either. They may feel slow when a header contains a lot of definitions, which is typically the case. This might explain why including <string> increases compile time by 100 milliseconds.

One Last Example: I noticed that including SDL took a moment, and that it pulls in immintrin.h. That is the header for CPU intrinsics, which contains a lot of function definitions. Take this example.

#include <SDL2/SDL.h>
int main(int argc, char *argv[]) { }

Running time clang++ -march=native main.cpp takes 550ms. Running time clang++ -march=native -x c main.cpp -DSDL_cpuinfo_h_ takes 112ms, and 60ms with gcc. That's pretty drastic. The second command differs in two ways: 1) it excludes the header that includes immintrin.h, and 2) it tells the compiler to treat the file as C, so it doesn't pull in C++ specifics such as the implementations of operator overloads. That's 400-500ms saved by ignoring some function definitions. If you run into headers that take long to compile, you may want to look at the preprocessed output with the -E flag.

Final Thoughts: It's pretty wild how much compile time can vary. I find most C++ files take a second for the headers and another second per 1K lines of code. Using the knowledge from these experiments, I spent 30 minutes on my headers and improved compile time by 5%. That's an OK amount for a small project with a dozen files. I'm no longer superstitious about headers with templates in them, and I try to keep definitions out of headers that are included in all of my files. But really, most of the compile time is code generation from your source file.