Have Your Efficiency, and Flexibility Too

Metaprogramming Techniques For No-Compromise Code

by Nick Sabalausky

Full source code for this article is available on GitHub, or can be downloaded here.


Table of Contents:

  1. Have Your Efficiency, and Flexibility Too
  2. First Attempt: Send Efficiency and Flexibility to Dr. Oop's Couples Therapy
  3. Respecting the Classics: Old-School Handcrafting
  4. Success at Dr. Metaprogramming's Clinic
  5. It Walks Like a Duck and Quacks Like a Duck...Kill It!
  6. Metaprogramming Plus: The Flexibility Enhancements
  7. The Last Remaining Elephant In The Room: Runtime Conversion
  8. Curtain Call

Success at Dr. Metaprogramming's Clinic

Dr. Metaprogramming listens to the story, pauses for a second, and replies, "Turn your runtime options into compile-time options." Say what? The metaprogramming doc takes the original problem, moves numPorts and isSpinnable up to the struct Gizmo line, makes just a few small changes, and:

From ex4_metaprogramming.d:

struct Gizmo(int _numPorts, bool _isSpinnable)
{
    // So other generic code can determine the
    // number of ports and spinnability:
    static immutable numPorts = _numPorts;
    static immutable isSpinnable = _isSpinnable;

    static if(numPorts < 1)
        static assert(false, "A portless Gizmo is useless!");

    private OutputPort[numPorts] ports;

    void doStuff()
    {
        static if(numPorts == 1)
            ports[0].zap();
        else static if(numPorts == 2)
        {
            ports[0].zap();
            ports[1].zap();
        }
        else
        {
            // OutputPort is a struct, so iterate by ref
            // to zap the real ports instead of copies:
            foreach(ref port; ports)
                port.zap();
        }
    }

    static if(isSpinnable)
        int spinCount;

    void spin()
    {
        static if(isSpinnable)
            spinCount++; // Spinning! Wheeee!
    }
}

Dr. Metaprogramming points out, "As an added bonus, trying to make a portless Gizmo is now caught at compile-time instead of runtime."

Efficiency whines, "Look at all those ifs!" The doc explains that those aren't real ifs; they're static ifs. They're evaluated only at compile time, and the branches that aren't chosen never make it into the compiled code.
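If you'd like to convince yourself of that, here's a minimal sketch. The Spinner type is a made-up, stripped-down stand-in for Gizmo (it isn't in the article's source), and the static asserts simply ask the compiler what each version of the type actually contains:

```d
// Hypothetical stand-in for Gizmo, for illustration only.
struct Spinner(bool isSpinnable)
{
    // This field is only declared when isSpinnable is true.
    static if(isSpinnable)
        int spinCount;

    void spin()
    {
        // In the non-spinnable version, this body is empty.
        static if(isSpinnable)
            spinCount++;
    }
}

// The non-spinnable version has no spinCount member at all...
static assert( __traits(hasMember, Spinner!true,  "spinCount"));
static assert(!__traits(hasMember, Spinner!false, "spinCount"));

// ...so it doesn't pay for one in memory, either.
static assert(Spinner!false.sizeof < Spinner!true.sizeof);
```

Since these are static asserts, the checks themselves happen during compilation. If any of them were wrong, the program wouldn't build.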

Efficiency responds, "Oh, ok. So you're really making many different types, right? Isn't there an overhead for that, like with polymorphism?" The doc says it does make many different types, but there's no runtime polymorphism (just compile-time) and no overhead. Efficiency smiles.
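Here's a small sketch of what "many different types, no overhead" looks like in practice. Widget is a hypothetical example type, not part of the article's source. Each set of template arguments produces its own plain struct, with no vtable pointer or other hidden fields tacked on:

```d
// Hypothetical example type, for illustration only.
struct Widget(int numPorts)
{
    int[numPorts] ports;

    int total()
    {
        int sum;
        foreach(p; ports)
            sum += p;
        return sum;
    }
}

alias One  = Widget!1;
alias Five = Widget!5;

// Different template arguments give distinct, unrelated types...
static assert(!is(One == Five));

// ...while the same arguments always name the same type.
static assert(is(Widget!1 == One));

// Each type is exactly as big as its data: nothing hidden inside.
static assert(One.sizeof  == 1 * int.sizeof);
static assert(Five.sizeof == 5 * int.sizeof);
```

Calling total() on either type is an ordinary direct call. Nothing has to be looked up at runtime to figure out which version to run.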

Seeing Efficiency happy makes Flexibility concerned. Flexibility balks, "We occasionally require some logic to determine a Gizmo's number of ports and spinnability, so I doubt we can do this." The doc assures him that D can run many ordinary functions at compile-time. And in other languages, code can just be generated as a separate step before compiling, or a preprocessor can be used. He adds that even if runtime logic really is needed, there are ways to do that, too. He'll demonstrate all of this shortly. Flexibility smiles.
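As a quick taste of that first point, here's a rough sketch of an ordinary function feeding a Gizmo's template parameters. The functions portsForModel and spinsForModel, and the model-name scheme they decode, are invented for this illustration, and the Gizmo here is trimmed down to the bare minimum:

```d
struct OutputPort
{
    int numZaps;
    void zap() { numZaps++; }
}

// Trimmed-down Gizmo, just enough for this sketch.
struct Gizmo(int _numPorts, bool _isSpinnable)
{
    static immutable numPorts = _numPorts;
    static immutable isSpinnable = _isSpinnable;
    static assert(numPorts >= 1, "A portless Gizmo is useless!");

    OutputPort[numPorts] ports;
}

// Hypothetical rule: one port, plus one more for each '+' in the name.
// Note that this is just a normal function. Nothing marks it as special.
int portsForModel(string model)
{
    int ports = 1;
    foreach(ch; model)
        if(ch == '+')
            ports++;
    return ports;
}

// Hypothetical rule: models starting with 'S' are spinnable.
bool spinsForModel(string model)
{
    return model.length > 0 && model[0] == 'S';
}

// Used where a compile-time value is required, the functions
// are run by the compiler itself:
enum model = "S++";
alias MyGizmo = Gizmo!(portsForModel(model), spinsForModel(model));

static assert(MyGizmo.numPorts == 3);
static assert(MyGizmo.isSpinnable);
```

The very same functions can still be called at runtime with a string the program only learns while running. They just can't be used to pick template arguments in that case, which is where the runtime techniques come in.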

The code to test this is still very similar to all the other versions. But since this is the first metaprogramming version, I'll show the new Gizmo-testing code here:

From ex4_metaprogramming.d:

struct OutputPort
{
    int numZaps;
    void zap()
    {
        numZaps++;
    }
}

struct UltraGiz
{
    // We could still use gizmosA, gizmosB, etc. just like before,
    // but templating them will make things a little easier:
    template gizmos(int numPorts, bool isSpinnable)
    {
        Gizmo!(numPorts, isSpinnable)[] gizmos;
    }

    int numTimesUsedSpinny;
    int numTimesUsedTwoPort;

    void useGizmo(T)(ref T gizmo)
    {
        gizmo.doStuff();
        gizmo.spin();

        if(gizmo.isSpinnable)
            numTimesUsedSpinny++;

        if(gizmo.numPorts == 2)
            numTimesUsedTwoPort++;
    }

    void run()
    {
        StopWatch stopWatch;
        stopWatch.start();

        // Create gizmos
        gizmos!(1, false).length = 10_000;
        gizmos!(1, true ).length = 10_000;
        gizmos!(2, false).length = 10_000;
        gizmos!(2, true ).length = 10_000;
        gizmos!(5, false).length =  5_000;
        gizmos!(5, true ).length =  5_000;

        // Use gizmos
        foreach(i; 0..10_000)
        {
            foreach(ref gizmo; gizmos!(1, false)) useGizmo(gizmo);
            foreach(ref gizmo; gizmos!(1, true )) useGizmo(gizmo);
            foreach(ref gizmo; gizmos!(2, false)) useGizmo(gizmo);
            foreach(ref gizmo; gizmos!(2, true )) useGizmo(gizmo);
            foreach(ref gizmo; gizmos!(5, false)) useGizmo(gizmo);
            foreach(ref gizmo; gizmos!(5, true )) useGizmo(gizmo);
        }

        writeln(stopWatch.peek.msecs, "ms");
        assert(numTimesUsedSpinny  == 25_000 * 10_000);
        assert(numTimesUsedTwoPort == 20_000 * 10_000);
    }
}

void main()
{
    UltraGiz ultra;
    ultra.run();

    // Compile time error: A portless Gizmo is useless!
    //auto g = Gizmo!(0, true);
}

One of the important things to note here is that the function useGizmo() is templated to accept any type. This is necessary since there are multiple Gizmo types instead of just one, and also because the Gizmos are structs instead of classes and therefore don't have a common base type. So effectively, there is now a separate useGizmo() function for each Gizmo type (although a smart linker might combine identical versions of useGizmo() behind-the-scenes). In the next section, I'll get back to the matter of this function being templated, but for now, just take note of it.
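To make the "separate function for each type" idea a little more concrete, here's a small sketch. MiniGizmo is a cut-down stand-in I'm inventing just for this example, and this useGizmo() is a variation that tests isSpinnable with static if rather than the plain if used in the listing above:

```d
// Hypothetical cut-down Gizmo, for illustration only.
struct MiniGizmo(bool _isSpinnable)
{
    static immutable isSpinnable = _isSpinnable;

    static if(isSpinnable)
        int spinCount;

    void spin()
    {
        static if(isSpinnable)
            spinCount++;
    }
}

int numTimesUsedSpinny;

// One template, but the compiler stamps out a separate
// function for every type it's actually called with.
void useGizmo(T)(ref T gizmo)
{
    gizmo.spin();

    // T.isSpinnable is known at compile time, so the
    // non-spinnable instantiation doesn't contain this at all.
    static if(T.isSpinnable)
        numTimesUsedSpinny++;
}

alias Spinny = MiniGizmo!true;
alias Plain  = MiniGizmo!false;

// The two instantiations are different functions with different signatures:
static assert(!is(typeof(&useGizmo!Spinny) == typeof(&useGizmo!Plain)));
```

Callers never have to spell out the type. Writing useGizmo(gizmo) is enough, and the compiler picks (or creates) the matching version based on the argument.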

Also, note that the arrays gizmosA, gizmosB, etc. were replaced by a templated array. This is just like the separate arrays from ex3_handcrafted.d, but it gives us a better way to refer to them. For example, we now say gizmos!(2, false) instead of gizmosC. This may seem to be of questionable benefit, especially since we could have just named it gizmos2NoSpinny. But it will come in handy in the later metaprogramming versions since it lets us use arbitrary compile-time values to specify the two parameters. That gives us more metaprogramming power. But that will come later.
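As a rough preview of why that matters, here's a sketch of selecting the arrays with compile-time values that come from a loop instead of being typed out by hand. This isn't the code from the later examples. It assumes a compiler recent enough to support static foreach, it moves the templated array to module level for brevity, and createAll() is a name made up for this illustration:

```d
// Trimmed-down Gizmo, just enough for this sketch.
struct Gizmo(int _numPorts, bool _isSpinnable)
{
    static immutable numPorts = _numPorts;
    static immutable isSpinnable = _isSpinnable;
    int[numPorts] ports;
}

// Same idea as the templated array in UltraGiz,
// but declared at module level here.
template gizmos(int numPorts, bool isSpinnable)
{
    Gizmo!(numPorts, isSpinnable)[] gizmos;
}

// Hypothetical helper: creates ten of every kind of Gizmo
// and returns how many were created in total.
size_t createAll()
{
    size_t total;

    // static foreach is unrolled at compile time, so n and spinny
    // are compile-time values and can be used as template arguments.
    static foreach(n; [1, 2, 5])
    {
        static foreach(spinny; [false, true])
        {
            gizmos!(n, spinny).length = 10;
            total += gizmos!(n, spinny).length;
        }
    }

    return total;
}
```

With separately named arrays like gizmosA and gizmosB, a loop like this wouldn't have anything to index by. Adding a three-port Gizmo here is just a matter of adding a 3 to the list.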

This version gives me 10.1 seconds and 9.2 MB. That's just as sleek and slim as the handcrafted version and...wait no...huh? It's slightly better? Granted, it's not by much, but what's going on?

It may seem strange that generic code could be more efficient than a specially handcrafted non-generic version. But at least part of what's happening is that with metaprogramming, the compiler is essentially doing your handcrafting automatically as needed.

Remember, in the real handcrafted version, the town elder only handcrafted one-port and two-port versions. For everything else, he had to fall back to the original strategy of dealing with a variable number of ports at runtime. With the metaprogramming version, on the other hand, the compiler automatically "handcrafted" a special five-port version when we asked for five ports. If we had also asked for three-port and seven-port versions, it would have automatically "handcrafted" those as well. It's possible to create and maintain all those special versions manually, but it would be highly impractical.

If you really do want a single type for general multi-port Gizmos, just like the town elder's handcrafted version, that's certainly possible with metaprogramming, too. In fact, we'll get to that later.

Of course, I don't mean to imply that handcrafted optimization is obsolete. There are always optimizations a compiler won't be able to do. But when your optimization involves creating alternate versions of the same thing, metaprogramming makes it quick and easy to apply the same technique on as many different versions as you want without hindering maintainability.

I've alluded to a number of flexibility enhancements that can be made to this metaprogramming version. I'll explain these as promised, but there's one other enhancement I'd like to cover first:

Next: It Walks Like a Duck and Quacks Like a Duck...Kill It!
