/LanguageTests

Testing various programming language functionality

Primary LanguageZigMIT LicenseMIT

Language Tests

I test various language functionalities and usage experience. I want to find a language which will significantly increase my productivity over C++, without having significant drawbacks.

I used to think C++ was the ideal language, but lately I’ve felt disappointed with how gross templates get, and how complex the stack has become. I would conform more closely to C, but managing memory without any destructors (or defer, etc.) is a definite pain point.

The test

  • Write a command-line tool which takes several optional arguments
  • The tool creates and manipulates numeric data based on the arguments, then serializes the data to a file
  • The file should include several different fields and a string or something (i.e. not a perfect 4-byte float dump)
  • The tool can load from the file and manipulate that data

The test program would be a pain in the ass to write in C, which is what I’m trying to beat.

The languages

Zig

Zig uses LLVM to generate machine code. It also uses LLVM to handle C interoperability.

Carp

Carp compiles into C, then uses a C compiler to create the machine code.

Steel Bank Common Lisp (SBCL)

SBCL has its own machine-code compiler.

Additional setup

Install Quicklisp.

Explanation

I selected Zig because it takes performance seriously, and has compilation-time code execution support. I’m also a Zig sponsor, so I wanted to do this project to try out what I’m supporting :). I didn’t pick Rust because I don’t think the memory management overhead is worth it. I’m no expert on that though, so I could be very wrong.

I picked two Lisp-y languages because I’m enticed by the talk of how powerful it is (I love the idea of GOAL), and how you can extend the language within the language. I also like that most Lisps have compile-time code execution and hotreloading. The former is good for things like serialization which are common in games but very painful in C/C++. It’s also nice to make things more automatic, like module initialization. Hotreloading is good for drastically shortening the iteration time.

My biggest criticisms of Lisps is how much they rely on garbage collection and data structures with poor cache characteristics (linked lists). I’m hoping Carp will give me the best of both worlds (no GC, but still Lisp-y) there. SBCL is the closest to a “normal” Lisp in case I do find it acceptable performance-wise.

Results

Features

These are features important to me to have:

LanguageHas REPLHas hot-reloadingIntrospectionCompile-time code generationMemory management¹
ZigNoNoYes, though without tagsYesAll manual. Explicit
CarpYesYes?“Automatic”, no GC
SBCLYesYesYes (Inspect, introspection)Yes²Garbage-collected (a, b)

¹I’m usually working on high-performance apps like games, so it is important that I have fine-grained control over memory and can reliably avoid stalls. The language should help me achieve high performance without making me suffer for it, one way or another (hard implementation vs. bad runtime performance). It should be possible and easy to use containers with good data locality, e.g. using vector/array instead of linked list whenever possible.

²The code generation abilities in SBCL are more powerful than e.g. Zig’s because SBCL has an environment, even at compile-time. This means you could have functions append themselves to a global variable during compile time, which is very useful when making e.g. keybind or command systems.

C interoperability

C holds such a massive amount of value to interface with, especially in game development (e.g. most console SDKs are written in C++, which is different from C but can be interfaced with through C wrappers).

Zig

This is a huge plus to Zig, because writing bindings is tedious and gratuitous.

Zig also has excellent C ABI export ability, meaning if I write a bunch of Zig code, then switch back to C or C++, I will still be able to reasonably use that Zig code - no “boxing”, weird conversions, etc. necessary.

Carp

SBCL

The CFFI (and others) provides C interop, though it requires maintaining bindings for the C interface. There are automatic binding generators, but I haven’t looked too deeply into their flaws and limitations yet.

C++ wrappers may be possible with SWIG.

In short, it’s possible, but not seamless.

My implementations and thoughts

LanguageMy CLOCTime to implementExecutable size
Zig1322.5h1 M
Carp
SBCL672h42 M

I did not end up making the same program, but I feel I did get an adequate feel for the languages from the simple programs I did make.

SBCL has a large executable because it must package the entire SBCL compiler, Common Lisp, and runtime.

Zig

  • Right out of the box, the Hello World documentation did not compile against my installed version. It’s a rapidly changing language, so it’s not unexpected, but a little annoying. I’m building my documentation from my source now, so I shouldn’t have this problem again
  • I like that what type of allocator I’m using is very explicit (I’m using docgen.zig as a reference for my test). Choosing an Allocator makes me happy to have that level of control
  • I like the defer keyword already, though by default it seems there’s no errors or warnings if I omit it (and the memory should be freed)
  • The Emacs zig-mode works quite well. Once I specified the zig-zig-bin variable, I got automatic formatting on save, which is pretty slick. I’m not a huge fan of the format style, but if it’s not up to me, I won’t worry about it
  • I managed to crash the compiler deep in LLVM output. I’m attempting to write a repro so that it can be fixed
  • While the comptime keyword and introspection are big pluses, they aren’t quite as powerful as I had hoped. I realized from this experiment that the Lisp-style environment, which is available to modify at compilation time and runtime, is necessary to do really powerful code generation. For example, you can write a macro in Lisp that will declare a function and append it to a global list (useful for defining commands or keybinds), then call that macro in any file which includes it. I don’t think that is possible in Zig, but I could be wrong

Field and function tags, a.k.a. annotations

I was bummed to see struct field and function annotations/tags not available yet, and it probably won’t be coming soon. See the issue. The issue author and I have the exact same use-case: automatic serialization and function command registration.

The commentator who said “serialization should be written by hand each time” is flat wrong: serialization is extremely boilerplate-y and painful to write. We should make the computer do that mindless work!

I think I can get by via the @typeName builtin as well as external metadata structures for field tagging, but it is a damn shame the tags approach had so many detractors.

For an example of how it is useful, see how Unreal Engine 4 uses USTRUCT to generate whole editors from field tags, among other things:

SBCL

  • Emacs Slime got me up and running quickly, though I’m going to have to redefine a bunch of keys. I’m used to certain completion keybinds that I’ll have to bind over whatever slime has
  • It’s very frustrating to find the documentation. When I do find some, the answers involve piling on various external packages to make it easier, thus making the whole system more complex. In comparison, Zig is all on one page, and has plenty of easy-to-understand examples. I did find the lisp spec and multiple good resources eventually

What makes me most nervous about SBCL

Packaging an executable is a nasty process. As far as I know, there is no way to cross-compile for another operating system/architecture without running SBCL on that architecture, which is unsustainable. Zig’s cross-compilation ability destroys SBCL in comparison.

Additionally, SBCL executables contain absolutely everything necessary to use SBCL: the compiler, all of Common Lisp, et cetera. I should have a way to analyze and remove all code which is never utilized by my program. The compiler itself would be good to remove just to eliminate arbitrary code compilation and execution on a shipping product.

See SBCL/ClocOutput.txt for an idea of what portions of SBCL take the most code. I will likely need to dig into the internals.

The garbage collector. I don’t want to be walking on eggshells while making things, but I feel like the GC could be a ticking time bomb in regards to game performance. It stops all threads to perform collection, which means things like audio threads are going to have to be in C/C++ and “off the radar” of the SBCL runtime.

I can stomach the existing garbage collector if I find it matches the following criteria:

  • I can always control when it can happen. For example, I could enable “no-gc” until I’m finished with a frame’s worth of work. Once the frame is done, all the processor has to do is sleep for V-sync. If I can estimate how long I have to sleep that frame, I could then decide whether to trigger a GC during the sleep phase, or postpone it until the next frame’s sleep
  • The garbage collector has predictable performance characteristics. If it could take 0ms one frame and 10ms the next with no control or explanation, that’s unworkable. If I know I have to pay a 2ms “GC tax” every frame, I can budget around that and not get any stuttery frames (the stuttery frames will be my fault, not the GC’s)
  • It is possible (and optimally, convenient) to trace garbage back to what caused the allocation. I have to know where garbage is coming from if I am going to be able to reduce the amount of garbage created
  • Common operations do not create garbage. I shouldn’t need to avoid using 99% of the language because it creates too much garbage by default
  • Long collection peaks can be avoided. If I am consistently collecting garbage, I should not get large spikes of collections. Optimally, I could define a max run time where GC could stop if it’s taking too long that frame, then pick up where it left off the next frame. I doubt this is feasible with the existing design

As you can see, there are quite a lot of caveats. I can’t help but think I will end up having to write modifications to the SBCL runtime myself in order to get acceptable performance for games. For example, I could have multiple memory management strategies, then I could switch between them per object. A simple linear allocator could handle many allocations in a single frame, then dump them all at the end of the frame by merely resetting a pointer. That would go a long way to still have convenience without losing speed: If I can do things fast and loose during a frame, knowing it’ll all get dumped at the end, I will be able to implement with less caution without losing performance.

Both the packaging problem and the garbage collector make me feel like shipping a viable, performant executable (and executables for other platforms) is not a concern for most SBCL developers. That’s very worrying. If I go with SBCL or another Common Lisp implementation, I will definitely need to dive in to the runtime internals and make significant modifications.

Garbage collection enthusiasts usually emphasize how nice it is to not worry about memory management. However, GC causes other worries: C programmers don’t have to worry about poor performance characteristics. If they’re doing something slow, they’ll know it. I’d like to find a happy medium where I can have my cake and eat it too: not worry (much) about allocations, and not worry (much) about poor performance. I think having a small collection of custom allocators could make Lisp that thing, but a general garbage collector will not.

C programs tend to be performant because idiomatic, “habitual” coding in C results in performant programs. By “habitual”, I mean the decisions you make over and over again about how to construct the program result in good performance characteristics. Additionally, the “habits” are not painful to have. If I can find a way to do habitually performant coding in Lisp, it could work out. I have doubts that the design of garbage collection and reliance on it will make that possible. Will I be fighting the language?

Maintainability/sustainability

LanguageCLOCRepo healthEcosystemComments
Zig84k¹Very active. Healthy, financially supportedSmall, though C can be used
Carp27kNot many contributors. Says it’s a “research project”Very small, though C can be usedLikely to die as soon as its solo dev loses interest
SBCL310k²Old! Still active, many contributorsLarge: Common Lisp packagesPorting would be hard because it’s a custom compiler

¹CLOC did not detect Zig as a language, though I think it did count the Zig files as C/C++ files. I used cloc src/ src-self-hosted/ lib/std/ to count the source code I thought was most representative of zig (this does not include LLVM/libc/other dependencies).

²Unlike Zig and Carp, the SBCL CLOC does include a full compiler. Around 353k of SBCL is written in Lisp, i.e. the language itself, whereas Zig’s compiler is in C++ and Carp’s is in Haskell. Also note that I removed about 80k lines from this total because cloc --by-file-by-lang src showed the two largest Lisp files were for Japanese and Chinese encoding tables. I removed them because I probably won’t be using them for my purposes. See SBCL/ClocOutput.txt for the full CLOC output.

Note that I mean no disrespect with these evaluations, I’m only trying to be realistic about whether I would need to become the maintainer of the language in e.g. 5 or 10 years time.