whirlicube
A company making a computer game.

Backsliding into the Lower Layers

Dear reader, I have failed. There are no trees and no people. :-(

At least I have you to hold me accountable.

The Man Who Planted a Forest

Once, there was a man who needed a wooden cabin. He went and got a quote from a company that built cabins.

But, after he looked at examples of the company's previous work, he decided the designs weren't exactly what he wanted. One cabin looked nice, but maybe the roof should have been given a different angle? Another was innovative, but the man worried about the stability of the bracing. The last was sturdy enough, but the builders had used the wrong type of screws.

Nothing for it, he would have to build his own cabin. He was good enough with a hammer. He sat down and drew up the plans, and ordered the tools and screws and fixings he would need.

But cabins need wood, and when our man went to look at wood, none of it was good enough for him. He needed softwood, but this lumberyard had only hardwood. He wanted a light-colored pine, but this pine was too dark. He needed a specific thickness of plank, and those had been cut too thin.

Nothing for it but to go cut down the correct trees himself. He got his saw and off he went.

But the forest where the correct trees grew was far away, across an ocean. Transporting the wood he needed for his cabin was going to be tricky.

So he went and gathered some seeds, bought a field, and planted his perfect pine trees nearby. The exact species he needs. The wood will be the right color and the right density. His saw will be ready to cut the planks to his exact specifications.

All he had to do was wait for the trees to grow.

He still doesn't have a cabin.

File Formats

Just like our industrious cabin-builder, I've done an awful lot of work this week. But instead of drawing trees or loading meshes, it's been on improving my engine's file format.

You need a file format before you can load meshes, right?

Human-Readable Notation

The industry has been increasingly using human-readable formats to store structured data. Around the turn of the century people were trying to use XML for everything, which is now widely regarded as a mistake.

But as web development has become more common, saner text formats like JSON, YAML, or Markdown are increasingly used for all kinds of data.

The excellent glTF - which I probably should have just used - uses JSON to encode the structural part of scene data.

Here's a sample of part of one of my shaders in my own JSON-like notation:

program : gpu::program_data {
  vshader : @resources.i2.program.vshader,
  fshader : gpu::shader_data {
    name : "terrain_debug_fshader",
    kind : SHADER_FRAGMENT,
    shader : 
      #[bGF5b3V0ICggc3RkMTQwICkgdW5pZm9ybTsKb3V0IHZlYzQgZnJhbWVidWZm
      ZXI7CmluIGZsb2F0IGRlcHRoOwppbiBmbG9hdCBncm91bmQ7CmluIHZlYzIg
      dGV4Y29vcmQ7CmluIHZlYzMgbm9ybWFsOwp2b2lkIG1haW4oKQp7CiAgICBm
      cmFtZWJ1ZmZlciA9IHZlYzQoIDAuMDIsIDAuMDIsIDAuMDIsIDEuMCApOwp9
      Cg==],
  },
  bindings : [
    gpu::binding_data {
      kind : BINDING_ATTRIBUTE,
      name : "xy",
      bindindex : 0ul,
    },
  ]
}

This kind of notation has a bunch of advantages over custom binary formats:

The addition of schema information helps these kinds of files stay robust against changes in the code that loads them. If the order of structure members changes, or if a new member is added in a later version, we can continue to load the old attributes from the old files without any special-case version handling.

My own notation extends JSON in a number of ways:
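As a rough illustration of why named properties survive code changes, here is a minimal Python sketch. It is not the engine's loader: `BindingData`, its `normalized` member, the `legacy_slot` property, and `load_by_name` are all hypothetical names invented for the example, and a plain dict stands in for a parsed record.

```python
from dataclasses import dataclass, fields


@dataclass
class BindingData:
    # Hypothetical "current" layout: member order differs from the old file,
    # and 'normalized' was added after the old file was written.
    name: str = ""
    kind: str = "BINDING_ATTRIBUTE"
    bindindex: int = 0
    normalized: bool = False


def load_by_name(cls, record):
    """Build cls from a {property name: value} record, matching by name only."""
    known = {f.name for f in fields(cls)}
    # Properties the code no longer knows are skipped;
    # members missing from the record keep their defaults.
    return cls(**{key: value for key, value in record.items() if key in known})


# An "old file": different member order, no 'normalized', one retired property.
old_record = {
    "kind": "BINDING_ATTRIBUTE",
    "name": "xy",
    "bindindex": 0,
    "legacy_slot": 3,
}
binding = load_by_name(BindingData, old_record)
print(binding)
```

Because nothing depends on byte offsets or member order, the old record loads without a version check. A fixed binary layout would need explicit handling for each of those three changes.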

My engine also has a reflection layer where metadata about serializable objects is built by a pre-compilation step. This allows objects to be loaded and saved by one piece of serialization code.
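To give a feel for what "one piece of serialization code" means, here is a small Python sketch. It is only an analogy: Python's runtime `dataclasses.fields` stands in for metadata that the engine generates in a pre-compilation step, the classes `ShaderData` and `ProgramData` are hypothetical, and enum values are written as plain strings for simplicity.

```python
from dataclasses import dataclass, field, fields, is_dataclass


@dataclass
class ShaderData:
    name: str
    kind: str


@dataclass
class ProgramData:
    fshader: ShaderData
    bindings: list = field(default_factory=list)


def save(value, indent=0):
    """Serialize any reflected object with one routine, driven by field metadata."""
    pad = "  " * indent
    if is_dataclass(value):
        lines = [f"{type(value).__name__} {{"]
        for f in fields(value):
            member = save(getattr(value, f.name), indent + 1)
            lines.append(f"{pad}  {f.name} : {member},")
        lines.append(pad + "}")
        return "\n".join(lines)
    if isinstance(value, list):
        return "[" + ", ".join(save(v, indent) for v in value) + "]"
    if isinstance(value, str):
        return '"' + value + '"'
    return str(value)


program = ProgramData(fshader=ShaderData("terrain_debug_fshader", "SHADER_FRAGMENT"))
text = save(program)
print(text)
```

Adding a new serializable type here means declaring its members, not writing another save function. That is the property a reflection layer is meant to buy.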

Structured Binary Files

Of course, there are some kinds of data for which human-readable notation is a terrible choice. And, in general, binary files will tend to be both smaller and easier for the computer to decode.

Data like the vertex buffer of a tree mesh, for example.

You can see that the actual shader data in the example above has been base64-encoded to make it 'readable'. This increases the size of this data block by roughly 33%, and also makes it less compressible.

So a binary file might be a better choice. But we don't want to throw away all of the advantages of our self-describing, easy-to-read file format.
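The size overhead follows from how base64 works: every 3 input bytes become 4 output characters. A quick Python check, using a made-up block of shader-like text as the payload (the data is hypothetical, and zlib is used here only as a convenient compressor for comparison):

```python
import base64
import math
import zlib

# Hypothetical stand-in for a binary block; any bytes would do.
raw = "".join(f"uniform vec4 color_{i};\n" for i in range(200)).encode("utf-8")
encoded = base64.b64encode(raw)

# Base64 emits 4 output characters for every 3 input bytes (rounded up).
expected_len = 4 * math.ceil(len(raw) / 3)
overhead = len(encoded) / len(raw) - 1.0

print(f"raw: {len(raw)} bytes, base64: {len(encoded)} bytes (+{overhead:.1%})")
# Compare how well each form compresses.
print(f"zlib(raw): {len(zlib.compress(raw))} bytes")
print(f"zlib(base64): {len(zlib.compress(encoded))} bytes")
```

The compressed sizes depend on the payload, but base64 typically hurts because it re-slices bytes into 6-bit groups, so repeated byte sequences no longer line up as repeated characters.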

This is where something like BSON can fit. Instead of saving structures each with its own custom byte layout, we can serialize the object notation into a binary package which preserves the structure of the data and contains the type and property names. I think Unreal .upk files serialize objects using a similar strategy.

My engine's main file format is actually a binary form of the notation described above.

This gets us the best of both worlds - the computer finds the binary files easier and faster to read, and we humans can inspect and edit the data, including all its structure, with one simple tool that converts to and from the human-readable notation.
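Here is a minimal sketch of the general idea in Python. The byte layout is invented for this example and is not the engine's format (nor BSON's): each value gets a one-byte tag, and objects carry their type name and property names as length-prefixed strings. A dict with a `__type__` key stands in for a typed object.

```python
import struct


def _pack_str(text):
    data = text.encode("utf-8")
    return struct.pack("<I", len(data)) + data


def encode(value):
    """Encode a value as tagged binary, keeping type and property names."""
    if isinstance(value, dict):  # object: type name, then named properties
        props = [(k, v) for k, v in value.items() if k != "__type__"]
        out = b"O" + _pack_str(value["__type__"]) + struct.pack("<I", len(props))
        for key, val in props:
            out += _pack_str(key) + encode(val)
        return out
    if isinstance(value, list):
        return b"L" + struct.pack("<I", len(value)) + b"".join(encode(v) for v in value)
    if isinstance(value, str):
        return b"S" + _pack_str(value)
    if isinstance(value, int):
        return b"I" + struct.pack("<q", value)
    raise TypeError(f"unsupported value: {value!r}")


def _read_u32(buf, pos):
    return struct.unpack_from("<I", buf, pos)[0], pos + 4


def _read_str(buf, pos):
    size, pos = _read_u32(buf, pos)
    return buf[pos:pos + size].decode("utf-8"), pos + size


def decode(buf, pos=0):
    """Decode one value starting at pos; returns (value, next position)."""
    tag, pos = buf[pos:pos + 1], pos + 1
    if tag == b"O":
        type_name, pos = _read_str(buf, pos)
        count, pos = _read_u32(buf, pos)
        obj = {"__type__": type_name}
        for _ in range(count):
            key, pos = _read_str(buf, pos)
            obj[key], pos = decode(buf, pos)
        return obj, pos
    if tag == b"L":
        count, pos = _read_u32(buf, pos)
        items = []
        for _ in range(count):
            item, pos = decode(buf, pos)
            items.append(item)
        return items, pos
    if tag == b"S":
        return _read_str(buf, pos)
    if tag == b"I":
        return struct.unpack_from("<q", buf, pos)[0], pos + 8
    raise ValueError(f"unknown tag {tag!r} at offset {pos - 1}")


binding = {"__type__": "gpu::binding_data", "name": "xy", "bindindex": 0}
packed = encode({"__type__": "gpu::program_data", "bindings": [binding]})
restored, end = decode(packed)
```

Because the names travel with the data, a converter can turn `packed` back into readable text without knowing anything about `gpu::program_data` in advance.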

Zero Copy Binary Blob Loading

Large chunks of repeating data like vertices or pixels are still going to suffer from a bunch of overhead if we serialize them interleaved with their structural data. This kind of data is also not particularly meaningful to the CPU - mostly what we want to do is load it from disk then immediately blit it into GPU memory unchanged.

Last week, my file format stored binary data inline with the rest of the data, as if it were a very large number. The deserializer would copy the bytes from the incoming data stream into the correct place in the final in-memory structure.

Diagram showing binary data being copied after loading

This week, binary blobs have been moved to a special section of the file. This means when we load them, it's much easier for the deserialization code to load this data directly into its own block of memory. It can then hand ownership of this memory block to the final structure without copying it.

Diagram showing binary data being loaded into its own memory block
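A rough Python sketch of the blob-section idea follows. The layout (a small header, a size table, the structural data, then the raw blobs) is an assumption made up for this example, JSON stands in for the structural section, and the `vertices_blob` index is a hypothetical way for the structure to refer to its blob.

```python
import io
import json
import struct


def save(doc, blobs):
    """Write [header][blob sizes][structure][blob 0][blob 1]..."""
    structure = json.dumps(doc).encode("utf-8")
    out = io.BytesIO()
    out.write(struct.pack("<II", len(structure), len(blobs)))
    for blob in blobs:
        out.write(struct.pack("<Q", len(blob)))
    out.write(structure)
    for blob in blobs:
        out.write(blob)
    return out.getvalue()


def load(stream):
    structure_len, blob_count = struct.unpack("<II", stream.read(8))
    sizes = [struct.unpack("<Q", stream.read(8))[0] for _ in range(blob_count)]
    doc = json.loads(stream.read(structure_len))
    blocks = []
    for size in sizes:
        block = bytearray(size)
        # The stream writes straight into the block that will be kept.
        if stream.readinto(block) != size:
            raise EOFError("truncated blob section")
        blocks.append(block)
    # Hand each block to its owner by reference; no second copy is made.
    for mesh in doc["meshes"]:
        mesh["vertices"] = blocks[mesh.pop("vertices_blob")]
    return doc, blocks


data = save({"meshes": [{"name": "tree", "vertices_blob": 0}]}, [b"\x00\x01\x02\x03"])
doc, blocks = load(io.BytesIO(data))
```

With a real file you would open it unbuffered (`open(path, "rb", buffering=0)`) so that `readinto` is not routed through an intermediate buffer. Raw reads can also return fewer bytes than requested, so a real loader would loop rather than raise.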

That was the main outcome of my weekend mucking about with the file format.

Planting that forest is hard work...

Compression

Another thing I added this week was compression of serialized data using zstd. I have found Yann Collet's blog fascinating, and zstd gives generally better results than zlib while still being fast and simple.
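For a sense of how little code this takes, here is a Python sketch. It assumes the third-party `zstandard` package, which is not what the engine uses; it just makes the example short. The one-byte codec tag and the zlib fallback are inventions of this sketch, so that it still runs where `zstandard` is not installed.

```python
import zlib

try:
    import zstandard  # third-party zstd binding; an assumption of this sketch
except ImportError:
    zstandard = None

CODEC_ZLIB, CODEC_ZSTD = 0, 1  # hypothetical one-byte codec tags


def compress_section(data, level=3):
    """Compress a serialized section, prefixing a byte that names the codec."""
    if zstandard is not None:
        body = zstandard.ZstdCompressor(level=level).compress(data)
        return bytes([CODEC_ZSTD]) + body
    return bytes([CODEC_ZLIB]) + zlib.compress(data, level)


def decompress_section(packed):
    codec, body = packed[0], packed[1:]
    if codec == CODEC_ZSTD:
        if zstandard is None:
            raise RuntimeError("section is zstd-compressed but zstandard is missing")
        return zstandard.ZstdDecompressor().decompress(body)
    if codec == CODEC_ZLIB:
        return zlib.decompress(body)
    raise ValueError(f"unknown codec tag {codec}")


payload = b'name : "terrain_debug_fshader", kind : SHADER_FRAGMENT,\n' * 100
packed = compress_section(payload)
print(f"{len(payload)} bytes -> {len(packed)} bytes")
```

Recording the codec next to the data means the loader never has to guess how a section was compressed.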

So that was Tuesday's win.

Blender and Python

Blender 2.8 has recently been released. Blender has been great for me, especially now that the new version has improved the user interface so much.

Blender is written in a combination of C, C++, and Python, and - like most 3D packages - its internal structures are highly extensible. Its add-on API is basically the same as the internal Python API, which means you can do almost anything you want by writing some Python code.

Because it's open source, you can also see exactly what the code you're running is doing. That's more useful when you run into bugs or issues than any amount of documentation could ever be. You can even fix bugs in the C code, if you find any.

Blender is amazing.

To get mesh data out of Blender and into a format that my engine can load, I have written a bunch of Python this week, including writing Python extensions using Python's extension module API.

Specifically, I needed:

That took a few days.
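The export code itself is not shown here, so the following is only a sketch of the simplest part of such a pipeline: turning vertex positions into a tightly packed binary block. The function name `pack_positions` and the float32 layout are assumptions. The function runs outside Blender, and the comments show where the positions would come from inside it.

```python
import struct


def pack_positions(positions):
    """Pack (x, y, z) tuples as little-endian float32 triples, with no padding."""
    out = bytearray()
    for x, y, z in positions:
        out += struct.pack("<3f", x, y, z)
    return bytes(out)


# Inside Blender, the positions could be gathered with something like:
#   import bpy
#   mesh = bpy.context.active_object.data
#   positions = [tuple(v.co) for v in mesh.vertices]
# Here a hand-written triangle stands in for real mesh data.
triangle = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
blob = pack_positions(triangle)
print(f"{len(triangle)} vertices -> {len(blob)} bytes")
```

A block like this is exactly the kind of data that belongs in a separate blob section rather than inline with the structural data.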

What Next?

Mesh data might now be getting serialized into the 'perfect' file format, but it's not getting loaded, and it's not getting drawn on the screen. So we're back to pushing more of those ugly damn pixels onto the screen.

This week I've definitely not had the progress I wanted.

I guess that, as for anyone in recovery, the danger of relapse is always there. I hope that I'm learning how to manage my workflow to achieve my actual goals, rather than getting lost in the minutiae every time.

I also need to draw a simple tree or two.

Whirlicube Limited - SC467330 - 272 Bath Street, Glasgow, G2 4JR