Friday, June 26, 2015

Reflection and Automatic Serialization

Before you ask, no...I'm not talking about rendering reflections in a graphics context :)

Reflection is the ability of a program to inspect its own objects and types at runtime (to "reflect" on their data). Languages that have this concept built in sometimes call it introspection instead. Either way, knowing about your objects at runtime can have some pretty amazing benefits, including:

  • A queryable registry of all of your objects and types
  • Objects can iterate over their member variables
  • Easy function binding with a scripting language
  • Automatically expose objects to an external editing tool
  • Automatic serialization and deserialization
These benefits can save a lot of time and code complexity later on, and having reflection built into your language of choice makes them easy to get. However, Qi predominantly uses C++, which offers almost nothing in the way of reflection (RTTI's typeid and dynamic_cast are about the extent of it). Therefore, we have to roll our own!

Before reading any further, I recommend checking out a good overview of a simple reflection system here. The ideas presented there formed the basis of the reflection system in Qi. 

What does a reflection system need?


In order to get started with reflection, you have to first think about what you plan on doing with it. The main goal for reflection in Qi is to enable automatic serialization and deserialization (more on this later). With that in mind, the reflection system has to meet the following requirements:
  • Allow users to expose any object to the reflection system
  • Selectively reflect member variables in any object
    • Some classes may not want to serialize all of their data (small temporary variables for example)
  • Know info about the type, including:
    • Name
    • Size (in bytes)
    • What member variables are included in this type
    • The offset from the beginning of the object (in the case of member variables) in bytes
  • Have built-in support for all C++ primitive types (int, float, double, etc...)
  • Have support for arrays, pointers, and inherited types
  • Be incredibly simple to use
  • Not add any variables to any classes using reflection
    • We don't want to inflate the memory-use for a class behind the user's back
  • Be able to instantiate any type in the class registry by only knowing its name
  • Build the entire class registry with minimal input from the user before entering main()
Note that templated types are missing from this list. At the time of writing, I still haven't figured out a good way to represent them.
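To make the "arrays, pointers" requirement a bit more concrete, here's a minimal sketch of how a member's type could be sorted into the categories a serializer has to handle differently, entirely at compile time with the standard <type_traits> header. TypeKind, KindOf() and ElementCount() are hypothetical names for illustration, not part of Qi, and the sketch assumes qualifiers have already been stripped from the type.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical categories a serializer treats differently (not Qi's API).
enum class TypeKind { Primitive, Pointer, Array, Class };

// Classifies an already qualifier-stripped type at compile time.
template <typename T>
constexpr TypeKind KindOf()
{
    return std::is_pointer<T>::value ? TypeKind::Pointer
         : std::is_array<T>::value   ? TypeKind::Array
         : std::is_class<T>::value   ? TypeKind::Class
         :                             TypeKind::Primitive;  // int, float, enums...
}

// Elements stored inline: the (first) array extent, or 1 for anything else.
template <typename T>
constexpr std::size_t ElementCount()
{
    return std::is_array<T>::value ? std::extent<T>::value : 1;
}
```

For example, KindOf<int[4]>() yields TypeKind::Array and ElementCount<int[4]>() yields 4, so a serializer knows to write four ints inline, while a pointer member needs the special handling discussed in the serialization section.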

Building a reflection system


With the points above in mind, let's talk about how the reflection system is implemented in Qi. One of the central points is that the entire class registry has to be built at runtime before entering main(). This means we have to rely on static initialization of variables (since global statics are initialized before main() is entered). Additionally, we want this entire process to be simple for the user, meaning there are only a few entry points for adding information to the reflection system. These include:
  • Declaring a class to the reflection system
  • Specifying what members of a class should be known to the reflection system
  • Declaring any parent classes in the case of inherited types
That's it really. We don't want to complicate the user's code and don't want to add lots of complexity to a user's class.
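Registration before main() typically hinges on the constructors of global objects. Below is a minimal sketch of that pattern, assuming a name-to-factory map, which also covers the "instantiate any type by only knowing its name" requirement. ClassRegistry and AutoRegister are hypothetical names, not Qi's API. Strictly speaking, the C++ standard allows this kind of initialization to be deferred, but mainstream compilers run it before main(); also beware that a linker may drop an otherwise unreferenced object file from a static library, registrations included.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical class registry (not Qi's API): maps a type name to a factory.
class ClassRegistry
{
public:
    using Factory = std::function<std::shared_ptr<void>()>;

    // Function-local static: constructed on first use, so it is ready even
    // when another .cpp file registers a type during its own static
    // initialization (avoids the static initialization order problem).
    static ClassRegistry &Get()
    {
        static ClassRegistry instance;
        return instance;
    }

    void Register(const std::string &name, Factory factory)
    {
        factories_[name] = std::move(factory);
    }

    // Creates an instance of a registered type by name; empty if unknown.
    // shared_ptr<void> remembers the real type, so deletion is correct.
    std::shared_ptr<void> Create(const std::string &name) const
    {
        auto it = factories_.find(name);
        if (it == factories_.end())
            return nullptr;
        return it->second();
    }

    bool IsRegistered(const std::string &name) const
    {
        return factories_.count(name) != 0;
    }

private:
    std::map<std::string, Factory> factories_;
};

// Defining one of these at namespace scope registers T during static
// initialization, i.e. before main() with mainstream compilers.
template <typename T>
struct AutoRegister
{
    explicit AutoRegister(const char *name)
    {
        ClassRegistry::Get().Register(
            name, []() -> std::shared_ptr<void> { return std::make_shared<T>(); });
    }
};

// Example type and its registration; an engine would hide this in a macro.
struct Foo
{
    int value = 42;
};

static AutoRegister<Foo> fooRegistration("Foo");
```

With this in place, std::static_pointer_cast<Foo>(ClassRegistry::Get().Create("Foo")) returns a freshly constructed Foo, knowing nothing but the string "Foo".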

To solve the problems of both ease-of-use and population before the engine really starts, Qi uses a system based on template metaprogramming. This makes the code inside the reflection system slightly confusing, but it lets us leverage the compiler to write most of the reflection system for us automatically! Qi exposes the following four macros as the reflection system's interface:
  • QI_DECLARE_REFLECTED_CLASS()
    • Placed within the public section of a class, this macro tells the reflection system that you want the engine to know about it at runtime.
  • QI_REFLECT_CLASS()
    • Invoked inside your class's .cpp file, this macro performs the necessary steps to register the reflected type at runtime.
  • QI_REFLECT_MEMBER()
    • Specified per-member that your object wants reflected.
  • QI_DECLARE_PARENT()
    • Tell the reflection system about the inheritance chain that leads to your reflected type.
A quick example of these macros in practice could look like this:
// Inside of the header
class Foo
{
    public:
        
        QI_DECLARE_REFLECTED_CLASS(Foo);

        int   member1;
        float member2;
        Bar   member3;
};

// Inside of the cpp file
QI_REFLECT_CLASS(Foo)
{
   QI_REFLECT_MEMBER(member1);
   QI_REFLECT_MEMBER(member2);
   QI_REFLECT_MEMBER(member3);
}

At this point, the reflection system has all of the data it needs to know about Foo. At runtime, you'll be able to ask any instance of Foo what member variables it has, instantiate it with just the name "Foo" alone, and many other things. Note that QI_REFLECT_MEMBER() works with members of any type, as long as that type is already known to the reflection system.

So how do these macros work? It's pretty complicated and I won't go into all of the detail here. However, there are some important concepts to understand:

  • Every type that is declared with the above macros has an associated ReflectionData object that contains all of the information pertaining to that type. This reflection data is registered with the central class registry and created only once per type. That means that if you have multiple Foo objects, they all share the exact same ReflectionData instance. This is done automatically behind the scenes by exploiting both templates and static variables within the QI_REFLECT_CLASS() macro. Each use of that macro makes the compiler generate (instantiate) a class for that type which holds a static ReflectionData instance. This object lives entirely within the class registry and doesn't add anything to the size of the original type.
  • Reflecting a member variable automatically determines its type information (is it a float? a pointer? an array? etc...) at compile time using C++'s decltype specifier, strips off any qualifiers it may have (const, &, and so on), and keeps that information available at runtime.
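To make the two points above more concrete, here is a self-contained sketch of the same ideas: per-type data that lives as a function-local static inside a helper template (so every instance of a type shares it and sizeof(T) is unchanged), registration through a global initializer that runs before main(), and member types recovered with decltype and cleaned up with std::remove_reference/std::remove_cv. All names here (TypeData, TypeDataFor, REFLECT_CLASS, REFLECT_MEMBER, and so on) are hypothetical stand-ins, not Qi's actual code, and the sketch skips parent classes, arrays and pointers. It also assumes standard-layout types, since it relies on offsetof.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>
#include <vector>

struct TypeData;

struct MemberData
{
    std::string     name;
    std::size_t     offset;  // bytes from the start of the owning object
    const TypeData *type;    // shared, per-type reflection data
};

struct TypeData
{
    std::string             name;
    std::size_t             size = 0;
    std::vector<MemberData> members;
};

// Exactly one TypeData per T. The static lives in this helper template,
// not inside T itself, so sizeof(T) is unchanged and all instances share it.
template <typename T>
struct TypeDataFor
{
    static TypeData &Get()
    {
        static TypeData data;
        return data;
    }
};

// Strips references and const/volatile from a declared member type.
template <typename T>
using Bare = typename std::remove_cv<typename std::remove_reference<T>::type>::type;

template <typename T>
bool RegisterType(const char *name)
{
    TypeData &data = TypeDataFor<T>::Get();
    data.name = name;
    data.size = sizeof(T);
    return true;
}

template <typename Class, typename Member>
void AddMember(const char *name, std::size_t offset)
{
    TypeDataFor<Class>::Get().members.push_back(
        MemberData{ name, offset, &TypeDataFor<Bare<Member>>::Get() });
}

// Hypothetical macros in the spirit of QI_REFLECT_CLASS/QI_REFLECT_MEMBER.
// REFLECT_CLASS defines a global bool whose initializer runs before main()
// and calls the member-listing function whose body the user writes next.
#define REFLECT_CLASS(C)                                          \
    template <typename Self> static void ReflectMembers_##C();    \
    static const bool reflected_##C =                             \
        (RegisterType<C>(#C), ReflectMembers_##C<C>(), true);     \
    template <typename Self> static void ReflectMembers_##C()

// decltype recovers the member's declared type at compile time.
#define REFLECT_MEMBER(m) \
    AddMember<Self, decltype(Self::m)>(#m, offsetof(Self, m))

static const bool reflectedPrimitives =
    RegisterType<int>("int") && RegisterType<float>("float");

struct Foo
{
    int         member1 = 0;
    const float member2 = 1.0f;  // const is stripped by Bare<>
};

REFLECT_CLASS(Foo)
{
    REFLECT_MEMBER(member1);
    REFLECT_MEMBER(member2);
}
```

After static initialization, TypeDataFor<Foo>::Get().members holds two entries, and the second one points at the same shared float data as any other float member, even though member2 is declared const float.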
An example of getting the member variables of a type at runtime, knowing only the type's name, could look like this:

#include <iostream>

Qi::ReflectionDataManager &manager = Qi::ReflectionDataManager::GetInstance();
const Qi::ReflectionData *data = manager.GetReflectionData("MyType");
if (data != nullptr) // guard against an unregistered type name
{
   // data is a pointer to const, so the member list must be bound as const too
   const Qi::ReflectionData::Members &members = data->GetMembers();
   for (const auto &member : members)
   {
      std::cout << "Member name: " << member.GetName() << std::endl;
      std::cout << "Member size: " << member.GetSize() << std::endl;
   }
}

Using this type of information, we can naturally write some pretty simple serialization code!

Automatic serialization


The main reason for writing a reflection system in Qi has been to get automatic serialization of objects in the game world. This has many uses, such as saving/loading games, capturing world state for debugging, and sending state over the network. A naive approach to serialization could involve some sort of Serializable base class where every object that inherits from it overrides a virtual serialization function. The downsides to this approach include:
  • The user has to manually write the code for serializing every object required
  • Not easily backwards/forwards compatible
  • Introduces forced multiple inheritance
  • Error-prone as you can't rely on the user to write correct and proper serialization code
However, using the reflection system we can get the following benefits:
  • The user writes no serialization code at all, only tells the reflection system what types they want reflected (as demonstrated above).
  • Easily supports backwards compatibility
    • A reflected type knows about its member variables. If the deserializer comes across a variable that the type doesn't know about, it can simply skip it.
  • The engine has full control over how serialization is performed, even for user classes
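The backwards-compatibility point above is easy to demonstrate. Below is a minimal sketch assuming a simple whitespace-separated "name value" text format and a hand-written member table for a hypothetical Settings type; in a reflection-driven engine that table would come from the reflection data instead. Names the stream mentions but the type no longer has are skipped, and members missing from the stream keep their defaults.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical type whose data was saved by an older build.
struct Settings
{
    int   volume = 0;
    float gamma  = 1.0f;
};

enum class Kind { Int, Float };

// Hand-written stand-in for what reflection data would provide.
struct FieldInfo
{
    const char *name;
    std::size_t offset;
    Kind        kind;
};

const FieldInfo kSettingsFields[] = {
    { "volume", offsetof(Settings, volume), Kind::Int },
    { "gamma",  offsetof(Settings, gamma),  Kind::Float },
};

// Reads "name value" pairs (single-token values only) into `out`.
// Unknown names are skipped; returns how many were skipped.
// std::stoi/std::stof throw on malformed numbers.
int Deserialize(std::istream &in, Settings &out)
{
    int skipped = 0;
    std::string name;
    std::string value;
    while (in >> name >> value)
    {
        const FieldInfo *field = nullptr;
        for (const FieldInfo &f : kSettingsFields)
            if (name == f.name)
                field = &f;

        if (field == nullptr)  // member no longer exists: skip it
        {
            ++skipped;
            continue;
        }

        char *base = reinterpret_cast<char *>(&out) + field->offset;
        if (field->kind == Kind::Int)
            *reinterpret_cast<int *>(base) = std::stoi(value);
        else
            *reinterpret_cast<float *>(base) = std::stof(value);
    }
    return skipped;
}
```

Reading "volume 7 brightness 3 gamma 2.5" into a Settings fills in volume and gamma and skips the stale brightness entry instead of failing.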
The serialization code in Qi generally follows this recursive algorithm:

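Serialize(object, type)
   if (type has no members)              // a primitive: int, float, bool, ...
      write the object's value to the output stream
   else
      write the type's name to the output stream
      foreach (member in type)
         Serialize(object's data at member's offset, member's type)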

And that's it. This code is very simple and easily exploits the reflection system's power of knowing type information at runtime. The code for deserialization is almost identical.
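Here is a runnable sketch of that recursive algorithm, assuming a tiny hand-built type description (Type and Member are hypothetical stand-ins for Qi's reflection data). It writes the same bracketed text layout as the example output later in this post, minus the leading number on the first line. Primitives know how to write their own value, and everything else writes its type name and recurses into each member at its byte offset.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, minimal stand-ins for reflection data (not Qi's classes).
struct Type;

struct Member
{
    std::string name;
    std::size_t offset;  // bytes from the start of the owning object
    const Type *type;
};

struct Type
{
    std::string name;
    // Set only for primitives (types with no members): writes the value.
    void (*writeValue)(std::ostream &, const void *);
    std::vector<Member> members;
};

template <typename T>
void WritePrimitive(std::ostream &out, const void *p)
{
    out << std::boolalpha << *static_cast<const T *>(p);  // bools as true/false
}

// The recursive algorithm: a primitive writes its value; anything else writes
// its type name and then serializes each member found at its byte offset.
void Serialize(std::ostream &out, const void *object, const Type &type, int depth = 0)
{
    if (type.writeValue != nullptr)
    {
        type.writeValue(out, object);
        out << '\n';
        return;
    }

    const std::string indent(static_cast<std::size_t>(depth) * 4, ' ');
    out << type.name << '\n' << indent << "[\n";
    for (const Member &member : type.members)
    {
        out << indent << "    " << member.name << ' ';
        const char *base = static_cast<const char *>(object);
        Serialize(out, base + member.offset, *member.type, depth + 1);
    }
    out << indent << "]\n";
}

// Example types shaped like the sample output in this post.
struct Bar { bool variable1; int variable2; float variable3; };
struct Foo { int member1; float member2; Bar member3; };

const Type kInt   { "int",   &WritePrimitive<int>,   {} };
const Type kFloat { "float", &WritePrimitive<float>, {} };
const Type kBool  { "bool",  &WritePrimitive<bool>,  {} };
const Type kBar   { "Bar", nullptr,
                    { { "variable1", offsetof(Bar, variable1), &kBool },
                      { "variable2", offsetof(Bar, variable2), &kInt },
                      { "variable3", offsetof(Bar, variable3), &kFloat } } };
const Type kFoo   { "Foo", nullptr,
                    { { "member1", offsetof(Foo, member1), &kInt },
                      { "member2", offsetof(Foo, member2), &kFloat },
                      { "member3", offsetof(Foo, member3), &kBar } } };
```

A deserializer would walk the same type description in the same order, reading values instead of writing them, which is why the two directions end up nearly identical.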

There is one slight gotcha in this whole process: serializing pointers. When writing out a pointer, you can't simply write the address and move on, because the object won't live at that address when the data is read back in. Therefore, Qi makes use of a pointer table, which allows the serialization code to write an entire object pointer graph to an output stream. Qi accomplishes this with a two-pass serialization approach described here.

The basic idea is to traverse a type's member variables, looking for any that contain pointers. For each one, add the address it points to to an in-memory table where each element represents one object being pointed to (an object already in the table is not added again), and write that element's index to the output stream in place of the pointer. Once the entire table is populated, you can write it out entry by entry to the output stream.

When deserializing the stream, a similar two-pass approach can be used. First, allocate an in-memory table with enough space to store the addresses of every dynamic object in your stream. Then, while deserializing, whenever you come across a pointer, save the table index it refers to for later processing. Once the entire table has been read in, you can go back and fix up the pointers that you found while deserializing.
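The two passes in each direction can be sketched for a single hypothetical Node type with one pointer member. This is a hand-written illustration, not Qi's implementation: a reflection-driven version would discover pointer members through the type's member list instead of hard-coding next, and the "count, then value/index lines" text format is an assumption for readability. Writing table indices instead of addresses (-1 for null) also handles cycles and shared references.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <sstream>
#include <vector>

// Hypothetical object type with one pointer member.
struct Node
{
    int   value = 0;
    Node *next  = nullptr;  // may be shared or form a cycle
};

// Serialize: pass 1 gives every distinct pointed-to object a table index,
// pass 2 writes the table with each pointer replaced by an index (-1 = null).
void SerializeGraph(std::ostream &out, Node *root)
{
    std::vector<Node *> table;
    std::map<Node *, long> index;
    for (Node *n = root; n != nullptr && index.count(n) == 0; n = n->next)
    {
        index[n] = static_cast<long>(table.size());
        table.push_back(n);
    }

    out << table.size() << '\n';
    for (Node *n : table)
        out << n->value << ' ' << (n->next != nullptr ? index[n->next] : -1L) << '\n';
}

// Deserialize: pass 1 allocates every object and records each pointer as a
// raw index, pass 2 fixes the indices up into real addresses.
std::vector<std::unique_ptr<Node>> DeserializeGraph(std::istream &in)
{
    std::size_t count = 0;
    in >> count;

    std::vector<std::unique_ptr<Node>> table;
    std::vector<long> nextIndex(count, -1);
    for (std::size_t i = 0; i < count; ++i)
    {
        table.emplace_back(new Node());
        in >> table[i]->value >> nextIndex[i];
    }

    for (std::size_t i = 0; i < count; ++i)
    {
        const long target = nextIndex[i];
        if (target >= 0 && static_cast<std::size_t>(target) < count)
            table[i]->next = table[static_cast<std::size_t>(target)].get();
    }
    return table;  // table[0] is the root object
}
```

A three-node cycle a -> b -> c -> a is written as "3", then "1 1", "2 2", "3 0", and reads back as three freshly allocated nodes whose next pointers form the same cycle.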

An example of an output stream after serialization might look like this (in text form):

0 Foo
[
    member1 17
    member2 187.2349
    member3 Bar
    [
        variable1 true
        variable2 45
        variable3 12.91
    ]
]

This is just an example of what a serialized object might look like. Foo is a class with the members member1, member2, and member3 (which is itself an object with member variables). In a real-world setting, the stream would be written in binary for speed, avoiding any text conversion while deserializing the data.

---------

Reflection is a pretty complicated topic that relies on quite a bit of compiler trickery to get going. In order to fully understand how Qi does it, you can check out the source code here.

I should mention that there are, of course, other ways of going about this. In a more commercial engine such as Unreal, an offline tool can be built that analyzes the headers of every object in the code base and writes all of the reflection information into special reflection classes, which the reflection system then uses at runtime to get type information. This is far more complicated to write than the method I've described above, but it has two added benefits. First, it needs less input from the user: you no longer need to specify which members are being reflected or any inheritance information, only whether a class should be reflected at all. Second, it builds the class registry and reflection information only once (at compile time) instead of every time you run the code. If Qi ever became a commercial engine, this would be the way to go to cut down on startup times and add even more ease-of-use.

Until next time!
