Tuesday, February 17, 2015

Building your Core library (Part 3) - Math

Games are math. Just about every system you create will have some sort of mathematical component to it. Something as seemingly simple as having a character walk forward can use obstacle avoidance, animation blending, and position transformation just to name a few. That doesn't even touch the mathematics required to actually render the character. Luckily, as complex as the math in a game can get, it all really boils down to just three simple base concepts:
  • Vectors
  • Matrices
  • Quaternions
These different concepts will be so crucial to the rest of the engine that it's important to think about a generic API for ease of use as well as performance. Qi will handle this by abstracting away the complexities of dealing with any underlying math operations as well as exploit SIMD operations to get better use out of the underlying CPU.

**NOTE** Some of this information is slightly outdated and has been revised in the post Porting to Windows.

Vectors

Vectors are an important part of any game. They could be used to represent the positions of all objects in the world, perform line of sight checks, cast rays, the list goes on and on. Since Qi is designed to work with 3D games, a majority of operations are going to only need a 3D vector. Therefore, we can define a simple vector class that stores 4 components:

class Vec4
{
    public:
        float x, y, z, w;
};
Note that even though we just need a 3D vector, all vectors will in fact use 4 components. This is because most rendering and animation transformations will require four dimensional homogeneous coordinates. More than that though, most processors have support for at least four SIMD lanes which allow for execution of 4 floating-point operations in one instruction. Qi makes use of Intel's SSE 4.2 SIMD library for easy access to SIMD instructions. SSE defines the variable __m128 which is a 128bit-wide register (enough space for four 32bit values, exactly what we need for our vector!). Games rarely ever need more than 32bits of floating-point precision, so defining everything to be simple floats (instead of something like doubles or long doubles) is just fine. Oftentimes I've seen vectors that use a templated datatype but this can over complicate design and make SIMD operations more difficult (double-precision values would not fit into one __m128 for example)
class __attribute__ ((aligned(16))) Vec4
{
    public:

    union
    {
      struct { float x, y, z, w; };
      __m128 mm_value;   
    };
};
__attribute__ ((aligned(16))) is the way that you can enforce the allocation of a vector to be 16-byte aligned. This is necessary for performance because only when this alignment is enforced can a __m128 value be properly loaded.

Outside of simple scaling operations, vectors need to support at least: dot product, cross product, vector length, and normalization. SSE 4.2 provides simple SIMD abstractions for all of these concepts so it's very easy to quickly implement. The latest version of Qi's vector class is available here.

Matrices

In a game engine, matrices typically play the role of transforming from one space into another. Animations can have bones with matrices that describe their current positions relative to some kind of base pose, rendering oftentimes transforms all objects into a camera-relative space to apply lighting calculations, scene objects are usually authored in one space and need to be transformed into the game world space, etc. Luckily, as we're just targeting 3D for Qi, a matrix can always be assumed to be 4x4. There may be times where you'll need different sized matrices for specific math operations you're doing. These are outside the purview of this post so we'll just keep things simple for now and work only with 4x4 matrices. Building on our previous Vec4 implementation, we can easily define a matrix class:
class __attribute__ ((aligned(16))) Matrix4
{
    public:

    union
    {
        float m[16];      
        Vec4 m_rows[4]; 
    };
};

Note that in using our already SIMD-ready vector class, this matrix class will be able to exploit SIMD parallelism automatically!

In Qi, all matrices are stored in row-major order. This decision was made for SIMD reasons. Oftentimes when working with SIMD, you have to think about how the underlying math operations must be applied in order to use the least amount of instructions as possible. To understand this, take a look at the different ways a matrix can be laid out in memory:

Column Major
0,  4,  8, 12
1,  5,  9, 13
2,  6, 10, 14
3,  7, 11, 15

Row Major
 0,  1,  2,  3
 4,  5,  6,  7
 8,  9, 10, 11
12, 13, 14, 15

Note that the numbers relate to indices in a 1D array.

To multiply a vector by this matrix (transform the vector) in a right-handed coordinate system you could perform the following math operation:

V' = M * V

where V is your vector, M is your matrix, and V' is your resulting vector. The math operations behind this are fairly simple:

foreach row in M
   V'[index] = M.row(index).dot(V)

If you setup your transformation like this and used a column-major matrix, you would have to first transpose the matrix before performing the dot product (to get the right data into the SIMD registers). You could just read the matrix indices directly and perform the multiplication yourself but this has two caveats:
  1. You're not getting any benefit from the SIMD nature of the vector's dot product function
  2. You're jumping all around the matrix in memory, invalidating your cache
If your matrix was instead already row-major, you wouldn't have to transpose the matrix at all! It can be a nice performance win throughout the engine for any transformations you'll be doing to keep your ordering consistent.

But wait, you might say, what about rendering APIs like OpenGL...they require all matrices to be in colum-major order! This is true. However, it's easy to simple transpose your matrix and give it to the rendering API when required (usually only once per frame). The overhead of doing this only once is very small and shouldn't be a performance concern.

Your first line of defense against a user incorrectly using your matrix class is how you design your matrix API. Sure, a matrix being in row-major or colum-major order will change how the instructions are performed underneath. However, I believe it best to just make the user oblivious to the underlying layout and define functions to perform things such as a transformation like this:

Vec4 transform(const Vec4 &v);

Using a function like this instead of overloading the * operator in your matrix class forces a user to perform all transformations with the transform() function. This is nice because a user doesn't need to know, is it V * M or M * V, does the matrix need to be transposed, etc. This way you can always ensure that your matrix and vector classes are being used in an optimal way.

To get around the issue of different third-party APIs needing your matrix in a specific order (such as OpenGL requiring column-major), you can easily define a couple of functions in your matrix class to return a matrix with the proper ordering:

Matrix4 getColumnMajor() const;
Matrix4 getRowMajor() const;

Using functions like this again hide the user from having to know what format the underlying matrix is stored in.

At a minimum, your matrix class should support the following operations: vector transformation, transpose, and pre/post multiply. All of these can easily be implemented by exploiting the underlying vector class, getting SIMD for free!

The latest version of Qi's matrix class is available here.

Quaternions

Like I said before, transformations are a very powerful tool used across any game. However, there can be some problems with always using matrices for rotations:
  • The cost of always storing a 4x4 matrix can add up with all of the objects a game requires
  • Transforming a vector by a matrix requires many mathematical instructions
  • Matrices can suffer from gimbal lock
  • It's difficult to find a rotation that is an interpolation between two known rotations if the rotations are stored as matrices
How do we get around these issues? Enter the quaternion! I won't talk too much about the theory behind quaternions as you can easily go read the volumes of information on them already available online. What is important to know however is that a quaternion is a 4D complex number where three components of the quaternion represent a 3D axis and the last component represents a rotation around that axis. Using quaternions, we can easily represent a rotation about an arbitrary axis using just four floating-point values!

Quaternions allow for the following useful features:
  • Vector transformation
  • Concatenation of rotations
  • Interpolation between two quaternions (spherical-linear interpolation, or slerp for short)
  • Normalization
  • Easy conversion into a 3x3 rotation matrix
Great, that solves all of the issues stated above that rotations matrices suffer from! Qi makes extensive use of quaternions to represent all transformations (view-cameras, animation bones, character orientations, etc.). A simple quaternion class can be built on the previous vector class mentioned above:
class Quaternion
{
    public:

    union
    {
        struct { float x, y, z, w; }; 
        Vec4 m_quat;                 
    };
};

Note that internally I'm representing the quaternion as a Vec4 object. Just like in the case of matrices, this is done to easily exploit the SIMD code already present in the vector class. Additionally, vectors and quaternions share several similar operations (such as dot products, normalization, and magnitudes).

I'll cover the use of quaternions a little later (probably when talking about viewing systems for the renderer). For now, just know that a quaternion should support the following operations:

  • Creation out of an axis and rotation angle
  • Inversion
  • Normalization
  • Vector transformation
  • Conversion into a matrix
  • Concatenation (multiplication)
  • Slerp
For reference, the latest source code for Qi's quaternion class is available here.

----

Well, that's it for the math classes defined in Qi. These three objects will come up again and again in future posts so it's a good idea to understand what they're for and some of the early decisions made about how they should work. Hopefully you can use the concepts presented here to implement your own math library! 

Until next time!

No comments:

Post a Comment