
Model View Projection Matrices

Note: this post is a work in progress. Feedback to my public inbox is welcome.

Introduction

When a 3D model is rendered using a modern graphics API, three things typically occur:

  1. the programmer specifies the vertices that define the model and passes them to the graphics API
  2. a transformation matrix is created in the application and sent to the GPU
  3. the GPU calculates the screen position and color of all visible points on the object’s surface using the vertices and transform matrix

The transform matrix sent to the GPU is often the product of separate model, view, and projection matrices. This blog post gives a brief overview of how that process can work.

The images in this post were produced by model-view-projection, a C99 application written against OpenGL ES. It lets you fly a free-roam camera around or tug on the on-screen controls, and you can watch how they affect the values of the model, view, and projection matrices in real time.

preview

Some familiarity with graphics APIs such as OpenGL and some knowledge of linear algebra are helpful but not required to follow along.

Defining Vertices

Before you can draw an object, you must tell the graphics API the locations of its vertices. Vertices are floating point values that define points on a surface; when drawn without any transforms, they are interpreted as normalized device coordinates, which run from -1.0f to 1.0f.

For example, the following values define the positions of a cube’s corners in 3D space:

static GLfloat cube_verts[] =
{
    -0.5f, 0.5f, 0.5f,   // 0 -- front
    -0.5f, -0.5f, 0.5f,  // 1
    0.5f, 0.5f, 0.5f,    // 2
    0.5f, -0.5f, 0.5f,   // 3
    0.5f, 0.5f, -0.5f,   // 4 -- back
    0.5f, -0.5f, -0.5f,  // 5
    -0.5f, 0.5f, -0.5f,  // 6
    -0.5f, -0.5f, -0.5f, // 7
};

verts

Values outside the -1.0f to 1.0f range are clipped and not drawn, because at this stage the points are expected to be normalized. The vertex values describe the location of each vertex relative to the other vertices on the model, but give no context about the model’s location in the world.

These floating point values are bound to buffer objects in the graphics API and stored in GPU memory.
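As a concrete example, a minimal sketch of that upload for the cube_verts array above, assuming OpenGL ES 2.0, might look like this:

GLuint vbo;
glGenBuffers(1, &vbo);              // create a buffer object handle
glBindBuffer(GL_ARRAY_BUFFER, vbo); // make it the active vertex buffer
glBufferData(GL_ARRAY_BUFFER, sizeof(cube_verts),
             cube_verts, GL_STATIC_DRAW); // copy the vertices into GPU memory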

After the models are defined in terms of vertices, the graphics API maps them to screen space through a series of matrix transforms. If you’re unfamiliar with linear algebra, the next section should help you understand what transforms do; skip it if you’re already familiar.

Transform Matrices Act Like Functions

Consider the polynomial functions $f(x) = x^2$ and $g(x) = x + 2$. If we compose these, we obtain the composite function $f(g(x)) = (x + 2)^2$, which maps each member of the set of real numbers to a single point on a line.

gfx

Matrix transformations represent functions that map one point in space to another point in space. (The mechanics of the operation are nicely summarized by Khan Academy.)

As such, when you see a matrix transformation like this:

$\begin{bmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$

You should be thinking of it as serving the same purpose as a polynomial function $f(x)$. This particular matrix scales any vector in 3D space to twice its size and is analogous to $f(x) = 2x$.
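For example, multiplying it with a point $(x, y, z)$ written in homogeneous coordinates doubles each component:

$\begin{bmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \cdot \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} = \begin{bmatrix} 2x \\ 2y \\ 2z \\ 1 \end{bmatrix}$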

And when you see matrix multiplication like this:

$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \cdot \begin{bmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$

You should think of it as the composition $f(g(x))$: the right-hand matrix is applied to the input first, just as $g$ is. The left-hand matrix here is the identity matrix, which leaves any input unchanged and is analogous to $f(x) = x$.

This is how model, view, and projection matrices act on the vertices of models in order to map them from their original 3D space to the 2D screen space of your monitor.

Model Matrix

Transforming a drawn object by translating, scaling, or rotating it is accomplished by changing variables which are later passed to functions that produce the model matrix. You might notice in the image below that moving the translation, scaling, and rotation sliders causes the model matrix’s values to change.

model

In the demo application, values for translation, scaling, and rotation in the x, y, and z directions are stored in a global variable.


struct orientation
{
    float tx; // translation along each axis
    float ty;
    float tz;
    float sx; // scale along each axis
    float sy;
    float sz;
    float rx; // rotation about each axis
    float ry;
    float rz;
};
...
static struct orientation cube_transform;

Eventually, these get wired up to the UI with sliders.


nk_layout_row_dynamic(ctx, 32, 2);
nk_label(ctx, "Scale x", NK_TEXT_ALIGN_LEFT | NK_TEXT_ALIGN_MIDDLE);
nk_slider_float(ctx, 0, &cube_transform.sx, 5.0f, 0.01f);

The model, view, and projection matrices are often constructed by well-known graphics libraries; in this application, that library is cglm. It provides functions which take parameters (like the x scaling float wired up to our UI) and turn them into a matrix in the format the graphics API expects, as you can see below.

cube_scale = glms_scale_make((vec3s) {{
            cube_transform.sx,
            cube_transform.sy,
            cube_transform.sz }});

cube_model = glms_mat4_mulN((mat4s *[]) {
        &cube_translate,
        &cube_rotate_z,
        &cube_rotate_y,
        &cube_rotate_x,
        &cube_scale },
    5);
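The translation and rotation matrices being multiplied above are produced the same way. As a sketch (these cglm struct-API functions exist, but the demo’s exact calls may differ):

cube_translate = glms_translate_make((vec3s) {{
            cube_transform.tx,
            cube_transform.ty,
            cube_transform.tz }});

// one rotation matrix per axis; the angle is in radians
cube_rotate_x = glms_rotate_make(cube_transform.rx,
                                 (vec3s) {{ 1.0f, 0.0f, 0.0f }});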

View Matrix

A view matrix is a mathematical representation of the position and orientation of the virtual camera in the 3D scene. It is used in conjunction with the model and projection matrices to transform the vertices of an object into the 2D screen space of the rendering engine, so that the object is rendered on screen from the perspective of the virtual camera.

Changing the 3D position of the camera (sometimes referred to as the “eye”), or changing the direction the camera is looking (sometimes referred to as the “center”), is done by modifying 3D vectors which are parameters to the function that builds the view matrix. In cglm, this is glm_lookat:

/*!
 * @brief set up view matrix
 *
 * NOTE: The UP vector must not be parallel to the line of sight from
 *       the eye point to the reference point
 *
 * @param[in]  eye    eye vector
 * @param[in]  center center vector
 * @param[in]  up     up vector
 * @param[out] dest   result matrix
 */
CGLM_INLINE
void
glm_lookat(vec3 eye, vec3 center, vec3 up, mat4 dest) {
#if CGLM_CONFIG_CLIP_CONTROL & CGLM_CLIP_CONTROL_LH_BIT
  glm_lookat_lh(eye, center, up, dest);
#elif CGLM_CONFIG_CLIP_CONTROL & CGLM_CLIP_CONTROL_RH_BIT
  glm_lookat_rh(eye, center, up, dest);
#endif
}
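For example, a camera sitting three units back along the z axis and looking at the origin could be described like this (a sketch using cglm’s struct wrapper glms_lookat; the vector values are illustrative):

mat4s view = glms_lookat(
    (vec3s) {{ 0.0f, 0.0f, 3.0f }},  // eye: the camera's position
    (vec3s) {{ 0.0f, 0.0f, 0.0f }},  // center: the point being looked at
    (vec3s) {{ 0.0f, 1.0f, 0.0f }}); // up: +y is up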

In the gif below, the free-roam camera is implemented by changing the camera’s position in response to keyboard and mouse input. You can observe that as the camera pans around the cube, only the view matrix changes.

view

Projection Matrix

The view matrix defines the location of the observer and where they are looking, but that’s not enough to convert their line of sight into a 2D image. You also need to know how wide their field of view is, how close an object can come to their nose before it drops out of sight, and how far into the horizon they can see.

These are all described by the projection matrix. In many graphics libraries you specify a field of view, a distance to a near plane, and a distance to a far plane, and the library uses these to build a frustum extending from the camera’s eye toward the location it’s looking at. I had some trouble with cglm’s implementation, so I opted to hand-roll one instead.

mat4s
projection(float FOV, float AspectRatio, float Near, float Far)
{
    mat4s result = GLMS_MAT4_ZERO;
    // FOV is in degrees: convert half the angle to radians, then take 1/tan
    float Cotangent = 1.0f / tanf(FOV * (GLM_PI / 360.0f));

    result.raw[0][0] = Cotangent / AspectRatio;
    result.raw[1][1] = Cotangent;
    result.raw[2][3] = -1.0f; // copy -z into w for the perspective divide
    result.raw[2][2] = (Near + Far) / (Near - Far);
    result.raw[3][2] = (2.0f * Near * Far) / (Near - Far);
    result.raw[3][3] = 0.0f;

    return result;
}
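A typical call might look like this (the values here are illustrative, and width and height are assumed to hold the window dimensions):

mat4s proj = projection(45.0f,                          // vertical field of view in degrees
                        (float) width / (float) height, // aspect ratio
                        0.1f, 100.0f);                  // near and far plane distances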

The word “frustum” comes from a Latin word meaning a removed slice. It is a 3D shape that is aligned with the observer’s eye and serves to capture the portion of 3D world space which will be rendered to the screen. That is, anything inside the current frustum may be drawn, but anything outside of it will not.

You’ll often see them drawn as a truncated pyramid, such as the yellow frustum in the image below, but they can have other shapes as well.

projection

Drawing the Scene

To form the final matrix transformation that determines what is drawn to the screen, the model, view, and projection matrices are typically multiplied together in the application code, which means those calculations are done by the CPU. If you go digging through math libraries, you’ll see they often use instruction set extensions, where available, to calculate the transformation more quickly and efficiently. Below you can see cglm’s matrix multiplication function, which checks for extensions that can speed up the operation.

/*!
 * @brief multiply m1 and m2 to dest
 *
 * m1, m2 and dest matrices can be same matrix, it is possible to write this:
 *
 * @code
 * mat4 m = GLM_MAT4_IDENTITY_INIT;
 * glm_mat4_mul(m, m, m);
 * @endcode
 *
 * @param[in]  m1   left matrix
 * @param[in]  m2   right matrix
 * @param[out] dest destination matrix
 */
CGLM_INLINE
void
glm_mat4_mul(mat4 m1, mat4 m2, mat4 dest) {
#ifdef __AVX__
  glm_mat4_mul_avx(m1, m2, dest);
#elif defined( __SSE__ ) || defined( __SSE2__ )
  glm_mat4_mul_sse2(m1, m2, dest);
#elif defined(CGLM_NEON_FP)
  glm_mat4_mul_neon(m1, m2, dest);
#else
  float a00 = m1[0][0], a01 = m1[0][1], a02 = m1[0][2], a03 = m1[0][3],
        a10 = m1[1][0], a11 = m1[1][1], a12 = m1[1][2], a13 = m1[1][3],
        a20 = m1[2][0], a21 = m1[2][1], a22 = m1[2][2], a23 = m1[2][3],
        a30 = m1[3][0], a31 = m1[3][1], a32 = m1[3][2], a33 = m1[3][3],

        b00 = m2[0][0], b01 = m2[0][1], b02 = m2[0][2], b03 = m2[0][3],
        b10 = m2[1][0], b11 = m2[1][1], b12 = m2[1][2], b13 = m2[1][3],
        b20 = m2[2][0], b21 = m2[2][1], b22 = m2[2][2], b23 = m2[2][3],
        b30 = m2[3][0], b31 = m2[3][1], b32 = m2[3][2], b33 = m2[3][3];

  dest[0][0] = a00 * b00 + a10 * b01 + a20 * b02 + a30 * b03;
  dest[0][1] = a01 * b00 + a11 * b01 + a21 * b02 + a31 * b03;
  dest[0][2] = a02 * b00 + a12 * b01 + a22 * b02 + a32 * b03;
  dest[0][3] = a03 * b00 + a13 * b01 + a23 * b02 + a33 * b03;
  dest[1][0] = a00 * b10 + a10 * b11 + a20 * b12 + a30 * b13;
  dest[1][1] = a01 * b10 + a11 * b11 + a21 * b12 + a31 * b13;
  dest[1][2] = a02 * b10 + a12 * b11 + a22 * b12 + a32 * b13;
  dest[1][3] = a03 * b10 + a13 * b11 + a23 * b12 + a33 * b13;
  dest[2][0] = a00 * b20 + a10 * b21 + a20 * b22 + a30 * b23;
  dest[2][1] = a01 * b20 + a11 * b21 + a21 * b22 + a31 * b23;
  dest[2][2] = a02 * b20 + a12 * b21 + a22 * b22 + a32 * b23;
  dest[2][3] = a03 * b20 + a13 * b21 + a23 * b22 + a33 * b23;
  dest[3][0] = a00 * b30 + a10 * b31 + a20 * b32 + a30 * b33;
  dest[3][1] = a01 * b30 + a11 * b31 + a21 * b32 + a31 * b33;
  dest[3][2] = a02 * b30 + a12 * b31 + a22 * b32 + a32 * b33;
  dest[3][3] = a03 * b30 + a13 * b31 + a23 * b32 + a33 * b33;
#endif
}
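Composing the three matrices in application code might then look like this (a sketch using cglm’s struct API; the variable names are illustrative):

// note the order: a vertex is transformed by the model matrix first,
// then by the view matrix, then by the projection matrix
mat4s mvp = glms_mat4_mul(proj, glms_mat4_mul(view, model));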

The final model-view-projection matrix constitutes a function that is applied uniformly to every vertex on a model. Across all the models in a scene there are an enormous number of vertices, so doing all this work serially on the CPU would be extremely slow. Instead, graphics cards contain hardware-level acceleration for highly parallelized matrix multiplication.

The programmer makes use of it by creating a shader program, such as the one below, that sits in GPU memory and is executed on the GPU every frame.

static const char
frustum_vert_shader[] =
    "precision mediump float;"
    "uniform mat4 mvp;"
    "attribute vec4 a_position;"
    "attribute vec4 a_color;"
    "varying vec4 v_color;"
    "void main()"
    "{"
    "gl_Position = mvp * a_position;"
    "v_color = a_color;"
    "}";

You might notice that technically we could pass three (or any number of) mat4s to the GPU and multiply them there instead of on the CPU. But that would not be an effective use of resources: the shader program is executed once per vertex, so you would be recalculating the same matrix product many times over.

Conclusion

I hope this was a helpful step toward understanding how 3D rendering works.

If you would like to learn more, I’ve found the following resources to be helpful:

If you have anything to improve or add, shoot an email to my public inbox! Thanks for reading.

Thanks to Sam Cho and Stephen Eskin for helping with editing.