Meta Quest headsets let you write 3D programs in C or C++, using OpenGL ES or Vulkan as a rendering backend, and deploy them using the XR Mobile SDK.
I bought a Quest 3 at launch, and I've benefited a lot from studying the SDK and experimenting with the sample programs:
- It gave me a thorough understanding of the OpenXR specification, from its design intent to the low-level details of how it is implemented on the hardware.
- It significantly improved my understanding of OpenGL ES. I've gotten by with much less knowledge when writing other 3D programs in the past, but it was impossible for me to modify the SDK examples and get them to actually run correctly without filling in many knowledge gaps.
- I learned a lot about how to structure programs in C and C-flavored C++ from the simple & readable SDK example programs, which were written by extremely talented graphics programmers.
- It was a lot of fun! The best projects are the ones you're motivated to work on, and bringing your programs to life in VR is such a cool experience.
Given how powerful these devices are, and how much fun they are to hack on, it's kind of crazy to me how little community or discourse there is online surrounding them.
In this blog post I'll introduce how developing on Quest devices works. I'll touch on the inner workings of the SDK, cover the structure of sample projects, and go over a small set of changes I made to one of them.
Meta XR Mobile SDK
The SDK includes a collection of small programs, each consisting of an NDK project that focuses on a particular capability of the device.
In total, you'll find 18 such projects in the SDK.
For people who are curious about VR development, enjoy low-level programming, and have at least glanced at the core topics covered by OpenGL tutorials that are floating around online (1,2), these devices are a slam dunk for hands-on learning.
Meta Quest = Android + Beefy GPU + Sensors
Meta Quest devices run a modified version of the Android operating system. To understand how Quest devices work, I think it helps to first understand how graphics work in the Android ecosystem at large.
There are three ways you can draw to the screen in Android-land: with the Canvas API, OpenGL ES, and Vulkan.
The Canvas API is built upon the Skia Graphics Library, a cross-platform rendering engine. Android programs that use the Canvas API typically use the official SDK or a cross-platform toolkit like Flutter.
Such toolkits provide pre-built UI components and handle basic 2D rendering: simple graphics, text, and basic animations.
Instead of using the Canvas, you can opt to draw by calling directly into the OpenGL ES or Vulkan runtimes on the device. If you want this level of control over drawing, you probably want to use the "Native Development Kit", or NDK.
Unlike the traditional SDK, the NDK has you write application code in C or C++ and bridge to it with a very small amount of boilerplate: basically a single Java entry point and a handful of config files. This is the approach taken by the XR Mobile SDK.
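If you haven't seen this style of project before, the shape of it is: a thin Java activity loads your native shared library, and the NDK's glue layer hands control to a native entry point where everything else happens. Here's a minimal sketch of that native side, following the general NDK pattern rather than quoting the SDK:
#include <android_native_app_glue.h>

// native_app_glue calls this once the Java side has started up; from here on
// everything is plain C/C++.
void android_main(struct android_app* app) {
    // 1. initialize EGL and create an OpenGL ES context
    // 2. create an OpenXR instance, session, and swapchains
    while (app->destroyRequested == 0) {
        // drain Android lifecycle and input events...
        int events = 0;
        android_poll_source* source = nullptr;
        while (ALooper_pollAll(0, nullptr, &events, reinterpret_cast<void**>(&source)) >= 0) {
            if (source != nullptr) {
                source->process(app, source);
            }
        }
        // ...then pump OpenXR events and render the next frame
    }
}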
Minimal Complexity
The example projects make minimal use of C++ language features. You won't find elaborate abstractions, excessive indirection, or extensive use of templates.
For each sample project, you get the impression that the developer knew exactly what they wanted to make and just wrote out a common sense implementation of that thing. Older projects are largely self-contained, while later ones started to factor out re-used code into libraries external to their respective projects.
The resulting code mostly takes the form of structs and functions that favor clarity and readability. You can tell what things do by looking at them.
A benefit of this readability is that you end up building a lot of intuition about "how VR works" just by making superficial observations. Take, for example, the following code from the main render loop of one of the samples.
typedef struct XrPosef {
    XrQuaternionf orientation;
    XrVector3f position;
} XrPosef;

// ...

XrPosef xfLocalFromEye[NUM_EYES];
for (int eye = 0; eye < NUM_EYES; eye++) {
    // LOG_POSE( "viewTransform", &projectionInfo.projections[eye].viewTransform );
    XrPosef xfHeadFromEye = projections[eye].pose;
    XrPosef_Multiply(&xfLocalFromEye[eye], &xfLocalFromHead, &xfHeadFromEye);
    XrPosef xfEyeFromLocal = XrPosef_Inverse(xfLocalFromEye[eye]);
    XrMatrix4x4f viewMat{};
    XrMatrix4x4f_CreateFromRigidTransform(&viewMat, &xfEyeFromLocal);
    const XrFovf fov = projections[eye].fov;
    XrMatrix4x4f projMat;
    XrMatrix4x4f_CreateProjectionFov(&projMat, GRAPHICS_OPENGL_ES, fov, 0.1f, 0.0f);
    frameIn.View[eye] = OvrFromXr(viewMat);
    frameIn.Proj[eye] = OvrFromXr(projMat);
}
It becomes immediately clear when reading the above that the position for each eye is independently tracked, and as you might guess, independent frames are rendered for each eye.
You can verify this by looking at that project's vertex shader (seen below) and noting that each eye gets its own matrix transforms:
#define NUM_VIEWS 2
#define VIEW_ID gl_ViewID_OVR
#extension GL_OVR_multiview2 : require
layout(num_views=NUM_VIEWS) in;
in vec3 vertexPosition;
in vec4 vertexColor;
uniform mat4 ModelMatrix;
uniform vec4 ColorScale;
uniform vec4 ColorBias;
uniform SceneMatrices
{
    uniform mat4 ViewMatrix[NUM_VIEWS];
    uniform mat4 ProjectionMatrix[NUM_VIEWS];
} sm;
out vec4 fragmentColor;
void main()
{
    gl_Position = sm.ProjectionMatrix[VIEW_ID] * ( sm.ViewMatrix[VIEW_ID] * ( ModelMatrix * vec4( vertexPosition, 1.0 ) ) );
    fragmentColor = vertexColor * ColorScale + ColorBias;
}
In hindsight, it should be obvious that this is something you do for VR, but I hadn't really thought about it. I find that the code conveys the concept more clearly and completely than words.
Structure of Sample Projects
Below is an overview of the representative files of a sample project. To build one of the sample projects, grab a copy of Android Studio Bumblebee (2021.1.1) or earlier and open its settings.gradle. I ran projects from that IDE and worked on them in a separate text editor.
XrVirtualKeyboard
├── Projects
│   └── Android
│       ├── AndroidManifest.xml
│       ├── build
│       ├── build.bat
│       ├── build.gradle
│       ├── build.py
│       ├── jni
│       │   ├── Android.mk
│       │   └── Application.mk
│       ├── local.properties
│       └── settings.gradle
├── Src
│   ├── VirtualKeyboardModelRenderer.cpp
│   ├── VirtualKeyboardModelRenderer.h
│   ├── XrHandHelper.h
│   ├── XrHelper.h
│   ├── XrRenderModelHelper.cpp
│   ├── XrRenderModelHelper.h
│   ├── XrVirtualKeyboardHelper.cpp
│   ├── XrVirtualKeyboardHelper.h
│   └── main.cpp
├── assets
│   └── panel.ktx
├── java
│   └── MainActivity.java
└── res
    └── values
        └── strings.xml
Android.mk is the makefile you'll be most interested in: when you add source files or libraries, this is the file you'll be changing. These makefiles generally look like this:
LOCAL_PATH := $(call my-dir)
include $(CLEAR_VARS)
LOCAL_MODULE := xrvirtualkeyboard
include ../../../../cflags.mk
LOCAL_C_INCLUDES := \
    $(LOCAL_PATH)/../../../../../SampleCommon/Src \
    $(LOCAL_PATH)/../../../../../SampleXrFramework/Src \
    $(LOCAL_PATH)/../../../../../1stParty/OVR/Include \
    $(LOCAL_PATH)/../../../../../1stParty/utilities/include \
    $(LOCAL_PATH)/../../../../../3rdParty/stb/src \
    $(LOCAL_PATH)/../../../../../3rdParty/khronos/openxr/OpenXR-SDK/include \
    $(LOCAL_PATH)/../../../../../3rdParty/khronos/openxr/OpenXR-SDK/src/common
#
LOCAL_SRC_FILES := ../../../Src/main.cpp \
    ../../../Src/VirtualKeyboardModelRenderer.cpp \
    ../../../Src/XrRenderModelHelper.cpp \
    ../../../Src/XrVirtualKeyboardHelper.cpp \
# include default libraries
LOCAL_LDLIBS := -llog -landroid -lGLESv3 -lEGL
LOCAL_STATIC_LIBRARIES := samplexrframework
LOCAL_SHARED_LIBRARIES := openxr_loader
include $(BUILD_SHARED_LIBRARY)
$(call import-module,SampleXrFramework/Projects/Android/jni)
$(call import-module,OpenXR/Projects/AndroidPrebuilt/jni)
The entry point of each project performs OpenGL and OpenXR initialization before entering the event loop.
Graphics initialization is done with EGL in the same manner as on other embedded systems (or the desktop):
struct ovrEgl {
    void Clear();
    void CreateContext(const ovrEgl* shareEgl);
    void DestroyContext();
#if defined(XR_USE_GRAPHICS_API_OPENGL_ES)
    EGLint MajorVersion;
    EGLint MinorVersion;
    EGLDisplay Display;
    EGLConfig Config;
    EGLSurface TinySurface;
    EGLSurface MainSurface;
    EGLContext Context;
#elif defined(XR_USE_GRAPHICS_API_OPENGL)
    HDC hDC;
    HGLRC hGLRC;
#endif //
};

// ...

#if defined(XR_USE_GRAPHICS_API_OPENGL_ES)
void ovrEgl::CreateContext(const ovrEgl* shareEgl) {
    if (Display != 0) {
        return;
    }
    Display = eglGetDisplay(EGL_DEFAULT_DISPLAY);
    ALOGV(" eglInitialize( Display, &MajorVersion, &MinorVersion )");
    eglInitialize(Display, &MajorVersion, &MinorVersion);
    // Do NOT use eglChooseConfig, because the Android EGL code pushes in multisample
    // flags in eglChooseConfig if the user has selected the "force 4x MSAA" option in
    // settings, and that is completely wasted for our warp target.
    const int MAX_CONFIGS = 1024;
    EGLConfig configs[MAX_CONFIGS];
    EGLint numConfigs = 0;
    if (eglGetConfigs(Display, configs, MAX_CONFIGS, &numConfigs) == EGL_FALSE) {
        ALOGE(" eglGetConfigs() failed: %s", EglErrorString(eglGetError()));
        return;
    }
    // blah blah
Logging is done with a macro that delegates to the NDK's logging facilities when the code is running on Android, and otherwise falls back to printf.
ALOGV("Creating passthrough layer");
OpenGL calls use the common idiom of being wrapped in a macro that executes the function, then checks the global error flag and prints an error message if one was set.
GL(glBindBuffer(GL_ARRAY_BUFFER, VertexBuffer));
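Expanded, that idiom amounts to something like the following; again, a sketch of the pattern rather than the SDK's exact macro:
// run the call, then report any error flag it raised
#define GL(func)                                               \
    do {                                                       \
        func;                                                  \
        GLenum error = glGetError();                           \
        if (error != GL_NO_ERROR) {                            \
            ALOGE("GL error 0x%x after %s", error, #func);     \
        }                                                      \
    } while (0)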
Code that calls into OpenXR functionality works similarly to the OpenGL code in a few ways. OXR is an error-checking macro that behaves in the same manner as GL. And as with OpenGL, functions from the OpenXR specification are dynamically linked at runtime. This means you won't find their implementations in the SDK; they live in the OpenXR runtime that ships on the device, outside of your program, and are available to every application.
OXR(xrLocateViews(
    app.Session,
    &projectionInfo,
    &viewState,
    projectionCapacityInput,
    &projectionCountOutput,
    projections));
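The same trick works for OpenXR by keying off the returned XrResult instead of glGetError; once more, a sketch of the pattern rather than the SDK's definition:
// check the XrResult and log failures with the call text and numeric code
#define OXR(func)                                                          \
    do {                                                                   \
        XrResult result = (func);                                          \
        if (XR_FAILED(result)) {                                           \
            ALOGE("OpenXR error: %s returned %d", #func, (int)result);     \
        }                                                                  \
    } while (0)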
Making Modifications to Sample Projects
Some codebases can be very daunting, and it can take a long time before you feel comfortable making changes.
In this case, after skimming the code for a few of the samples, I was able to dive in and start poking at stuff.
One of the first things I did was to modify the spatial anchors project shown above to change the dimensions of the anchors and to draw a texture on one side of the cube.
Here's what the changes to make this work look like.
The dimensions and colors were single-line changes in the event loop:
- persistedCube.Model *= OVR::Matrix4f::Scaling(0.01f, 0.01f, 0.05f);
+ persistedCube.Model *= OVR::Matrix4f::Scaling(0.05f, 0.05f, 0.005f);
- persistedCube.ColorBias = OVR::Vector4f(1, 0.5, 0, 1); // Orange
+ persistedCube.ColorBias = OVR::Vector4f(0.3, 0.2, 0.1, 1); // Brown
I dropped stb_image.h into the project and added some code that loads textures from images stored on the device and maintains a pool of the loaded textures, which is cycled through during rendering.
#include "TexturePool.h"
#include <android/log.h>
#define STB_IMAGE_IMPLEMENTATION
#include "stb_image.h"
#include <android/asset_manager_jni.h>
#include <android/asset_manager.h>
std::vector<std::string> texture_filenames = {"NASA1.jpg", "NASA2.jpg", "NASA3.jpg", "NASA4.jpg", "NASA5.jpg", "NASA6.jpg", "NASA7.jpg", "NASA8.jpg" };
size_t texture_pointer = 0;
std::vector<GLuint> texture_ids;
// Load each image out of the APK's assets and upload it as a GL texture.
void texture_pool_init(std::vector<std::string> filenames, AAssetManager *amgr) {
    for (std::string filename : filenames) {
        GLuint texture_id;
        GL(glGenTextures(1, &texture_id));
        texture_ids.push_back(texture_id);
        GL(glBindTexture(GL_TEXTURE_2D, texture_id));
        GL(glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER,
                           GL_LINEAR_MIPMAP_LINEAR));
        GL(glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR));
        GL(glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_REPEAT));
        GL(glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_REPEAT));
        GLenum format;
        AAsset* asset = AAssetManager_open(amgr, filename.c_str(), AASSET_MODE_BUFFER);
        if (asset == NULL) {
            ALOGE("Failed to open %s", filename.c_str());
            continue; // skip this file rather than dereferencing a null asset
        } else {
            ALOGV("opened file %s", filename.c_str());
        }
        int width, height, channels;
        stbi_set_flip_vertically_on_load(true);
        void *data = stbi_load_from_memory((unsigned char*)AAsset_getBuffer(asset), AAsset_getLength(asset), &width, &height, &channels, 0);
        AAsset_close(asset);
        if (data == NULL) {
            ALOGE("Failed to load %s", filename.c_str());
            continue; // nothing to upload for this texture
        } else {
            ALOGV("loaded file %s", filename.c_str());
        }
        if (channels == 3) {
            format = GL_RGB;
        } else if (channels == 4) {
            format = GL_RGBA;
        } else {
            fprintf(stderr, "Unsupported number of channels: %d\n", channels);
            exit(EXIT_FAILURE);
        }
        // the internal format needs to match the pixel data's format (GL_RGB vs GL_RGBA)
        GL(glTexImage2D(GL_TEXTURE_2D, 0, format, width, height, 0, format, GL_UNSIGNED_BYTE,
                        data));
        GL(glGenerateMipmap(GL_TEXTURE_2D));
        stbi_image_free(data);
    }
}

// reset at the start of each frame to keep the same rendering order
void reset_texture_pointer() {
    ALOGV("resetting texture from %zu to 0", texture_pointer);
    texture_pointer = 0;
}

// if more textures are requested than are available, wrap around
GLuint next_texture() {
    size_t last_texture_idx = texture_pointer;
    size_t next_texture_idx = (texture_pointer + 1) % texture_ids.size();
    texture_pointer = next_texture_idx;
    return texture_ids[last_texture_idx];
}
I added UVs to the stored vertices and split the indices into two groups: one to be rendered with the existing color shaders, and another to be rendered with a new set of shaders that use textures.
void ovrGeometry::CreateCube()
{
    struct ovrCubeVertices
    {
        signed char positions[8][4];
        unsigned char colors[8][4];
        float uvs[8][2];
    };

    static const ovrCubeVertices cubeVertices = {
        // positions
        {
            {-127, -127, -127, +127},
            {+127, -127, -127, +127},
            {-127, +127, -127, +127},
            {+127, +127, -127, +127},
            {-127, -127, +127, +127},
            {+127, -127, +127, +127},
            {-127, +127, +127, +127},
            {+127, +127, +127, +127}},
        // colors
        {
            {0x00, 0x00, 0x00, 0xff},
            {0xff, 0x00, 0x00, 0xff},
            {0x00, 0xff, 0x00, 0xff},
            {0xff, 0xff, 0x00, 0xff},
            {0x00, 0x00, 0xff, 0xff},
            {0xff, 0x00, 0xff, 0xff},
            {0x00, 0xff, 0xff, 0xff},
            {0xff, 0xff, 0xff, 0xff}},
        // uvs
        {
            {0.0f, 0.0f}, // Bottom back left corner
            {1.0f, 0.0f}, // Bottom back right corner
            {0.0f, 1.0f}, // Top back left corner
            {1.0f, 1.0f}, // Top back right corner
            {0.0f, 0.0f}, // Bottom front left corner
            {1.0f, 0.0f}, // Bottom front right corner
            {0.0f, 1.0f}, // Top front left corner
            {1.0f, 1.0f}, // Top front right corner
        }
    };

    static const unsigned short cube_indices[30] = {
        0, 2, 1, 2, 3, 1, // back
        6, 7, 2, 2, 7, 3, // top
        4, 0, 5, 5, 0, 1, // bottom
        0, 4, 2, 2, 4, 6, // left
        5, 1, 7, 7, 1, 3  // right
    };

    static const unsigned short tex_indices[6] = {
        4, 5, 6, 6, 5, 7, // front
    };

    VertexCount = 8;
    ColorIndexCount = 30;
    TextureIndexCount = 6;
    IndexCount = 0;

    VertexAttribs[0].Index = VERTEX_ATTRIBUTE_LOCATION_POSITION;
    VertexAttribs[0].Size = 4;
    VertexAttribs[0].Type = GL_BYTE;
    VertexAttribs[0].Normalized = true;
    VertexAttribs[0].Stride = sizeof(cubeVertices.positions[0]);
    VertexAttribs[0].Pointer = (const GLvoid *)offsetof(ovrCubeVertices, positions);

    VertexAttribs[1].Index = VERTEX_ATTRIBUTE_LOCATION_COLOR;
    VertexAttribs[1].Size = 4;
    VertexAttribs[1].Type = GL_UNSIGNED_BYTE;
    VertexAttribs[1].Normalized = true;
    VertexAttribs[1].Stride = sizeof(cubeVertices.colors[0]);
    VertexAttribs[1].Pointer = (const GLvoid *)offsetof(ovrCubeVertices, colors);

    VertexAttribs[2].Index = VERTEX_ATTRIBUTE_LOCATION_UV; // UV attribute
    VertexAttribs[2].Size = 2;
    VertexAttribs[2].Type = GL_FLOAT;
    VertexAttribs[2].Normalized = false;
    VertexAttribs[2].Stride = sizeof(cubeVertices.uvs[0]);
    VertexAttribs[2].Pointer = (const GLvoid *)offsetof(ovrCubeVertices, uvs);

    GL(glGenBuffers(1, &VertexBuffer));
    GL(glBindBuffer(GL_ARRAY_BUFFER, VertexBuffer));
    GL(glBufferData(GL_ARRAY_BUFFER, sizeof(cubeVertices), &cubeVertices, GL_STATIC_DRAW));
    GL(glBindBuffer(GL_ARRAY_BUFFER, 0));

    GL(glGenBuffers(1, &ColorIndexBuffer));
    GL(glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ColorIndexBuffer));
    GL(glBufferData(GL_ELEMENT_ARRAY_BUFFER, sizeof(cube_indices), cube_indices, GL_STATIC_DRAW));
    GL(glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, 0));

    GL(glGenBuffers(1, &TextureIndexBuffer));
    GL(glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, TextureIndexBuffer));
    GL(glBufferData(GL_ELEMENT_ARRAY_BUFFER, sizeof(tex_indices), tex_indices, GL_STATIC_DRAW));
    GL(glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, 0));
}
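Those VertexAttribs entries get consumed when the vertex array object is built; the gist is a loop along these lines (a sketch of the pattern, not the sample's exact VAO-creation code):
GL(glGenVertexArrays(1, &VertexArrayObject));
GL(glBindVertexArray(VertexArrayObject));
GL(glBindBuffer(GL_ARRAY_BUFFER, VertexBuffer));
// enable each of the three attributes set up above and point it at its
// region of the struct-of-arrays vertex buffer
for (int i = 0; i < 3; i++) {
    GL(glEnableVertexAttribArray(VertexAttribs[i].Index));
    GL(glVertexAttribPointer(
        VertexAttribs[i].Index,
        VertexAttribs[i].Size,
        VertexAttribs[i].Type,
        VertexAttribs[i].Normalized,
        VertexAttribs[i].Stride,
        VertexAttribs[i].Pointer));
}
GL(glBindVertexArray(0));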
I re-used the existing color shaders and added new ones for the texture face:
static const char CUBE_TEX_VERTEX_SHADER[] =
    R"GLSL(
#define NUM_VIEWS 2
#define VIEW_ID gl_ViewID_OVR
#extension GL_OVR_multiview2 : require
layout(num_views=NUM_VIEWS) in;
in vec3 vertexPosition;
in vec2 vertexUv;
uniform mat4 ModelMatrix;
uniform SceneMatrices
{
    uniform mat4 ViewMatrix[NUM_VIEWS];
    uniform mat4 ProjectionMatrix[NUM_VIEWS];
} sm;
out vec2 UV;
void main()
{
    gl_Position = sm.ProjectionMatrix[VIEW_ID] * ( sm.ViewMatrix[VIEW_ID] * ( ModelMatrix * vec4( vertexPosition, 1.0 ) ) );
    UV = vertexUv;
}
)GLSL";

static const char CUBE_TEX_FRAGMENT_SHADER[] =
    R"GLSL(
in vec2 UV;
out vec4 color;
uniform sampler2D Texture0;
void main()
{
    color = vec4(texture( Texture0, UV ).rgb, 1.0);
}
)GLSL";
The other substantive change was a new section in the frame rendering code that loops over the cubes and draws the textured faces:
GL(glUseProgram(Scene.TextureProgram.Program));
GL(glBindBufferBase(
    GL_UNIFORM_BUFFER,
    Scene.TextureProgram.UniformBinding[ovrUniform::Index::SCENE_MATRICES],
    Scene.SceneMatrices));
if (Scene.TextureProgram.UniformLocation[ovrUniform::Index::VIEW_ID] >=
    0) // NOTE: will not be present when multiview path is enabled.
{
    GL(glUniform1i(Scene.TextureProgram.UniformLocation[ovrUniform::Index::VIEW_ID], 0));
}
for (auto c : Scene.CubeData) {
    GL(glBindVertexArray(Scene.Cube.VertexArrayObject));
    GLint loc = Scene.TextureProgram.UniformLocation[ovrUniform::Index::MODEL_MATRIX];
    if (loc >= 0) {
        GL(glUniformMatrix4fv(loc, 1, GL_TRUE, &c.Model.M[0][0]));
    }
    GL(glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, Scene.Cube.TextureIndexBuffer));
    GL(glActiveTexture(GL_TEXTURE0));
    GL(glBindTexture(GL_TEXTURE_2D, c.KaiTextureID));
    GL(glDrawElements(GL_TRIANGLES, Scene.Cube.TextureIndexCount, GL_UNSIGNED_SHORT, nullptr));
}
glBindTexture(GL_TEXTURE_2D, 0);
GL(glBindVertexArray(0));
GL(glUseProgram(0));
What I'm trying to illustrate is that the code you work with in this SDK looks and behaves very much like run-of-the-mill OpenGL rendering code. For the fancy stuff, like per-eye matrix transforms, the sample programs give you a strong foundation to build on rather than leaving you to implement it from scratch.
In addition to the above changes to rendering spatial anchors, I also had fun messing around with XR controls. You call into the OpenXR bindings to do so, but it feels largely the same as dealing with game controller inputs, although these are more varied and much cooler.
There's a lot of overlap between the APIs, libraries, and build systems used in these projects and in the other OpenGL code that has been authored over the last several decades: tutorials, example projects, open-source games, and more. When you read those projects and learn from them, you eventually reach a plateau where it's unclear what to do next to keep learning. The answer is simply to keep getting exposure to more code, and to keep building and thinking critically about what you write. This is the point where developing with Meta Quest becomes extremely appealing.
I found it extremely cool to see how much is accomplished with code this simple. The samples interface with numerous sensors and do 3D rendering on a tight frame budget, maintaining a high frame rate on constrained hardware, all while using the less performant graphics backend for demonstrative purposes (using Vulkan would cause a large portion of the surface area of these programs to be dedicated to setup and bookkeeping).
You probably already know whether you'd enjoy hacking on one. If so, I'd definitely recommend picking one up. Quest 2s are pretty cheap these days, and you've always got the return period if you're not feeling it.