This is the first devlog entry for our upcoming game NANOVOID, a 2D sandbox space shooter with modular editable ships and fun, deep combat.

We're creating the game in our own Rust game engine (based on wgpu), and recently started working on UI. This meant a lot of need for quick iteration, often tweaking values and testing every couple of seconds. Even after we greatly improved our incremental build speeds (now roughly 3 seconds), it's still incredibly painful to tweak small values with the usual compile-run cycle (especially if the UI requires some setup in-game). We currently use the inline_tweak crate, which is great for tweaking individual constants, but it only works for constants and requires them to be wrapped in tweak!(...), which is a bit annoying if one just wants to tweak a random value somewhere while in the middle of the game.

Since our goal with the game was to eventually include modding support with Lua, we decided to take this as an opportunity to start on that a bit earlier, and use Lua for our UI.

Our game engine was initially built around macroquad for rendering, but even as we evolved it to use raw wgpu we kept a very similar immediate mode drawing API. Our main changes were things like introducing a z_index parameter everywhere, changing y-down to y-up, and positioning things based on their center rather than top left. But fundamentally, our drawing API is very simple, and thus very easy to expose in Lua.

Here's an example of the API:

pub fn draw_circle(center: Vec2, r: f32, color: Color, z_index: i32);
pub fn draw_circle_outline(
    center: Vec2,
    radius: f32,
    thickness: f32,
    color: Color,
    z_index: i32,
);
pub fn draw_rect(center: Vec2, size: Vec2, color: Color, z_index: i32);
pub fn draw_rect_rot(
    center: Vec2,
    size: Vec2,
    rotation: f32,
    color: Color,
    z_index: i32,
);
pub fn draw_rect_outline(
    center: Vec2,
    size: Vec2,
    thickness: f32,
    color: Color,
    z_index: i32,
);
pub fn draw_rect_outline_rot(
    center: Vec2,
    size: Vec2,
    rotation: f32,
    thickness: f32,
    color: Color,
    z_index: i32,
);
// ... other shapes done in a similar way

One could use a builder pattern or descriptor structs to avoid having multiple variants of each function, but after a lot of experimentation it seemed that a few functions per shape are the most ergonomic.
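For contrast, here is a minimal sketch of what the builder/descriptor alternative could look like for rectangles. It is not the engine's code: RectDesc, its fields, and the Vec2 and Color stand-ins are all hypothetical names made up for illustration.

```rust
// Hypothetical stand-ins; the real engine uses glam's Vec2 and its own Color.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Vec2 {
    pub x: f32,
    pub y: f32,
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Color {
    pub r: f32,
    pub g: f32,
    pub b: f32,
    pub a: f32,
}

pub const WHITE: Color = Color { r: 1.0, g: 1.0, b: 1.0, a: 1.0 };

/// Hypothetical descriptor covering every rectangle variant in one type.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct RectDesc {
    pub center: Vec2,
    pub size: Vec2,
    pub rotation: f32,
    /// `None` means a filled rectangle, `Some(t)` an outline of thickness `t`.
    pub outline_thickness: Option<f32>,
    pub color: Color,
    pub z_index: i32,
}

impl RectDesc {
    /// Required parameters go in the constructor, the rest get defaults.
    pub fn new(center: Vec2, size: Vec2) -> Self {
        Self {
            center,
            size,
            rotation: 0.0,
            outline_thickness: None,
            color: WHITE,
            z_index: 0,
        }
    }

    pub fn rotation(mut self, radians: f32) -> Self {
        self.rotation = radians;
        self
    }

    pub fn outline(mut self, thickness: f32) -> Self {
        self.outline_thickness = Some(thickness);
        self
    }

    pub fn color(mut self, color: Color) -> Self {
        self.color = color;
        self
    }

    pub fn z_index(mut self, z_index: i32) -> Self {
        self.z_index = z_index;
        self
    }
}
```

A call site would then chain only the options it needs, for example RectDesc::new(center, size).outline(2.0).z_index(200), at the cost of one more type and an extra step to actually submit the descriptor for drawing.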

This translates very easily into Lua (using the mlua crate), as

globals.set(
    "draw_rect",
    lua.create_function(
        |_, (center, size, color, z_index): (LuaVec2, LuaVec2, Color, i32)| {
            draw_rect(center.0, size.0, color, z_index);
            Ok(())
        },
    )?,
)?;

where LuaVec2 is simply a wrapper around glam's Vec2. Rust's orphan rule prevents implementing an externally defined trait for an externally defined type, and mlua wants every type that is passed around to implement its UserData trait. Hence pub struct LuaVec2(pub Vec2) and a large amount of macro boilerplate was born. But none of this is really a significant obstacle, and overall it only took a few hours to expose all of the Vec2 API and all of our drawing functions.
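The newtype workaround can be shown with the standard library alone. In this sketch std::fmt::Display stands in for mlua's UserData and a plain tuple stands in for glam's Vec2; both substitutions are assumptions made so the example compiles without external crates.

```rust
use std::fmt;

// Stand-in for glam's Vec2: a type this crate does not define.
type ForeignVec2 = (f32, f32);

// Writing `impl fmt::Display for ForeignVec2 { ... }` at this point is
// rejected by the compiler (error E0117), because neither the trait nor
// the type is defined in this crate.

/// Local newtype wrapper. The crate owns this type, so it is allowed to
/// implement foreign traits for it.
pub struct LuaVec2(pub ForeignVec2);

impl fmt::Display for LuaVec2 {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let (x, y) = self.0;
        write!(f, "vec2({}, {})", x, y)
    }
}
```

The wrapped value stays reachable through .0, which is why the draw_rect binding above passes center.0 and size.0 to the Rust function.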

One of the first things to try was to rewrite our cursor drawing code in Lua and hot reload this on every frame. By hot reload I of course mean std::fs::read_to_string called on every frame :) For small scripts this is basically free, and saves extra time spent on doing file watching and stuff.
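A minimal sketch of that "reload by re-reading" approach, using only the standard library. The ScriptSource type and the choice to fall back to the last successfully read text are assumptions for illustration (a read may fail while an editor is in the middle of saving); the post does not say how the engine handles that case.

```rust
use std::fs;
use std::path::Path;

/// Hypothetical holder for the most recent script text that was read successfully.
pub struct ScriptSource {
    last_good: String,
}

impl ScriptSource {
    pub fn new() -> Self {
        Self { last_good: String::new() }
    }

    /// Call once per frame. Re-reads the whole file; if the read fails,
    /// the previously loaded text is returned instead.
    pub fn reload(&mut self, path: &Path) -> &str {
        if let Ok(text) = fs::read_to_string(path) {
            self.last_good = text;
        }
        &self.last_good
    }
}
```

The returned string would then be handed to the Lua interpreter each frame, so any edit saved to disk takes effect on the next frame without a file watcher.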

The resulting Lua code in the first iteration was the following:

local z = 200
local r = 16 * px + 5 * px * recoil

draw_circle_outline(mouse_world, r, 2 * px, RED, z)
draw_circle(mouse_world, 3 * px, RED, z)

local rev = clamp_scale(recoil, 0, 1, 0, PI / 2)
draw_revs(mouse_world, r, rev, BLUE, z)

Most of the variables are just globals exposed to mlua for convenience. All of this is ugly, but it works, and in our experience the main thing that matters is not breaking the flow with constant refactoring. Keeping a lot of convenient values in globals (such as mouse position, pixel size, delta, etc) makes this simpler.

globals.set("player", player_data)?;
globals.set("recoil", main_camera().recoil)?;
globals.set("mouse_world", mouse_world().lua())?;
globals.set("time", get_time() as f32)?;
globals.set("dt", delta())?;
globals.set("px", px())?;
globals.set("PI", PI)?;
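The cursor script also calls clamp_scale, which is not defined in this post. Judging only from the call clamp_scale(recoil, 0, 1, 0, PI / 2), it presumably remaps a value from one range to another and clamps the result. Under that assumption, a Rust version might look like the sketch below; the parameter names are invented here and the real helper may behave differently.

```rust
/// Assumed behaviour: linearly map `value` from [in_min, in_max] to
/// [out_min, out_max], clamping values that fall outside the input range.
/// Expects `in_min != in_max`.
pub fn clamp_scale(value: f32, in_min: f32, in_max: f32, out_min: f32, out_max: f32) -> f32 {
    // Position of `value` inside the input range, limited to 0..=1.
    let t = ((value - in_min) / (in_max - in_min)).clamp(0.0, 1.0);
    out_min + t * (out_max - out_min)
}
```

With this reading, a recoil of 0 gives an angle of 0, a recoil of 1 or more gives PI / 2, and values in between scale linearly.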

Now the question is, how fast is this actually?

Performance

Disclaimer: All of these benchmarks are very rough and not scientific, they're meant to illustrate rough costs of things, not to be taken as exact measurements.

A relatively painless way of measuring performance in games is using Tracy and just timing how long certain blocks take. We usually don't care about performance at a very granular level, since at the scale we're working on the stuff that shows up in the profiler is usually obvious, and the stuff that doesn't is so small it doesn't matter.
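The idea of timing a block can be sketched without any profiler. The helper below is a stand-in built on std::time::Instant, not Tracy's API; time_block is a made-up name, and Tracy additionally records named spans on a timeline, which this does not.

```rust
use std::time::{Duration, Instant};

/// Runs `block` and returns its result together with the elapsed wall-clock time.
pub fn time_block<T>(block: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let result = block();
    (result, start.elapsed())
}
```

Wrapping the whole Lua UI update in such a call once per frame and printing the duration gives a single coarse number per frame, which is the granularity used for the measurements that follow.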

The main question was, how many shapes can we draw from Lua until it starts showing up?

As a baseline, the current Lua code runs in around 400us. This isn't amazing, but we're also creating a new Lua interpreter context and defining hundreds of functions and constants on every frame. But this is something we can fix later before shipping the game, and will obviously get faster. What we want to measure though is how long the actual drawing takes, because that might decide how much we do in Lua, how much we do in Rust, and just overall how to think about the costs of this whole thing.

As the very first test, let's just do something stupid like increment the cursor radius in a loop.

-- same as before
local r = 16 * px + 5 * px * recoil

for i = 1, 1000 do
    r = r + 1
end

draw_circle(..., r, ...)

Maybe you're thinking this is stupid and will get optimized away. We can actually check that quite easily, just by running the game with Tracy turned on, timing the whole Lua UI span, and hot reloading the script while changing the number of iterations.

At low numbers we can just see the circle grow bigger, then it gets too big to be visible, then around 10000 iterations something moves a bit in the profiler, at 100000 it roughly doubles/triples to ~800us to 1.2ms, and at 1000000 iterations we're at roughly 5ms. This isn't really meant to be an amazingly accurate benchmark of Lua, but we do see something, and the numbers seem reasonable. We're currently using Lua 5.4; maybe a different backend would yield different numbers, maybe not. Right now we don't really care, all we care about is that we have some numbers that look mostly reasonable.

Next up we can swap the r = r + 1 for the actual drawing, so just copy pasting the draw_circle into the loop, and decreasing the number of iterations because it's likely not going to be a million before this blows up.

for i = 1, 100 do
  draw_circle(mouse_world, 3 * px, RED, z)
end

And we go back to increasing the number. At around 1000 circles things start to slow down and we're at around 1.6ms per frame. At 10000 circles we're at 14ms.

[Image: Tracy profiler screenshot]

Here's a zoomed in view on the spans for individual circles.

[Image: zoomed-in view of the spans for individual circles]

The filled in parts with "circ" on them are the time spent in Rust, the empty spaces in between are time spent in Lua and the interop.

Just for comparison we can try the same with a rectangle, where the time spent in Rust is smaller since the code required to generate a rectangle mesh is noticeably less than a circle. But we're still roughly on the scale of "1000 is probably fine, 10000 is not".

Note that we're not doing anything smart here. Our draw_circle and draw_rect generate a full mesh on each call. In the case of a circle this means going around in 40 segments to build a 40-sided polygon from triangles, and pushing this into a global draw queue.
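To give a feel for the work done per call, here is a minimal sketch of building such a polygon as a triangle fan around the center. This is one common way to do it and is not taken from the engine; circle_mesh, the array-based vertex format, and the fan layout are assumptions.

```rust
use std::f32::consts::TAU;

/// Vertices and triangle indices for a filled regular polygon that
/// approximates a circle. Vertex 0 is the center, the rest lie on the rim.
pub fn circle_mesh(center: [f32; 2], radius: f32, segments: u32) -> (Vec<[f32; 2]>, Vec<u32>) {
    assert!(segments >= 3, "a polygon needs at least 3 sides");

    let mut vertices = Vec::with_capacity(segments as usize + 1);
    let mut indices = Vec::with_capacity(segments as usize * 3);

    vertices.push(center);
    for i in 0..segments {
        let angle = TAU * i as f32 / segments as f32;
        vertices.push([
            center[0] + radius * angle.cos(),
            center[1] + radius * angle.sin(),
        ]);
    }

    // One triangle per segment: center, rim vertex i, next rim vertex
    // (wrapping back to the first rim vertex at the end).
    for i in 0..segments {
        let current = 1 + i;
        let next = 1 + (i + 1) % segments;
        indices.extend_from_slice(&[0, current, next]);
    }

    (vertices, indices)
}
```

With 40 segments this layout yields 41 vertices and 120 indices per circle, plus 40 sine and cosine evaluations, all repeated on every call.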

As a rough conclusion, drawing tens of thousands of shapes per frame from Lua is probably not the best idea, but at least during development we don't have to worry about any of this, and if there is some part of the UI that turns out to actually show up in the profiler, we can just write that as a Rust function and call it from Lua.

Looking closely at Tracy, the "overhead" of a single "draw call" (not a GPU draw call, but calling a drawing function in Rust from Lua code) is roughly 1us. Maybe this is something that needs to be looked at a bit in the future.

What is worth noting is that there's a very measurable difference between doing these two variants

-- Slower
for i = 1, 10000 do
  draw_rect(mouse_world, vec2(5, 5), GREEN, 200)
end

-- Faster
local size = vec2(5, 5)

for i = 1, 10000 do
  draw_rect(mouse_world, size, GREEN, 200)
end

How big of a difference? The first variant takes around 15ms, the second one around 8.5ms, so the faster variant takes only a bit more than half the time. The vec2 function is simply a Vec2::new from glam exposed as the following:

globals.set(
    "vec2",
    lua.create_function(|_, (x, y): (f32, f32)| Ok(LuaVec2(vec2(x, y))))?,
)?;

This is probably a decent indicator that the overhead of calling from Lua to Rust isn't completely free, which is quite interesting, and maybe even though our drawing code is crappy, it doesn't actually matter.
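Taking the two timings at face value, and assuming the only difference between the two loops is the extra vec2 call per iteration, the cost of one such call can be estimated with some quick arithmetic. The helper names below are made up for this sketch.

```rust
/// Average extra cost per call in microseconds, given two total timings
/// in milliseconds that differ by `calls` additional calls.
pub fn per_call_us(slower_ms: f64, faster_ms: f64, calls: u32) -> f64 {
    (slower_ms - faster_ms) * 1000.0 / calls as f64
}

/// Fraction of the slower timing that the faster variant saves.
pub fn relative_saving(slower_ms: f64, faster_ms: f64) -> f64 {
    (slower_ms - faster_ms) / slower_ms
}
```

Plugging in 15ms, 8.5ms and 10000 calls gives (15 - 8.5) * 1000 / 10000 = 0.65us per vec2 call and a saving of roughly 43%. That is in the same ballpark as the roughly 1us per-call overhead read off Tracy earlier, with the same caveat that these are rough numbers.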

Conclusion

Even though this whole experiment took about a day, I must say the results are very interesting. I kind of regret we haven't tried this approach sooner. While Lua isn't the most fun or interesting or comfortable or modern or ergonomic language, it's undeniable how quickly one can iterate with true hot reloading. We did some experiments in the past with other hot-reloadable languages (e.g. Common Lisp), but those had other issues, most notably giving up the time we spent building our engine and comfy drawing APIs, as well as the whole rest of Rust's ecosystem.

One thing that's probably worth investigating further is languages that compile to Lua, such as Fennel. While our modders will likely want to stick to Lua, we don't have to restrict ourselves to making reasonable choices for our own code :)

We're also likely going to start working on exposing egui in Lua in a nice and easy way, since a lot of our tooling and debug UIs are built in egui. While egui is great and has a lot of features, it's anything but fun and comfortable to work with, especially without hot reloading the UI code. It'll be interesting to see how far this can be pushed, especially as one of the big features planned for NANOVOID is the ship editor, which, even though it's currently small, already involves a very non-trivial amount of logic and UI, and will only grow.


If you liked this article consider supporting us by wishlisting NANOVOID on Steam. It helps us a lot with visibility and we really appreciate it!

