Building an entity-component-system in Rust

For as long as I can remember, I’ve been interested in game engines and their internals. My last foray into engine development was writing a (very limited and inefficient) renderer in C# to get my head around Binary Space Partitioning (BSP). More recently, I’ve been interested in learning Rust to implement other parts of an engine using a systems programming language that is more appropriate for the task.

One of my all-time favourite GDC talks is Tim Ford’s 2017 presentation on Overwatch Gameplay architecture and Netcode which talks about Blizzard’s implementation of the Entity Component System (ECS) architecture. Having only been exposed to object-oriented game logic in Valve’s GoldSrc engine - the composition over inheritance approach of ECS appeals greatly to me.

Over the last few weeks, I have been attempting to build my own ECS implementation in Rust. This is only my second project in Rust, so I am definitely still learning, though I am starting to feel more comfortable with the language overall.

Constraints

For this project, I am working within the following constraints:

Separation of concerns - ECS logic should not need to know all the Component types ✅
No external crates ✅
Where possible, Component items should be stored contiguously on the heap ✅
Does not need to be thread safe ✅
Needs to implement Clone ❌ (Relaxed this for now. I think this is possible, but not with automatic derivation)
Avoid unsafe code ✅

Because this is a personal project, I am “optimising for learning” (read: making life hard for myself by design) which informs some of the above constraints. I have also been avoiding referencing other implementations of this pattern in Rust (and there are quite a few), so I can both find all the trip hazards (learn) and explore my own approaches to the problem.

A contrived example

To exercise my API, I am going to simulate a water bucket with a fixed capacity. When the bucket reaches its capacity it magically empties itself. Exciting stuff.

To implement this, I will implement:

FiniteBucketComponent - A component to represent a bucket with fixed capacity
a_system - A cleverly named system that operates on all the FiniteBucketComponents and fills the bucket by 1 every tick.

Strictly speaking the Bucket should be the entity which is composed of a single component like CapacityComponent, but the point here is to build something minimal to exercise my API - and at this point I’ve already exhausted my creative genius.

Side quest: Polymorphism in Rust

Let’s dig a bit deeper into the first constraint on my list - the separation of concern between the ECS implementation and the game code. I would like to be able to use the engine crate across multiple projects without having it need to know all the Component types.

One way to achieve the first of these constraints would be to make the World generic, and have user-defined code provide an enum type. I’m not 100% sure if I can constrain a generic type in rust to be an enum - but in any case I feel like this loosens the API and invites the caller to use the interface in unintended ways.

I also expect that the engine will need to register a number of internal components - and at this point am not sure whether those will live in the same World or a separate one.

To keep my options open, I’ve gone down a path of using a combination of generics and the Any type internally instead. That’s OK, I’m optimising for learning anyway, right?

In Rust, this means we are handling trait objects which do not have a known sizes at compile time (ie: they implement the ?Sized trait).

As many of the foundational types in Rust require objects to implement the Sized trait (that is, they have a known size at compile time), this means we need to lean on Box<T> which is a smart pointer to your trait object on the heap. And in the case that your calling code needs to work with the original type - the code performing a downcast to their actual type needs to happen in a code path that does know the type at compile time.

The first iteration of my component lookup used the following typing:

HashMap<TypeId, Vec<Box<dyn Any>>>

This is not ideal for a couple of reasons:

Whilst a Vec stores values contiguously in memory, its values are smart pointers calling off to other random heap allocations - this is crappy for CPU cache performance
Downcasting to the known type means iterating the vector and performing a downcast on each element individually

How did I end up here? This was not my first tango with Box<T>, but was with Any - and I found myself getting very confused as I kept referencing material written prior to the introduction of dyn keyword in Rust 1.27 that suggested Box<Any> was valid usage (it is not).

Fortunately, shortly after I muddled through everything and got it working - I was quickly able to move to the following, much more sensible typing:

HashMap<TypeId, Box<dyn Any>>

Which probably should have been the obvious solution to begin with. By boxing the Vec<T: Component>:

Component values are now contiguous in memory, which significantly improves performance; and
Code to access this only requires a single downcast and makes this straightforward to work with.

Remember: I’m optimising for learning ;)

Victory! First iteration of the API

After the above yak shaving, my functional API looked like the following:

pub trait Component: Clone + Any {}
pub type EntityId = u32;

// Not actually implemented as a trait - just illustrating the API surface
pub trait WorldState {
    fn new() -> Self;
    fn create_entity(&mut self) -> EntityId;
    fn attach_component<T: Component>(&mut self, entity_id: EntityId, component: T);
    fn borrow_components<T: Component>(&self) -> Vec<&T>;
    fn borrow_components_mut<T: Component>(&mut self) -> Vec<&mut T>;
}

pub struct World {
    entity_id_sequence: EntityId,
    component_storage: HashMap<TypeId, Box<dyn Any>>,
    component_entity: HashMap<(EntityId, TypeId), usize>
}

Looks nice, and it is! Here is the system that exercises the API:

pub fn a_system(world: &mut World, _: u64) {
    for finite_bucket_component in  world.borrow_components_mut::<FiniteBucketComponent>() {
        finite_bucket_component.bucket_level = finite_bucket_component.bucket_level
            .add(1).rem_euclid(finite_bucket_component.bucket_capacity);
    }
}

My bucket overfloweth with smugness.

Okay, so what next?

Well that’s exciting, I suppose. However, systems that model slightly more complex behaviour need to operate on multiple components on the same entity.

Let’s extend the contrived example to include some kind of counter that tracks how many times the bucket has been filled. Instead of adding a field to the existing FiniteBucketComponent - let’s create a second component named, uh, OverflowCounterComponent to store that information.

This will let us extend our API to include methods to look at peer components of another type on the same entity. Which might (does) look something like this (for now):

pub trait WorldState {
	// snip - none of the existing methods have changed
	fn borrow_peer<T: Component, S: Component>(&mut self, peer: &T) -> Option<&S>;
	fn borrow_peer_mut<T: Component, S: Component>(&mut self, peer: &T) -> Option<&mut S>;
}

And we can update our system to look like this:

pub fn a_system(state: &mut World, _: u64) {
    for bucket_component in state.borrow_components_mut::<FiniteBucketComponent>() {
        bucket_component.bucket_level = bucket_component.bucket_level
            .add(1).rem_euclid(bucket_component.bucket_capacity);

        if let Some(counter_component) = state.borrow_peer_mut::<FiniteBucketComponent, OverflowCounterComponent>(bucket_component) {
            counter_component.counter = counter_component.counter.wrapping_add(1);
        }
    }
}

And… bugger. Our old friend the borrow checker disagrees with this:

error[E0499]: cannot borrow `*state` as mutable more than once at a time
  --> game/src/game.rs:38:42
   |
35 |     for bucket_component in state.borrow_components_mut::<FiniteBucketComponent>() {
   |                             ------------------------------------------------------
   |                             |
   |                             first mutable borrow occurs here
   |                             first borrow later used here
...
38 |         if let Some(counter_component) = state.borrow_peer_mut::<FiniteBucketComponent, OverflowCounterComponent>(bucket_component) {
   |                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ second mutable borrow occurs here

An extremely common error for a new Rust developer - but the first time I have tried this access pattern of re-borrowing in an iterator in this way.

Now my bucket doth not overflow at all :(

Options to resolve

Write idiomatic Rust - this feels like it must be a fairly common pattern, and I’m confident I’m missing something common/obvious here.
Lifetimes - Am I able to express the lifetime of the method calls in such a way to suggest the struct outlives the return values? I’m still not at the point with Rust where I can mentally reason about lifetimes effectively. This feels like it is probably the correct path.
Interior mutability - I am comfortable with this pattern from my first project, this would solve the problem by ~~cheating~~ putting everything behind immutable references.

If the only tool you have is a hammer…

The logic for my engine is only intended to operate in a single thread which will clone the previous World, mutate the state, then only share access via immutable references.

So the current API looks and smells like idiomatic Rust (because it is) and I like it this way.

If you are going to modify data - you need to hold the mutable reference to it. All this is guaranteed by the borrow checker and my API surface feels like the standard library.

Buuuuuuuuuut, having implemented interior mutability in my first Rust project - I know that it would give me a way around this problem, even if there’s a simpler way. So let’s ignore that ominous music playing in the background and take a shortcut.

Without further ado, the updated API:

pub trait Component: Clone + Any {}
pub type EntityId = u32;

// Not actually implemented as a trait - just illustrating the API surface
pub trait WorldState {
    fn new() -> Self;
    fn finalise(&self);
    fn create_entity(&self) -> EntityId;
    fn attach_component<T: Component>(&self, entity_id: EntityId, component: T);
    fn borrow_components<T: Component>(&self) -> Vec<&T>;
    fn borrow_components_mut<T: Component>(&self) -> Vec<&mut T>;
    fn borrow_peer<T: Component, S: Component>(&self, peer: &T) -> Option<&S>;
    fn borrow_peer_mut<T: Component, S: Component>(&self, peer: &T) -> Option<&mut S>;
}

pub struct World {
    finalised: Mutex<bool>,
    entity_id_sequence: Mutex<EntityId>,
    component_storage: RwLock<HashMap<TypeId, Box<dyn Any>>>,
    component_entity: RwLock<HashMap<(EntityId, TypeId), usize>>
}

All our write paths only take &self and we use Mutex and RwLock primitives for implementing interior mutability.

Note the introduction of finalise() and it’s corresponding state - interior mutability means any type of reference to this value can call mutate it, so we provide a one way flag to prevent future modification. Vomit.

Don’t worry though, this is where all the wheels have fallen off my wagon. The semantics of RwLock don’t seem to be helping me here - the intermediate LockResult gets bound locally to the function scope and causes a different error from our old friend:

error[E0515]: cannot return value referencing temporary value
   --> game/src/engine/ecs.rs:135:17
    |
133 |         match self.component_storage.write().unwrap().entry(TypeId::of::<T>()) {
    |               --------------------------------------- temporary value created here
134 |             Entry::Occupied(entry) => {
135 |                 entry.into_mut().downcast_mut::<Vec<T>>().unwrap().iter_mut().collect()
    |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ returns a value referencing data owned by the current function

I have hit this issue in this section of code before, but that was because earlier iterations of the code were constructing intermediate Vectors and not returning ownership of them to the caller.

Those issues were easily fixed by collecting references into a new Vector and returning ownership of that to the caller (though this violates my constraints, can you see it? I think this might be trivial to fix though).

But at this point, I’m a little stumped. I don’t see a way to make this more complex case of interior mutability work. I don’t want to make it work. I want to go back to my list of options above and figure out the idiomatic Rust way of doing this.

And that’s where this post started and ends for now - an exercise in rubberducking some problems I’m having. Hopefully this has helped collect my thoughts to isolate the problem sections before asking some help from the community, but who can tell at this stage.

And yet, despite all this, I close by noting how very happy with Rust as a language - the borrow checker has simultaneously given me the guardrails and confidence to write lower level code from scratch, and taken its toll on my sanity.

I’m a convert.