Skip to content

Remove sparse set in Table storage #14928

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

cBournhonesque
Copy link
Contributor

@cBournhonesque cBournhonesque commented Aug 26, 2024

Objective

We need to remove any SparseSet<ComponentId> in preparation for relations, because we will have components with very high id values once relations are added. (Because relations have a ComponentId in the high bits of the u64 id)

Part 1 was removing the FixedBitSet in Access: #14385
Part 2 is done in this PR: removing the ImmutableSparseSet<ComponentId> in Storage::Table that lets you access the Column corresponding to a given ComponentId.

This PR replaces it with a Vec<ComponentId> and Vec<Column> which are sorted in the same order.
We use the ComponentIndex (which maps from a ComponentId to the list of archetypes that contain this component, along with extra metadata including the Column index containing the ComponentId) to retrieve the Column corresponding to a given ComponentId.

TODO:

  • run benchmarks

@alice-i-cecile alice-i-cecile added A-ECS Entities, components, systems, and events C-Performance A change motivated by improving speed, memory usage or compile times S-Needs-Benchmarking This set of changes needs performance benchmarking to double-check that they help S-Waiting-on-Author The author needs to make changes or address concerns before this can be merged labels Aug 26, 2024
@cBournhonesque cBournhonesque marked this pull request as ready for review August 26, 2024 21:53
@cart
Copy link
Member

cart commented Aug 26, 2024

I believe the alternative to this is hierarchical sparse sets to cut down on allocation size while still preserving a lot of the benefits of bitsets.

@SanderMertens how does Flecs solve this problem?

@james-j-obrien
Copy link
Contributor

james-j-obrien commented Aug 26, 2024

Sander can correct me if I'm mistaken but this is similar to the approach flecs uses. Tables in flecs store a vector of component ids and several other vectors for mapping the type index to the column (code simplified for brevity):

struct ecs_table_t {
    uint64_t id;                     /* Table id in sparse set */
    int16_t column_count;            /* Number of components (excluding tags) */
    ecs_type_t type;                 /* Vector with component ids */

    ecs_data_t data;                 /* Component storage */
    
    int16_t *component_map;          /* Get column for component id */
    int16_t *column_map;             /* Map type index <-> column
                                      *  - 0..count(T):        type index -> column
                                      *  - count(T)..count(C): column -> type index
                                      */
};

When looking up the column flecs either uses a dense array for low component ids (a class of optimisation that we implicitly always use via sparse sets, we could also split the code paths for perf) or goes through the id index similar to this PR:

int32_t ecs_table_get_column_index(
    const ecs_world_t *world,
    const ecs_table_t *table,
    ecs_id_t id)
{
    if (id < FLECS_HI_COMPONENT_ID) {
        int16_t res = table->component_map[id];
        if (res > 0) {
            return res - 1;
        }
        return -1;
    }

    ecs_id_record_t *idr = flecs_id_record_get(world, id);
    if (!idr) {
        return -1;
    }

    ecs_table_record_t *tr = flecs_id_record_get_table(idr, table);
    if (!tr) {
        return -1;
    }

    return tr->column;
}

@SanderMertens
Copy link

SanderMertens commented Aug 27, 2024

@cart @james-j-obrien's post covers the gist of it. There are a few other details that impact perf:

  • A low id range is reserved for components, so the chance that a regular component hits the dense component_map array is pretty high. I'm guessing this is automatically the case in Bevy, but might change if/when component ids switch to entities
  • The element in the component_map is -type_index if it's a tag. This is different from the column index, which is the index in the array of table columns (see below).
  • The component index fallback is mostly for relationship pairs, as up to 256 ids are reserved for components and not many games will go beyond that
  • The column_map and column_count members are part of flecs' ZST optimization where the column array only contains element for components, not tags. Those aren't really relevant for this PR
  • The id record lookup (which is the component index) has a similar branch, where low component ids are looked up with a dense array (of 65k elements), with high ids (almost exclusively pairs) are looked up in a hashmap
  • The table record lookup is a hashmap

@@ -1250,6 +1313,7 @@ unsafe impl<'__w, T: Component> ReadOnlyQueryData for Ref<'__w, T> {}

#[doc(hidden)]
pub struct WriteFetch<'w, T> {
component_index: &'w ComponentIndex,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of Fetch types include a &ComponentIndex now. Would it make sense to store it once in the QueryIter and pass it in to set_table? That would let you avoid changing the Fetch types. You'd store one extra copy for the rare query that never needs it, but save copies for the common queries that access multiple components.

(Alternately, should the Fetch types be doing the first lookup by ComponentId and storing a &'w HashMap<ArchetypeId, ArchetypeRecord>?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the end-goal is to have the query cache the column per archetype, so maybe what you described by

(Alternately, should the Fetch types be doing the first lookup by ComponentId and storing a &'w HashMap<ArchetypeId, ArchetypeRecord>?)

cc @james-j-obrien

fetch
.component_index
.get_column_index(id, archetype_id)
.is_some_and(|column_index| table.has_column(column_index))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can has_column ever fail? It looks like get_column_index would return None rather than an out-of-range value, so you can remove the has_column() method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch!

@@ -961,7 +961,10 @@ impl<'w> EntityWorldMut<'w> {
// matches
let result = unsafe {
T::from_components(storages, &mut |storages| {
// TODO: how to fix borrow-checker issues here?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you already fix this?

@@ -669,7 +673,8 @@ impl TableBuilder {
/// [`Component`]: crate::component::Component
/// [`World`]: crate::world::World
pub struct Table {
columns: ImmutableSparseSet<ComponentId, Column>,
columns: Vec<Column>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use the component index rather than replacing columns: ImmutableSparseSet<ComponentId, Column> with columns: HashMap<ComponentId, Column>? That seems like it would be a simpler change, and it would only require one hash lookup per set_table/set_archetype instead of two.

Is the idea that this uses less memory? I wouldn't have expected a HashMap<ComponentId, Column> to use all that much more space than the Vec<Column> and Vec<ComponentId> that you're storing instead.

Copy link
Contributor Author

@cBournhonesque cBournhonesque Aug 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@james-j-obrien knows more here.

I think a good first solution to fix memory issues would indeed be to just swap the SparseSet for a Hashmap.
I think the idea was that HashMap have relatively poor performance for iteration

Btw @chescock have you joined the bevy discord? It would be great to chat more about a path to relations!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw @chescock have you joined the bevy discord? It would be great to chat more about a path to relations!

Yup! I'm quartermeister there, and I've been lurking on #ecs-dev and the 🌈 working group room!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Storing a HashMap in the table is almost equivalent to storing it in the component index, however in a post-relations future where tables are being created and deleted much more frequently it's economical to try and keep the setup and teardown costs low. It may also be convenient for later optimizations like moving the sparse sets into the component index, that way we only have one codepath for both storages and additionally we are not looking up the info and then the sparse set separately.

While benchmarks haven't been run yet there's a decent chance we want to steal some of flecs' optimizations here and keep a low id lookup array in both the table and the component index.

@Diddykonga
Copy link

Would SmallVec make sense here, not sure the average component count, but I assume there would be a reasonable number that could be used potentially.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ECS Entities, components, systems, and events C-Performance A change motivated by improving speed, memory usage or compile times S-Needs-Benchmarking This set of changes needs performance benchmarking to double-check that they help S-Waiting-on-Author The author needs to make changes or address concerns before this can be merged
Projects
Status: Candidate
Development

Successfully merging this pull request may close these issues.

7 participants