StandardMaterial causes long delays on Firefox with WebGL #18142

Open
@ashivaram23

Bevy version

0.15.3

Relevant system information

AdapterInfo { name: "Apple M1, or similar", vendor: 4203, device: 0, device_type: IntegratedGpu, driver: "", driver_info: "WebGL 2.0", backend: Gl }

Browser is Firefox 134.0. "WebGL 2 Driver WSI Info" in about:support reports that it is using CGL.

What you did

Open any Bevy project that uses StandardMaterial in Firefox with WebGL. This includes most of the WebGL examples on the Bevy official examples page like https://bevyengine.org/examples/3d-rendering/3d-scene/.

Alternatively, run any Bevy WebGL project that uses a custom material with a large uniform array of large structs:

main.rs
use bevy::{
    prelude::*,
    render::render_resource::{AsBindGroup, ShaderRef, ShaderType},
};

#[derive(Clone, Copy, Default, ShaderType)]
struct LargeStruct {
    a: f32,
    b: f32,
    c: f32,
    d: f32,
    e: f32,
    f: f32,
    g: f32,
    h: f32,
    i: f32,
    j: f32,
    k: f32,
    l: f32,
    m: f32,
    n: f32,
    o: f32,
    p: f32,
}

#[derive(AsBindGroup, Asset, Clone, TypePath)]
struct TestMaterial {
    #[uniform(0)]
    struct_array: [LargeStruct; 200],
}

impl Material for TestMaterial {
    fn fragment_shader() -> ShaderRef {
        ShaderRef::from("test.wgsl")
    }
}

fn main() {
    let mut app = App::new();
    app.add_plugins((DefaultPlugins, MaterialPlugin::<TestMaterial>::default()))
        .add_systems(Startup, setup)
        .run();
}

fn setup(
    mut commands: Commands,
    mut meshes: ResMut<Assets<Mesh>>,
    mut materials: ResMut<Assets<TestMaterial>>,
) {
    commands.spawn((Camera3d::default(), Transform::from_xyz(0.0, 0.0, 10.0)));
    commands.spawn((
        Mesh3d(meshes.add(Cuboid::new(4.0, 4.0, 4.0))),
        MeshMaterial3d(materials.add(TestMaterial {
            struct_array: [LargeStruct::default(); 200],
        })),
    ));
}

test.wgsl
struct LargeStruct {
    a: f32,
    b: f32,
    c: f32,
    d: f32,
    e: f32,
    f: f32,
    g: f32,
    h: f32,
    i: f32,
    j: f32,
    k: f32,
    l: f32,
    m: f32,
    n: f32,
    o: f32,
    p: f32,
}

@group(2) @binding(0) var<uniform> struct_array: array<LargeStruct, 200>;

@fragment
fn fragment() -> @location(0) vec4f {
    return vec4(struct_array[0].a, vec3(1.0));
}

What went wrong

The browser takes several seconds to link the WebGL program. This blocks the main thread and causes a long delay when loading the page or whenever shaders are recompiled.

Additional information

Any browser that uses OpenGL to implement WebGL on macOS runs into this problem. In addition to Firefox, this includes desktop Chrome/Safari whenever the WebGL backend is manually set to OpenGL. Profiling shows that most of the time is spent in calls to glGetActiveUniform or glGetActiveUniformsiv.

What seems to be going on is:

  • While linking a WebGL program, the browser has to collect information about each active uniform with calls to glGetActiveUniformsiv and glGetActiveUniform. OpenGL treats each member of a struct as a different uniform resource (see OpenGL introspection documentation), so large arrays of large structs can have a giant number of uniforms to query.
  • The StandardMaterial PBR shader accesses clusterable_objects.data in pbr_lighting.wgsl. When storage buffers are unavailable, this is a uniform array of 204 ClusterableObject structs, each with 11 members as defined in mesh_view_types.wgsl. In Firefox's implementation, that makes 2244 uniforms (204 × 11) that each of five glGetActiveUniformsiv calls has to check, plus 2244 separate calls to glGetActiveUniform.
  • Parts of a uniform array can be excluded from the list of active uniforms if the shader compiler finds that they're unused, but when the array is in a uniform block, all its uniforms are apparently counted as active (this might be implementation dependent). Naga translates WGSL uniform buffers to GLSL ES uniform blocks, so all members of all 204 entries are queried if any one is accessed.
  • The Apple OpenGL implementation is very slow at this for some reason, and the time it takes scales noticeably with the number of uniforms.
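
To make the arithmetic above concrete, here is a rough sketch of the counts involved. It assumes every struct member of every array element is reported as its own active uniform, and it reuses the "five glGetActiveUniformsiv passes plus one glGetActiveUniform call per uniform" figure observed in Firefox. The function names are made up for illustration and are not part of Bevy, wgpu, or any browser.

```rust
/// Number of uniform resources reported for a uniform array of structs,
/// assuming each member of each element counts as a separate active uniform.
fn active_uniform_count(array_len: usize, members_per_struct: usize) -> usize {
    array_len * members_per_struct
}

/// Rough number of per-uniform lookups during one program link:
/// `siv_passes` glGetActiveUniformsiv calls that each cover every uniform,
/// plus one glGetActiveUniform call per uniform.
fn per_uniform_queries(uniforms: usize, siv_passes: usize) -> usize {
    uniforms * siv_passes + uniforms
}

fn main() {
    // clusterable_objects.data: 204 entries, 11 members each.
    let pbr = active_uniform_count(204, 11);
    // The TestMaterial repro: 200 entries, 16 f32 members each.
    let repro = active_uniform_count(200, 16);

    println!("PBR: {} uniforms, {} lookups", pbr, per_uniform_queries(pbr, 5));
    println!("repro: {} uniforms, {} lookups", repro, per_uniform_queries(repro, 5));
}
```

By this estimate the repro material is somewhat worse than StandardMaterial, since its array exposes more uniforms.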

If possible, it might help to make space for fewer than 204 entries in clusterable_objects at first if there aren't too many point or spot lights in the scene, then somehow increase the capacity up to MAX_UNIFORM_BUFFER_CLUSTERABLE_OBJECTS if more lights are added at runtime. This might require recompiling shaders, but it would only affect platforms without storage buffer support.
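
One possible shape for that idea, as a minimal sketch only: pick the array capacity from a small set of tiers so that shaders would need recompiling only when the light count crosses a tier boundary. The constant below copies the 204 figure mentioned in this issue. MIN_CAPACITY, clusterable_capacity and needs_recompile are hypothetical names, not existing Bevy APIs, and the power-of-two tiering is just one assumption about how capacities could be chosen.

```rust
/// Upper bound on entries when storage buffers are unavailable
/// (value taken from the figure quoted in this issue).
const MAX_UNIFORM_BUFFER_CLUSTERABLE_OBJECTS: usize = 204;

/// Hypothetical smallest capacity to allocate for scenes with few lights.
const MIN_CAPACITY: usize = 16;

/// Hypothetical helper: choose a capacity tier for the current light count.
/// Tiers are powers of two, capped at the maximum.
fn clusterable_capacity(light_count: usize) -> usize {
    light_count
        .min(MAX_UNIFORM_BUFFER_CLUSTERABLE_OBJECTS) // clamp first to avoid overflow
        .max(MIN_CAPACITY)
        .next_power_of_two()
        .min(MAX_UNIFORM_BUFFER_CLUSTERABLE_OBJECTS)
}

/// Hypothetical helper: a recompile would only be needed when the tier changes.
fn needs_recompile(old_light_count: usize, new_light_count: usize) -> bool {
    clusterable_capacity(old_light_count) != clusterable_capacity(new_light_count)
}

fn main() {
    for lights in [0, 16, 17, 100, 129, 1000] {
        println!("{lights} lights -> capacity {}", clusterable_capacity(lights));
    }
}
```

With these tiers, a scene with a handful of lights would expose 16 × 11 = 176 uniforms instead of 2244, at the cost of a relink each time a tier boundary is crossed.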

Labels: A-Rendering, C-Performance, D-Shaders, O-WebGL2, S-Needs-Design, X-Contentious