[test] Add demo-based axe-core accessibility tests for mui-material#48341
siriwatknp wants to merge 41 commits into mui:master
Conversation
Runs axe-core against real docs demos in Chromium, reusing the VRT Vite server. Enrollment is an explicit list in test/a11y/a11y.test.mjs (POC: Button + Card, 4 demos). Output a11y-results.json carries both per-component aggregates (for the docs widget) and per-demo detail (for future per-demo UI). Supersedes mui#47895 — swaps hand-written scenario modules for the docs demos themselves, per Olivier's suggestion to mirror Janpot's html-validate-over-docs-pages pattern in mui#48088.
Deploy preview: https://deploy-preview-48341--material-ui.netlify.app/
Bundle size: check out the code infra dashboard for more information about this PR.
Replaces the hardcoded ENROLLED map in a11y.test.mjs with a full component roster at packages/mui-material/test/a11y/config.ts. Every component has an entry; status is 'enabled' (Button, Card) or 'pending' (rest). Adoption per PR = flip 'pending' to 'enabled' and fill in `demos`. The config doubles as the rollout ledger — a future docs presentation can iterate the full array to show both enrolled and pending components. Drops ComponentAccessibilityStatus.js and the two .md embeds. Presentation is a separate concern; the JSON shape stays (per-component aggregates + per-demo detail) so the future widget has the data it needs.
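A minimal sketch of that roster shape, assuming illustrative entries (the Card demo name is hypothetical; field names follow the description above):

```typescript
// Illustrative sketch of the config.ts roster described above.
type A11yStatus = 'enabled' | 'pending';

interface ComponentEntry {
  component: string;
  status: A11yStatus;
  demos?: string[]; // filled in when flipping a component to 'enabled'
}

const roster: ComponentEntry[] = [
  { component: 'Button', status: 'enabled', demos: ['BasicButtons', 'ColorButtons'] },
  { component: 'Card', status: 'enabled', demos: ['BasicCard'] }, // hypothetical demo name
  { component: 'Accordion', status: 'pending' }, // adoption = flip to 'enabled' + fill demos
];

// The test suite only runs enabled entries; the full array doubles as
// the rollout ledger a future docs presentation could iterate over.
const enrolled = roster.filter((entry) => entry.status === 'enabled');
console.log(enrolled.map((entry) => entry.component)); // → [ 'Button', 'Card' ]
```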
Replaces the single a11y-results.json (383 KB / 14k lines at full rollout)
with packages/mui-material/test/a11y/results/{Component}.json. A future
docs widget can import('.../results/Button.json') and webpack/vite will
code-split per component, so each page only ships its own ~2 KB payload
instead of the whole roster.
Also updates the docs:a11y script to rimraf the directory before each
run, and the AGENTS.md reference.
Flips 45 components from 'pending' to 'enabled' in config.ts. The 26 that currently trip color-contrast keep a skipRules: ['color-contrast'] entry so CI stays green while the violations still land in failedRules for the team to triage. a11y.test.mjs now falls back to the VRT nav when a config entry omits its 'demos' list, so enrolling is one line (flip the status). Explicit demos remain the override knob. The 11 components VRT excludes (need interaction, no demos, flaky) stay 'pending' with inline reasons — nothing observable today.
Rendering all the demos, one by one, in a real browser and running tests on them — sounds an awful lot like the current screenshot tests we have. Would it make sense (be possible?) to fold accessibility testing into the regression tests? It would likely save a ton on CircleCI credits compared to running another full suite where we render all demos.
`skipAssertions` better describes the behavior — the rule still runs and still lands in the results JSON, only the test-failing assertion is suppressed. `skipRules` read as "rule is skipped entirely", which isn't what happens. Also drops Box from the roster — nothing to a11y-test.
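A sketch of those semantics, assuming simplified shapes (`recordViolations` and `assertedRules` are hypothetical names; the real recording code differs): every failing rule is recorded, and `skipAssertions` only filters which failures would throw.

```typescript
// Hypothetical simplification of the real axe-core result object.
interface AxeViolation {
  id: string; // axe rule ID, e.g. 'color-contrast'
}

interface A11yRecord {
  failedRules: string[];   // everything axe flagged — nothing is dropped
  assertedRules: string[]; // the subset that actually fails the test
}

// skipAssertions suppresses the test-failing assertion for the listed
// rule IDs, but the rules still run and still land in failedRules.
function recordViolations(
  violations: AxeViolation[],
  skipAssertions: string[] = [],
): A11yRecord {
  const failedRules = violations.map((v) => v.id);
  const assertedRules = failedRules.filter((id) => !skipAssertions.includes(id));
  return { failedRules, assertedRules };
}

const record = recordViolations(
  [{ id: 'color-contrast' }, { id: 'button-name' }],
  ['color-contrast'],
);
// Both rules stay in failedRules for triage; only 'button-name' asserts.
```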
Layout wrappers (Grid, Masonry, Portal, NoSsr, Stack, Paper) and pure behavior components (Backdrop, ClickAwayListener, Container, CssBaseline) have no a11y semantics of their own — axe either finds nothing or attributes violations to them that actually belong to their inner content (already covered by the child component's own entry). Keeps Typography (heading hierarchy), Skeleton (aria-busy), Divider (caught a real list violation) since those contribute real signal. 44 components in roster → 36 enabled, 8 pending (interaction-only).
The status gate was infrastructure for a presentation UI that isn't in
scope for this PR. The config now lists only components that are actually
enrolled; pending ones live as `// TODO: {Component} — <blocker>` comments
so the reason stays visible without needing a parseable data shape.
If a future docs widget ever wants a "coming soon" list, convert the
comments back to structured entries then.
Infra is already shared — I kept it as a separate Vitest file for a standalone dev loop, not because of the infra cost. Happy to fold axe in. Do you think saving around ~40s is a hard constraint? Happy to follow your judgement.
It'd cost about 8 euro per month, so cost-wise probably not. It'd increase the current job time (6m 12s over the last 30 days) to ~7m, making it the second longest running job, after type checking. That technically doesn't make it the bottleneck. The bottleneck currently is type checking, and we plan to migrate to TypeScript native soon, so in not so long the
I pushed a commit to move to the regression tests; the timing does not differ much but the code is simplified. Net CI wall-clock saving is ~break-even; the real wins are ~16 s of duplicate Chromium setup avoided per run and infra simplification (one Vitest suite, one config, one script deleted). No new violations; 2 of 38 result files differ (all pre-existing false-positive color-contrast flags).
Per Jan's review on mui#48341: run axe inline in test/regressions/index.test.js, reuse the same Chromium session. Drop the standalone test/a11y/ suite and the docs:a11y:dev script. CI a11y gate moves from test_browser to test_regressions. Slider/Table results reflect a cleaner DOM (mouse parked, scrollbar normalized); all deltas are false-positive color-contrast flags disappearing.
Ok, if the win in runtime is marginal then we can go either way, your call. Did you check how screenshots are currently being excluded? Because if you implement this as enrollment in the existing screenshot fixtures, you will also inherit its exclusion mechanism. If we go the combined route then we should probably import all demos and build separate exclusion mechanisms for a11y and screenshots, e.g.:

```ts
declare const demoMeta: Map<string, {
  screenshot: {
    enabled: boolean,
    // extra options, e.g. waitForSelector...
  },
  a11y: {
    enabled: boolean,
    rules: {
      // ...
    }
  }
}>
```

Aside: thanks for editing the LLM-generated comment; it really demotivates me when I have to answer these. Feels like I'm putting in the effort the author should have done when interacting with their LLM.
Just two things,
- My edited comment in #48341 (comment): screenshot exclusion is already encoded in how we load the test fixtures — is that a problem?
- We probably want to generate the data nested under the ./docs folder somewhere, just like we do for other docs generated data?
Sorry for that — I usually don't want the LLM to write responses directly on GH, but it was trying to be smart and pushed the comment directly 🥲. I'm building guard rails around it, bear with me.
Per-tool exclusions now live in test/regressions/demoMeta.ts instead of glob negations in index.jsx. Screenshot-specific reasons (Redundant, Flaky image loading) no longer silently drop a11y coverage; autocomplete Redundant variants, skeleton/Facebook, snackbars/Customized, steppers image-heavy demos, and AccessibleTabs1/2 now get axe coverage. Glob keeps only structural exclusions + whole slugs no tool consumes. DEMO_META accepts both demo-level and slug-level keys; demo wins when both exist. shouldScreenshot + resolveA11y are the test-runner gates. Unit tests in demoMeta.test.ts cover current shape + the 3-line workflow for enabling a11y on an interaction-heavy slug (e.g. dialogs). a11yConfig.ts removed; AGENTS.md updated.
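The demo-over-slug precedence can be sketched like this (the `ToolMeta` shape and `lookupMeta` helper are hypothetical simplifications of the real `demoMeta.ts`):

```typescript
// Hypothetical simplification: keys are either a slug ('dialogs') or a
// demo path ('dialogs/AlertDialog'); the demo-level key wins when both exist.
type ToolMeta = {
  screenshot?: { enabled: boolean };
  a11y?: { enabled: boolean };
};

const DEMO_META: Record<string, ToolMeta> = {
  dialogs: { a11y: { enabled: false } },              // slug-wide default
  'dialogs/AlertDialog': { a11y: { enabled: true } }, // per-demo override
};

function lookupMeta(slug: string, demo: string): ToolMeta | undefined {
  return DEMO_META[`${slug}/${demo}`] ?? DEMO_META[slug];
}
```

With this shape, enabling a11y on one interaction-heavy demo is a single demo-level entry while the slug-level default keeps the rest excluded.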
Pushed f337bc7 implementing the separation. The regression exclusion can be configured separately in
Colocates the generated JSON with other docs data, per Janpot's second review point. Reporter's OUT_DIR and the test:regressions prettier step updated to the new path. No other consumers touched the old path.
Migration expanded axe coverage to demos previously screenshot-excluded (autocomplete Redundant, skeleton/Facebook, etc.). Surfaces color-contrast near-threshold findings on fab, skeleton, bottom-navigation — same class of issue already suppressed elsewhere. Added skipAssertions for the three slugs and regenerated results JSONs at docs/data/material/a11y/.
@Janpot For the result to appear in the docs per demo — do you think it should be shipped at build time, or is it better to lazy-load the JSON when the demo is visible? Currently the result is generated per demo, e.g. buttons-BasicButton.json. The implementation to show the UI is in a separate PR, so I want to make sure that this PR introduces a suitable structure for the UI.
Yes, let's ship it at build time. It's static content and we want to make search engines aware of it. It doesn't look like it's huge. Maybe it makes sense to emit those files next to the demo files, like we do for the JS version and the preview? If it becomes a problem, we can always move to lazy loading. Config loading looks a lot better; personally I'd avoid merging. Merging is what makes the renovatebot config so much harder to reason about, imo. Last-one-wins, like Netlify redirects, feels simpler. Safer to move rules around, fewer opportunities for adding a rule that silently breaks everything that comes after.
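A sketch of the last-one-wins evaluation suggested here, with a toy matcher standing in for minimatch and illustrative rule fields:

```typescript
// Illustrative rule shape — the real rule fields differ.
interface A11yRule {
  test: string; // glob against docs/data/material/components/{slug}/{Demo}
  enabled?: boolean;
  skipAssertions?: string[];
}

// Toy stand-in for minimatch, supporting only a trailing '*' wildcard.
function matches(pattern: string, routePath: string): boolean {
  if (pattern.endsWith('*')) return routePath.startsWith(pattern.slice(0, -1));
  return pattern === routePath;
}

// Last matching rule wins outright — no field merging, no inheritance.
// An override rule must restate every field it cares about.
function resolveRule(rules: A11yRule[], routePath: string): A11yRule | undefined {
  let resolved: A11yRule | undefined;
  for (const rule of rules) {
    if (matches(rule.test, routePath)) resolved = rule;
  }
  return resolved;
}

const rules: A11yRule[] = [
  { test: 'docs/data/material/components/buttons/*', enabled: true },
  // later rule wins for this one demo; it restates `enabled` itself
  { test: 'docs/data/material/components/buttons/IconButtons', enabled: false },
];
```

Because a later match replaces the earlier rule wholesale, rules can be reordered or appended without silently blending fields into everything that comes after.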
Janpot left a comment:
No hard blockers, just a few suggestions.
```js
// Also use some of the demos to avoid code duplication.
//
// Two exclusion layers:
```
Is this comment still relevant?
```ts
test: string;
enabled?: boolean;
/** Playwright waits for this selector before snapshotting. */
waitForSelector?: string;
```
This is not yet wired up to material-ui/test/regressions/index.test.js (lines 118 to 125 in 30091d1).
The shouldScreenshot and resolveA11y functions make the demoMeta abstraction a bit leaky. Alternatively we could just export getConfig and parseRoute and let the screenshot and a11y functions themselves decide how to handle the defaults? We're going to need the screenshot config anyway to be able to use waitForSelector.
```json
{
  "BasicButtons": {
    "passedRules": [
```
I would recommend prototyping the docs before generating for all other components. That way we can benchmark this data format first.
The structure is brought over from #47895, to be able to show something like this: https://deploy-preview-47895--material-ui.netlify.app/material-ui/react-card/#accessibility-compliance
The thinking behind it:
- each component/demo could have different scenario tests, so we must show which rules are tested
- the rule name is used to link to the details of the rule at https://dequeuniversity.com/rules/axe/*
- the passed/failed lists are for debugging/planning fixes; we can decide whether we want to display them or not
Will there be an overview? As in, as a customer I'd like to know:
- Is MUI committed to a certain standard? Can I hold MUI to that standard?
- Which trade-offs are there to be made?

I can't be required to open each and every demo and aggregate the results myself to answer that question, right?
Absolutely, there should be an overview, but we are not at that stage yet (at least not in this PR).
The main issue (the obvious one) is color contrast, which the team agreed to approach by getting the test setup in place first, seeing how many components are affected, updating the colors, and iterating until the tests pass.
At the end, when we are confident about the compliance, we will have that overview page.
This PR is a first step toward that goal. Let me know if you think any task should be slotted into this PR apart from the setup.
Some minor findings from Codex review:
- `waitForSelector` is declared on `ScreenshotRule`, but the runner only reads `screenshotRule?.enabled` before calling `takeScreenshot`. A future rule with `waitForSelector` would silently no-op. Either wire it up before screenshot capture or remove the field until it's needed.
- Generated `*.a11y.json` files aren't pruned before regeneration. Since the reporter only writes slugs seen in the current run, removing a slug from `A11Y_RULES` could leave stale tracked JSON behind while CI still passes. Consider deleting existing generated a11y files before the run or pruning stale files in the reporter.
Do you aim to maximize the coverage for the Button component in this PR or just set up the infra? There are a few more WCAG SC that could be covered by existing demos (e.g. size="small" and size="large" for SC 2.5.8 Target size (minimum)), but also some that would require new fixtures (e.g. color="info", warning, inherit) for full coverage of the component's whole API surface
Just the infra (with a couple of Button demos generated).
Address Codex review on mui#48341: waitForSelector field on ScreenshotRule was declared but never read. Migrate the hardcoded ReactVirtualizedTable wait into a rule and apply screenshotRule.waitForSelector before axe + screenshot in the VRT loop. Reporter now prunes stale {slug}.a11y.json after writing — gated on the VRT module having actually run and no -t filter, so unrelated unit-test runs and filtered slug subsets leave untouched files alone.
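The pruning step can be sketched roughly like this — a flat output directory and the function name are assumptions for brevity (the real files nest per slug folder, and the real reporter additionally gates on the VRT module having run and on no `-t` filter):

```typescript
import * as fs from 'node:fs';
import * as path from 'node:path';

// After the reporter writes results for every slug seen in this run,
// delete any leftover {slug}.a11y.json whose slug was not written —
// otherwise removing a slug from the rules leaves stale tracked JSON
// behind while CI still passes.
function pruneStaleResults(outDir: string, writtenSlugs: Set<string>): string[] {
  const pruned: string[] = [];
  for (const file of fs.readdirSync(outDir)) {
    if (!file.endsWith('.a11y.json')) continue;
    const slug = file.slice(0, -'.a11y.json'.length);
    if (!writtenSlugs.has(slug)) {
      fs.unlinkSync(path.join(outDir, file));
      pruned.push(file);
    }
  }
  return pruned;
}
```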
Nice catch, fixed both of them.
closes #47887 · supersedes #47895
Summary
Automated `axe-core` accessibility testing for `@mui/material`, driven off the real docs demos. Folded into the existing visual-regression Playwright loop (`test/regressions/index.test.js`) — for each enrolled demo, axe runs against the rendered `[data-testid="testcase"]` element in the same browser session that takes the screenshot, and the result is written to `docs/data/material/components/{slug}/{slug}.a11y.json` (one file per slug, keyed by demo name).

Initial scope is `buttons` only — `BasicButtons` and `ColorButtons` — to validate the pipeline end-to-end before broader enrolment in follow-ups. Layout wrappers (`Box`, `Grid`, `Stack`, `Paper`, `Container`, `Portal`, …) will be excluded on principle when added: they contribute no a11y signal of their own; any violation found on them really belongs to their children, already covered by the child's own enrolment.

Config shape
Two independent rule arrays — one per tool — evaluated last-match-wins (no inheritance: an override rule must restate every field it cares about) against the docs path `docs/data/material/components/{slug}/{Demo}` (minimatch globs). Independent arrays mean editing one tool's config can't stomp the other. `pnpm test:regressions` refreshes the `*.a11y.json` files. CI enforces they're up to date via a git-diff check.

For Reviewers
Why demo-based, not scenario-based
#47895 introduced scenario files (`test/a11y/scenarios/Button.a11y.tsx`) that hand-authored every axe case per component. Per Olivier's feedback, docs demos already cover every real usage scenario — duplicating them as test fixtures is waste. This PR replaces the scenarios folder with a thin rule list pointed at the demo routes VRT already serves.

Why folded into the VRT loop
@Janpot's earlier concern about duplicate Playwright sessions: rather than a parallel browser, axe now runs inside the existing screenshot loop. Each enrolled demo gets one `axe.run` against the already-rendered `[data-testid="testcase"]` element before the screenshot is taken. Failure isolation is preserved by recording the result on `ctx.task.meta.a11y` first; the assertion fires only on visual rules (`color-contrast`, `link-in-text-block`) that depend on real CSS.

Screenshot and a11y opt-outs are independent — a demo screenshot-disabled for being redundant or interaction-heavy can still be audited by axe.
Config follow-up from review
@Janpot's review comment flagged the original dual-Map structure (`SLUG_A11Y` + `DEMO_META`) as non-intuitive and at risk of cross-tool stomping. Replaced with two rule arrays + minimatch globs as suggested; per-demo opt-outs are appended after slug-wide rules; the override rule restates every field it wants (no implicit inheritance).

Output shape
One file per enrolled slug at `docs/data/material/components/{slug}/{slug}.a11y.json`, co-located with the slug's other docs files. Each file is an object keyed by demo name:

```json
{
  "BasicButtons": {
    "passedRules": [...],
    "failedRules": [],
    "testedRules": { "wcag2a": [...], "wcag2aa": [...], "wcag22aa": [...] }
  },
  "ColorButtons": { ... }
}
```

Co-locating with `{slug}.md` and the demo source files makes the data discoverable from the slug folder and avoids the slug-prefix collision problem of a flat per-demo layout.

Code
- `test/regressions/demoMeta.ts` — two rule arrays + resolvers (`shouldScreenshot`, `resolveA11y`).
- `test/regressions/a11y/axe.ts` — `recordA11y` records onto `ctx.task.meta.a11y`, asserts visual rules only. `skipAssertions` suppresses the assertion for listed rule IDs without dropping them from recorded data — failing rules still land in `failedRules` for triage.
- `test/regressions/a11y/a11yReporter.ts` — Vitest reporter that groups results by slug and writes one file per slug to `docs/data/material/components/{slug}/{slug}.a11y.json`.
- `test/regressions/index.test.js` — calls `recordA11y` inside the existing per-route loop, before the screenshot.
- `test/regressions/demoMeta.test.ts` — 8 unit tests for the resolvers (precedence, brace-glob enrolment, screenshot vs a11y independence).

CI
The existing `test:regressions` job already builds, serves, and runs Playwright. Adding axe is a no-op on infra. The git-diff guard ensures every `*.a11y.json` is up to date.

Agent docs
`AGENTS.md` — the "Accessibility Testing" section covers the rule-array shape, brace-glob narrowing, per-demo opt-outs, and the `skipAssertions` escape hatch.

Not in scope (follow-ups)
- A docs widget that loads `*.a11y.json` via the docs render pipeline and shows pass/fail counts plus a per-WCAG-tag rule breakdown on click.
- Assertion suppression is coarse: a rule is skipped wherever an `A11Y_RULES` entry's `skipAssertions` lists it. Fine for a POC; may want finer control once more components enroll.