
how to express that a large number of tests are expected to fail? #1832

Open

devoncarew opened this issue Dec 29, 2022 · 8 comments

Comments

@devoncarew
Member

I'm currently working on a project where most of the tests for the project are code-gen'd from a specification. Many of the tests are currently failing - of ~7,590 tests, 545 are failing; as I fix things the pass percentage increases.

Because some tests currently fail, I'm not running tests on my CI (otherwise it's just a continuous red signal). Unfortunately, that means I have effectively no regression coverage - I don't know when I break tests that had previously been passing.

One solution other systems w/ lots of tests have landed on is the notion of status files - encoding expected failures in a data file. The CI would then pass when the test results matched what was described in the status file.

Have you thought about how to support this use case - having a setup with lots of tests, where you can't reasonably expect them all to pass (but you still want CI + test coverage)? There are a few possible solutions I've thought of:

encode 'failure expected' in the test definition

This would mean adding a new 'bool fails' flag to the test() method, inverting the normal test expectation. This technique would avoid the complexity of a new file format + aux. file, but it has the drawback that people would need to modify the test definition (for me, in ~500 different source locations). It also wouldn't be able to express things like 'only expected to fail on Windows'.
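
Purely for illustration, here is a minimal approximation of that idea built on today's package:test API. `failingTest` is a hypothetical helper, not a real package:test function, and the `fails` flag itself does not exist; the sketch just inverts the expectation by requiring the body to throw.

```dart
import 'package:test/test.dart';

/// Hypothetical helper approximating a `fails: true` flag: the test passes
/// only if its body throws, and is reported as failing if the body completes
/// normally. Not part of package:test.
void failingTest(String description, dynamic Function() body) {
  test(description, () async {
    try {
      await body();
    } catch (_) {
      return; // The expected failure happened; report the test as passing.
    }
    fail('expected this test to fail, but it passed');
  });
}
```

It shares the drawback noted above: every generated call site would still need to opt in to the wrapper.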

parameterize test()

  • have a new expectedStatus(String testId) closure param for test()
  • each package could decide how to figure out the test status (backed by a status file? a database?); a sketch of this follows below
  • package:test doesn't need to add much new mechanics in terms of file formats and such
  • it pushes some of the complexity to the users of the feature (which is probably ok here)
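
A rough sketch of what that might look like on the package side, assuming the hypothetical expectedStatus parameter and a simple one-test-id-per-line status file; the function name, file path, and format are invented here purely for illustration.

```dart
import 'dart:io';

/// Possible results for the hypothetical expectedStatus callback.
enum TestStatus { pass, fail }

/// Builds an `expectedStatus(String testId)` lookup backed by a plain-text
/// file listing the ids of tests expected to fail, one per line.
TestStatus Function(String testId) expectedStatusFromFile(String path) {
  final expectedFailures = <String>{
    for (final line in File(path).readAsLinesSync())
      if (line.trim().isNotEmpty && !line.trim().startsWith('#')) line.trim(),
  };
  return (testId) =>
      expectedFailures.contains(testId) ? TestStatus.fail : TestStatus.pass;
}
```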

support a status file

This would mean defining a status file format for package:test (or re-using dart_test.yaml?). You'd minimally want to be able to express which tests are expected to fail, and you might push it as far as allowing different sections in the file (using package:boolean_selector?), each with separate test expectations for different platforms.
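
Purely as an illustration (package:test defines no such format today), a sectioned status file along those lines might look something like this, with boolean_selector-style platform sections and invented test names:

```
# expected-failure status file (hypothetical format)
[ windows ]
memory_gen_test: grow beyond max pages: Fail

[ default ]
f32_gen_test: nan canonicalization: Fail
i64_gen_test: rotl edge cases: Fail
```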

@devoncarew
Member Author

For now I'm working around this by maintaining an 'expected fail' status file (https://github.com/devoncarew/wasmd/blob/main/tool/tests.properties), wrapping dart test --reporter json in a script which parses the JSON test reporter output (https://github.com/devoncarew/wasmd/blob/main/tool/tests.dart), and running that script on the CI to fail the build if the test results don't match the status file.
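
For readers who don't want to follow the links, here is a rough sketch of that kind of wrapper script. It is not the linked code: the status file path is invented, and the event fields follow package:test's documented JSON reporter protocol (testStart/testDone events).

```dart
import 'dart:async';
import 'dart:convert';
import 'dart:io';

Future<void> main() async {
  // Hypothetical status file: one fully qualified test name per line.
  final expectedFailures = File('tool/expected_failures.txt')
      .readAsLinesSync()
      .map((line) => line.trim())
      .where((line) => line.isNotEmpty)
      .toSet();

  final proc = await Process.start('dart', ['test', '--reporter', 'json']);
  unawaited(proc.stderr.pipe(stderr)); // keep compile errors etc. visible

  final names = <int, String>{}; // testID -> full test name
  final failed = <String>{};

  await for (final line in proc.stdout
      .transform(utf8.decoder)
      .transform(const LineSplitter())) {
    if (line.isEmpty) continue;
    final event = jsonDecode(line) as Map<String, dynamic>;
    if (event['type'] == 'testStart') {
      final test = event['test'] as Map<String, dynamic>;
      names[test['id'] as int] = test['name'] as String;
    } else if (event['type'] == 'testDone' &&
        event['hidden'] != true &&
        event['skipped'] != true &&
        event['result'] != 'success') {
      failed.add(names[event['testID'] as int] ?? '<unknown test>');
    }
  }
  await proc.exitCode;

  final unexpected = failed.difference(expectedFailures);
  final nowPassing = expectedFailures.difference(failed);
  if (unexpected.isNotEmpty) {
    stderr.writeln('Unexpected failures:\n  ${unexpected.join('\n  ')}');
  }
  if (nowPassing.isNotEmpty) {
    stderr.writeln('Expected failures that now pass (update the status file):'
        '\n  ${nowPassing.join('\n  ')}');
  }
  exit(unexpected.isEmpty && nowPassing.isEmpty ? 0 : 1);
}
```

CI then runs this script instead of `dart test` directly and fails the build on drift in either direction.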

@jakemac53
Contributor

Today we only really support skipping tests, either by the skip parameter to test/group/expect, the @Skip annotation, or via dart_test.yaml config for specific tags etc.
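
For reference, the existing skip mechanisms look like this (real package:test API):

```dart
import 'package:test/test.dart';

// A whole suite can also be skipped with an `@Skip('reason')` annotation on
// the library, and tags configured in dart_test.yaml can skip groups of tests.
void main() {
  // Skip an individual test, with a reason.
  test('decodes the full opcode table', () {
    // ...
  }, skip: 'not implemented yet');

  // Skip an entire group.
  group('SIMD instructions', () {
    // ...
  }, skip: true);
}
```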

I do think there is a missing feature here that actually expects a failure. I remember some discussion about it in the past; it just hasn't been prioritized (or requested).

@jakemac53
Contributor

jakemac53 commented Jan 3, 2023

Fwiw I personally am not a fan of the status file or DB approach at all - it separates the status of the test too far from the test itself. But I do think a general feature that requires certain tests to fail is a good idea; I'd just like the configuration to live with the actual test, similar to how skip works.

One idea could just be a flag to run skipped tests, but expect them all to fail. If any pass, that is treated as a failure. This could be a command line flag, so we could avoid increasing the API surface area. We could also add it as an option to dart_test.yaml so that it would always run all skipped tests and expect them to fail? The immediate issue I see with that would be flaky tests, but we could allow the configuration to be per preset, so you could add a flaky tag or something which doesn't enable the option.
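
A sketch of how that could look in dart_test.yaml: the `tags:` and `presets:` sections are real configuration, but the `skipped_must_fail` option is invented here purely to illustrate the idea and does not exist.

```yaml
tags:
  flaky:
    # Hypothetical: flaky tests opt out of the strict treatment below.
    skipped_must_fail: false

presets:
  ci:
    # Hypothetical option: run skipped tests and require them to fail.
    skipped_must_fail: true
```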

It might still be better to just add a more explicit expected-failure option, but I think probably >90% of the current skipped tests are meant to fail.

@devoncarew
Member Author

devoncarew commented Jan 3, 2023

I just like the configuration to live with the actual test similar to how skip works.

That makes sense. In my case, the tests are generated, so I would need to keep the 'expected fail' status in a sidecar file in any case. But my use case is probably rare so we wouldn't need to optimize for it.

One idea could just be a flag to run skipped tests, but expect them all to fail.

I like this - as you say, it doesn't increase the API surface area. It does mean that there are a few different concerns layered into the skip parameter: don't run this test, this test is expected to fail, and this test is too flaky to run. I don't currently see a way to specify why the test is being skipped; having that would make it easier to recycle the skip feature. Perhaps parameterize the Skip class a bit? Skip.expectedFail()? Skip.flaky()?
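
To make the suggestion concrete, a standalone sketch of what "parameterizing" the skip reason could carry. This is not package:test's Skip class, and none of these constructors exist there.

```dart
/// Illustrative only: a skip marker that records *why* a test is skipped,
/// mirroring the suggested Skip.expectedFail() / Skip.flaky() variants, so
/// the runner could treat each kind differently.
enum SkipKind { plain, expectedFail, flaky }

class SkipMarker {
  final SkipKind kind;
  final String? reason;

  const SkipMarker([this.reason]) : kind = SkipKind.plain;
  const SkipMarker.expectedFail([this.reason]) : kind = SkipKind.expectedFail;
  const SkipMarker.flaky([this.reason]) : kind = SkipKind.flaky;
}
```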

@jakemac53
Contributor

The skip parameters are typed dynamic; I think it's just the annotation that only accepts a string, due to limitations around our understanding of annotations. But we could probably support a specific list of constants, or, as you say, have some specialized versions of the annotation.

@jakemac53
Contributor

Fwiw, I think for the expected-failure case, skip really isn't the right word (you want the test to run, just fail). So maybe it is worth adding some API surface area here instead?

We also surface the number of skipped tests as a yellow color, indicating basically a warning status. But expected failures should probably just be green (passing).

@natebosch
Member

It does seem useful to be able to declare that a test is expected to fail, and then fail the suite if the test passes instead. (I thought I recalled a previous issue from @jakemac53 discussing this, but now I can't find it)

We could also use onPlatform to support expected failure on Windows.
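
onPlatform already exists; today it can apply per-platform annotations like Skip or Timeout. The sketch below shows the current behavior that an expected-failure annotation could slot into (the test name and reason are invented).

```dart
import 'package:test/test.dart';

void main() {
  test('uses / as the path separator', () {
    // ...
  }, onPlatform: {
    // Real API today: a per-platform Skip. The idea above would allow an
    // "expected to fail" annotation here instead of a skip.
    'windows': Skip('path handling not implemented on Windows yet'),
  });
}
```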

A bool argument doesn't capture the reason a test fails, though. One reason I'm resistant to building this in as a first-class feature is that I think it should be used as rarely as possible. In my experience, tests that are expected to fail for any reason go stale - some code change would require them to be updated, but no one notices unless it's a static error. Over time, when you try to fix the test, you find that it's not failing for the original reason anymore. It would be safer if this feature only accepted TestFailure exceptions and ignored others, but we're considering moving away from that distinction, not building new behavior on top of it.

It seems like this is something that would be better built as a new tool on top of the test runner, but we don't yet have any of the utilities you'd need to build it.
#1571 (comment)

@jakemac53
Contributor

I agree that at least some support for matching a reason for the failure would be ideal. But it is still better to assert some sort of failure rather than skipping entirely, which is the only current option. At least you will know if it starts passing :).
