
how to express that a large number of tests are expected to fail? #1832

Open

devoncarew opened this issue Dec 29, 2022 · 8 comments

Comments

@devoncarew
Member

I'm currently working on a project where most of the tests for the project are code-gen'd from a specification. Many of the tests are currently failing - of ~7,590 tests, 545 are failing; as I fix things the pass percentage increases.

Because some tests currently fail, I'm not running tests on my CI (otherwise it's just a continuous red signal). Unfortunately, that means I have effectively no regression coverage - I don't know when I break tests that had previously been passing.

One solution other systems w/ lots of tests have landed on is the notion of status files - encoding expected failures in a data file. The CI would then pass when the test results matched what was described in the status file.

Have you thought about how to support this use case - having a setup with lots of tests, where you can't reasonably expect them all to pass (but you still want CI + test coverage)? There are a few possible solutions I've thought of:

encode 'failure expected' in the test definition

This would mean adding a new 'bool fails' flag to the test() method, inverting the normal test expectation. This technique would avoid the complexity of a new file format + aux. file, but it has the drawback that people would need to modify the test definition (for me, in ~500 different source locations). It also wouldn't be able to express things like 'only expected to fail on Windows'.
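
Purely for illustration, here is a minimal approximation of that idea built on today's package:test API. `failingTest` is a hypothetical helper, not a real package:test function, and the `fails` flag itself does not exist; the sketch just inverts the expectation by requiring the body to throw.

```dart
import 'package:test/test.dart';

/// Hypothetical helper approximating a `fails: true` flag: the test passes
/// only if its body throws, and is reported as failing if the body completes
/// normally. Not part of package:test.
void failingTest(String description, dynamic Function() body) {
  test(description, () async {
    try {
      await body();
    } catch (_) {
      return; // The expected failure happened; report the test as passing.
    }
    fail('expected this test to fail, but it passed');
  });
}
```

It shares the drawback noted above: every generated call site would still need to opt in to the wrapper.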

parameterize test()

  • have a new expectedStatus(String testId) closure param for test()
  • each package could decide how to figure out the test status (backed by a status file? a database?); a sketch of this follows below
  • package:test doesn't need to add much new mechanics in terms of file formats and such
  • it pushes some of the complexity to the users of the feature (which is probably ok here)
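
A rough sketch of what that might look like on the package side, assuming the hypothetical expectedStatus parameter and a simple one-test-id-per-line status file; the function name, file path, and format are invented here purely for illustration.

```dart
import 'dart:io';

/// Possible results for the hypothetical expectedStatus callback.
enum TestStatus { pass, fail }

/// Builds an `expectedStatus(String testId)` lookup backed by a plain-text
/// file listing the ids of tests expected to fail, one per line.
TestStatus Function(String testId) expectedStatusFromFile(String path) {
  final expectedFailures = <String>{
    for (final line in File(path).readAsLinesSync())
      if (line.trim().isNotEmpty && !line.trim().startsWith('#')) line.trim(),
  };
  return (testId) =>
      expectedFailures.contains(testId) ? TestStatus.fail : TestStatus.pass;
}
```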

support a status file

This would mean defining a status file format for package:test (or re-using dart_test.yaml?). You'd minimally want to be able to express which tests are expected to fail, and you might push it as far as allowing different sections in the file (using package:boolean_selector?), each with separate test expectations for different platforms.
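
Purely as an illustration (package:test defines no such format today), a sectioned status file along those lines might look something like this, with boolean_selector-style platform sections and invented test names:

```
# expected-failure status file (hypothetical format)
[ windows ]
memory_gen_test: grow beyond max pages: Fail

[ default ]
f32_gen_test: nan canonicalization: Fail
i64_gen_test: rotl edge cases: Fail
```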

@devoncarew
Member Author

For now I'm working around this by maintaining an 'expected fail' status file (https://github.com/devoncarew/wasmd/blob/main/tool/tests.properties), wrapping dart test --reporter json in a script which parses the JSON test reporter output (https://github.com/devoncarew/wasmd/blob/main/tool/tests.dart), and running that script on the CI to fail the build if the test results don't match the status file.
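
For readers who don't want to follow the links, here is a rough sketch of that kind of wrapper script. It is not the linked code: the status file path is invented, and the event fields follow package:test's documented JSON reporter protocol (testStart/testDone events).

```dart
import 'dart:async';
import 'dart:convert';
import 'dart:io';

Future<void> main() async {
  // Hypothetical status file: one fully qualified test name per line.
  final expectedFailures = File('tool/expected_failures.txt')
      .readAsLinesSync()
      .map((line) => line.trim())
      .where((line) => line.isNotEmpty)
      .toSet();

  final proc = await Process.start('dart', ['test', '--reporter', 'json']);
  unawaited(proc.stderr.pipe(stderr)); // keep compile errors etc. visible

  final names = <int, String>{}; // testID -> full test name
  final failed = <String>{};

  await for (final line in proc.stdout
      .transform(utf8.decoder)
      .transform(const LineSplitter())) {
    if (line.isEmpty) continue;
    final event = jsonDecode(line) as Map<String, dynamic>;
    if (event['type'] == 'testStart') {
      final test = event['test'] as Map<String, dynamic>;
      names[test['id'] as int] = test['name'] as String;
    } else if (event['type'] == 'testDone' &&
        event['hidden'] != true &&
        event['skipped'] != true &&
        event['result'] != 'success') {
      failed.add(names[event['testID'] as int] ?? '<unknown test>');
    }
  }
  await proc.exitCode;

  final unexpected = failed.difference(expectedFailures);
  final nowPassing = expectedFailures.difference(failed);
  if (unexpected.isNotEmpty) {
    stderr.writeln('Unexpected failures:\n  ${unexpected.join('\n  ')}');
  }
  if (nowPassing.isNotEmpty) {
    stderr.writeln('Expected failures that now pass (update the status file):'
        '\n  ${nowPassing.join('\n  ')}');
  }
  exit(unexpected.isEmpty && nowPassing.isEmpty ? 0 : 1);
}
```

CI then runs this script instead of `dart test` directly and fails the build on drift in either direction.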

@jakemac53
Contributor

Today we only really support skipping tests, either by the skip parameter to test/group/expect, the @Skip annotation, or via dart_test.yaml config for specific tags etc.
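
For reference, the existing skip mechanisms look like this (real package:test API):

```dart
import 'package:test/test.dart';

// A whole suite can also be skipped with an `@Skip('reason')` annotation on
// the library, and tags configured in dart_test.yaml can skip groups of tests.
void main() {
  // Skip an individual test, with a reason.
  test('decodes the full opcode table', () {
    // ...
  }, skip: 'not implemented yet');

  // Skip an entire group.
  group('SIMD instructions', () {
    // ...
  }, skip: true);
}
```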

I do think there is a missing feature here that actually expects a failure. I remember some discussion about it in the past; it just hasn't been prioritized (or requested).

@jakemac53
Contributor

jakemac53 commented Jan 3, 2023

Fwiw I personally am not a fan of the status file or DB approach at all - it separates the status of the test too far from the test itself. But I do think a general feature that requires certain tests to fail is a good idea; I'd just like the configuration to live with the actual test, similar to how skip works.

One idea could just be a flag to run skipped tests, but expect them all to fail. If any pass, that is treated as a failure. This could be a command line flag, so we could avoid increasing the API surface area. We could also add it as an option to dart_test.yaml so that it would always run all skipped tests and expect them to fail? The immediate issue I see with that would be flaky tests, but we could allow the configuration to be per preset, so you could add a flaky tag or something which doesn't enable the option.
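
A sketch of how that could look in dart_test.yaml: the `tags:` and `presets:` sections are real configuration, but the `skipped_must_fail` option is invented here purely to illustrate the idea and does not exist.

```yaml
tags:
  flaky:
    # Hypothetical: flaky tests opt out of the strict treatment below.
    skipped_must_fail: false

presets:
  ci:
    # Hypothetical option: run skipped tests and require them to fail.
    skipped_must_fail: true
```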

It might still be better to just add a more explicit expected-failure option, but I think probably >90% of the current skipped tests are meant to fail.

@devoncarew
Member Author

devoncarew commented Jan 3, 2023

I just like the configuration to live with the actual test similar to how skip works.

That makes sense. In my case, the tests are generated, so I would need to keep the 'expected fail' status in a sidecar file in any case. But my use case is probably rare so we wouldn't need to optimize for it.

One idea could just be a flag to run skipped tests, but expect them all to fail.

I like this - as you say, it doesn't increase the API surface area. It does mean that there are a few different concerns layered into the skip parameter: don't run this test, this test is expected to fail, and this test is too flaky to run. I don't currently see a way to specify why the test is being skipped; having that would make it easier to recycle the skip feature. Perhaps parameterize the Skip class a bit? Skip.expectedFail()? Skip.flaky()?
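
To make the suggestion concrete, a standalone sketch of what "parameterizing" the skip reason could carry. This is not package:test's Skip class, and none of these constructors exist there.

```dart
/// Illustrative only: a skip marker that records *why* a test is skipped,
/// mirroring the suggested Skip.expectedFail() / Skip.flaky() variants, so
/// the runner could treat each kind differently.
enum SkipKind { plain, expectedFail, flaky }

class SkipMarker {
  final SkipKind kind;
  final String? reason;

  const SkipMarker([this.reason]) : kind = SkipKind.plain;
  const SkipMarker.expectedFail([this.reason]) : kind = SkipKind.expectedFail;
  const SkipMarker.flaky([this.reason]) : kind = SkipKind.flaky;
}
```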

@jakemac53
Contributor

The skip parameters are typed dynamic; I think it's just the annotation that only accepts a string, due to limitations around our understanding of annotations. But we could probably support a specific list of constants, or, as you say, have some specialized versions of the annotation.

@jakemac53
Contributor

Fwiw, I think for the expected-failure case, skip really isn't the right word (you want the test to run, just fail). So maybe it is worth adding some API surface area here instead?

We also surface the number of skipped tests as a yellow color, indicating basically a warning status. But expected failures should probably just be green (passing).

@natebosch
Member

It does seem useful to be able to declare that a test is expected to fail, and then fail the suite if the test passes instead. (I thought I recalled a previous issue from @jakemac53 discussing this, but now I can't find it)

We could also use onPlatform to support expected failure on Windows.
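
onPlatform already exists; today it can apply per-platform annotations like Skip or Timeout. The sketch below shows the current behavior that an expected-failure annotation could slot into (the test name and reason are invented).

```dart
import 'package:test/test.dart';

void main() {
  test('uses / as the path separator', () {
    // ...
  }, onPlatform: {
    // Real API today: a per-platform Skip. The idea above would allow an
    // "expected to fail" annotation here instead of a skip.
    'windows': Skip('path handling not implemented on Windows yet'),
  });
}
```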

A bool argument doesn't capture the reason a test fails, though. One reason I'm resistant to building this in as a first-class feature is that I think it should be used as rarely as possible. In my experience, tests that are expected to fail for any reason go stale - some code change would require them to be updated, but no one notices unless it's a static error. Over time, when you try to fix the test, you find that it's not failing for the original reason anymore. It would be safer if this feature only accepted TestFailure exceptions and ignored others, but we're considering moving away from that distinction, not building new behavior on top of it.

It seems like this is something that would be better built as a new tool on top of the test runner, but we don't yet have any of the utilities you'd need to build it.
#1571 (comment)

@jakemac53
Contributor

I agree that at least some support for matching a reason for the failure would be ideal. But it is still better to assert some sort of failure rather than skipping entirely, which is the only current option. At least you will know if it starts passing :).
