Evaluate ways to determine accuracy and reduce incomplete type stubs #8481
Well, I think we all agree that incomplete and inaccurate stubs are Bad™ :) How do you suggest we improve here? I've spent a fair amount of time over the last few months working on improving several of our existing tools for evaluating accuracy and completeness of stubs (stubtest, flake8-pyi, mypy_primer, etc.), as have most of the other maintainers of typeshed. I plan on continuing to do so. Is there something else we should be doing, in your opinion?
kkirsche previously mentioned that we could do a better job of working with upstream to integrate type hints. Some ways we could improve the odds of that happening:
Some other ideas:

One thing that would be useful is just better stub generation. Existing stub generators assume you want to lovingly handcraft your stubs and do no type inference for you. Maybe Jelle's autotyping would be something of a starting point? I also wouldn't be surprised if it was very easy to get pyright to do this, or if pyright could already do it; it's been a while since I've played with it.

A related category of tool we could use is static validation of type hints against upstream code. For instance, something along the lines of "apply the hints back to the upstream code, type check it, and see what happens". This would take a bit of work to find something with a high enough signal-to-noise ratio, but I think it's doable to create something here that adds value. A similar thing that might be more feasible is type checking an upstream project's tests against the stubs. (I realise I'm basically suggesting …
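As a rough illustration of the "type check an upstream's tests against the stubs" idea, here is a minimal sketch. The directory layout (typeshed stubs for requests under `typeshed/stubs/requests`, upstream tests under `requests/tests`) is an assumption for the example, not an existing tool. It just runs mypy over the project's test suite with `MYPYPATH` pointing at the stubs, so that calls the stubs reject but the library's own tests clearly rely on surface as type errors.

```python
# Sketch: check an upstream project's test suite against its typeshed stubs.
# Assumed paths below are illustrative only.
import os
import subprocess
import sys


def check_tests_against_stubs(stub_dir: str, tests_dir: str) -> int:
    """Type check `tests_dir` with mypy, resolving imports via `stub_dir`."""
    # MYPYPATH adds an extra search path for stubs, so the checker sees the
    # typeshed stubs instead of (or in addition to) the installed package.
    env = dict(os.environ, MYPYPATH=stub_dir)
    result = subprocess.run(
        [sys.executable, "-m", "mypy", "--ignore-missing-imports", tests_dir],
        env=env,
        capture_output=True,
        text=True,
    )
    # Expect a fair amount of noise; the interesting errors are the ones where
    # the tests exercise an API shape that the stubs don't allow.
    print(result.stdout)
    return result.returncode


if __name__ == "__main__":
    raise SystemExit(
        check_tests_against_stubs("typeshed/stubs/requests", "requests/tests")
    )
```

The signal-to-noise problem mentioned above would mostly be about filtering these errors (e.g. ignoring errors caused by the tests' own dynamic tricks rather than by the stubs).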
This actually kinda demonstrates what I think could be done. It sounds like a lot of the goals, plans, etc. are spread out across a variety of repositories and people, and are very individualized. That's 100% reasonable, and people who are contributing in their free time should be allowed to work on what they find most interesting or valuable. With that said, it would be useful for outsiders if these were more clearly documented and grouped, so that others could assist you all. I can't speak for others, but I've found it rather challenging, and at times frustrating, to determine what the team considers a valuable contribution versus a waste of their time.

One idea: use GitHub's milestones to set more tangible goals around type completeness, accuracy, and tooling, so that users can find how they want to contribute to those goals.

Another idea: add an index of tooling to the contributing or readme documentation, covering where each tool lives and what its author's motivation is (as not all are projects under the Python or PyCQA namespace). This could be grouped by organization or author, so that users can more clearly understand when they're crossing boundaries that may come with different expectations than exist under a different author / org.

My third idea would be to expand into new tooling areas that do more analysis of function bodies, to evaluate whether a declared type is more restrictive than the functions it is used in (reasonable in some cases, but possibly indicating a misunderstanding or error in others), and to highlight areas where additional consideration should be given while reviewing the existing types.

Curious what others think about these ideas, or whether I missed something that already exists in these areas.

EDIT: fixed iOS typo from wand to what
I've reached out to psf/requests via their issue tracker (psf/requests#6211) to begin exploring why various projects aren't merging the type stubs from typeshed.
The requests maintainers have said that previous reviews found the type stubs to be inaccurate, which prevents them from merging the type hints into the main codebase.
This issue is to explore how type stub accuracy can be better evaluated, to reduce the maintenance burden that inaccurate or incomplete stubs currently impose on typeshed.
Ultimately, while having stubs may help end users in the short term, inaccurate or incomplete stubs can prevent or delay long term adoption by the upstream team(s).