Skip to content

Type hints #61

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from
Closed

Type hints #61

wants to merge 3 commits into from

Conversation

kmike
Copy link
Member

@kmike kmike commented Jul 7, 2016

I've tried to add PEP-484 type hints in this PR (checked using mypy). They showed how tricky are our string types :) For each string there are several options:

  • str
  • bytes
  • typing.AnyStr
  • typing.Text (or six.text_type)
  • w3lib._types.String (a custom type I had to add)

There are several interesting points revealed by these type checks, e.g. that html_body_declared_encoding only worked in Python 3 because stdlib encodings.normalize_encoding accept binary values (I'm not sure this stdlib module is even documented).


Update: this PR is rather useless for the end users because of python/mypy#1190; it also adds typing runtime dependency which could be good to avoid. Switch to stub files?

@@ -7,16 +7,19 @@
import re
import six
from six import moves
from typing import AnyStr, Optional, Iterable, Tuple, Union, Sequence
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Get off annoying bot, these imports are used. You're not aware of PEP-484.

@codecov-io
Copy link

codecov-io commented Jul 7, 2016

Current coverage is 94.07%

Merging #61 into master will increase coverage by 0.41%

Powered by Codecov. Last updated by ec0230a...825c1cb

def read_bom(data):
# type: (bytes) -> Tuple[str, bytes]
r"""Read the byte order mark in the text, if present, and
return the encoding represented by the BOM and the BOM.

If no BOM can be detected, ``(None, None)`` is returned.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A nitpick - this is not reflected by the type

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah; currently mypy assumes that None is a valid value for any type, so Tuple[str, bytes] allows None, None. I recall using Union[Tuple[str, bytes], Tuple[None, None]] in some other place, but I'm not sure it works.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, didn't know that, thanks! mypy behavior makes sense for tuple, although not that I would expect it in general :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They have docs for this: http://mypy.readthedocs.io/en/latest/planned_features.html. Let me try Union again :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it returns a weird error for this line if annotation is changed to Union:

bom_enc, bom = read_bom(html_body_str)

w3lib/encoding.py: note: In function "html_to_unicode":
w3lib/encoding.py:274: error: 'Union[Tuple[builtins.str, builtins.bytes], Tuple[void, void]]' object is not iterable

It looks related to Union handling, not to Tuple[None, None] handling because it also fails if Tuple[str, str] is used as a second option for Union.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So mypy does not see that all union "kinds" here are iterable, that's a pity.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

relevant mypy issue: python/mypy#1575

@@ -0,0 +1,25 @@
# -*- coding: utf-8 -*-
"""
Which string type to use?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@eliasdorneles
Copy link
Member

it also adds typing runtime dependency which could be good to avoid

if we want to avoid the runtime dependency, we could use a hack like:

if os.getenv('ENABLE_TYPE_CHECKS'):
    from typing import ...

it'd make the imports kinda ugly, tho.

@lucywang000
Copy link
Member

FTR: I'll continue this PR soon.

@kmike
Copy link
Member Author

kmike commented Sep 26, 2018

FTR: There was a lot of new development in past 2 years on typing. Now type annotations from third-party libraries can be even more useful, as mypy implemented support for https://www.python.org/dev/peps/pep-0561/ - it is possible to check user code against annotations of third-party libraries.

@yozachar
Copy link

Bumping to close outdated PR.

@kmike kmike closed this Aug 11, 2022
@wRAR wRAR deleted the type-hints branch November 18, 2022 10:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants