Parametrized values containing non-ascii bytes break cache #1031
By default, string IDs just get passed through:

    # _pytest/python.py
    def _idval(val, argname, idx, idfn):
        # ...
        if isinstance(val, (float, int, str, bool, NoneType)):
            return str(val)

It seems Python 2 cannot serialize such a value to JSON, while Python 3 can:

    $ python3 -c 'import json; print(json.dumps({"": "\xac\x10\x02G"}))'
    {"": "\u00ac\u0010\u0002G"}
    $ python2 -c 'import json; print(json.dumps({"": "\xac\x10\x02G"}))'
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/usr/lib/python2.7/json/__init__.py", line 243, in dumps
        return _default_encoder.encode(obj)
      File "/usr/lib/python2.7/json/encoder.py", line 207, in encode
        chunks = self.iterencode(o, _one_shot=True)
      File "/usr/lib/python2.7/json/encoder.py", line 270, in iterencode
        return _iterencode(o, 0)
    UnicodeDecodeError: 'utf8' codec can't decode byte 0xac in position 0: invalid start byte

I think the issue is that with Python 2, normal strings are bytestrings, while in Python 3, they're unicode strings. I haven't looked at the cache code yet, but I think one solution would be to encode the data as utf-8 before it gets serialized.

Another solution would be to sanitize invalid UTF-8 when generating the test IDs already, but I don't know enough about encodings in Python 2 to do that. I think that'd be preferable, as it caused trouble elsewhere already.
We should simply encode Python 2 strings that don't fit into ASCII as string-escape.
While we are at it, we should have a unit test that ensures all kinds of test ID strings are encodable on all supported Python versions.
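A rough sketch of that idea, written for Python 3 only. The helper name `to_ascii_id` and the sample values are hypothetical and are not taken from pytest; the `backslashreplace` error handler stands in for Python 2's `string-escape` codec, which does not exist in Python 3. The loop at the end shows the kind of check the suggested unit test could make: every generated ID is pure ASCII and survives a JSON round trip.

```python
import json


def to_ascii_id(val):
    """Hypothetical helper: turn a parametrized value into an ASCII-only text ID."""
    if isinstance(val, bytes):
        # Bytes outside ASCII become literal \xNN escapes instead of raising.
        return val.decode("ascii", "backslashreplace")
    # Text (or anything else) is stringified, then escaped the same way.
    return str(val).encode("ascii", "backslashreplace").decode("ascii")


# Illustrative values only; a real test would enumerate the supported ID types.
SAMPLE_VALUES = [1, 1.5, True, None, "plain", u"caf\xe9", b"plain", b"\xac\x10\x02G"]


def check_all_ids_are_cacheable():
    for val in SAMPLE_VALUES:
        test_id = to_ascii_id(val)
        test_id.encode("ascii")  # raises UnicodeEncodeError if not pure ASCII
        assert json.loads(json.dumps(test_id)) == test_id


check_all_ids_are_cacheable()
```

Escaping at ID-generation time means the cache never sees a value it cannot serialize, regardless of which serializer it uses.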
Related to #1030; committing directly to pytest repository to get feedback from others on how to proceed.
c9b0370 to 716fa97
Thanks @RonnyPfannschmidt for the suggestion! 😄 Now Python 2: I also took the opportunity to apply the same logic for
Remove two xfail marks related to #714 which have been passing since #908, I presume.
well done
    is a full ascii string, otherwise escape it into its binary form.
    """
    try:
        return val.encode('ascii')
@nicoddemus encode/decode mismatch
also the kind of code that begs for unit-tests to run across all code paths
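The reviewer does not elaborate, so the following is only one reading of the remark: turning bytes into text is a decode, and turning text into bytes is an encode. On Python 2, calling `.encode('ascii')` on a byte string first decodes it implicitly, which is why a failure there surfaces as a `UnicodeDecodeError`. The Python 3 sketch below makes the two directions explicit; `failure_kind` is a hypothetical helper used only for illustration.

```python
raw = b"\xac\x10\x02G"  # the byte string from the report

# In Python 3 the direction is enforced by the types: bytes has no .encode().
assert not hasattr(raw, "encode")


def failure_kind(func):
    """Return the name of the UnicodeError subclass raised by func, or None."""
    try:
        func()
    except UnicodeError as exc:
        return type(exc).__name__
    return None


# bytes -> text goes through decode, so a failure is a decode error.
decode_failure = failure_kind(lambda: raw.decode("ascii"))

# text -> bytes goes through encode, so a failure is an encode error.
encode_failure = failure_kind(lambda: u"\xac".encode("ascii"))

print(decode_failure, encode_failure)
```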
True, thanks for tracking this down.
Some unit tests were written for this (take a look at test_idmaker_native_strings). Do you think adding more values in there that demonstrate this problem is enough, or should some other tests be written?
Directly for _escape_bytes; the idea is to properly run through all code paths of that function, so we don't have to exercise the code paths via the idmaker.
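As a sketch of what such direct tests might look like: the function below is a Python 3 stand-in written for this example, not pytest's actual `_escape_bytes`, and the test names are hypothetical. Each test targets one branch of the function, so no ID-making machinery is involved.

```python
def escape_bytes_sketch(val):
    """Stand-in for the function under review (an assumption, not pytest code)."""
    try:
        # Fast path: the value is pure ASCII and is returned as text unchanged.
        return val.decode("ascii")
    except UnicodeDecodeError:
        # Fallback path: bytes outside ASCII are rendered as \xNN escapes.
        return val.decode("ascii", "backslashreplace")


def test_ascii_bytes_pass_through():
    assert escape_bytes_sketch(b"serial") == "serial"


def test_non_ascii_bytes_are_escaped():
    assert escape_bytes_sketch(b"\xc3\xb4") == "\\xc3\\xb4"


def test_empty_bytes():
    assert escape_bytes_sketch(b"") == ""
```

Run under pytest, these three functions cover both the `try` and the `except` path plus the empty-input edge case.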
Yeah, I reached the same conclusion after I posted my comment and thought about it a little more. I will start working on this now. 😅
Just added the test to see what was going on. This is the value of the lastfailed attribute when it fails:

I thought nodeids would always be a "well formed" string; was I wrong, or is this perhaps a bug? @RonnyPfannschmidt or @hpk42?

Committing directly to pytest repository to get feedback from others on how to proceed (and contribute directly to this branch).
Related to #1030