Parametrized values containing non-ascii bytes break cache #1031

nicoddemus · 2015-09-21T05:42:46Z

Just added the test to see what was going on. This is the value of the lastfailed attribute when it fails:

{'test_cache_param.py::test_fail[\xac\x10\x02G]': True}

I thought nodeids would always be a "well formed" string, was I wrong or perhaps this is a bug? @RonnyPfannschmidt or @hpk42?

Committing directly to pytest repository to get feedback from others on how to proceed (and contribute directly to this branch).

Related to #1030

The-Compiler · 2015-09-21T06:22:34Z

By default, string IDs just get passed through:

# _pytest/python.py
def _idval(val, argname, idx, idfn):
    # ...
    if isinstance(val, (float, int, str, bool, NoneType)):
        return str(val)

It seems Python 2's json can't handle that, while Python 3 can:

$ python3 -c 'import json; print(json.dumps({"": "\xac\x10\x02G"}))'
{"": "\u00ac\u0010\u0002G"}

$ python2 -c 'import json; print(json.dumps({"": "\xac\x10\x02G"}))'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python2.7/json/__init__.py", line 243, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python2.7/json/encoder.py", line 207, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python2.7/json/encoder.py", line 270, in iterencode
    return _iterencode(o, 0)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xac in position 0: invalid start byte

I think the issue is that with Python 2, normal strings are bytestrings, while in Python 3, they're unicode strings.
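
A small Python 3 sketch of that distinction, using the value from the failing test. It only illustrates how json treats text and byte strings; it is not taken from the cache code, and the variable names are made up for the example.

```python
import json

raw = b"\xac\x10\x02G"  # the parametrized value, as a byte string

# A text (unicode) string serializes fine: json escapes the non-ASCII code point.
text_dump = json.dumps({"": "\xac\x10\x02G"})
print(text_dump)

# A byte string is rejected outright on Python 3.
bytes_error = None
try:
    json.dumps({"": raw})
except TypeError as exc:
    bytes_error = type(exc).__name__
    print("bytes value rejected:", bytes_error)

# The same bytes are not valid UTF-8, which matches the Python 2 traceback above.
utf8_reason = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    utf8_reason = exc.reason
    print("not UTF-8:", utf8_reason)
```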

I haven't looked at the cache code yet, but I think one solution would be to encode the data as utf-8 before passing it to json - the question is what should be done with unencodable data.

Another solution would be to sanitize invalid UTF-8 when generating the test IDs already - but I don't know enough about encodings in Python 2 to do that.

I think that'd be preferable as it caused trouble elsewhere already.

RonnyPfannschmidt · 2015-09-21T07:41:16Z

we should simply encode python2 strings that don't fit into ascii as string-escape,
and to be consistent encode python3 byte strings the same way
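
A rough Python 3 sketch of this idea. The helper name escape_non_ascii is hypothetical, and the latin-1 plus unicode_escape combination is one assumed way to get a string-escape-like result on Python 3; it is not necessarily what the branch ended up doing.

```python
def escape_non_ascii(val):
    """Hypothetical helper: return an ASCII-only text id for a bytes value."""
    try:
        # Pure-ASCII bytes pass through unchanged.
        return val.decode("ascii")
    except UnicodeDecodeError:
        # Map each byte to the code point with the same number, then
        # backslash-escape everything that is not printable ASCII.
        return val.decode("latin-1").encode("unicode_escape").decode("ascii")


escaped = escape_non_ascii(b"\xac\x10\x02G")
plain = escape_non_ascii(b"plain")
print(escaped, plain)
```

The escaped result contains only ASCII characters, so it can be handed to json on either Python version.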

RonnyPfannschmidt · 2015-09-21T07:42:07Z

while we are at it we should have a unittest that ensures all kinds of test id strings are encodable on all supported python versions
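
Such a test could look roughly like the following. The to_id function is a hypothetical stand-in for the id-generation step (it is not pytest's idmaker), and the sample values are assumptions chosen to cover numbers, text, and byte strings.

```python
import json


def to_id(val):
    """Hypothetical stand-in for id generation; not pytest's real code."""
    if isinstance(val, bytes):
        try:
            return val.decode("ascii")
        except UnicodeDecodeError:
            return val.decode("latin-1").encode("unicode_escape").decode("ascii")
    return str(val)


SAMPLE_VALUES = [1, 1.5, True, None, "ascii", "\xe7\xe3o", b"ascii", b"\xac\x10\x02G"]


def test_ids_survive_json_round_trip():
    for val in SAMPLE_VALUES:
        test_id = to_id(val)
        # Every generated id must serialize and come back unchanged.
        assert json.loads(json.dumps(test_id)) == test_id
```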

nicoddemus · 2015-09-23T02:28:57Z

Thanks @RonnyPfannschmidt for the suggestion! 😄

Now bytes are escaped if they are not directly convertible to ascii.

Python 2: I also took the opportunity to apply the same logic to unicode strings: they are returned as plain bytes/str if they are ascii. This is especially useful for python 2 shops which always use from __future__ import unicode_literals. I also think this is an acceptable solution for #656.

Removed two xfail marks related to #714, which have been passing since #908, I presume.

RonnyPfannschmidt · 2015-09-23T07:03:38Z

well done

RonnyPfannschmidt · 2015-09-29T18:15:56Z

_pytest/python.py

+        is a full ascii string, otherwise escape it into its binary form.
+        """
+        try:
+            return val.encode('ascii')


@nicoddemus encode/decode mismatch

also the kind of code that begs for unit-tests to run across all code paths
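
One plausible reading of the mismatch, assuming val is a byte string on Python 3 (the diff excerpt above is truncated, so this is a guess rather than a description of the actual bug): bytes are turned into text with decode, while encode goes the other way and does not exist on bytes objects.

```python
val = b"\xac\x10\x02G"

# On Python 3, bytes -> text is decode; bytes objects have no encode method.
has_encode = hasattr(val, "encode")
has_decode = hasattr(val, "decode")
print(has_encode, has_decode)

# text -> bytes is encode, and decode reverses it.
round_trip = "abc".encode("ascii").decode("ascii")
print(round_trip)
```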

nicoddemus · 2015-09-29

True, thanks for tracking this down.

Some unit tests were written for this (take a look at test_idmaker_native_strings). Do you think adding more values there that demonstrate this problem is enough, or should other tests be written?

RonnyPfannschmidt · 2015-09-29

directly for _escape_bytes, the idea is to properly run through all code-paths of that function
so we don't have to exercise the code paths via the idmaker
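
Direct tests of that kind might be sketched as below, one per code path. escape_bytes_sketch is a hypothetical Python 3 stand-in written for this example; it is not pytest's actual _escape_bytes, whose body is not shown in this thread.

```python
def escape_bytes_sketch(val):
    """Hypothetical stand-in, not pytest's actual _escape_bytes."""
    try:
        return val.decode("ascii")
    except UnicodeDecodeError:
        return val.decode("latin-1").encode("unicode_escape").decode("ascii")


def test_ascii_path():
    # The try branch: plain ASCII bytes come back as the same text.
    assert escape_bytes_sketch(b"ascii") == "ascii"


def test_escape_path():
    # The except branch: non-ASCII bytes are backslash-escaped.
    assert escape_bytes_sketch(b"\xc3\xb4\xff\xe4") == "\\xc3\\xb4\\xff\\xe4"


def test_empty_value():
    # Edge case: empty input stays empty.
    assert escape_bytes_sketch(b"") == ""
```

Calling the function directly keeps each branch visible in the test, instead of relying on id generation to reach it indirectly.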

nicoddemus · 2015-09-29

Yeah, I reached the same conclusion after I posted my comment and thought about it a little more. I will start working on this now. 😅

nicoddemus added 3 commits September 22, 2015 23:18

Write failing test for parametrized tests with unmarshable parameters

661495e

Related to #1030; committing directly to pytest repository to get feedback from others on how to proceed.

escape bytes when creating ids for parametrized values

e106367

Update CHANGELOG

716fa97

nicoddemus force-pushed the unmarshable-parametrize branch from c9b0370 to 716fa97 Compare September 23, 2015 02:21

nicoddemus changed the title ~~Write failing test for parametrized tests with unmarshable parameters~~ Parametrized values containing non-ascii bytes break cache Sep 23, 2015

RonnyPfannschmidt added a commit that referenced this pull request Sep 23, 2015

Merge pull request #1031 from pytest-dev/unmarshable-parametrize

a3fdcd9

Parametrized values containing non-ascii bytes break cache

RonnyPfannschmidt merged commit a3fdcd9 into master Sep 23, 2015

nicoddemus deleted the unmarshable-parametrize branch September 23, 2015 07:56

nicoddemus mentioned this pull request Sep 23, 2015

Better Unicode handling in ID generation #656

Closed

RonnyPfannschmidt reviewed Sep 29, 2015