Skip to content

Fix UnicodeDecodeError in assertion with mixed non-ascii bytes repr + text #4001

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 20, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions changelog/3999.bugfix.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Fix ``UnicodeDecodeError`` in python2.x when a class returns a non-ascii binary ``__repr__`` in an assertion which also contains non-ascii text.
13 changes: 9 additions & 4 deletions src/_pytest/assertion/rewrite.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@
import os
import re
import six
import string
import struct
import sys
import types
Expand Down Expand Up @@ -466,10 +467,14 @@ def _saferepr(obj):

"""
r = py.io.saferepr(obj)
if isinstance(r, six.text_type):
return r.replace(u"\n", u"\\n")
else:
return r.replace(b"\n", b"\\n")
# only occurs in python2.x, repr must return text in python3+
if isinstance(r, bytes):
# Represent unprintable bytes as `\x##`
r = u"".join(
u"\\x{:x}".format(ord(c)) if c not in string.printable else c.decode()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't there the hex-escape encoding we can use instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

string-escape, yes: but it also escapes quote characters
unicode-escape: almost, but it decodes escape sequences to their unicode character

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asottile i see, thanks for researching the details

for c in r
)
return r.replace(u"\n", u"\\n")


from _pytest.assertion.util import format_explanation as _format_explanation # noqa
Expand Down
18 changes: 17 additions & 1 deletion testing/test_assertrewrite.py
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import, division, print_function

import glob
Expand Down Expand Up @@ -57,7 +58,7 @@ def getmsg(f, extra_ns=None, must_pass=False):
except AssertionError:
if must_pass:
pytest.fail("shouldn't have raised")
s = str(sys.exc_info()[1])
s = six.text_type(sys.exc_info()[1])
if not s.startswith("assert"):
return "AssertionError: " + s
return s
Expand Down Expand Up @@ -608,6 +609,21 @@ def __repr__(self):

assert r"where 1 = \n{ \n~ \n}.a" in util._format_lines([getmsg(f)])[0]

def test_custom_repr_non_ascii(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this test something we want to keep around after dropping Python 2? If not, I suggest to leave a comment so we can remove the test itself when the time comes (the code itself already contains a "python2.x" so I think that's enough). 👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't have a need for this when python2.x is gone, should I update the test or do you think the 2.x comment in the code is enough? (I actually added the comment with the same thought because I knew we were eventually going to want to clean it up!)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment in the code is enough to remove the code, but the test doesn't have any reference so I think it will mostly stay around. One alternative is to skip the test on python 3,this makes it clear it is python2 only.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool, yeah I added a comment to the test

def f():
class A(object):
name = u"ä"

def __repr__(self):
return self.name.encode("UTF-8") # only legal in python2

a = A()
assert not a.name

msg = getmsg(f)
Copy link
Contributor

@blueyed blueyed Sep 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this call f maybe?

assert "UnicodeDecodeError" not in msg
assert "UnicodeEncodeError" not in msg


class TestRewriteOnImport(object):
def test_pycache_is_a_file(self, testdir):
Expand Down