bpo-29729: uuid.UUID now accepts bytes-like object #3801

vstinner · 2017-09-28T14:21:55Z

uuid.UUID doesn't require the 'bytes' argument to be an instance of
bytes: accept bytes-like types, bytearray and memoryview for example.

https://bugs.python.org/issue29729

uuid.UUID doesn't require the 'bytes' argument to be an instance of bytes: accept bytes-like types, bytearray and memoryview for example.

serhiy-storchaka

Add a "versionchanged" directive in the module documentation.

Hmm, memoryview is not accepted as bytes_le. And the length check works correctly only with memoryview with byte format.

* Accept also bytes-like for bytes_le: convert memoryview to bytes to use slices. Check also the length after the conversion to integer to raise a TypeError even if the length is not 16. * Fix unit tests: pass bytes to 'bytes' and 'bytes_le' parameters * Add more tests on Unicode passed to bytes and bytes_le * Document UUID changes

vstinner · 2017-09-28T19:35:25Z

Hmm, memoryview is not accepted as bytes_le

Oh, I didn't look at bytes_le. I also modified bytes_le to accept bytes-like objects.

And the length check works correctly only with memoryview with byte format.

I took care of such issues in my updated PR.

serhiy-storchaka · 2017-09-28T19:48:37Z

Lib/uuid.py

@@ -159,15 +159,18 @@ def __init__(self, hex=None, bytes=None, bytes_le=None, fields=None,
                raise ValueError('badly formed hexadecimal UUID string')
            int = int_(hex, 16)
        if bytes_le is not None:
+            if not isinstance(bytes_le, (bytes_, bytearray)):


I don't understand why bytes_le should accept an int? Do you have an example please?

It should not! But accepts. Add a test for bytes_le=16. 😉

serhiy-storchaka · 2017-09-28T19:50:01Z

Misc/NEWS.d/next/Library/2017-09-28-16-21-13.bpo-29729.skDkJA.rst

@@ -0,0 +1,3 @@
+uuid.UUID now accepts bytes-like object. The constructor doesn't require the
+'bytes' argument to be an instance of bytes: accept bytes-like types,


... and 'bytes_le'

Oops, fixed.

serhiy-storchaka · 2017-09-28T19:54:13Z

A ValueError is raised if 'bytes_le' is a str of len != 16. To solve this duplicate int.from_bytes() for bytes_le instead of reusing the code for bytes.

vstinner · 2017-09-28T20:40:15Z

A ValueError is raised if 'bytes_le' is a str of len != 16.

I added an unit tests to ensure that a TypeError is raised if you pass a str of any length to bytes or bytes_le.

vstinner · 2017-09-28T20:42:14Z

Please don't merge my PR until PR #3796 is merged. IMHO PR #3796 is more important :-)

serhiy-storchaka · 2017-09-28T21:16:21Z

I added an unit tests to ensure that a TypeError is raised if you pass a str of any length to bytes or bytes_le.

Ah, yes, a bytes constructor raises an error.

But still there is a problem with memoryview with non-byte format. bytes_le=memoryview(b'\0'*16).cast('I') works, but bytes=memoryview(b'\0'*16).cast('I') is rejected and bytes=memoryview(b'\0'*64).cast('I') is accepted by mistake.

vstinner · 2017-09-28T21:34:54Z

PR #3796 has been merged. I merged master into my branch.

But still there is a problem with memoryview with non-byte format. memoryview(b'\0'*64).cast('I') is accepted by mistake.

I don't think that we should overengineer the code here. We are consenting adults here. You can also find a value to skip input validations. It's really hard to write the perfect validation. If you can a random memoryview, yeah, maybe we get a valid UUID object whereas you shouldn't. Well, I don't think that we should detect this corner case. Why would you do that on purpose?

vadmium

It may not be worth supporting every kind of invalid input (unless you have to handle untrusted input). But it is important to implement what the documentation promises (or fix the documentation). As of rev a169e89, bytes has to also support len.

vadmium · 2017-09-30T07:30:46Z

Lib/uuid.py

@@ -160,15 +160,19 @@ def __init__(self, hex=None, bytes=None, bytes_le=None, fields=None,
                raise ValueError('badly formed hexadecimal UUID string')
            int = int_(hex, 16)
        if bytes_le is not None:
+            # Don't cast int to bytes to get a TypeError above
+            if not isinstance(bytes_le, (bytes_, bytearray, int_)):
+                bytes_le = bytes_(bytes_le)
            if len(bytes_le) != 16:


By above in the comment, do you mean this check, which is actually below the comment? :)

I removed the comment.

vstinner · 2017-10-02T15:41:45Z

@vadmium: "It may not be worth supporting every kind of invalid input (unless you have to handle untrusted input). But it is important to implement what the documentation promises (or fix the documentation). As of rev a169e89, bytes has to also support len."

I'm sorry, but I don't understand what you are expecting from me. What do you mean by "bytes has to also support len"? Do you expect a nicer error message if len() fails?

vstinner · 2017-10-02T15:42:08Z

@serhiy-storchaka, @vadmium: Would you mind to review the latest version of my change, please?

vadmium · 2017-10-03T05:52:22Z

For your code to work, the object passed as the bytes argument has to implement __len__ returning 16. But your documentation promises any bytes-like object should work (as long as it is a string of 16 bytes). Since the glossary definition of bytes-like objects has no requirements about implementing __len__, I would expect len to not be used on the argument. You could change the code for bytes to make a copy like you do for bytes_le:

bytes = bytes_(bytes)  # Convert bytes-like to bytes object
if len(bytes) != 16 . . .

Or you could use memoryview:

with memoryview(bytes) as view:
    if view.nbytes != 16 . . .

Or you could change the documentation:

bytes_le now accepts any bytes-like object, and bytes now accepts any bytes-like object for which len returns 16.

vstinner · 2017-10-03T09:47:11Z

Or you could change the documentation:

UUID constructor documentation is explicit:

   Create a UUID from either a string of 32 hexadecimal digits, a string of 16
   bytes as the *bytes* argument, a string of 16 bytes in little-endian order as
   the *bytes_le* argument, ...

I guess that your remarks was on my "versionchanged" markup and the NEWS entry. I added "of 16 bytes" there.

pitrou · 2017-10-04T21:36:29Z

Lib/uuid.py

            if len(bytes_le) != 16:
                raise ValueError('bytes_le is not a 16-char string')
            bytes = (bytes_le[4-1::-1] + bytes_le[6-1:4-1:-1] +
                     bytes_le[8-1:6-1:-1] + bytes_le[8:])
        if bytes is not None:
+            int = int_.from_bytes(bytes, byteorder='big')
+            # test length after the conversion to raise a TypeError exception
+            # if 'bytes' type is str even if 'bytes' length is not 16
            if len(bytes) != 16:


>>> a = array.array('i', [-1]*4) >>> x = int.from_bytes(a, byteorder='big') >>> len(a) 4 >>> x.bit_length()/8 16.0

And conversely:

>>> a = array.array('i', [-1]*16) >>> x = int.from_bytes(a, byteorder='big') >>> len(a) 16 >>> x.bit_length()/8 64.0

vadmium · 2017-10-04T22:50:58Z

Yes I was talking about the versionchanged note, which now says "any bytes-like objects of 16 bytes". But I don't think this is clear enough. Antoine and Serhiy already gave examples of bytes-like objects containing exactly 16 bytes where len returns a different number.

Since you already copy bytes_le into a true bytes object, why not parallel this for the bytes parameter?

if bytes is not None:
    # Check for int so that bytes=16 triggers an exception below
    if not isinstance(bytes, (bytes_, bytearray, int_)):
        bytes = bytes_(bytes)  # Copy any bytes-like object
    if len(bytes) != 16:
        raise ValueError('bytes is not a 16-byte string')
    int = int_.from_bytes(bytes, 'big')

vstinner · 2017-10-05T11:56:50Z

I decided to work on https://bugs.python.org/issue29729 because I understand that the proposed change was just to remove an assertion. That's all.

It seems like you expect much more changes, much more complex code. It's already my 6th version of my pull request.

Honestly, I don't care much of the uuid module. I don't think that anyone ever tried to call UUID(bytes=value) with a type different than bytes (ok, apart the bug reporter ;-)). If I would really need to pass non-bytes, having to cast manually with bytes(value) wouldn't burn my hands.

For all these reasons, I abandon my change. Feel free to reuse my code if someone wants to pursue the effort on UUID enhancement ;-)

serhiy-storchaka · 2017-10-05T12:54:07Z

The problem with adding support of bytes-like objects is that in different cases this term has different meaning. The definition in the glossary is too wide, in many particular cases we need the support of not just the buffer protocol, but len(), indexing, slicing, iterating, startswith() or even lower().

I think we need different terms for different levels of "bytesness".

vstinner · 2017-10-05T13:46:20Z

I think that we should replace the assertion with a regular if and raise a proper TypeError, rather than trying to support "bytes-like" objects. It's fine to require bytes on UUID. UUID is not a common type, and passing bytes to UUID is even less common (IMHO).

bpo-29729: uuid.UUID now accepts bytes-like object

9038f43

uuid.UUID doesn't require the 'bytes' argument to be an instance of bytes: accept bytes-like types, bytearray and memoryview for example.

vstinner requested a review from serhiy-storchaka September 28, 2017 14:21

the-knights-who-say-ni added the CLA signed label Sep 28, 2017

bedevere-bot added the awaiting merge label Sep 28, 2017

serhiy-storchaka reviewed Sep 28, 2017

View reviewed changes

Update the NEWS entry.

58d013c

vstinner added 2 commits September 28, 2017 23:30

Merge branch 'master' into uuid_bytes

d7262ad

Fix UUID(bytes_le=16)

a169e89

vadmium reviewed Sep 30, 2017

View reviewed changes

Remove a comment

583c9fc

vstinner added 2 commits October 3, 2017 11:42

Merge branch 'master' into uuid_bytes

f437da5

Repeat that objects must have a length of 16

bf919ce

pitrou reviewed Oct 4, 2017

View reviewed changes

vstinner closed this Oct 5, 2017

vstinner deleted the uuid_bytes branch October 5, 2017 11:56

Mariatta removed the awaiting merge label Oct 8, 2017

		@@ -0,0 +1,3 @@
		uuid.UUID now accepts bytes-like object. The constructor doesn't require the
		'bytes' argument to be an instance of bytes: accept bytes-like types,

Uh oh!

bpo-29729: uuid.UUID now accepts bytes-like object #3801

bpo-29729: uuid.UUID now accepts bytes-like object #3801

Uh oh!

Conversation

vstinner commented Sep 28, 2017 • edited by bedevere-bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

serhiy-storchaka left a comment

Choose a reason for hiding this comment

Uh oh!

vstinner commented Sep 28, 2017

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

serhiy-storchaka commented Sep 28, 2017

Uh oh!

vstinner commented Sep 28, 2017

Uh oh!

vstinner commented Sep 28, 2017

Uh oh!

serhiy-storchaka commented Sep 28, 2017

Uh oh!

vstinner commented Sep 28, 2017

Uh oh!

vadmium left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

vstinner commented Oct 2, 2017

Uh oh!

vstinner commented Oct 2, 2017

Uh oh!

vadmium commented Oct 3, 2017

Uh oh!

vstinner commented Oct 3, 2017

Uh oh!

Choose a reason for hiding this comment

Uh oh!

vadmium commented Oct 4, 2017

Uh oh!

vstinner commented Oct 5, 2017

Uh oh!

serhiy-storchaka commented Oct 5, 2017

Uh oh!

vstinner commented Oct 5, 2017

Uh oh!

Uh oh!

vstinner commented Sep 28, 2017 •

edited by bedevere-bot

Loading