implement PEP 3118 struct changes #47382
It seems the new modifiers to the struct.unpack/pack module that were
If the struct changes are made, also add 2 formats for C types ssize_t and
It's looking pessimistic that this is going to make it by beta 3. If
Let's retarget it to 3.1 then. It's a new feature, not a behaviour
:-(
This can be re-targeted to 3.1 as described.
Travis, does this supersede bpo-2395, or is this a subset of that one?
Is anyone working on implementing these new struct modifiers? If not, then I would love to take a shot at it.
2010/2/12 Meador Inge <[email protected]>:
Not to my knowledge.
On Feb 12, 2010, at 7:29 PM, Meador Inge wrote:
That would be great. -Travis
Some of the proposed struct module additions look far from straightforward; I find that section of the PEP significantly lacking in details and motivation.

"Unpacking a long-double will return a decimal object or a ctypes long-double." Returning a Decimal object here doesn't make a lot of sense, since Decimal objects aren't generally compatible with floats. And ctypes long double objects don't seem to exist, as far as I can tell. It might be better not to add this format code.

Another bit that's not clear to me: how is unpacking an object pointer expected to work, and how would it typically be used? What if the unpacked pointer no longer points to a valid Python object? How would this work in other Python implementations?

For the 'X{}' format (pointer to a function), is this supposed to mean a Python function or a C function? What's a 'specific pointer'?
Whoops. ctypes does have long double, of course. Apologies. |
Hi All, On Sat, Feb 13, 2010 at 5:07 AM, Mark Dickinson <[email protected]> wrote:
I agree.
And under what conditions would a ctypes long double be used vs. a Decimal?
[About unpacking an object pointer.] I guess if an object associated with the packed address does not exist, then
I read that as a Python function. However, I am not completely sure how the
I think this means a pointer to a specific type, e.g. '&d' is a pointer to a double.
I also have the following questions:
The new features of the struct-string syntax are so different that I think
In addition, I was thinking that a reasonable implementation strategy would
I think this will simplify the implementation and will provide a way to
I have attached a patch against the PEP containing my current thoughts on
Well, I'm guessing that this was really just an open question for the PEP, and that the PEP authors hadn't decided which of these two options was more appropriate.

If all long doubles were converted to Decimal, then we need to determine what precision is appropriate to use for the conversion: any long double *can* be represented exactly as a Decimal, but to get an exact representation can need thousands of digits in some cases, so it's probably better to always round to some fixed number of significant digits. 36 significant digits is a reasonable choice here: it's the minimum number of digits that's guaranteed to distinguish two distinct long doubles, for the case where a long double has 113 bits of precision (i.e., IEEE 754 binary128 format); other common long double formats have smaller precision than this (usually 53 (normal double), 64 (x87 extended doubles), or 106 (double double)).

There would probably also need to be some way to 'repack' the Decimal instance. The 'platform long double -> Decimal' conversion itself would also be nontrivial to implement; I can lend a hand here if you want it.

Using ctypes makes more sense to me, since it doesn't involve trying to mix decimal and binary, except that I don't know whether it's acceptable for other standard library modules to have dependencies on ctypes. I'm not sure whether ctypes is available on all platforms that Python runs on. It's also a bit ugly that, depending on the platform, sometimes a long double will unpack to an instance of ctypes.long_double, and sometimes (when long double == double) to a regular Python float.

Anyway, this particular case (long double) isn't a big deal: it can be overcome, one way or another. I'm more worried about some of the other aspects of the changes.

[About unpacking with the 'O' format.]
And how do you determine whether an address gives a valid object or not? I can only assume that packing and unpacking with the 'O' format is only supposed to be used in certain restricted circumstances, but it's not clear to me what those circumstances are.
I think a lot of this discussion needs to go back to python-dev; with luck, we can get some advice and clarifications from the PEP authors there. I'm not sure whether it's appropriate to modify the original PEP (especially since it's already accepted), or whether it would be better to produce a separate document describing the proposed changes in detail.
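As a concrete illustration of the 36-significant-digit rounding suggested earlier, here is a minimal sketch using a regular Python float (pure Python cannot touch a platform long double, so an ordinary double stands in for it; the exact-conversion step is real, the precision choice is the policy under discussion):

from decimal import Context, Decimal

x = 1.1
exact = Decimal(x)                       # exact: every binary float has a finite decimal expansion
rounded = Context(prec=36).plus(exact)   # round to 36 significant digits
print(exact)     # 1.100000000000000088817841970012523233890533447265625
print(rounded)   # the same value trimmed to 36 significant digits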
I'm looking for previous discussions of this PEP. There's a python-dev thread in April 2007: http://mail.python.org/pipermail/python-dev/2007-April/072537.html Are there other discussions that I'm missing?
Mark,
So the next step is to kick off a thread on python-dev summarizing the questions/problems we have come up with? I can get that started.
I did a quick search and came up with the same.
Closed bpo-2395 as a duplicate of this one. |
[Meador Inge]
Sounds good. I'd really like to see some examples of how these struct-module additions would be used in real life.
About long doubles again: I just encountered someone on the #python IRC channel who wanted to know whether struct.pack and struct.unpack supported reading and writing of x87 80-bit long doubles (padded to 12 bytes each in the input).

A few quotes from him/her, with permission (responses from others, including me, edited out; I can supply a fuller transcript if necessary, but I hope what's below isn't misleading). [18:39] bdesk: Hi, is struct.pack able to handle 80-bit x86 extended floats?

The main thing that I realized from this is that unpacking as a ctypes long double isn't all that useful for someone who wants to be able to do arithmetic on the unpacked result. And if you don't want to do arithmetic on the unpacked result, then you're probably just shuffling the bytes around without caring about their meaning, so there's no need to unpack as anything other than a sequence of 12 bytes.

On the other hand, I suppose it's enough to be able to unpack as a ctypes c_longdouble and then convert to a Python float (losing precision) for the arithmetic.

Alternatively, we might consider simply unpacking a long double directly into a Python float (and accepting the loss of precision); that seems to be what would be most useful for the use-case above.
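To make that use case concrete, here is a rough pure-Python sketch of what bdesk asked for: decoding one little-endian x87 80-bit extended double, padded to 12 bytes, into a Python float, accepting the precision loss. The helper name is ours; pseudo-denormal corner cases are ignored, and values outside double range raise OverflowError.

import struct

def unpack_x87_extended(data: bytes) -> float:
    """Convert one 12-byte (padded) x87 80-bit extended double to float."""
    # 64-bit significand, then 16 bits of sign+exponent, then 2 pad bytes
    mantissa, exp_sign = struct.unpack("<QH2x", data)
    sign = -1.0 if exp_sign & 0x8000 else 1.0
    exponent = exp_sign & 0x7FFF
    if exponent == 0x7FFF:                 # infinities and NaNs
        return sign * (float("inf") if mantissa == 1 << 63 else float("nan"))
    if exponent == 0:                      # zeros and denormals
        exponent = 1
    # Integer bit is explicit in x87: value = significand * 2**(exponent - 16383 - 63)
    return sign * mantissa * 2.0 ** (exponent - 16446)

# 1.5 encoded as an 80-bit x87 extended double, little-endian, padded to 12 bytes
print(unpack_x87_extended(bytes.fromhex("00000000000000c0ff3f0000")))   # 1.5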
I agree. Especially since ctypes 'long double' maps to a Python float and
One benefit of having a type code for 'long double' (assuming you are mapping
I guess that would be acceptable. The only thing that I don't like is that
# this will not hold
Which use case? From the given IRC trace it seems that 'bdesk' was mainly
So using ctypes 'long double' is easier to implement, but is lossy and clunky
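The round-tripping worry can be demonstrated with an existing lossy pair of codes: pushing a 53-bit Python float through the 24-bit 'f' format below mirrors what a long double -> Python float unpacking would do.

import struct

x = 1.1                                           # needs more than 24 bits of precision
lossy = struct.unpack("<f", struct.pack("<f", x))[0]
print(lossy == x)                                 # False: the low bits were rounded away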
Agreed: it's nice to have struct.pack already know your machine.

Actually, this brings up (yet) another open question: native packing/unpacking of a long double would presumably return something corresponding to the platform long double, as above; but non-native packing/unpacking should do something standard, instead, for the sake of interoperability between platforms. Currently, I believe that packing a Python float always---even in native mode---packs in IEEE 754 format, even when the platform doubles aren't IEEE 754.

For native packing/unpacking, I'm slowly becoming convinced that unpacking as a ctypes long double is the only thing that makes any sense, so that we keep round-tripping, as you point out. The user can easily enough extract the Python float for numerical work. I still don't like having the struct module depend on ctypes, though.
Attached is a patch that implements part of the additions. More specifically, the 'T{}' syntax and the ability to place byte-order specifiers ('<', '>', '@', '^', '!', '=') anywhere in the struct string.

The changes dictated by the PEP are so big that it is better to split things up into multiple patches. These two features will lay some ground work and are probably less controversial than the others. Surely some more tweaks will be needed, but I think what I have now is at least good enough for review.

I tested on OS X 10.6 and Ubuntu 10.04. I also used valgrind and 'regrtest.py -R:' to check for memory and reference leaks.
Thanks for this. Any chance you could upload the patch to Rietveld (http://codereview.appspot.com/) for ease of review? |
For reference, Numpy's PEP-3118 implementation is here: http://github.com/numpy/numpy/blob/master/numpy/core/_internal.py#L357 http://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/buffer.c#L76 It would be a good idea to ensure that the numpy and struct implementations are in agreement about details of the format strings.
Thanks for that, and the other information you give; that's helpful. It sounds like we're on the same page with respect to alignment of substructs. (Bar the mostly academic question of max versus lcm.) I still like the idea of scoped endianness markers in the substructs, but if we have to abandon that for compatibility with NumPy that's okay.
I'm still confused about how this could work: when unpacking, how do you know whether the PyObject* pointer points to a valid object or not? You can ensure that the pointer will always point to a valid object by having the *pack* operation increment reference counts, but then you need a way to automatically decref when the packed string goes out of scope. So the object returned by 'pack' would somehow have to be something other than a plain string, so that it can deal with automatically doing the DECREF of the held PyObject* pointers when it goes out of scope.

What's the need to have the 'O' format in the struct module? Is it really necessary there? Can we get away with not implementing it?
BTW, does this already exist in a released version of NumPy? If not, when is it likely to appear in the wild? |
That, or change the Numpy implementation. I don't believe there's yet much code in the wild that changes the alignment specifier on the fly. [clip: 'O' format code]
Yes, the packed object would need to own the references, and it would be the responsibility of the provider of the buffer to ensure that the pointers are valid. It seems that it's not possible for the
Another possibility is to implement the 'O' format unsafely and leave managing the reference counting to whoever uses the
[clip]
Numpy arrays, when containing Python objects, function as per the 'O' format. However, for the struct module, I don't see what would be the use case for the 'O' format.
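For comparison, ctypes already answers the ownership question raised here for its own object-pointer type: a py_object instance owns a reference, so the pointer stays valid for as long as the wrapper does.

>>> import ctypes
>>> obj = ["hello"]
>>> boxed = ctypes.py_object(obj)   # holds a reference; the pointer stays valid
>>> boxed.value is obj
True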
It's included since the 1.5.0 release which came out last July.
I think after the implementation is done, the PEP probably needs to be amended with clarifications (and possibly cutting out what is not really needed).
Hmm. I don't much like that idea. Historically, it's supposed to be very difficult to segfault the Python interpreter with pure Python code (well, except if you're using ctypes, I guess).
Attached is the latest version of the struct string patch. I tested on OS X 10.6.5 (64-bit) and Ubuntu 10.04 (32-bit). I also scanned for memory problems with Valgrind.

There is one test failing on 32-bit systems ('test_crasher'). This is due to the fact that 'struct.pack("357913941b", ...)' no longer tries to allocate 357913941 format codes. This implementation just allocates *one* code and assigns a count of 357913941, which is utilized later when packing/unpacking. Some work could be done to add better large memory consumption checks, though.

Previous feedback has been incorporated:
As before, there will surely be more iterations, but this is good enough for general review to see if things are headed in the right direction. This is a difficult one for review because the diffs are really large. I placed a review on Rietveld here: http://codereview.appspot.com/3863042/. If anyone has any ideas on how to reduce the number of diffs (perhaps a way to do multiple smaller patches), then that would be cool. I don't see an obvious way to do this at this point.
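As a toy illustration of the one-code-with-count allocation described above (a hypothetical helper; the real patch does this in C), one (count, code) pair is stored instead of materializing 357913941 copies of 'b':

import re

FMT_RE = re.compile(r"(\d*)([a-zA-Z?])")

def parse_codes(fmt: str):
    """Collapse '357913941b' into [(357913941, 'b')] instead of expanding it."""
    return [(int(count) if count else 1, code) for count, code in FMT_RE.findall(fmt)]

print(parse_codes("357913941b2h"))   # [(357913941, 'b'), (2, 'h')]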
Is there still any interest in this work? |
Yes, there's interest (at least here). I've just been really short on Python-time recently, so haven't found time to review your patch. |
I'm going to unassign for now; I still hope to look at this at some point, but can't see a time in the near future when it's going to happen. |
Is this work something that might be suitable for the features/pep-3118 repo (http://hg.python.org/features/pep-3118/) ? |
Yes, definitely. I'm going to push a new memoryview implementation
Once that is done, perhaps we could create a memoryview-struct
Following up here after rejecting bpo-15622 as invalid. The "unicode" codes in PEP-3118 need to be seriously rethought before any related changes are made in the struct module.
UCS1 would then be "S{latin-1}", UCS2 would be approximated as "S{utf-16}", and UCS4 would be "S{utf-32}"; arbitrary encodings would also be supported. struct packing would implicitly encode from text to bytes, while unpacking would implicitly decode bytes to text. As with 's', a length mismatch in the encoded form would mean an error.
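A minimal sketch of the proposed semantics; the 'S{...}' code itself does not exist, so pack_text below is a hypothetical stand-in that does the encoding step explicitly:

import struct

def pack_text(size: int, text: str, encoding: str) -> bytes:
    """Stand-in for the proposed 'S{<encoding>}': encode, then pack as
    fixed-size bytes; a length mismatch is an error, as with 's'."""
    raw = text.encode(encoding)
    if len(raw) != size:
        raise struct.error(f"encoded length {len(raw)} != field size {size}")
    return struct.pack(f"{size}s", raw)

print(pack_text(4, "hi", "utf-16-le"))   # b'h\x00i\x00'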
Following up on http://mail.python.org/pipermail/python-ideas/2011-March/009656.html, I would like to request that struct also handle half-precision floats directly. It's a short change, and half-precision floats are becoming much more popular in applications. Adding this to struct might also require changes to math.isinf and math.isnan, but maybe not.
Paul: there's already an open issue for adding float16 to the struct module: see bpo-11734. |
Whoops, never mind. Thanks for the pointer to 11734. |
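(For the record, the half-precision support tracked in bpo-11734 did land later: since Python 3.6 the struct module has an 'e' code for IEEE 754 binary16.)

>>> import struct
>>> struct.pack("<e", 1.5)
b'\x00>'
>>> struct.unpack("<e", b'\x00>')
(1.5,)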
Here's a grammar that roughly describes the subset that NumPy supports.
As for implementing this in the struct module: there is a new data
http://datashape.readthedocs.org/en/latest/
It does not have all the low-level capabilities (e.g. changing alignment
PEP-3118: "(2,3)10f0fZdT{10B:x:(2,3)d:y:Q:z:}B"
There are a lot of open questions still. Should "10f" be viewed as an
In the context of PEP-3118, I think so.
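For readers decoding that example string, here is one plausible reading; the treatment of '10f' is exactly the open question above:

fmt = "(2,3)10f0fZdT{10B:x:(2,3)d:y:Q:z:}B"
# (2,3)10f -> a 2x3 array whose elements are runs of 10 floats
#             (or ten independent float fields; see the question above)
# 0f       -> a zero-repeat float contributing no data
# Zd       -> one complex double (PEP 3118 'Z' complex prefix)
# T{10B:x:(2,3)d:y:Q:z:} -> nested struct: x = 10 unsigned bytes,
#             y = a 2x3 array of doubles, z = one unsigned long long
# B        -> one trailing unsigned byte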
It's been more than ten years now and most of the additions to
At this point, I would propose to close the issue. If there is interest in adding new codes, that should be discussed in a new feature request with a more specific motivation. It doesn't make sense to implement them based on a PEP from more than a decade ago.
I'll note that Numpy does seem to support at least part of these additions:
>>> import numpy as np
>>> dt = np.dtype([('x', np.float64), ('y', np.float64)])
>>> a = np.array([(1, 2)], dtype=dt)
>>> m = memoryview(a)
>>> m.format
'T{d:x:d:y:}'
It would make sense to support at least what Numpy supports. |
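And the struct module, at the time of writing, rejects that very format outright:

>>> import struct
>>> struct.calcsize('T{d:x:d:y:}')
Traceback (most recent call last):
  ...
struct.error: bad char in struct format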
Also note that:
>>> import ctypes
>>> class C(ctypes.Structure):
... _fields_ = [(f'f{num}', type) for num, type in enumerate([
... ctypes.c_bool,
... ctypes.c_longdouble,
... ctypes.py_object,
... ctypes.c_double_complex,
... ctypes.c_wchar * 2,
... ])]
...
>>> memoryview(C()).format
'T{<?:f0:15x<g:f1:<O:f2:<C:f3:(2)<u:f4:}'
There are formats in the wild that
IMO, at this point, standardizing needs a new PEP.
That was discussed, see e.g. here. Unfortunately, the