implement PEP 3118 struct changes #47382

Open
benjaminp opened this issue Jun 17, 2008 · 63 comments
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@benjaminp
Contributor

BPO 3132
Nosy @warsaw, @mdickinson, @ncoghlan, @abalkin, @pitrou, @devdanzin, @benjaminp, @pv, @skrah, @meadori, @vadmium
Files
  • pep-3118.patch
  • struct-string.py3k.patch: Patch for 'T{}' syntax and multiple byte order specifiers.
  • struct-string.py3k.2.patch: Patch with fixed assertions
  • struct-string.py3k.3.patch: Patch for 'T{}' against py3k r87813
  • grammar.y
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2008-06-17.22:30:31.496>
    labels = ['type-feature', 'library']
    title = 'implement PEP 3118 struct changes'
    updated_at = <Date 2016-04-13.10:20:21.199>
    user = 'https://github.com/benjaminp'

    bugs.python.org fields:

    activity = <Date 2016-04-13.10:20:21.199>
    actor = 'skrah'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2008-06-17.22:30:31.496>
    creator = 'benjamin.peterson'
    dependencies = []
    files = ['16242', '17386', '17416', '20298', '42451']
    hgrepos = []
    issue_num = 3132
    keywords = ['patch']
    message_count = 58.0
    messages = ['68347', '68507', '71313', '71316', '71338', '71342', '71882', '87921', '99296', '99297', '99309', '99312', '99313', '99460', '99472', '99474', '99551', '99655', '99656', '99677', '99711', '99771', '105952', '105955', '105970', '106087', '106088', '106089', '106090', '106091', '106153', '106155', '106157', '106164', '106168', '106173', '106175', '106177', '106180', '106181', '106188', '106416', '123093', '123204', '123205', '123226', '123366', '125617', '130694', '130695', '130696', '143505', '143509', '167963', '187583', '187589', '187591', '263321']
    nosy_count = 17.0
    nosy_names = ['barry', 'teoliphant', 'mark.dickinson', 'ncoghlan', 'belopolsky', 'pitrou', 'inducer', 'ajaksu2', 'MrJean1', 'benjamin.peterson', 'pv', 'Arfrever', 'noufal', 'skrah', 'meador.inge', 'martin.panter', 'paulehoffman']
    pr_nums = []
    priority = 'high'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue3132'
    versions = ['Python 3.6']

    @benjaminp
    Contributor Author

    It seems the new format modifiers for struct.pack/unpack that were
    proposed in PEP-3118 haven't been implemented yet.

    @benjaminp benjaminp added the type-feature A feature request or enhancement label Jun 17, 2008
    @MrJean1
    Mannequin

    MrJean1 mannequin commented Jun 21, 2008

    If the struct changes are made, also add two format codes for the C
    types ssize_t and size_t, perhaps 'z' and 'Z' respectively, in
    particular since on some platforms sizeof(size_t) != sizeof(long).
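    For context, codes along these lines did eventually land, though as 'n'/'N' rather than 'z'/'Z' and in native mode only (Python 3.3). A quick sketch of the behaviour on a modern interpreter:

```python
import ctypes
import struct

# 'n' and 'N' map to the platform's ssize_t / size_t and are only
# available in native mode (no byte-order prefix allowed).
assert struct.calcsize('n') == ctypes.sizeof(ctypes.c_ssize_t)
assert struct.calcsize('N') == ctypes.sizeof(ctypes.c_size_t)

packed = struct.pack('N', 123456)
print(struct.unpack('N', packed)[0])  # 123456
```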

    @warsaw
    Member

    warsaw commented Aug 18, 2008

    It's looking pessimistic that this is going to make it by beta 3. If
    it can't get in by then, it's too late.

    @pitrou
    Member

    pitrou commented Aug 18, 2008

    Let's retarget it to 3.1 then. It's a new feature, not a behaviour
    change or a deprecation, so adding it to 3.0 isn't a necessity.

    @pitrou pitrou added stdlib Python modules in the Lib dir and removed release-blocker labels Aug 18, 2008
    @benjaminp
    Contributor Author

    Actually, this may be a requirement of bpo-2394; PEP-3118 states that
    memoryview.tolist would use the struct module to do the unpacking.

    @pitrou
    Member

    pitrou commented Aug 18, 2008

    Actually, this may be a requirement of bpo-2394; PEP-3118 states that
    memoryview.tolist would use the struct module to do the unpacking.

    :-(
    However, we don't have any examples of the buffer API / memoryview
    object working with anything other than 1-dimensional contiguous char
    arrays (e.g. bytearray). Therefore, I suggest that Python 3.0 provide
    official support only for 1-dimensional contiguous char arrays. Then
    tolist() will be easy to implement even without using the struct module
    (just a list of integers, if I understand the functionality).
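    For illustration, that is exactly how this ended up behaving on a modern Python: a 1-dimensional contiguous char buffer unpacks to a plain list of integers with no struct machinery involved:

```python
# memoryview over a 1-D contiguous char array (bytearray exports
# format 'B', unsigned bytes, ndim 1).
mv = memoryview(bytearray(b'abc'))
print(mv.format, mv.ndim)  # B 1
print(mv.tolist())         # [97, 98, 99]
```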

    @teoliphant
    Mannequin

    teoliphant mannequin commented Aug 24, 2008

    This can be re-targeted to 3.1 as described.

    @devdanzin
    Mannequin

    devdanzin mannequin commented May 16, 2009

    Travis,
    Do you think you can contribute to this so that it actually lands in
    3.2? Having a critical issue slip from 3.0 to 3.3 would be bad...

    Does this supersede bpo-2395 or is it a subset of that one?

    @meadori
    Member

    meadori commented Feb 13, 2010

    Is anyone working on implementing these new struct modifiers? If not, then I would love to take a shot at it.

    @benjaminp
    Contributor Author

    2010/2/12 Meador Inge <[email protected]>:

    Meador Inge <[email protected]> added the comment:

    Is anyone working on implementing these new struct modifiers?  If not, then I would love to take a shot at it.

    Not to my knowledge.

    @teoliphant
    Mannequin

    teoliphant mannequin commented Feb 13, 2010

    On Feb 12, 2010, at 7:29 PM, Meador Inge wrote:

    Meador Inge <[email protected]> added the comment:

    Is anyone working on implementing these new struct modifiers? If
    not, then I would love to take a shot at it.

    That would be great.

    -Travis

    @mdickinson
    Member

    Some of the proposed struct module additions look far from straightforward; I find that section of the PEP significantly lacking in details and motivation.

    "Unpacking a long-double will return a decimal object or a ctypes long-double."

    Returning a Decimal object here doesn't make a lot of sense, since Decimal objects aren't generally compatible with floats. And ctypes long double objects don't seem to exist, as far as I can tell. It might be better not to add this code.

    Another bit that's not clear to me: how is unpacking an object pointer expected to work, and how would it typically be used? What if the unpacked pointer no longer points to a valid Python object? How would this work in other Python implementations?

    For the 'X{}' format (pointer to a function), is this supposed to mean a Python function or a C function?

    What's a 'specific pointer'?

    @mdickinson
    Member

    Whoops. ctypes does have long double, of course. Apologies.

    @meadori
    Member

    meadori commented Feb 17, 2010

    Hi All,

    On Sat, Feb 13, 2010 at 5:07 AM, Mark Dickinson <[email protected]>wrote:

    Mark Dickinson <[email protected]> added the comment:

    Some of the proposed struct module additions look far from straightforward;
    I find that section of the PEP significantly lacking in details and
    motivation.

    I agree.

    "Unpacking a long-double will return a decimal object or a ctypes
    long-double."

    Returning a Decimal object here doesn't make a lot of sense, since Decimal
    objects aren't generally compatible with floats. And ctypes long double
    objects don't seem to exist, as far as I can tell. It might be better not
    to add this code.

    And under what conditions would a ctypes long double be used vs. a
    Decimal object?

    Another bit that's not clear to me: how is unpacking an object pointer

    expected to work, and how would it typically be used? What if the unpacked
    pointer no longer points to a valid Python object? How would this work in
    other Python implementations?

    I guess if an object associated with the packed address does not exist,
    then you would unpack None (?). This is especially a problem if the
    struct-string is being sent over the wire to another machine.

    For the 'X{}' format (pointer to a function), is this supposed to mean a
    Python function or a C function?

    I read that as a Python function. However, I am not completely sure how the
    prototype would be enforced when unpacking. I am also wondering, though,
    how the signatures on pointers-to-functions are specified? Are
    the arguments and return type full struct strings as well?

    What's a 'specific pointer'?

    I think this means a pointer to a specific type, e.g. '&d' is a pointer to a
    double. If this is the case, though, the use cases are not completely clear
    to me.

    I also have the following questions:

    • Can pointers be nested, '&&d'?

    • What nesting level can structures have? Arbitrary?

    • The new array syntax claims "multi-dimensional array of whatever follows".

      Truly whatever? Arrays of structures? Arrays of pointers?

    • "complex (whatever the next specifier is)". Not really 'whatever': you
      cannot have a 'complex bool' or 'complex int'. What other types of
      complex are there besides complex double?

    • How do array specifiers and pointer specifiers mix? For example, would
      '(2, 2)&d' be a two-by-two array of pointers to doubles? What about
      '&(2, 2)d'? Is this a pointer to an two-by-two array of doubles?

    The new features of the struct-string syntax are so different that I think
    we
    need to specify a grammar. I think it will clarify some of the open
    questions.

    In addition, I was thinking that a reasonable implementation strategy
    would be to keep the current struct-string syntax mostly in place within
    the C module implementation. The C implementation would just provide an
    interface to pack/unpack sequences of primitive data elements. Then we
    could write a layer in the Python 'struct' module that took care of the
    higher-order concepts like nested structures, arrays, named values, and
    pointers to functions. The higher-order concepts would be mapped to the
    appropriate primitive sequence strings.

    I think this will simplify the implementation and will provide a way to
    phase it. We can implement the primitive type extensions in C first,
    followed by the higher-level Python stuff. The result of each phase is
    immediately usable.
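    The layering described above can be sketched roughly as follows. `flatten` is a hypothetical name for the proposed Python-layer pass, and it handles only bare 'T{}' nesting (no named fields, counts, or pointers), purely to illustrate the idea:

```python
def flatten(fmt):
    """Hypothetical sketch of the proposed Python layer: expand nested
    'T{...}' substructs into the flat primitive string the C layer
    would pack/unpack. Assumes every 'T{' has a matching '}'."""
    while 'T{' in fmt:
        start = fmt.index('T{')
        depth, i = 0, start + 1
        while True:  # find the brace matching this 'T{'
            if fmt[i] == '{':
                depth += 1
            elif fmt[i] == '}':
                depth -= 1
                if depth == 0:
                    break
            i += 1
        # splice the substruct's contents in place of 'T{...}'
        fmt = fmt[:start] + fmt[start + 2:i] + fmt[i + 1:]
    return fmt

print(flatten('iT{2d T{b}}f'))  # 'i2d bf'
```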

    I have attached a patch against the PEP containing my current thoughts on
    fleshing out the grammar and some of the current open questions. This still
    needs work, but I wanted to share to see if I am on the right track.
    Please advise on how to proceed.

    @mdickinson
    Member

    And under what conditions would a ctypes long double be used vs. a
    Decimal object?

    Well, I'm guessing that this was really just an open question for the PEP, and that the PEP authors hadn't decided which of these two options was more appropriate. If all long doubles were converted to Decimal, then we need to determine what precision is appropriate to use for the conversion: any long double *can* be represented exactly as a Decimal, but to get an exact representation can need thousands of digits in some cases, so it's probably better to always round to some fixed number of significant digits. 36 significant digits is a reasonable choice here: it's the minimum number of digits that's guaranteed to distinguish two distinct long doubles, for the case where a long double has 113 bits of precision (i.e., IEEE 754 binary128 format); other common long double formats have smaller precision than this (usually 53 (normal double), 64 (x87 extended doubles), or 106 (double double)). There would probably also need to be some way to 'repack' the Decimal instance.

    The 'platform long double -> Decimal' conversion itself would also be nontrivial to implement; I can lend a hand here if you want it.
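    The digit-count issue is easy to see with ordinary doubles already: Decimal(float) is exact in Python, but the exact value is long. (The 55-digit figure below is for the double nearest 0.1.)

```python
from decimal import Decimal

# Binary-to-decimal conversion is exact but verbose: the double
# nearest to 0.1 takes 55 significant digits to write out in full.
d = Decimal(0.1)
print(d)  # 0.1000000000000000055511151231257827021181583404541015625
print(len(d.as_tuple().digits))  # 55
```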

    Using ctypes makes more sense to me, since it doesn't involve trying to mix decimal and binary, except that I don't know whether it's acceptable for other standard library modules to have dependencies on ctypes. I'm not sure whether ctypes is available on all platforms that Python runs on. It's also a bit ugly that, depending on the platform, sometimes a long double will unpack to an instance of ctypes.long_double, and sometimes (when long double == double) to a regular Python float.

    Anyway, this particular case (long double) isn't a big deal: it can be overcome, one way or another. I'm more worried about some of the other aspects of the changes.

    [About unpacking with the 'O' format.]

    I guess if an object associated with the packed address does not
    exist, then you would unpack None (?). This is especially a problem
    if the struct-string is being sent over the wire to another machine.

    And how do you determine whether an address gives a valid object or not? I can only assume that packing and unpacking with the 'O' format is only supposed to be used in certain restricted circumstances, but it's not clear to me what those circumstances are.

    I also have the following questions: [...]

    I think a lot of this discussion needs to go back to python-dev; with luck, we can get some advice and clarifications from the PEP authors there. I'm not sure whether it's appropriate to modify the original PEP (especially since it's already accepted), or whether it would be better to produce a separate document describing the proposed changes in detail.

    @mdickinson
    Member

    I'm looking for previous discussions of this PEP. There's a python-dev thread in April 2007:

    http://mail.python.org/pipermail/python-dev/2007-April/072537.html

    Are there other discussions that I'm missing?

    @meadori
    Member

    meadori commented Feb 19, 2010

    Mark,

    I think a lot of this discussion needs to go back to python-dev; with
    luck, we can get some advice and clarifications from the PEP authors
    there.

    So the next step is to kick off a thread on python-dev summarizing the questions/problems we have come up with? I can get that started.

    Are there other discussions that I'm missing?

    I did a quick search and came up with the same.

    @mdickinson
    Member

    Closed bpo-2395 as a duplicate of this one.

    @mdickinson
    Member

    [Meador Inge]

    So the next step is to kick off a thread on python-dev summarizing the
    questions/problems we have come up with? I can get that started.

    Sounds good. I'd really like to see some examples of how these struct-module additions would be used in real life.

    @mdickinson
    Member

    About long doubles again: I just encountered someone on the #python IRC channel who wanted to know whether struct.pack and struct.unpack supported reading and writing of x87 80-bit long doubles (padded to 12 bytes each in the input). A few quotes from him/her, with permission (responses from others, including me, edited out; I can supply a fuller transcript if necessary, but I hope what's below isn't misleading).

    [18:39] bdesk: Hi, is struct.pack able to handle 80-bit x86 extended floats?
    [18:40] bdesk: How can I read and write these 80-bit floats, in binary, using python?
    [18:44] bdesk: dickinsm: I have a C program that uses binary files as input and output, and I want to deal with these files using python if possible.
    [18:49] bdesk: I don't need to do arithmetic with the full 80 bits of precision within the python program, although It would be better if I could.
    [18:50] bdesk: I would need to use the float in a more semantically useful manner than treating it as a black box of 12 bytes.
    [18:55] bdesk: Until python gets higher precision floats, my preferred interface would be to lose some precision when unpacking the floats.

    The main thing that I realized from this is that unpacking as a ctypes long double isn't all that useful for someone who wants to be able to do arithmetic on the unpacked result. And if you don't want to do arithmetic on the unpacked result, then you're probably just shuffling the bytes around without caring about their meaning, so there's no need to unpack as anything other than a sequence of 12 bytes.

    On the other hand, I suppose it's enough to be able to unpack as a ctypes c_longdouble and then convert to a Python float (losing precision) for the arithmetic. Alternatively, we might consider simply unpacking a long double directly into a Python float (and accepting the loss of precision); that seems to be what would be most useful for the use-case above.

    @meadori
    Member

    meadori commented Feb 22, 2010

    The main thing that I realized from this is that unpacking as a ctypes long
    double isn't all that useful for someone who wants to be able to do arithmetic
    on the unpacked result.

    I agree. Especially since ctypes 'long double' maps to a Python float and
    '.value' would have to be referenced on the ctype 'long double' instance
    for doing arithmetic.

    And if you don't want to do arithmetic on the unpacked result, then you're
    probably just shuffling the bytes around without caring about their meaning,
    so there's no need to unpack as anything other than a sequence of 12 bytes.

    One benefit of having a type code for 'long double' (assuming you are
    mapping the value to the platform's 'long double') is that you don't
    have to know how many bytes are in the underlying representation. As
    you know, it isn't always just 12 bytes. It depends on the architecture
    and ABI being used, which, from a quick sample, I am seeing can be
    anywhere from 8 to 16 bytes:

    ==========================================
    | Compiler  | Arch     | Bytes            |
    ==========================================
    | VC++ 8.0  | x86      | 8                |
    | VC++ 9.0  | x86      | 8                |
    | GCC 4.2.4 | x86      | 12 (default), 16 |
    | GCC 4.2.4 | x86-64   | 12, 16 (default) |
    | GCC 4.2.4 | PPC IBM  | 16               |
    | GCC 4.2.4 | PPC IEEE | 16               |
    ==========================================

    On the other hand, I suppose it's enough to be able to unpack as a ctypes
    c_longdouble and then convert to a Python float (losing precision) for the
    arithmetic. Alternatively, we might consider simply unpacking a long double
    directly into a Python float (and accepting the loss of precision);

    I guess that would be acceptable. The only thing that I don't like is that
    since the transformation is lossy, you can't round trip:

    # this will not hold
    pack('g', unpack('g', byte_str)[0]) == byte_str

    that seems to be what would be most useful for the use-case above.

    Which use case? From the given IRC trace it seems that 'bdesk' was mainly
    concerned with (1) pushing bytes around, but (2) thought "it would be
    better" to be able to do arithmetic and that it would be more useful if
    it were not a "black box of 12 bytes". For use case (1) the loss of
    precision would probably not be acceptable, due to the round-trip issue
    mentioned above.

    So using ctypes 'long double' is easier to implement, but is lossy and clunky
    for arithmetic. Using Python 'float' is easy to implement and easy for
    arithmetic, but is lossy. Using Decimal is non-lossy and easy for arithmetic,
    but the implementation would be non-trivial and architecture specific
    (unless we just picked a fixed number of bytes regardless of the architecture).
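    The precision loss behind the round-trip worry can be demonstrated with codes that exist today, narrowing a double to a 32-bit 'f' (standing in for the hypothetical long-double-to-float narrowing):

```python
import math
import struct

# Narrow a double to IEEE 754 binary32 and back; the wider value
# cannot be recovered, which is the round-trip problem in miniature.
narrowed = struct.unpack('f', struct.pack('f', math.pi))[0]
print(narrowed == math.pi)      # False
print(abs(narrowed - math.pi))  # small but nonzero
```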

    @mdickinson
    Member

    One benefit of having a type code for 'long double' (assuming you are
    mapping the value to the platform's 'long double') is that you don't
    have to know how many bytes are in the underlying representation.

    Agreed: it's nice to have struct.pack already know your machine.

    Actually, this brings up (yet) another open question: native packing/unpacking of a long double would presumably return something corresponding to the platform long double, as above; but non-native packing/unpacking should do something standard, instead, for the sake of interoperability between platforms. Currently, I believe that packing a Python float always---even in native mode---packs in IEEE 754 format, even when the platform doubles aren't IEEE 754.

    For native packing/unpacking, I'm slowly becoming convinced that unpacking as a ctypes long double is the only thing that makes any sense, so that we keep round-tripping, as you point out. The user can easily enough extract the Python float for numerical work. I still don't like having the struct module depend on ctypes, though.

    @meadori
    Member

    meadori commented May 18, 2010

    Attached is a patch that implements part of the additions: specifically, the 'T{}' syntax and the ability to place byte-order specifiers ('<', '>', '@', '^', '!', '=') anywhere in the struct string.

    The changes dictated by the PEP are so big that it is better to split things up into multiple patches. These two features will lay some ground work and are probably less controversial than the others.

    Surely some more tweaks will be needed, but I think what I have now is at least good enough for review. I tested on OS X 10.6 and Ubuntu 10.4. I also used valgrind and 'regrtest.py -R:' to check for memory and
    reference leaks, respectively.

    @mdickinson
    Member

    Thanks for this.

    Any chance you could upload the patch to Rietveld (http://codereview.appspot.com/) for ease of review?

    @meadori
    Member

    meadori commented May 18, 2010

    @meadori meadori assigned meadori and unassigned teoliphant Aug 14, 2010
    @pv
    Mannequin

    pv mannequin commented Dec 2, 2010

    For reference, Numpy's PEP-3118 implementation is here:

    http://github.com/numpy/numpy/blob/master/numpy/core/_internal.py#L357

    http://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/buffer.c#L76

    It would be a good idea to ensure that the numpy and struct implementations are in agreement about details of the format strings.
    (I wouldn't take the Numpy implementation as the definitive one, though.)

    • The "sub-structs" in Numpy arrays (in align=True mode) are aligned
      according to the maximum alignment of the fields.

    • I assumed the 'O' format in the PEP is supposed to be similar to Numpy
      object arrays. This implies some reference counting semantics. The
      Numpy PEP-3118 implementation assumes the memory contains borrowed
      references, valid at least until the buffer is released.
      Unpacking 'O' should probably INCREF whatever PyObject* pointer is
      there.

    • I assumed the alignment specifiers were unscoped. I'm not sure
      however whether this is the best thing to do.

    • The function pointers and pointers to pointers were not implemented.
      (Numpy cannot represent those as data types.)

    @mdickinson
    Member

    For reference, Numpy's PEP-3118 implementation is here:

    Thanks for that, and the other information you give; that's helpful.

    It sounds like we're on the same page with respect to alignment of substructs. (Bar the mostly academic question of max versus lcm.)

    I still like the idea of scoped endianness markers in the substructs, but if we have to abandon that for compatibility with NumPy that's okay.

    • I assumed the 'O' format in the PEP is supposed to be similar to Numpy
      object arrays. This implies some reference counting semantics. The
      Numpy PEP-3118 implementation assumes the memory contains borrowed
      references, valid at least until the buffer is released.
      Unpacking 'O' should probably INCREF whatever PyObject* pointer is
      there.

    I'm still confused about how this could work: when unpacking, how do you know whether the PyObject* pointer points to a valid object or not? You can ensure that the pointer will always point to a valid object by having the *pack* operation increment reference counts, but then you need a way to automatically decref when the packed string goes out of scope. So the object returned by 'pack' would somehow have to be something other than a plain string, so that it can deal with automatically doing the DECREF of the held PyObject* pointers when it goes out of scope.

    What's the need to have the 'O' format in the struct module? Is it really necessary there? Can we get away with not implementing it?
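    For what an 'O' round trip would amount to, here is a deliberately unsafe, CPython-only sketch (it leans on id() being the object's address and on ctypes to dereference it; nothing in it keeps the object alive, which is exactly the lifetime problem raised above):

```python
import ctypes
import struct

obj = ['payload']

# Pack the object's address; 'P' is a native pointer-sized unsigned
# integer, and in CPython id() happens to be the address.
packed = struct.pack('P', id(obj))

# Unsafe: reinterpret the unpacked address as an object reference.
# This is only valid while `obj` is still alive in this process.
addr, = struct.unpack('P', packed)
recovered = ctypes.cast(addr, ctypes.py_object).value
print(recovered is obj)  # True (on CPython)
```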

    @mdickinson
    Member

    For reference, Numpy's PEP-3118 implementation is here

    BTW, does this already exist in a released version of NumPy? If not, when is it likely to appear in the wild?

    @pv
    Mannequin

    pv mannequin commented Dec 3, 2010

    I still like the idea of scoped endianness markers in the substructs,
    but if we have to abandon that for compatibility with NumPy that's
    okay.

    That, or change the Numpy implementation. I don't believe there's yet much code in the wild that changes the alignment specifier on the fly.

    [clip: 'O' format code]

    So the object returned by 'pack' would somehow
    have to be something other than a plain string, so that it can deal
    with automatically doing the DECREF of the held PyObject* pointers
    when it goes out of scope.

    Yes, the packed object would need to own the references, and it would be the responsibility of the provider of the buffer to ensure that the pointers are valid.

    It seems that it's not possible for the struct module to correctly implement packing for the 'O' format. Unpacking could be possible, though (but then, if you don't have packing, how do you write tests for it?).

    Another possibility is to implement the 'O' format unsafely and leave managing the reference counting to whoever uses the struct module's capabilities. (And maybe return ctypes pointers on unpacking.)

    [clip]

    What's the need to have the 'O' format in the struct module? Is it
    really necessary there? Can we get away with not implementing it?

    Numpy arrays, when containing Python objects, function as per the 'O' format.

    However, for the struct module, I don't see what would be the use case for the 'O' format.

    BTW, does this already exist in a released version of NumPy? If not,
    when is it likely to appear in the wild?

    It's included since the 1.5.0 release which came out last July.

    ***
    

    I think after the implementation is done, the PEP probably needs to be amended with clarifications (and possibly cutting out what is not really needed).

    @mdickinson
    Member

    Another possibility is to implement the 'O' format unsafely [...]

    Hmm. I don't much like that idea. Historically, it's supposed to be very difficult to segfault the Python interpreter with pure Python code (well except if you're using ctypes, I guess).

    @meadori
    Member

    meadori commented Jan 7, 2011

    Attached is the latest version of the struct string patch. I tested on OS X 10.6.5 (64-bit) and Ubuntu 10.04 (32-bit). I also scanned for memory problems with Valgrind. There is one test failing on 32-bit systems ('test_crasher'). This is due to the fact that 'struct.pack("357913941b", ...)' no longer tries to allocate 357913941 format codes. This implementation just allocates *one* code and assigns a count of 357913941, which is utilized later when packing/unpacking. Some work could be done to add better large memory consumption checks, though.

    Previous feedback has been incorporated:

    1. Multiplicities allowed on struct specifiers.
    2. Maximum alignment rule.
    3. Struct nesting depth limited (64 levels).
    4. The old behavior of only one byte order specified. However,
      the code is written in a way such that the scoped behavior
      would be easy to add.

    As before, there will surely be more iterations, but this is good enough for general review to see if things are headed in the right direction.

    This is a difficult one to review because the diffs are really large. I placed a review on Rietveld here: http://codereview.appspot.com/3863042/. If anyone has any ideas on how to reduce the number of diffs (perhaps a way to do multiple smaller patches), that would be cool. I don't see an obvious way to do this at this point.

    @pitrou pitrou assigned mdickinson and unassigned meadori Jan 8, 2011
    @meadori
    Member

    meadori commented Mar 12, 2011

    Is there still any interest in this work?

    @mdickinson
    Member

    Yes, there's interest (at least here). I've just been really short on Python-time recently, so haven't found time to review your patch.

    @mdickinson
    Member

    I'm going to unassign for now; I still hope to look at this at some point, but can't see a time in the near future when it's going to happen.

    @mdickinson mdickinson removed their assignment Mar 12, 2011
    @meadori
    Member

    meadori commented Sep 5, 2011

    Is this work something that might be suitable for the features/pep-3118 repo (http://hg.python.org/features/pep-3118/) ?

    @skrah
    Mannequin

    skrah mannequin commented Sep 5, 2011

    Yes, definitely. I'm going to push a new memoryview implementation
    (complete for all 1D/native format cases) in a couple of days.

    Once that is done, perhaps we could create a memoryview-struct
    branch on top of that.

    @ncoghlan
    Contributor

    Following up here after rejecting bpo-15622 as invalid

    The "unicode" codes in PEP-3118 need to be seriously rethought before any related changes are made in the struct module.

    1. The 'c' and 's' codes are currently used for raw bytes data (represented as bytes objects at the Python layer). This means the 'c' code cannot be used as described in PEP-3118 in a world with strict binary/text separation.

    2. Any format codes for UCS1, UCS2 and UCS4 are more usefully modelled on 's' than they are on 'c' (so that repeat counts create longer strings rather than lists of strings that each contain a single code point)

    3. Given some of the other proposals in PEP-3118, it seems more useful to define an embedded text format as "S{<encoding>}".

    UCS1 would then be "S{latin-1}", UCS2 would be approximated as "S{utf-16}" and UCS4 would be "S{utf-32}" and arbitrary encodings would also be supported. struct packing would implicitly encode from text to bytes while unpacking would implicitly decode bytes to text. As with 's' a length mismatch in the encoded form would mean an error.
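    A hypothetical sketch of that behaviour (pack_S/unpack_S are invented names, not part of any proposal text):

```python
# Hypothetical helpers illustrating the proposed "S{<encoding>}" field:
# packing encodes text to bytes, unpacking decodes bytes to text, and
# a length mismatch in the encoded form is an error, as with 's'.
def pack_S(text, encoding, size):
    data = text.encode(encoding)
    if len(data) != size:
        raise ValueError(
            f'encoded length {len(data)} does not match field size {size}')
    return data

def unpack_S(data, encoding):
    return data.decode(encoding)

print(pack_S('abc', 'latin-1', 3))  # b'abc'
print(unpack_S(b'abc', 'latin-1'))  # abc
```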

    @paulehoffman
    Mannequin

    paulehoffman mannequin commented Apr 22, 2013

    Following up on http://mail.python.org/pipermail/python-ideas/2011-March/009656.html, I would like to request that struct also handle half-precision floats directly. It's a short change, and half-precision floats are becoming much more popular in applications.

    Adding this to struct would also maybe need to change math.isinf and math.isnan, but maybe not.

    @mdickinson
    Member

    Paul: there's already an open issue for adding float16 to the struct module: see bpo-11734.

    @paulehoffman
    Mannequin

    paulehoffman mannequin commented Apr 22, 2013

    Whoops, never mind. Thanks for the pointer to 11734.

    @skrah
    Mannequin

    skrah mannequin commented Apr 13, 2016

    Here's a grammar that roughly describes the subset that NumPy supports.

    As for implementing this in the struct module: There is a new data
    description language on the horizon:

    http://datashape.readthedocs.org/en/latest/

    It does not have all the low-level capabilities (e.g changing alignment
    on the fly), but it is far more readable. Example:

    PEP-3118: "(2,3)10f0fZdT{10B:x:(2,3)d:y:Q:z:}B"
    Datashape: "2 * 3 * (10 * float32, 0 * float32, complex128, {x: 10 * uint8, y: 2 * 3 * float64, z: int64}, uint8)"

    There are a lot of open questions still. Should "10f" be viewed as an
    array[10] of float, i.e. equivalent to (10)f?

    In the context of PEP-3118, I think so.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @JelleZijlstra
    Member

    It's been more than ten years now and most of the additions to struct proposed by PEP-3118 (https://peps.python.org/pep-3118/#additions-to-the-struct-string-syntax) have not been implemented. The ? and c codes have been added, though.

    At this point, I would propose to close the issue. If there is interest in adding new codes, that should be discussed in a new feature request with a more specific motivation. It doesn't make sense to implement them based on a PEP from more than a decade ago.
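    Of those, '?' (the C _Bool code from the PEP) is straightforward to check on any current interpreter:

```python
import struct

# '?' packs a C _Bool (one byte on mainstream platforms) and
# round-trips Python booleans.
packed = struct.pack('?', True)
print(struct.unpack('?', packed))  # (True,)
```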

    @pitrou
    Member

    pitrou commented May 24, 2023

    I'll note that Numpy does seem to support at least part of these additions:

    >>> dt = np.dtype([('x', np.float64), ('y', np.float64)])
    >>> a = np.array([(1, 2)], dtype=dt)
    >>> m = memoryview(a)
    >>> m.format
    'T{d:x:d:y:}'
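    That format is exactly what struct cannot parse today, so the mismatch is visible immediately (a sketch on a current interpreter):

```python
import struct

# struct has no 'T{...}' support, so the format NumPy exports for a
# structured dtype is rejected outright.
try:
    struct.calcsize('T{d:x:d:y:}')
except struct.error as exc:
    print('struct rejects it:', exc)
```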

    @pitrou
    Member

    pitrou commented May 24, 2023

    If there is interest in adding new codes, that should be discussed in a new feature with a more specific motivation. It doesn't make sense to implement them based on a PEP from more than a decade ago.

    It would make sense to support at least what Numpy supports.

    @encukou
    Member

    encukou commented Jan 22, 2025

    Also note that ctypes supports ?, g, O (...), T{...}, and :name:, but chose:

    • c for char, not ucs-1 string
    • u for wchar_t, not ucs-2 string
    • C, E, F (not Zd, Zf, Zg) for complex types (cc @skirpichev -- just FYI)
    >>> import ctypes
    >>> class C(ctypes.Structure):
    ...     _fields_ = [(f'f{num}', type) for num, type in enumerate([
    ...         ctypes.c_bool,
    ...         ctypes.c_longdouble,
    ...         ctypes.py_object,
    ...         ctypes.c_double_complex,
    ...         ctypes.c_wchar * 2,
    ...     ])]
    ...     
    >>> memoryview(C()).format
    'T{<?:f0:15x<g:f1:<O:f2:<C:f3:(2)<u:f4:}'

    There are formats in the wild that struct can't parse. Making struct parse them in an incompatible way could lead to data corruption.

    IMO, at this point, standardizing needs a new PEP.

    @skirpichev
    Member

    C, E, F (not Zd, Zf, Zg) for complex types

    That was discussed, see e.g. here. Unfortunately, the fielddesc struct seems to support only single-character names.
