Skip to content

Support more integer types in PyMemberDef #117031

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
serhiy-storchaka opened this issue Mar 19, 2024 · 19 comments
Open

Support more integer types in PyMemberDef #117031

serhiy-storchaka opened this issue Mar 19, 2024 · 19 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-C-API type-feature A feature request or enhancement

Comments

@serhiy-storchaka
Copy link
Member

serhiy-storchaka commented Mar 19, 2024

I plan to add more wrappers for Posix structures. Many of them use types not supported in PyMemberDef, like uint32_t or pid_t. So the first step is to add support of many standard C and Posix integer types.

gh-114388 and gh-115011 were preparations to this step.

The open question: should we assign new codes for new types or define them as aliases to existing types if they are already exist (for example, Py_T_PID can be an alias of Py_T_INT, Py_T_LONG or Py_T_LONGLONG, depending on platform).

Linked PRs

@serhiy-storchaka serhiy-storchaka added topic-C-API 3.13 bugs and security fixes labels Mar 19, 2024
@serhiy-storchaka
Copy link
Member Author

cc @vstinner, @encukou

serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Mar 19, 2024
Add support for standard C and Posix integer types like Py_T_UINT32,
Py_T_PTRDIFF, Py_T_OFF and Py_T_PID.
Add Py_T_SSIZE as alias of Py_T_PYSSIZET.
@encukou
Copy link
Member

encukou commented Mar 20, 2024

Do we need this in 3.13?
I would much rather add support for fields of custom size (and perhaps alignment/signedness/endianness), than special-case only the C/Posix types.

@vstinner
Copy link
Member

I like the idea of adding stdint.h types: int8/16/32/64_t, uint8/16/32/64_t, size_t and ssize_t. I'm also fine with intptr_t and uintptr_t since it's the recommended way to store an integer as a pointer in C.

I'm less sure about 128-bit flavor: is it supposed by all platforms supposed by Python? Currently, we don't use intmax_t. I would prefer to also leave this one aside for now.

Also, I'm not sure that we need aliases such as ptrdiff_t. I prefer to not add it yet.

For Unix types, "off_t" and pid_t", can we not add them as PyMemberDef types, but use the macro black magic comparing their SIZEOF to select the corresponding int or uint type with a size, such as int32_t for pid_t? If we add off_t, pid_t, what about uid_t and gid_t? and file descriptor? and socket handle? etc. System programming uses a tons of types.

In short, I prefer to start with a bare minimum set of types.

I would much rather add support for fields of custom size (and perhaps alignment/signedness/endianness),

I'm not aware of any type which requires a specific alignment. Usually, it "just" works :-)

Endianness: are you aware of an object which requires endianness conversion? Honestly, for "custom" use cases, just use getter and setter functions, no?

@serhiy-storchaka serhiy-storchaka added the type-feature A feature request or enhancement label Mar 27, 2024
@serhiy-storchaka
Copy link
Member Author

I need this to create wrappers of C structures. Several issues wait for this feature.

How do you specify the custom size/alignment/signedness/endianness? There is no place for this in PyMemberDef.

struct PyMemberDef {
    const char *name;
    int type;
    Py_ssize_t offset;
    int flags;
    const char *doc;
};

You can only encode this as a part of type or flags.

@serhiy-storchaka
Copy link
Member Author

Support of off_t and pid_t is needed for F_SETLK, F_SETLKW, and F_GETLK.

           struct flock {
               ...
               short l_type;    /* Type of lock: F_RDLCK,
                                   F_WRLCK, F_UNLCK */
               short l_whence;  /* How to interpret l_start:
                                   SEEK_SET, SEEK_CUR, SEEK_END */
               off_t l_start;   /* Starting offset for lock */
               off_t l_len;     /* Number of bytes to lock */
               pid_t l_pid;     /* PID of process blocking our lock
                                   (set by F_GETLK and F_OFD_GETLK) */
               ...
           };

(on different platforms the types and the order of fields can be different).

Support of pid_t is needed for F_GETOWN_EX and F_SETOWN_EX.

                  struct f_owner_ex {
                      int   type;
                      pid_t pid;
                  };

@vstinner
Copy link
Member

For me, it would be easier to review if your PR only add intX_t/uintX_t types. Later, we can discuss other types.


I understand the need for Unix types such as pid_t. My question is more if we can implement them as aliases to intX_t or uintX_t:

#if SIZEOF_PID_T == 8
#  define Py_T_PID Py_T_INT64
#elif SIZEOF_PID_T == 4
#  define Py_T_PID Py_T_INT32
// ... add more cases if needed ...
#else
#  error "unsupported pid_t type size"
#endif

The problem of this approach is that configure doesn't check if types are signed or not.

I see the that your implementation does it partially:

#ifdef MS_WINDOWS
# define Py_T_OFF      Py_T_LONGLONG
#else
# define Py_T_OFF      35
#endif
#define Py_T_PID 36

and:

#ifndef MS_WINDOWS
    case Py_T_OFF:
        v = _PyLong_FromByteArray((const unsigned char *)addr,
                                  sizeof(off_t), PY_LITTLE_ENDIAN, 1);
        break;
#endif
    case Py_T_PID:
        v = PyLong_FromPid(*(pid_t*)addr);
        break;

@encukou
Copy link
Member

encukou commented Mar 27, 2024

Let's go with size and signedness only, then. (Ignore alignment/endian, I was too fast to type that.)

Let's say that if ((type) & 0xFF00) >> 8 is non-zero, it gives the size of the type, and type & 1 is signedness. The other bits need to be 0. To define the members, we can add a macro like Py_T_INTEGER, so that Py_T_INTEGER(int32_t) gives you 0x401 and Py_T_INTEGER(uint32_t) gives 0x400.
Then anyone can add their favourite integer typedef, we're not limited to C/Posix ones.

@serhiy-storchaka
Copy link
Member Author

I was thinking about a similar idea when assigned values to new code types. It would be nice to encode signedness in the lowest bit, but unfortunately the existing code types do not follow any system.

As for the macro, how do you determine the signedness of the C type?

The larger issue is that there are more than two options for signedness. For example, -1 is a special value for uid_t and gid_t, but on some platforms these types are unsigned, and support values larger that the maximal value of the signed type of the same size. So _PyLong_FromUid and _Py_Uid_Converter are special, they cannot be made an alias of the converter for the standard C integer type. And we may need special converters for other Posix types, like dev_t (#89928, #113078).

Note also that the existing code types were not consistent with this. Only recently it was partially fixed, but there is more work, and it will take several releases to finish:

This is why I started with adding type codes for concrete C types. It is easier to specialize code for every concrete type. I am open to the idea of making some of new type codes aliases to other type codes if they do not need special converters yet, but we should decide what they should use -- standard integer C types (short, int, long, etc) or new fixed-size integer C types (intXX_t).

@encukou
Copy link
Member

encukou commented Mar 28, 2024

the existing code types do not follow any system.

That's fine. There'll always be exceptions, like PyObject* needing refcounting.
My proposed scheme allows exceptions when ((type) & 0xFF00) is zero. So all the existing codes are fine, and there's space for a couple hundred more.

how do you determine the signedness of the C type?

(((type) -1) < 0)? Perhaps there's some magic needed to avoid compiler warnings.

The larger issue is that there are more than two options for signedness. For example, -1 is a special value for uid_t and gid_t

More generally, something the C type system doesn't capture is whether negative Python integers can be stored in unsigned fields. Sadly, the current member types don't care about this property. Sometimes that's an error, but sometimes it's useful to allow this -- @zooba likes to give Windows error codes as examples here.
I see you've added a warning about it for Py_T_UINT; I wish that got some more review.

My proposed scheme has 7 bits left for details like this.


Anyway: IMO, at the very least we should do a systematic approach (with the macro) for the intNN_t and uintNN_t types -- those should all behave the same, and should match Py T_INT/Py_T_UINT in handling signedness.

@vstinner
Copy link
Member

Let's say that if ((type) & 0xFF00) >> 8 is non-zero, it gives the size of the type, and type & 1 is signedness.

What's the use case for that? Existing types don't implement it. I'm not convinced that it's useless.

Note: use _Py_IS_TYPE_SIGNED() to check if a type is signed or not, but it doesn't work in the preprocessor (#if), only in C.

@encukou
Copy link
Member

encukou commented Mar 28, 2024

What's the use case for that?

It allows the second part of the proposal: a Py_T_INTEGER(type) macro that anyone can use to support any integral type, not just the C/POSIX ones.

@vstinner
Copy link
Member

It allows the second part of the proposal: a Py_T_INTEGER(type) macro that anyone can use to support any integral type, not just the C/POSIX ones.

You cannot write a macro which checks the signedness of a type. I tested, the preprocessor cannot test it.

Do you have a working implementation (or just a proof-of-concept)?

@encukou
Copy link
Member

encukou commented Mar 28, 2024

You don't need to test this in the preprocessor. The macro only needs to produce a value to put in the memberdef.

#include <stdio.h>
#include <stdint.h>

#define Py_T_INTEGER(type) ((sizeof(type) << 8) | ((type)-1 < 0 ? 0 : 1))

int main() {
    printf("%04x\n", Py_T_INTEGER(int8_t));
    printf("%04x\n", Py_T_INTEGER(uint8_t));
    printf("%04x\n", Py_T_INTEGER(int16_t));
    printf("%04x\n", Py_T_INTEGER(uint16_t));
    printf("%04x\n", Py_T_INTEGER(int32_t));
    printf("%04x\n", Py_T_INTEGER(uint32_t));
    printf("%04x\n", Py_T_INTEGER(int64_t));
    printf("%04x\n", Py_T_INTEGER(uint64_t));
}

@zooba
Copy link
Member

zooba commented Mar 28, 2024

A set of bits that can map into a PyLong_AsNativeBytes call would make sense.

The "sign" bit needs to be two bits to handle the cases:

  • allow/fail negative input values
  • allow/fail positive input values with MSB set

(I need to poke on #116053 to figure out how best to handle the extra argument for the second one of these in AsNativeBytes.)

An unsigned pid_t that allows assigning -1 would be "allow" for both, which is the case that isn't covered by the basic signed/unsigned definitions.

I like Petr's idea of taking the second byte for size and to indicate that the lower byte is flags.

@erlend-aasland erlend-aasland added 3.14 bugs and security fixes and removed 3.13 bugs and security fixes labels Jul 31, 2024
@picnixz picnixz removed the 3.14 bugs and security fixes label Dec 15, 2024
serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Apr 15, 2025
…erDef.type

Py_T_INTEGER(type) represents arbitrary C integer type.
@serhiy-storchaka
Copy link
Member Author

#117031 is an alternative PR which implements @encukou's idea for Py_T_INTEGER().

@serhiy-storchaka
Copy link
Member Author

The "sign" bit needs to be two bits to handle the cases:

Only three of these cases make sense:

  • signed integers
  • unsigned integers
  • compatible mode: input -- signed and unsigned integers, output -- signed integers. It is not safe -- different Python integers are converted to the same C integers, and you can not get the same value back.

We may need other modes. For example, integers in range -1 to UINT_MAX-1 -- -1 is only negative integer, all other integers with MSB set are unsigned. We may use None for -1. But this is for future.

Question: how would it work with Py_INTEGER()? It can only detect the size and the signess of the type. Do we need other macros?

Should we also define Py_INT32 as alias of Py_INTEGER(int32_t)?

@encukou
Copy link
Member

encukou commented Apr 17, 2025

Only three of these cases make sense

Yes -- but, by encoding them using PyLong_AsNativeBytes flags, we'd only need to document one way of encoding these details.

@zooba, how do you feel about adding a "REJECT_OVERFLOW" flag for this use case? That would allow the same set of flags to be set for declarative "fields" (here, and later for ctypes fields).
(You might remember I pushed for it for PyLong_AsNativeBytes, but dropped it because the functional API doesn't need it and we can add in later. This is the time/use case, IMO.)

We may need other modes. ... But this is for future.

At some point, the user should switch to getters & setters.

Question: how would it work with Py_T_INTEGER()? It can only detect the size and the signess of the type. Do we need other macros?

I don't think we need other macros now.

Should we also define Py_T_INT32 as alias of Py_T_INTEGER(int32_t)?

No, I don't think it's worth saving a dozen characters here.

@serhiy-storchaka
Copy link
Member Author

Using PyLong_AsNativeBytes flags will limit their number to 8, as we only have 8 bits here.

Also, I think that it would be nice idea to encode hybrid mode as Py_T_INTEGER(int32_t)|Py_T_INTEGER(uint32_t) (or Py_T_INT32|Py_T_UINT32 in original implementation). This does not work with PyLong_AsNativeBytes flags,
because unsigned integer needs Py_ASNATIVEBYTES_UNSIGNED_BUFFER|Py_ASNATIVEBYTES_REJECT_NEGATIVE, signed integer needs 0, and hybrid mode needs Py_ASNATIVEBYTES_UNSIGNED_BUFFER. We need Py_ASNATIVEBYTES_ACCEPT_NEGATIVE instead of Py_ASNATIVEBYTES_REJECT_NEGATIVE to make this idea working.

At some point, the user should switch to getters & setters.

It makes a great lap in complexity. I need this feature to implement a simple PyStructSequence-like API, without writing boilerplate code for getters and setters.

@zooba
Copy link
Member

zooba commented Apr 17, 2025

Using PyLong_AsNativeBytes flags will limit their number to 8, as we only have 8 bits here.

Yeah, this is probably the reason we should have independent sets of flags (a macro to do the conversion should be possible). Though if AsNativeBytes starts growing beyond 8 options then it's probably trying to do too many things, so I'm not entirely opposed to an artificial limit, but we probably shouldn't rely on them being equivalent.

how do you feel about adding a "REJECT_OVERFLOW" flag for this use case?

Shouldn't we always be rejecting overflows for struct members? They're being assigned from Python code so that they can later be retrieved by Python (or C) code, and so loss of information is a poor choice. Whereas directly calling PyLong_AsNativeBytes to get into a C value without checking for overflow is meant for when you're going to only use it in C (like doing (uint32_t)PyLong_AsLongLong(v) because you know what you need).

Anyway, I dislike adding the flag :-) Just like I disliked some of the other ones.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-C-API type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

6 participants