Skip to content

BUG: Validate All Py_* Functions in C Extensions check for NULL #49756

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
WillAyd opened this issue Nov 17, 2022 · 4 comments
Open

BUG: Validate All Py_* Functions in C Extensions check for NULL #49756

WillAyd opened this issue Nov 17, 2022 · 4 comments
Labels

Comments

@WillAyd
Copy link
Member

WillAyd commented Nov 17, 2022

@jbrockmendel pointed out in #49034 that we don't consistently validate the return of Py_* functions in the C API. The net effect of this is that our extensions are much harder to refactor, as our lack of checks allows at worst for undefined behavior and at best for segfaults.

If someone were to audit the extensions we could easily add in error handling where missing and shore up these extensions. See also https://docs.python.org/3/c-api/exceptions.html#

@WillAyd WillAyd added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Nov 17, 2022
@WillAyd WillAyd added Clean and removed Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Nov 17, 2022
@MarcoGorelli
Copy link
Member

MarcoGorelli commented Jan 25, 2023

Does it have to check NULL, or are other checks allowed?

From a "quick and dirty" check, here's a couple I found:

char *result = PyObject_Malloc(*len);
// Check to see if PyDateTime has a timezone.
// Don't convert to UTC if it doesn't.
int is_tz_aware = 0;
if (PyObject_HasAttrString(obj, "tzinfo")) {
PyObject *offset = extract_utc_offset(obj);
if (offset == NULL) {
PyObject_Free(result);
return NULL;
}
is_tz_aware = offset != Py_None;
Py_DECREF(offset);
}
ret = make_iso_8601_datetime(&dts, result, *len, is_tz_aware, base);
if (ret != 0) {
PyErr_SetString(PyExc_ValueError,
"Could not convert datetime value to string");
PyObject_Free(result);
return NULL;
}
// Note that get_datetime_iso_8601_strlen just gives a generic size
// for ISO string conversion, not the actual size used
*len = strlen(result);
return result;

func = PyObject_GetAttrString(src->obj, "read");

npyarr->shape.ptr = PyObject_Malloc(sizeof(npy_intp) * NPY_MAXDIMS);

type = PyObject_Type(value);
if (!PyArray_DescrConverter(type, &dtype)) {
Py_DECREF(type);
goto fail;
}

PyObject *tz = PyObject_GetAttrString(obj, "tz");
if (tz != Py_None) {
// Go through object array if we have dt64tz, since tz info will
// be lost if values is used directly.
Py_DECREF(tz);
values = PyObject_CallMethod(obj, "__array__", NULL);
return values;
}

but I don't know which are allowed and which not. tz != Py_None looks fine, for example? But in the second example, I don't see any validation of func (or does Py_XDECREF(func) count?)

@WillAyd
Copy link
Member Author

WillAyd commented Jan 25, 2023

Those are all good examples of incorrect code. Yea they should all be checking for NULL

@WillAyd
Copy link
Member Author

WillAyd commented Jan 25, 2023

At least all PyObject* funcs. Other Py* funcs May not return a pointer so could have different requirements (ex: -1 for integral return values)

@jbrockmendel
Copy link
Member

ive asked enough times to know this a pipe dream, but this logic would be so much more maintainable if it lived in cython

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants