Skip to content

BUG: Fix unintialized strlen when PyUnicode_AsUTF8AndSize fails #50387

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Dec 22, 2022

Conversation

lithomas1
Copy link
Member

@lithomas1 lithomas1 commented Dec 21, 2022

Hopefully fixes 32bit builds.

@WillAyd
Copy link
Member

WillAyd commented Dec 21, 2022

Very nice find. Makes sense as far as the issue goes, but ideally we'd work with the Python C API pattern and leave as is. Do you think it's just a limitation of our ujson that prevents that? Wonder if there's an upstream fix there we should look at

@lithomas1 lithomas1 marked this pull request as ready for review December 21, 2022 23:30
@lithomas1 lithomas1 added Bug IO JSON read_json, to_json, json_normalize labels Dec 21, 2022
@lithomas1
Copy link
Member Author

Very nice find. Makes sense as far as the issue goes, but ideally we'd work with the Python C API pattern and leave as is. Do you think it's just a limitation of our ujson that prevents that? Wonder if there's an upstream fix there we should look at

We are allocating a buffer for the string, before checking that getStringValue succeeded.

https://github.com/lithomas1/pandas/blob/c279d921b2ecac840943c0781f813ad88c5da942/pandas/_libs/src/ujson/lib/ultrajsonenc.c#L1082-L1087

That's probably a good thing to follow up on.

ujson does this correctly, though. See
https://github.com/ultrajson/ultrajson/blob/1876c02e0fd765895670d04591422202950a4358/lib/ultrajsonenc.c#L882-L889

@lithomas1
Copy link
Member Author

I'll also look into setting up valgrind/asan for our C code to catch stuff like this in the future(also nice for preventing memleaks/security headaches down the road).

Thanks everyone for helping me to debug this!

@WillAyd
Copy link
Member

WillAyd commented Dec 21, 2022

ujson does this correctly, though. See https://github.com/ultrajson/ultrajson/blob/1876c02e0fd765895670d04591422202950a4358/lib/ultrajsonenc.c#L882-L889

Nice find. I would rather vendor this - or is that causing issues?

@lithomas1
Copy link
Member Author

ujson does this correctly, though. See https://github.com/ultrajson/ultrajson/blob/1876c02e0fd765895670d04591422202950a4358/lib/ultrajsonenc.c#L882-L889

Nice find. I would rather vendor this - or is that causing issues?

It should be fine, I think. I'd appreciate more scrutiny over this though (worried about double free/memleaks with the buffer here).

There are also more usages elsewhere to change(which is probably a nice followup).

@WillAyd
Copy link
Member

WillAyd commented Dec 22, 2022

Might be worth trying -fanalyzer with gcc

https://gcc.gnu.org/onlinedocs/gcc/Static-Analyzer-Options.html

@lithomas1
Copy link
Member Author

Might be worth trying -fanalyzer with gcc

https://gcc.gnu.org/onlinedocs/gcc/Static-Analyzer-Options.html

No warnings from the json module at least.
Note: The C parser is throwing some warnings, though, and compiling the c++ extension modules crashes gcc(maybe it's oom-ing on 32bit).

I've also left in the initialization of the string length just to be safe though.

@WillAyd
Copy link
Member

WillAyd commented Dec 22, 2022

We should remove the 0 string length - at the end of the day we have NULL so the length is undefined. Want to stick with CPython standards too

For the gcc crash, is it gcc that is crashing or g++? I'm not sure the history of why that one module in particular uses c++ - could try simplifying and sticking with c consistently

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm when green

@lithomas1 lithomas1 merged commit 1ce4455 into pandas-dev:main Dec 22, 2022
@lithomas1 lithomas1 deleted the fix-32bit branch December 22, 2022 18:33
@mroeschke mroeschke added this to the 2.0 milestone Dec 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug IO JSON read_json, to_json, json_normalize
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants