BUG: to_json segfaults when exception occurs in UTF8 encoding of string #50324

Merged 1 commit on Dec 18, 2022
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v2.0.0.rst
@@ -875,7 +875,7 @@ I/O
 - Bug in :func:`DataFrame.to_string` with ``header=False`` that printed the index name on the same line as the first row of the data (:issue:`49230`)
 - Fixed memory leak which stemmed from the initialization of the internal JSON module (:issue:`49222`)
 - Fixed issue where :func:`json_normalize` would incorrectly remove leading characters from column names that matched the ``sep`` argument (:issue:`49861`)
--
+- Bug in :meth:`DataFrame.to_json` where it would segfault when failing to encode a string (:issue:`50307`)
 
 Period
 ^^^^^^
13 changes: 11 additions & 2 deletions pandas/_libs/src/ujson/python/objToJSON.c
@@ -332,9 +332,18 @@ static char *PyBytesToUTF8(JSOBJ _obj, JSONTypeContext *Py_UNUSED(tc),
     return PyBytes_AS_STRING(obj);
 }
 
-static char *PyUnicodeToUTF8(JSOBJ _obj, JSONTypeContext *Py_UNUSED(tc),
+static char *PyUnicodeToUTF8(JSOBJ _obj, JSONTypeContext *tc,
                              size_t *_outLen) {
-    return (char *)PyUnicode_AsUTF8AndSize(_obj, (Py_ssize_t *)_outLen);
+    char *encoded = (char *)PyUnicode_AsUTF8AndSize(_obj,
+                                                    (Py_ssize_t *)_outLen);
+    if (encoded == NULL) {
+        /* Something went wrong.
+           Set errorMsg(to tell encoder to stop),
+           and let Python exception propagate. */
+        JSONObjectEncoder *enc = (JSONObjectEncoder *)tc->encoder;
+        enc->errorMsg = "Encoding failed.";
+    }
+    return encoded;
 }
 
 /* JSON callback. returns a char* and mutates the pointer to *len */
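The control flow of this change can be modeled in pure Python. This is only a rough sketch of the pattern, not the ujson implementation: `EncoderState`, `unicode_to_utf8`, and `encode_strings` are hypothetical names, and the `pending` field stands in for CPython's error indicator. The callback reports failure by returning `None` and setting an error message; the caller checks that message before touching the result, which is the step the old code effectively skipped.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EncoderState:
    """Hypothetical stand-in for the encoder's error slot (not pandas' struct)."""
    error_msg: Optional[str] = None
    pending: Optional[BaseException] = None  # models Python's error indicator


def unicode_to_utf8(value: str, state: EncoderState) -> Optional[bytes]:
    """Return UTF-8 bytes, or None after flagging the failure on `state`."""
    try:
        return value.encode("utf-8")
    except UnicodeEncodeError as exc:
        # Mirror the patched callback: set an error message so the caller
        # stops, and keep the original exception so it can propagate.
        state.error_msg = "Encoding failed."
        state.pending = exc
        return None


def encode_strings(values: List[str]) -> List[bytes]:
    """Encode each string, stopping as soon as the error slot is set."""
    state = EncoderState()
    out = []
    for value in values:
        encoded = unicode_to_utf8(value, state)
        if state.error_msg is not None:
            # Without this check the caller would go on to use a missing buffer.
            raise state.pending
        out.append(encoded)
    return out
```

In this model, a lone surrogate such as `"\udac0"` makes `encode_strings` raise `UnicodeEncodeError` instead of continuing with a `None` buffer.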
9 changes: 9 additions & 0 deletions pandas/tests/io/json/test_ujson.py
@@ -291,6 +291,15 @@ def test_encode_unicode_4bytes_utf8highest(self):
         assert enc == json.dumps(four_bytes_input)
         assert dec == json.loads(enc)
 
+    def test_encode_unicode_error(self):
+        string = "'\udac0'"
+        msg = (
+            r"'utf-8' codec can't encode character '\\udac0' "
+            r"in position 1: surrogates not allowed"
+        )
+        with pytest.raises(UnicodeEncodeError, match=msg):
+            ujson.dumps([string])
+
     def test_encode_array_in_array(self):
         arr_in_arr_input = [[[[]]]]
         output = ujson.encode(arr_in_arr_input)
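The expected message in the new test can be checked without pandas, using only the standard library. The sketch below assumes CPython's built-in UTF-8 codec and uses a hypothetical helper, `utf8_error_message`. The error is reported at position 1 because index 0 of the test string is the leading apostrophe, and the `\\` in the raw pattern matches the literal backslash that appears in the exception text.

```python
import re

string = "'\udac0'"  # apostrophe, lone surrogate U+DAC0, apostrophe
msg = (
    r"'utf-8' codec can't encode character '\\udac0' "
    r"in position 1: surrogates not allowed"
)


def utf8_error_message(value):
    """Return the UnicodeEncodeError text for `value`, or None if it encodes."""
    try:
        value.encode("utf-8")
    except UnicodeEncodeError as exc:
        return str(exc)
    return None


message = utf8_error_message(string)
# re.search is also what pytest.raises(match=...) applies to the message.
matched = message is not None and re.search(msg, message) is not None
```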