Documentation request: Unicode conversions page #591
The quick version is that pybind11 loads and casts between |
Here are some more specific questions whose answers I think should be documented:
|
I agree that it would be good to have this documented. Based on my reading of the code (the
|
By code inspection it looks like Python will raise a However the current behavior from pybind11 2.0.1 (arguably a bug) is to return this kind of error:
It's possibly a bug because it suppresses information that could be used to solve the problem. My test function was:
|
It would also be useful to document what pybind11 does with single character literals and wchar_t in each direction. |
I'm not sure if it should just report a better error, or actually return a |
My thinking is that there should be a 1:1 correspondence between

From Python, you almost never want a function that sometimes returns

print("I talked to C++ and it said: " + wrapped_cpp_sometimes_returns_bytes())

Ideally the error would be the underlying UnicodeDecodeError rather than that TypeError, because the former gives the byte offset and offending character sequence. Round-tripping bytes (possibly containing NULs) could be done with passing and returning |
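To make the difference between the two errors concrete, here is a minimal pure-Python 3 sketch. It does not use pybind11 at all; `describe_decode_failure` and `concat_message` are hypothetical helpers written only for illustration. It shows what a UnicodeDecodeError carries (byte offset and offending bytes) compared with the bare TypeError you get when a str is concatenated with bytes.

```python
def describe_decode_failure(raw, encoding="utf-8"):
    """Return None if raw decodes cleanly, else details of the failure.

    Hypothetical helper for illustration; not part of pybind11.
    """
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as exc:
        # The exception records where decoding stopped and which bytes failed.
        return {
            "offset": exc.start,
            "bad_bytes": exc.object[exc.start:exc.end],
            "reason": exc.reason,
        }
    return None


def concat_message(value):
    """Mimic the print() call from the comment above."""
    # In Python 3 this raises TypeError when value is bytes rather than str.
    return "I talked to C++ and it said: " + value


if __name__ == "__main__":
    print(describe_decode_failure(b"abc\xffdef"))
    try:
        concat_message(b"abc\xffdef")
    except TypeError as exc:
        print("TypeError:", exc)
```

The TypeError only says that the operand types do not match, while the UnicodeDecodeError details point at the exact byte that could not be decoded.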
PR #624 addresses the error being propagated back to Python. I didn't address the documentation (except to add u16/u32 types to the table). |
Well thanks for this, I think the picture is a lot clearer. I do think pybind11 core devs may want to evaluate whether implicit |
I agree that it would be nice at least to be able to mark a function as disallowing an implicit bytes ->(utf8)-> std::string conversion. (Here "bytes" and "str" have their Py3 meanings.) An example case would be pathnames: if Python passes in a str, we want to encode it using the filesystem encoding (not necessarily utf-8); if Python passes in a bytes, we should assume os.fsencode() has already been called on it and pass it through unchanged. If pybind11 always performs the implicit conversion, I believe we can't distinguish between the two cases (other than taking a py::object as argument and typechecking ourselves). |
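The pathname case above can be sketched on the Python side as follows. This is only an illustration of the distinction being asked for, assuming Python 3; `path_to_native_bytes` is a hypothetical helper, not a pybind11 API.

```python
import os


def path_to_native_bytes(path):
    """Return the bytes a C++ path API would receive (hypothetical helper)."""
    if isinstance(path, bytes):
        # Assume the caller already applied os.fsencode(); pass through untouched.
        return path
    if isinstance(path, str):
        # Encode with the filesystem encoding and its error handler,
        # which is not necessarily plain UTF-8.
        return os.fsencode(path)
    raise TypeError("expected str or bytes, got " + type(path).__name__)


if __name__ == "__main__":
    print(path_to_native_bytes("data.txt"))
    print(path_to_native_bytes(b"\xff\xfe-not-utf8"))
```

The point is that the two branches behave differently, so the information about whether the caller passed str or bytes has to survive until this decision is made.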
You can mark a function as such by accepting py::bytes as the argument. Then you can implement any conversion in the lambda before dispatching to the C++ codebase.

I was thinking the best thing for pathnames would be a special py::pathname type that takes care of all the cases in a version independent way. (Also handling os.PathLike and pathlib paths.)
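As a rough idea of what "all the cases" means on the Python side, here is a small sketch using os.fspath(), which exists in Python 3.6 and later (so it does not cover Python 2.7). The `Workspace` class and `unwrap_path` function are hypothetical names for illustration; this says nothing about how a py::pathname type would actually be implemented.

```python
import os
import pathlib


class Workspace:
    """Hypothetical object implementing the os.PathLike protocol."""

    def __init__(self, root):
        self.root = root

    def __fspath__(self):
        return self.root


def unwrap_path(obj):
    """Reduce str, bytes, or os.PathLike input to plain str or bytes."""
    # os.fspath() returns str and bytes unchanged, calls __fspath__() on
    # path-like objects, and raises TypeError for anything else.
    return os.fspath(obj)


if __name__ == "__main__":
    print(unwrap_path("plain/str"))
    print(unwrap_path(b"raw/bytes"))
    print(unwrap_path(pathlib.PurePosixPath("a/b")))
    print(unwrap_path(Workspace("/tmp/w")))
```

After this step the value is either str or bytes, so the str-versus-bytes handling discussed earlier in the thread still applies.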
|
Ah, great, thanks. |
I suppose |
I think it would be helpful to have a new section under Type conversions that describes how pybind11 deals with Unicode conversions in Python 2.7 and 3. (I can't find this documented anywhere.)