Skip to content

std::variant of integral and bool populates wrong alternative #1625

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
johnshaw opened this issue Nov 27, 2018 · 15 comments
Closed

std::variant of integral and bool populates wrong alternative #1625

johnshaw opened this issue Nov 27, 2018 · 15 comments

Comments

@johnshaw
Copy link

Populating a variant containing an integral type and a bool from a Python bool may select the wrong alternative, depending on the ordering of alternatives.

Consider the function void foo(std::variant<int, bool> v). When called from Python with the expression foo(True), the value v in C++ will end up with the int alternative populated with a value of 1. If the order of the alternatives are swapped such that bool occurs before int, the boolean alternative will be populated, as expected.

The reason this occurs is that the type_caster for integral types is defined for all types satisfying is_arithmetic, which bool satisfies, combined with the fact that loading a bool into an integer type is not considered a "conversion" in the context of the load method, so it is selected on the first pass.

I think this should be improved so that bool doesn't match the arithmetic type caster on the first pass.

@johnshaw
Copy link
Author

Turns out the arithmetic conversion is even more permissive than this. It will consider any conversion to the arithmetic type as valid. For example, my code has a custom decimal class in c++ with a type_caster defined for conversion to and from decimal.Decimal. Assigning a Python Decimal value into a std::variant<long, MyDecimalType> will result in assignment to the long alternative, truncating the fractional part. I would not expect this conversion to be accepted on the first pass, especially since it's lossy.

@garry-jeromson
Copy link

Also came across this issue - the variant population also doesn't appear to be completely deterministic. When calling a function with a std::variant<bool, int32_t, int64_t, uint32_t, uint64_t> as a parameter and passing in an int from the Python side, the selected alternative is sometimes a bool, and sometimes some flavour of integer - it appears to change randomly.

@tarcisiofischer
Copy link
Contributor

I can confirm that this is still happening, but doesn't seem random.
What I did to "fix" this issue is to change the order of the std::variant, and set the bool type to be the first (I was assuming that this list's order was being used to try the type conversion). - It worked (And I'm amazed).

So, if someone is trying to do this, my workaround suggestion is this: Change the order of the std::variant and set bool, before ints.

Example: Instead of

std::map<std::string, std::variant<
        int,
        bool>> value_map;

Do this:

std::map<std::string, std::variant<
        bool,
        int>> value_map;

@m4ce
Copy link

m4ce commented Jul 26, 2019

Just bumped into this - is this going to be fixed any time soon?

@wjakob
Copy link
Member

wjakob commented Jul 26, 2019

@m4ce: I don't currently have the time to track this down, so one of you will have to do so and create a PR with the fix & a testcase. Then it will be fixed.

@YannickJadoul
Copy link
Collaborator

I don't think there's a lot we can do about this. The same overload resolution happens for e.g.:

void bar(int v) {
        std::cout << "int" << std::endl;
}

void bar(bool v) {
        std::cout << "bool" << std::endl;
}

PYBIND11_MODULE(example, m)
{
        m.def("bar", py::overload_cast<int>(&bar))
         .def("bar", py::overload_cast<bool>(&bar));
}

So std::variant is consistent, there.

The problem is the following Python behavior:

>>> issubclass(bool, int)
True
>>> bool.__bases__
(<class 'int'>,)
>>> isinstance(True, int)
True

I.e., a Python bool is a subclass of int. The only solution I see is to explicitly check for bool in the type_caster for int and consider this to be a conversion. The main question here is: should pybind11 follow C++ or Python in this matter?

@bstaletic
Copy link
Collaborator

The main question here is: should pybind11 follow C++ or Python in this matter?

Note that C++ can get "confused" when it comes to variants that may be holding bools. The following paper was accepted for C++20, implemented in gcc 10, supposedly in clang 9 (but brief testing says that's a lie), and never in msvc.

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0608r3.html

However, C++ doesn't get confused about variant<bool, int>.

@YannickJadoul
Copy link
Collaborator

Note that C++ can get "confused" when it comes to variants that may be holding bools.

Interesting, but that's not the problem here, I believe, since it's reproduced with overloads as well. The main difference is that it's easier to reorder overloads than a (potentially internal and not changeable) C++ type.

@YannickJadoul
Copy link
Collaborator

No further reaction. I'm closing this for now. If there's something that pybind11 can do about it, please reopen.

@lahwaacz
Copy link

lahwaacz commented Nov 7, 2020

@YannickJadoul said the problem is that in Python, bool is a subclass of int. But why does this happen? When you have a function with std::variant<bool, int32_t, int64_t, uint32_t, uint64_t> as a parameter and pass an int from the Python side, I would expect that you always get int64_t on the C++ side. The documentation even says:

What this means in practice is that pybind11 will prefer any overload that does not require conversion of arguments to an overload that does, but otherwise prefers earlier-defined overloads to later-defined ones.

So if you have C++ functions bar overloaded for bool, int32_t, int64_t, uint32_t, uint64_t and call bar(42) from Python, it should always call the int64_t overload because it does not require conversion (since Python integers are always signed 64-bit).

In any case, I think the least you can do is describe the issue next to this table, which is currently oriented to passing values from C++ to Python, but the same conversion machinery applies also to passing values from Python to C++. Not only Python types don't map uniquely to C++ types, but pybind11's behaviour is not natural and even non-deterministic (as mentioned above) which is a real problem in my view.

@YannickJadoul
Copy link
Collaborator

(since Python integers are always signed 64-bit)

No they are not: https://docs.python.org/3/library/stdtypes.html#numeric-types-int-float-complex

Integers have unlimited precision.

So pybind11 checks, one by one, if the value fits into a bool, int32_t, int64_t, uint32_t, or finally uint64_t, and goes for the first one that works.

The underlying problem here really is that Python's int type doesn't exactly match C++'s range of integral types.

@lahwaacz
Copy link

lahwaacz commented Nov 7, 2020

Oh, I did not know that - thanks for the link. Is it mentioned somewhere in the documentation that the decision of the overload resolution depends on the value of the (Python) argument? If not, it would be nice to include it, e.g. in the Overload resolution order section. I think it is pretty important, especially for C++ programmers who are used to "type-only" conversions and not unlimited precision types.

Btw. the real problem where I noticed this issue was with integer types nested in std::vector. I have a function with the argument type std::variant< std::vector< std::int8_t >, std::vector< std::uint8_t >, std::vector< std::int16_t >, std::vector< std::uint16_t >, std::vector< std::int32_t >, std::vector< std::uint32_t >, std::vector< std::int64_t >, std::vector< std::uint64_t > > and in one case I was getting std::vector<std::uint8_t> instead of the expected std::vector<std::int64_t>. How does the overload resolution work for std::vector<T> where T is some integral type? Do you check the value of all items in the vector or only the first item or do you simply select the first overload? Are conversions allowed in this case or not?

Also, what happens for floating-point types? Unless I'm doing something wrong, if we have a C++ function bar overloaded for float and double, calling bar(0.3) from Python selects the overload for float. Due to finite precision arithmetic you can't check if a value "fits" into float without a precision loss - and also I think that Python's float type corresponds to double in C/C++. What is happening in this case?

@YannickJadoul
Copy link
Collaborator

Is it mentioned somewhere in the documentation that the decision of the overload resolution depends on the value of the (Python) argument? If not, it would be nice to include it, e.g. in the Overload resolution order section.

Well, it's definitely not specific to overload resolution. It's related to the fact that you can convert a Python 42 to e.g. uint8_t, but you can't convert 1024 to uint8_t. This kind of behavior of what's convertible and not convertible is just transferred to overloads (which are explicitly documented to be tried in order, until the first one that matches is selected!).
How else would things work? A Python int is documented to be convertible to uint8_t, etc (https://pybind11.readthedocs.io/en/stable/advanced/cast/overview.html#list-of-all-builtin-conversions), but of course pybind11 checks if the values actually fits.

How does the overload resolution work for std::vector<T> where T is some integral type? Do you check the value of all items in the vector or only the first item or do you simply select the first overload?

Yes, the first one. Easy to try out this one, if you want to be sure ;-)

Are conversions allowed in this case or not?

What kind of conversions?

Due to finite precision arithmetic you can't check if a value "fits" into float without a precision loss - and also I think that Python's float type corresponds to double in C/C++. What is happening in this case?

Yes, you're right. I believe the Python float (which is double-precision, indeed) to C/C++ float is provided for convenience. The solution should probably be: don't overload on float/double, because it does not make sense for a Python interface. (i.e., as Python does not distinguish between these types, there's absolutely no reason why you would expose foo(double) as well as foo(float) to the Python API, I think?)

@lahwaacz
Copy link

lahwaacz commented Nov 8, 2020

You are right about float/double, removing std::vector<float> and all the small integer types from the variant will be probably the best solution for me. Why I used the variant including all the integer types as well as float and double was because the type is defined in the C++ code and it is already used in the bindings for functions returning a variant. So I thought that the same type could be used for functions taking variant arguments, but apparently it does not work...

Please find a good place to describe the conversion of integers in the documentation, I think it is important to explain that for Python -> C++ it is more of a value conversion than a type conversion. If it does not fit, you can start a new section :)

Thanks for your quick replies here.

@YannickJadoul
Copy link
Collaborator

because the type is defined in the C++ code and it is already used in the bindings for functions returning a variant.

Yep, I see what you mean. But as you can see, it's a complex situation. Another option is changing the order of the members of that variant. If double comes before float, float won't be considered.

I don't immediately see when and how to add something to the cocs. If you get familiar with all aspects of pybind11 regarding int and float and overloading, before starting with the arguably more complicated std::variant, all of this makes sense, as I've explained. But PRs welcome, if you think you can find a place!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants