BUG: using dtype='int64' argument of Series causes ValueError: values cannot be losslessly cast to int64 for integer strings #45017
```diff
@@ -1811,8 +1811,17 @@ def maybe_cast_to_integer_array(
     # doesn't handle `uint64` correctly.
     arr = np.asarray(arr)
 
-    if is_unsigned_integer_dtype(dtype) and (arr < 0).any():
-        raise OverflowError("Trying to coerce negative values to unsigned integers")
+    if is_unsigned_integer_dtype(dtype):
+        try:
+            if (arr < 0).any():
+                raise OverflowError(
+                    "Trying to coerce negative values to unsigned integers"
+                )
+        except TypeError:
+            if (casted < 0).any():
+                raise OverflowError(
+                    "Trying to coerce negative values to unsigned integers"
+                )
 
     if is_float_dtype(arr.dtype):
         if not np.isfinite(arr).all():
```

Review comments on this hunk:
- On `except TypeError:`: "Can you add a comment here about what cases get here?"
- On the `(casted < 0).any()` fallback: "Where is this hit in tests?" and "I don't understand this. If I pass …"
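The guard added in this hunk can be modeled with plain Python lists. Python 3 raises `TypeError` when a `str` is compared with an `int`, which is the same failure the PR author reports for `(arr < 0).any()` on a string array. The sketch below is a hypothetical model, not pandas code. `check_unsigned_model` is an invented name, the input list stands in for `arr`, and the parsed list stands in for `casted`. One simplification matters: the parsed ints here keep their sign, whereas the real `casted` already has the unsigned dtype.

```python
def check_unsigned_model(values):
    """Hypothetical stdlib model of the unsigned guard added in the diff.

    `values` stands in for `arr`; `parsed` stands in for `casted`, except
    that these Python ints keep their sign (the real `casted` is unsigned).
    """
    parsed = [int(v) for v in values]
    try:
        # Mirrors `(arr < 0).any()`: comparing a str with 0 raises TypeError
        # in Python 3, similar to the failure reported for string arrays.
        negative = any(v < 0 for v in values)
    except TypeError:
        # Mirrors the `except TypeError:` fallback: check the parsed values.
        negative = any(v < 0 for v in parsed)
    if negative:
        raise OverflowError("Trying to coerce negative values to unsigned integers")
    return parsed


print(check_unsigned_model(["0", "1", "2"]))  # [0, 1, 2]
```

Because the real `casted` is already unsigned, the fallback in the diff behaves differently from this model. That difference is exactly what the reviewers question in the thread on this change.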
```diff
@@ -1823,7 +1832,7 @@ def maybe_cast_to_integer_array(
     if is_object_dtype(arr.dtype):
         raise ValueError("Trying to coerce float values to integers")
 
-    if casted.dtype < arr.dtype:
+    if casted.dtype < arr.dtype or is_string_dtype(arr.dtype):
         # GH#41734 e.g. [1, 200, 923442] and dtype="int8" -> overflows
         warnings.warn(
             f"Values are too large to be losslessly cast to {dtype}. "
```
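This one-line change reroutes string input. Before the change, integer strings did not satisfy `casted.dtype < arr.dtype`, so they fell through to the "values cannot be losslessly cast" `ValueError` reported in the issue title. After the change, they take the warn-and-return branch.

The sketch below models only that control flow using the standard library. `cast_strings_model` is hypothetical. The `FutureWarning` category is an assumption, and the warning text is shortened to its first sentence.

```python
import warnings


def cast_strings_model(values, dtype="int64"):
    """Hypothetical model of the branch touched by the second hunk.

    Only the string path is modeled; every other non-lossless input is
    collapsed into the final ValueError that the issue reports.
    """
    casted = [int(v) for v in values]
    if list(values) == casted:
        # The input already held exactly these integers.
        return casted
    if all(isinstance(v, str) for v in values):
        # New behaviour: string input warns and returns the cast values
        # (FutureWarning is assumed here as the category).
        warnings.warn(
            f"Values are too large to be losslessly cast to {dtype}.",
            FutureWarning,
        )
        return casted
    # Old fall-through that produced the reported error.
    raise ValueError(f"values cannot be losslessly cast to {dtype}")
```

As a result, `["0", "1", "2"]` now comes back as `[0, 1, 2]` with a warning instead of raising. Note that the reused warning text ("too large") does not describe the string case.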
```diff
@@ -1895,6 +1895,20 @@ def test_constructor_bool_dtype_missing_values(self):
         expected = Series(True, index=[0], dtype="bool")
         tm.assert_series_equal(result, expected)
 
+    def test_constructor_int64_dtype(self, any_int_dtype):
+        # GH-44923
+        result = Series(["0", "1", "2"], dtype=any_int_dtype)
+        expected = Series([0, 1, 2], dtype=any_int_dtype)
+        tm.assert_series_equal(result, expected)
+
+    def test_constructor_float64_dtype(self, any_float_dtype):
+        # GH-44923
+        if any_float_dtype in ["Float32", "Float64"]:
+            pytest.xfail(reason="Cannot be casted to FloatDtype Series")
+        result = Series(["-1", "0", "1", "2"], dtype=any_float_dtype)
+        expected = Series([-1.0, 0.0, 1.0, 2.0], dtype=any_float_dtype)
+        tm.assert_series_equal(result, expected)
+
     @pytest.mark.filterwarnings(
         "ignore:elementwise comparison failed:DeprecationWarning"
     )
```

Review thread on `test_constructor_int64_dtype`:
- "No, pls just use the fixture itself, i.e. no parametrize."
- "This is causing an AssertionError."
- "The previous code segment is leading to this issue; if we have only int64 there is no issue."
- "You need to match the expected value as well."
- "@jreback I think I have covered everything?"
- "@shubham11941140 you are not using the fixtures, pls do so."
- "Just remove the parametrize completely."
- "The uint cases (uint8, uint16, uint32, uint64) are failing due to the internal code implementation. Do I fix this?"
- "@jreback removed parametrization, now it should be ready."

Review thread on the `Float32`/`Float64` xfail:
- "Is there an issue for this?"
- "Float32 and Float64 are failing because strings cannot be implicitly cast to them. As the implicit cast fails, I am xfailing them."
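The new integer test expects the strings `"0"`, `"1"` and `"2"` to round-trip for every integer dtype the fixture supplies. That expectation holds because all three values fit even the narrowest width.

The stdlib sketch below checks this against the eight fixed-width integer types. `INT_RANGES` and `parse_int_strings` are hypothetical helpers, and the dtype list is an assumption rather than the `any_int_dtype` fixture's exact contents.

```python
# Hypothetical inclusive (min, max) bounds for the eight fixed-width integer
# dtypes; this is not the any_int_dtype fixture's exact list.
INT_RANGES = {
    f"{'u' if unsigned else ''}int{bits}": (
        (0, 2**bits - 1) if unsigned else (-(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    )
    for bits in (8, 16, 32, 64)
    for unsigned in (False, True)
}


def parse_int_strings(values, dtype_name):
    """Parse integer strings, rejecting any value outside the dtype's range."""
    low, high = INT_RANGES[dtype_name]
    parsed = [int(v) for v in values]
    for n in parsed:
        if not low <= n <= high:
            raise OverflowError(f"{n} does not fit in {dtype_name}")
    return parsed


# The same input/expectation pair as the new test, checked for every width.
for name in INT_RANGES:
    assert parse_int_strings(["0", "1", "2"], name) == [0, 1, 2]
```

A negative string such as `"-1"` would only be valid for the signed types. This sketch therefore rejects it for `uint8`.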
Review thread on the `try`/`except` in `maybe_cast_to_integer_array`:
- "Why is this a try/except?"
- "The input in `arr` is `["0", "1", "2"]`, which causes a `TypeError` in the check `(arr < 0).any()`. It can still be cast to uint, so I check `(casted < 0).any()` instead, which is reached and casts correctly."
- "This doesn't make sense. Why does this check matter? Is `casted` valid? Where is the test that checks the overflow? Strive for minimal code."
- "`result = Series(["0", "1", "2"], dtype="uint8")` should give a valid cast, and `casted` is valid, but `(arr < 0).any()` raises a `TypeError`."
- "The point is that string arrays cannot be compared with integers; NumPy does not support it, which leads to a `TypeError`."