BUG: read_csv() silently ignores out-of-range integers #55232
Comments
I can confirm that the

```python
>>> import pandas as pd
>>> from io import StringIO
>>> data = StringIO("x\n-1\n257")
>>> df = pd.read_csv(data, dtype={"x": "UInt8"})
>>> df.x
0    255
1      1
Name: x, dtype: UInt8
```
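The 255 and 1 in that output are plain modular arithmetic. An unchecked cast to an unsigned 8-bit type keeps only the low 8 bits, which is the same as reducing the value modulo 2**8 = 256. The sketch below models that in pure Python; the helper name `wrap_to_uint` is hypothetical and is not part of pandas.

```python
def wrap_to_uint(value: int, bits: int = 8) -> int:
    """Model an unchecked cast of an int to an unsigned `bits`-wide integer.

    Only the low `bits` bits survive, i.e. the value is reduced modulo 2**bits.
    Python's % returns a non-negative result for a positive modulus, so
    negative inputs wrap the same way a two's-complement truncation does.
    """
    return value % (1 << bits)


# The two values from the CSV in the report.
for raw in (-1, 257):
    print(raw, "->", wrap_to_uint(raw))
# -1 -> 255
# 257 -> 1
```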
Hi @paulreece, thanks for your quick reply. Your output is consistent with mine (sorry if I was unclear).
I didn't see any notes on this in the whatsnew for 2.0.0 or 2.1.0. A git bisect should be run to determine where this behavior changed.
So I think the issue is in `pandas/pandas/core/arrays/numeric.py`, line 151 in 51f3d03.
The old code called e.g. this
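For comparison with the wraparound, a checked conversion validates each value against the target type's range and raises instead of wrapping, which is the kind of behavior the report expects. This is a minimal pure-Python sketch assuming an unsigned target type; `checked_to_uint` and its error message are illustrative only, not the pandas implementation or the exact message pandas 1.5.3 emits.

```python
def checked_to_uint(values, bits: int = 8):
    """Convert ints to an unsigned `bits`-wide range, raising on out-of-range values."""
    lo, hi = 0, (1 << bits) - 1  # valid range for an unsigned integer of this width
    out = []
    for v in values:
        if not lo <= v <= hi:
            # Refuse the lossy conversion instead of silently wrapping around.
            raise ValueError(f"cannot cast {v} to uint{bits}: valid range is [{lo}, {hi}]")
        out.append(v)
    return out


print(checked_to_uint([0, 17, 255]))  # [0, 17, 255]

try:
    checked_to_uint([-1, 257])
except ValueError as exc:
    print(exc)  # cannot cast -1 to uint8: valid range is [0, 255]
```

An equivalent formulation casts first and then checks that the result round-trips to the input; any value that changed during the cast was out of range.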
I think we would need something like #45588, but AFAIK there hasn't been any movement on that recently.
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
The `read_csv()` function no longer raises an exception when it encounters an out-of-range integer. Instead, the value silently wraps around, as with unchecked integer overflow.
Expected Behavior
On pandas 1.5.3, `pd.read_csv()` raises a "cannot cast" exception, similar to how the `pd.Series()` constructor handles this scenario. I expect pandas 2.1.1 to continue this behavior.
Installed Versions
python : 3.11.5.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.22621
pandas : 2.1.1
numpy : 1.26.0