pandas.ExcelFile ignore parse_dates=False #10001
you would have to show an example and |
In Excel (2013, Windows 7), I created a new workbook. In Sheet1, I entered A in A1 and 10000000 in B1. I then formatted B1 as a Short Date, which displays the cell as #################. I saved the file as 'test.xlsx'. I then ran the following Python code:
import pandas as pd
pd.show_versions()
xl_file = pd.ExcelFile('test.xlsx')
sheet = xl_file.parse('Sheet1', parse_dates=False)
which gives me the following output
|
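For context on why a value like 10000000 cannot become a date: in Excel's default (1900) date system a date cell stores a day count, and for modern dates that count is relative to 1899-12-30. The largest date Excel supports, 9999-12-31, is serial 2958465, which is also the last date Python's `datetime` can represent. The sketch below is plain arithmetic, not the code path pandas or its Excel engine actually follows, and the helper name `excel_serial_to_datetime` is hypothetical.

```python
from datetime import datetime, timedelta

# Day zero for serials in Excel's 1900 date system (valid for serials >= 61,
# i.e. after Excel's fictitious 1900-02-29).
EXCEL_EPOCH = datetime(1899, 12, 30)


def excel_serial_to_datetime(serial):
    """Hypothetical helper: convert an Excel day serial to a datetime."""
    return EXCEL_EPOCH + timedelta(days=serial)


# The largest serial Excel accepts maps to the last representable date.
last_ok = excel_serial_to_datetime(2958465)

# The value from the report above lies far beyond year 9999.
try:
    excel_serial_to_datetime(10000000)
    overflowed = False
except OverflowError:
    overflowed = True
```

Any conversion that goes through `datetime` has to fail for such a cell, so the only way to read it is to skip the date conversion and keep the raw number or its text.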
|
Hi, I am adding this patch here in case it's useful for those who want to skip date parsing when reading an Excel file by setting parse_dates=False. Please review. I had trouble parsing the following Excel file from Crunchbase Excel Export, which had really old dates that raised an OverflowError.
|
@kwantopia It'll be easier to review that if you put it up as a pull request. Then we can comment inline. What's the desired behavior here?
In [5]: !cat foo.csv
date,val
1500-01-01,1
1600-01-02,2
1700-01-01,3
1800-01-01,4
1900-01-01,5
2000-01-01,6
In [1]: pd.read_csv('foo.csv', parse_dates='date')
Out[1]:
date val
0 1500-01-01 1
1 1600-01-02 2
2 1700-01-01 3
3 1800-01-01 4
4 1900-01-01 5
5 2000-01-01 6 |
actually maybe @jorisvandenbossche can comment here. IIRC the |
This is related to the issue in #11544 and looks to be a dupe of these (though there is a somewhat convoluted chain as to what the original issues actually are). Maybe someone can figure this chain out and we can create a master issue so it's clearer. |
@TomAugspurger it's a problem in read_excel, but I guess I was also misunderstanding the parse_dates field. I was assuming that, for pandas.read_excel, parse_dates=True means parse the dates and parse_dates=False means do not parse the dates. |
Hello, sorry for writing here, but being able to disable date parsing seems to be quite a common wish for the function |
Hi guys, I've also run into this issue. One workaround (provided the names or positions of the columns you don't want converted are fixed) is to pass a dictionary with column names or column numbers as keys and the desired type as values. If you set "str" as the value, those columns will have "object" type in the dataframe and won't get parsed. |
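A minimal sketch of that workaround. The comment does not name the keyword; this assumes it is `dtype`, which both `pd.read_csv` and `pd.read_excel` accept as a dict keyed by column (`converters` also takes a dict and could be what was meant). To stay self-contained, the sketch reads CSV text from memory instead of a workbook, and the column names and values are made up. Whether the same argument also avoids the OverflowError for date-formatted Excel cells is not confirmed here and may depend on the pandas version and engine.

```python
import io

import pandas as pd

# Made-up data: "code" looks numeric but should be kept exactly as written.
csv_text = "label,code\nx,10000000\ny,00123\n"

# Map each column that must not be converted to str.
df = pd.read_csv(io.StringIO(csv_text), dtype={"code": str})

codes = list(df["code"])    # plain strings, leading zeros preserved
labels = list(df["label"])  # columns not listed are inferred as usual
```

With `pd.read_excel` the call would take the same `dtype={...}` argument.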
I am trying to read an Excel file which someone else created; they wrongly formatted a column as "date" when it is not. It has a large integer in it, which triggers an error:
OverflowError: normalized days too large to fit in a C int
But I have "parse_dates=False" so I thought pandas.ExcelFile would not try to parse the dates and return a string instead. Is this a bug?