ENH: Raise ParserWarning when length of names does not match length of data #38587

phofl · 2020-12-19T21:24:03Z

- closes #21768 (pd.read_table: Using space as delimiter on file with trailing space gives cryptic error)
- tests added / passed
- passes black pandas
- passes git diff upstream/master -u -- "*.py" | flake8 --diff
- whatsnew entry

@gfyoung

This raises a ParserWarning now. We could change it to a FutureWarning if we would like to deprecate this for 2.0.

As long as we are only raising a ParserWarning, I am inclined to raise it for trailing commas too.
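The practical difference between raising a warning and raising an error is that callers keep their parsed result and can opt out with a warnings filter. A minimal sketch of that, assuming a stand-in `ParserWarning` class (pandas' real one lives at `pandas.errors.ParserWarning`) and a hypothetical helper `warn_if_length_mismatch`, which is not pandas code:

```python
import warnings


class ParserWarning(Warning):
    """Stand-in for pandas.errors.ParserWarning (assumption for this sketch)."""


def warn_if_length_mismatch(n_names: int, n_data_columns: int) -> bool:
    """Hypothetical helper: warn (instead of raising) when the counts differ.

    Returns True if a warning was emitted.
    """
    if n_names != n_data_columns:
        warnings.warn(
            f"Length of names ({n_names}) does not match "
            f"length of data ({n_data_columns}).",
            ParserWarning,
            stacklevel=2,
        )
        return True
    return False


# Callers can record the warning and carry on...
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_if_length_mismatch(3, 4)
print([type(w.message).__name__ for w in caught])  # ['ParserWarning']

# ...or silence it entirely, which an exception would not allow.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", ParserWarning)
    warn_if_length_mismatch(3, 4)  # emits nothing visible
```

Turning this into a deprecation would, in this sketch, only mean passing `FutureWarning` as the category instead.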


doc/source/whatsnew/v1.3.0.rst

pandas/_libs/parsers.pyx

pandas/tests/io/parser/test_common.py

jreback · 2020-12-21T23:49:25Z

can you merge master.

I think this might be too noisy (in the real world) to raise on a trailing comma, as this is a common thing to write in CSV formats.
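Why a trailing delimiter looks like an extra column: splitting a line on its delimiter yields one more field than there are values, and that last field is an empty string. A minimal pure-Python illustration with a hypothetical `split_fields` helper (the real C and Python tokenizers in pandas are far more involved; this only shows the field count):

```python
from typing import List


def split_fields(line: str, sep: str) -> List[str]:
    """Naive split of a single line on a delimiter (illustration only)."""
    return line.rstrip("\n").split(sep)


header = "a,b,c"
row_with_trailing_comma = "1,2,3,"
row_with_trailing_space = "1 2 3 "  # space-delimited, as in #21768

print(split_fields(header, ","))                   # ['a', 'b', 'c']
print(split_fields(row_with_trailing_comma, ","))  # ['1', '2', '3', '']
print(split_fields(row_with_trailing_space, " "))  # ['1', '2', '3', '']

# Three names but four fields: the fourth field is empty, not data.
```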


phofl · 2020-12-23T23:48:57Z

Merged master. So we should avoid raising the warning when one set of trailing commas is given?

jreback · 2020-12-24T17:37:30Z

> Merged master, so we should avoid raising the warning when one set of trailing commas is given?

yeah i think so


phofl · 2021-01-03T19:59:58Z

This should do the trick. One set of trailing commas is now allowed.

jreback · 2021-01-03T20:08:12Z

pandas/io/parsers.py

+        """
+        if not self.index_col and len(columns) != len(data) and columns:
+            if len(columns) == len(data) - 1 and np.all(
+                (data[-1] == "") | isna(data[-1])


this seems a very specific condition. can you relax it?

Do we want to allow more than one set of trailing commas? In that case we can remove the len check.
The array holding the last column must contain only NaNs or empty strings, so this check is necessary.
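The condition in the diff above can be restated as a standalone predicate. A minimal sketch, assuming `data` is a list of columns (as the docstring says, "column-wise") and using plain Python in place of `np.all` and `isna`; `should_warn` and `_is_missing_or_empty` are hypothetical names, not pandas functions:

```python
import math
from typing import Any, List, Sequence


def _is_missing_or_empty(value: Any) -> bool:
    """True for None, a float NaN, or the empty string."""
    if value is None or value == "":
        return True
    return isinstance(value, float) and math.isnan(value)


def should_warn(
    columns: Sequence[str], data: List[Sequence[Any]], index_col: Any = None
) -> bool:
    """Mirror of the quoted check: warn on a names/data length mismatch,
    except for exactly one surplus column that is entirely empty/NaN
    (one set of trailing delimiters)."""
    # Same guard as the diff: only when index_col is falsy, lengths differ,
    # and names/header are present.
    if index_col or not columns or len(columns) == len(data):
        return False
    # One extra column holding only empty strings or NaNs: trailing commas.
    if len(columns) == len(data) - 1 and all(
        _is_missing_or_empty(v) for v in data[-1]
    ):
        return False
    return True
```

Relaxing the check to allow several sets of trailing commas, as asked in this thread, would in this sketch mean dropping the `len(columns) == len(data) - 1` test and checking every surplus column instead of only `data[-1]`.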

don't we want to warn if there is a mismatch at all?

I thought we wanted to warn if we have more data columns than names/headers, except when we have trailing commas?

do we have a test for that? I would warn regardless

Changed this based on #38587 (comment)

test_no_header_two_extra_columns checks the warning, and test_trailing_delimiters(all_parsers) in pandas/tests/io/parser/test_common.py (line 1066 in b5707d6) checks trailing commas, which do not raise a warning.

jreback · 2021-01-04T13:41:17Z

can you merge master.

cc @gfyoung


phofl · 2021-01-04T14:03:30Z

Merged

jreback · 2021-01-04T14:18:30Z

pandas/io/parsers.py

+        data: list of array-likes containing the data column-wise
+
+        """
+        if not self.index_col and len(columns) != len(data) and columns:


why do we need to check that data is actually null? IOW, when would this situation happen when len(columns) > len(data)?

I think len(columns) > len(data) is caught elsewhere.
We enter this branch when len(columns) < len(data). With one set of trailing commas, we have len(columns) + 1 == len(data). To see whether these really are trailing commas, we have to check whether the last array is empty. If it is not empty, we do not have trailing commas but data which will be dropped.

> have trailing commas but data which will be dropped.

ok, ideally we should put these kinds of checks in the same place where that is happening, if possible.

Bad wording: by "caught" I meant that if we get more columns than len(data), those extra columns are inserted filled with NaNs.
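The two mismatch directions described in this thread behave differently: surplus data columns are silently dropped, while surplus names get all-NaN columns. A minimal sketch of both, using a hypothetical `assign_columns` helper that pairs names with column-wise data (not pandas' implementation; it only reproduces the outcomes described above):

```python
import math
from typing import Any, Dict, List, Sequence


def assign_columns(names: Sequence[str], data: List[List[Any]]) -> Dict[str, List[Any]]:
    """Hypothetical helper: pair names with column-wise data.

    - Fewer names than columns: surplus columns are dropped (zip truncates).
    - More names than columns: missing columns are filled with NaN.
    """
    n_rows = len(data[0]) if data else 0
    result = dict(zip(names, data))  # zip stops at the shorter input
    for name in list(names)[len(data):]:
        result[name] = [math.nan] * n_rows
    return result


data = [[1, 4], [2, 5], [3, 6]]  # three columns, two rows

print(assign_columns(["a", "b"], data))
# {'a': [1, 4], 'b': [2, 5]}  (column [3, 6] is lost without any notice)
print(assign_columns(["a", "b", "c", "d"], data))
# {'a': [1, 4], 'b': [2, 5], 'c': [3, 6], 'd': [nan, nan]}
```

The silent loss in the first case is what the new warning is meant to surface; the second case loses nothing, which is why it needs no special handling here.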

jreback · 2021-04-21T13:07:34Z

this looks reasonable.

any comments @pandas-dev/pandas-core


jorisvandenbossche · 2021-05-14T22:21:28Z

doc/source/user_guide/io.rst

@@ -757,6 +757,7 @@ the end of each data line, confusing the parser. To explicitly disable the
 index column inference and discard the last column, pass ``index_col=False``:

 .. ipython:: python
+    :okwarning:


If this is going to warn, shouldn't the docs here be updated to reflect this change?

(but is this actually going to warn? Below I read "One set of trailing commas is allowed.", which is the case here?)

Good point. This raised a warning earlier, before we allowed one set of trailing commas.
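The user-guide passage touched by this diff (index column inference when data lines end with a delimiter) can be illustrated with a short example; the data string is illustrative. Per the discussion above, a single set of trailing commas like this should not trigger the new warning, which is why the `:okwarning:` directive became unnecessary:

```python
from io import StringIO

import pandas as pd

# Each data line ends with a delimiter, so it has one more field than the header.
data = "a,b,c\n4,apple,bat,\n8,orange,cow,"

# Default (index_col=None): the surplus leading field is inferred as the index,
# which shifts the values and leaves column "c" empty (NaN).
inferred = pd.read_csv(StringIO(data))
print(inferred)

# index_col=False disables that inference and discards the empty last field.
explicit = pd.read_csv(StringIO(data), index_col=False)
print(explicit)
```

This contrast is also the index_col=None behavior that the review below asks to document.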

rhshadrach

Looks good - minor requests. Also, can you add behavior of index_col=None to the docstring as mentioned at the top of #21768 (comment)

pandas/io/parsers/base_parser.py

jreback · 2021-05-21T17:36:27Z

@phofl can you rebase? There are also some questions above.

phofl · 2021-05-23T22:47:29Z

I think I have addressed all comments.

pandas/io/parsers/base_parser.py

rhshadrach

lgtm

simonjayhawkins · 2021-06-11T13:10:15Z

@phofl can you resolve conflicts.


phofl · 2021-06-12T10:28:28Z

resolved conflicts, @jreback ready to merge?

jreback · 2021-06-16T02:14:27Z

thanks @phofl


jreback · 2021-06-16T02:14:37Z

@meeseeksdev backport 1.3.x

lumberbot-app · 2021-06-16T02:14:43Z

Something went wrong ... Please have a look at my logs.


ENH: Raise ParserWarning when length of names does not match length of data (d98c6fd)

phofl added the IO CSV and Warnings labels Dec 19, 2020

gfyoung reviewed Dec 19, 2020

doc/source/whatsnew/v1.3.0.rst (outdated, resolved)

gfyoung reviewed Dec 19, 2020

pandas/_libs/parsers.pyx (outdated, resolved)

gfyoung reviewed Dec 19, 2020

pandas/_libs/parsers.pyx (outdated, resolved)

gfyoung reviewed Dec 19, 2020

pandas/tests/io/parser/test_common.py (outdated, resolved)

phofl added 4 commits December 19, 2020 22:31

Fix bugs from strg+z (26b07b2)

Refactor code (7dd3f1b)

Refactor if else (70d5c1c)

Add okwarning (76abd33)

jreback added this to the 1.3 milestone Dec 21, 2020

Merge branch 'master' of https://github.com/pandas-dev/pandas into 21768 (31929f4)

Conflicts: doc/source/whatsnew/v1.3.0.rst

jreback mentioned this pull request Jan 1, 2021 in BUG: Add warning if rows have more columns than expected #33782 (closed)

phofl added 2 commits January 3, 2021 20:06

Merge branch 'master' of https://github.com/pandas-dev/pandas into 21768 (3813435)

Conflicts: doc/source/whatsnew/v1.3.0.rst, pandas/tests/io/parser/test_common.py

Allow trailing commas (5b688f7)

jreback requested changes Jan 3, 2021


Fix dtype bug (56cdd18)

phofl added 5 commits January 4, 2021 14:58

Fix npdev bug (ac15a30)

Merge branch 'master' of https://github.com/pandas-dev/pandas into 21768 (4b08ab6)

Conflicts: pandas/tests/io/parser/test_common.py

Add missing init file (387b5fa)

Remove empty file (53cac93)

Add warning (5d142fe)

jreback reviewed Jan 4, 2021


phofl removed the Stale label Apr 20, 2021

Fix typing (eb77157)

jreback approved these changes Apr 21, 2021


phofl added 2 commits May 14, 2021 22:29

Merge branch 'master' of https://github.com/pandas-dev/pandas into 21768 (9dce995)

Conflicts: pandas/io/parsers/python_parser.py

Change test (16faf35)

jorisvandenbossche requested changes May 14, 2021


Remove warning (4b3f63a)

rhshadrach requested changes May 15, 2021

pandas/io/parsers/base_parser.py (resolved)

pandas/io/parsers/base_parser.py (outdated, resolved)

pandas/io/parsers/base_parser.py (outdated, resolved)

pandas/io/parsers/base_parser.py (outdated, resolved)

phofl added 2 commits May 24, 2021 00:33

Merge branch 'master' of https://github.com/pandas-dev/pandas into 21768 (ca2f026)

Adress comments (fa6fed0)

rhshadrach requested changes May 24, 2021

pandas/io/parsers/base_parser.py (resolved)

simonjayhawkins removed this from the 1.3 milestone May 25, 2021

Merge branch 'master' of https://github.com/pandas-dev/pandas into 21768 (afb023f)

rhshadrach approved these changes Jun 9, 2021


Merge branch 'master' of https://github.com/pandas-dev/pandas into 21768 (95770d1)

Conflicts: doc/source/whatsnew/v1.3.0.rst

jreback added this to the 1.3 milestone Jun 16, 2021

jreback merged commit a6a8915 into pandas-dev:master Jun 16, 2021

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Jun 16, 2021

Backport PR pandas-dev#38587: ENH: Raise ParserWarning when length of names does not match length of data (7176bee)

meeseeksmachine mentioned this pull request Jun 16, 2021

Backport PR #38587 on branch 1.3.x (ENH: Raise ParserWarning when length of names does not match length of data) #42047 (merged)

simonjayhawkins pushed a commit that referenced this pull request Jun 16, 2021

Backport PR #38587: ENH: Raise ParserWarning when length of names does not match length of data (#42047) (76a28a0)

Co-authored-by: Patrick Hoefler

phofl deleted the 21768 branch June 16, 2021 08:45

JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021

ENH: Raise ParserWarning when length of names does not match length of data (pandas-dev#38587) (42f9460)
