ENH: Raise ParserWarning when length of names does not match length of data #38587
@@ -1844,6 +1844,28 @@ def _do_date_conversions(self, names, data):

        return names, data

    def _check_data_length(self, columns: List[str], data: List[np.ndarray]):
        """Checks if length of data is equal to length of column names. One set of
        trailing commas is allowed.

        Parameters
        ----------
        columns: list of column names
        data: list of array-likes containing the data column-wise

        """
        if not self.index_col and len(columns) != len(data) and columns:

Why do we need to check that data is actually null? In other words, when would this situation happen when len(columns) > len(data)?

len(columns) > len(data) is caught at another place, I think.

OK. Ideally we should put these kinds of checks in the same place where that is happening, if possible.

Bad wording: by "caught" I meant that if we get more columns than len(data), those columns are inserted as all NaNs.

            if len(columns) == len(data) - 1 and np.all(
                (is_object_dtype(data[-1]) and data[-1] == "") | isna(data[-1])
            ):
                return
            warnings.warn(
                "Length of header or names does not match length of data. This leads "
                "to a loss of data with index_col=False.",
                ParserWarning,
                stacklevel=6,
            )


class CParserWrapper(ParserBase):
    def __init__(self, src: FilePathOrBuffer, **kwds):
@@ -2128,6 +2150,8 @@ def read(self, nrows=None):

            # columns as list
            alldata = [x[1] for x in data]
            if self.usecols is None:
                self._check_data_length(names, alldata)

            data = {k: v for k, (i, v) in zip(names, data)}

@@ -2516,6 +2540,8 @@ def _exclude_implicit_index(self, alldata):
        if self._col_indices is not None and len(names) != len(self._col_indices):
            names = [names[i] for i in sorted(self._col_indices)]

        self._check_data_length(names, alldata)

        return {name: alldata[i + offset] for i, name in enumerate(names)}, names

    # legacy

If this is going to warn, should the docs here be updated to reflect this change?
(But is this actually going to warn? Below I read "One set of trailing commas is allowed.", which seems to be the case here.)

Good point. This raised a warning earlier, before one set of trailing commas was allowed.
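
For reference, the trailing-comma rule discussed above can be sketched in plain Python. This is only an illustration of the logic in `_check_data_length`, not the pandas implementation: `check_data_length`, `_is_blank`, and the local `ParserWarning` class are hypothetical stand-ins, and plain lists are assumed in place of NumPy arrays.

```python
import math
import warnings


class ParserWarning(Warning):
    """Hypothetical local stand-in for pandas' ParserWarning."""


def _is_blank(value):
    # Treat None, NaN and the empty string as "no data" (assumption of this sketch).
    if value is None or value == "":
        return True
    return isinstance(value, float) and math.isnan(value)


def check_data_length(columns, data, index_col=None):
    """Warn when the number of names differs from the number of data columns.

    One extra trailing column is tolerated if it is entirely blank, which is
    what a trailing comma on every row produces. Returns True if a warning
    was issued, False otherwise.
    """
    if index_col or not columns or len(columns) == len(data):
        return False
    if len(columns) == len(data) - 1 and all(_is_blank(v) for v in data[-1]):
        return False
    warnings.warn(
        "Length of header or names does not match length of data.",
        ParserWarning,
        stacklevel=2,
    )
    return True


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Rows such as "1,2," leave a blank third column: tolerated, no warning.
    trailing = check_data_length(["a", "b"], [[1, 3], [2, 4], ["", ""]])
    # A populated third column has no name and would be dropped: warning.
    mismatch = check_data_length(["a", "b"], [[1, 3], [2, 4], [5, 6]])
```

In this sketch the first call stays silent and the second warns, which matches the behaviour described in the last comment: a single set of trailing commas no longer triggers the warning.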