🚸 Restrict "column 'col' not in dataframe" error to at most 10 columns #2549
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2549      +/-   ##
==========================================
+ Coverage   92.15%   92.16%   +0.01%
==========================================
  Files          60       60
  Lines        9988    10023      +35
==========================================
+ Hits         9204     9238      +34
- Misses        784      785       +1
This looks complicated and is a lot of code. Is there a way you can utilize the following?
lamindb/lamindb/models/_from_values.py
Line 301 in 1373325
def _format_values(
…/lamindb into feature/df_not_col_msg
@sunnyosun thank you. I don't know why I didn't think of it, but it makes it much more concise.
if "column" in err_msg and "not in dataframe" in err_msg:
    missing_col = err_msg.split("column '")[1].split("'")[0]
    display_cols_str = _format_values(
        list(self._dataset.columns), n=10, quotes=True, sep="'"
Could this be simplified to display_cols_str = _format_values(self._dataset.columns, n=10)?
display_cols_str = _format_values(
    list(self._dataset.columns), n=10, quotes=True, sep="'"
)
err_msg = f"column '{missing_col}' not in dataframe. {len(list(self._dataset.columns))} columns in dataframe including: {display_cols_str}"
err_msg = f"column '{missing_col}' not in dataframe. {len(list(self._dataset.columns))} columns in dataframe: {display_cols_str}"
What happens if more than one column is not in the dataframe?
This is OK if you need it urgently, but it's a bit patchy. Ideally, we should have a parser for all pandera errors.
No, this is not urgent, so I might revisit this at a later point with more general code.
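As a rough illustration of the question about several missing columns: the split-based extraction in the diff only returns the first quoted name. A minimal sketch of a more general extraction is below. It assumes that each missing column appears in the error text as "column '<name>' not in dataframe" (the wording the diff checks for); whether pandera ever reports several such columns in a single message is an assumption, and extract_missing_columns is a hypothetical helper, not part of lamindb or pandera.

```python
import re

# Assumed message shape, taken from the substrings checked in the diff:
#   column '<name>' not in dataframe
MISSING_COL_RE = re.compile(r"column '([^']+)' not in dataframe")


def extract_missing_columns(err_msg: str) -> list[str]:
    """Hypothetical helper: return every missing column named in err_msg.

    Duplicates are dropped while the order of first appearance is kept.
    """
    return list(dict.fromkeys(MISSING_COL_RE.findall(err_msg)))


# Illustrative input only; real pandera output may be formatted differently.
example = (
    "column 'cell_type' not in dataframe\n"
    "column 'donor' not in dataframe\n"
    "column 'cell_type' not in dataframe"
)
missing = extract_missing_columns(example)
print(missing)
```

A message with no match yields an empty list, so the caller can fall back to the original error text unchanged.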
Fixes #2547
This is in line with the rest of our UX, where we also show only the first 10 hits. The more columns a dataset has, the bigger the difference. I observed a user getting spammed with an absurd number of columns for their dataset, which made the whole notebook unreadable.
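The truncation described here can be sketched in isolation as follows. This is a simplified stand-in, not the lamindb implementation: preview_columns and missing_column_message are hypothetical names, and the real code uses _format_values, whose exact output format is not reproduced here.

```python
def preview_columns(columns, n=10):
    """Hypothetical helper: quote the first n column names, add '...' if more exist."""
    names = [str(c) for c in columns]
    shown = ", ".join(f"'{name}'" for name in names[:n])
    if len(names) > n:
        shown += ", ..."
    return shown


def missing_column_message(missing_col, columns, n=10):
    """Build a short error message that lists at most n existing columns."""
    columns = list(columns)
    return (
        f"column '{missing_col}' not in dataframe. "
        f"{len(columns)} columns in dataframe: {preview_columns(columns, n=n)}"
    )


# 25 columns, but only the first 3 are listed in the message.
cols = [f"gene_{i}" for i in range(25)]
msg = missing_column_message("cell_type", cols, n=3)
print(msg)
```

Reporting the total column count alongside the truncated list keeps the message short while still telling the user how much was left out.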