Add a hidden column __natural_order__.
#1146
Codecov Report
@@            Coverage Diff             @@
##           master    #1146      +/-   ##
==========================================
- Coverage   95.22%   95.22%   -0.01%
==========================================
  Files          35       35
  Lines        7062     7055       -7
==========================================
- Hits         6725     6718       -7
  Misses        337      337
Softagram Impact Report for pull/1146 (head commit: 3d48413)
HyukjinKwon left a comment
My concern is that this harms the readability of the code: hidden columns are now carried around in the Spark DataFrame. But I guess we have no other choice if we want to keep the natural order.
I am merging this for now, but I really think we should improve readability. At least we could have, for instance, an sdf_without_hidden_columns property, or another layer or utility, to control such things.
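To make the sdf_without_hidden_columns idea concrete, here is a minimal pure-Python sketch. It does not touch Spark: FrameSketch is a hypothetical stand-in that only tracks column names, and sdf_without_hidden_columns is the property name floated in the comment, not an existing Koalas API.

```python
# Hypothetical sketch: models a frame by its column names only (no Spark involved).
NATURAL_ORDER_COLUMN_NAME = "__natural_order__"
HIDDEN_COLUMNS = [NATURAL_ORDER_COLUMN_NAME]


class FrameSketch:
    """Hypothetical stand-in for an internal frame carrying hidden bookkeeping columns."""

    def __init__(self, columns):
        self.columns = list(columns)

    @property
    def sdf_without_hidden_columns(self):
        # Return a new frame with every hidden column filtered out,
        # keeping the remaining columns in their original order.
        return FrameSketch(c for c in self.columns if c not in HIDDEN_COLUMNS)


frame = FrameSketch(["__index_level_0__", "a", "b", NATURAL_ORDER_COLUMN_NAME])
print(frame.sdf_without_hidden_columns.columns)
```

On a real Spark DataFrame the property body could plausibly reduce to sdf.drop(*HIDDEN_COLUMNS), since DataFrame.drop accepts column names and is a no-op for names that are not in the schema.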
NATURAL_ORDER_COLUMN_NAME = '__natural_order__'
HIDDEN_COLUMNS = set([NATURAL_ORDER_COLUMN_NAME])
No big deal, but why don't we just use a list to keep the order? I don't think duplicated columns are likely, if that was the concern.
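A minimal sketch of the list-based alternative, assuming hidden columns are appended after the visible ones in a select. The second hidden column and the with_hidden helper are hypothetical, added only to make the ordering visible. A list yields a deterministic column order, whereas iteration over a set of strings can differ between interpreter runs because of string hash randomization.

```python
NATURAL_ORDER_COLUMN_NAME = "__natural_order__"
# Hypothetical second hidden column, only to make the ordering visible.
OTHER_HIDDEN_COLUMN = "__other_hidden__"

# A list keeps declaration order, so the resulting column order is deterministic.
HIDDEN_COLUMNS = [NATURAL_ORDER_COLUMN_NAME, OTHER_HIDDEN_COLUMN]


def with_hidden(columns):
    """Hypothetical helper: append the hidden columns that are not already selected."""
    columns = list(columns)
    return columns + [c for c in HIDDEN_COLUMNS if c not in columns]


print(with_hidden(["a", "b"]))
print(with_hidden(["a", NATURAL_ORDER_COLUMN_NAME]))
```

Membership tests such as name in HIDDEN_COLUMNS still work on a list; with only a handful of entries the linear scan is negligible.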
return_schema = StructType(
    [StructField(SPARK_INDEX_NAME_FORMAT(0), LongType())] + list(sdf.schema))
columns = [f.name for f in return_schema]
I think StructType has a names property that could be used for this line.
sdf = self._sdf.select(
-    self._internal.index_scols + [c._scol for c in applied])
+    self._internal.index_scols + [c._scol for c in applied] + list(HIDDEN_COLUMNS))
Do we need to select the hidden columns here explicitly? I suspect they were selected because these are map-like operations(?). If I am not mistaken, maybe we should just not do this anywhere in the codebase.
Actually, in most cases we can remove them, but in some cases we can't, e.g., right after _cum, and in window-like functions once #1151 is merged.
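Why a select sometimes has to carry the hidden column along can be sketched in pure Python. Rows are modeled as dicts, and select and cumsum are hypothetical helpers, not Koalas internals: a cumulative (window-like) operation needs the natural order column to restore row order, so a preceding select that drops it loses that information.

```python
NATURAL_ORDER_COLUMN_NAME = "__natural_order__"
HIDDEN_COLUMNS = [NATURAL_ORDER_COLUMN_NAME]


def select(rows, columns):
    # Hypothetical: project each row (a dict) onto the requested columns.
    return [{c: row[c] for c in columns} for row in rows]


def cumsum(rows, column):
    # Hypothetical window-like operation: rows carry no inherent order,
    # so they are sorted by the natural order column before accumulating.
    if any(NATURAL_ORDER_COLUMN_NAME not in row for row in rows):
        raise KeyError("natural order column was dropped by an earlier select")
    total, running = 0, []
    for row in sorted(rows, key=lambda r: r[NATURAL_ORDER_COLUMN_NAME]):
        total += row[column]
        running.append(total)
    # Running totals are returned in natural order.
    return running


# Rows arriving in arbitrary order, e.g. after a shuffle.
rows = [
    {"x": 3, NATURAL_ORDER_COLUMN_NAME: 2},
    {"x": 1, NATURAL_ORDER_COLUMN_NAME: 0},
    {"x": 2, NATURAL_ORDER_COLUMN_NAME: 1},
]

kept = select(rows, ["x"] + HIDDEN_COLUMNS)
print(cumsum(kept, "x"))  # [1, 3, 6]

dropped = select(rows, ["x"])
# cumsum(dropped, "x") would raise KeyError: the original order is gone.
```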
I am merging this for now. I will also think about how to clean this up; we can address the comments together later.


Adding a hidden column __natural_order__ to keep track of the original order of the rows, especially for window-like operations like cumxxx.
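One way to produce such an ordering value in Spark is F.monotonically_increasing_id(), whose documented layout puts the partition ID in the upper 31 bits and the record number within each partition in the lower 33 bits. Whether this PR uses that function is not stated here, so treat it as an assumption. The pure-Python sketch below mimics that layout (natural_order_ids is a hypothetical helper) to show that the ids increase with the original row order without being consecutive.

```python
def natural_order_ids(partitions):
    # Hypothetical helper mimicking the documented layout of Spark's
    # monotonically_increasing_id(): partition index in the upper bits,
    # record number within the partition in the lower 33 bits.
    ids = []
    for partition_id, partition in enumerate(partitions):
        for record_number, _ in enumerate(partition):
            ids.append((partition_id << 33) + record_number)
    return ids


partitions = [["a", "b"], ["c"], ["d", "e"]]
ids = natural_order_ids(partitions)
print(ids)

# Attach the ids, scramble the rows, then restore the original order by id.
rows = [value for partition in partitions for value in partition]
tagged = list(zip(ids, rows))
shuffled = [tagged[i] for i in (3, 0, 4, 2, 1)]
restored = [value for _, value in sorted(shuffled)]
print(restored)
```

Because the ids are increasing but not consecutive, they work as a sort key (for example in the orderBy of a window specification) but not as a positional index.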