[SPARK-51883][DOCS][PYTHON] Python Data Source user guide for filter pushdown #50684
Conversation
Thanks for adding the docs!
Other methods such as DataSource.schema() and DataSourceStreamReader.latestOffset() can be stateful. Changes to the object state made in these methods are visible to future invocations.

Refer to the documentation of each method for more details.
Can we also link to the documentation here?
from pyspark.sql.datasource import EqualTo, Filter, GreaterThan, LessThan
def pushFilters(self, filters: List[Filter]) -> Iterable[Filter]:
Can we add a complete example here so that people can copy paste and try it out?
Changed to an example source that returns prime numbers sequentially
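For readers following this thread without the diff open, here is a rough sketch of the idea behind a prime-number source with filter pushdown. It is not the example from the PR. To keep it runnable without Spark, `Filter`, `EqualTo`, `GreaterThan` and `LessThan` below are local stand-ins for the classes imported from `pyspark.sql.datasource`, and their fields (`attribute` as a tuple of column-name parts, `value`) are an assumption about the shape of the real classes. `PrimesReader` and the column name `value` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Iterator, List, Tuple


# Stand-ins (assumption) for the filter classes in pyspark.sql.datasource.
@dataclass(frozen=True)
class Filter:
    attribute: Tuple[str, ...]  # column path, e.g. ("value",)
    value: Any


class EqualTo(Filter):
    pass


class GreaterThan(Filter):
    pass


class LessThan(Filter):
    pass


class PrimesReader:
    """Hypothetical reader that yields primes in the range [lower, upper]."""

    def __init__(self, upper: int = 50) -> None:
        self.lower = 2
        self.upper = upper

    def pushFilters(self, filters: List[Filter]) -> Iterable[Filter]:
        # Narrow the range for filters this source understands and
        # yield back every filter it cannot handle.
        for f in filters:
            on_value = f.attribute == ("value",) and isinstance(f.value, int)
            if on_value and isinstance(f, GreaterThan):
                self.lower = max(self.lower, f.value + 1)
            elif on_value and isinstance(f, LessThan):
                self.upper = min(self.upper, f.value - 1)
            elif on_value and isinstance(f, EqualTo):
                self.lower = max(self.lower, f.value)
                self.upper = min(self.upper, f.value)
            else:
                yield f

    def read(self) -> Iterator[Tuple[int]]:
        # Trial division is enough for a small demonstration range.
        for n in range(self.lower, self.upper + 1):
            if all(n % d for d in range(2, int(n ** 0.5) + 1)):
                yield (n,)


reader = PrimesReader()
# pushFilters is a generator here, so it must be consumed for the
# range narrowing to take effect.
remaining = list(
    reader.pushFilters(
        [
            GreaterThan(("value",), 10),
            LessThan(("value",), 30),
            EqualTo(("name",), "x"),
        ]
    )
)
primes = [row[0] for row in reader.read()]
```

The filters left in `remaining` are the ones the source did not absorb, which is the role the return value of `pushFilters` plays in the signature quoted above.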
python/pyspark/sql/datasource.py
Outdated
Configuration `spark.sql.python.filterPushdown.enabled` must be set to `true`
to implement this method.
Not sure if we should put this in the doc. Can we throw a warning in the code?
Yeah, we already show an error if it's disabled. I guess the user can find out when they try to use the source so it's not necessary to put in the doc.
What changes were proposed in this pull request?
Update `python_data_source.rst` to add filter pushdown docs.
Why are the changes needed?
Feature was added but documentation was still missing.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Verified locally
Was this patch authored or co-authored using generative AI tooling?
Yes. Initial draft was generated using AI then manually edited.
Generated-by: GitHub Copilot with Claude 3.7 Sonnet