
[SPARK-51883][DOCS][PYTHON] Python Data Source user guide for filter pushdown #50684


Open — wants to merge 3 commits into base: master

Conversation

@wengh wengh (Contributor) commented Apr 23, 2025

What changes were proposed in this pull request?

Update python_data_source.rst to add filter pushdown docs.

Why are the changes needed?

The filter pushdown feature was added but documentation was still missing.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Verified locally

Was this patch authored or co-authored using generative AI tooling?

Yes. The initial draft was generated using AI and then manually edited.

Generated-by: GitHub Copilot with Claude 3.7 Sonnet

@wengh wengh changed the title from "[SPARK-51883][DOCS][PYTHON] Python Data Source docs for filter pushdown" to "[SPARK-51883][DOCS][PYTHON] Python Data Source user guide for filter pushdown" on Apr 23, 2025
@allisonwang-db allisonwang-db (Contributor) left a comment


Thanks for adding the docs!


Other methods such as DataSource.schema() and DataSourceStreamReader.latestOffset() can be stateful. Changes to the object state made in these methods are visible to future invocations.

Refer to the documentation of each method for more details.
Contributor

Can we also link to the documentation here?

from pyspark.sql.datasource import EqualTo, Filter, GreaterThan, LessThan
def pushFilters(self, filters: List[Filter]) -> Iterable[Filter]:
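To make the requested "complete example" concrete, here is a minimal sketch of a reader implementing `pushFilters`. To keep it runnable without a Spark installation, it uses stand-in dataclasses in place of `pyspark.sql.datasource`'s `Filter`/`EqualTo`/`GreaterThan` (an assumption; real code would import those from `pyspark.sql.datasource` as shown above). The contract illustrated is the one in the signature: accept the filters you can evaluate at the source, and return (here, yield) the rest back to Spark.

```python
from dataclasses import dataclass
from typing import Any, Iterable, List, Tuple

# Stand-ins for the pyspark.sql.datasource filter classes (assumption:
# real code imports Filter, EqualTo, GreaterThan from pyspark.sql.datasource).
@dataclass(frozen=True)
class Filter: ...

@dataclass(frozen=True)
class EqualTo(Filter):
    attribute: Tuple[str, ...]
    value: Any

@dataclass(frozen=True)
class GreaterThan(Filter):
    attribute: Tuple[str, ...]
    value: Any

class ExampleReader:
    """Sketch of a reader that can only push down EqualTo filters on 'id'."""

    def __init__(self) -> None:
        self.pushed: List[Filter] = []

    def pushFilters(self, filters: List[Filter]) -> Iterable[Filter]:
        for f in filters:
            if isinstance(f, EqualTo) and f.attribute == ("id",):
                # Supported: remember it and apply it during the scan.
                self.pushed.append(f)
            else:
                # Unsupported: yield it back so Spark applies it after the scan.
                yield f
```

Returning unsupported filters (rather than the supported ones) is the key point worth spelling out in the doc, since getting it backwards silently produces wrong results.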
Contributor

Can we add a complete example here so that people can copy paste and try it out?

Contributor Author

Changed it to an example source that returns prime numbers sequentially.

Comment on lines 544 to 545
Configuration `spark.sql.python.filterPushdown.enabled` must be set to `true`
to implement this method.
Contributor

Not sure if we should put this in the doc. Can we throw a warning in the code instead?

Contributor Author

Yeah, we already show an error if it's disabled. The user will find out when they try to use the source, so it's not necessary to put it in the doc.
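For context on the flag being discussed, it can be enabled on a running session via the SQL conf (a sketch; assumes an existing `SparkSession` bound to the name `spark`):

```python
# Assumption: `spark` is an existing SparkSession.
# Enable Python Data Source filter pushdown for the current session.
spark.conf.set("spark.sql.python.filterPushdown.enabled", "true")
```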

@wengh wengh force-pushed the pyds-docs-pushdown branch from 264ccfc to 13df8b4 Compare May 1, 2025 01:03