[SPARK-51880][ML][PYTHON][CONNECT] Fix ML cache object python client references #50707

WeichenXu123 · 2025-04-25T05:58:57Z

What changes were proposed in this pull request?

Fix ML cache object python client references.

When a model is copied on the client, multiple client model objects end up referring to the same server-side cached model.
In this case we need a reference count: the server-side cached model can be released only when its reference count drops to zero.
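
To illustrate the reference-counting idea described above, here is a minimal sketch. It is not the code in this PR: `RefCountedCache`, `add_ref`, `release_ref`, and `release_fn` are hypothetical names, and `release_fn` stands in for whatever call actually frees the server-side cached model.

```python
import threading
from typing import Callable, Dict, List


class RefCountedCache:
    """Hypothetical registry counting client objects per server cache id."""

    def __init__(self, release_fn: Callable[[str], None]) -> None:
        # release_fn is an assumed stand-in for the call that frees the server model.
        self._release_fn = release_fn
        self._counts: Dict[str, int] = {}
        self._lock = threading.Lock()

    def add_ref(self, cache_id: str) -> None:
        # Called when a client model object starts referring to cache_id.
        with self._lock:
            self._counts[cache_id] = self._counts.get(cache_id, 0) + 1

    def release_ref(self, cache_id: str) -> None:
        # Called when a client model object that refers to cache_id goes away.
        with self._lock:
            remaining = self._counts[cache_id] - 1
            if remaining > 0:
                self._counts[cache_id] = remaining
                return
            del self._counts[cache_id]
        # Last reference dropped: only now release the server-side cached model.
        self._release_fn(cache_id)


released: List[str] = []
registry = RefCountedCache(released.append)

registry.add_ref("model-1")      # original client model object
registry.add_ref("model-1")      # copy of the model, same server cache id
registry.release_ref("model-1")  # one client object goes away
released_after_first = list(released)
registry.release_ref("model-1")  # last client object goes away
released_after_second = list(released)
```

In this sketch the first `release_ref` call leaves the cached model in place because the copy still refers to it. The release callback runs only on the second call.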

Why are the changes needed?

Bugfix.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit tests.

Was this patch authored or co-authored using generative AI tooling?

No.

Signed-off-by: Weichen Xu <[email protected]>

zhengruifeng · 2025-04-28T00:28:46Z

merged to master

WeichenXu123 · 2025-04-28T03:02:39Z

python/pyspark/sql/connect/client/core.py

@@ -1981,9 +1981,10 @@ def add_ml_cache(self, cache_id: str) -> None:
        self.thread_local.ml_caches.add(cache_id)

    def remove_ml_cache(self, cache_id: str) -> None:
+        deleted = self._delete_ml_cache([cache_id])
+        # TODO: Fix the code: change thread-local `ml_caches` to global `ml_caches`.
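
The TODO above proposes replacing the thread-local `ml_caches` with a global one. The PR does not state the motivation. One plausible reason, sketched below under that assumption, is that an id recorded in a `threading.local` set is invisible from any other thread. All names here (`add_ml_cache_sketch`, `global_ml_caches`, `check_from_other_thread`) are hypothetical and are not part of the PySpark client.

```python
import threading
from typing import List, Set

thread_local = threading.local()
global_ml_caches: Set[str] = set()  # assumed process-wide alternative
global_lock = threading.Lock()


def add_ml_cache_sketch(cache_id: str) -> None:
    # Record the id in both a thread-local set and a lock-protected global set.
    if not hasattr(thread_local, "ml_caches"):
        thread_local.ml_caches = set()
    thread_local.ml_caches.add(cache_id)
    with global_lock:
        global_ml_caches.add(cache_id)


seen_thread_local: List[bool] = []
seen_global: List[bool] = []


def check_from_other_thread(cache_id: str) -> None:
    # A different thread gets its own (here: empty) view of thread_local.
    local_ids = getattr(thread_local, "ml_caches", set())
    seen_thread_local.append(cache_id in local_ids)
    with global_lock:
        seen_global.append(cache_id in global_ml_caches)


add_ml_cache_sketch("model-1")  # registered on the main thread
worker = threading.Thread(target=check_from_other_thread, args=("model-1",))
worker.start()
worker.join()
```

The worker thread does not find the id in the thread-local set but does find it in the global set, so cleanup that runs on a different thread from registration would need the global set.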


…references

### What changes were proposed in this pull request?

Fix ML cache object python client references.

When a model is copied from client, it results in multiple client model objects refer to the same server cached model. In this case, we need a reference count, only when reference count decreases to zero, we can release the server cached model.

### Why are the changes needed?

Bugfix.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Unit tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#50707 from WeichenXu123/ml-ref-id-fix.

Lead-authored-by: Weichen Xu <[email protected]>
Co-authored-by: WeichenXu <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>

WeichenXu123 added 2 commits April 25, 2025 12:06

global client.ml_caches

9207682

Signed-off-by: Weichen Xu <[email protected]>

init

3d1f51a

Signed-off-by: Weichen Xu <[email protected]>

github-actions bot added SQL ML PYTHON CONNECT labels Apr 25, 2025

WeichenXu123 mentioned this pull request Apr 25, 2025

[SPARK-51880][ML][PYTHON][CONNECT] Avoid eager model removal in meta algorithms when collectSubModel is true #50682

Closed

WeichenXu123 added 2 commits April 25, 2025 14:00

update

7015c1f

Signed-off-by: Weichen Xu <[email protected]>

update

85afb85

Signed-off-by: Weichen Xu <[email protected]>

zhengruifeng approved these changes Apr 25, 2025

assert

6fd2449

Signed-off-by: Weichen Xu <[email protected]>

HyukjinKwon approved these changes Apr 25, 2025

WeichenXu123 added 7 commits April 25, 2025 17:11

fix

b5176bb

Signed-off-by: Weichen Xu <[email protected]>

fix

7e569a8

Signed-off-by: Weichen Xu <[email protected]>

format

1b89741

Signed-off-by: Weichen Xu <[email protected]>

format

030ec02

Signed-off-by: Weichen Xu <[email protected]>

format

5aa6445

Signed-off-by: Weichen Xu <[email protected]>

Merge branch 'apache:master' into ml-ref-id-fix

6113f00

revert mlcache change

3c73166

Signed-off-by: Weichen Xu <[email protected]>

github-actions bot removed the SQL label Apr 27, 2025

WeichenXu123 added 3 commits April 27, 2025 14:06

update

ad3d3a3

Signed-off-by: Weichen Xu <[email protected]>

update

bbca158

Signed-off-by: Weichen Xu <[email protected]>

update

634699d

Signed-off-by: Weichen Xu <[email protected]>

github-actions bot added the SQL label Apr 27, 2025

zhengruifeng closed this in aeff679 Apr 28, 2025
