
Investigate and potentially add support for spark connect #284


Closed
24 tasks done
razvan opened this issue Sep 18, 2023 · 7 comments

Comments

@razvan
Member

razvan commented Sep 18, 2023

Spark Connect

Spark 3.5 introduces a new client called Spark Connect.

The use case seems to be thin clients that connect to a running Spark driver.

This probably means that the operator needs to be able to start Spark Connect servers independently of Spark applications and expose a service for Spark Connect clients.
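A minimal sketch of what such a thin client could look like, assuming PySpark 3.5 installed with its Spark Connect extras (`pip install "pyspark[connect]"`) and a hypothetical in-cluster service named `spark-connect-server` listening on 15002, the default Spark Connect gRPC port. `build_connect_url` and `connect` are illustrative helpers, not part of the operator; only `SparkSession.builder.remote(...)` and the `sc://` connection-string format come from Spark itself.

```python
# Sketch of a thin Spark Connect client (illustrative helpers, not operator code).
# 15002 is Spark Connect's default gRPC port; host names used with these
# helpers are hypothetical.
SPARK_CONNECT_DEFAULT_PORT = 15002


def build_connect_url(host: str, port: int = SPARK_CONNECT_DEFAULT_PORT, **params: str) -> str:
    """Build a Spark Connect connection string: sc://host:port/;key=value;..."""
    url = f"sc://{host}:{port}"
    if params:
        # Optional parameters (e.g. user_id, token) are appended after "/;".
        url += "/;" + ";".join(f"{key}={value}" for key, value in params.items())
    return url


def connect(host: str, port: int = SPARK_CONNECT_DEFAULT_PORT, **params: str):
    """Open a remote SparkSession against an already running Spark Connect server."""
    # Imported lazily so the URL helper above works without pyspark installed.
    from pyspark.sql import SparkSession

    return SparkSession.builder.remote(build_connect_url(host, port, **params)).getOrCreate()
```

For example, `connect("spark-connect-server", user_id="alice").range(5).show()` would run a trivial query remotely: the driver and executors stay in the cluster and only the lightweight client runs locally, which is why the operator would need a long-lived server plus a stable service address rather than a one-off Spark application.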

Roadmap

Rough roadmap to GA:

  • POC: set up a Spark Connect server with Kubernetes as the resource manager, plus a basic integration test
  • minimal CRD: drop the StatefulSet; minimum configuration for the server (JVM properties, logging)
    • server
      • deployment with one replica
      • JVM argument overrides
      • config overrides
      • env overrides
      • log configuration and aggregation with Vector
      • pod overrides
      • resource requests
      • status and transition events
      • reconciliation operations (paused, stopped, etc.)
    • executor
      • JVM argument overrides
      • config overrides
      • env overrides
      • log configuration and aggregation
      • resource requests
      • pod affinity
  • add preliminary documentation
  • expose Prometheus metrics
  • integrate with the history server. See: doc: comment on spark history integration #559
  • integrate with the listener operator
  • create a new demo

Related PRs

@razvan razvan changed the title Investigate and potentially ad support for spark connect Investigate and potentially add support for spark connect Sep 18, 2023
@adwk67 adwk67 self-assigned this Dec 20, 2023
@adwk67 adwk67 removed their assignment Aug 30, 2024
@timrobertson100

We have started exploring Spark Connect at GBIF.org. Our primary use case is a long-running Spark cluster that holds an in-memory cached table, so that apps can perform filtered data egress with minimal startup cost.

@timrobertson100

@razvan - thank you for your work. When you are ready, we will be interested in helping to test.

@razvan
Member Author

razvan commented Apr 11, 2025

@timrobertson100 - we merged preliminary support for Spark Connect deployments into the main branch. Looking forward to your feedback!

@dshershov

Awesome! Good job guys!

@lfrancke
Member

lfrancke commented May 9, 2025

@razvan This is in "Done" but not closed. Anything left to do?

Could you please add release notes and a link to the docs?

@lfrancke lfrancke moved this from Development: Done to Acceptance: In Progress in Stackable Engineering May 9, 2025
@razvan
Member Author

razvan commented May 9, 2025

This release adds experimental support for Spark Connect. The Spark operator watches for SparkConnectServer custom resources. Preliminary documentation is available here [1], and the existing Taxi Data Anomaly Detection demo [2] has been retrofitted to use a JupyterLab client running against a Spark Connect server.

[1] https://docs.stackable.tech/home/nightly/spark-k8s/usage-guide/spark-connect/
[2] https://docs.stackable.tech/home/nightly/demos/jupyterhub-pyspark-hdfs-anomaly-detection-taxi-data/

@razvan razvan closed this as completed May 9, 2025
@timrobertson100

Thank you very much. We have been watching but have not yet had time to test - it's on our backlog.

Projects
Status: In Progress
Status: Acceptance: In Progress
Development

No branches or pull requests

6 participants