Conversation

@bjornjorgensen (Contributor) commented Jun 18, 2022

Describe your changes

This is part 1 of 2 for updating Apache Spark to 3.3.0 with Scala 2.13 and Java version 17.
This PR will remove the spylon-kernel, which means that there will be no more Scala support.
The spylon-kernel was last updated in 2017 and is inactive.
When updating to Spark 3.3.0 with Scala 2.13, the spylon-kernel test fails:
[100%] FAILED tests/all-spark-notebook/test_spark_notebooks.py::test_nbconvert[local_spylon]
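
For reference, the failing test can be re-run on its own; a minimal sketch, assuming pytest is installed and it is invoked from the repository root (the test node ID is copied from the failure output above):

```python
# Hypothetical reproduction of the single failing test via pytest's Python entry point.
import pytest

pytest.main([
    "tests/all-spark-notebook/test_spark_notebooks.py::test_nbconvert[local_spylon]",
    "-v",
])
```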

When we upgrade the pyspark (Python on Spark) image, the tests for the all-spark-notebook also run. The all-spark-notebook covers PySpark, R, and Scala. Scala is provided through the unmaintained spylon-kernel, and this is why it needs to be removed.
The only reason we have the spylon kernel is to support Scala notebooks that use Spark.
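
To make the practical effect concrete, the installed kernelspecs can be listed inside the image; a minimal sketch, assuming it is run inside the all-spark-notebook container (the exact kernel names in the output are whatever that image registers):

```python
# List the registered Jupyter kernelspecs; after this PR the spylon-kernel entry
# should no longer appear. Requires jupyter_client, which the image already ships.
from jupyter_client.kernelspec import KernelSpecManager

print(sorted(KernelSpecManager().find_kernel_specs()))
```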

However, Apache Spark has now voted to build a client (see the JIRA issue).
So for the next release, Apache Spark 3.4.0, this issue will hopefully be solved by the Spark client.

Issue ticket if applicable

Upgrading Spark -> 3.3, Hadoop -> 3, Scala -> 2.13, Java -> 17

Checklist (especially for first-time contributors)

  • I have performed a self-review of my code
  • If it is a core feature, I have added thorough tests
  • I will try not to use force-push to make the review process easier for reviewers
  • I have updated the documentation for significant changes

@bjornjorgensen (Contributor, Author)

@mathbunnyru When this PR is merged, I will start working on a new PR to upgrade to Spark 3.3.0.

@mathbunnyru (Member)

In general, this LGTM 👍
@consideRatio could you take a look, please?

When we get rid of the leftovers, I would probably like to squash-merge this, to make it easier to see what needs to be added back in the future (if another kernel is added).

@consideRatio (Member) commented Jun 18, 2022

Thank you for your work to maintain these parts, and thank you for making a dedicated PR for something like this - that is great for visibility I think, nice! ❤️ 🎉

I'm not confident on the situation overall, entirely lacking experience with spark, but this looks reasonable to me.

I suggest:

  • updating the title to clarify this relates to the "all spark" image
  • clarifying the assumption currently being hidden: that this image is meant to support work against spark or similar, and that what is removed here is justified based on that.
  • not assuming readers know that the spylon kernel is coupled with Scala (which I've just guessed it is)

@bjornjorgensen changed the title from "Remove scala" to "Remove spylon-kernel from all images." on Jun 18, 2022
@bjornjorgensen marked this pull request as draft on June 18, 2022 20:35
@bjornjorgensen (Contributor, Author)

When updating to Spark 3.3.0 with Scala 2.13, the spylon-kernel test fails: [100%] FAILED tests/all-spark-notebook/test_spark_notebooks.py::test_nbconvert[local_spylon]
The spylon-kernel was last updated in 2017 and is inactive.

Ammonite-spark has support for Scala 2.13; it runs Spark by using Ivy.

Toree is another kernel for Spark and Scala, but it only has support for Java 8 and 11, and we are upgrading to Java 17 now.
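
For context, the Java runtime any kernel would have to support can be checked directly inside the image; a small sketch, assuming `java` is on the PATH in the container:

```python
# Print the Java runtime version the image ships; `java -version` writes to stderr.
import subprocess

result = subprocess.run(["java", "-version"], capture_output=True, text=True)
print(result.stderr.strip())
```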

Something tells me that there is something wrong with the CI:
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/usr/local/spark-3.2.1-bin-hadoop3.2/jars/spark-unsafe_2.12-3.2.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
This means that we are upgrading the pyspark image, but the tests running for the all-spark-notebook are using old images.
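
One way to confirm whether the tests really pick up the new image is to check the Spark version visible to PySpark; a minimal sketch, assuming it is run inside the container under test:

```python
# The warning above references a spark-3.2.1 / Scala 2.12 jar, which would indicate an
# old image; printing the PySpark version shows which Spark build is actually in use.
import pyspark

print(pyspark.__version__)
```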

I have put this PR in WIP mode so that we can find a proper solution to this problem.

@consideRatio (Member)

Thanks!

Btw, I think avoiding adding something else for Scala is the best course of action until it's a clear wish from an active user of these images. Adding it before that seems premature.

@bjornjorgensen (Contributor, Author)

Yes, but when we upgrade the pyspark (Python on Spark) image, the tests for the all-spark-notebook also run. The all-spark-notebook covers PySpark, R, and Scala. Scala is provided through the unmaintained spylon-kernel, and this is the problem.

Now, I don't really want to remove support for Scala from the images, but it is only exercised through the Spark tests.

@bjornjorgensen (Contributor, Author)

@consideRatio Now I have updated the description for this PR. Are there still any questions?

@Bidek56 (Contributor) commented Jun 21, 2022

I am a little concerned about leaving all the Spark Scala users out in the cold. Hence I would update our documentation to mention Almond or Databricks. Thanks!

@bjornjorgensen (Contributor, Author)

@Bidek56 Yes, but at the moment there are no Scala kernels that support Spark 3.3.x with Java 17 and Scala 2.13. That's the reason why your PR fails to build now.

You can link to the Almond kernel, but it won't help users who want to run the latest Spark version. We can apply this PR, which removes Scala, and when the Almond kernel gains support for the latest Spark version, we can make a new PR that adds Scala support through the Almond kernel.

@Bidek56 (Contributor) commented Jun 22, 2022

> @Bidek56 Yes, but at the moment there are no Scala kernels that support Spark 3.3.x with Java 17 and Scala 2.13. That's the reason why your PR fails to build now.
>
> You can link to the Almond kernel, but it won't help users who want to run the latest Spark version. We can apply this PR, which removes Scala, and when the Almond kernel gains support for the latest Spark version, we can make a new PR that adds Scala support through the Almond kernel.

It was decided in prior discussions not to include Almond in this repo since it has its own Dockerfile.

@mathbunnyru (Member)

> Thanks!
>
> Btw, I think avoiding adding something else for Scala is the best course of action until it's a clear wish from an active user of these images. Adding it before that seems premature.

I agree with @consideRatio.
Let's remove the spylon-kernel for now because we're not even sure people use Scala that much in our images.
Also, this kernel is deprecated and stops us from moving forward, and moving forward is something our users clearly want.

When another kernel works with our images, we will try to document it or merge it into our images.

Merging this one, thank you for your contribution 👍

@mathbunnyru (Member)

Squash merging this to allow easier reverts if needed.

@mathbunnyru merged commit 5048b02 into jupyter:master on Jul 4, 2022