Description
What's the issue or suggestion?
The example code at https://docs.dagster.io/integrations/spark#submitting-pyspark-ops-on-emr leaves out a key requirement: all of the project's Python dependencies need to be installed on the cluster.
For Databricks, this can be done in the step launcher config. Unfortunately, the API config docs are rather verbose and do not give a simple example of the syntax for doing this. Here is an example:
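A minimal sketch of how the project's dependencies could be passed to the Databricks step launcher. It assumes that the `databricks_pyspark_step_launcher` config accepts a `run_config.libraries` list in the Databricks Libraries API format (`{"pypi": {"package": ...}}`). The helper names `pypi_libraries` and `step_launcher_config` and the package pins are hypothetical, so check the exact schema against the dagster-databricks API docs for your version.

```python
import copy
from typing import Any, Dict, List, Optional


def pypi_libraries(requirements: List[str], repo: Optional[str] = None) -> List[Dict[str, Any]]:
    """Turn pip requirement strings into Databricks-style PyPI library specs (hypothetical helper)."""
    libraries = []
    for req in requirements:
        spec: Dict[str, Any] = {"package": req}
        if repo:  # optional custom package index
            spec["repo"] = repo
        libraries.append({"pypi": spec})
    return libraries


def step_launcher_config(requirements: List[str], base_config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of base_config with the requirements appended to run_config.libraries.

    ASSUMPTION: the step launcher installs cluster libraries listed under run_config.libraries.
    """
    config = copy.deepcopy(base_config)  # leave the caller's dict untouched
    run_config = config.setdefault("run_config", {})
    run_config.setdefault("libraries", []).extend(pypi_libraries(requirements))
    return config
```

The resulting dict could then be passed to `databricks_pyspark_step_launcher.configured(...)` together with the other required keys (cluster, host, token), which are omitted here.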
For EMR, all of the Python dependencies of the Dagster project (from setup.py and requirements.txt) need to be installed manually. This installation is normally done with a bootstrap script (bootstrap.sh), as documented in https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-jupyterhub-install-kernels-libs.html
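A sketch of generating that bootstrap script from the project's requirements and attaching it when the cluster is created. This is not Dagster functionality. It assumes the cluster is created with boto3's EMR `run_job_flow`, that `python3` on the nodes is the interpreter PySpark uses (this depends on the EMR release), and that any `install_requires` entries from setup.py are merged into the same list. The helper names are hypothetical.

```python
import re
import shlex
from typing import Any, Dict, Iterable, List


def read_requirements(text: str) -> List[str]:
    """Extract requirement specifiers from requirements.txt content (hypothetical helper)."""
    requirements = []
    for raw in text.splitlines():
        # treat '#' at line start or after whitespace as a comment, as pip does
        line = re.split(r"\s#", raw, maxsplit=1)[0].strip()
        if not line or line.startswith(("#", "-")):  # skip blanks, comments, pip options
            continue
        requirements.append(line)
    return requirements


def render_bootstrap_script(requirements: Iterable[str]) -> str:
    """Build a bash bootstrap script that pip-installs the given requirements."""
    packages = " ".join(shlex.quote(r) for r in requirements)
    return "\n".join([
        "#!/bin/bash",
        "set -euo pipefail  # fail the bootstrap action if any install fails",
        # ASSUMPTION: python3 is the interpreter PySpark uses on this EMR release
        f"sudo python3 -m pip install {packages}",
        "",
    ])


def bootstrap_actions(script_s3_uri: str) -> List[Dict[str, Any]]:
    """BootstrapActions entry in the shape accepted by boto3 EMR run_job_flow."""
    return [
        {
            "Name": "Install Dagster project dependencies",
            "ScriptBootstrapAction": {"Path": script_s3_uri, "Args": []},
        }
    ]
```

After uploading the script to S3 (for example `s3://my-bucket/bootstrap.sh`, a placeholder path), the list could be passed as `BootstrapActions=bootstrap_actions(...)` to `boto3.client("emr").run_job_flow(...)`. Because bootstrap actions run while each node is provisioned, an install failure shows up at cluster startup rather than as a cryptic error in the middle of a run.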
Currently this is painful to figure out: users report launching run after run that fails with cryptic log errors (the actual message is only visible in stderr), then fixing the missing packages one at a time, run by run 😱
Message from the maintainers
Impacted by this issue? Give it a 👍! We factor engagement into prioritization.