### Using PySpark with AWS S3

Using Spark session for hadoop 2.7.3

```py
import os
# !ls /usr/local/spark/jars/hadoop* # to figure out what version of hadoop
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages "org.apache.hadoop:hadoop-aws:2.7.3" pyspark-shell'

import pyspark
myAccessKey = input()
mySecretKey = input()

spark = pyspark.sql.SparkSession.builder \
    .master("local[*]") \
    .config("spark.hadoop.fs.s3a.access.key", myAccessKey) \
    .config("spark.hadoop.fs.s3a.secret.key", mySecretKey) \
    .getOrCreate()

df = spark.read.parquet("s3a://myBucket/myKey")  # s3a:// scheme, matching the fs.s3a.* keys set above
```
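
The commented `!ls` line in the snippet above suggests that the `hadoop-aws` version should match the Hadoop jars bundled with Spark. A minimal sketch of deriving the `--packages` value from a jar listing might look like the following. The helper names `hadoop_version_from_jars` and `hadoop_aws_submit_args`, the sample file names, and the assumption that a `hadoop-common-<version>.jar` file is present are all illustrative, not part of the recipe.

```python
import re


def hadoop_version_from_jars(jar_names):
    """Return the Hadoop version found in a list of jar file names, or None."""
    # Assumption: the Spark distribution ships a jar named hadoop-common-<x.y.z>.jar
    pattern = re.compile(r"hadoop-common-(\d+\.\d+\.\d+)\.jar$")
    for name in jar_names:
        match = pattern.search(name)
        if match:
            return match.group(1)
    return None


def hadoop_aws_submit_args(version):
    """Build a PYSPARK_SUBMIT_ARGS value requesting the matching hadoop-aws package."""
    return f'--packages "org.apache.hadoop:hadoop-aws:{version}" pyspark-shell'


# Sample listing, made up for illustration; in a notebook one could pass
# os.listdir("/usr/local/spark/jars") instead.
jars = ["hadoop-common-2.7.3.jar", "hadoop-hdfs-2.7.3.jar", "py4j-0.10.4.jar"]
version = hadoop_version_from_jars(jars)
submit_args = hadoop_aws_submit_args(version)
print(submit_args)
```

The resulting string could then be assigned to `os.environ['PYSPARK_SUBMIT_ARGS']` before `pyspark` is imported, as in the snippet above.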

Using Spark context for hadoop 2.6.0

```py
import os
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.amazonaws:aws-java-sdk:1.10.34,org.apache.hadoop:hadoop-aws:2.6.0 pyspark-shell'
```