
Commit 6576148

Merge pull request #836 from ying-w/master
Added s3 + spark session instructions
2 parents fee7942 + 69f811b commit 6576148

File tree: 1 file changed (+22, -0 lines)


docs/using/recipes.md

Lines changed: 22 additions & 0 deletions
@@ -207,7 +207,29 @@ A few suggestions have been made regarding using Docker Stacks with spark.
### Using PySpark with AWS S3

Using a Spark session for Hadoop 2.7.3:

```py
import os
# Uncomment to check which Hadoop version the image ships:
# !ls /usr/local/spark/jars/hadoop*
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages "org.apache.hadoop:hadoop-aws:2.7.3" pyspark-shell'

import pyspark
myAccessKey = input()
mySecretKey = input()

spark = pyspark.sql.SparkSession.builder \
        .master("local[*]") \
        .config("spark.hadoop.fs.s3a.access.key", myAccessKey) \
        .config("spark.hadoop.fs.s3a.secret.key", mySecretKey) \
        .getOrCreate()

# Use the s3a:// scheme so the fs.s3a.* credentials set above apply
df = spark.read.parquet("s3a://myBucket/myKey")
```
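Typing keys with `input()` works interactively, but the same credentials usually already live in the standard AWS credentials file written by `aws configure`. A minimal sketch that parses that file instead (the `load_aws_keys` helper is hypothetical, not part of the recipe):

```python
import configparser
import os

def load_aws_keys(path="~/.aws/credentials", profile="default"):
    """Read an access key pair from the standard AWS credentials file.

    Hypothetical helper: assumes the usual INI layout, e.g.
    [default]
    aws_access_key_id = ...
    aws_secret_access_key = ...
    """
    config = configparser.ConfigParser()
    config.read(os.path.expanduser(path))
    section = config[profile]
    return section["aws_access_key_id"], section["aws_secret_access_key"]
```

The returned pair can then be passed to the `spark.hadoop.fs.s3a.access.key` / `spark.hadoop.fs.s3a.secret.key` settings shown above instead of prompting.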
Using a Spark context for Hadoop 2.6.0:

```py
import os
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.amazonaws:aws-java-sdk:1.10.34,org.apache.hadoop:hadoop-aws:2.6.0 pyspark-shell'
```
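The only substantive difference between the two recipes is the `--packages` coordinate list, which must match the Hadoop version baked into the image; Hadoop 2.6.x additionally needs the AWS Java SDK pinned explicitly. A small sketch that builds the value for either version (the `s3_submit_args` helper is hypothetical, not part of the recipe):

```python
def s3_submit_args(hadoop_version, aws_sdk_version=None):
    """Build a PYSPARK_SUBMIT_ARGS value pulling hadoop-aws for S3 access.

    hadoop_version must match the jars under /usr/local/spark/jars/hadoop*;
    pass aws_sdk_version as well for Hadoop 2.6.x images.
    """
    packages = []
    if aws_sdk_version:
        packages.append("com.amazonaws:aws-java-sdk:" + aws_sdk_version)
    packages.append("org.apache.hadoop:hadoop-aws:" + hadoop_version)
    return "--packages " + ",".join(packages) + " pyspark-shell"

# e.g. os.environ['PYSPARK_SUBMIT_ARGS'] = s3_submit_args("2.6.0", "1.10.34")
```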

0 commit comments