Commit 69f811b

Added s3 + spark session instructions
1 parent 59b402c commit 69f811b

1 file changed, +22 -0 lines changed

docs/using/recipes.md

Lines changed: 22 additions & 0 deletions
@@ -156,7 +156,29 @@ A few suggestions have been made regarding using Docker Stacks with spark.

### Using PySpark with AWS S3

Using Spark session for Hadoop 2.7.3

```py
import os
# Run `!ls /usr/local/spark/jars/hadoop*` to find the bundled Hadoop version;
# the hadoop-aws version below must match it.
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages "org.apache.hadoop:hadoop-aws:2.7.3" pyspark-shell'

import pyspark
# Read the AWS credentials interactively rather than hard-coding them.
myAccessKey = input()
mySecretKey = input()

spark = pyspark.sql.SparkSession.builder \
    .master("local[*]") \
    .config("spark.hadoop.fs.s3a.access.key", myAccessKey) \
    .config("spark.hadoop.fs.s3a.secret.key", mySecretKey) \
    .getOrCreate()

# Use the s3a:// scheme so that the fs.s3a.* credentials set above apply.
df = spark.read.parquet("s3a://myBucket/myKey")
```
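
The `hadoop-aws` version passed to `--packages` has to match the Hadoop jars bundled with Spark. The following is a minimal sketch of automating that lookup instead of running `ls` by hand. The helper names `detect_hadoop_version` and `build_submit_args` are hypothetical, and the sketch assumes the jars directory contains a file named like `hadoop-common-2.7.3.jar`; adjust the pattern if your image names its jars differently.

```python
import re
from pathlib import Path

# Assumption: the bundled Hadoop jar is named like "hadoop-common-2.7.3.jar".
_HADOOP_JAR = re.compile(r"^hadoop-common-(\d+(?:\.\d+)+)\.jar$")


def detect_hadoop_version(jars_dir="/usr/local/spark/jars"):
    """Return the Hadoop version found in jars_dir, or None if none is found."""
    directory = Path(jars_dir)
    if not directory.is_dir():
        return None
    # Sort for a deterministic result if several jars happen to match.
    for jar in sorted(directory.iterdir()):
        match = _HADOOP_JAR.match(jar.name)
        if match:
            return match.group(1)
    return None


def build_submit_args(hadoop_version):
    """Build a PYSPARK_SUBMIT_ARGS value requesting the matching hadoop-aws."""
    package = f"org.apache.hadoop:hadoop-aws:{hadoop_version}"
    return f"--packages {package} pyspark-shell"
```

One possible use is `os.environ['PYSPARK_SUBMIT_ARGS'] = build_submit_args(detect_hadoop_version())`, set before the Spark session is created and with a fallback to a hand-written version when the lookup returns `None`.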

Using Spark context for Hadoop 2.6.0

```py
import os
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.amazonaws:aws-java-sdk:1.10.34,org.apache.hadoop:hadoop-aws:2.6.0 pyspark-shell'
