@@ -156,7 +156,29 @@ A few suggestions have been made regarding using Docker Stacks with spark.
### Using PySpark with AWS S3
+ Using Spark session for Hadoop 2.7.3
+
+ ```py
+ import os
+ # !ls /usr/local/spark/jars/hadoop*  # to figure out what version of Hadoop
+ os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages "org.apache.hadoop:hadoop-aws:2.7.3" pyspark-shell'
+
+ import pyspark
+ myAccessKey = input()
+ mySecretKey = input()
+
+ spark = pyspark.sql.SparkSession.builder \
+     .master("local[*]") \
+     .config("spark.hadoop.fs.s3a.access.key", myAccessKey) \
+     .config("spark.hadoop.fs.s3a.secret.key", mySecretKey) \
+     .getOrCreate()
+
+ # s3a:// matches the fs.s3a.* credentials configured above
+ df = spark.read.parquet("s3a://myBucket/myKey")
```
+
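The commented-out `!ls` line above is there to find the bundled Hadoop version so that the `hadoop-aws` package version matches it. A minimal sketch of doing the same lookup in Python follows. It assumes the jars directory contains a file named `hadoop-common-<version>.jar`, which may not hold for every Spark build; the helper names `detect_hadoop_version` and `hadoop_aws_submit_args` are hypothetical and not part of PySpark.

```python
import glob
import os
import re


def detect_hadoop_version(jars_dir="/usr/local/spark/jars"):
    """Return the Hadoop version parsed from a hadoop-common jar name, or None.

    Assumption: the directory holds a file like hadoop-common-2.7.3.jar.
    """
    pattern = re.compile(r"hadoop-common-(\d+(?:\.\d+)+)\.jar$")
    for path in sorted(glob.glob(os.path.join(jars_dir, "hadoop-common-*.jar"))):
        match = pattern.search(os.path.basename(path))
        if match:
            return match.group(1)
    return None


def hadoop_aws_submit_args(version):
    """Build a PYSPARK_SUBMIT_ARGS value that pulls the matching hadoop-aws package."""
    return f"--packages org.apache.hadoop:hadoop-aws:{version} pyspark-shell"


# Only set the variable when a version was actually found.
version = detect_hadoop_version()
if version is not None:
    os.environ["PYSPARK_SUBMIT_ARGS"] = hadoop_aws_submit_args(version)
```

As in the examples in this section, the variable has to be set before `pyspark` starts the JVM for it to take effect.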
+ Using Spark context for Hadoop 2.6.0
+
+ ```py
import os
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.amazonaws:aws-java-sdk:1.10.34,org.apache.hadoop:hadoop-aws:2.6.0 pyspark-shell'