Hi,
Thanks for providing this package.
I was following the setup guide, which worked well, but I ran into problems trying to use SSO for authentication. I have logged into my SSO session using the AWS CLI and set the AWS_PROFILE environment variable, and I can use the profile from the command line, e.g. aws s3 ls works. But when I submit a Glue job using ./bin/gluesparksubmit simple_glue_script.py, I get the following error:
: java.nio.file.AccessDeniedException: s3://stefan-glue-tests/input/file.csv: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by DefaultAWSCredentialsProviderChain : com.amazonaws.SdkClientException: Unable to load AWS credentials from any provider in the chain: [EnvironmentVariableCredentialsProvider: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY)), SystemPropertiesCredentialsProvider: Unable to load AWS credentials from Java system properties (aws.accessKeyId and aws.secretKey), WebIdentityTokenCredentialsProvider: You must specify a value for roleArn and roleSessionName, com.amazonaws.auth.profile.ProfileCredentialsProvider@dd2a19a: Unable to load credentials into profile [profile sandbox]: AWS Access Key ID is not specified., com.amazonaws.auth.EC2ContainerCredentialsProviderWrapper@9479be7: Failed to connect to service endpoint: ]
running this script:
from awsglue.context import GlueContext
from pyspark.context import SparkContext

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session

s3_input_path = "s3://stefan-glue-tests/input/file.csv"
dynamic_frame = glueContext.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": [s3_input_path]},
    format="csv",
    format_options={"withHeader": True}
)
dynamic_frame.printSchema()

s3_output_path = "s3://stefan-glue-tests/output/"
glueContext.write_dynamic_frame.from_options(
    frame=dynamic_frame,
    connection_type="s3",
    connection_options={"path": s3_output_path},
    format="parquet"
)
Following this Stack Overflow answer, I assumed that the SSO dependencies were missing and added them to the pom.xml:
<!-- AWS SDK SSO Dependency -->
<dependency>
    <groupId>software.amazon.awssdk</groupId>
    <artifactId>sso</artifactId>
    <version>2.16.76</version>
</dependency>
<!-- AWS SDK SSO OIDC Dependency -->
<dependency>
    <groupId>software.amazon.awssdk</groupId>
    <artifactId>ssooidc</artifactId>
    <version>2.16.76</version>
</dependency>
But the error persists. I suspect that the added dependencies are simply not being used, but I am unable to debug this. Only setting the environment variables AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN does the trick.
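For reference, the environment-variable workaround can be scripted. The sketch below is only an illustration under several assumptions: it assumes an AWS CLI v2 release recent enough to provide the aws configure export-credentials command, it assumes the profile is called sandbox (the name shown in the error message), and the helper names credentials_to_env and run_with_sso_credentials are hypothetical, not part of this package. It asks the CLI to resolve the SSO profile into temporary credentials and passes them to the job as the three variables named above.

```python
import json
import os
import subprocess


def credentials_to_env(credentials_json: str) -> dict:
    """Map credential JSON in the credential_process layout to AWS_* variables."""
    creds = json.loads(credentials_json)
    env = {
        "AWS_ACCESS_KEY_ID": creds["AccessKeyId"],
        "AWS_SECRET_ACCESS_KEY": creds["SecretAccessKey"],
    }
    # Temporary (SSO/STS) credentials also carry a session token.
    if creds.get("SessionToken"):
        env["AWS_SESSION_TOKEN"] = creds["SessionToken"]
    return env


def run_with_sso_credentials(profile: str, command: list) -> int:
    """Hypothetical helper: resolve the profile via the AWS CLI, then run command.

    Assumes the AWS CLI v2 is on PATH and an SSO session for the profile is active.
    """
    exported = subprocess.run(
        ["aws", "configure", "export-credentials",
         "--profile", profile, "--format", "process"],
        check=True,
        capture_output=True,
        text=True,
    )
    # Extend the current environment rather than replacing it.
    env = {**os.environ, **credentials_to_env(exported.stdout)}
    return subprocess.run(command, env=env).returncode
```

A call such as run_with_sso_credentials("sandbox", ["./bin/gluesparksubmit", "simple_glue_script.py"]) would then start the job with the exported credentials. They are temporary, so the call has to be repeated once the SSO session expires.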
Do you have any idea what could be the issue?