Unable to use single sign-on (SSO) to run Glue script #224

@Stefan-Dienst

Hi,

Thanks for providing this package.

I followed the setup guide, which worked well, but I ran into problems when trying to use SSO for authentication. I have logged into my SSO session using the AWS CLI and set the AWS_PROFILE environment variable, and the profile works from the command line (e.g. aws s3 ls succeeds). But when I submit a Glue job using ./bin/gluesparksubmit simple_glue_script.py, I get the following error:

: java.nio.file.AccessDeniedException: s3://stefan-glue-tests/input/file.csv: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by DefaultAWSCredentialsProviderChain : com.amazonaws.SdkClientException: Unable to load AWS credentials from any provider in the chain: [EnvironmentVariableCredentialsProvider: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY)), SystemPropertiesCredentialsProvider: Unable to load AWS credentials from Java system properties (aws.accessKeyId and aws.secretKey), WebIdentityTokenCredentialsProvider: You must specify a value for roleArn and roleSessionName, com.amazonaws.auth.profile.ProfileCredentialsProvider@dd2a19a: Unable to load credentials into profile [profile sandbox]: AWS Access Key ID is not specified., com.amazonaws.auth.EC2ContainerCredentialsProviderWrapper@9479be7: Failed to connect to service endpoint: ]

This is the script I am running:

from awsglue.context import GlueContext
from pyspark.context import SparkContext

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session

s3_input_path = "s3://stefan-glue-tests/input/file.csv"
dynamic_frame = glueContext.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": [s3_input_path]},
    format="csv",
    format_options={"withHeader": True}
)
dynamic_frame.printSchema()

s3_output_path = "s3://stefan-glue-tests/output/"
glueContext.write_dynamic_frame.from_options(
    frame=dynamic_frame,
    connection_type="s3",
    connection_options={"path": s3_output_path},
    format="parquet"
)

Following this Stack Overflow post, I assumed that the SSO dependencies were missing and added them to the pom.xml:

<!-- AWS SDK SSO Dependency -->
<dependency>
    <groupId>software.amazon.awssdk</groupId>
    <artifactId>sso</artifactId>
    <version>2.16.76</version>
</dependency>

<!-- AWS SDK SSO OIDC Dependency -->
<dependency>
    <groupId>software.amazon.awssdk</groupId>
    <artifactId>ssooidc</artifactId>
    <version>2.16.76</version>
</dependency>

But the error persists. I suspect that the added dependencies are simply not being used, but I have not been able to debug this. The only thing that works is setting the environment variables AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN.
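For anyone hitting the same error, the environment-variable workaround can be scripted rather than done by hand. One thing worth noting from the stack trace: the failing classes are under com.amazonaws (AWS SDK for Java v1), while the sso and ssooidc artifacts are under software.amazon.awssdk (SDK v2), which may be why adding them changed nothing. The sketch below is only an assumption about how the workaround could be automated, not something the package provides: it assumes boto3 is installed and that the profile (here "sandbox", taken from the error message) has an active SSO login. The helper names credentials_to_env, resolve_profile_credentials and submit_with_sso are hypothetical.

```python
import os
import subprocess
from typing import Dict, Optional


def credentials_to_env(access_key: str, secret_key: str,
                       token: Optional[str] = None) -> Dict[str, str]:
    """Map a set of credentials to the environment variables that the
    EnvironmentVariableCredentialsProvider in the error message looks for."""
    env = {
        "AWS_ACCESS_KEY_ID": access_key,
        "AWS_SECRET_ACCESS_KEY": secret_key,
    }
    if token:
        # SSO sessions yield temporary credentials, so a session token is expected.
        env["AWS_SESSION_TOKEN"] = token
    return env


def resolve_profile_credentials(profile_name: str) -> Dict[str, str]:
    """Let boto3 resolve the (SSO) profile into temporary credentials.

    Assumption: boto3 is installed and `aws sso login` was run for this profile.
    """
    import boto3  # imported lazily so the pure helper above has no dependency

    credentials = boto3.Session(profile_name=profile_name).get_credentials()
    if credentials is None:
        raise RuntimeError(f"No credentials found for profile {profile_name!r}")
    frozen = credentials.get_frozen_credentials()
    return credentials_to_env(frozen.access_key, frozen.secret_key, frozen.token)


def submit_with_sso(profile_name: str, script: str) -> int:
    """Run gluesparksubmit with the resolved credentials in its environment."""
    env = {**os.environ, **resolve_profile_credentials(profile_name)}
    return subprocess.run(["./bin/gluesparksubmit", script], env=env).returncode
```

Calling submit_with_sso("sandbox", "simple_glue_script.py") from the repository root would then start the job with the three variables set. The credentials are temporary, so the script has to be re-run after the SSO session expires.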

Do you have any idea what could be the issue?
