Importing requester-pay S3 buckets with FSx #901

Closed
@JiaweiZhuang

Environment:

  • AWS ParallelCluster 2.2.1
  • OS: CentOS7
  • Scheduler: Slurm
  • Master instance type: c5n.large
  • Compute instance type: c5n.18xlarge

Bug description and how to reproduce:
I am trying to import AWS Open Data (https://registry.opendata.aws/) with FSx. It works well for most public buckets (quite amazing!), but fails with buckets in requester-pay mode.

This is the specific dataset I am trying to import, managed by my research group at Harvard:
https://registry.opendata.aws/geoschem-input-data/
Our current contract is limited to requester-pay mode and probably won't change in the near term.

When I set import_path = s3://gcgrid (our data), FSx fails to create the file system and cluster creation rolls back:

$ pcluster create fsx-requester-pay-s3
Beginning cluster creation for cluster: fsx-requester-pay-s3
Creating stack named: parallelcluster-fsx-requester-pay-s3
Status: parallelcluster-fsx-requester-pay-s3 - ROLLBACK_IN_PROGRESS
Cluster creation failed.  Failed events:
  - AWS::CloudFormation::Stack EBSCfnStack Resource creation cancelled
  - AWS::IAM::InstanceProfile RootInstanceProfile Resource creation cancelled
  - AWS::EC2::EIPAssociation AssociateEIP Resource creation cancelled
  - AWS::CloudFormation::Stack FSXSubstack Embedded stack arn:aws:cloudformation:us-east-1:753979222379:stack/parallelcluster-fsx-requester-pay-s3-FSXSubstack-1WHT3VJ4298XE/ed68b7e0-3c3f-11e9-a755-0e0d3a451244 was not successfully created: The following resource(s) failed to create: [FileSystem].

The FSx section in the config file is:

[fsx fs]
shared_dir = /fsx
storage_capacity = 14400
imported_file_chunk_size = 1024
# import_path = s3://era5-pds  # this works well
import_path = s3://gcgrid

Is it possible to get around this by tweaking the IAM role? I am currently using the default IAM settings. This is probably an edge case specific to my use case. I can also copy the data manually for now.
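
For reference, the manual copy mentioned above could look roughly like the sketch below. It assumes the AWS CLI is installed on the node with credentials configured (Requester Pays buckets reject anonymous requests), and that /fsx is mounted from a file system created without import_path. The helper name requester_pays_sync, its --print-only switch, and the /fsx/gcgrid destination are made up for illustration; only `aws s3 sync` and its `--request-payer requester` option come from the AWS CLI itself.

```shell
# Hypothetical helper (not part of ParallelCluster or the AWS CLI).
# Usage: requester_pays_sync BUCKET DEST [--print-only]
requester_pays_sync() {
  bucket="$1"
  dest="$2"
  if [ "${3:-}" = "--print-only" ]; then
    # Show the command without contacting S3.
    echo "aws s3 sync s3://${bucket} ${dest} --request-payer requester"
  else
    # --request-payer requester acknowledges that the caller's account
    # is billed for the requests and data transfer.
    aws s3 sync "s3://${bucket}" "${dest}" --request-payer requester
  fi
}

# Review the command first; drop --print-only to run the copy for real.
requester_pays_sync gcgrid /fsx/gcgrid --print-only
```

Running the copy for real bills the transfer to the calling account, so it may be worth syncing only the needed prefixes rather than the whole bucket.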
