Description
Environment:
- AWS ParallelCluster 2.2.1
- OS: CentOS 7
- Scheduler: Slurm
- Master instance type: c5n.large
- Compute instance type: c5n.18xlarge
Bug description and how to reproduce:
I am trying to import AWS Open Data (https://registry.opendata.aws/) with FSx. It works well for most public buckets (quite amazing!), but fails for buckets that have Requester Pays enabled.
This is the specific dataset I am trying to import, managed by my research group at Harvard:
https://registry.opendata.aws/geoschem-input-data/
Our current contract limits us to Requester Pays mode, and that probably won't change in the near term.
With import_path = s3://gcgrid (our data), FSx fails to import the bucket and cluster creation rolls back:
$ pcluster create fsx-requester-pay-s3
Beginning cluster creation for cluster: fsx-requester-pay-s3
Creating stack named: parallelcluster-fsx-requester-pay-s3
Status: parallelcluster-fsx-requester-pay-s3 - ROLLBACK_IN_PROGRESS
Cluster creation failed. Failed events:
- AWS::CloudFormation::Stack EBSCfnStack Resource creation cancelled
- AWS::IAM::InstanceProfile RootInstanceProfile Resource creation cancelled
- AWS::EC2::EIPAssociation AssociateEIP Resource creation cancelled
- AWS::CloudFormation::Stack FSXSubstack Embedded stack arn:aws:cloudformation:us-east-1:753979222379:stack/parallelcluster-fsx-requester-pay-s3-FSXSubstack-1WHT3VJ4298XE/ed68b7e0-3c3f-11e9-a755-0e0d3a451244 was not successfully created: The following resource(s) failed to create: [FileSystem].
The FSx section in the config file is:
[fsx fs]
shared_dir = /fsx
storage_capacity = 14400
imported_file_chunk_size = 1024
# import_path = s3://era5-pds # this works well
import_path = s3://gcgrid
Is it possible to get around this by tweaking the IAM role? I am currently using the default IAM settings. This is probably an edge case specific to my use case, and I can copy the data manually for now.
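
For reference, the manual-copy fallback could look roughly like the sketch below. It does not fix the FSx import itself. It only shows that S3 accepts the listing and download calls once each request carries RequestPayer="requester". The function name copy_requester_pays_prefix, the prefix, and the destination directory are hypothetical, and s3_client is assumed to be a boto3 S3 client.

```python
import os


def copy_requester_pays_prefix(s3_client, bucket, prefix, dest_dir):
    """Download every object under `prefix` from a Requester Pays bucket.

    `s3_client` is assumed to behave like boto3.client("s3").
    Returns the list of local paths that were written.
    """
    written = []
    paginator = s3_client.get_paginator("list_objects_v2")
    pages = paginator.paginate(
        Bucket=bucket,
        Prefix=prefix,
        RequestPayer="requester",  # caller agrees to pay for the list requests
    )
    for page in pages:
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):
                continue  # skip zero-byte "directory" placeholder keys
            local_path = os.path.join(dest_dir, *key.split("/"))
            os.makedirs(os.path.dirname(local_path), exist_ok=True)
            s3_client.download_file(
                bucket,
                key,
                local_path,
                ExtraArgs={"RequestPayer": "requester"},  # caller pays for the transfer
            )
            written.append(local_path)
    return written
```

On a running cluster this might be called as copy_requester_pays_prefix(boto3.client("s3"), "gcgrid", "<some prefix>", "/fsx"), using credentials for the account that should be billed. Request and data transfer charges go to that account, so copying one prefix at a time is probably safer than syncing the whole bucket.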