tbirch-cyber
Contributor

@tbirch-cyber tbirch-cyber commented Aug 14, 2025

This attempts to follow the style of the pyarrow implementation.

Simplifies/cleans up logic for #1146
Part of larger fsspec/pyarrow feature parity #2259
The netloc param may help with any custom logic needed for #2271

@kevinjqliu
Contributor

Hey @tbirch-cyber, thanks for the PR. Could you explain the reasoning behind this change?

@tbirch-cyber
Contributor Author

tbirch-cyber commented Aug 18, 2025

> Hey @tbirch-cyber, thanks for the PR. Could you explain the reasoning behind this change?

Hi @kevinjqliu, thanks for taking the time to review! Sorry, I should have put more in the PR description.

Basically, it is very common to have a catalog split across multiple "storage accounts" (each roughly analogous to an S3 bucket). Because of this, I need a way to provide a credential with an audience of https://storage.azure.com once and have the library determine on the fly which storage accounts it needs to access. An fsspec filesystem can only handle one storage account per instantiation, so a separate filesystem needs to be created and cached for each storage account.

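from pyiceberg.catalog.rest import RestCatalog

# my_token and my_cred are placeholders defined elsewhere.
RestCatalog(
    "my-catalog",
    **{
        "uri": "http://my-catalog",
        "header.Authorization": f"Bearer {my_token}",
        "adls.credential": my_cred,
    },
)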
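
To make the caching idea concrete, here is a minimal sketch of one filesystem per storage account, keyed on the account name taken from the location's netloc. It is not the code in this PR: `FakeFileSystem` and `FileSystemCache` are hypothetical names, and `FakeFileSystem` stands in for a real per-account fsspec filesystem so the example runs without Azure dependencies.

```python
from urllib.parse import urlparse


class FakeFileSystem:
    """Hypothetical stand-in for a filesystem bound to one storage account."""

    def __init__(self, account_name: str, credential: object) -> None:
        self.account_name = account_name
        self.credential = credential


class FileSystemCache:
    """Hypothetical cache: one filesystem per storage account, one shared credential."""

    def __init__(self, credential: object) -> None:
        self._credential = credential
        self._by_account: dict[str, FakeFileSystem] = {}

    def for_location(self, location: str) -> FakeFileSystem:
        # For abfss://container@account.dfs.core.windows.net/path the netloc is
        # "container@account.dfs.core.windows.net"; hostname drops "container@".
        hostname = urlparse(location).hostname or ""
        account = hostname.split(".")[0]
        if account not in self._by_account:
            # First time this account is seen: build and remember its filesystem.
            self._by_account[account] = FakeFileSystem(account, self._credential)
        return self._by_account[account]


cache = FileSystemCache(credential="shared-credential")
fs_a = cache.for_location("abfss://data@account1.dfs.core.windows.net/table/a.parquet")
fs_b = cache.for_location("abfss://logs@account1.dfs.core.windows.net/table/b.parquet")
fs_c = cache.for_location("abfss://data@account2.dfs.core.windows.net/table/c.parquet")
```

Here `fs_a` and `fs_b` are the same object because both locations live in `account1`, while `fs_c` is a separate filesystem for `account2`; all three reuse the single credential passed in once.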

I think a similar approach could probably also be used with S3: https://github.com/fsspec/s3fs/blob/main/s3fs/core.py#L203

Please let me know if there’s anything else you’d like changed or if you think this approach isn't maintainable. Thanks!
