Url-level glob? #756

@SeguinBe

Hello,

This may be down to my limited understanding of the package, but I cannot find an easy way to perform a glob directly on a URL.

I see that with fsspec.get_fs_token_paths I can directly get a path expansion, for instance:

fsspec.get_fs_token_paths('gs://my-bucket/test/*.jsonl')
>>> (<gcsfs.core.GCSFileSystem at 0x7f65ec653af0>,
     '90ad04da79e6e943b0f4d3dfba-------------33462f993faa9252',
     ['my-bucket/test/001.jsonl', 'my-bucket/test/002.jsonl'])

But then, to get expanded URLs, I need to reattach the protocol (by parsing the original URL) or detect whether it is a local file. All of that is already done somewhere in get_fs_token_paths, so I was wondering whether there is an elegant way to obtain ['gs://my-bucket/test/001.jsonl', 'gs://my-bucket/test/002.jsonl'] directly, ideally in a way that handles absolute and local paths as well.
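
To make the request concrete, here is a minimal sketch of the kind of helper I have in mind. It assumes an fsspec version that provides fsspec.core.url_to_fs and the filesystem method unstrip_protocol. The name glob_urls is hypothetical and not part of fsspec, and I have not verified how every backend formats the re-attached protocol.

```python
from fsspec.core import url_to_fs


def glob_urls(urlpath, **storage_options):
    """Expand a glob pattern and return full URLs (hypothetical helper).

    Assumes the installed fsspec exposes `url_to_fs` and that the
    resolved filesystem implements `unstrip_protocol`.
    """
    # Resolve the filesystem instance and the protocol-stripped pattern.
    fs, path = url_to_fs(urlpath, **storage_options)
    # Glob on the stripped pattern, then re-attach the protocol to each match.
    return [fs.unstrip_protocol(p) for p in sorted(fs.glob(path))]
```

With something like this, glob_urls('gs://my-bucket/test/*.jsonl') would be expected to return the gs:// URLs directly. For a plain local pattern the results should come back with a file:// prefix, which may or may not be the desired form for local paths.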
