dvc get from s3 through http fails #9719

@dberenbaum

Bug Report

Description

Running dvc get against data stored on S3 and accessed over HTTP fails.

Reproduce

From a fresh repo, try dvc get -v [email protected]:shcheklein/hackathon.git data. Files are downloaded, but the operation fails at the end:

$ dvc get -v [email protected]:shcheklein/hackathon.git data
2023-07-10 09:40:27,301 DEBUG: v3.5.0, CPython 3.11.3 on macOS-13.4.1-arm64-arm-64bit
2023-07-10 09:40:27,301 DEBUG: command: /Users/dave/miniforge3/envs/example-get-started-experiments/bin/dvc get -v [email protected]:shcheklein/hackathon.git data
2023-07-10 09:40:27,436 DEBUG: Creating external repo [email protected]:shcheklein/hackathon.git@None
2023-07-10 09:40:27,436 DEBUG: erepo: git clone '[email protected]:shcheklein/hackathon.git' to a temporary dir
2023-07-10 09:41:38,493 DEBUG: Analytics is disabled.
Traceback (most recent call last):
  File "/Users/dave/miniforge3/envs/example-get-started-experiments/lib/python3.11/weakref.py", line 666, in _exitfunc
    f()
  File "/Users/dave/miniforge3/envs/example-get-started-experiments/lib/python3.11/weakref.py", line 590, in __call__
    return info.func(*info.args, **(info.kwargs or {}))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dave/miniforge3/envs/example-get-started-experiments/lib/python3.11/site-packages/fsspec/implementations/http.py", line 121, in close_session
    sync(loop, session.close, timeout=0.1)
  File "/Users/dave/miniforge3/envs/example-get-started-experiments/lib/python3.11/site-packages/fsspec/asyn.py", line 106, in sync
    raise return_result
  File "/Users/dave/miniforge3/envs/example-get-started-experiments/lib/python3.11/site-packages/fsspec/asyn.py", line 61, in _runner
    result[0] = await coro
                ^^^^^^^^^^
  File "/Users/dave/miniforge3/envs/example-get-started-experiments/lib/python3.11/asyncio/tasks.py", line 479, in wait_for
    return fut.result()
           ^^^^^^^^^^^^
  File "/Users/dave/miniforge3/envs/example-get-started-experiments/lib/python3.11/site-packages/aiohttp_retry/client.py", line 340, in close
    await self._client.close()
  File "/Users/dave/miniforge3/envs/example-get-started-experiments/lib/python3.11/site-packages/aiohttp/client.py", line 980, in close
    await self._connector.close()
          ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dave/miniforge3/envs/example-get-started-experiments/lib/python3.11/site-packages/aiohttp/connector.py", line 792, in close
    return super().close()
           ^^^^^^^^^^^^^^^
  File "/Users/dave/miniforge3/envs/example-get-started-experiments/lib/python3.11/site-packages/aiohttp/connector.py", line 409, in close
    self._close()
  File "/Users/dave/miniforge3/envs/example-get-started-experiments/lib/python3.11/site-packages/aiohttp/connector.py", line 439, in _close
    transport.abort()
  File "/Users/dave/miniforge3/envs/example-get-started-experiments/lib/python3.11/asyncio/sslproto.py", line 247, in abort
    self._ssl_protocol._abort()
    ^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute '_abort'
Aiohttp retry client was not closed
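
The traceback above starts in weakref.py's _exitfunc, which suggests the error is raised by a finalizer that runs when the interpreter exits, after the download itself has finished. The following is a minimal sketch of that pattern, not DVC or fsspec code: FakeSession, FakeFileSystem, and run_child are hypothetical names, and the AttributeError is raised by hand to mimic the last frame of the log.

```python
import subprocess
import sys
import textwrap

# Script run in a child interpreter so that its exit-time behavior can be observed.
CHILD_SCRIPT = textwrap.dedent(
    """
    import weakref

    class FakeSession:
        # Hypothetical stand-in for an HTTP session object.
        def close(self):
            # Raised by hand to mimic the final frame of the traceback.
            raise AttributeError("'NoneType' object has no attribute '_abort'")

    class FakeFileSystem:
        # Hypothetical stand-in for a filesystem that owns a session.
        def __init__(self):
            self.session = FakeSession()
            # Finalizers of objects still alive at exit run during shutdown.
            weakref.finalize(self, self.session.close)

    fs = FakeFileSystem()  # kept alive in a global until the interpreter exits
    print("download finished")
    """
)


def run_child():
    """Run the script in a fresh interpreter and capture its output."""
    return subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT],
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    result = run_child()
    print("stdout:", result.stdout.strip())
    print("stderr:", result.stderr.strip())
```

In this toy version the normal output is printed first and the traceback only appears on stderr during shutdown, which matches the order of events in the log above.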

Expected

The command succeeds, or at least reports a better-handled error if it fails (related to #9623).
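
One way to picture a "better handled error" is a cleanup step that logs a short warning instead of letting a raw traceback escape. This is only a sketch under that assumption: close_quietly and the logger name are hypothetical, and it is not the fix adopted by DVC, fsspec, or aiohttp.

```python
import logging

logger = logging.getLogger("cleanup_example")  # hypothetical logger name


def close_quietly(close, description="session"):
    """Call close() and log a warning instead of raising.

    Returns True if close() finished without an exception, False otherwise.
    """
    try:
        close()
    except Exception as exc:
        # Report the failure in one line rather than a full traceback.
        logger.warning("Failed to close %s cleanly: %s", description, exc)
        return False
    return True


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)

    def broken_close():
        # Mimics the failure seen in the traceback.
        raise AttributeError("'NoneType' object has no attribute '_abort'")

    print(close_quietly(broken_close, "HTTP session"))
    print(close_quietly(lambda: None, "HTTP session"))
```

A wrapper like this trades visibility for a cleaner exit, so the warning message should still carry enough detail to debug the underlying problem.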

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 3.5.0
------------------
Platform: Python 3.11.3 on macOS-13.4.1-arm64-arm-64bit
Subprojects:
        dvc_data = 2.5.0
        dvc_objects = 0.23.0
        dvc_render = 0.5.3
        dvc_task = 0.3.0
        scmrepo = 1.0.4
Supports:
        azure (adlfs = 2023.4.0, knack = 0.10.1, azure-identity = 1.13.0),
        gdrive (pydrive2 = 1.16.0),
        gs (gcsfs = 2023.6.0),
        hdfs (fsspec = 2023.6.0, pyarrow = 12.0.0),
        http (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
        oss (ossfs = 2021.8.0),
        s3 (s3fs = 2023.6.0, boto3 = 1.26.161),
        ssh (sshfs = 2023.4.1),
        webdav (webdav4 = 0.9.8),
        webdavs (webdav4 = 0.9.8),
        webhdfs (fsspec = 2023.6.0)
Config:
        Global: /Users/dave/Library/Application Support/dvc
        System: /Library/Application Support/dvc
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: local
Workspace directory: apfs on /dev/disk3s1s1
Repo: dvc, git
Repo.site_cache_dir: /Library/Caches/dvc/repo/17f51977f7b189bb640c52da9d84e922

Metadata

Labels

A: data-sync - Related to dvc get/fetch/import/pull/push
fs: http - Related to the HTTP filesystem
p1-important - Important, aka current backlog of things to do
regression - Ohh, we broke something :-(

Status

Done
