Local mode does not work on EC2 instances #3225


Closed
MatthewCaseres opened this issue Jul 11, 2022 · 3 comments

@MatthewCaseres

Describe the bug
This is the AMI that I am using: torch-ubuntu

I installed docker-compose by setting up the repository as described here: https://docs.docker.com/engine/install/ubuntu/

SageMaker local mode still tells me ImportError: 'docker-compose' is not installed.
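
For reference, here is a minimal diagnostic sketch (not part of the SageMaker SDK) that I can run in the same interpreter to see which Compose flavor is actually visible. The error suggests local mode looks for a standalone docker-compose executable, which the Compose v2 CLI plugin ("docker compose") does not provide.

import shutil
import subprocess

# Is a standalone Compose v1 binary visible to this interpreter?
print("docker-compose on PATH:", shutil.which("docker-compose"))

# Is the Compose v2 CLI plugin available? (non-zero return code means it is missing)
v2 = subprocess.run(["docker", "compose", "version"], capture_output=True)
print("docker compose (v2) available:", v2.returncode == 0)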

To reproduce
Use the same EC2 AMI, install docker-compose, and attempt to run PyTorchProcessor in local mode.

Expected behavior
docker-compose is installed, so local mode should start the processing job instead of raising an ImportError.

Screenshots or logs

Job Name:  local_processor_constructor-2022-07-11-18-52-12-390
Inputs:  [{'InputName': 'code', 'AppManaged': False, 'S3Input': {'S3Uri': 's3://sagemaker-us-east-1-507925425112/local_processor_constructor-2022-07-11-18-52-12-390/source/sourcedir.tar.gz', 'LocalPath': '/opt/ml/processing/input/code/', 'S3DataType': 'S3Prefix', 'S3InputMode': 'File', 'S3DataDistributionType': 'FullyReplicated', 'S3CompressionType': 'None'}}, {'InputName': 'entrypoint', 'AppManaged': False, 'S3Input': {'S3Uri': 's3://sagemaker-us-east-1-507925425122/local_processor_constructor-2022-07-11-18-52-12-390/source/runproc.sh', 'LocalPath': '/opt/ml/processing/input/entrypoint', 'S3DataType': 'S3Prefix', 'S3InputMode': 'File', 'S3DataDistributionType': 'FullyReplicated', 'S3CompressionType': 'None'}}]
Outputs:  []
Traceback (most recent call last):
  File "/home/ubuntu/torch/runner.py", line 14, in <module>
    torch_procesor.run(
  File "/home/ubuntu/.local/lib/python3.9/site-packages/sagemaker/processing.py", line 1608, in run
    return super().run(
  File "/home/ubuntu/.local/lib/python3.9/site-packages/sagemaker/workflow/pipeline_context.py", line 209, in wrapper
    return run_func(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.9/site-packages/sagemaker/processing.py", line 554, in run
    self.latest_job = ProcessingJob.start_new(
  File "/home/ubuntu/.local/lib/python3.9/site-packages/sagemaker/processing.py", line 778, in start_new
    processor.sagemaker_session.process(**process_args)
  File "/home/ubuntu/.local/lib/python3.9/site-packages/sagemaker/session.py", line 943, in process
    self._intercept_create_request(process_request, submit, self.process.__name__)
  File "/home/ubuntu/.local/lib/python3.9/site-packages/sagemaker/session.py", line 4230, in _intercept_create_request
    return create(request)
  File "/home/ubuntu/.local/lib/python3.9/site-packages/sagemaker/session.py", line 941, in submit
    self.sagemaker_client.create_processing_job(**request)
  File "/home/ubuntu/.local/lib/python3.9/site-packages/sagemaker/local/local_session.py", line 115, in create_processing_job
    container = _SageMakerContainer(
  File "/home/ubuntu/.local/lib/python3.9/site-packages/sagemaker/local/image.py", line 91, in __init__
    raise ImportError(
ImportError: 'docker-compose' is not installed. Local Mode features will not work without docker-compose. For more information on how to install 'docker-compose', please, see https://docs.docker.com/compose/install/

System information

This is the code:

from sagemaker.pytorch import PyTorchProcessor

torch_procesor = PyTorchProcessor(
    framework_version="1.9.0",
    role="arn:aws:iam::507925425112:role/sagemaker-studio-execution-role",
    instance_count=1,
    instance_type="local",
    py_version="py38",
    base_job_name="local_processor_constructor",
)

torch_procesor.run(
    code="sleeper.py",
    source_dir=".",
)

This is my environment:

PyTorch version: 1.12.0
Is debug build: False
CUDA used to build PyTorch: 11.6
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.4 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.22.3
Libc version: glibc-2.31

Python version: 3.9.13 | packaged by conda-forge | (main, May 27 2022, 16:56:21)  [GCC 10.3.0] (64-bit runtime)
Python platform: Linux-5.13.0-1031-aws-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.6.124
GPU models and configuration: GPU 0: Tesla T4
Nvidia driver version: 510.73.08
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.22.4
[pip3] torch==1.12.0
[pip3] torch-model-archiver==0.5.3b20220226
[pip3] torch-workflow-archiver==0.2.4b20220513
[pip3] torchaudio==0.12.0
[pip3] torchdata==0.4.0
[pip3] torchserve==0.6.0b20220513
[pip3] torchtext==0.13.0
[pip3] torchvision==0.13.0
[conda] blas                      2.115                       mkl    conda-forge
[conda] blas-devel                3.9.0            15_linux64_mkl    conda-forge
[conda] captum                    0.5.0                         0    pytorch
[conda] cudatoolkit               11.6.0              hecad31d_10    conda-forge
[conda] libblas                   3.9.0            15_linux64_mkl    conda-forge
[conda] libcblas                  3.9.0            15_linux64_mkl    conda-forge
[conda] liblapack                 3.9.0            15_linux64_mkl    conda-forge
[conda] liblapacke                3.9.0            15_linux64_mkl    conda-forge
[conda] magma-cuda116             2.6.1                         0    pytorch
[conda] mkl                       2022.1.0           h84fe81f_915    conda-forge
[conda] mkl-devel                 2022.1.0           ha770c72_916    conda-forge
[conda] mkl-include               2022.1.0           h84fe81f_915    conda-forge
[conda] numpy                     1.22.4                   pypi_0    pypi
[conda] pytorch                   1.12.0          py3.9_cuda11.6_cudnn8.3.2_0    pytorch
[conda] pytorch-mutex             1.0                        cuda    pytorch
[conda] torch-model-archiver      0.5.3                    py39_0    pytorch
[conda] torch-workflow-archiver   0.2.4                    py39_0    pytorch
[conda] torchaudio                0.12.0               py39_cu116    pytorch
[conda] torchserve                0.6.0                    py39_0    pytorch
[conda] torchtext                 0.13.0                     py39    pytorch
[conda] torchvision               0.13.0               py39_cu116    pytorch
@satyajitghana

Same here.

@MatthewCaseres
Author

@satyajitghana The command went from "docker-compose" to "docker compose" when Docker Compose went from 1.x.x to 2.x.x. You can pip install the 1.x.x release from https://pypi.org/project/docker-compose/. I'm curious whether that works for you.
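
If you try that, here is a small sketch (again, not from the SageMaker SDK) to confirm the pip-installed Compose v1 binary is visible to the interpreter that runs the SDK; the version string in the comment is just what I'd expect from a 1.x install, not guaranteed.

import shutil
import subprocess

# After `pip install docker-compose`, a standalone binary should be on PATH
# (make sure pip's user script directory, e.g. ~/.local/bin, is on PATH).
path = shutil.which("docker-compose")
print("docker-compose found at:", path)

if path:
    # Expect output along the lines of "docker-compose version 1.29.x".
    subprocess.run(["docker-compose", "--version"], check=True)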

@trungleduc
Collaborator

Fixed in #4111
