
Conversation

@mlin (Collaborator) commented Apr 12, 2025

No description provided.

@mlin (Collaborator, Author) commented Apr 12, 2025

Since the high-level client.services.create() method doesn't expose a way to insert DeviceRequests directly into the ContainerSpec it builds, the most straightforward way to achieve this is the low-level API method client.api.create_service(). That method does expect the fully formed data structures, including the TaskTemplate.

Here's the corrected example using client.api.create_service():

import docker
from docker.types import (Resources, ServiceMode, RestartPolicy,
                          EndpointSpec)  # some used only in the commented examples below

# Assume 'client' is an initialized DockerClient instance
# client = docker.from_env()

# 1. Define DeviceRequests
device_requests = [
    {
        "Driver": "nvidia",
        "Count": -1,
        "DeviceIDs": [],
        "Capabilities": [["gpu", "nvidia", "compute", "utility"]],
        "Options": {}
    }
]

# 2. Create ContainerSpec dictionary manually
container_spec_dict = {
    'Image': 'your_gpu_image:latest',
    'Command': ['your', 'command'],
    'DeviceRequests': device_requests,
    'TTY': True,
    # Add other ContainerSpec fields...
}

# 3. Create TaskTemplate dictionary manually
task_template_dict = {
    'ContainerSpec': container_spec_dict,
    # Add other TaskTemplate fields...
    # 'Resources': Resources(mem_limit='4g'),
    # 'RestartPolicy': RestartPolicy(condition='on-failure', max_attempts=3),
}

# --- Define other service-level parameters ---
service_name = 'my-gpu-service'
service_labels = {'app': 'gpu-processor'}
service_mode = ServiceMode('replicated', replicas=1)
# endpoint_spec = EndpointSpec(ports={ 8080: (80, 'tcp') }) # Example

# 4. Call client.api.create_service()
try:
    # Note: Using client.api here, not client.services
    result = client.api.create_service(
        task_template=task_template_dict, # Pass the full task template dict
        name=service_name,
        labels=service_labels,
        mode=service_mode,
        # endpoint_spec=endpoint_spec, # Pass other top-level specs
        # update_config=...,
        # rollback_config=...,
        # networks=...
    )
    service_id = result['ID']
    print(f"Service {service_id} created successfully.")
    # You can get the high-level model object if needed:
    # service = client.services.get(service_id)

except docker.errors.APIError as e:
    print(f"Error creating service using low-level API: {e}")
    print(f"Response: {e.response.text}")
except Exception as e:
    print(f"An unexpected error occurred: {e}")

@coveralls commented Apr 26, 2025

Pull Request Test Coverage Report for Build 14677860390


  • 101 of 114 (88.6%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.04%) to 95.213%

Changes Missing Coverage            | Covered Lines | Changed/Added Lines | %
WDL/runtime/task_container.py       | 1             | 5                   | 20.0%
WDL/runtime/backend/docker_swarm.py | 100           | 109                 | 91.74%

Totals (change from base Build 14677854856: -0.04%)
  Covered Lines: 7459
  Relevant Lines: 7834

💛 - Coveralls

@mlin requested a review from Copilot, April 26, 2025 04:57
Copilot AI left a comment

Pull Request Overview

This PR introduces GPU support improvements by adding new API stubs and updating runtime backends to enable NVIDIA GPU integration. Key changes include:

  • Adding new Docker API stubs and related container-specification classes.
  • Introducing acceleratorCount handling in task_container to map to GPU support (sketched after this list).
  • Updating Singularity and Podman backend invocation logic to include GPU-specific flags.
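
A hedged sketch of the acceleratorCount mapping described in the second bullet. The runtime_eval/ans names follow the code snippet quoted later in this review; the branch body and the >= 1 comparison are assumptions, not the PR's exact code.

# hypothetical sketch, not the PR's exact code
if "acceleratorCount" in runtime_eval:
    # HealthOmics-style acceleratorCount: a positive count collapses to gpu: true
    ans["gpu"] = runtime_eval["acceleratorCount"].value >= 1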

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

File                               | Description
stubs/docker/__init__.py           | Added APIClient methods and new container-related classes for Docker.
WDL/runtime/task_container.py      | Added acceleratorCount handling to enable GPU support in runtime tasks.
WDL/runtime/backend/singularity.py | Updated GPU flag support by adding the --nv flag.
WDL/runtime/backend/podman.py      | Updated GPU flag support by appending Podman GPU device options (see the sketch after this table).
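
A hedged sketch of the per-backend flag logic summarized in the table. The flags themselves (--nv for Singularity, --device nvidia.com/gpu=all for Podman) come from this PR's changes; the surrounding condition and the run_cmd list name are assumptions for illustration.

# Singularity backend (sketch):
if self.runtime_values.get("gpu", False):
    run_cmd.append("--nv")  # enable NVIDIA GPU support in the container

# Podman backend (sketch):
if self.runtime_values.get("gpu", False):
    run_cmd.extend(["--device", "nvidia.com/gpu=all"])  # expose all NVIDIA GPUs
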
Comments suppressed due to low confidence (3)

stubs/docker/__init__.py:116

  • [nitpick] Consider adding a docstring to ContainerSpec to clarify its intended structure and usage.
class ContainerSpec(dict):

WDL/runtime/backend/singularity.py:91

  • [nitpick] Ensure that using '--nv' properly configures GPU support for Singularity and consider adding a note on its compatibility and limitations if needed.
if self.runtime_values.get("gpu", False):

WDL/runtime/backend/podman.py:92

  • [nitpick] Verify that '--device nvidia.com/gpu=all' is supported in the target Podman versions and document any potential limitations.
if self.runtime_values.get("gpu", False):

ans["gpu"] = runtime_eval["gpu"].value

if "acceleratorCount" in runtime_eval:
# HealthOmics-style acceleratorCount:1 to gpu:true (FIXME for proper multi-GPU support)
Copilot AI commented Apr 26, 2025

[nitpick] The current workaround for mapping 'acceleratorCount' to GPU support is temporary; consider referencing an issue or adding documentation to track future improvements.

Suggested change:
- # HealthOmics-style acceleratorCount:1 to gpu:true (FIXME for proper multi-GPU support)
+ # HealthOmics-style acceleratorCount:1 to gpu:true
+ # TODO: Reference issue #1234 for proper multi-GPU support and extend this logic
+ # to handle scenarios where multiple GPUs are required. This is a temporary
+ # workaround and assumes a single GPU is sufficient.
