
Conversation

@mlin (Collaborator) commented Apr 12, 2025

No description provided.

@mlin (Collaborator, Author) commented Apr 12, 2025

Since the high-level client.services.create() method doesn't expose a way to insert DeviceRequests directly into the ContainerSpec it builds, the most straightforward way to achieve this is the low-level API method client.api.create_service(). That method does expect the fully formed data structures, including the TaskTemplate.

Here's the corrected example using client.api.create_service():

import docker
from docker.types import (Resources, ServiceMode, RestartPolicy,
                          EndpointSpec)  # some used only in the commented examples below

# Assume 'client' is an initialized DockerClient instance
# client = docker.from_env()

# 1. Define DeviceRequests
device_requests = [
    {
        "Driver": "nvidia",
        "Count": -1,
        "DeviceIDs": [],
        "Capabilities": [["gpu", "nvidia", "compute", "utility"]],
        "Options": {}
    }
]

# 2. Create ContainerSpec dictionary manually
container_spec_dict = {
    'Image': 'your_gpu_image:latest',
    'Command': ['your', 'command'],
    'DeviceRequests': device_requests,
    'TTY': True,
    # Add other ContainerSpec fields...
}

# 3. Create TaskTemplate dictionary manually
task_template_dict = {
    'ContainerSpec': container_spec_dict,
    # Add other TaskTemplate fields...
    # 'Resources': Resources(mem_limit='4g'),
    # 'RestartPolicy': RestartPolicy(condition='on-failure', max_attempts=3),
}

# --- Define other service-level parameters ---
service_name = 'my-gpu-service'
service_labels = {'app': 'gpu-processor'}
service_mode = ServiceMode('replicated', replicas=1)
# endpoint_spec = EndpointSpec(ports={ 8080: (80, 'tcp') }) # Example

# 4. Call client.api.create_service()
try:
    # Note: Using client.api here, not client.services
    result = client.api.create_service(
        task_template=task_template_dict, # Pass the full task template dict
        name=service_name,
        labels=service_labels,
        mode=service_mode,
        # endpoint_spec=endpoint_spec, # Pass other top-level specs
        # update_config=...,
        # rollback_config=...,
        # networks=...
    )
    service_id = result['ID']
    print(f"Service {service_id} created successfully.")
    # You can get the high-level model object if needed:
    # service = client.services.get(service_id)

except docker.errors.APIError as e:
    print(f"Error creating service using low-level API: {e}")
    print(f"Response: {e.response.text}")
except Exception as e:
    print(f"An unexpected error occurred: {e}")

@coveralls commented Apr 26, 2025

Pull Request Test Coverage Report for Build 14677860390


  • 101 of 114 (88.6%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.04%) to 95.213%

Changes Missing Coverage            | Covered Lines | Changed/Added Lines | %
WDL/runtime/task_container.py       | 1             | 5                   | 20.0%
WDL/runtime/backend/docker_swarm.py | 100           | 109                 | 91.74%

Totals (change from base Build 14677854856: -0.04%)
  Covered Lines: 7459
  Relevant Lines: 7834

💛 - Coveralls

@mlin requested a review from Copilot, April 26, 2025 04:57
Copilot AI left a comment

Pull Request Overview

This PR introduces GPU support improvements by adding new API stubs and updating runtime backends to enable NVIDIA GPU integration. Key changes include:

  • Adding new Docker API stubs and related container-specification classes.
  • Introducing acceleratorCount handling in task_container to map to GPU support (sketched after this list).
  • Updating Singularity and Podman backend invocation logic to include GPU-specific flags.
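
A hedged sketch of the acceleratorCount mapping described in the second bullet. The runtime_eval/ans names follow the code snippet quoted later in this review; the branch body and the >= 1 comparison are assumptions, not the PR's exact code.

# hypothetical sketch, not the PR's exact code
if "acceleratorCount" in runtime_eval:
    # HealthOmics-style acceleratorCount: a positive count collapses to gpu: true
    ans["gpu"] = runtime_eval["acceleratorCount"].value >= 1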

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

File                               | Description
stubs/docker/__init__.py           | Added APIClient methods and new container-related classes for Docker.
WDL/runtime/task_container.py      | Added acceleratorCount handling to enable GPU support in runtime tasks.
WDL/runtime/backend/singularity.py | Updated GPU flag support by adding the --nv flag.
WDL/runtime/backend/podman.py      | Updated GPU flag support by appending Podman GPU device options (see the sketch after this table).
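
A hedged sketch of the per-backend flag logic summarized in the table. The flags themselves (--nv for Singularity, --device nvidia.com/gpu=all for Podman) come from this PR's changes; the surrounding condition and the run_cmd list name are assumptions for illustration.

# Singularity backend (sketch):
if self.runtime_values.get("gpu", False):
    run_cmd.append("--nv")  # enable NVIDIA GPU support in the container

# Podman backend (sketch):
if self.runtime_values.get("gpu", False):
    run_cmd.extend(["--device", "nvidia.com/gpu=all"])  # expose all NVIDIA GPUs
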
Comments suppressed due to low confidence (3)

stubs/docker/__init__.py:116

  • [nitpick] Consider adding a docstring to ContainerSpec to clarify its intended structure and usage.
class ContainerSpec(dict):

WDL/runtime/backend/singularity.py:91

  • [nitpick] Ensure that using '--nv' properly configures GPU support for Singularity and consider adding a note on its compatibility and limitations if needed.
if self.runtime_values.get("gpu", False):

WDL/runtime/backend/podman.py:92

  • [nitpick] Verify that '--device nvidia.com/gpu=all' is supported in the target Podman versions and document any potential limitations.
if self.runtime_values.get("gpu", False):

ans["gpu"] = runtime_eval["gpu"].value

if "acceleratorCount" in runtime_eval:
# HealthOmics-style acceleratorCount:1 to gpu:true (FIXME for proper multi-GPU support)
Copilot AI commented Apr 26, 2025

[nitpick] The current workaround for mapping 'acceleratorCount' to GPU support is temporary; consider referencing an issue or adding documentation to track future improvements.

Suggested change:
- # HealthOmics-style acceleratorCount:1 to gpu:true (FIXME for proper multi-GPU support)
+ # HealthOmics-style acceleratorCount:1 to gpu:true
+ # TODO: Reference issue #1234 for proper multi-GPU support and extend this logic
+ # to handle scenarios where multiple GPUs are required. This is a temporary
+ # workaround and assumes a single GPU is sufficient.
