Commit 9a24052

Ignore zombie processes when detecting TorchServe status (#166)
* Ignore processes that are not running when detecting TorchServe status
* Detect zombie processes and ignore them before calling cmdline API
* Update PT DLC framework version to 2.1.0 and 2.2.0
* Update instance type to ensure newer CUDA driver version
* upgrade DLAMI version for tests
* Revert "Update instance type to ensure newer CUDA driver version"

  This reverts commit bba00bd.
1 parent 36a842e commit 9a24052

7 files changed: 11 additions, 8 deletions

buildspec.yml

Lines changed: 3 additions & 3 deletions
@@ -2,7 +2,7 @@ version: 0.2

 env:
   variables:
-    FRAMEWORK_VERSIONS: '2.0.0 2.0.1'
+    FRAMEWORK_VERSIONS: '2.1.0 2.2.0'
     CPU_INSTANCE_TYPE: 'ml.c4.xlarge'
     GPU_INSTANCE_TYPE: 'ml.g4dn.12xlarge'
     ECR_REPO: 'sagemaker-test'
@@ -49,12 +49,12 @@ phases:
       - $(aws ecr get-login --registry-ids $DLC_ACCOUNT --no-include-email --region $AWS_DEFAULT_REGION)
       - create-key-pair

-      # launch remote GPU instance with Deep Learning AMI GPU PyTorch 1.9 (Ubuntu 20.04)
+      # launch remote GPU instance with Deep Learning AMI GPU PyTorch 2.2 (Ubuntu 20.04)
       # build DLC GPU image because the base DLC image is too big and takes too long to build as part of the test
       - |
         for FRAMEWORK_VERSION in $FRAMEWORK_VERSIONS;
         do
-          launch-ec2-instance --instance-type $instance_type --ami-name ami-03e3ef8c92fdb39ad;
+          launch-ec2-instance --instance-type $instance_type --ami-name ami-081c4092fbff425f0;
           DLC_GPU_TAG="$FRAMEWORK_VERSION-dlc-gpu-$BUILD_ID";
           build_dir="test/container/$FRAMEWORK_VERSION";
           docker build -f "$build_dir/Dockerfile.dlc.gpu" -t $PREPROD_IMAGE:$DLC_GPU_TAG --build-arg region=$AWS_DEFAULT_REGION .;
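
The loop above derives both the image tag and the build directory from each entry in FRAMEWORK_VERSIONS, which is presumably why the test/container directories are renamed in this commit to match the new version strings. A minimal Python sketch of that derivation, for illustration only: the function name plan_gpu_builds and the build ID value are hypothetical, and the real work is done by the shell loop in buildspec.yml.

```python
def plan_gpu_builds(framework_versions: str, build_id: str) -> list[tuple[str, str]]:
    """Mirror the buildspec loop: one (image tag, Dockerfile path) pair per version."""
    plans = []
    # The shell loop iterates over the whitespace-separated $FRAMEWORK_VERSIONS.
    for version in framework_versions.split():
        tag = f"{version}-dlc-gpu-{build_id}"      # DLC_GPU_TAG
        build_dir = f"test/container/{version}"    # build_dir
        plans.append((tag, f"{build_dir}/Dockerfile.dlc.gpu"))
    return plans


# "example-build" is a placeholder for $BUILD_ID.
for tag, dockerfile in plan_gpu_builds("2.1.0 2.2.0", build_id="example-build"):
    print(tag, dockerfile)
```

If a version listed in FRAMEWORK_VERSIONS has no matching directory under test/container, the docker build step would have no Dockerfile to read, so the two must be kept in sync.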

src/sagemaker_pytorch_serving_container/torchserve.py

Lines changed: 3 additions & 0 deletions
@@ -184,6 +184,9 @@ def _retrieve_ts_server_process():
     ts_server_processes = list()

     for process in psutil.process_iter():
+        if process.status() == psutil.STATUS_ZOMBIE:
+            continue
+
         if TS_NAMESPACE in process.cmdline():
             ts_server_processes.append(process)
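
The ordering in this hunk matters: the status check runs before cmdline() is called, so a zombie process is skipped before the call that can fail on it. The sketch below reproduces that ordering without requiring psutil. FakeProcess and ZombieError are hypothetical stand-ins for psutil.Process and psutil.ZombieProcess, the TS_NAMESPACE value is an assumption (the diff does not show it), and whether cmdline() actually raises on a zombie depends on the platform and psutil version.

```python
STATUS_ZOMBIE = "zombie"  # same string value as psutil.STATUS_ZOMBIE
TS_NAMESPACE = "org.pytorch.serve.ModelServer"  # assumed value, not shown in the diff


class ZombieError(Exception):
    """Hypothetical stand-in for psutil.ZombieProcess."""


class FakeProcess:
    """Hypothetical stand-in exposing the two psutil.Process methods used here."""

    def __init__(self, pid, status, cmdline):
        self.pid = pid
        self._status = status
        self._cmdline = cmdline

    def status(self):
        return self._status

    def cmdline(self):
        # Simulate a platform where querying a zombie's command line fails.
        if self._status == STATUS_ZOMBIE:
            raise ZombieError(self.pid)
        return list(self._cmdline)


def retrieve_ts_server_processes(processes, namespace=TS_NAMESPACE):
    """Return processes whose argument list contains the namespace, skipping zombies."""
    found = []
    for process in processes:
        # Check status first so cmdline() is never called on a zombie.
        if process.status() == STATUS_ZOMBIE:
            continue
        if namespace in process.cmdline():
            found.append(process)
    return found


procs = [
    FakeProcess(1, "running", ["java", TS_NAMESPACE]),
    FakeProcess(2, STATUS_ZOMBIE, ["java", TS_NAMESPACE]),
    FakeProcess(3, "sleeping", ["bash"]),
]
print([p.pid for p in retrieve_ts_server_processes(procs)])
```

With real processes, psutil.process_iter() would supply the iterable. A process can still exit or become a zombie between the status check and the cmdline() call, so catching psutil.NoSuchProcess and psutil.ZombieProcess around the loop body is a possible further hardening that this commit does not include.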

test/conftest.py

Lines changed: 1 addition & 1 deletion
@@ -52,7 +52,7 @@ def pytest_addoption(parser):
     parser.addoption('--instance-type')
     parser.addoption('--docker-base-name', default='sagemaker-pytorch-inference')
     parser.addoption('--region', default='us-west-2')
-    parser.addoption('--framework-version', default="2.0.0")
+    parser.addoption('--framework-version', default="2.1.0")
     parser.addoption('--py-version', choices=['2', '3'], default='3')
     parser.addoption('--processor', choices=['gpu', 'cpu'], default='cpu')
     # If not specified, will default to {framework-version}-{processor}-py{py-version}
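
The trailing comment describes a fallback tag built from three of these options, so changing the --framework-version default also changes the default tag. A minimal sketch of that format, assuming the defaults shown in this hunk; the function name default_tag is hypothetical, and how conftest.py actually computes the value is not shown in this diff.

```python
def default_tag(framework_version: str = "2.1.0",
                processor: str = "cpu",
                py_version: str = "3") -> str:
    """Build a tag in the {framework-version}-{processor}-py{py-version} format."""
    return f"{framework_version}-{processor}-py{py_version}"


print(default_tag())                  # tag from the option defaults in this hunk
print(default_tag("2.2.0", "gpu"))    # tag for the other tested version on GPU
```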

test/container/2.0.0/Dockerfile.dlc.cpu renamed to test/container/2.1.0/Dockerfile.dlc.cpu

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 ARG region
-FROM 763104351884.dkr.ecr.$region.amazonaws.com/pytorch-inference:2.0.0-cpu-py310-ubuntu20.04-sagemaker
+FROM 763104351884.dkr.ecr.$region.amazonaws.com/pytorch-inference:2.1.0-cpu-py310-ubuntu20.04-sagemaker

 COPY dist/sagemaker_pytorch_inference-*.tar.gz /sagemaker_pytorch_inference.tar.gz

test/container/2.0.0/Dockerfile.dlc.gpu renamed to test/container/2.1.0/Dockerfile.dlc.gpu

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 ARG region
-FROM 763104351884.dkr.ecr.$region.amazonaws.com/pytorch-inference:2.0.0-gpu-py310-cu118-ubuntu20.04-sagemaker
+FROM 763104351884.dkr.ecr.$region.amazonaws.com/pytorch-inference:2.1.0-gpu-py310-cu118-ubuntu20.04-sagemaker

 COPY dist/sagemaker_pytorch_inference-*.tar.gz /sagemaker_pytorch_inference.tar.gz

test/container/2.0.1/Dockerfile.dlc.cpu renamed to test/container/2.2.0/Dockerfile.dlc.cpu

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 ARG region
-FROM 763104351884.dkr.ecr.$region.amazonaws.com/pytorch-inference:2.0.1-cpu-py310-ubuntu20.04-sagemaker
+FROM 763104351884.dkr.ecr.$region.amazonaws.com/pytorch-inference:2.2.0-cpu-py310-ubuntu20.04-sagemaker

 COPY dist/sagemaker_pytorch_inference-*.tar.gz /sagemaker_pytorch_inference.tar.gz

test/container/2.0.1/Dockerfile.dlc.gpu renamed to test/container/2.2.0/Dockerfile.dlc.gpu

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 ARG region
-FROM 763104351884.dkr.ecr.$region.amazonaws.com/pytorch-inference:2.0.1-gpu-py310-cu118-ubuntu20.04-sagemaker
+FROM 763104351884.dkr.ecr.$region.amazonaws.com/pytorch-inference:2.2.0-gpu-py310-cu118-ubuntu20.04-sagemaker

 COPY dist/sagemaker_pytorch_inference-*.tar.gz /sagemaker_pytorch_inference.tar.gz
