
Add functionality to configure TorchServe logging levels using the TS_LOG_LEVEL environment variable. #168


Open · wants to merge 9 commits into master

Conversation


@lavaraja commented Mar 9, 2025

Issue #, if available: #83

Description of changes:

In some cases, excessive logging is contributing to CloudWatch logging costs. This change allows users to control the logging verbosity, potentially reducing costs while maintaining the ability to increase verbosity for debugging when needed.

Changes:

  • Added a new function 'configure_logging()' to dynamically set log levels (a sketch of the resulting function follows this list)
  • Used the TS_LOG_LEVEL environment variable to control logging verbosity
  • Supported log levels: OFF, FATAL, ERROR, WARN, INFO, DEBUG, TRACE. TS_LOG_LEVEL takes an integer value that maps to a log level as follows:
    log_levels = {
        '0': 'off',
        '10': 'fatal',
        '20': 'error',
        '30': 'warn',
        '40': 'info',
        '50': 'debug',
        '60': 'trace'
    }
  • Modified the log4j2.xml file using a sed command based on TS_LOG_LEVEL
  • Handled potential errors during log configuration gracefully
  • Called configure_logging() before starting TorchServe
  • Aimed to reduce excessive logging and the associated CloudWatch costs
  • Maintained default logging if TS_LOG_LEVEL is not set or invalid
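
For reference, here is a minimal sketch of what 'configure_logging()' could look like, pieced together from the bullets above (the exact sed expression, path resolution, and fallback messages are assumptions, not the PR's verbatim code):

    import os
    import subprocess

    def configure_logging():
        """Set TorchServe log levels in log4j2.xml based on TS_LOG_LEVEL."""
        log_levels = {
            '0': 'off',
            '10': 'fatal',
            '20': 'error',
            '30': 'warn',
            '40': 'info',
            '50': 'debug',
            '60': 'trace'
        }
        ts_log_level = os.environ.get('TS_LOG_LEVEL')
        if ts_log_level is None:
            return  # TS_LOG_LEVEL not set: keep the default log4j2.xml

        level = log_levels.get(ts_log_level)
        if level is None:
            print(f"Invalid TS_LOG_LEVEL '{ts_log_level}'; keeping default logging.")
            return

        # Resolve log4j2.xml relative to this script, mirroring the paths
        # printed in the test output below.
        script_dir = os.path.dirname(os.path.abspath(__file__))
        log4j2_path = os.path.join(script_dir, 'etc', 'log4j2.xml')

        try:
            # Rewrite every level="..." attribute in place (assumed sed form;
            # GNU sed syntax, as used inside the Linux container).
            subprocess.run(
                ['sed', '-i', f's/level="[a-z]*"/level="{level}"/g', log4j2_path],
                check=True,
            )
        except (subprocess.CalledProcessError, OSError) as e:
            # Fail open: fall back to default logging rather than block startup.
            print(f"Failed to configure logging, using defaults: {e}")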

Tests:

  • I've added unit tests in test_log_config.py to cover various scenarios, including valid log levels, invalid log levels, and error conditions. All tests pass; a sketch of one such test appears after the output below.
% python -m unittest discover -v                                               
test_invalid_log_level (test_log_config.TestLogConfig.test_invalid_log_level) ... Current script path: /Users/xxxxxxx/sagemaker-pytorch-inference-toolkit/src/sagemaker_pytorch_serving_container/serving.py
log4j2.xml path: /Users/xxxxxxx/sagemaker-pytorch-inference-toolkit/src/sagemaker_pytorch_serving_container/etc/log4j2.xml
ok
test_log4j2_file_not_found (test_log_config.TestLogConfig.test_log4j2_file_not_found) ... Current script path: /Users/xxxxxxx/sagemaker-pytorch-inference-toolkit/src/sagemaker_pytorch_serving_container/serving.py
log4j2.xml path: /Users/xxxxxxx/sagemaker-pytorch-inference-toolkit/src/sagemaker_pytorch_serving_container/etc/log4j2.xml
ok
test_no_log_level_set (test_log_config.TestLogConfig.test_no_log_level_set) ... ok
test_subprocess_error (test_log_config.TestLogConfig.test_subprocess_error) ... Current script path: /Users/xxxxxxx/sagemaker-pytorch-inference-toolkit/src/sagemaker_pytorch_serving_container/serving.py
log4j2.xml path: /Users/xxxxxxx/sagemaker-pytorch-inference-toolkit/src/sagemaker_pytorch_serving_container/etc/log4j2.xml
ok
test_valid_log_level (test_log_config.TestLogConfig.test_valid_log_level) ... ok

----------------------------------------------------------------------
Ran 5 tests in 0.002s

OK
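
As an illustration, here is a hedged sketch of what the valid- and invalid-level tests might look like (assuming configure_logging() lives in serving.py and shells out via subprocess.run as described above; the actual test_log_config.py may differ):

    import os
    import unittest
    from unittest import mock

    from sagemaker_pytorch_serving_container import serving

    class TestLogConfig(unittest.TestCase):
        @mock.patch.dict(os.environ, {'TS_LOG_LEVEL': '50'})
        @mock.patch('subprocess.run')
        def test_valid_log_level(self, mock_run):
            serving.configure_logging()
            # A valid level should trigger exactly one sed invocation.
            mock_run.assert_called_once()

        @mock.patch.dict(os.environ, {'TS_LOG_LEVEL': '999'})
        @mock.patch('subprocess.run')
        def test_invalid_log_level(self, mock_run):
            serving.configure_logging()
            # An invalid level should fall back to defaults without touching log4j2.xml.
            mock_run.assert_not_called()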

Steps to test on a PyTorch container:

  • Extend the existing PyTorch container:
    FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:2.1-cpu-py310
    RUN pip install git+https://github.com/lavaraja/sagemaker-pytorch-inference-toolkit.git
  • Build the image.
    docker build .
  • Tag the image.
    docker tag <image_id> 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:2.1-cpu-py310-extended
  • Use the above image and pass TS_LOG_LEVEL as an environment variable in the model class. This sample example was used for testing locally.
  • Modify pytorch_script_mode_local_model_inference.py to use the custom-built container as image_uri:
    model = PyTorchModel(
        role=role,
        model_data=model_dir,
        # framework_version='2.1',
        # py_version='py310',
        image_uri="763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:2.1-cpu-py310-extended",
        entry_point='inference.py',
        env={'TS_LOG_LEVEL': log_level}
    )
  • Run python pytorch_script_mode_local_model_inference.py to start the container locally and run inference (a minimal local-mode sketch follows this list).
  • Observe TS_LOG_LEVEL taking effect during the TorchServe start process. The same logs are emitted to the customer's CloudWatch logs when deployed on SageMaker.
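
For illustration, the local run in that script boils down to roughly the following (hypothetical: model is the PyTorchModel above, data is a placeholder sample input, and local mode requires Docker on the current machine):

    # Deploy to SageMaker local mode; instance_type='local' runs the
    # extended container with Docker instead of a hosted endpoint.
    predictor = model.deploy(
        initial_instance_count=1,
        instance_type='local',
    )
    print(predictor.predict(data))  # 'data' is a placeholder sample input
    predictor.delete_endpoint()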

test_output_with_diff_loglevels.log

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

lavaraja added 9 commits March 2, 2025 16:47
- Added a new function 'configure_logging()' to dynamically set log levels
- Utilize TS_LOGLEVEL environment variable to control logging verbosity
- Support log levels: OFF, FATAL, ERROR, WARN, INFO, DEBUG, TRACE
- Modify log4j2.xml file using sed command based on TS_LOGLEVEL
- Handle potential errors during log configuration gracefully
- Call configure_logging() before starting TorchServe
- Aim to reduce excessive logging and associated CloudWatch costs
- Maintain default logging if TS_LOGLEVEL is not set or invalid
Added missing import.
updating the log4j2 path.
using absolute path to prevent file not found errors.
@lavaraja lavaraja changed the title Add functionality to configure TorchServe logging levels using the TS_LOGLEVEL environment variable. Add functionality to configure TorchServe logging levels using the TS_LOG_LEVEL environment variable. Mar 9, 2025

lavaraja commented Mar 9, 2025

Updated variable name from TS_LOGLEVEL to TS_LOG_LEVEL
