[API Proposal]: Kubernetes specialized Resource Monitoring #6496

@amadeuszl

Moved here from the comment below for clarity.

Background and motivation

The current implementation of Microsoft.Extensions.Diagnostics.ResourceMonitoring does not support requests.cpu and requests.memory on Windows containers, even though these values are typically available in Kubernetes environments.

  • For limits, it currently reads from the Job Object, which is appropriate for non-Kubernetes environments.
  • In Kubernetes, it would be more accurate to read limits and requests directly from the cluster metadata.
  • On Linux, requests.memory is also currently missing.

In Kubernetes, both requests and limits can be exposed via environment variables if the cluster is configured to use the Downward API. If the cluster is configured accordingly, and assuming the engineer knows this and knows the names of the environment variables, ResourceMonitoring could be extended to use these values in utilization calculations.
Example YAML using the Downward API to inject the values into the pods' environment variables:

 env:
    - name: MY_CLUSTER_CPU_REQUEST
      valueFrom:
        resourceFieldRef:
          resource: requests.cpu
    - name: MY_CLUSTER_CPU_LIMIT
      valueFrom:
        resourceFieldRef:
          resource: limits.cpu
    - name: MY_CLUSTER_MEMORY_REQUEST
      valueFrom:
        resourceFieldRef:
          resource: requests.memory
    - name: MY_CLUSTER_MEMORY_LIMIT
      valueFrom:
        resourceFieldRef:
          resource: limits.memory

API Proposal

Overview

  1. Introduce two new packages:
  • Microsoft.Extensions.Diagnostics.ResourceMonitoring.Kubernetes
  • Microsoft.Extensions.Diagnostics.ResourceMonitoring.Abstractions
  2. Reference Microsoft.Extensions.Diagnostics.ResourceMonitoring.Abstractions in Microsoft.Extensions.Diagnostics.ResourceMonitoring.
  3. Abstractions will bring the IResourceQuotaProvider interface, which will be used to decide on the memory/CPU requests/limits used in resource monitoring calculations.

New package Microsoft.Extensions.Diagnostics.ResourceMonitoring.Abstractions

We add a new package, Microsoft.Extensions.Diagnostics.ResourceMonitoring.Abstractions. Following the standard .NET pattern, it will contain the abstractions that ResourceMonitoring depends on.
It brings:

namespace Microsoft.Extensions.Diagnostics.ResourceMonitoring;

/// <summary>
/// Provides resource quota information for resource monitoring purposes.
/// </summary>
/// <remarks>
/// This interface defines a contract for retrieving resource quotas, which include
/// memory and CPU limits and requests that may be imposed by container orchestrators
/// or resource management systems.
/// </remarks>
public interface IResourceQuotaProvider
{
    /// <summary>
    /// Gets the current resource quota containing memory and CPU limits and requests.
    /// The returned <see cref="ResourceQuota"/> is used in resource monitoring calculations.
    /// </summary>
    /// <returns>
    /// A <see cref="ResourceQuota"/> instance containing the current resource constraints
    /// including memory limits, CPU limits, memory requests, and CPU requests.
    /// </returns>
    ResourceQuota GetResourceQuota();
}

The above interface is used in resource monitoring to obtain a ResourceQuota, which is just a data structure holding the data relevant to the calculations.

namespace Microsoft.Extensions.Diagnostics.ResourceMonitoring;

/// <summary>
/// Represents resource quota information for a container, including CPU and memory limits and requests.
/// Limits define the maximum resources a container can use, while requests specify the minimum guaranteed resources.
/// </summary>
public class ResourceQuota
{
    /// <summary>
    /// Gets or sets the resource memory limit the container is allowed to use.
    /// </summary>
    public ulong LimitsMemory { get; set; }

    /// <summary>
    /// Gets or sets the resource CPU limit the container is allowed to use.
    /// </summary>
    public double LimitsCpu { get; set; }

    /// <summary>
    /// Gets or sets the memory request, that is, the amount of memory guaranteed to the container.
    /// </summary>
    public ulong RequestsMemory { get; set; }

    /// <summary>
    /// Gets or sets the CPU request, that is, the amount of CPU guaranteed to the container.
    /// </summary>
    public double RequestsCpu { get; set; }
}
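
Since ResourceQuota is only a data holder, an IResourceQuotaProvider can be very small. As a minimal sketch, the snippet below re-declares the two proposed types locally so that it compiles on its own, and adds a hypothetical FixedResourceQuotaProvider (not part of this proposal) that returns a constant quota, which might be handy in tests. The units mentioned in the comments (bytes, cores) are assumptions, as the proposal does not state them.

```csharp
using System;

// Local stand-ins for the proposed types, so the sketch compiles on its own.
public class ResourceQuota
{
    public ulong LimitsMemory { get; set; }    // assumed to be bytes
    public double LimitsCpu { get; set; }      // assumed to be cores
    public ulong RequestsMemory { get; set; }  // assumed to be bytes
    public double RequestsCpu { get; set; }    // assumed to be cores
}

public interface IResourceQuotaProvider
{
    ResourceQuota GetResourceQuota();
}

// Hypothetical provider (not part of the proposal) that always returns the same quota.
public sealed class FixedResourceQuotaProvider : IResourceQuotaProvider
{
    private readonly ResourceQuota _quota;

    public FixedResourceQuotaProvider(ResourceQuota quota)
        => _quota = quota ?? throw new ArgumentNullException(nameof(quota));

    public ResourceQuota GetResourceQuota() => _quota;
}
```

A provider like this could be registered ahead of the default one to pin the quota in a test environment, assuming the registration order decides which provider wins.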

New package Microsoft.Extensions.Diagnostics.ResourceMonitoring.Kubernetes

We add a new package, Microsoft.Extensions.Diagnostics.ResourceMonitoring.Kubernetes. It will implement IResourceQuotaProvider by reading the container's environment variables that are supposed to hold the CPU/memory requests/limits, once properly configured. The IResourceQuotaProvider is registered by the following method:

namespace Microsoft.Extensions.DependencyInjection;

    /// <summary>
    /// Configures and adds Kubernetes resource monitoring components to a service collection, together with the necessary basic resource monitoring components.
    /// </summary>
    /// <param name="services">The dependency injection container to add the Kubernetes resource monitoring to.</param>
    /// <param name="environmentVariablePrefix">The prefix used to read environment variables in the container.</param>
    /// <returns>The value of <paramref name="services" />.</returns>
    /// <remarks>
    /// <para>
    /// If you have configured your Kubernetes container with Downward API to add environment variable <c>MYCLUSTER_LIMITS_CPU</c> with CPU limits,
    /// then you should pass <c>MYCLUSTER_</c> to <paramref name="environmentVariablePrefix"/> parameter. Environment variables will be read during DI Container resolution.
    /// </para>
    /// <para>
    /// <strong>Important:</strong> Do not call <see cref="ResourceMonitoringServiceCollectionExtensions.AddResourceMonitoring(IServiceCollection)"/> 
    /// if you are using this method, as it already includes all necessary resource monitoring components and registers a Kubernetes-specific 
    /// <see cref="IResourceQuotaProvider"/> implementation. Calling both methods may result in conflicting service registrations.
    /// </para>
    /// </remarks>
    public static IServiceCollection AddKubernetesResourceMonitoring(
        this IServiceCollection services,
        string environmentVariablePrefix = "");

As described in the comment, the AddKubernetesResourceMonitoring() call replaces AddResourceMonitoring(). This is more convenient and ensures that the registrations happen in the proper order, which determines which implementation of IResourceQuotaProvider has priority.
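
To illustrate how the prefix could be applied, here is a rough sketch of reading the four values. KubernetesQuotaReader is a hypothetical helper, not the proposed implementation. The LIMITS_CPU and REQUESTS_CPU suffixes follow the examples in this proposal; LIMITS_MEMORY and REQUESTS_MEMORY are assumed by analogy. The sketch also assumes that CPU values arrive in millicores and memory values in bytes, and that a missing variable should be treated as an error, which the real implementation may handle differently.

```csharp
#nullable enable
using System;
using System.Globalization;

// Hypothetical helper, not the proposed implementation.
public static class KubernetesQuotaReader
{
    public static (double LimitsCpu, double RequestsCpu, ulong LimitsMemory, ulong RequestsMemory) Read(
        string prefix, Func<string, string?>? lookup = null)
    {
        // Default to the process environment; tests can pass their own lookup.
        lookup ??= name => Environment.GetEnvironmentVariable(name);

        return (
            ReadUInt64(prefix + "LIMITS_CPU", lookup) / 1000.0,    // millicores -> cores
            ReadUInt64(prefix + "REQUESTS_CPU", lookup) / 1000.0,  // millicores -> cores
            ReadUInt64(prefix + "LIMITS_MEMORY", lookup),          // assumed name, bytes
            ReadUInt64(prefix + "REQUESTS_MEMORY", lookup));       // assumed name, bytes
    }

    private static ulong ReadUInt64(string name, Func<string, string?> lookup)
    {
        string? raw = lookup(name);
        if (!ulong.TryParse(raw, NumberStyles.None, CultureInfo.InvariantCulture, out ulong value))
        {
            throw new InvalidOperationException(
                $"Environment variable '{name}' is missing or is not a non-negative integer.");
        }

        return value;
    }
}
```

Passing the lookup as a delegate keeps the sketch testable without touching the real process environment.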

Changes to Microsoft.Extensions.Diagnostics.ResourceMonitoring

No changes to the API, only to the implementation. It will now rely on the abstractions to get limits/requests, and it will have default implementations for Linux and Windows containers that reuse the existing logic.

We should introduce metrics for requests. Here's a comparison of the metrics (new ones are marked with +):

`container.cpu.time`, `container.cpu.limit.utilization`, `container.memory.limit.utilization`, `process.cpu.utilization`, `dotnet.process.memory.virtual.utilization`
+ `container.cpu.request.utilization`, `container.memory.request.utilization` 
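
The proposal does not spell out how the request-based metrics are computed. As a sketch, assuming they mirror the limit-based ones and are plain usage-to-quota ratios, the arithmetic could look like this. RequestUtilization and its parameter names are hypothetical.

```csharp
using System;

// Hypothetical helper showing the assumed usage / request ratios.
public static class RequestUtilization
{
    // Fraction of the memory request currently in use.
    public static double Memory(ulong usedBytes, ulong requestBytes)
        => requestBytes == 0 ? double.NaN : (double)usedBytes / requestBytes;

    // CPU time consumed during the interval, relative to the CPU time
    // that the requested number of cores provides over the same interval.
    public static double Cpu(TimeSpan cpuTimeUsed, TimeSpan interval, double requestCores)
    {
        double available = interval.TotalSeconds * requestCores;
        return available <= 0 ? double.NaN : cpuTimeUsed.TotalSeconds / available;
    }
}
```

Unlike limits, requests are not a ceiling: a container may use more than it requested, so under this assumption a request utilization above 1.0 is a valid reading rather than an error.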

API Usage

Example usage:

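// AddKubernetesResourceMonitoring is declared in the Microsoft.Extensions.DependencyInjection
// namespace, which also provides ServiceCollection.
using Microsoft.Extensions.DependencyInjection;

var services = new ServiceCollection();
// Reads MYCLUSTER_LIMITS_CPU and the other variables that start with MYCLUSTER_.
services.AddKubernetesResourceMonitoring("MYCLUSTER_");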

Alternative Designs

We have already moved to the alternative design, so there is nothing new here.
The only remaining option is to decouple AddKubernetesResourceMonitoring() from AddResourceMonitoring() if we really need to. That would make sense if more abstractions were available, but so far there is nothing else to configure.

Risks

  1. This API retrieves requests and limits from Kubernetes Downward API environment variables. These values are populated only when the container starts and will not change if quotas are updated while the pod is running. This could be addressed in the future by using Downward API volumes instead of environment variables.
  2. The code assumes that the environment variable values for CPU are given in millicores. This requires additional configuration on the user's end, namely a divisor of "1m" in the resourceFieldRef:

        {
          "name": "COSMIC_REQUESTS_CPU",
          "valueFrom": {
            "resourceFieldRef": {
              "containerName": "teams-r9-sampleservice",
              "divisor": "1m", // new and required: renders the value in millicores
              "resource": "requests.cpu"
            }
          }
        }
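
To see why the divisor matters, the sketch below mimics how the Downward API renders a CPU quantity, assuming it divides the quantity by the divisor and rounds up, as the Kubernetes documentation describes. DownwardApiCpu is a hypothetical helper used only to illustrate the arithmetic.

```csharp
using System;

// Hypothetical helper that mimics the assumed Downward API rendering of CPU quantities.
public static class DownwardApiCpu
{
    // Both arguments are expressed in millicores; the result is value / divisor, rounded up.
    public static long Render(long quantityMillicores, long divisorMillicores)
    {
        if (quantityMillicores < 0) throw new ArgumentOutOfRangeException(nameof(quantityMillicores));
        if (divisorMillicores <= 0) throw new ArgumentOutOfRangeException(nameof(divisorMillicores));

        // Integer ceiling division.
        return (quantityMillicores + divisorMillicores - 1) / divisorMillicores;
    }
}
```

Under that assumption, a 250m request with the default divisor of one core, Render(250, 1000), comes out as 1, which loses the fractional part. With a divisor of "1m", Render(250, 1), it comes out as 250, which the code can then divide by 1000 to get 0.25 cores.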

Labels: api-ready-for-review (API is ready for formal API review - https://github.com/dotnet/apireviews), area-resourcemonitoring
