Excessive allocations of AutoRecoveringFileSystemWatcher leading to high GC activity & CPU #11554

@ryanwilliams-nintex

Description

We are running the functions host in a container on a Kubernetes cluster, and are using Kubernetes secrets to hold the function app host keys (i.e. the environment variable AzureWebJobsSecretStorageType is set to kubernetes). We see high CPU usage as soon as the function app starts, ramping up to 100%. Oddly, this only happens on around 50% of startups, but the pattern is fairly consistent across repeated deployments.

During such high-CPU activity, invoking the dotnet-gcdump command on the affected pod shows a large number of allocations for the type AutoRecoveringFileSystemWatcher, which originate from the class SimpleKubernetesClient.

It seems to me that this behaviour might be caused by the method SimpleKubernetesClient.RunWatcher, which loops and creates a new AutoRecoveringFileSystemWatcher instance on every iteration:

https://github.com/Azure/azure-functions-host/blob/1184081de217265998c2e9dd52044bd79bac4da8/src/WebJobs.Script.WebHost/Security/KeyManagement/SimpleKubernetesClient.cs#L97C1-L104C10

private async Task RunWatcher()
{
    while (!_disposed)
    {
        // watch API requests terminate after 4 minutes
        await RunWatcherInternal();
    }
}

RunWatcherInternal instantiates a new AutoRecoveringFileSystemWatcher for each call when the environment variable AzureWebJobsSecretStorageType equals kubernetes and the environment variable AzureWebJobsKubernetesSecretName is not defined.
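
To illustrate why this pattern can saturate a core, here is a minimal, self-contained sketch. It is not the host's code: SpinLoopDemo, CountIterationsAsync and AllocateAndReturn are hypothetical names, and a plain object stands in for AutoRecoveringFileSystemWatcher. It only shows that awaiting a method that completes synchronously inside a while loop yields no pause between iterations.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical demo, not part of azure-functions-host.
public static class SpinLoopDemo
{
    private static object _latest;

    // Mirrors the shape of RunWatcher: a loop that awaits an inner method.
    public static async Task<int> CountIterationsAsync(TimeSpan duration)
    {
        int iterations = 0;
        var stopwatch = Stopwatch.StartNew();
        while (stopwatch.Elapsed < duration)
        {
            await AllocateAndReturn();
            iterations++;
        }
        return iterations;
    }

    // Mirrors the file-watcher branch: allocate a new object, then return
    // an already-completed task, so the caller's await never yields.
    private static Task AllocateAndReturn()
    {
        _latest = new object(); // stand-in for new AutoRecoveringFileSystemWatcher(...)
        return Task.CompletedTask;
    }
}
```

Calling SpinLoopDemo.CountIterationsAsync(TimeSpan.FromMilliseconds(50)) should report a very large iteration count, each iteration having allocated one object, which is consistent with the allocation pattern seen in the GC dump.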

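I suspect the outer while loop is not needed. The branch of RunWatcherInternal that obtains secrets via the Kubernetes API already has its own loop, which re-issues the watch request after a one-second delay. The file-watcher branch, by contrast, returns immediately, so the outer loop calls it again with no delay: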

private async Task RunWatcherInternal()
{
    if (string.IsNullOrEmpty(KubernetesObjectName) && FileUtility.DirectoryExists(KubernetesSecretsDir))
    {
        // ***************************************************************************************************
        // This keeps getting called over and over. Only needs to be called once, if `_fileWatcher` is not set.
        // ***************************************************************************************************

        _fileWatcher = new AutoRecoveringFileSystemWatcher(KubernetesSecretsDir);
        _fileWatcher.Changed += (object sender, FileSystemEventArgs e)
            => _watchCallback?.Invoke();
    }
    else if (!string.IsNullOrEmpty(KubernetesObjectName))
    {
        (var url, _) = await GetObjectUrl(KubernetesObjectName, watchUrl: true);
        using (var noTimeoutClient = CreateHttpClient())
        using (var request = await GetRequest(HttpMethod.Get, url))
        {
            while (!_disposed)
            {
                noTimeoutClient.Timeout = TimeSpan.FromMilliseconds(Timeout.Infinite);
                using (var response = await noTimeoutClient.SendAsync(request, HttpCompletionOption.ResponseHeadersRead))
                using (var reader = new StreamReader(await response.Content.ReadAsStreamAsync()))
                {
                    while (!reader.EndOfStream && !_disposed)
                    {
                        reader.ReadLine(); // Read the line-json update
                        _watchCallback?.Invoke();
                    }
                }
                await Task.Delay(TimeSpan.FromSeconds(1));
            }
        }
    }
}
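
One possible shape for a fix, as a sketch only: create the watcher once, then wait for disposal instead of returning to the outer loop. The class below is a hypothetical stand-in (SecretsDirectoryWatcher and WatchersCreated are invented names), and it uses System.IO.FileSystemWatcher in place of the host's internal AutoRecoveringFileSystemWatcher so that it compiles on its own. The actual change in the host may need to look different.

```csharp
#nullable enable
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for SimpleKubernetesClient's file-watcher path.
public sealed class SecretsDirectoryWatcher : IDisposable
{
    private readonly string _secretsDir;
    private readonly Action? _watchCallback;
    private readonly CancellationTokenSource _cts = new CancellationTokenSource();
    private readonly CancellationToken _token;
    private FileSystemWatcher? _fileWatcher; // assumption: stands in for AutoRecoveringFileSystemWatcher

    public SecretsDirectoryWatcher(string secretsDir, Action? watchCallback)
    {
        _secretsDir = secretsDir;
        _watchCallback = watchCallback;
        _token = _cts.Token;
    }

    // Exposed only so the sketch can be checked; counts watcher allocations.
    public int WatchersCreated { get; private set; }

    public async Task RunWatcher()
    {
        while (!_token.IsCancellationRequested)
        {
            await RunWatcherInternal();
        }
    }

    private async Task RunWatcherInternal()
    {
        // Guard: only create the watcher if one does not exist yet.
        if (_fileWatcher == null && Directory.Exists(_secretsDir))
        {
            _fileWatcher = new FileSystemWatcher(_secretsDir) { EnableRaisingEvents = true };
            _fileWatcher.Changed += (sender, e) => _watchCallback?.Invoke();
            WatchersCreated++;
        }

        // Wait until disposal rather than returning straight away,
        // so the outer loop cannot spin.
        try
        {
            await Task.Delay(Timeout.Infinite, _token);
        }
        catch (OperationCanceledException)
        {
            // Disposal requested; let RunWatcher observe the token and exit.
        }
    }

    public void Dispose()
    {
        _cts.Cancel();
        _fileWatcher?.Dispose();
        _cts.Dispose();
    }
}
```

With this shape, RunWatcher allocates a single watcher for the lifetime of the object and completes once Dispose is called. The null check alone would stop the repeated allocations, but the loop would still spin; the wait addresses the CPU usage.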

Repro steps

  1. Create a Dockerfile based on mcr.microsoft.com/azure-functions/dotnet:4-dotnet8.0
  2. Set the AzureWebJobsSecretStorageType environment variable to kubernetes, and leave the AzureWebJobsKubernetesSecretName environment variable undefined
  3. Configure secrets to be located in the /run/secrets/function-keys folder (i.e. make sure the folder exists on the container filesystem)
  4. Deploy image to Kubernetes
  5. Observe that CPU runs high and that an excessive number of AutoRecoveringFileSystemWatcher instances are being allocated

Expected behavior

Only one AutoRecoveringFileSystemWatcher instance should be allocated per SimpleKubernetesClient instance when the environment variable AzureWebJobsKubernetesSecretName is not defined and the environment variable AzureWebJobsSecretStorageType has the value kubernetes.

Actual behavior

A large number of AutoRecoveringFileSystemWatcher instances are continually being created and destroyed, creating GC pressure and potentially high CPU load.

Known workarounds

No known workarounds for this particular combination of AzureWebJobsSecretStorageType and AzureWebJobsKubernetesSecretName.
