Description
We are running the functions host in a container on a Kubernetes cluster and are using Kubernetes secrets to hold the function app host keys (i.e. the environment variable AzureWebJobsSecretStorageType is set to kubernetes). CPU usage climbs immediately when the function app starts up, ramping up to 100%. Oddly enough, this high-CPU behaviour occurs only around 50% of the time, but that rate is fairly consistent across repeated deployments.
During such high-CPU activity, invoking the dotnet-gcdump command on the affected pod shows a large number of allocations for the type AutoRecoveringFileSystemWatcher, which originate from the class SimpleKubernetesClient.
It seems to me that this behaviour might be caused by the method SimpleKubernetesClient.RunWatcher, which runs in a loop and creates new AutoRecoveringFileSystemWatcher instances over and over:
private async Task RunWatcher()
{
    while (!_disposed)
    {
        // watch API requests terminate after 4 minutes
        await RunWatcherInternal();
    }
}
RunWatcherInternal instantiates a new AutoRecoveringFileSystemWatcher on each call when the environment variable AzureWebJobsSecretStorageType equals kubernetes and the environment variable AzureWebJobsKubernetesSecretName is not defined.
I imagine that this outer while loop is not needed: the case of obtaining secrets via the Kubernetes API is already covered inside RunWatcherInternal, which has its own loop that polls every second (a possible rearrangement is sketched after the snippet below):
private async Task RunWatcherInternal()
{
    if (string.IsNullOrEmpty(KubernetesObjectName) && FileUtility.DirectoryExists(KubernetesSecretsDir))
    {
        // ***************************************************************************************************
        // This keeps getting called over and over. Only needs to be called once, if `_fileWatcher` is not set.
        // ***************************************************************************************************
        _fileWatcher = new AutoRecoveringFileSystemWatcher(KubernetesSecretsDir);
        _fileWatcher.Changed += (object sender, FileSystemEventArgs e)
            => _watchCallback?.Invoke();
    }
    else if (!string.IsNullOrEmpty(KubernetesObjectName))
    {
        (var url, _) = await GetObjectUrl(KubernetesObjectName, watchUrl: true);
        using (var noTimeoutClient = CreateHttpClient())
        using (var request = await GetRequest(HttpMethod.Get, url))
        {
            while (!_disposed)
            {
                noTimeoutClient.Timeout = TimeSpan.FromMilliseconds(Timeout.Infinite);
                using (var response = await noTimeoutClient.SendAsync(request, HttpCompletionOption.ResponseHeadersRead))
                using (var reader = new StreamReader(await response.Content.ReadAsStreamAsync()))
                {
                    while (!reader.EndOfStream && !_disposed)
                    {
                        reader.ReadLine(); // Read the line-json update
                        _watchCallback?.Invoke();
                    }
                }
                await Task.Delay(TimeSpan.FromSeconds(1));
            }
        }
    }
}
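For illustration, here is a minimal sketch of the rearrangement suggested above. The member and method names (_fileWatcher, _disposed, _watchCallback, KubernetesObjectName, KubernetesSecretsDir, RunWatcherInternal) are taken from the snippets in this issue; the restructuring itself is only an assumption about how the repeated allocation could be avoided, not a proposed patch:

private async Task RunWatcher()
{
    if (string.IsNullOrEmpty(KubernetesObjectName) && FileUtility.DirectoryExists(KubernetesSecretsDir))
    {
        // Sketch (assumption): with file-based secrets, one watcher instance is enough
        // for the lifetime of the client, so there is nothing to loop over here.
        _fileWatcher = new AutoRecoveringFileSystemWatcher(KubernetesSecretsDir);
        _fileWatcher.Changed += (object sender, FileSystemEventArgs e)
            => _watchCallback?.Invoke();
        return;
    }

    while (!_disposed)
    {
        // Watch API requests terminate after 4 minutes, so only the Kubernetes API
        // path needs to re-establish its watch in a loop.
        await RunWatcherInternal();
    }
}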
Repro steps
- Create a Dockerfile based on mcr.microsoft.com/azure-functions/dotnet:4-dotnet8.0
- Configure the AzureWebJobsSecretStorageType environment variable to equal kubernetes, and do not define the AzureWebJobsKubernetesSecretName environment variable
- Configure secrets to be located in the /run/secrets/function-keys folder (i.e. make sure the folder exists on the container filesystem)
- Deploy the image to Kubernetes
- Observe that CPU runs high and that excessive AutoRecoveringFileSystemWatcher instances are being allocated (see the commands below)
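To confirm the allocations, a heap dump of the host process can be taken with the dotnet-gcdump tool mentioned above. Assumptions: the dotnet-gcdump global tool is available inside the pod, <pid> is the process id of the functions host, and <dump-file> is the .gcdump file produced by the first command:

dotnet-gcdump collect -p <pid>
dotnet-gcdump report <dump-file>

The report should list a large number of AutoRecoveringFileSystemWatcher objects, as described in the issue description.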
Expected behavior
Only one AutoRecoveringFileSystemWatcher instance should be allocated per SimpleKubernetesClient instance when the environment variable AzureWebJobsKubernetesSecretName is not defined and the environment variable AzureWebJobsSecretStorageType has the value kubernetes.
Actual behavior
A large number of AutoRecoveringFileSystemWatcher instances are continually created and destroyed, causing GC pressure and, potentially, high CPU load.
Known workarounds
No known workarounds for this particular combination of AzureWebJobsSecretStorageType and AzureWebJobsKubernetesSecretName.