Informer spams cluster API after restarting #1933

Closed
feloy opened this issue Oct 14, 2024 · 3 comments

Comments

@feloy
Contributor

feloy commented Oct 14, 2024

Describe the bug

The informer sends tens of watch requests per second to the cluster after the connection has been lost and then re-established.

**Client Version**
1.0.0-rc6

**Server Version**
1.29.2 (kind version 0.22.0)

To Reproduce
Steps to reproduce the behavior:

index.ts:

import { CoreV1Api, Informer, KubeConfig, KubernetesObject, ListPromise, makeInformer, V1Pod, V1PodList } from "@kubernetes/client-node";

const kc = new KubeConfig();
kc.loadFromDefault();

const k8sApi = kc.makeApiClient(CoreV1Api);
const path = `/api/v1/namespaces/default/pods`;
// List function used by the informer for its initial listing.
const listFn = (): Promise<V1PodList> => k8sApi.listNamespacedPod({ namespace: 'default' });

startInformer(kc, path, listFn);

function startInformer(kc: KubeConfig, path: string, listFn: ListPromise<KubernetesObject>): Informer<V1Pod> {
  const informer = makeInformer(kc, path, listFn);
  informer.on('add', (obj: KubernetesObject) => {
    console.log('==> add ', obj.metadata?.name);
  });
  informer.on('error', (err: unknown) => {
    console.log('==> err start', String(err));
    // On a recoverable error, restart the informer after 3 seconds.
    if (String(err) === 'Error: Premature close' || String(err).startsWith('FetchError') || String(err).startsWith('Forbidden')) {
      console.log('=====> restart in 3s');
      setTimeout(() => {
        informer.start();
      }, 3000);
    }
  });
  informer.start();
  return informer;
}
  • start a kind cluster with audit logging enabled (see https://kind.sigs.k8s.io/docs/user/auditing/)
  • create a pod pod1 in the default namespace
  • transpile and start the program above (npx tsc && node index.js)
  • the program displays ==> add pod1
  • stop the cluster
  • the program displays the error and tries to reconnect every 3s
  • restart the cluster
  • look at the cluster audit log: tens of watch requests are sent:
{"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Metadata","auditID":"5b4a9039-57c7-4ebc-9d6c-9cf746fc41c9","stage":"RequestReceived","requestURI":"/api/v1/namespaces/default/pods?watch=true\u0026resourceVersion=15241","verb":"watch","user":{"username":"kubernetes-admin","groups":["kubeadm:cluster-admins","system:authenticated"]},"sourceIPs":["192.168.127.1"],"userAgent":"node-fetch/1.0 (+https://github.com/bitinn/node-fetch)","objectRef":{"resource":"pods","namespace":"default","apiVersion":"v1"},"requestReceivedTimestamp":"2024-10-14T12:07:29.391183Z","stageTimestamp":"2024-10-14T12:07:29.391183Z"}
{"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Metadata","auditID":"5b4a9039-57c7-4ebc-9d6c-9cf746fc41c9","stage":"ResponseStarted","requestURI":"/api/v1/namespaces/default/pods?watch=true\u0026resourceVersion=15241","verb":"watch","user":{"username":"kubernetes-admin","groups":["kubeadm:cluster-admins","system:authenticated"]},"sourceIPs":["192.168.127.1"],"userAgent":"node-fetch/1.0 (+https://github.com/bitinn/node-fetch)","objectRef":{"resource":"pods","namespace":"default","apiVersion":"v1"},"responseStatus":{"metadata":{},"code":200},"requestReceivedTimestamp":"2024-10-14T12:07:29.391183Z","stageTimestamp":"2024-10-14T12:07:29.391694Z","annotations":{"authorization.k8s.io/decision":"allow","authorization.k8s.io/reason":"RBAC: allowed by ClusterRoleBinding \"kubeadm:cluster-admins\" of ClusterRole \"cluster-admin\" to Group \"kubeadm:cluster-admins\""}}
{"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Metadata","auditID":"5b4a9039-57c7-4ebc-9d6c-9cf746fc41c9","stage":"ResponseComplete","requestURI":"/api/v1/namespaces/default/pods?watch=true\u0026resourceVersion=15241","verb":"watch","user":{"username":"kubernetes-admin","groups":["kubeadm:cluster-admins","system:authenticated"]},"sourceIPs":["192.168.127.1"],"userAgent":"node-fetch/1.0 (+https://github.com/bitinn/node-fetch)","objectRef":{"resource":"pods","namespace":"default","apiVersion":"v1"},"responseStatus":{"metadata":{},"code":200},"requestReceivedTimestamp":"2024-10-14T12:07:29.391183Z","stageTimestamp":"2024-10-14T12:07:29.391842Z","annotations":{"authorization.k8s.io/decision":"allow","authorization.k8s.io/reason":"RBAC: allowed by ClusterRoleBinding \"kubeadm:cluster-admins\" of ClusterRole \"cluster-admin\" to Group \"kubeadm:cluster-admins\""}}
{"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Metadata","auditID":"9acc508c-d555-4741-bf43-b0da572d5eb9","stage":"RequestReceived","requestURI":"/api/v1/namespaces/default/pods?watch=true\u0026resourceVersion=15241","verb":"watch","user":{"username":"kubernetes-admin","groups":["kubeadm:cluster-admins","system:authenticated"]},"sourceIPs":["192.168.127.1"],"userAgent":"node-fetch/1.0 (+https://github.com/bitinn/node-fetch)","objectRef":{"resource":"pods","namespace":"default","apiVersion":"v1"},"requestReceivedTimestamp":"2024-10-14T12:07:29.395322Z","stageTimestamp":"2024-10-14T12:07:29.395322Z"}
[...]

Expected behavior

The API should not be spammed this way.
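
A possible caller-side mitigation (independent of any fix in the library) is to replace the fixed 3-second retry with an exponential backoff. This is only a sketch reusing the informer API from the reproduction above; the delay values are arbitrary, and it only throttles restarts triggered from the caller's 'error' handler. It does not, by itself, explain or remove the tens of requests per second seen in the audit log.

import { Informer, KubeConfig, KubernetesObject, ListPromise, makeInformer } from "@kubernetes/client-node";

function startInformerWithBackoff(kc: KubeConfig, path: string, listFn: ListPromise<KubernetesObject>): Informer<KubernetesObject> {
  const informer = makeInformer(kc, path, listFn);
  let delayMs = 1000;          // initial retry delay (arbitrary choice)
  const maxDelayMs = 30000;    // cap on the retry delay (arbitrary choice)

  informer.on('add', (obj: KubernetesObject) => {
    console.log('==> add ', obj.metadata?.name);
    delayMs = 1000;            // events are flowing again: reset the backoff
  });

  informer.on('error', (err: unknown) => {
    console.log('==> err', String(err));
    // Restart after the current delay, then double it for the next failure.
    setTimeout(() => informer.start(), delayMs);
    delayMs = Math.min(delayMs * 2, maxDelayMs);
  });

  informer.start();
  return informer;
}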

**Example Code**

See repository https://github.com/feloy/kubernetes-client-issue-1933

Environment (please complete the following information):

  • OS: Mac M3
  • Node.js v20.15.1
  • Cloud runtime: Kind version 0.22.0
@feloy
Contributor Author

feloy commented Oct 14, 2024

After some debugging, I can see that this error is returned by the API after reconnecting, which makes the watcher terminate and get restarted immediately:

{"type":"ERROR","object":{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"too old resource version: 28586 (28629)","reason":"Expired","code":410}}

@feloy
Contributor Author

feloy commented Oct 14, 2024

I propose a fix in #1934. Please tell me if it seems OK, and I'll add unit tests.

@brendandburns
Contributor

Closing this as completed via #1934.
