[JENKINS-75563] First draft on how to kill a windows container #1724

Draft: wants to merge 13 commits into master

@raul-arabaolaza raul-arabaolaza commented Jul 28, 2025

https://issues.jenkins.io/browse/JENKINS-75563

Trying to validate my approach on how we could kill a Windows container. The problem I have is that AFAIK there is no way to use CMD on Windows to get a result similar to

"kill \\`grep -l '" + COOKIE_VAR + "=" + cookie
+ "' /proc/*/environ | cut -d / -f 3 \\`")
which identifies the process to kill based on the existence of the JENKINS_SERVER_COOKIE env variable.

Instead what I am trying to do is find the process that was originated by the original command in

and kill it, assuming every command run inside a container has its own unique Launcher.

Note I have not tested anything yet as I want first to know if this approach makes sense at all or I should go in another direction.
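
For reference, a minimal Java sketch of what the Linux one-liner quoted above relies on: each `/proc/<pid>/environ` file holds NUL-separated `NAME=value` entries, and the process is selected when its `JENKINS_SERVER_COOKIE` entry equals the expected cookie. The class and method names (`EnvironBlock`, `parse`, `hasCookie`) are hypothetical and not part of the plugin; the sketch only parses bytes it is given and does not read `/proc` itself.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only.
final class EnvironBlock {

    /** Parses NUL-separated NAME=value entries (the /proc/<pid>/environ layout). */
    static Map<String, String> parse(byte[] raw) {
        Map<String, String> env = new LinkedHashMap<>();
        String text = new String(raw, StandardCharsets.UTF_8);
        for (String entry : text.split("\0")) {
            int eq = entry.indexOf('=');
            if (eq > 0) { // skip empty or malformed entries
                env.put(entry.substring(0, eq), entry.substring(eq + 1));
            }
        }
        return env;
    }

    /** True when the block carries exactly the expected cookie value. */
    static boolean hasCookie(byte[] raw, String cookie) {
        return cookie.equals(parse(raw).get("JENKINS_SERVER_COOKIE"));
    }
}
```

The `grep -l` in the original command is a substring search over the file, whereas this sketch compares the whole value; whether that difference matters in practice depends on how cookies are generated.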

(edited by jglick to use permalinks)

null,
"cmd",
"/Q",
"wmic process where \"CommandLine like '%" + launchCmd + "%'\" call terminate")
Author

@raul-arabaolaza raul-arabaolaza Jul 28, 2025

Not tested yet, provided by ChatGPT. But this should kill any process whose command line contains launchCmd.

Member

jglick commented Jul 28, 2025

So IIRC the difficulty is that we cannot use https://javadoc.jenkins.io/hudson/util/ProcessTree.html#killAll(java.util.Map) since the processes to be killed reside in a different container (and thus process namespace) than the agent.

null,
"cmd",
"/Q",
"wmic process where \"CommandLine like '%" + launchCmd + "%'\" call terminate")
Member

@jglick jglick Jul 28, 2025

wmic is not available by default in newer versions of Windows Server I am afraid.

(Anyway you would want by cookie env var, not just doing some fuzzy match on the command line.)

Member

jglick commented Jul 28, 2025

I have not tested anything yet

It is easy to test AFAIK: just run a cluster with a Windows node pool and try unignoring interruptedPodWindows.

#626 (comment) FTR

Member

jglick commented Jul 28, 2025

I think you just need to retest whether .StartInfo.Environment is accessible from Get-Process output in a current Windows version; it has been six years since I last tried it, and things may have changed. (Assume PowerShell is available; I do not think we need to deal with batch scripts.)

Also doLaunch probably does need to be replaced by launchWithCookie.

@raul-arabaolaza
Author

Oh, I was under the impression I was limited to CMD. I will test whether PowerShell has improved, thanks!

Member

jglick commented Jul 29, 2025

You would just have to check what is actually available in typical container environments.

@jglick jglick changed the title from "First draft on how to kill a windows container" to "[JENKINS-75563] First draft on how to kill a windows container" Jul 30, 2025

@raul-arabaolaza
Author

I may have some code to share by the end of my day today.

Author

raul-arabaolaza commented Aug 5, 2025

I have been able to craft (with a bit of AI help) a draft of a script that seems to work for typical usage. Testing is still very limited, but something that works sometimes is better than something that never works, until we can find something that always works.

Anyway, I have updated the code here in case anyone wants to take a look while I jump into serious testing using #1727.

For the moment tested only with a pipeline like this one which now ends successfully:

pipeline {
  agent {
    kubernetes {
      customWorkspace 'c:/s'
      yaml """
apiVersion: v1
kind: Pod
spec:
  tolerations:
    - key: "node.kubernetes.io/os"
      operator: "Equal"
      value: "windows"
      effect: "NoSchedule"
  containers:
  - name: jnlp
    image: "cloudbees/cloudbees-core-agent:2.492.1.3-windowsservercore-ltsc2019"
    imagePullPolicy: IfNotPresent
    volumeMounts:
    - mountPath: /s
      name: s-volume
    - mountPath: /s@tmp
      name: stmp-volume
    resources:
      requests:
        cpu: 1
        memory: 2Gi
  volumes:
  - emptyDir: {}
    name: s-volume
  - emptyDir: {}
    name: stmp-volume
  nodeSelector:
    dedicated: win
"""
    }
  }

  stages {
    stage('Timeout'){
        options {
        timeout(time: 10, unit: "SECONDS")
    }
        steps{
            container('jnlp') {
                script{
                    try {
                        bat 'ping 127.0.0.1 -n 3601 > test.txt'
                    } catch(Exception ex) {
                        //Catch block
                        println("Exception caught in debugging block")
                        println(ex);
                    } finally {
                        println("Handling final actions in debugging block")
                    }
                }
            }
        }
    }
    stage('Rename'){
        steps{
            container('jnlp') {
                script{
                    bat 'rename test.txt test2.txt'
                }
            }
        }
    } 
  }
}

try {
String remote = copyWindowsKillScript(workspace).getRemote();
exitCode = doLaunch( // Will fail if the script is not present, but it was also failing before in all cases
false,
Author

For development, may move to true or introduce a system property in the future
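
As an illustration of the two ideas in this snippet (running a copied script and making the boolean switch configurable), here is a minimal Java sketch. Everything in it is an assumption rather than the PR's code: the class name, the `-Cookie` script parameter, and the property name are hypothetical, and the arguments actually passed to `doLaunch` are not shown in the excerpt. The `powershell.exe` options used (`-NoProfile`, `-NonInteractive`, `-ExecutionPolicy`, `-File`) are standard ones; `-File` comes last because everything after the script path is passed to the script.

```java
import java.util.List;

// Hypothetical helper, for illustration only.
final class KillScriptCommand {

    // Hypothetical property name; the PR does not define one.
    static final String QUIET_PROPERTY = "example.windowsKill.quiet";

    /** Reads the switch from a system property; false (verbose) when unset. */
    static boolean quiet() {
        return Boolean.getBoolean(QUIET_PROPERTY);
    }

    /** Builds an argv that runs a PowerShell script non-interactively. */
    static List<String> build(String scriptPath, String cookie) {
        return List.of(
                "powershell.exe",
                "-NoProfile",
                "-NonInteractive",
                "-ExecutionPolicy", "Bypass",
                "-File", scriptPath,
                "-Cookie", cookie); // assumed script parameter name
    }
}
```

Passing the cookie as a separate argv element, rather than concatenating it into a command string, avoids quoting issues if the value ever contains spaces or quotes.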

Member

jglick commented Aug 5, 2025

a pipeline like this one which now ends successfully

I am not sure what that pipeline proves. The version I saw from a plugin user was doing a rename in the catch block, which is the part that fails since the ping was not terminated and thus held a file lock.

@raul-arabaolaza
Author

a pipeline like this one which now ends successfully

I am not sure what that pipeline proves. The version I saw from a plugin user was doing a rename in the catch block, which is the part that fails since the ping was not terminated and thus held a file lock.

Yeah, same with this one: the rename stage fails when test.txt is locked by the unkilled ping. Now that I have something basic working I will do more in-depth testing, including a simplified pipeline like the one you mention.

try {
$envBlock = [ProcessEnvironmentReader]::ReadEnvironmentBlock($id)
if ($envBlock.Contains("JENKINS_SERVER_COOKIE=$($cookie)")) {
Write-Host "Killing $($_.ProcessName) (ID: $($_.Id)) - JENKINS_SERVER_COOKIE matches"
Author

Should I remove this? It is not going to be visible to the end user but may help with local runs and debugging.
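
One detail of the `Contains` check in this snippet may be worth a look. A substring search also matches when the expected cookie is only a prefix of the real value, or when another variable name merely ends in `JENKINS_SERVER_COOKIE`. The Java sketch below contrasts the two behaviours. It assumes the environment block is a single string of NUL-separated `NAME=value` entries, which is not confirmed by the excerpt, and `CookieMatch` is a hypothetical name.

```java
// Hypothetical helper, for illustration only.
final class CookieMatch {

    /** Mirrors the draft's check: substring search over the whole block. */
    static boolean containsMatch(String envBlock, String cookie) {
        return envBlock.contains("JENKINS_SERVER_COOKIE=" + cookie);
    }

    /** Stricter variant: one entry must equal NAME=value exactly. */
    static boolean exactMatch(String envBlock, String cookie) {
        String wanted = "JENKINS_SERVER_COOKIE=" + cookie;
        for (String entry : envBlock.split("\0")) { // assumed NUL separator
            if (entry.equals(wanted)) {
                return true;
            }
        }
        return false;
    }
}
```

If cookies always have the same length, the prefix case is unlikely to occur, so the stricter check is a safeguard rather than a fix for a known failure.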

@@ -18,7 +18,11 @@ spec:
''') {
node(POD_LABEL) {
container('shell') {
powershell 'try {Write-Host starting to sleep; Start-Sleep 999999} finally {Write-Host shut down gracefully}'
Author

This would only work if a user sends a Ctrl-C to the process; Stop-Process is a hard kill that immediately terminates the process. I have not found a way to do a "soft kill" in Windows via PowerShell, and I honestly do not believe it is needed.

Member

@jglick jglick Aug 6, 2025

Well, it would certainly be better for the plugin to emulate the Ctrl-C-like behavior, providing functional parity with the Linux version. If this is simply impossible, then a hard kill is I guess better than the nothing we have now.

What happens if the pod is terminated? (gracefully, e.g. kubectl delete pod) Does that run any Powershell finally code? Tricky to test since you would have to look for some effect outside the pod itself.
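
For illustration, the "polite request first, hard kill as fallback" pattern discussed here can be sketched in plain Java with `java.lang.Process`. This is a sketch under assumptions, not the plugin's code: it only applies to a process launched by the same JVM, whereas the processes in this PR live in another container, and `GracefulKill` and `graceMillis` are hypothetical names. As far as I know, the JDK's `destroy()` is already a forcible termination on Windows (`Process.supportsNormalTermination()` reports which behaviour applies), so this alone would not provide Ctrl-C parity there.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, for illustration only.
final class GracefulKill {

    /**
     * Asks the process to stop, waits up to graceMillis, then force-kills it.
     * Returns true if the process exited within the grace period.
     */
    static boolean terminate(Process process, long graceMillis) {
        process.destroy(); // polite request where the platform supports one
        try {
            if (process.waitFor(graceMillis, TimeUnit.MILLISECONDS)) {
                return true; // exited on its own, cleanup code had a chance to run
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt, fall through
        }
        process.destroyForcibly(); // hard kill, no cleanup code runs
        return false;
    }
}
```

The return value lets a caller log whether cleanup code such as a `finally` block had a chance to run.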

Author

@raul-arabaolaza raul-arabaolaza Aug 7, 2025

Yeah, I thought of something similar yesterday while trying to sleep. I have found some examples of C# code that seem to emulate the Ctrl+C, which I want to try. I will also try to test the pod deletion.

Still not able to deal with confirmation dialogs for cmd, for example, but successfully killed a ping in my tests.