Skip to content

feat: add Delve remote debugging support#201

Open
creydr wants to merge 2 commits into
kubernetes-sigs:mainfrom
creydr:feat/delve-debug-support
Open

feat: add Delve remote debugging support#201
creydr wants to merge 2 commits into
kubernetes-sigs:mainfrom
creydr:feat/delve-debug-support

Conversation

@creydr
Copy link
Copy Markdown
Contributor

@creydr creydr commented May 27, 2026

Summary

  • Adds debug-builder and debug stages to the Dockerfile with Delve and debug-compiled binary
  • Adds config/manager-debug/ kustomize overlay with minimal patches (remove probes, disable leader election, override entrypoint, increase memory)
  • Adds docker-build-debug and deploy-debug Makefile targets
  • Documents the debugging workflow in CONTRIBUTING.md

Hint: This came out, while I was debugging through the operator. I thought this might be helpful for others too.

@netlify
Copy link
Copy Markdown

netlify Bot commented May 27, 2026

Deploy Preview for mcp-lifecycle-operator ready!

Name Link
🔨 Latest commit 14aba8e
🔍 Latest deploy log https://app.netlify.com/projects/mcp-lifecycle-operator/deploys/6a1eafe35b9205000895ec43
😎 Deploy Preview https://deploy-preview-201--mcp-lifecycle-operator.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: creydr
Once this PR has been reviewed and has the lgtm label, please assign jaideepr97 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested a review from jaideepr97 May 27, 2026 14:42
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 27, 2026
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

Hi @creydr. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Tip

We noticed you've done this a few times! Consider joining the org to skip this step and gain /lgtm and other bot rights. We recommend asking approvers on your previous PRs to sponsor you.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 27, 2026
@Cali0707
Copy link
Copy Markdown
Member

@creydr these changes look super useful for debugging in cluster, but I'm wondering if maybe it would be simpler to just debug the make run target (runs on your host, connecting to the cluster) rather than changing the in cluster deployment for debugging? That way the dockerfile default entrypoint also doesn't have delve in it

@creydr
Copy link
Copy Markdown
Contributor Author

creydr commented May 27, 2026

@creydr these changes look super useful for debugging in cluster, but I'm wondering if maybe it would be simpler to just debug the make run target (runs on your host, connecting to the cluster) rather than changing the in cluster deployment for debugging? That way the dockerfile default entrypoint also doesn't have delve in it

Ah. I didn't see the make run command. Sure this would be easier in case it behaves like in-cluster. Let me test it that way.

@creydr
Copy link
Copy Markdown
Contributor Author

creydr commented May 27, 2026

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 27, 2026
@creydr
Copy link
Copy Markdown
Contributor Author

creydr commented May 27, 2026

Seems we can also debug it locally (I didn't know that it will pick up the correct kubeconfig). And as we don't have any specific volumes or such, this should be enough and we can close this one here.
Thanks @Cali0707 for pointing me to it

@creydr creydr closed this May 27, 2026
@creydr
Copy link
Copy Markdown
Contributor Author

creydr commented May 28, 2026

@Cali0707 I run into some issues with the local debugging. When I create an MCP server (e.g. like in the example), I run into a reconcile loop, because it can't resolve locally the MCP url for the handshake:

{"level":"info","ts":"2026-05-28T12:13:48+02:00","msg":"MCP endpoint handshake failed","controller":"mcpserver","controllerGroup":"mcp.x-k8s.io","controllerKind":"MCPServer","MCPServer":{"name":"test-server","namespace":"default"},"namespace":"default","name":"test-server","reconcileID":"fd73f131-1e8e-46f6-ab9d-020e666b46a3","url":"http://test-server.default.svc.cluster.local:8080/mcp","error":"calling \"initialize\": sending \"initialize\": rejected by transport: Post \"http://test-server.default.svc.cluster.local:8080/mcp\": dial tcp: lookup test-server.default.svc.cluster.local: no such host"}

#208 should address the endless loop on reconciles, but I still have the issue with the handshake. Did you solve this locally (e.g. with something like telepresence)?

@Cali0707
Copy link
Copy Markdown
Member

@creydr I chatted with @matzew this morning and we both realized we hadn't used the local run target since the handshake merged. IMO let's re-open and clean up this PR, and go forwards with the approach you have here

@creydr
Copy link
Copy Markdown
Contributor Author

creydr commented May 29, 2026

@creydr I chatted with @matzew this morning and we both realized we hadn't used the local run target since the handshake merged. IMO let's re-open and clean up this PR, and go forwards with the approach you have here

sure 👍

@creydr creydr reopened this May 29, 2026
@creydr
Copy link
Copy Markdown
Contributor Author

creydr commented May 29, 2026

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 29, 2026
@creydr
Copy link
Copy Markdown
Contributor Author

creydr commented May 29, 2026

/cc @Cali0707 @matzew

@k8s-ci-robot k8s-ci-robot requested review from Cali0707 and matzew May 29, 2026 06:15
@creydr creydr force-pushed the feat/delve-debug-support branch from 5b2e03d to 71e56b2 Compare May 29, 2026 10:27
@matzew
Copy link
Copy Markdown
Member

matzew commented Jun 1, 2026

/ok-to-test
/retest

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 1, 2026
@matzew
Copy link
Copy Markdown
Member

matzew commented Jun 1, 2026

@creydr can u rebase?

Adds a debug build target and kustomize overlay to enable remote
debugging of the controller running inside a Kubernetes cluster
using Delve.
@creydr creydr force-pushed the feat/delve-debug-support branch from 71e56b2 to bbaff53 Compare June 1, 2026 14:50
@creydr
Copy link
Copy Markdown
Contributor Author

creydr commented Jun 1, 2026

@creydr can u rebase?

@matzew rebased

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an opt-in “debug build + debug deploy” workflow for running the controller under Delve inside a Kubernetes cluster, making it easier to remote-debug the operator without running it locally.

Changes:

  • Adds Dockerfile debug-builder / debug stages (Delve + debug-compiled manager) and names the distroless stage production.
  • Adds config/manager-debug kustomize overlay to deploy a debug-friendly controller manager configuration.
  • Adds docker-build-debug / deploy-debug Make targets and documents the workflow in CONTRIBUTING.md.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

File Description
Makefile Adds docker-build-debug and deploy-debug; adjusts docker-build to target the production stage.
Dockerfile Introduces debug build stages and a Delve-based debug runtime stage.
CONTRIBUTING.md Documents how to build/deploy and connect an IDE debugger via port-forward.
config/manager-debug/* Adds a kustomize overlay to disable probes, disable leader election, override entrypoint, and increase memory for debugging.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread Dockerfile
Comment on lines +50 to +52
# Debug image with Delve
FROM golang:1.26.3 AS debug
RUN go install github.com/go-delve/delve/cmd/dlv@latest
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I'll check on the buildx issue

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

Comment thread Dockerfile

# Debug image with Delve
FROM golang:1.26.3 AS debug
RUN go install github.com/go-delve/delve/cmd/dlv@latest
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this is only used for development and won't be pushed, we should be fine IMO

Comment on lines +9 to +11
containers:
- name: manager
command: null
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have some other patches in place too. Regarding potential missing patches: I was able to run the debug image with the current config

Move the production stage after the debug stage so that tools
defaulting to the last Dockerfile stage (e.g. docker-buildx) produce
the production image. Also add --target production to docker-buildx
for explicitness.
@creydr
Copy link
Copy Markdown
Contributor Author

creydr commented Jun 3, 2026

@aliok can you recheck?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants