Skip to content

[supervisor] improve docker activation #18143

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Jul 11, 2023
Merged

[supervisor] improve docker activation #18143

merged 9 commits into from
Jul 11, 2023

Conversation

akosyakov
Copy link
Member

@akosyakov akosyakov commented Jul 3, 2023

Description

This PR:

  • fix logging of docker-up to GCP that we can detect errors
  • logs docker-up attempts to a file
  • if docker-up or daemon is failing notifies a user about the issue and allow to inspect logs
  • fix memory leaks with net epoll library
  • stop activation if docker activation logging is bursting

Related Issue(s)

Fix #18091
Fix IDE-205

How to test

  • Open this PR on gitpod.io
  • Change dev/ide/example/workspace/.gitpod.yml to use default workspace image
  • Run ./validate.sh --gitpod-env DOCKERD_ARGS=1 from components/supervisor
  • When workspace is started
    • In the host workspace run sudo lsof > before.log
    • In the inner workspace run watch docker run hello-world to trigger a docker activation, it should not hang anymore, but say that failed to connect to docker daemon
    • In the host workspace you will see a lot of errors like cannot provide Docker activation socket but they should stop after time and then you will see a notification
    • In the host workspace run sudo lsof > after.log, check that the file does not become humongous (over 100mb in seconds) and there are not leaking file descriptors to docker socket and epoll from inner supervisor run compare to before.log
  • Now stop inner workspace and run ./validate.sh normally
  • In the inner worksapce run docker run hello-world. It should still work.

Documentation

Preview status

Gitpod was successfully deployed to your preview environment.

Build Options

Build
  • /werft with-werft
    Run the build with werft instead of GHA
  • leeway-no-cache
  • /werft no-test
    Run Leeway with --dont-test
Publish
  • /werft publish-to-npm
  • /werft publish-to-jb-marketplace
Installer
  • analytics=segment
  • with-dedicated-emulation
  • workspace-feature-flags
    Add desired feature flags to the end of the line above, space separated
Preview Environment
  • /werft with-local-preview
    If enabled this will build install/preview
  • /werft with-preview
  • /werft with-large-vm
  • /werft with-gce-vm
    If enabled this will create the environment on GCE infra
  • with-integration-tests=all
    Valid options are all, workspace, webapp, ide, jetbrains, vscode, ssh

/hold

@akosyakov akosyakov force-pushed the ak/docker_up_leak branch 3 times, most recently from de24fda to eb2c2b5 Compare July 4, 2023 09:28
@akosyakov akosyakov changed the title [docker-up] don't leak passed socket [supervisor] fix leaking docker sockets Jul 4, 2023
@akosyakov akosyakov force-pushed the ak/docker_up_leak branch from eb2c2b5 to 845fde0 Compare July 4, 2023 09:29
@roboquat roboquat added size/XS and removed size/M labels Jul 4, 2023
@akosyakov
Copy link
Member Author

akosyakov commented Jul 4, 2023

/gh run recreate-vm=true

Comment triggered a workflow run

Started workflow run: 5453418319

  • recreate_vm: true

@akosyakov akosyakov force-pushed the ak/docker_up_leak branch from 845fde0 to 4a14c52 Compare July 4, 2023 13:14
@akosyakov akosyakov marked this pull request as ready for review July 4, 2023 13:14
@akosyakov akosyakov requested a review from a team July 4, 2023 13:14
@akosyakov akosyakov requested a review from iQQBot July 4, 2023 13:14
@roboquat roboquat added size/M and removed size/S labels Jul 4, 2023
@akosyakov akosyakov marked this pull request as draft July 5, 2023 08:42
@akosyakov akosyakov changed the title [supervisor] fix leaking docker sockets [supervisor] improve docker activation Jul 5, 2023
@akosyakov akosyakov force-pushed the ak/docker_up_leak branch from ee9d47a to 25a173f Compare July 5, 2023 09:36
@roboquat roboquat added size/XL and removed size/M labels Jul 5, 2023
@akosyakov akosyakov force-pushed the ak/docker_up_leak branch from 25a173f to 29a628b Compare July 5, 2023 11:04
@akosyakov akosyakov marked this pull request as ready for review July 5, 2023 11:04
@akosyakov akosyakov force-pushed the ak/docker_up_leak branch from 29a628b to d095988 Compare July 5, 2023 14:39
@akosyakov akosyakov force-pushed the ak/docker_up_leak branch from d095988 to 6eb74b7 Compare July 5, 2023 15:01
}

var notificationSent bool
var notificationDisabled bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

auto disabled while in headless task and logFile == nil?

var lastExitErrorTime time.Time

for ctx.Err() == nil {
err := listenToDockerSocket(ctx, term, cfg, telemetry, logFile)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the retry logic should not be here, it because once we call listenToDockerSocket it will recreate the listener, and there is no new connection handle

retry logic should be only around activation.Listen

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we discussed in the. morning to recreate listener completely on any error to reject all stale incoming requests, if we move it back it will stale again

or I misunderstand something?

@akosyakov
Copy link
Member Author

akosyakov commented Jul 10, 2023

/gh run recreate-vm=true

Comment triggered a workflow run

Started workflow run: 5505519958

  • recreate_vm: true

@akosyakov
Copy link
Member Author

/unhold

@roboquat roboquat merged commit 0f9d319 into main Jul 11, 2023
@roboquat roboquat deleted the ak/docker_up_leak branch July 11, 2023 09:50
@roboquat roboquat added deployed: IDE IDE change is running in production deployed Change is completely running in production labels Jul 11, 2023
filiptronicek added a commit that referenced this pull request Jul 11, 2023
roboquat pushed a commit that referenced this pull request Jul 11, 2023
iQQBot added a commit that referenced this pull request Jul 14, 2023
roboquat pushed a commit that referenced this pull request Jul 17, 2023
* Revert "Revert "[supervisor] improve docker activation (#18143)" (#18236)"

This reverts commit 9b21205.

* only start docker activation if content is ready
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
deployed: IDE IDE change is running in production deployed Change is completely running in production size/XL team: IDE
Projects
None yet
Development

Successfully merging this pull request may close these issues.

gp validate: container workspace completed; containers of a workspace pod are not supposed to do that
3 participants