[slo] start prebuild error budget is burning #15721

Closed
kylos101 opened this issue Jan 12, 2023 · 7 comments
Labels
priority: high, team: workspace, type: bug

Comments

@kylos101
Contributor

Bug description

We have recently been burning through our start prebuild error budget too quickly.

[Two images; not preserved in this capture]

Steps to reproduce

n/a

Workspace affected

No response

Expected behavior

The error budget should not be going down, unless we expect it to.

Example repository

No response

Anything else?

Please reassign the issue to @Furisto (on-call now) during his normal working hours on Friday. While he sleeps, please research why it is burning, and reach out to the on-call team as needed.

I assume that either our metric is flawed in gen82, or a change (IDE or webapp) is causing an unusually high number of prebuilds to fail to start.

@kylos101 kylos101 added the type: bug, team: workspace, and priority: high labels Jan 12, 2023
@kylos101 kylos101 moved this to Scheduled in 🌌 Workspace Team Jan 12, 2023
@jenting
Contributor

jenting commented Jan 13, 2023

@jenting jenting self-assigned this Jan 13, 2023
@jenting jenting moved this from Scheduled to In Progress in 🌌 Workspace Team Jan 13, 2023
@jenting
Contributor

jenting commented Jan 13, 2023

The GCP logs for workspace failed and workspace pod never ready mostly show errors related to:

  • No Git binary. -> this one requires the user to update the base image to include a git binary. Related PR.
  • cannot initialize workspace: cannot find snapshot.

@jenting
Contributor

jenting commented Jan 13, 2023

cannot initialize workspace: cannot find snapshot.

The problem started on Jan 8, 6 PM.

I investigated using the Jaeger traces [1], [2], and [3], and I can't find the related snapshot in object storage.

@kylos101
Contributor Author

Start workspace success ratio is dropping as well.

Yes @jenting, however, the burn rate is so low that I think it is okay? At least given its current "burn".
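For readers unfamiliar with the term, here is a minimal sketch of how a burn rate is commonly computed, assuming the usual SRE definition: the observed error rate divided by the error rate the SLO allows. The function names, the 99% target, and the counts in the usage note are hypothetical and are not taken from the dashboards discussed in this issue.

```python
def burn_rate(failed: int, total: int, slo_target: float) -> float:
    """Return how fast the error budget is being spent.

    1.0 means the budget is spent exactly over the SLO window;
    values above 1.0 mean it runs out early.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    error_rate = failed / total
    allowed_error_rate = 1 - slo_target
    return error_rate / allowed_error_rate


def budget_remaining(failed: int, total: int, slo_target: float) -> float:
    """Return the fraction of the error budget left (negative if overspent)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    allowed_failures = (1 - slo_target) * total
    return 1 - failed / allowed_failures
```

With a hypothetical 99% target, 5 failed starts out of 1000 give a burn rate of about 0.5, so half of the budget is still left for that window. A burn rate that stays below 1 is what "okay" would mean in these terms.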

@kylos101
Contributor Author

@jenting given the traces that you shared (thank you), I suspect that a snapshot URL was used to start a new workspace, but the underlying snapshot is no longer valid (it was probably deleted).

Why do I suspect a snapshot whose source workspace was deleted? In all three cases the workspace ID (like fuchsia-fly-xkuqh3dh5fh) was random, rather than following the org-repo-random pattern.

When I check this trace, the initializer looks like this:

"initializer": {
    "snapshot": {
        "snapshot": "workspaces/<redacted>/snapshot-1670002948719429142.tar@gitpod-prod-user-<redacted>"
    }
},

I created this snapshot from an empty workspace; it can be used to start a new workspace. In my case, I just created harlequin-cat-uifimy1moab from it, which you can see a trace for here. The initializer is structured the same way as above:

"initializer": {
    "snapshot": {
        "snapshot": "workspaces/<redacted>/snapshot-1673630874408709453.tar@gitpod-prod-user-<redacted>"
    }
},
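The snapshot strings in the two initializers above can be taken apart with a small helper. This is only a sketch: the pattern is inferred from those two examples, and reading the digits as a Unix timestamp in nanoseconds is an assumption, not something confirmed in this thread. The function name `parse_snapshot_ref` is hypothetical.

```python
import re
from datetime import datetime, timezone

# Assumed shape, inferred only from the two examples in this issue:
#   workspaces/<workspace>/snapshot-<digits>.tar@<bucket>
SNAPSHOT_REF = re.compile(
    r"^(?P<object>workspaces/(?P<workspace>[^/]+)/snapshot-(?P<stamp>\d+)\.tar)"
    r"@(?P<bucket>.+)$"
)


def parse_snapshot_ref(ref: str) -> dict:
    """Split a snapshot reference into bucket, object name, and creation time."""
    match = SNAPSHOT_REF.match(ref)
    if match is None:
        raise ValueError(f"unrecognized snapshot reference: {ref!r}")
    stamp_ns = int(match.group("stamp"))
    # Assumption: the digits are Unix time in nanoseconds.
    created = datetime.fromtimestamp(stamp_ns // 1_000_000_000, tz=timezone.utc)
    return {
        "bucket": match.group("bucket"),
        "object": match.group("object"),
        "workspace": match.group("workspace"),
        "created": created,
    }
```

Under that assumption, the digits in the first reference decode to 2 December 2022 and those in the second to 13 January 2023, the day this comment was written. That fits the theory that the failing starts point at an older snapshot.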

I just deleted the source workspace, but I suspect server hasn't done the actual garbage collection yet, because I can still create new workspaces from the snapshot URL. So that GC will likely take two weeks from now.

May I ask you to inbox an issue to the webapp team? Ideally, server would check that a snapshot is valid before trying to start a workspace. It might delegate that decision to content-service, but it should check before asking ws-manager to start a workspace.
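The check proposed above could look roughly like the sketch below. Everything in it is hypothetical: `SnapshotStore`, `InMemoryStore`, `SnapshotNotFound`, and `start_from_snapshot` are illustrative names, not the real server, content-service, or ws-manager APIs. The point is only the ordering: verify that the snapshot object exists, and fail fast, before any start request is sent.

```python
from typing import Callable, Iterable, Protocol, Tuple


class SnapshotStore(Protocol):
    """Hypothetical stand-in for whatever component can see object storage."""

    def exists(self, bucket: str, object_name: str) -> bool: ...


class InMemoryStore:
    """Toy store for illustration; a real one would query object storage."""

    def __init__(self, objects: Iterable[Tuple[str, str]]) -> None:
        self._objects = set(objects)

    def exists(self, bucket: str, object_name: str) -> bool:
        return (bucket, object_name) in self._objects


class SnapshotNotFound(Exception):
    """Raised before any start request when the snapshot is gone."""


def start_from_snapshot(
    ref: str,
    store: SnapshotStore,
    start_workspace: Callable[[str], str],
) -> str:
    # Assumed reference shape: "<object name>@<bucket>".
    object_name, sep, bucket = ref.rpartition("@")
    if not sep or not object_name or not bucket:
        raise ValueError(f"unrecognized snapshot reference: {ref!r}")
    if not store.exists(bucket, object_name):
        # Fail fast with a clear error instead of a failed workspace start.
        raise SnapshotNotFound(ref)
    # Only now hand off to the (hypothetical) start call.
    return start_workspace(ref)
```

A missing snapshot would then surface to the user as an explicit "snapshot not found" error, and should no longer show up as a failed workspace start.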

cc: @geropl @svenefftinge

@kylos101
Contributor Author

@jenting please close this issue, after inboxing an issue to the webapp team? 🙏

@jenting
Contributor

jenting commented Jan 16, 2023

@jenting please close this issue, after inboxing an issue to the webapp team? 🙏

Created two issues as follow-ups:

Closing this issue.

@jenting jenting closed this as not planned Jan 16, 2023
@github-project-automation github-project-automation bot moved this from In Progress to Awaiting Deployment in 🌌 Workspace Team Jan 16, 2023
@jenting jenting removed their assignment Jan 16, 2023