Prototype using persistent volumes for storage of user data #8104
Comments
https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver

Pod creation speed: Ran another test. Set up the storage class to create the PVC immediately instead of waiting for the first consumer. With the objective of keeping workspace start-up time under 40 seconds, allocating 15-20 seconds for volume attachment is too much. There is an easy workaround for this, though.

Alternative solution: If we instead use ceph/rook or anything else, it will still incur the same volume attachment penalty.

All solutions that do not use local SSD will incur extra cost, though it might be mitigated. For example, right now each node allocates 2x375GB of local SSD, for 2x375x$0.08 = $60 per month. The PVC solution for a fully maxed-out node running 20 workspaces on pd-balanced disks: 20x30GBx$0.12 = $72 per month. But during low peak we would use fewer PVCs, unlike local disks, which burn cost at a constant rate no matter what.
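For reference, a minimal sketch of the kind of storage class used in that test, assuming the GCP PD CSI driver and pd-balanced disks; the class name is made up, and volumeBindingMode: Immediate is what provisions the volume as soon as the PVC is created rather than waiting for the first consumer:

# Hypothetical StorageClass: provisions a pd-balanced disk as soon as the PVC
# exists (volumeBindingMode: Immediate) instead of waiting for a consuming pod.
kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: workspace-pd-balanced   # made-up name
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-balanced
reclaimPolicy: Delete
volumeBindingMode: Immediate
EOF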
Personally I would vote for the option of using a PVC with a temp pod that absorbs the attachment time cost (see the sketch below). It adds a bit of complexity, since each ws-daemon would now need to keep track of these and create them ahead of time, but that doesn't seem like too much. And to me it looks like the pros outweigh the cons.
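A rough sketch of what that pre-attach trick could look like; all names are hypothetical, and it assumes a ReadWriteOnce PVC, which pods on the same node can share:

# Hypothetical pre-attach flow: a throwaway pause pod mounts the PVC so the CSI
# driver attaches the disk to a node ahead of time; the workspace pod later
# reuses the same claim on that node, skipping the 15-20s attach wait.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ws-abc123-pvc            # made-up name
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: workspace-pd-balanced
  resources:
    requests:
      storage: 30Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: ws-abc123-warmup         # made-up temp pod
spec:
  containers:
  - name: pause
    image: registry.k8s.io/pause:3.9
    volumeMounts:
    - name: workspace
      mountPath: /workspace
  volumes:
  - name: workspace
    persistentVolumeClaim:
      claimName: ws-abc123-pvc
EOF

# Once the workspace pod on the same node mounts ws-abc123-pvc, the temp pod is
# no longer needed and can be deleted.
kubectl delete pod ws-abc123-warmup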
Thanks for the in-depth analysis and testing @sagor999! I have a few questions:
The PVC approach indeed looks promising. Do I understand correctly that we'd need to delete the temp pod to free up the PV for the workspace? If so, that would incur the deletion time of that pod, which may include time for the CSI driver to detach the PV. Are there other means to pre-provision/pre-attach the PVs to nodes? I'm asking because we just got rid of "ghosts" which we'd essentially need to reintroduce otherwise.
@sagor999 Do you know what the impact on read/write performance would be?
One other disadvantage of the PVC-with-temp-pod approach: the temp pod would be eating up those costs.

@csweichel yes, we would need to delete the temp pod once the workspace has terminated already. As for other means: we can go the route of allocating a disk per node instead of local SSD, which actually might be similar in cost to what we have right now. We use two 375G local disks; instead we could use one SSD network-attached disk (since Google provides reliability for those already, afaik), which would be similar in cost and require no changes to our setup. And if the node dies, the disk is still there and can be re-attached to a different node (see the sketch below).

@Furisto no, I did not do that test just yet. But it will be slower than local SSD for sure, just not sure by how much.
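For illustration, a sketch of that per-node disk alternative in gcloud terms; the instance name, disk name, size, and zone are placeholders, not our actual node pool config:

# Hypothetical: give a workspace node one network-attached pd-ssd instead of two
# local SSDs. The disk outlives the VM and can be re-attached to a replacement node.
gcloud compute disks create ws-node-1-data \
    --zone=us-central1-a --type=pd-ssd --size=750GB

gcloud compute instances attach-disk ws-node-1 \
    --zone=us-central1-a --disk=ws-node-1-data --device-name=workspace-data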
https://github.com/gitpod-io/ops/blob/ff7e87b7a7ced2425cd9110160270369be003ee5/deploy/workspace/cluster-up.sh#L263

But that also means that pd-balanced disks seem to be sufficient for our workspace clusters. (I was under the impression that we had to use local-ssd for perf and IOPS.)
So, a tl;dr of sorts:

Potential solutions:

Cons:

B. Update current pd-balanced disks to use auto-delete=false (see the sketch below).

Cons:
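For reference, a sketch of what option B amounts to on the GCP side, assuming the disks in question are attached to the workspace nodes; the instance, disk, and zone names are placeholders:

# Hypothetical: flip auto-delete off for a disk currently attached to a node, so
# the disk (and the workspace data on it) survives when the VM is deleted.
gcloud compute instances set-disk-auto-delete ws-node-1 \
    --zone=us-central1-a --disk=ws-node-1-data --no-auto-delete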
Thank you for the tl;dr @sagor999 ! 🚀
Does Google have any way for us to get priority and reduce this time, like reserving a certain number of disks or paying for the storage ahead of time? Regarding the initial questions:
In hindsight, we know we can do this; I did this with Mo a while back on a separate incident.
It would be good to hear your thoughts on the design for how this would work with:

// backupWorkspace backs up a running workspace
rpc BackupWorkspace(BackupWorkspaceRequest) returns (BackupWorkspaceResponse) {}
If yes, how would the user get access to their data again?
In other words, would it still make sense to restore from object storage (which could be old), instead of restoring from the disk (the working copy, which is more recent)?
No. Allocating the disk takes about 2 seconds; it is the attachment of that disk to the node that takes a long time. One solution to this is what I proposed above: using a temp pod that has the PVC attached to it, and then the workspace uses the same PVC. But that means we are back in the business of scheduling pods onto nodes ourselves (instead of letting the scheduler do that), as sketched below.
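A hypothetical illustration of that scheduling constraint: with a ReadWriteOnce PVC already attached via the temp pod, the workspace pod has to land on that same node, for example by setting nodeName ourselves instead of leaving it to the scheduler (all names are made up):

# Hypothetical: find the node the warm-up pod (and therefore the disk) landed on,
# then pin the workspace pod to that node so it reuses the already-attached PVC.
NODE=$(kubectl get pod ws-abc123-warmup -o jsonpath='{.spec.nodeName}')

kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ws-abc123
spec:
  nodeName: ${NODE}              # bypasses the scheduler
  containers:
  - name: workspace
    image: gitpod/workspace-full # placeholder image
    volumeMounts:
    - name: workspace
      mountPath: /workspace
  volumes:
  - name: workspace
    persistentVolumeClaim:
      claimName: ws-abc123-pvc
EOF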
The same way as it is right now; nothing would change in any drastic way. We might add our own finalizer to the PVC to make sure we back it up before it disappears into the ether (see the sketch below).
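For illustration, such a finalizer could be added to the claim like this; the finalizer name is made up, and the controller that actually performs the backup and then removes the finalizer is the part we would have to build:

# Hypothetical finalizer: deletion of the PVC blocks until our (not yet existing)
# controller has backed up the volume and removed the finalizer again.
kubectl patch pvc ws-abc123-pvc --type=json \
    -p '[{"op":"add","path":"/metadata/finalizers/-","value":"gitpod.io/backup-before-delete"}]'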
Yes. Disks would still be in GCP. We would need to add some tooling for automatic recovery of them, by creating a PVC binding for them and running a backup (see the sketch below).
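A sketch of what that recovery binding could look like with the GCP PD CSI driver; the project, zone, disk, and object names are placeholders:

# Hypothetical recovery: statically bind an existing GCP disk (left over from a
# dead node) to a PV/PVC pair so a backup job can mount it.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: recovered-ws-abc123
spec:
  capacity:
    storage: 30Gi
  accessModes: ["ReadWriteOnce"]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: workspace-pd-balanced
  csi:
    driver: pd.csi.storage.gke.io
    volumeHandle: projects/my-project/zones/us-central1-a/disks/ws-abc123-disk
    fsType: ext4
  claimRef:
    namespace: default
    name: recovered-ws-abc123-pvc
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: recovered-ws-abc123-pvc
  namespace: default
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: workspace-pd-balanced
  volumeName: recovered-ws-abc123
  resources:
    requests:
      storage: 30Gi
EOF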
This depends on the timing. If we have some automated process, I would assume we would have to wait for it to complete before the user has access to their data again. I don't think we would be able to somehow give the user direct access to the disk (plus, I think it is not great UX for users to have to recover their files on their own), but we would definitely need some automatic process for this.
On GCP it seems like the limit is 128 disks per VM:
@sagor999 Option B (update current pd-balanced disks to use auto-delete=false) looks appealing to me. It requires very few changes to our current setup, and building the service that looks after those disks should not be too involved for the GCP case (see the sketch below). The self-hosted drawback is real, though. However, this may be a question of time: if we can move with this approach quickly, we have the issue solved in SaaS. It will take some time until SH installations reach that size, which gives us time to investigate building that "backup service" or other alternatives. Note: it would also mean that we really need to clean up our state management of ws-daemon :) Right now it's leaking workspace state on the disk.
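As a starting point for that service, something as simple as periodically listing disks that are no longer attached to any VM might already do; this sketch assumes the disks carry a label of our own (the label key is made up):

# Hypothetical sweep: find disks that are not attached to any instance (no `users`)
# and carry our workspace label, so a backup/cleanup job can pick them up.
gcloud compute disks list \
    --filter="-users:* AND labels.gitpod-workspace:*" \
    --format="table(name,zone,sizeGb,lastDetachTimestamp)"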
Closing this issue, as the PVC approach is out of scope for now. We will concentrate on the approach described in #8202.
Is your feature request related to a problem? Please describe

This is a test and learn, so we can explore using persistent volumes to store /workspace data. https://www.notion.so/gitpod/Ensure-durability-and-availability-of-user-workspace-files-9edff6bbf87248d5ac73a7d4548ee4b3

Describe the behaviour you'd like

Happy path: Store working copy files, /workspace, on a distinct persistent volume. When a workspace is stopped, its data must then be backed up and the persistent volume removed.

Questions: