
Improve devops #576

@joepio

Description

Current situation

  • The GitHub CI action is triggered manually.
  • The binary is built.
  • The binary is sent over SSH to a VPS on Vultr.
  • We use systemctl to stop atomic-server.
  • We create an export.
  • We use systemctl to start atomic-server.
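
The steps above could be sketched as a single deploy script. This is a rough sketch, not the actual CI action: the host, paths, and service name are assumptions, and it defaults to a dry run that only prints the commands.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the manual deploy flow described above.
# Defaults to a dry run (DRY_RUN=1) that echoes commands instead of running them.
set -euo pipefail
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

SERVER="deploy@my-vultr-vps"                  # assumed SSH target
BINARY="target/release/atomic-server"         # freshly built binary

run scp "$BINARY" "$SERVER:/usr/local/bin/atomic-server.new"
run ssh "$SERVER" systemctl stop atomic-server
run ssh "$SERVER" atomic-server export        # create an export before swapping
run ssh "$SERVER" mv /usr/local/bin/atomic-server.new /usr/local/bin/atomic-server
run ssh "$SERVER" systemctl start atomic-server
```

Note that the binary is swapped while the service is stopped, which is exactly the window where a bad binary leaves the server down.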

What I like about this approach

  • It's pretty simple to run: just two clicks from GitHub.
  • It gives me status updates and error notifications.
  • It's pretty standard, which means it looks like what many other devs might do. That means I catch problems that others may encounter, which is a good thing.
  • No vendor lock-in: I don't rely on any AWS / Azure / Google stuff.
  • Lots of control over hardware: I can move to a local machine if needed with few changes.

What went wrong

AtomicData.dev was just down for longer than I'd like to admit. Let's evaluate what went wrong, and how to tackle the problems.

  • I replaced the binary on my VPS in place, which made it harder to revert to the previous version. I've since fixed that by creating backups in CI.
  • A change upstream updated OpenSSL in Rust, but not on my VPS. I'm still not sure where this came from. Maybe I should pin versions for the GitHub Actions and Ubuntu images.
  • I don't have a staging machine / environment. I should have one. It should resemble production as much as possible (although it could be more resource constrained).
  • My built binary wasn't tested before it was deployed. I should have used a pre-tested Docker image designed to run on the same OS. Ideally, I'd also run at least some tests on staging.
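
The version-pinning idea could look something like this hypothetical GitHub Actions fragment. It is not the project's actual workflow; the runner version and steps are assumptions.

```yaml
# Hypothetical fragment: pin the runner instead of using ubuntu-latest,
# so the build environment's OpenSSL matches the server's OS.
jobs:
  build:
    runs-on: ubuntu-20.04     # pinned version, not ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: cargo build --release
```

With both the runner and the server pinned to the same Ubuntu release, a silent OpenSSL mismatch like the one above becomes much less likely.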

Things that can be improved

  • Use (tested) images instead of binaries, to prevent issues like this.
  • Use tools that improve observability, e.g. Grafana / Prometheus / Jaeger (Add metrics / Prometheus support #420). I'd like to run these on the same machine, to save costs.
  • Cattle vs. pets: in the future, I'd like to not be dependent on single machines. But for now, I'll focus on a cost-effective single-node setup. Also, the performance right now is pretty much amazing, so I don't think I need multi-node for perf scaling reasons anytime soon.
  • Performance regression tests.
  • Set up staging (Staging environment #588).
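
Combining the tested-images and observability points, a single-node setup could be sketched as a docker-compose fragment like the one below. The image names, tags, ports, and volume paths are assumptions for illustration, not a working config.

```yaml
# Hypothetical docker-compose fragment: atomic-server plus an observability
# stack on the same single node, to keep hosting costs down.
services:
  atomic-server:
    image: joepmeneer/atomic-server:latest   # assumed image name
    ports: ["80:80"]
    volumes: ["atomic-data:/atomic-storage"]
  prometheus:
    image: prom/prometheus:v2.40.0           # pinned version
    ports: ["9090:9090"]
  grafana:
    image: grafana/grafana:9.3.0             # pinned version
    ports: ["3000:3000"]
volumes:
  atomic-data:
```

Running everything through one compose file also makes the staging environment easy to reproduce: the same file, smaller machine.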

What tech to use for deployments

How do I approach these different goals? What tools could help me?

  • Docker. I'm pretty sure the answer will involve running images instead of running directly on Ubuntu.
  • Docker Compose. I'm familiar with this, and it seems like a decent pick for a single-node setup. But I suppose it doesn't really scale or offer lots of flexibility. Not sure how easy it is to deploy.
  • Kubernetes. Definitely powerful, but I'm not sure if I need it. As of now, everything is just one node.
  • Terraform / Pulumi. Allows for a lot of configuration! Can deploy to pretty much anything, but Pulumi will probably require Kubernetes.
  • Earthly is a build tool that uses Docker.
  • sup is for running a command on multiple machines.
  • monit is for monitoring a single Unix system, and mmonit for multiple.
  • SeaweedFS is for a multi-node filesystem.
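
If the Docker route is taken, the deploy script from the current setup could shrink to a pull-and-restart. This is a hedged sketch: the image name and tag are assumptions, and it defaults to a dry run that only prints the commands.

```shell
#!/usr/bin/env bash
# Hypothetical image-based deploy: pull a pre-tested tag and restart the service.
# Defaults to a dry run (DRY_RUN=1) that echoes commands instead of running them.
set -euo pipefail
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

TAG="${1:-v0.20.0}"   # assumed version tag, e.g. passed in from CI

run docker pull "joepmeneer/atomic-server:$TAG"   # assumed image name
run docker compose up -d --no-deps atomic-server
```

Because the image was already tested in CI (and ideally on staging), the window where the server runs an unverified binary disappears, and rolling back is just deploying the previous tag.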
