Conversation

@dakotabenjamin
Member

What type of PR is this? (check all applicable)

  • πŸ• Feature
  • πŸ› Bug Fix
  • πŸ“ Documentation
  • πŸ§‘β€πŸ’» Refactor
  • βœ… Test
  • πŸ€– Build or CI
  • ❓ Other (please specify)

Describe this PR

The raster renderer has been crashing with just a single user (me) making calls such as:

[https://api.imagery.hotosm.org/raster/searches/867568a0734dd1f665da93a5a5e2a5ec/tiles/WebMercatorQuad/{z}/{x}/{y}@1x.png?assets=visual](https://api.imagery.hotosm.org/raster/searches/867568a0734dd1f665da93a5a5e2a5ec/tiles/WebMercatorQuad/%7Bz%7D/%7Bx%7D/%7By%[email protected]?assets=visual)

Default cpu/mem config for the pods:

    Limits:
      cpu:     768m
      memory:  4Gi
    Requests:
      cpu:     256m
      memory:  3Gi

0.25 of a vCPU is probably way too small; the new config (below) should fix most of the problems.

      requests:
        cpu: "1024m"
        memory: "3Gi"
      limits:
        cpu: "2048m"
        memory: "4Gi"

Additionally, we enable autoscaling for the raster-eoapi pod. The current scaling target is 85% CPU usage; this needs discussion and potentially load testing.
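
For reference, the autoscaling and resource values might sit together in the chart values roughly like the sketch below; the exact key names, minReplicas, and maxReplicas are assumptions pieced together from the snippets quoted in the review comments, not copied from the diff:

    raster:
      autoscaling:
        enabled: true
        minReplicas: 1                        # assumption
        maxReplicas: 10                       # assumption, based on the "scale to 10" note in the review below
        targetCPUUtilizationPercentage: 85    # the 85% CPU target mentioned above
      resources:
        requests:
          cpu: "1024m"
          memory: "3Gi"
        limits:
          cpu: "2048m"
          memory: "4Gi"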

Screenshots

Grafana dashboard showing some of the metrics leading to these changes: [image]

@github-actions

github-actions bot commented Nov 5, 2025

tofu plan -chdir=terraform -var-file=vars/production.tfvars
No changes. Your infrastructure matches the configuration.
By @dakotabenjamin at 2025-11-05T17:23:47Z (view log).

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

    raster:
      autoscaling:
        enabled: false

This configures autoscaling for the 'raster' pod instances in eoAPI, but note I don't think we have any autoscaling for the underlying nodes yet.

So it will try to scale the pods, but will likely run out of resources on the single worker node we have attached.

We probably need Karpenter installed to autoscale nodes in AWS πŸ‘
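
If we do go the Karpenter route, the node-provisioning side would look roughly like the sketch below (a minimal NodePool shape for illustration only; the name, requirements, limits, and referenced EC2NodeClass are all placeholders, not a proposal for this cluster):

    apiVersion: karpenter.sh/v1
    kind: NodePool
    metadata:
      name: raster-workers            # placeholder name
    spec:
      template:
        spec:
          requirements:
            - key: kubernetes.io/arch
              operator: In
              values: ["amd64"]
          nodeClassRef:
            group: karpenter.k8s.aws
            kind: EC2NodeClass
            name: default             # placeholder; needs a matching EC2NodeClass
      limits:
        cpu: "16"                     # cap on total provisioned CPU, placeholder
        memory: 64Gi                  # placeholder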

    resources:
      requests:
        cpu: "1024m"
        memory: "3Gi"

Note that each pod will require 3GB available for scaling to work; the only way to scale to 10 replicas is to have 30GB of RAM available across the worker nodes.

It could be OK once we have a lot more nodes and resources, but I would keep the reservations (requests) low and the limits higher.
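
Something like the shape below is what I mean, with the request numbers purely illustrative rather than a concrete proposal:

    resources:
      requests:
        cpu: "512m"        # illustrative: a lower reservation so more replicas can be scheduled per node
        memory: "1Gi"      # illustrative
      limits:
        cpu: "2048m"       # keep the limits higher so individual pods can still burst
        memory: "4Gi"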
