
WIP: Add per-queue-type disk limits #14086


Draft: wants to merge 1 commit into main


the-mikedavis (Collaborator)

This is an extension of the idea behind the free disk space alarm. That option blocks publishers when the free disk space on the data dir's disk falls below some threshold. This feature blocks publishers of individual queue types when the disk space used by that queue type exceeds the configured threshold.
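A minimal sketch of the per-queue-type check, written in Python for illustration rather than taken from the PR's Erlang code. The function names are hypothetical, and so is the blocking rule used here (a publisher is blocked if any queue type it publishes to is over its limit). Usage and limits are in bytes.

```python
from typing import Dict, Set


def over_limit_types(usage: Dict[str, int], limits: Dict[str, int]) -> Set[str]:
    """Return the queue types whose on-disk usage (bytes) exceeds their limit.

    Queue types without a configured limit are never reported.
    """
    return {qtype for qtype, limit in limits.items() if usage.get(qtype, 0) > limit}


def publisher_blocked(target_types: Set[str], alarmed: Set[str]) -> bool:
    """Hypothetical rule: block a publisher if any queue type it targets is alarmed."""
    return not target_types.isdisjoint(alarmed)


GiB = 1024 ** 3
usage = {"quorum": 3 * GiB, "stream": 1 * GiB}
limits = {"quorum": 2 * GiB, "stream": 5 * GiB}
alarmed = over_limit_types(usage, limits)
print(sorted(alarmed))                         # ['quorum']
print(publisher_blocked({"quorum"}, alarmed))  # True
print(publisher_blocked({"stream"}, alarmed))  # False
```

Unlike the global free disk space alarm, a stream publisher keeps flowing here even though the quorum queue type is over its limit.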

This change is incomplete: so far it only covers quorum queues (QQs) and streams, and only the AMQP 0-9-1 protocol. I'm opening it as a draft early to hear whether anyone has thoughts on this as a feature.

How to demo the changes:
  1. `make run-broker RABBITMQ_CONFIG_FILE=example.conf`
  2. In that shell, start the monitor with `rabbit_queue_type_disk_monitor:start_link()` (this should eventually become part of a supervision tree).
  3. Start a QQ producer/consumer: `perf-test -qq -u qq -x 1 -y 1 --rate 5 --confirm 5`
  4. Start a stream producer/consumer: `perf-test -sq -u sq -x 1 -y 1 --rate 5 --confirm 5`
  5. Go to the QQ data dir: `cd /tmp/rabbitmq-test-instances/<node>/mnesia/<node>/quorum/<node>/`
  6. Create a file larger than the QQ disk limit: `dd if=/dev/zero of=ballast.bin bs=1G count=3`
  7. Wait a few seconds and notice that the QQ perf-test command has stopped, but the stream one has not.
  8. Run `rm ballast.bin`; the QQ perf-test command should resume.
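Steps 6 to 8 exercise the alarm being set and then cleared. A hypothetical Python sketch of that transition logic, assuming the monitor polls usage periodically and compares the result with the previous alarm set. The demo does not state a stream limit, so the sketch only configures a QQ limit of 2 GiB (the value that appears in the config key discussed in this thread).

```python
from typing import Dict, List, Set, Tuple


def alarm_transitions(
    previous: Set[str], usage: Dict[str, int], limits: Dict[str, int]
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (currently alarmed, newly raised, newly cleared) queue types."""
    current = {q for q, limit in limits.items() if usage.get(q, 0) > limit}
    return current, current - previous, previous - current


GiB = 1024 ** 3
MiB = 1024 ** 2
limits = {"quorum": 2 * GiB}  # no stream limit configured in this sketch
polls = [
    {"quorum": 100 * MiB, "stream": 100 * MiB},            # normal traffic
    {"quorum": 3 * GiB + 100 * MiB, "stream": 100 * MiB},  # after creating ballast.bin
    {"quorum": 100 * MiB, "stream": 100 * MiB},            # after rm ballast.bin
]

alarmed: Set[str] = set()
events: List[str] = []
for usage in polls:
    alarmed, raised, cleared = alarm_transitions(alarmed, usage, limits)
    events += [f"set {q}" for q in sorted(raised)]
    events += [f"clear {q}" for q in sorted(cleared)]
print(events)  # ['set quorum', 'clear quorum']
```

Emitting only transitions (rather than the full alarm set on every poll) means publishers are blocked and unblocked once per crossing of the limit.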

the-mikedavis self-assigned this on Jun 16, 2025
ikavgo (Contributor) commented on Jun 16, 2025:

I like this.
I would change the configuration from `quorum_queue_disk_limit.absolute = 2GiB` to `queue_disk_limit.quorum.absolute = 2GiB` and make it work together with the queue type registry.

I wonder whether it would also be possible to set such limits for individual queues via policies and configs.
If this alarm were reported via `status()`, we could also highlight blocked queues in the Management UI.
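One way to read the suggested `queue_disk_limit.<type>.absolute` naming: the queue type becomes a key segment that is validated against the registered queue types instead of being baked into the key name. A hypothetical Python sketch; the line format, the type names and the binary-unit subset are assumptions, and this is not RabbitMQ's actual configuration parser.

```python
import re
from typing import Dict, Iterable

# Hypothetical subset of binary units; RabbitMQ's real parser accepts more.
UNITS = {"": 1, "KiB": 1024, "MiB": 1024 ** 2, "GiB": 1024 ** 3, "TiB": 1024 ** 4}

LINE = re.compile(
    r"^queue_disk_limit\.(?P<qtype>[a-z_]+)\.absolute\s*=\s*(?P<num>\d+)(?P<unit>[A-Za-z]*)$"
)


def parse_limits(lines: Iterable[str], registered_types: Iterable[str]) -> Dict[str, int]:
    """Map each registered queue type to its absolute disk limit in bytes."""
    known = set(registered_types)
    limits: Dict[str, int] = {}
    for raw in lines:
        match = LINE.match(raw.strip())
        if match is None:
            continue  # not a queue disk limit setting; ignore it
        qtype, unit = match["qtype"], match["unit"]
        if qtype not in known:
            raise ValueError(f"unknown queue type: {qtype}")
        if unit not in UNITS:
            raise ValueError(f"unsupported unit: {unit}")
        limits[qtype] = int(match["num"]) * UNITS[unit]
    return limits


conf = """
queue_disk_limit.quorum.absolute = 2GiB
queue_disk_limit.stream.absolute = 10GiB
disk_free_limit.absolute = 2GB
""".splitlines()
limits = parse_limits(conf, ["classic", "quorum", "stream"])
print(limits)  # {'quorum': 2147483648, 'stream': 10737418240}
```

A new queue type would then only need to register itself to gain a limit setting, with no new top-level key.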

```erlang
%%
%% On especially large directories this can be an expensive operation since
%% each sub-directory is scanned recursively and each file's metadata must be
%% read.
```
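The recursive scan this code comment describes can be sketched in Python (illustrative only, not the PR's Erlang code). Every entry in the tree is visited and stat'ed, so the cost grows with the number of files. The sketch sums apparent file sizes (`st_size`), similar to `du --apparent-size`, and does not follow symlinks.

```python
import os
import tempfile


def dir_size(path: str) -> int:
    """Sum the sizes of all regular files under `path`, recursing into subdirectories."""
    total = 0
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                total += dir_size(entry.path)  # recurse into each sub-directory
            elif entry.is_file(follow_symlinks=False):
                total += entry.stat(follow_symlinks=False).st_size  # one metadata read per file
    return total


# Small self-contained demo: 100 bytes at the top level plus 50 bytes in a sub-directory.
with tempfile.TemporaryDirectory() as d:
    os.makedirs(os.path.join(d, "sub"))
    with open(os.path.join(d, "a.bin"), "wb") as f:
        f.write(b"x" * 100)
    with open(os.path.join(d, "sub", "b.bin"), "wb") as f:
        f.write(b"y" * 50)
    example_size = dir_size(d)
print(example_size)  # 150
```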
Contributor (review comment on the code above):

It might be better for queue types to increment/decrement an internal counter (which could be stored in `persistent_term`).
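A hypothetical Python stand-in for the counter-based approach. In Erlang/OTP this could be a `counters` array created with `counters:new/2` whose reference is stored via `persistent_term:put/2`; the lock-based class below only mirrors the idea, and the method names are illustrative.

```python
import threading
from typing import Dict


class QueueTypeDiskCounters:
    """Per-queue-type byte counters maintained by the queue implementations.

    A queue type would call `add` when it writes data to disk and `subtract`
    when it deletes or truncates files, so the monitor reads a number instead
    of walking the data directory.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._bytes: Dict[str, int] = {}

    def add(self, qtype: str, nbytes: int) -> None:
        with self._lock:
            self._bytes[qtype] = self._bytes.get(qtype, 0) + nbytes

    def subtract(self, qtype: str, nbytes: int) -> None:
        with self._lock:
            self._bytes[qtype] = self._bytes.get(qtype, 0) - nbytes

    def get(self, qtype: str) -> int:
        with self._lock:
            return self._bytes.get(qtype, 0)


counters = QueueTypeDiskCounters()
counters.add("quorum", 4096)
counters.add("quorum", 8192)
counters.subtract("quorum", 4096)  # e.g. a segment file was deleted
print(counters.get("quorum"), counters.get("stream"))  # 8192 0
```

The trade-off: reads become constant time, but the counters only see bytes the queue types report. A file written by something else, such as `ballast.bin` in the demo, would not raise the alarm.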

kjnilsson (Contributor) commented:


I am wondering whether it would be better (by some definition of better), for anyone who wants this type of feature, to mount the different data directories as separate disk volumes. That should make querying usage a lot simpler and faster.
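With each data directory on its own volume, usage becomes a single filesystem-level query instead of a tree walk. A Python sketch using `shutil.disk_usage`; the name-to-directory mapping is hypothetical, and the example points at the temp directory only so it runs anywhere.

```python
import shutil
import tempfile
from typing import Dict


def usage_by_volume(data_dirs: Dict[str, str]) -> Dict[str, int]:
    """Return bytes used on the filesystem holding each data directory.

    One filesystem query per directory, no recursion. The figure covers the
    whole filesystem, so it only reflects a single data directory when that
    directory is the sole occupant of its volume.
    """
    return {name: shutil.disk_usage(path).used for name, path in data_dirs.items()}


# Hypothetical mapping; a real deployment would use separate mount points.
example = usage_by_volume({"quorum": tempfile.gettempdir()})
print(example)
```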

Also, making this queue-type-specific isn't always going to work. For example, we have experimented with running multiple quorum queue Ra systems, which would use different data directories (ideally on different volumes). In that case it is much harder to work out which channels etc. relate to particular quorum queues in different systems. Not impossible, but not a straight mapping to a queue type.
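A hypothetical Python sketch of keying limits by data directory rather than by queue type. It assumes each queue can be mapped to the directory of the Ra system holding it, and that the set of queues a channel publishes to is already known; working out that second mapping is the hard part this comment points at. Queue names and paths are illustrative.

```python
from typing import Dict, Set


def blocked_queues(
    queue_dirs: Dict[str, str], dir_usage: Dict[str, int], dir_limits: Dict[str, int]
) -> Set[str]:
    """Queues whose data directory is over its limit."""
    over = {d for d, limit in dir_limits.items() if dir_usage.get(d, 0) > limit}
    return {q for q, d in queue_dirs.items() if d in over}


def channel_blocked(published_queues: Set[str], blocked: Set[str]) -> bool:
    """Block a channel if it publishes to any queue on an over-limit directory."""
    return not published_queues.isdisjoint(blocked)


GiB = 1024 ** 3
queue_dirs = {"orders": "/data/qq-system-a", "events": "/data/qq-system-b", "logs": "/data/streams"}
dir_usage = {"/data/qq-system-a": 3 * GiB, "/data/qq-system-b": 1 * GiB, "/data/streams": 1 * GiB}
dir_limits = {"/data/qq-system-a": 2 * GiB, "/data/qq-system-b": 2 * GiB, "/data/streams": 5 * GiB}

blocked = blocked_queues(queue_dirs, dir_usage, dir_limits)
print(sorted(blocked))                              # ['orders']
print(channel_blocked({"orders", "logs"}, blocked))  # True
print(channel_blocked({"events"}, blocked))          # False
```

Here "orders" and "events" are both quorum queues, yet only one is blocked, so the blocking decision cannot be made from the queue type alone.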

4 participants