Skip to content

Conversation

@cretz
Copy link
Member

@cretz cretz commented Apr 1, 2025

What was changed

Added Temporalio::Runtime::MetricBuffer users can create with a size and set on Runtime to buffer metrics. They can call retrieve_updates on the buffer to get buffered metrics. This gives more freedom for users to do what they want with the metrics.

Checklist

  1. Closes [Feature Request] Buffered metrics for user use #176

@cretz cretz requested a review from a team April 1, 2025 13:37
}
}

pub fn retrieve_buffered_metrics(&self, durations_as_seconds: bool) -> Result<RArray, Error> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not store whether durations are seconds on the buffer itself? Feels weird to pass it every time

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, no real reason, just seemed to be a conversion-time parameter as opposed to struct state and is how we did it in Python. It is class state from a user POV though.

Comment on lines 283 to 287
// We can't use Ruby Opaque because it doesn't protect the object from being
// GC'd, but we trust ourselves not to access this value outside of Ruby
// context (which has global GVL to ensure thread safety).
unsafe impl Send for BufferedMetricRef {}
unsafe impl Sync for BufferedMetricRef {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's clear to me this is only ever used from one thread at a time (Sync), but it's not clear to me that it will always be from the same thread (Send).

Since it's stored on the runtime, and then cloned in the retrieve_buffered_metrics call, if a user calls that from a different thread, the Rc could have counting errors. Ideally we somehow force it to always be the same thread, or failing that we need to warn the user that they can only ever call it from the same thread that the runtime was created on, I think.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So with GVL yes we can trust only one thread at a time. As for which thread, for our use case, any Magnus Ruby Value is accessible via any Ruby thread. There is a utility that does mark it as Send called Opaque (https://docs.rs/magnus/latest/magnus/value/struct.Opaque.html). But that has other downsides in that it doesn't prevent GC'ing and we can't combine them due to their constraints.

So we have to decide, do we want BoxValue which prevents GC but we have to mark Send/Sync ourselves, or do we want Opaque which has Send/Sync but does not prevent GC. Looking at the implementations of these and seeing that Opaque was just a transparent Send/Sync to prevent access without a Ruby object (so requires the GVL to be captured), I chose the BoxValue approach. But I could have used Opaque and manually did a rb_gc_register_address on new and rb_gc_unregister_address on drop, but this seems better.

Is there a Rust-side concern with Send I am missing if I trust how I use the value itself to be safe?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, so Opaque can be Send because it's only declared as such when wrapping a Ruby value - and it makes the statement that Ruby values are fine to move across threads - cool.

The problem here with our thing is almost unrelated to Ruby - it's about the Rc. The implementation of Rc's refcount relies on the fact that it never leaves the thread it was created on (IE: it uses something like thread local storage for refcounting). So, if our handler that is using it gets called from a thread that wasn't the one it was created on, since clone changes the refcount, that's UB

Copy link
Member Author

@cretz cretz Apr 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yes of course, I should not use Rc if I want to be Send safe, forgot I used it. This is my bad coding where I forced Rc to be send/sync when it's not. I should change my code to:

#[derive(Clone, Debug)]
pub struct BufferedMetricRef {
    value: Arc<SendSyncBoxValue<Value>>,
}

struct SendSyncBoxValue<T>(BoxValue<T>);

unsafe impl<T> Send for SendSyncBoxValue<T> {}
unsafe impl<T> Sync for SendSyncBoxValue<T> {}

Or something like that. That way I can inherit send/sync as a property of the Ruby child only. I will fix this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that would be fine

Copy link
Member Author

@cretz cretz Apr 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I added this utility in 7ab900b, but since I applied it to another part where we were doing the same thing, it kinda expanded the Rust code that was changed. Basically, like Opaque, we require a &Ruby reference to prove that they are on a Ruby thread before accessing the value.

Copy link
Member

@Sushisource Sushisource left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! The safe box value makes sense to me

@cretz cretz merged commit 89945c2 into temporalio:main Apr 9, 2025
7 checks passed
@cretz cretz deleted the metric-buffer branch April 9, 2025 20:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature Request] Buffered metrics for user use

2 participants