-
Notifications
You must be signed in to change notification settings - Fork 22
Buffered metric support #240
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,19 +1,22 @@ | ||
| use super::{error, id, ROOT_MOD}; | ||
| use crate::metric::{convert_metric_events, BufferedMetricRef}; | ||
| use crate::util::{without_gvl, Struct}; | ||
| use magnus::{ | ||
| class, function, method, prelude::*, DataTypeFunctions, Error, Ruby, TypedData, Value, | ||
| class, function, method, prelude::*, DataTypeFunctions, Error, RArray, Ruby, TypedData, Value, | ||
| }; | ||
| use std::collections::HashMap; | ||
| use std::net::SocketAddr; | ||
| use std::str::FromStr; | ||
| use std::sync::mpsc::{channel, Receiver, Sender}; | ||
| use std::time::Duration; | ||
| use std::{future::Future, sync::Arc}; | ||
| use temporal_sdk_core::telemetry::{build_otlp_metric_exporter, start_prometheus_metric_exporter}; | ||
| use temporal_sdk_core::telemetry::{ | ||
| build_otlp_metric_exporter, start_prometheus_metric_exporter, MetricsCallBuffer, | ||
| }; | ||
| use temporal_sdk_core::{CoreRuntime, TokioRuntimeBuilder}; | ||
| use temporal_sdk_core_api::telemetry::{ | ||
| Logger, MetricTemporality, OtelCollectorOptionsBuilder, OtlpProtocol, | ||
| PrometheusExporterOptionsBuilder, TelemetryOptionsBuilder, | ||
| metrics::MetricCallBufferer, Logger, MetricTemporality, OtelCollectorOptionsBuilder, | ||
| OtlpProtocol, PrometheusExporterOptionsBuilder, TelemetryOptionsBuilder, | ||
| }; | ||
| use tracing::error as log_error; | ||
| use url::Url; | ||
|
|
@@ -24,6 +27,10 @@ pub fn init(ruby: &Ruby) -> Result<(), Error> { | |
| .define_class("Runtime", class::object())?; | ||
| class.define_singleton_method("new", function!(Runtime::new, 1))?; | ||
| class.define_method("run_command_loop", method!(Runtime::run_command_loop, 0))?; | ||
| class.define_method( | ||
| "retrieve_buffered_metrics", | ||
| method!(Runtime::retrieve_buffered_metrics, 1), | ||
| )?; | ||
| Ok(()) | ||
| } | ||
|
|
||
|
|
@@ -33,6 +40,7 @@ pub struct Runtime { | |
| /// Separate cloneable handle that can be referenced in other Rust objects. | ||
| pub(crate) handle: RuntimeHandle, | ||
| async_command_rx: Receiver<AsyncCommand>, | ||
| metrics_call_buffer: Option<Arc<MetricsCallBuffer<BufferedMetricRef>>>, | ||
| } | ||
|
|
||
| #[derive(Clone)] | ||
|
|
@@ -94,9 +102,10 @@ impl Runtime { | |
| .map_err(|err| error!("Failed initializing telemetry: {}", err))?; | ||
|
|
||
| // Create metrics (created after Core runtime since it needs Tokio handle) | ||
| let mut metrics_call_buffer = None; | ||
| if let Some(metrics) = telemetry.child(id!("metrics"))? { | ||
| let _guard = core.tokio_handle().enter(); | ||
| match (metrics.child(id!("opentelemetry"))?, metrics.child(id!("prometheus"))?, metrics.child(id!("buffered_with_size"))?) { | ||
| match (metrics.child(id!("opentelemetry"))?, metrics.child(id!("prometheus"))?, metrics.member::<Option<usize>>(id!("buffered_with_size"))?) { | ||
| // Build OTel | ||
| (Some(opentelemetry), None, None) => { | ||
| let mut opts_build = OtelCollectorOptionsBuilder::default(); | ||
|
|
@@ -148,8 +157,11 @@ impl Runtime { | |
| |err| error!("Failed building starting Prometheus exporter: {}", err), | ||
| )?.meter); | ||
| }, | ||
| // TODO(cretz): Metric buffering | ||
| (None, None, Some(_buffer_size)) => return Err(error!("Metric buffering not yet supported")), | ||
| (None, None, Some(buffer_size)) => { | ||
| let buffer = Arc::new(MetricsCallBuffer::new(buffer_size)); | ||
| core.telemetry_mut().attach_late_init_metrics(buffer.clone()); | ||
| metrics_call_buffer = Some(buffer); | ||
| }, | ||
| _ => return Err(error!("One and only one of opentelemetry, prometheus, or buffered_with_size must be set")) | ||
| }; | ||
| } | ||
|
|
@@ -163,6 +175,7 @@ impl Runtime { | |
| async_command_tx, | ||
| }, | ||
| async_command_rx, | ||
| metrics_call_buffer, | ||
| }) | ||
| } | ||
|
|
||
|
|
@@ -193,6 +206,16 @@ impl Runtime { | |
| } | ||
| } | ||
| } | ||
|
|
||
| pub fn retrieve_buffered_metrics(&self, durations_as_seconds: bool) -> Result<RArray, Error> { | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why not store whether durations are seconds on the buffer itself? Feels weird to pass it every time
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, no real reason, just seemed to be a conversion-time parameter as opposed to struct state and is how we did it in Python. It is class state from a user POV though. |
||
| let ruby = Ruby::get().expect("Not in Ruby thread"); | ||
| let buff = self | ||
| .metrics_call_buffer | ||
| .clone() | ||
| .expect("Attempting to retrieve buffered metrics without buffer"); | ||
| let updates = convert_metric_events(&ruby, buff.retrieve(), durations_as_seconds)?; | ||
| Ok(ruby.ary_new_from_values(&updates)) | ||
| } | ||
| } | ||
|
|
||
| impl RuntimeHandle { | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's clear to me this is only ever used from one thread at a time (
Sync), but it's not clear to me that it will always be from the same thread (Send).Since it's stored on the runtime, and then cloned in the
retrieve_buffered_metricscall, if a user calls that from a different thread, theRccould have counting errors. Ideally we somehow force it to always be the same thread, or failing that we need to warn the user that they can only ever call it from the same thread that the runtime was created on, I think.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So with GVL yes we can trust only one thread at a time. As for which thread, for our use case, any Magnus Ruby
Valueis accessible via any Ruby thread. There is a utility that does mark it asSendcalledOpaque(https://docs.rs/magnus/latest/magnus/value/struct.Opaque.html). But that has other downsides in that it doesn't prevent GC'ing and we can't combine them due to their constraints.So we have to decide, do we want
BoxValuewhich prevents GC but we have to markSend/Syncourselves, or do we wantOpaquewhich hasSend/Syncbut does not prevent GC. Looking at the implementations of these and seeing thatOpaquewas just a transparentSend/Syncto prevent access without a Ruby object (so requires the GVL to be captured), I chose theBoxValueapproach. But I could have usedOpaqueand manually did arb_gc_register_addresson new andrb_gc_unregister_addresson drop, but this seems better.Is there a Rust-side concern with
SendI am missing if I trust how I use the value itself to be safe?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, so Opaque can be
Sendbecause it's only declared as such when wrapping a Ruby value - and it makes the statement that Ruby values are fine to move across threads - cool.The problem here with our thing is almost unrelated to Ruby - it's about the
Rc. The implementation ofRc's refcount relies on the fact that it never leaves the thread it was created on (IE: it uses something like thread local storage for refcounting). So, if our handler that is using it gets called from a thread that wasn't the one it was created on, since clone changes the refcount, that's UBUh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, yes of course, I should not use
Rcif I want to beSendsafe, forgot I used it. This is my bad coding where I forcedRcto be send/sync when it's not. I should change my code to:Or something like that. That way I can inherit send/sync as a property of the Ruby child only. I will fix this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that would be fine
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I added this utility in 7ab900b, but since I applied it to another part where we were doing the same thing, it kinda expanded the Rust code that was changed. Basically, like
Opaque, we require a&Rubyreference to prove that they are on a Ruby thread before accessing the value.