OpenAI component doesn't set up telemetry correctly #5451
@lmolkova @annelo-msft - are OpenAI and OpenTelemetry piped all the way through yet?
@eerhardt OpenAI is instrumented with OTel, but only partially (not all APIs). I was thinking about changing the approach slightly for how we let users opt into the experimental semconv, and wanted to get your and @samsp-msft's opinions. Today OpenAI does what the Azure SDKs do: an app context switch. What if we did

Pros:

Cons:

wdyt?
It would definitely make it easier for us, and it still keeps the signal to users that this is experimental. So I'd be supportive of the change.
I think it's one less thing that the user needs to configure. TBH, I think we may be being overly cautious - there are so many moving pieces required to enable telemetry that I wonder whether we are just making it too hard to get working. Getting some telemetry that may change over time is probably better than getting none because you didn't find the docs and therefore missed some semi-hidden configuration parameter. For the above scenario, I only got it working because I knew what @lmolkova had checked in and followed the dependency graph to see what was actually being used. Most customers won't do that; they'll just assume that telemetry isn't enabled. I almost wonder if the component should just emit a log message once per process about telemetry being in preview, and not have any flags at all.
As someone who gets "why did this attribute on a span get changed 2 years ago" support tickets every once in a while, I want to have an explicit opt-in for experimental stuff. Also, OTel has some opinions on what telemetry stability means (i.e. if I had an alert on a stable thing, it should keep working; if it breaks and I lose $10B because of it, that's a terrible issue). It sounds like you both support my proposal - I'll send the PR to OpenAI to change it and remove the app context switch. We can totally keep a bigger discussion open on what the stability guarantees on telemetry are.
There is still work required in the Aspire component to pick up the new version and push the right strings for metrics and tracing.
Since this issue was created, we now have telemetry supported by default in both the Azure AI OpenAI and OpenAI client integrations. This still requires the app switch to be set (or the env var), but as soon as that's done the telemetry flows without any other intervention. It can then be disabled using settings like any other OTel integration in Aspire. https://github.com/dotnet/aspire/blob/main/src/Components/Aspire.Azure.AI.OpenAI/README.md#experimental-telemetry
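As a concrete sketch of the environment-variable path mentioned above (the variable name here is assumed from the OpenAI .NET library's experimental-telemetry opt-in; verify it against the linked README before relying on it):

```shell
# Opt into experimental OpenAI telemetry before the app starts.
# Variable name assumed; confirm in the README linked above.
export OPENAI_EXPERIMENTAL_ENABLE_OPEN_TELEMETRY=true
echo "$OPENAI_EXPERIMENTAL_ENABLE_OPEN_TELEMETRY"   # prints: true
```

Once set, per the comment above, the metrics and traces flow without any other intervention.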
I have been fiddling with the playground sample for the OpenAI component, trying to get it to emit telemetry. I think the following things are missing:
I also had to add a NuGet source, as that version pulls in a more recent OpenAI SDK version than is available internally at the moment.
When these are all in place, we get the metrics and traces that @lmolkova added to OpenAI recently.
I believe we should include the app context switch - it's there because the semantic conventions are not stable, but IMHO for Aspire we should be showing what's available rather than hiding it.
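The app context switch discussed above can also be set from the project file via an MSBuild runtime host configuration option, which flows through runtimeconfig.json into AppContext at startup. The switch name below is assumed from the linked README, so treat this as a sketch rather than the definitive configuration:

```xml
<!-- In the app's .csproj; switch name assumed, confirm against the integration README. -->
<ItemGroup>
  <RuntimeHostConfigurationOption Include="OpenAI.Experimental.EnableOpenTelemetry" Value="true" />
</ItemGroup>
```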