OpenAI component doesn't setup telemetry correctly #5451

Open
samsp-msft opened this issue Aug 26, 2024 · 7 comments
Labels: ai, area-integrations (Issues pertaining to Aspire Integrations packages)

@samsp-msft
Contributor

I have been fiddling with the playground sample for the OpenAI component, trying to get it to emit telemetry. I think the following things are missing:

  • Updating Directory.Packages.props to use beta3 for the Azure.AI.OpenAI package
<PackageVersion Include="Azure.AI.OpenAI" Version="2.0.0-beta.3" />

I also had to add a NuGet source, as that version pulls in a more recent OpenAI SDK version than is currently available internally.

  • In the Aspire.Azure.AI.OpenAI component:
    • Update the activity source names to include OpenAI.*
protected virtual string[] ActivitySourceNames => new[] { $"{typeof(TClient).Namespace}.*", "OpenAI.*" };
    • Set the app context switch for the OpenAI instrumentation
if (GetTracingEnabled(settings))
{
    AppContext.SetSwitch("OpenAI.Experimental.EnableOpenTelemetry", true);
    builder.Services.AddOpenTelemetry()
        .WithTracing(traceBuilder => traceBuilder.AddSource(ActivitySourceNames));
}

When these are all in place, we get the metrics and telemetry that @lmolkova added to OpenAI recently.

I believe we should include the app context switch. It's there because the semantic conventions are not stable, but IMHO for Aspire we should be showing what's available rather than hiding it.

@ghost added the area-integrations (Issues pertaining to Aspire Integrations packages) label on Aug 26, 2024
@eerhardt
Member

@lmolkova @annelo-msft - is OpenAI and OpenTelemetry piped all the way through yet?

@lmolkova
Contributor

lmolkova commented Aug 26, 2024

@eerhardt OpenAI is instrumented with OTel, but partially (not all APIs).

I was thinking about changing the approach slightly on how we let users opt into experimental semconv and wanted to get your and @samsp-msft opinions.

So today OpenAI does what Azure SDKs do: app context switch + AddSource(name).

What if we did AddSource("Experimental.OpenAI") instead?

Pros:

  • it's one-step enablement instead of two
  • it's obvious and explicit that things are experimental

Cons:

  • a code change is necessary when telemetry goes stable, but we can enable both sources right away: AddSource("Experimental.OpenAI*").AddSource("OpenAI*")

wdyt?
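
To make the proposal concrete, here is a minimal sketch of what one-step opt-in could look like, using only System.Diagnostics. The source name Experimental.OpenAI comes from the proposal; the "chat" activity name and the listener setup are illustrative assumptions, not the OpenAI library's actual code. With the OpenTelemetry SDK, the listener would be replaced by a single AddSource("Experimental.OpenAI") call.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Library side (hypothetical stand-in): the source name itself says "experimental".
using var experimentalSource = new ActivitySource("Experimental.OpenAI");

var captured = new List<string>();

// Consumer side: subscribing to the source by name is the only opt-in step.
using var listener = new ActivityListener
{
    ShouldListenTo = source => source.Name == "Experimental.OpenAI",
    Sample = (ref ActivityCreationOptions<ActivityContext> options) => ActivitySamplingResult.AllData,
    ActivityStopped = activity => captured.Add(activity.OperationName),
};
ActivitySource.AddActivityListener(listener);

// StartActivity returns null when nobody listens, so without the opt-in nothing is emitted.
using (var activity = experimentalSource.StartActivity("chat"))
{
    // Illustrative span; a real instrumentation would set attributes here.
}

Console.WriteLine(string.Join(",", captured));
```

Because no app context switch is involved, a consumer who never subscribes to the Experimental.OpenAI name gets no spans at all, which keeps the opt-in explicit.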

@eerhardt
Member

> What if we did AddSource("Experimental.OpenAI") instead?

It would definitely make it easier for us, and it still keeps the signal to users that this is experimental. So I'd be supportive of the change.

@samsp-msft
Contributor Author

I think it's one less thing that the user needs to configure. TBH, I think we may be being overly cautious. There are so many moving pieces involved in enabling telemetry that I wonder if we are making it too hard to get working. Getting some telemetry that may change over time is probably better than getting none because you didn't find the docs and therefore missed some semi-hidden configuration parameter.

For the above scenario, I only got it working because I know what @lmolkova had checked in, and followed the dependency graph to see what was actually being used. Most customers won't do that, and they'll just assume that telemetry isn't enabled.

I almost wonder if the component should just emit a log message once per process saying that the telemetry is in preview, and not have any flags at all.

@lmolkova
Contributor

lmolkova commented Aug 27, 2024

As someone who gets "why this attribute on a span got changed 2 years ago" support tickets every once in a while, I want to have an explicit opt-in for experimental stuff. Also, OTel has some opinions on what telemetry stability is (i.e. if I had an alert on a stable thing, it should keep working; if it broke and I lost $10B because of it, that's a terrible issue).

It sounds like you both support my proposal - I'll send the PR to OpenAI to change it and remove app-context-switch. We can totally keep a bigger discussion open on what are the stability guarantees on telemetry.

@samsp-msft
Contributor Author

There is still work required in the Aspire component to pick up the new version and push the right strings for metrics and tracing.

@sebastienros
Member

Since this issue was created, we now have telemetry supported by default in both the Azure AI OpenAI and OpenAI client integrations. This still requires the app switch (or environment variable) to be set, but as soon as that's done the telemetry flows without any other intervention. It can then be disabled using settings, like any other OTel integration in Aspire. https://github.com/dotnet/aspire/blob/main/src/Components/Aspire.Azure.AI.OpenAI/README.md#experimental-telemetry
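
For reference, a minimal sketch of the opt-in step mentioned in this comment. The switch name is the one quoted earlier in this thread; the environment variable name in the comment is an assumption based on the OpenAI .NET library's documentation and should be checked against the linked README.

```csharp
using System;

// Option 1: set the switch in code, before any OpenAI client is created.
AppContext.SetSwitch("OpenAI.Experimental.EnableOpenTelemetry", true);

// Option 2 (assumption, verify against the README): set the environment variable
// OPENAI_EXPERIMENTAL_ENABLE_OPEN_TELEMETRY=true before the process starts.

// Sanity check that the switch is visible to code running in this process.
bool found = AppContext.TryGetSwitch("OpenAI.Experimental.EnableOpenTelemetry", out bool enabled);
Console.WriteLine($"switch set: {found}, enabled: {enabled}");
```

Setting the switch at the very start of Program.cs avoids ordering problems if the instrumentation reads the value only once.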

@davidfowl removed the bug label on Oct 16, 2024
@eerhardt added this to the Backlog milestone on Jan 14, 2025
@davidfowl added the ai label on May 28, 2025
7 participants
@davidfowl @sebastienros @lmolkova @eerhardt @joperezr @samsp-msft and others