JetBrains dotMemory and dotTrace fail on 3.0 preview #11672

Closed
Wraith2 opened this issue Dec 16, 2018 · 26 comments

Comments

@Wraith2
Contributor

Wraith2 commented Dec 16, 2018

I've reported this as an issue through JetBrains support because it looks like an internal assumption in their software which is no longer true in 3.0. I realised it might be a good idea to post here in case it isn't.

When attempting a CPU or memory profile against CoreRun with either of the tools in the title, the run fails while using the profiling API, with the error shown in the screenshot below:
[screenshot: capture]

I interpret this as a request for a specific type in the runtime, where the failure is unexpected and unhandled. My assumption is that the type is no longer named as expected, or is no longer present, because of changes to internal structures (probably to do with unloadable assemblies).

@AaronRobinsonMSFT
Member

This looks like it may be the result of some interop call. Assigning the interop label until something proves otherwise.

@jkotas
Member

jkotas commented Dec 17, 2018

If possible, could you please add a link to the JetBrains support issue or tag the JetBrains engineers looking into this?

My guess is that this is likely a side effect of one of the AppDomain-related cleanups (cc @janvorli).

@janvorli
Member

The error says "the shared domain have to be detected there". The shared domain was removed from the runtime in dotnet/coreclr#21031. If JetBrains calls ClrDataAccess::GetAppDomainStoreData from the DAC and expects DacpAppDomainStoreData.sharedDomain to be non-NULL, that could be causing the issue.

@jkotas
Member

jkotas commented Dec 17, 2018

ClrDataAccess::GetAppDomainStoreData is an internal SOS/debugger interface. I doubt that the profiler calls this API.

My guess is that the profiler calls ProfToEEInterfaceImpl::GetAppDomainInfo and gets confused because it does not see the shared domain anymore. This old comment touches on this: https://github.com/dotnet/coreclr/blob/49ca3db92a48da71d25c607af9716a30bafb3ff8/src/vm/proftoeeinterfaceimpl.cpp#L5672

@Wraith2
Contributor Author

Wraith2 commented Dec 17, 2018

Thanks for the info. I've updated the ticket with the info and link to this thread.

@ww898
Contributor

ww898 commented Dec 17, 2018

Hi there, I took a look at the JB memory profiler sources. It is too complicated to debug the issue without a repro case (our test team can't reproduce it right now). However, I gathered the following information. The exception appears during the GarbageCollectionFinished() callback, when the profiler is fully active. At that point I expect the shared application domain to have been loaded and initialized. However, the shared app domain detector in the JB profiler failed, so we got the exception. That is possible in the following cases:

  • A GC occurs before the first ModuleLoadFinished call (unlikely in my opinion)
  • GetModuleInfo returns CORPROF_E_DATAINCOMPLETE in the ModuleLoadFinished callback
  • The name of the shared app domain changed; I expect EE Shared Assembly Repository
  • GetAssemblyInfo in the ModuleLoadFinished callback returns a zero AppDomainID
    ...
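
The cases listed above can be sketched as a small classifier. This is only an illustration under stated assumptions, not the JetBrains implementation: ModuleLoadRecord, SharedDomainCheck, checkSharedDomain and sharedDomainSeen are hypothetical names, and the record is assumed to hold what a profiler might capture in ModuleLoadFinished. Only the domain name EE Shared Assembly Repository is taken from the thread.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical snapshot of what a profiler might record per ModuleLoadFinished.
struct ModuleLoadRecord {
    bool moduleInfoIncomplete;   // GetModuleInfo reported CORPROF_E_DATAINCOMPLETE
    std::uintptr_t appDomainId;  // AppDomainID from GetAssemblyInfo, 0 if none
    std::wstring appDomainName;  // domain name as reported for that AppDomainID
};

// Outcome of the name-based check, one value per failure case in the list.
enum class SharedDomainCheck {
    Found,
    ModuleInfoIncomplete,
    ZeroAppDomainId,
    NameMismatch,
};

// Name the profiler expects for the shared domain (from the thread).
const wchar_t kSharedDomainName[] = L"EE Shared Assembly Repository";

// Classifies a single module load record.
SharedDomainCheck checkSharedDomain(const ModuleLoadRecord& record) {
    if (record.moduleInfoIncomplete) return SharedDomainCheck::ModuleInfoIncomplete;
    if (record.appDomainId == 0) return SharedDomainCheck::ZeroAppDomainId;
    if (record.appDomainName != kSharedDomainName) return SharedDomainCheck::NameMismatch;
    return SharedDomainCheck::Found;
}

// True if any recorded load identified the shared domain. An empty list
// (a GC before the first ModuleLoadFinished) yields false.
bool sharedDomainSeen(const std::vector<ModuleLoadRecord>& records) {
    for (const ModuleLoadRecord& record : records) {
        if (checkSharedDomain(record) == SharedDomainCheck::Found) return true;
    }
    return false;
}

// Usage example with hand-written records.
void demoSharedDomainCheck() {
    assert(checkSharedDomain({false, 1, L"EE Shared Assembly Repository"}) == SharedDomainCheck::Found);
    assert(checkSharedDomain({false, 1, L"SomeOtherDomain"}) == SharedDomainCheck::NameMismatch);
    assert(!sharedDomainSeen({}));
}
```

On a runtime without a shared domain, every record would fall into NameMismatch, so a detector that treats "never found" as a fatal error would fail at the first GC, which is consistent with the exception described here.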

@Wraith2 Could you please send the repro case to [email protected]? It's very important for fixing this.

@Wraith2
Contributor Author

Wraith2 commented Dec 17, 2018

Just working on putting the replication together now, will upload to the ticket as soon as I can.

It's CoreRun specific at the moment because that's the only thing pulling in the latest CLR bits; the publicly available preview runtime doesn't have the changes that cause the issue. It will affect very few people unless they're on nightly builds, and those people know what they're getting into.

File uploaded with a readme; it's just called replication.zip (I expected a random filename but it stayed the same).

@benaadams
Member

It's CoreRun specific at the moment because

Can you reference the dotnet myget feed in NuGet.config (https://dotnet.myget.org/F/dotnet-core/api/v3/index.json) and add the latest runtime to the .csproj via RuntimeFrameworkVersion?

e.g.

<Project Sdk="Microsoft.NET.Sdk.Web">
  <PropertyGroup>
    <TargetFramework>netcoreapp3.0</TargetFramework>
    <OutputType>Exe</OutputType>
    <LangVersion>latest</LangVersion>
    <TieredCompilation>false</TieredCompilation>
    <RuntimeFrameworkVersion>3.0.0-preview-27217-02</RuntimeFrameworkVersion>
  </PropertyGroup>
</Project>

It goes funny when capturing a snapshot in dotMemory.

@benaadams
Member

Install the latest 3.0.x SDK (https://github.com/dotnet/core-sdk) and/or the latest runtime (https://github.com/dotnet/core-setup#daily-builds).

You'll want to reference the version number of the latest runtime (currently 3.0.0-preview-27217-02).

Then this project will repro it: https://aoa.blob.core.windows.net/aspnet/dotMemory.zip

@Wraith2
Contributor Author

Wraith2 commented Dec 18, 2018

Public issue is here: https://youtrack.jetbrains.com/issue/PROF-790

@Anna-Guseva

Hi, I'm a member of the JetBrains memory profiler team. We've reproduced this issue in our environment. Profiling works properly if the application is run via dotnet.exe 3.0.27122.1, but it fails if it is run via dotnet.exe 3.0.27214.1. We are debugging it now, and you can follow the ticket in our bug tracker.

@ww898
Contributor

ww898 commented Dec 18, 2018

The issue appears after commit b6d47b3a1b5b05c25968701615707e381f35a7ce "Delete code related to LoaderOptimization and SharedDomain (#21031)".

It surprised me a lot. Could you please tell me how I can collect static app domain variables in this case?

@jkotas
Member

jkotas commented Dec 18, 2018

@ww898 Thanks for looking into this. Could you please explain what you mean by "static app domain variables"? Which profiler API are you calling that does not return the expected result anymore?

@ww898
Contributor

ww898 commented Dec 18, 2018

@jkotas Our profiler collects the values of static variables per application domain with GetAppDomainStaticAddress(), where one of the arguments is an AppDomainID. We collect the list of AppDomainIDs with ICorProfilerInfo3::GetAppDomainsContainingModule(), which runs on a separate thread because this method can't be called during a GC. A module in the shared domain can't be unloaded, and each of its static variables has a separate value in a subset of the normal domains. That is the theory.

Our implementation needs to detect that a module is in the shared domain because we want to support static variable collection (normal domains only) on CLR v2, where GetAppDomainsContainingModule() isn't yet implemented. We compare the domain name to EE Shared Assembly Repository to detect the shared domain, and that is where it now fails.

As I understand it, there is no shared domain any more, is that right? If so, I need to redesign the domain subsystem.
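
The domain selection described in this comment could be made tolerant of a missing shared domain along the following lines. This is a rough sketch under stated assumptions and not the actual JetBrains fix: DomainId, kNoDomain, ModuleDomains and domainsForStatics are hypothetical names, and the list of containing domains is assumed to have been gathered earlier (for example from GetAppDomainsContainingModule() where that call is available).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in for the profiling API's AppDomainID. In this sketch 0 means
// "no such domain" (an assumption of the example, not an API guarantee).
using DomainId = std::uintptr_t;
const DomainId kNoDomain = 0;

// Hypothetical per-module bookkeeping kept by the profiler.
struct ModuleDomains {
    DomainId owningDomain;                    // domain the module was loaded into
    std::vector<DomainId> containingDomains;  // domains using the module; empty where unavailable
};

// Returns the domains whose copies of the module's statics should be read.
// Pass kNoDomain as sharedDomain on runtimes that have no shared domain.
std::vector<DomainId> domainsForStatics(const ModuleDomains& module, DomainId sharedDomain) {
    const bool inSharedDomain =
        sharedDomain != kNoDomain && module.owningDomain == sharedDomain;
    if (!inSharedDomain) {
        // Ordinary module, or no shared domain at all: statics live in the owning domain.
        return {module.owningDomain};
    }
    // Shared-domain module: one copy of the statics per normal domain that uses it.
    std::vector<DomainId> result;
    for (DomainId id : module.containingDomains) {
        if (id != sharedDomain) result.push_back(id);
    }
    return result;
}

// Usage example.
void demoDomainsForStatics() {
    // No shared domain: only the owning domain is queried.
    assert(domainsForStatics({5, {}}, kNoDomain).size() == 1);
    // Module owned by shared domain 1 and used by domains 2 and 3.
    assert(domainsForStatics({1, {1, 2, 3}}, 1).size() == 2);
}
```

The point of the sketch is that the absence of a shared domain becomes an ordinary input rather than an error: with kNoDomain, every module simply resolves to its owning domain.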

@jkotas
Member

jkotas commented Dec 19, 2018

Correct, there is no shared domain in CoreCLR.

Do you think it would be useful to change CoreCLR to return a fake extra domain from the GetAppDomainsContainingModule API and make GetAppDomainInfo return the EE Shared Assembly Repository name for it, to mimic the old behavior? Or would you prefer a clean design and rather just update your profiler implementation?

@ww898
Contributor

ww898 commented Dec 19, 2018

Definitely the clean design. I'm writing the fix.

@ww898
Contributor

ww898 commented Dec 20, 2018

Fix is ready. We need time to test it on all frameworks.

@davidfowl
Member

@ww898 Any ETA? Will it be in the next EAP?

@ww898
Contributor

ww898 commented Jan 5, 2019

@davidfowl The fix is in 2019 EAP1 now.

@Wraith2
Contributor Author

Wraith2 commented Jan 5, 2019

What is EAP?

@benaadams
Member

Early Access Program (prerelease)

@davidfowl
Member

Where do I find that? I keep finding the Version: 2018.3.1 EAP

@Anna-Guseva

@davidfowl 2019 EAP1 will be available no earlier than February 2019.

@davidfowl
Member

😞 I can wait I guess, I'm using dotTrace timeline mode to look at memory allocations for now.

@davidfowl
Member

@Anna-Guseva any word?

@maartenba

FYI, the latest 2019.1 EAP has 3.0 support.

Wraith2 closed this as completed Apr 14, 2019
msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
msftgits added this to the 3.0 milestone Jan 31, 2020
ghost locked as resolved and limited conversation to collaborators Dec 14, 2020