Skip to content

SerDe referenced assemblies #180

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 22 commits into from
Jul 26, 2019
Merged

SerDe referenced assemblies #180

merged 22 commits into from
Jul 26, 2019

Conversation

suhsteve
Copy link
Member

@suhsteve suhsteve commented Jul 19, 2019

A delegate references assemblies that is not easily ascertained by inspecting its target, method and fields. This PR addresses this by adding a assembly resolve event handler which will search the probing path and look for some combination of the assembly's simple name and known assembly extensions.

@suhsteve suhsteve changed the title [WIP] Serialize referenced assembly names and files [WIP] Serialize referenced assembly/manifestmodule names Jul 19, 2019
@suhsteve suhsteve changed the title [WIP] Serialize referenced assembly/manifestmodule names [WIP] Serialize referenced assembly and manifestmodule names Jul 19, 2019
@suhsteve suhsteve changed the title [WIP] Serialize referenced assembly and manifestmodule names [WIP] SerDe referenced assemblies Jul 19, 2019
@suhsteve suhsteve changed the title [WIP] SerDe referenced assemblies SerDe referenced assemblies Jul 20, 2019
@suhsteve
Copy link
Member Author

@stephentoub can you take a look and let me know if this is on the right track or if there may be better alternatives?

Copy link
Member

@eerhardt eerhardt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully understand the scenario you are solving here, nor why you need to serialize all the reference assemblies to the worker.

Typically, the referenced assemblies should either be next to the UDF assembly, or contained in the .NET core/framework directory. We should just be able to resolve the referenced assemblies on the worker by looking in one folder: where the UDF assembly is.

@suhsteve
Copy link
Member Author

suhsteve commented Jul 22, 2019

I don't fully understand the scenario you are solving here, nor why you need to serialize all the reference assemblies to the worker.

Below are some examples where class ExternalClass is located in an external assembly:

In this case the ExternalClass is created outside the udf and it is part of the udf1's closure. When we inspect udf1, we can eventually find a reference to ExternalClass as part of the delegate target's fields.
We can serialize that information and eventually send it to the worker.

var ec = new ExternalClass();
Func<Column, Column> udf1 = Udf<string, string>(
    str =>
    {
         ec.Method1();
         $"hello {str}!"
    });

In this example where the ExternalClass is called as part of the udf2's method instructions, I was not able to find a reference to the ExternalClass. (Although we should be able to find the reference if we inspect each instruction in the body of the method)

Func<Column, Column> udf2 = Udf<string, string>(
    str =>
    {
         var ec = new ExternalClass();
         ec.Method1();
         $"hello {str}!"
    });

Typically, the referenced assemblies should either be next to the UDF assembly, or contained in the .NET core/framework directory. We should just be able to resolve the referenced assemblies on the worker by looking in one folder: where the UDF assembly is.

I think there are a couple ways to approach this problem.

  1. if the Assembly FullName is something like ExampleAssembly, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null where the name is ExampleAssembly. If it is guaranteed that the file that contains this assembly starts with ExampleAssembly and ends with either .dll, .exe, or has no extension, then we can have the Worker attempt to load just these files when it needs to resolve the assembly.
  2. If the above is not true, then we can have the Worker attempt to load, *.dll, *.exe, and all files without extension from within the probing path.
  3. The main reason I chose the GetReferencedAssemblies() approach, is mainly because it may not always be the case where there is a direct mapping between the Assembly FullName to the file that actually contains the assembly. I'm okay with going with either 1 or 2 if what we're currently doing is unnecessary.

@eerhardt
Copy link
Member

If it is guaranteed that the file that contains this assembly starts with ExampleAssembly and ends with either .dll, .exe, or has no extension, then we can have the Worker attempt to load just these files when it needs to resolve the assembly.

That's what MSBuild does when it tries resolving assemblies that a Task depends on:

https://github.com/microsoft/msbuild/blob/3457a9f5436442283d707726cd49ae3334b93334/src/Shared/CoreCLRAssemblyLoader.cs#L128

@eerhardt
Copy link
Member

using System;

(nit) copyright header


Refers to: src/csharp/Microsoft.Spark/Utils/AssemblyLoader.cs:1 in 7b4eeca. [](commit_id = 7b4eeca, deletion_comment = False)

@suhsteve
Copy link
Member Author

I believe I've addressed all the comments. Let me know if there's anything else I should revisit.

@@ -0,0 +1,8 @@
<Project Sdk="Microsoft.NET.Sdk">
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just put this project next to the unit tests?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved to src/csharp

eerhardt
eerhardt previously approved these changes Jul 25, 2019
Copy link
Member

@eerhardt eerhardt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@imback82 imback82 merged commit b043695 into dotnet:master Jul 26, 2019
@suhsteve suhsteve deleted the stsuh/serde branch October 18, 2019 16:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants