3 changes: 3 additions & 0 deletions README.md
@@ -1,6 +1,9 @@
# source-indexer
This repo contains the code for building http://source.dot.net

## Documentation
- [Source Selection Algorithm](docs/source-selection-algorithm.md) - How the indexer chooses the best implementation when multiple builds exist for the same assembly

## Build Status
[![Build Status](https://dev.azure.com/dnceng/internal/_apis/build/status/dotnet-source-indexer/dotnet-source-indexer%20CI?branchName=main)](https://dev.azure.com/dnceng/internal/_build/latest?definitionId=612&branchName=main)

75 changes: 75 additions & 0 deletions docs/source-selection-algorithm.md
@@ -0,0 +1,75 @@
# Source Selection Algorithm

When the source indexer processes multiple builds for the same assembly (e.g., generic builds, platform-specific builds, or builds with different target frameworks), it uses a scoring algorithm to select the "best" implementation to include in the final source index.

## Overview

The deduplication process groups all compiler invocations by `AssemblyName` and then calculates a score for each build. The build with the highest score is selected and included in the generated solution file.
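
The grouping step described above can be sketched roughly as follows. This is a minimal illustration, not the indexer's actual code: the tuple shape, the `SelectBest` name, the sample scores, and the case-insensitive comparison are all assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for compiler invocations: (assembly name, project path, precomputed score).
var invocations = new List<(string AssemblyName, string ProjectFilePath, int Score)>
{
    ("System.Net.NameResolution", "generic.csproj", -1955),
    ("System.Net.NameResolution", "linux.csproj", 8727),
    ("System.Text.Json", "json.csproj", 8120),
};

// Group by assembly name (case-insensitive here, which is an assumption)
// and keep only the highest-scoring build in each group.
List<(string AssemblyName, string ProjectFilePath, int Score)> SelectBest(
    IEnumerable<(string AssemblyName, string ProjectFilePath, int Score)> all) =>
    all.GroupBy(i => i.AssemblyName, StringComparer.OrdinalIgnoreCase)
       .Select(g => g.OrderByDescending(i => i.Score).First())
       .ToList();

foreach (var best in SelectBest(invocations))
{
    Console.WriteLine($"{best.AssemblyName} -> {best.ProjectFilePath} ({best.Score})");
}
```

Only one build per assembly survives, so the generated solution never contains two projects that produce the same assembly.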

## Scoring Priorities

The scoring algorithm evaluates builds using the following criteria, ordered by priority from highest to lowest:

### 1. UseForSourceIndex Property (Highest Priority)
- **Score**: `int.MaxValue` (2,147,483,647)
- **Description**: When a project explicitly sets the `UseForSourceIndex` property to `true`, it receives the maximum possible score, ensuring it will always be selected regardless of other factors.
- **Use Case**: Provides an escape hatch for projects that must be the build chosen for the source index.

### 2. Platform Support Status (Second Priority)
- **Score**: `-10,000` penalty for platform-not-supported assemblies
- **Description**: If a project has the `IsPlatformNotSupportedAssembly` property set to `true`, it receives a heavy penalty.
- **Use Case**: Ensures that stub implementations containing mostly `PlatformNotSupportedException` are avoided in favor of real implementations.

### 3. Target Framework Version (Third Priority)
- **Score**: `Major * 1000 + Minor * 100`
- **Description**: Newer framework versions receive higher scores. For example:
  - .NET 8.0 = 8,000 + 0 = 8,000 points
  - .NET 6.0 = 6,000 + 0 = 6,000 points
  - .NET Framework 4.8 = 4,000 + 800 = 4,800 points
- **Use Case**: Prefers more recent implementations that are likely to contain the latest features and bug fixes.
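
As a quick check of the arithmetic, the snippet below applies the `Major * 1000 + Minor * 100` formula exactly as stated. The `FrameworkScore` name is hypothetical, and `System.Version` is used here only as a convenient carrier for the major and minor numbers; the real implementation may read the version differently.

```csharp
using System;

// Framework contribution as the formula states: Major * 1000 + Minor * 100.
int FrameworkScore(Version version) => version.Major * 1000 + version.Minor * 100;

Console.WriteLine(FrameworkScore(new Version(8, 0))); // .NET 8.0
Console.WriteLine(FrameworkScore(new Version(6, 0))); // .NET 6.0
Console.WriteLine(FrameworkScore(new Version(4, 8))); // .NET Framework 4.8
```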

### 4. Platform Specificity (Fourth Priority)
- **Score**: `+500` for platform-specific frameworks
- **Additional**: `+100` bonus for Linux platforms, `+50` bonus for Unix platforms
- **Description**: Platform-specific builds (e.g., `net8.0-linux`, `net8.0-windows`) receive bonuses over generic builds.
- **Use Case**: Platform-specific implementations often contain more complete functionality than generic implementations.

### 5. Source File Count (Lowest Priority)
- **Score**: `+1` per source file
- **Description**: Builds with more source files receive higher scores.
- **Use Case**: Acts as a tiebreaker when other factors are equal, assuming more source files indicate a more complete implementation.

## Example Scoring

Consider these hypothetical builds for `System.Net.NameResolution`:

| Build | UseForSourceIndex | IsPlatformNotSupported | Framework | Platform | Source Files | Total Score |
|-------|-------------------|------------------------|-----------|----------|--------------|-------------|
| Generic Build | false | true | net8.0 | none | 45 | -1,955 |
| Linux Build | false | false | net8.0-linux | linux | 127 | 8,727 |
| Windows Build | false | false | net8.0-windows | windows | 98 | 8,598 |
| Override Build | true | false | net6.0 | none | 23 | 2,147,483,647 |

In this example:
- The **Override Build** would be selected due to `UseForSourceIndex=true`
- Without the override, the **Linux Build** would be selected with the highest score
- The **Generic Build** receives the 10,000-point penalty for being a platform-not-supported assembly, which drops its score to -1,955
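
The totals in the table can be reproduced with a small sketch of the additive rules. This is not the `CalculateInvocationScore` method itself: the `Score` function, its parameter names, and the use of an empty string for a generic build are illustrative, and treating the Linux and Unix bonuses as mutually exclusive is an assumption.

```csharp
using System;

// Sketch of the additive scoring rules; names and signature are illustrative.
// An empty platform string stands for a generic (platform-neutral) build.
int Score(bool useForSourceIndex, bool isPlatformNotSupported,
          Version framework, string platform, int sourceFileCount)
{
    // An explicit override short-circuits every other rule.
    if (useForSourceIndex) return int.MaxValue;

    int score = 0;
    if (isPlatformNotSupported) score -= 10_000;              // stub penalty
    score += framework.Major * 1000 + framework.Minor * 100;  // framework version
    if (platform.Length > 0)
    {
        score += 500;                                         // platform-specific build
        if (platform.Equals("linux", StringComparison.OrdinalIgnoreCase)) score += 100;
        else if (platform.Equals("unix", StringComparison.OrdinalIgnoreCase)) score += 50;
    }
    return score + sourceFileCount;                           // tiebreaker
}

var net8 = new Version(8, 0);
Console.WriteLine(Score(false, true, net8, "", 45));                  // Generic Build
Console.WriteLine(Score(false, false, net8, "linux", 127));           // Linux Build
Console.WriteLine(Score(false, false, net8, "windows", 98));          // Windows Build
Console.WriteLine(Score(true, false, new Version(6, 0), "", 23));     // Override Build
```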

## Implementation Details

The scoring logic is implemented in the `CalculateInvocationScore` method in `BinLogToSln/Program.cs`. The method:

1. Reads project properties from the binlog file
2. Applies scoring rules in priority order
3. Handles parsing errors gracefully
4. Returns a base score of 1 for builds that fail scoring to avoid complete exclusion
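
The fallback behavior in steps 3 and 4 might look something like the sketch below. It is a simplified illustration under stated assumptions: `IsTrue` and `ScoreOrFallback` are hypothetical names, the platform bonus is omitted for brevity, and reading the version from a `TargetFrameworkVersion` property shaped like `v8.0` is an assumption rather than what `Program.cs` necessarily does.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: a missing or malformed boolean property counts as false.
bool IsTrue(IReadOnlyDictionary<string, string> props, string name) =>
    props.TryGetValue(name, out var raw) && bool.TryParse(raw, out var parsed) && parsed;

int ScoreOrFallback(IReadOnlyDictionary<string, string> props, int sourceFileCount)
{
    try
    {
        if (IsTrue(props, "UseForSourceIndex")) return int.MaxValue;

        int score = IsTrue(props, "IsPlatformNotSupportedAssembly") ? -10_000 : 0;

        // Assumption: the version is available as a property shaped like "v8.0".
        var version = Version.Parse(props["TargetFrameworkVersion"].TrimStart('v'));
        score += version.Major * 1000 + version.Minor * 100;

        return score + sourceFileCount;
    }
    catch (Exception)
    {
        // Any failure while reading or parsing falls back to the base score,
        // so the build stays eligible instead of being excluded.
        return 1;
    }
}

var good = new Dictionary<string, string> { ["TargetFrameworkVersion"] = "v8.0" };
var broken = new Dictionary<string, string> { ["TargetFrameworkVersion"] = "not-a-version" };
Console.WriteLine(ScoreOrFallback(good, 10));
Console.WriteLine(ScoreOrFallback(broken, 10));
```

Returning 1 rather than throwing keeps a build with unreadable properties selectable when it is the only candidate for its assembly.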

## Configuration

The algorithm can be influenced through MSBuild project properties:

- **UseForSourceIndex**: Set to `true` to force selection of this build
- **IsPlatformNotSupportedAssembly**: Set to `true` to indicate this is a stub implementation
- **TargetFramework**: Automatically detected from the project file

These properties are captured from the binlog during the build analysis phase.
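
For instance, a project could opt in to being the indexed build with a property group along these lines. This is an illustrative fragment, not taken from any real project; the target framework shown is arbitrary.

```xml
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <TargetFramework>net8.0</TargetFramework>
    <!-- Force the indexer to pick this build for its assembly. -->
    <UseForSourceIndex>true</UseForSourceIndex>
  </PropertyGroup>
</Project>
```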
16 changes: 11 additions & 5 deletions src/SourceBrowser/SourceBrowser.sln
@@ -19,7 +19,9 @@ Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "SourceIndexServer.Tests", "
EndProject
Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "BinLogParser", "src\BinLogParser\BinLogParser.csproj", "{4EF5052C-7D88-49C6-B940-5190CECD070D}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "BinLogToSln", "src\BinLogToSln\BinLogToSln.csproj", "{E73D1784-F1BC-4F01-B68E-03623CFBFB8E}"
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "BinLogToSln", "src\BinLogToSln\BinLogToSln.csproj", "{E73D1784-F1BC-4F01-B68E-03623CFBFB8E}"
EndProject
Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "BinLogToSln.Tests", "src\BinLogToSln.Tests\BinLogToSln.Tests.csproj", "{A6F4AA1E-2B2A-4E48-9C3E-4A1B2D3C5E7F}"
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "Solution Items", "Solution Items", "{C0B9CC1C-1EF1-4086-9532-E8679CBA4E62}"
ProjectSection(SolutionItems) = preProject
@@ -65,10 +67,14 @@ Global
{4EF5052C-7D88-49C6-B940-5190CECD070D}.Debug|Any CPU.Build.0 = Debug|Any CPU
{4EF5052C-7D88-49C6-B940-5190CECD070D}.Release|Any CPU.ActiveCfg = Release|Any CPU
{4EF5052C-7D88-49C6-B940-5190CECD070D}.Release|Any CPU.Build.0 = Release|Any CPU
{E73D1784-F1BC-4F01-B68E-03623CFBFB8E}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{E73D1784-F1BC-4F01-B68E-03623CFBFB8E}.Debug|Any CPU.Build.0 = Debug|Any CPU
{E73D1784-F1BC-4F01-B68E-03623CFBFB8E}.Release|Any CPU.ActiveCfg = Release|Any CPU
{E73D1784-F1BC-4F01-B68E-03623CFBFB8E}.Release|Any CPU.Build.0 = Release|Any CPU
{E73D1784-F1BC-4F01-B68E-03623CFBFB8E}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{E73D1784-F1BC-4F01-B68E-03623CFBFB8E}.Debug|Any CPU.Build.0 = Debug|Any CPU
{E73D1784-F1BC-4F01-B68E-03623CFBFB8E}.Release|Any CPU.ActiveCfg = Release|Any CPU
{E73D1784-F1BC-4F01-B68E-03623CFBFB8E}.Release|Any CPU.Build.0 = Release|Any CPU
{A6F4AA1E-2B2A-4E48-9C3E-4A1B2D3C5E7F}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{A6F4AA1E-2B2A-4E48-9C3E-4A1B2D3C5E7F}.Debug|Any CPU.Build.0 = Debug|Any CPU
{A6F4AA1E-2B2A-4E48-9C3E-4A1B2D3C5E7F}.Release|Any CPU.ActiveCfg = Release|Any CPU
{A6F4AA1E-2B2A-4E48-9C3E-4A1B2D3C5E7F}.Release|Any CPU.Build.0 = Release|Any CPU
EndGlobalSection
GlobalSection(SolutionProperties) = preSolution
HideSolutionNode = FALSE
219 changes: 145 additions & 74 deletions src/SourceBrowser/src/BinLogParser/BinLogReader.cs
@@ -1,9 +1,13 @@
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using Microsoft.Build.Framework;
using Microsoft.Build.Framework;
using Microsoft.Build.Logging.StructuredLogger;
using Microsoft.CodeAnalysis;
using System;
using System.Collections;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

namespace Microsoft.SourceBrowser.BinLogParser
{
@@ -33,26 +37,80 @@ public static IEnumerable<CompilerInvocation> ExtractInvocations(string binLogFi

var lazyResult = m_binlogInvocationMap.GetOrAdd(binLogFilePath, new Lazy<List<CompilerInvocation>>(() =>
{
// for old format logs, use the legacy reader - this is less desirable because it loads everything into memory
if (binLogFilePath.EndsWith(".buildlog", StringComparison.OrdinalIgnoreCase))
{
return ExtractInvocationsFromBuild(binLogFilePath);
}

// for new format logs, replay the log to avoid loading everything into memory
var invocations = new List<CompilerInvocation>();
var reader = new Microsoft.Build.Logging.StructuredLogger.BinLogReader();
var taskIdToInvocationMap = new Dictionary<(int, int), CompilerInvocation>();
var projectEvaluationToPropertiesMap = new Dictionary<int, Dictionary<string, string>>();
var projectInstanceToEvaluationMap = new Dictionary<int, int>();

void TryGetInvocationFromEvent(object sender, BuildEventArgs args)
{
var invocation = TryGetInvocationFromRecord(args, taskIdToInvocationMap);
Dictionary<string, string> projectProperties = null;
if (projectInstanceToEvaluationMap.TryGetValue(args.BuildEventContext?.ProjectInstanceId ?? -1, out var evaluationId) &&
projectEvaluationToPropertiesMap.TryGetValue(evaluationId, out var properties))
{
projectProperties = properties;

if (args is PropertyReassignmentEventArgs propertyReassignment)
{
properties[propertyReassignment.PropertyName] = propertyReassignment.NewValue;
}
else if (args is PropertyInitialValueSetEventArgs propertyInitialValueSet)
{
properties[propertyInitialValueSet.PropertyName] = propertyInitialValueSet.PropertyValue;
}
}


var invocation = TryGetInvocationFromRecord(args, taskIdToInvocationMap, projectProperties);
if (invocation != null)
{
invocation.SolutionRoot = Path.GetDirectoryName(binLogFilePath);
invocations.Add(invocation);
}
}

reader.TargetStarted += TryGetInvocationFromEvent;
reader.StatusEventRaised += (object sender, BuildStatusEventArgs e) =>
{
if (e?.BuildEventContext?.EvaluationId >= 0 &&
e is ProjectEvaluationFinishedEventArgs projectEvalArgs)
{
if (projectEvalArgs?.Properties is IDictionary<string, string> propertiesDict)
{
projectEvaluationToPropertiesMap[e.BuildEventContext.EvaluationId] =
new Dictionary<string, string>(propertiesDict, StringComparer.OrdinalIgnoreCase);
}
else
{
var properties = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

foreach (KeyValuePair<string, string> property in projectEvalArgs.Properties)
{
properties[property.Key] = property.Value;
}

projectEvaluationToPropertiesMap[e.BuildEventContext.EvaluationId] = properties;
}
}
};

reader.ProjectStarted += (object sender, ProjectStartedEventArgs e) =>
{
if (e?.BuildEventContext?.EvaluationId >= 0 &&
e?.BuildEventContext?.ProjectInstanceId >= 0)
{
projectInstanceToEvaluationMap[e.BuildEventContext.ProjectInstanceId] = e.BuildEventContext.EvaluationId;
}
};

reader.TargetStarted += TryGetInvocationFromEvent;
reader.MessageRaised += TryGetInvocationFromEvent;

reader.Replay(binLogFilePath);
@@ -65,56 +123,61 @@ void TryGetInvocationFromEvent(object sender, BuildEventArgs args)
return result;
}

private static List<CompilerInvocation> ExtractInvocationsFromBuild(string logFilePath)
{
var build = Microsoft.Build.Logging.StructuredLogger.Serialization.Read(logFilePath);
var invocations = new List<CompilerInvocation>();
build.VisitAllChildren<Microsoft.Build.Logging.StructuredLogger.Task>(t =>
{
var invocation = TryGetInvocationFromTask(t);
if (invocation != null)
{
invocations.Add(invocation);
}
});
return invocations;
private static List<CompilerInvocation> ExtractInvocationsFromBuild(string logFilePath)
{
var build = Microsoft.Build.Logging.StructuredLogger.Serialization.Read(logFilePath);
var invocations = new List<CompilerInvocation>();
build.VisitAllChildren<Microsoft.Build.Logging.StructuredLogger.Task>(t =>
{
var invocation = TryGetInvocationFromTask(t, build);
if (invocation != null)
{
invocations.Add(invocation);
}
});

return invocations;
}

private static CompilerInvocation TryGetInvocationFromRecord(BuildEventArgs args, Dictionary<(int, int), CompilerInvocation> taskIdToInvocationMap)
{
int targetId = args.BuildEventContext?.TargetId ?? -1;
int projectId = args.BuildEventContext?.ProjectInstanceId ?? -1;
if (targetId < 0)
{
return null;
}

var targetStarted = args as TargetStartedEventArgs;
if (targetStarted != null && targetStarted.TargetName == "CoreCompile")
{
var invocation = new CompilerInvocation();
taskIdToInvocationMap[(targetId, projectId)] = invocation;
invocation.ProjectFilePath = targetStarted.ProjectFile;
return null;
}

var commandLine = GetCommandLineFromEventArgs(args, out var language);
if (commandLine == null)
{
return null;
}

CompilerInvocation compilerInvocation;
if (taskIdToInvocationMap.TryGetValue((targetId, projectId), out compilerInvocation))
{
compilerInvocation.Language = language == CompilerKind.CSharp ? LanguageNames.CSharp : LanguageNames.VisualBasic;
compilerInvocation.CommandLineArguments = commandLine;
Populate(compilerInvocation);
taskIdToInvocationMap.Remove((targetId, projectId));
}

return compilerInvocation;
private static CompilerInvocation TryGetInvocationFromRecord(BuildEventArgs args,
Dictionary<(int, int), CompilerInvocation> taskIdToInvocationMap,
Dictionary<string,string> projectProperties)
{
int targetId = args.BuildEventContext?.TargetId ?? -1;
int projectId = args.BuildEventContext?.ProjectInstanceId ?? -1;

if (targetId < 0)
{
return null;
}

if (args is TargetStartedEventArgs targetStarted && targetStarted.TargetName == "CoreCompile")
{
var invocation = new CompilerInvocation()
{
ProjectFilePath = targetStarted.ProjectFile,
ProjectProperties = projectProperties,
};
taskIdToInvocationMap[(targetId, projectId)] = invocation;
return null;
}

var commandLine = GetCommandLineFromEventArgs(args, out var language);
if (commandLine == null)
{
return null;
}

CompilerInvocation compilerInvocation;
if (taskIdToInvocationMap.TryGetValue((targetId, projectId), out compilerInvocation))
{
compilerInvocation.Language = language == CompilerKind.CSharp ? LanguageNames.CSharp : LanguageNames.VisualBasic;
compilerInvocation.CommandLineArguments = commandLine;
Populate(compilerInvocation);
taskIdToInvocationMap.Remove((targetId, projectId));
}

return compilerInvocation;
}

private static void Populate(CompilerInvocation compilerInvocation)
@@ -125,25 +188,33 @@ private static void Populate(CompilerInvocation compilerInvocation)
}
}

private static CompilerInvocation TryGetInvocationFromTask(Microsoft.Build.Logging.StructuredLogger.Task task)
{
var name = task.Name;
if (name != "Csc" && name != "Vbc" || ((task.Parent as Microsoft.Build.Logging.StructuredLogger.Target)?.Name != "CoreCompile"))
{
return null;
}

var language = name == "Csc" ? LanguageNames.CSharp : LanguageNames.VisualBasic;
var commandLine = task.CommandLineArguments;
commandLine = TrimCompilerExeFromCommandLine(commandLine, name == "Csc"
? CompilerKind.CSharp
: CompilerKind.VisualBasic);
return new CompilerInvocation
{
Language = language,
CommandLineArguments = commandLine,
ProjectFilePath = task.GetNearestParent<Microsoft.Build.Logging.StructuredLogger.Project>()?.ProjectFile
};
private static CompilerInvocation TryGetInvocationFromTask(Microsoft.Build.Logging.StructuredLogger.Task task, Microsoft.Build.Logging.StructuredLogger.Build build)
{
var name = task.Name;
if (name != "Csc" && name != "Vbc" || ((task.Parent as Microsoft.Build.Logging.StructuredLogger.Target)?.Name != "CoreCompile"))
{
return null;
}

var language = name == "Csc" ? LanguageNames.CSharp : LanguageNames.VisualBasic;
var commandLine = task.CommandLineArguments;
commandLine = TrimCompilerExeFromCommandLine(commandLine, name == "Csc"
? CompilerKind.CSharp
: CompilerKind.VisualBasic);

// Get the project once and reuse it
var project = task.GetNearestParent<Microsoft.Build.Logging.StructuredLogger.Project>();

var invocation = new CompilerInvocation
{
Language = language,
CommandLineArguments = commandLine,
ProjectFilePath = project?.ProjectFile,
ProjectProperties = project?.GetEvaluation(build)?.GetProperties() ?? new Dictionary<string, string>(),
};


return invocation;
}

public static string TrimCompilerExeFromCommandLine(string commandLine, CompilerKind language)