
Commit bc21f08

Fix some logic
1 parent 3d5254c commit bc21f08


3 files changed, +210 -70 lines changed


docs/detectors/sbt-technical-deep-dive.md

Lines changed: 97 additions & 55 deletions
````diff
@@ -58,8 +58,10 @@ protected override async Task OnPrepareDetectionAsync(IObservableDirectoryWalker
 
 **CLI Detection Logic** (`SbtCommandService.SbtCLIExistsAsync`):
 - Primary command: `sbt`
-- Fallback commands: `sbt.bat` (Windows)
-- Verification: Runs `sbt sbtVersion` to confirm functional installation
+- Coursier fallback: `C:\Users\{user}\AppData\Local\Coursier\data\bin\sbt.bat` (Windows)
+- Verification: Runs `sbt --version` to confirm a functional installation
+- **Critical Fix**: Uses `--version` (not `sbtVersion`) because `--version` works without a project directory
+  - `sbtVersion` needs an active project context, which caused failures when run as a subprocess outside a project
 
 This prevents expensive file processing if SBT isn't available.
 
````
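The lookup order in this hunk (try `sbt`, fall back to the Coursier launcher on Windows, and accept a candidate only if `--version` succeeds) can be sketched as below. This is a minimal sketch under stated assumptions, not the code of `SbtCommandService.SbtCLIExistsAsync`: `GetSbtCandidates`, `IsFunctionalAsync`, and `FindSbtAsync` are hypothetical names, and the Coursier launcher is assumed to live under `%LOCALAPPDATA%`, which is what `C:\Users\{user}\AppData\Local` corresponds to for a typical Windows profile.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Diagnostics;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper: candidate sbt launchers, in the order they are tried.
static List<string> GetSbtCandidates(bool isWindows, string localAppData)
{
    var candidates = new List<string> { "sbt" };
    if (isWindows)
    {
        // Assumed Coursier install location: %LOCALAPPDATA%\Coursier\data\bin\sbt.bat
        candidates.Add(Path.Combine(localAppData, "Coursier", "data", "bin", "sbt.bat"));
    }

    return candidates;
}

// Hypothetical helper: a launcher counts as functional if `<launcher> --version` exits with code 0.
static async Task<bool> IsFunctionalAsync(string launcher)
{
    var startInfo = new ProcessStartInfo(launcher)
    {
        RedirectStandardOutput = true,
        RedirectStandardError = true,
        UseShellExecute = false,
    };
    startInfo.ArgumentList.Add("--version"); // works without a project directory, unlike sbtVersion

    try
    {
        using var process = Process.Start(startInfo);
        if (process is null)
        {
            return false;
        }

        // Drain both pipes so a chatty launcher cannot block on a full buffer.
        await Task.WhenAll(
            process.StandardOutput.ReadToEndAsync(),
            process.StandardError.ReadToEndAsync(),
            process.WaitForExitAsync());
        return process.ExitCode == 0;
    }
    catch (Win32Exception)
    {
        // The launcher could not be started (for example, it is not on PATH).
        return false;
    }
}

// Hypothetical helper: first functional launcher, or null if none is usable.
static async Task<string?> FindSbtAsync()
{
    var localAppData = Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData);
    foreach (var candidate in GetSbtCandidates(OperatingSystem.IsWindows(), localAppData))
    {
        if (await IsFunctionalAsync(candidate))
        {
            return candidate;
        }
    }

    return null;
}
```

Calling `await FindSbtAsync()` returns the first launcher that answers `--version`, or `null`, which is the point at which a detector following this design would skip every `build.sbt` file.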
````diff
@@ -81,21 +83,30 @@ var buildDirectory = new DirectoryInfo(Path.GetDirectoryName(buildSbtFile.Locati
 #### Command Execution
 
 ```csharp
-var cliParameters = new[] {
-    $"\"dependencyTree; export compile:dependencyTree > {this.BcdeSbtDependencyFileName}\""
-};
+var cliParameters = new[] { "dependencyTree" };
 ```
 
 **Command Breakdown**:
-- `dependencyTree` - Invokes the sbt-dependency-graph plugin to analyze dependencies
-- `;` - SBT command separator (sequential execution)
-- `export compile:dependencyTree` - Exports the compile-scope dependency tree as text
-- `> bcde.sbtdeps` - Redirects output to a temporary file
+- `dependencyTree` - Invokes the built-in dependency tree analysis task
+- Prints the dependency tree to stdout in an indented format compatible with Maven's tree format
+- Each dependency line contains tree structure markers (`|`, `+-`) followed by coordinates
+
+**SBT Output Example**:
+```
+[info] test-project:test-project_2.13:1.0.0 [S]
+[info] +-com.google.guava:guava:32.1.3-jre
+[info] | +-com.google.code.findbugs:jsr305:3.0.2
+[info] | +-com.google.guava:failureaccess:1.0.1
+[info] |
+[info] +-org.apache.commons:commons-lang3:3.14.0
+[info]
+[success] Total time: 0 s
+```
 
 **Why This Approach?**:
-- SBT's dependency tree output is too verbose for stdout parsing (includes SBT's own startup messages, warnings, etc.)
-- The `export` task generates clean, parseable output without SBT metadata
-- Writing to a file allows reliable parsing and cleanup
+- `dependencyTree` is built into sbt 1.4 and later (no plugin required)
+- Output includes SBT metadata (`[info]` prefixes, startup messages), which is filtered downstream
+- Captures compile-scope dependencies, which are the most relevant for security scanning
 
 #### Timeout Management
 
````
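A hedged sketch of how `cliParameters` could be turned into a process invocation that captures stdout and enforces a per-file timeout. `BuildDependencyTreeStartInfo` and `RunDependencyTreeAsync` are hypothetical names; the `timeout` argument stands in for `SbtCLIFileLevelTimeoutSeconds`, and the real service's process wrapper is not shown in this commit.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: `sbt dependencyTree`, run from the directory that contains build.sbt.
static ProcessStartInfo BuildDependencyTreeStartInfo(string sbtLauncher, string buildDirectory)
{
    var startInfo = new ProcessStartInfo(sbtLauncher)
    {
        WorkingDirectory = buildDirectory,
        RedirectStandardOutput = true,
        RedirectStandardError = true,
        UseShellExecute = false,
    };
    startInfo.ArgumentList.Add("dependencyTree"); // same single argument as cliParameters
    return startInfo;
}

// Hypothetical helper: returns stdout as lines; kills sbt if it runs longer than `timeout`.
static async Task<IReadOnlyList<string>> RunDependencyTreeAsync(ProcessStartInfo startInfo, TimeSpan timeout)
{
    using var process = Process.Start(startInfo)
        ?? throw new InvalidOperationException("sbt could not be started.");
    using var timeoutSource = new CancellationTokenSource(timeout);

    var stdoutTask = process.StandardOutput.ReadToEndAsync();
    var stderrTask = process.StandardError.ReadToEndAsync(); // drained so sbt never blocks on stderr

    try
    {
        await process.WaitForExitAsync(timeoutSource.Token);
    }
    catch (OperationCanceledException)
    {
        process.Kill(entireProcessTree: true); // the launcher starts a JVM, so stop the whole tree
        throw new TimeoutException($"sbt dependencyTree did not finish within {timeout.TotalSeconds} s.");
    }

    await stderrTask;
    var stdout = await stdoutTask;
    if (process.ExitCode != 0)
    {
        // The documented detector registers a failure here instead; the sketch simply throws.
        throw new InvalidOperationException($"sbt exited with code {process.ExitCode}.");
    }

    return stdout.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);
}
```

A call such as `await RunDependencyTreeAsync(BuildDependencyTreeStartInfo("sbt", buildDirectory.FullName), TimeSpan.FromSeconds(60))` would return the raw `[info]`-prefixed lines that the filtering stage then cleans.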
````diff
@@ -130,7 +141,38 @@ if (result.ExitCode != 0)
 
 **Failure Registration**: The detector records parse failures instead of crashing, allowing the scan to continue with other files.
 
-### 4. Dependency Parsing (`ParseDependenciesFile`)
+### 4. Output Filtering (`GenerateDependenciesFileAsync` - cleanup phase)
+
+After SBT execution, the raw output is cleaned to prepare it for Maven parsing:
+
+```csharp
+var cleanedLines = allLines // captured sbt stdout lines; requires System.Linq and System.Text.RegularExpressions
+    .Select(line => Regex.Replace(line, @"\s*\[.\]$", string.Empty)) // Remove [S] suffixes
+    .Select(line => Regex.Replace(line, @"^\[(?:info|warn|error)\]\s*", string.Empty)) // Remove leading log-level prefixes
+    .Select(line => Regex.Replace(line, @"_\d+\.\d+(?=:)", string.Empty)) // Remove Scala version _2.13
+    .Where(line => Regex.IsMatch(line, @"^[\s|\-+]*[a-z0-9\-_.]*\.[a-z0-9\-_.]+:[a-z0-9\-_.,]+:[a-z0-9\-_.]+")) // Keep Maven coordinates
+    .Select(line => Regex.Replace(line, @"^([\s|\-+]*[^:\s]+:[^:]+):", "$1:jar:")) // Insert 'jar' packaging (illustrative pattern)
+    .ToList();
+```
+
+**Filtering Pipeline**:
+1. **Remove `[S]` suffixes**: Root component markers (e.g., `test-project:test-project_2.13:1.0.0 [S]` → `test-project:test-project_2.13:1.0.0`)
+2. **Remove `[info]`/`[warn]`/`[error]` prefixes**: SBT metadata prefixes at the start of each line
+3. **Remove Scala version suffixes**: Artifact names include the Scala version (e.g., `test-project_2.13` → `test-project`)
+4. **Filter to valid Maven coordinates**: Keep only lines matching the pattern, which requires a dot in the groupId (per Maven convention), so the root line `test-project:test-project:1.0.0` is dropped here
+5. **Insert default packaging**: Convert `group:artifact:version` to `group:artifact:jar:version` for Maven parser compatibility
+
+**Key Insight**: Tree structure characters (`|`, `+-`) are PRESERVED because the Maven parser needs them to understand dependency relationships.
+
+**Output After Filtering**:
+```
++-com.google.guava:guava:jar:32.1.3-jre
+| +-com.google.code.findbugs:jsr305:jar:3.0.2
+| +-com.google.guava:failureaccess:jar:1.0.1
++-org.apache.commons:commons-lang3:jar:3.14.0
+```
+
+### 5. Dependency Parsing (`ParseDependenciesFile`)
 
 ```csharp
 public void ParseDependenciesFile(ProcessRequest processRequest)
````
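To check the filtering stages end to end, the sketch below wraps them in a hypothetical `CleanSbtOutput` function and feeds it the sample `dependencyTree` output from this commit. The prefix pattern anchors all three log levels to the start of the line, and the `jar`-insertion pattern is an assumption made for this sketch; it expects plain `group:artifact:version` coordinates with no classifier.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper mirroring the five filtering stages.
static List<string> CleanSbtOutput(IEnumerable<string> allLines) =>
    allLines
        .Select(line => Regex.Replace(line, @"\s*\[.\]$", string.Empty))                  // 1. drop " [S]" root markers
        .Select(line => Regex.Replace(line, @"^\[(?:info|warn|error)\]\s*", string.Empty)) // 2. drop log-level prefixes
        .Select(line => Regex.Replace(line, @"_\d+\.\d+(?=:)", string.Empty))              // 3. drop Scala suffixes like _2.13
        .Where(line => Regex.IsMatch(line, @"^[\s|\-+]*[a-z0-9\-_.]*\.[a-z0-9\-_.]+:[a-z0-9\-_.,]+:[a-z0-9\-_.]+")) // 4. keep coordinates
        .Select(line => Regex.Replace(line, @"^([\s|\-+]*[^:\s]+:[^:]+):", "$1:jar:"))     // 5. assumed: insert jar packaging
        .ToList();

var sample = new[]
{
    "[info] test-project:test-project_2.13:1.0.0 [S]",
    "[info] +-com.google.guava:guava:32.1.3-jre",
    "[info] | +-com.google.code.findbugs:jsr305:3.0.2",
    "[info] | +-com.google.guava:failureaccess:1.0.1",
    "[info] |",
    "[info] +-org.apache.commons:commons-lang3:3.14.0",
    "[info]",
    "[success] Total time: 0 s",
};

foreach (var line in CleanSbtOutput(sample))
{
    Console.WriteLine(line);
}
```

Running it prints four lines with the `|` and `+-` markers intact. The root line `test-project:test-project:1.0.0` does not survive stage 4, because its groupId contains no dot.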
````diff
@@ -145,28 +187,28 @@ public void ParseDependenciesFile(ProcessRequest processRequest)
 
 #### Why This Works
 
-SBT outputs dependency trees in a format similar to Maven's `mvn dependency:tree`:
+After filtering, SBT's dependency tree output is compatible with the format of Maven's `mvn dependency:tree`:
 
 ```
-org.scala-lang:scala-library:2.13.8
-+-com.typesafe:config:1.4.2
-+-org.scala-lang.modules:scala-parser-combinators_2.13:2.1.1
-+-org.scala-lang:scala-library:2.13.6
+com.google.guava:guava:jar:32.1.3-jre
+| +-com.google.code.findbugs:jsr305:jar:3.0.2
+| +-com.google.guava:failureaccess:jar:1.0.1
 ```
 
 **Maven Parser Compatibility**:
-- Tree structure uses `+-` and `\-` for branches
-- Artifacts use Maven coordinates: `groupId:artifactId:version`
-- Indentation represents dependency hierarchy
-- Supports scope modifiers (compile, test, provided)
+- Tree structure uses `|` and `+-` for branches (preserved from SBT output)
+- Artifacts use Maven coordinates: `groupId:artifactId:jar:version`
+- Indentation and branch markers determine dependency hierarchy
+- The first line is the root component; nested components are its dependencies
 
 The `MavenStyleDependencyGraphParserService`:
-1. Parses each line to extract group:artifact:version
-2. Uses indentation to determine parent-child relationships
-3. Creates `MavenComponent` instances
-4. Registers components with the `IComponentRecorder` with proper graph edges
+1. Parses the first non-empty line as the root component
+2. For subsequent lines, extracts tree depth from indentation/markers
+3. Uses depth to determine parent-child relationships
+4. Creates `MavenComponent` instances with proper Maven coordinates
+5. Registers components with the `IComponentRecorder` with proper graph edges
 
-### 5. Component Registration
+### 6. Component Registration
 
 Inside `MavenStyleDependencyGraphParserService.Parse()`:
 
````
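The depth-based linking in the parser steps can be illustrated with a stack of ancestors, one slot per tree level. This is a hypothetical sketch, not the code of `MavenStyleDependencyGraphParserService`: `BuildEdges` is an illustrative name, it assumes two characters of indentation per level (`| ` or two spaces), it accepts both `+-` and Maven's `\-` branch markers, and the root `com.example:app:jar:1.0.0` is made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: returns (parent, child) edges for a filtered dependency tree.
static List<(string Parent, string Child)> BuildEdges(IEnumerable<string> lines)
{
    var branch = new Regex(@"^([\s|]*)(?:\+-|\\-)(.+)$");
    var edges = new List<(string Parent, string Child)>();
    var ancestors = new List<string>(); // ancestors[d] = most recent component seen at depth d

    foreach (var line in lines)
    {
        if (string.IsNullOrWhiteSpace(line))
        {
            continue;
        }

        var match = branch.Match(line);
        // A line without a branch marker is treated as the root (depth 0).
        var depth = match.Success ? match.Groups[1].Length / 2 + 1 : 0;
        var coordinates = (match.Success ? match.Groups[2].Value : line).Trim();

        // Forget ancestors at this depth or deeper; the last one left is the parent.
        if (ancestors.Count > depth)
        {
            ancestors.RemoveRange(depth, ancestors.Count - depth);
        }

        if (depth > 0 && ancestors.Count > 0)
        {
            edges.Add((ancestors[^1], coordinates));
        }

        ancestors.Add(coordinates);
    }

    return edges;
}

var tree = new[]
{
    "com.example:app:jar:1.0.0",
    "+-com.google.guava:guava:jar:32.1.3-jre",
    "| +-com.google.code.findbugs:jsr305:jar:3.0.2",
    "| +-com.google.guava:failureaccess:jar:1.0.1",
    "+-org.apache.commons:commons-lang3:jar:3.14.0",
};

foreach (var (parent, child) in BuildEdges(tree))
{
    Console.WriteLine($"{parent} -> {child}");
}
```

Each `(parent, child)` pair corresponds to one graph edge that would be handed to the component recorder; `failureaccess` attaches to `guava` because `guava` is the most recent entry one level above it.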
````diff
@@ -184,30 +226,21 @@ singleFileComponentRecorder.RegisterUsage(
 - **Transitive dependencies**: Indirect dependencies pulled in by root deps (linked via `parentComponentId`)
 - **Component Identity**: Uses Maven's `groupId:artifactId:version` as the unique identifier
 
-### 6. Cleanup (`OnDetectionFinished`)
+### 7. Cleanup (File Deletion in `OnFileFoundAsync`)
 
 ```csharp
-protected override Task OnDetectionFinished()
+protected override async Task OnFileFoundAsync(ProcessRequest processRequest, IDictionary<string, string> detectorArgs, CancellationToken cancellationToken = default)
 {
-    foreach (var processRequest in this.processedRequests)
-    {
-        var dependenciesFilePath = Path.Combine(
-            Path.GetDirectoryName(processRequest.ComponentStream.Location),
-            this.sbtCommandService.BcdeSbtDependencyFileName);
-
-        if (File.Exists(dependenciesFilePath))
-        {
-            this.Logger.LogDebug("Deleting {DependenciesFilePath}", dependenciesFilePath);
-            File.Delete(dependenciesFilePath);
-        }
-    }
+    this.sbtCommandService.ParseDependenciesFile(processRequest);
+    File.Delete(processRequest.ComponentStream.Location); // remove the generated dependency file once parsed
+    await Task.CompletedTask; // no other asynchronous work; keeps the async override signature
 }
 ```
 
 **Temporary File Management**:
-- Each `build.sbt` generates a `bcde.sbtdeps` file in its directory
-- All temporary files are tracked in `processedRequests`
-- Cleanup occurs after all detectors finish (via `FileComponentDetectorWithCleanup` lifecycle)
+- `GenerateDependenciesFileAsync()` writes the dependency output to a file, which is deleted immediately after it is parsed
+- No temporary files are left on disk once a `build.sbt` has been processed
+- This keeps the filesystem clean and prevents temp files from accumulating
 
 ## Dependency Injection
 
````
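In the `OnFileFoundAsync` override, `File.Delete` runs only after `ParseDependenciesFile` returns, so an exception during parsing would leave the generated file behind. One common alternative, shown as a hypothetical sketch (`ParseThenDelete` and its `parse` delegate are illustrative names, not the detector's API), is to delete in a `finally` block so cleanup happens on both the success and the failure path.

```csharp
using System;
using System.IO;

// Hypothetical helper: parse a generated dependency file, then always remove it.
static void ParseThenDelete(string dependenciesFilePath, Action<string> parse)
{
    try
    {
        parse(dependenciesFilePath);
    }
    finally
    {
        // Runs whether or not parse throws; File.Delete does not throw if the file is missing.
        File.Delete(dependenciesFilePath);
    }
}

var path = Path.GetTempFileName();
try
{
    ParseThenDelete(path, _ => throw new InvalidDataException("malformed dependency tree"));
}
catch (InvalidDataException ex)
{
    // The parse failure still reaches the caller, which can record it and move on.
    Console.WriteLine($"Parse failed: {ex.Message}");
}

Console.WriteLine($"File still exists: {File.Exists(path)}");
```

The failure still propagates, so it can be registered the same way as other parse failures, but the file no longer depends on parsing succeeding to be cleaned up.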
````diff
@@ -331,36 +364,45 @@ Per Component Detection lifecycle, all new detectors start as `IDefaultOffCompon
 
 **Alternative**: Parse stdout directly
 
-**Problem**: SBT stdout is polluted with:
+**Problem**: SBT stdout is polluted with metadata that needs filtering:
 ```
 [info] Loading settings for project...
 [info] Compiling 1 Scala source...
 [info] Done compiling.
-org.scala-lang:scala-library:2.13.8 <-- Actual data we want
+[info] +-com.google.guava:guava:32.1.3-jre <-- Actual data with [info] prefix
 ```
 
-**Solution**: `export` task + file redirection gives clean, parseable output
+**Solution**: Capture stdout, then apply multi-stage filtering to clean the output before parsing
 
 ## Performance Characteristics
 
 ### Bottlenecks
 
-1. **SBT Startup**: 2-5 seconds per invocation (JVM warmup)
-2. **Dependency Resolution**: First run downloads artifacts (can be minutes)
-3. **Plugin Compilation**: `dependencyTree` plugin must compile on first use
+1. **SBT Startup**: 10-15 seconds per invocation (JVM warmup + dependency resolution)
+2. **First Build**: Downloads SBT, plugins, and dependencies (can take minutes on the first run)
+3. **Dependency Traversal**: Building the complete dependency tree for complex projects
+
+### Observed Performance (Test Project)
+
+For a simple Scala project with 8 direct/transitive dependencies:
+- **Total detection time**: ~14 seconds
+- **SBT execution time**: ~13 seconds (majority of time)
+- **Parsing time**: <100ms
+- **Components detected**: 8 (7 explicit + 1 implicit)
 
 ### Optimizations
 
 - **CLI Availability Check**: Short-circuits if SBT missing (avoids processing all files)
-- **Timeout Configuration**: Prevents hanging on problematic projects
-- **Batch Cleanup**: Deletes temp files once at end instead of per-file
+- **Timeout Configuration**: Prevents hanging on problematic projects via `SbtCLIFileLevelTimeoutSeconds`
+- **Early Filtering**: Regex filtering discards non-dependency lines before they reach the Maven parser
 
 ### Scaling Considerations
 
 For monorepos with 100+ SBT projects:
-- Total scan time ≈ N × (SBT startup time + dependency resolution)
-- Recommended: Use `SbtCLIFileLevelTimeoutSeconds` to cap max time per project
-- Potential future enhancement: Parallel execution of independent projects
+- Total sequential scan time ≈ N × 13-15 seconds (the per-project time observed above)
+- **Recommendation**: Use `SbtCLIFileLevelTimeoutSeconds` (e.g., 60 seconds) to cap max time per project
+- **Future enhancement**: Parallel execution of independent projects (detector already supports async)
+- **Cache potential**: Could persist SBT's dependency cache (the Coursier cache in sbt 1.3+, `~/.ivy2` in older versions) between runs to skip artifact downloads
 
 ## Error Scenarios Handled
 
````
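At 13-15 seconds per project, 100 projects take roughly 1,300-1,500 seconds (about 22-25 minutes) when scanned one after another. The parallel execution listed as a future enhancement could look like the hypothetical sketch below: a `SemaphoreSlim` caps how many projects run at once, and each project gets its own timeout, standing in for `SbtCLIFileLevelTimeoutSeconds`. `ScanAllAsync` and the `scanProject` delegate are illustrative names, and the sketch assumes `scanProject` honours its cancellation token.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: scan projects concurrently, at most `maxParallel` at a time,
// giving each project its own timeout.
static async Task<IReadOnlyList<(string Project, bool Completed)>> ScanAllAsync(
    IEnumerable<string> projects,
    Func<string, CancellationToken, Task> scanProject,
    int maxParallel,
    TimeSpan perProjectTimeout)
{
    using var gate = new SemaphoreSlim(maxParallel);

    async Task<(string Project, bool Completed)> RunOneAsync(string project)
    {
        await gate.WaitAsync();
        using var timeout = new CancellationTokenSource(perProjectTimeout); // timer starts once a slot is free
        try
        {
            await scanProject(project, timeout.Token);
            return (project, true);
        }
        catch (OperationCanceledException) when (timeout.IsCancellationRequested)
        {
            // Timed out: report it and let the remaining projects continue.
            return (project, false);
        }
        finally
        {
            gate.Release();
        }
    }

    return await Task.WhenAll(projects.Select(RunOneAsync));
}

var results = await ScanAllAsync(
    new[] { "service-a", "service-b", "service-c" },
    (project, token) => Task.Delay(project == "service-c" ? 5_000 : 10, token), // service-c simulates a hung build
    maxParallel: 2,
    perProjectTimeout: TimeSpan.FromMilliseconds(500));

foreach (var (project, completed) in results)
{
    Console.WriteLine($"{project}: {(completed ? "scanned" : "timed out")}");
}
```

Capping concurrency matters because every `sbt` invocation starts its own JVM; with `maxParallel` set to k, the ideal wall-clock time drops to roughly N / k times the per-project time, bounded by available CPU and memory.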
docs/detectors/sbt.md

Lines changed: 19 additions & 11 deletions
````diff
@@ -4,13 +4,24 @@
 
 SBT detection depends on the following to successfully run:
 
-- SBT CLI as part of your PATH. `sbt` should be runnable from a given command line.
-- sbt-dependency-graph plugin (recommended to be added globally or in the project's `project/plugins.sbt`).
-- One or more `build.sbt` files.
+- SBT CLI available via the system PATH or a Coursier distribution
+  - On Windows, the detector checks the `sbt` command, then `C:\Users\{user}\AppData\Local\Coursier\data\bin\sbt.bat`
+  - On other platforms, it checks the system PATH for the `sbt` command
+- One or more `build.sbt` files
+
+**Note**: The `sbt-dependency-graph` plugin is no longer required. The detector uses SBT's built-in `dependencyTree` task (available since sbt 1.4).
 
 ## Detection strategy
 
-SBT detection is performed by running `sbt "dependencyTree; export compile:dependencyTree > bcde.sbtdeps"` for each build.sbt file and parsing the results. The detector leverages the same Maven-style dependency graph parser used by the Maven detector, as SBT dependencies use Maven coordinates (groupId:artifactId:version).
+SBT detection is performed by running `sbt dependencyTree` for each `build.sbt` file and parsing the tree output. The detector applies a multi-stage filtering process to clean the output:
+
+1. Removes root component markers (`[S]` suffix)
+2. Removes SBT metadata (`[info]`, `[warn]`, `[error]` prefixes)
+3. Removes Scala version suffixes from artifact names (e.g., `_2.13`)
+4. Validates Maven coordinates (requires at least one dot in groupId per Maven convention)
+5. Inserts default `jar` packaging to match the Maven coordinate format: `group:artifact:jar:version`
+
+The detector leverages the same Maven-style dependency graph parser used by the Maven detector, as SBT dependencies use Maven coordinates (groupId:artifactId:version) and `dependencyTree` prints them in a compatible tree format.
 
 Components are registered as Maven components since Scala projects publish to Maven repositories and use the same artifact coordinate system.
 
````
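The suffix-removal step uses the pattern `_\d+\.\d+(?=:)`, which matches Scala 2 cross-version suffixes such as `_2.13` but not the bare `_3` suffix that Scala 3 artifacts are published with (for example `cats-core_3`). If Scala 3 builds need the same normalization, one possible variant is sketched below; `StripScalaSuffix` is a hypothetical name, and the pattern has not been checked against the detector's test suite.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical variant: strip Scala 2 (_2.12, _2.13) and Scala 3 (_3) cross-version suffixes,
// but only when the suffix is immediately followed by ':' (i.e. it ends the artifact segment).
static string StripScalaSuffix(string coordinates) =>
    Regex.Replace(coordinates, @"_(?:\d+\.\d+|3)(?=:)", string.Empty);

Console.WriteLine(StripScalaSuffix("com.typesafe.akka:akka-actor_2.13:2.6.20"));
Console.WriteLine(StripScalaSuffix("org.typelevel:cats-core_3:2.10.0"));
Console.WriteLine(StripScalaSuffix("com.google.guava:guava:32.1.3-jre"));
```

The lookahead keeps the match confined to the end of the artifact segment, so Java artifacts such as `guava` and names like `lib_30` pass through unchanged.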
````diff
@@ -20,13 +31,10 @@ Full dependency graph generation is supported.
 
 ## Known limitations
 
-- SBT detection will not run if `sbt` is unavailable in the PATH.
-- The sbt-dependency-graph plugin must be available. For best results, install it globally in `~/.sbt/1.0/plugins/plugins.sbt`:
-```scala
-addSbtPlugin("net.virtual-void" % "sbt-dependency-graph" % "0.10.0-RC1")
-```
-- Only the `compile` configuration is scanned by default. Test dependencies may be detected as development dependencies if they appear in the dependency tree output.
-- Multi-project builds (nested `build.sbt` files) are detected, with parent projects taking precedence.
+- SBT detection will not run if the `sbt` CLI is not available in the system PATH or a Coursier distribution
+- Only compile-scope dependencies are scanned by default (test dependencies may be detected as development dependencies if they appear in the dependency tree output)
+- Multi-project builds (nested `build.sbt` files) are detected, with parent projects taking precedence
+- The first SBT invocation may be slow due to JVM startup and dependency resolution; subsequent runs benefit from cached dependencies
 
 ## Environment Variables
 
````