-
Notifications
You must be signed in to change notification settings - Fork 0
Spec wagon perf #5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
| try { | ||
| ensureCacheDirectory(); | ||
| Resource jarResource = new Resource(jarResourceName); | ||
| InputStream input = getInputStream(jarResource); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefetching JAR files opens an HTTP InputStream without ever closing it, leading to leaked HTTP connections under both success and failure scenarios.
The code calls getInputStream(jarResource) and writes its contents to disk without using a try-with-resources or finally block to close the stream.
Consider wrapping the InputStream in a try-with-resources statement so that it is automatically closed after use, preventing connection leaks.
- InputStream input = getInputStream(jarResource);
+ try (InputStream input = getInputStream(jarResource)) {
@@
- while ((bytesRead = input.read(buffer)) != -1) {
+ while ((bytesRead = input.read(buffer)) != -1) {
}
+ }🚀 Reply to ask Macroscope to explain or update this suggestion.
👍 Helpful? React to give us feedback.
| if (resourceName.endsWith(".sha1")) { | ||
| String originalResource = resourceName.substring(0, resourceName.length() - 5); // Remove .sha1 suffix | ||
| String cachedSha1 = sha1Cache.get(originalResource); | ||
| if (cachedSha1 != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The getIfNewer method always returns true on cache hits, failing to honor the timestamp parameter. This violates the Wagon getIfNewer contract, causing callers to believe a download occurred (or is needed) even when the resource is up-to-date.
The method returns true unconditionally on cache hits (.sha1 and .jar cases) without comparing the provided timestamp against any remote or cached last-modified time.
Consider returning false when a cache hit occurs and no new download is needed (i.e., the resource is not newer than the timestamp). You may also need to fetch and compare a last-modified timestamp (via a HEAD request) before deciding to download or skip.
- return true;
+ return false;
- return true;
+ return false;🚀 Reply to ask Macroscope to explain or update this suggestion.
👍 Helpful? React to give us feedback.
| cmd="mvn $MVN_ACTION -B -Dmaven.test.skip=true $MVN_FLAGS" | ||
|
|
||
| touch $run_dir/settings.xml | ||
| mkdir -p $run_dir/.mvn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In run_test.sh, multiple unquoted variable expansions and command constructions can lead to word-splitting, globbing, and unexpected behavior when paths or arguments contain spaces or special characters:
run_test.sh:125uses unquoted$run_dir,$SCRIPT_DIR, and$repo_dirin commands likemkdir -pand Docker volume mounts.run_test.sh:123builds a single unquotedcmdstring and executes it via$cmd, allowing argument splitting or injection.run_test.sh:160invokescp -rf $TEST_DIR/. "$RUN_DIR"without quoting$TEST_DIR, validating its value, or using--, which can misinterpret options or copy unintended files.
Consider quoting all variable expansions, using arrays for command construction (e.g. cmd=(...) and executing with "${cmd[@]}"), adding -- to cp, and validating inputs to ensure paths and arguments are handled safely.
- mkdir -p $run_dir/.mvn
+ mkdir -p "$run_dir/.mvn"
- (cd $SCRIPT_DIR && ./gradlew publishToMavenLocal -Dmaven.repo.local="$repo_dir")
+ (cd "$SCRIPT_DIR" && ./gradlew publishToMavenLocal -Dmaven.repo.local="$repo_dir")
- echo "$EXTENSIONS_XML" > $run_dir/.mvn/extensions.xml
+ echo "$EXTENSIONS_XML" > "$run_dir/.mvn/extensions.xml"
- echo "$SETTINGS_XML" > $run_dir/settings.xml
+ echo "$SETTINGS_XML" > "$run_dir/settings.xml"
- -v $run_dir:/src \
+ -v "$run_dir:/src" \
- cd $run_dir
+ cd "$run_dir"
- cmd="$cmd -Dmaven.repo.local=$repo_dir"
+ cmd="$cmd -Dmaven.repo.local=$repo_dir"
- $cmd | tee -a "$run_dir"/test.log
+ eval "$cmd" | tee -a "$run_dir"/test.log🚀 Reply to ask Macroscope to explain or update this suggestion.
👍 Helpful? React to give us feedback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue on line in artifactregistry-maven-wagon/src/main/java/com/google/cloud/artifactregistry/wagon/ArtifactRegistryWagon.java:80:
Concurrent calls to prefetchJarFile for the same POM can submit multiple prefetch tasks that both see the cache miss and write to the same .tmp file, leading to corruption of the temporary cache file.
Because the exists() check and submission are not synchronized, two threads can both decide to prefetch and use the same temp file path concurrently.
Consider tracking in-progress prefetches using a concurrent set so that only the first request for a given JAR submits a task. Remove the resource from the set once prefetch completes or fails.
+private static final Set<String> prefetchInProgress = ConcurrentHashMap.newKeySet();
+ private void prefetchJarFile(String pomResourceName) {
+ if (!pomResourceName.endsWith(".pom")) {
+ return;
+ }
+ String jarResourceName = pomResourceName.substring(0, pomResourceName.length() - 4) + ".jar";
+ if (!prefetchInProgress.add(jarResourceName)) {
+ return;
+ }
+ File cachedJarFile = getCachedJarFile(jarResourceName);
@@
jarPrefetchExecutor.submit(() -> {
try {
@@
System.out.println("PREFETCHED JAR: " + jarResourceName);
+ prefetchInProgress.remove(jarResourceName);
@@
} catch (Exception e) {
@@
if (tempFile.exists()) {
tempFile.delete();
}
+ prefetchInProgress.remove(jarResourceName);
}
});🚀 Reply to ask Macroscope to explain or update this suggestion.
👍 Helpful? React to give us feedback.
Add local caching, SHA-1 in-memory cache, background JAR prefetching, and Bilt-specific URL routing in
com.google.cloud.artifactregistry.wagon.ArtifactRegistryWagonto improve spec wagon perf/tmp/biltarwagonlocal on-disk cache, and a backgroundExecutorServicefor JAR prefetching inArtifactRegistryWagon.java.GoogleRepository.constructURLto routecom/bilt/,com/biltrewards/, andcom/biltcard/artifacts to the repository host path/single-scholar-280421/bilt-maven/<artifactPath>and send all other artifacts to Artifact Registryv1download forprojects/single-scholar-280421/locations/us/repositories/maven-central-cache.ArtifactRegistryWagon.getIfNewerfor.sha1from memory and.jarfrom disk; trigger asynchronous JAR prefetching on.pom.ArtifactRegistryWagon.getInputStreamand introduce helpers for SHA-1 computation, cache directory creation, cached JAR path mapping, and JAR prefetch.REPO_URLoverride and always use an explicit settings file inrun_test.sh, add documentation inREADME_bilt.md, and add a sample project intests/bilt-1-artifact/pom.xml.📍Where to Start
Start with
GoogleRepository.constructURLand the cache/prefetch flow inArtifactRegistryWagon.getIfNewerinartifactregistry-maven-wagon/src/main/java/com/google/cloud/artifactregistry/wagon/ArtifactRegistryWagon.java.Macroscope summarized d7486d6.