
@kenkania kenkania commented Sep 23, 2025

Add local caching, SHA-1 in-memory cache, background JAR prefetching, and Bilt-specific URL routing in com.google.cloud.artifactregistry.wagon.ArtifactRegistryWagon to improve spec wagon perf

  • Implement an in-memory SHA-1 cache, a /tmp/biltarwagon local on-disk cache, and a background ExecutorService for JAR prefetching in ArtifactRegistryWagon.java.
  • Modify GoogleRepository.constructURL to route com/bilt/, com/biltrewards/, and com/biltcard/ artifacts to the repository host path /single-scholar-280421/bilt-maven/<artifactPath> and send all other artifacts to Artifact Registry v1 download for projects/single-scholar-280421/locations/us/repositories/maven-central-cache.
  • Add cache-serving logic in ArtifactRegistryWagon.getIfNewer for .sha1 from memory and .jar from disk; trigger asynchronous JAR prefetching on .pom.
  • Add URL print in ArtifactRegistryWagon.getInputStream and introduce helpers for SHA-1 computation, cache directory creation, cached JAR path mapping, and JAR prefetch.
  • Update test script to allow REPO_URL override and always use an explicit settings file in run_test.sh, add documentation in README_bilt.md, and add a sample project in tests/bilt-1-artifact/pom.xml.
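
The in-memory SHA-1 cache described above could look roughly like the following. This is a minimal sketch, not the PR's code: the class name `Sha1Cache` and its methods are hypothetical, and it assumes the cache maps a resource name to the hex digest of that resource's bytes.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper; the PR's actual names and layout may differ. */
final class Sha1Cache {
  // Maps a resource name (for example "g/a/1.0/a-1.0.jar") to its hex SHA-1.
  private final Map<String, String> cache = new ConcurrentHashMap<>();

  /** Returns the lowercase hex SHA-1 digest of the given bytes. */
  static String sha1Hex(byte[] data) {
    try {
      MessageDigest digest = MessageDigest.getInstance("SHA-1");
      StringBuilder hex = new StringBuilder(40);
      for (byte b : digest.digest(data)) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      // SHA-1 is one of the algorithms every Java platform must provide.
      throw new IllegalStateException(e);
    }
  }

  /** Records the digest of a resource's content under its resource name. */
  void put(String resourceName, byte[] content) {
    cache.put(resourceName, sha1Hex(content));
  }

  /** Returns the cached digest for a ".sha1" request, or null on a miss. */
  String lookup(String sha1ResourceName) {
    String suffix = ".sha1";
    if (!sha1ResourceName.endsWith(suffix)) {
      return null;
    }
    String original =
        sha1ResourceName.substring(0, sha1ResourceName.length() - suffix.length());
    return cache.get(original);
  }
}
```

A request for `a-1.0.jar.sha1` is then answered from memory only if `a-1.0.jar` was recorded earlier; anything else falls through to the remote repository.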

📍Where to Start

Start with GoogleRepository.constructURL and the cache/prefetch flow in ArtifactRegistryWagon.getIfNewer in artifactregistry-maven-wagon/src/main/java/com/google/cloud/artifactregistry/wagon/ArtifactRegistryWagon.java.


Macroscope summarized d7486d6.

try {
  ensureCacheDirectory();
  Resource jarResource = new Resource(jarResourceName);
  InputStream input = getInputStream(jarResource);

Prefetching JAR files opens an HTTP InputStream without ever closing it, leading to leaked HTTP connections under both success and failure scenarios.

The code calls getInputStream(jarResource) and writes its contents to disk without using a try-with-resources or finally block to close the stream.

Consider wrapping the InputStream in a try-with-resources statement so that it is automatically closed after use, preventing connection leaks.

-         InputStream input = getInputStream(jarResource);
+         try (InputStream input = getInputStream(jarResource)) {
@@
-         while ((bytesRead = input.read(buffer)) != -1) {
+           while ((bytesRead = input.read(buffer)) != -1) {
         }
+         }
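
For reference, the pattern suggested above can be sketched as a standalone helper. The class and method names here (`StreamToCache`, `copyToCache`) are hypothetical and not taken from the PR; the sketch assumes the prefetch writes to a temporary file and then moves it into place.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper illustrating try-with-resources for a cache write. */
final class StreamToCache {
  /**
   * Copies the source stream into target via a temporary file and returns the
   * number of bytes written. The source is closed on success and on failure.
   */
  static long copyToCache(InputStream source, Path target) throws IOException {
    try (InputStream input = source) {
      // A uniquely named temp file in the target directory.
      Path temp =
          Files.createTempFile(target.toAbsolutePath().getParent(), "prefetch", ".tmp");
      long total = 0;
      try (OutputStream output = Files.newOutputStream(temp)) {
        byte[] buffer = new byte[8192];
        int bytesRead;
        while ((bytesRead = input.read(buffer)) != -1) {
          output.write(buffer, 0, bytesRead);
          total += bytesRead;
        }
      } catch (IOException e) {
        // Do not leave a partial file behind if the copy fails.
        Files.deleteIfExists(temp);
        throw e;
      }
      Files.move(temp, target, StandardCopyOption.REPLACE_EXISTING);
      return total;
    }
  }
}
```

Because `input` is declared in the resource list, it is closed even when creating or writing the temporary file throws.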


if (resourceName.endsWith(".sha1")) {
  String originalResource = resourceName.substring(0, resourceName.length() - 5); // Remove .sha1 suffix
  String cachedSha1 = sha1Cache.get(originalResource);
  if (cachedSha1 != null) {

The getIfNewer method always returns true on cache hits, failing to honor the timestamp parameter. This violates the Wagon getIfNewer contract, causing callers to believe a download occurred (or is needed) even when the resource is up-to-date.

The method returns true unconditionally on cache hits (.sha1 and .jar cases) without comparing the provided timestamp against any remote or cached last-modified time.

Consider returning false when a cache hit occurs and no new download is needed (i.e., the resource is not newer than the timestamp). You may also need to fetch and compare a last-modified timestamp (via a HEAD request) before deciding to download or skip.

-          return true;
+          return false;
-        return true;
+        return false;
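
One way to make a cache hit respect the timestamp is to compare it with the cached file's modification time before copying. The sketch below is illustrative only: `CacheFreshness`, `Outcome`, and `copyIfNewer` are hypothetical names, and it assumes the cached file's modification time is an acceptable stand-in for the remote last-modified time, which the PR does not establish.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper; not part of the Wagon API or of the PR. */
final class CacheFreshness {
  enum Outcome { MISS, NOT_NEWER, COPIED }

  /**
   * Copies the cached file to destination only when it was modified after
   * timestamp (milliseconds since the epoch).
   */
  static Outcome copyIfNewer(Path cached, Path destination, long timestamp)
      throws IOException {
    if (!Files.isRegularFile(cached)) {
      return Outcome.MISS; // caller falls through to the remote repository
    }
    long cachedModified = Files.getLastModifiedTime(cached).toMillis();
    if (cachedModified <= timestamp) {
      return Outcome.NOT_NEWER; // getIfNewer would return false here
    }
    Files.copy(cached, destination, StandardCopyOption.REPLACE_EXISTING);
    return Outcome.COPIED; // getIfNewer would return true here
  }
}
```

Separating `MISS` from `NOT_NEWER` keeps "nothing cached" distinct from "cached but not newer", so only the latter maps to a `false` return without a remote request.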


cmd="mvn $MVN_ACTION -B -Dmaven.test.skip=true $MVN_FLAGS"

touch $run_dir/settings.xml
mkdir -p $run_dir/.mvn

In run_test.sh, multiple unquoted variable expansions and command constructions can lead to word-splitting, globbing, and unexpected behavior when paths or arguments contain spaces or special characters:

  • run_test.sh:125 uses unquoted $run_dir, $SCRIPT_DIR, and $repo_dir in commands like mkdir -p and Docker volume mounts.
  • run_test.sh:123 builds a single unquoted cmd string and executes it via $cmd, allowing argument splitting or injection.
  • run_test.sh:160 invokes cp -rf $TEST_DIR/. "$RUN_DIR" without quoting $TEST_DIR, validating its value, or using --, which can misinterpret options or copy unintended files.

Consider quoting all variable expansions, using arrays for command construction (e.g. cmd=(...) and executing with "${cmd[@]}"), adding -- to cp, and validating inputs to ensure paths and arguments are handled safely.

-    mkdir -p $run_dir/.mvn        
+    mkdir -p "$run_dir/.mvn"        
-        (cd $SCRIPT_DIR && ./gradlew publishToMavenLocal -Dmaven.repo.local="$repo_dir")
+        (cd "$SCRIPT_DIR" && ./gradlew publishToMavenLocal -Dmaven.repo.local="$repo_dir")
-        echo "$EXTENSIONS_XML" > $run_dir/.mvn/extensions.xml
+        echo "$EXTENSIONS_XML" > "$run_dir/.mvn/extensions.xml"
-    echo "$SETTINGS_XML" > $run_dir/settings.xml
+    echo "$SETTINGS_XML" > "$run_dir/settings.xml"
-            -v $run_dir:/src \
+            -v "$run_dir:/src" \
-        cd $run_dir
+        cd "$run_dir"
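-    cmd="mvn $MVN_ACTION -B -Dmaven.test.skip=true $MVN_FLAGS"
+    cmd=(mvn $MVN_ACTION -B -Dmaven.test.skip=true $MVN_FLAGS)
-        cmd="$cmd -Dmaven.repo.local=$repo_dir"
+        cmd+=("-Dmaven.repo.local=$repo_dir")
-    $cmd | tee -a "$run_dir"/test.log
+    "${cmd[@]}" | tee -a "$run_dir"/test.log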



Issue at artifactregistry-maven-wagon/src/main/java/com/google/cloud/artifactregistry/wagon/ArtifactRegistryWagon.java:80:

Concurrent calls to prefetchJarFile for the same POM can submit multiple prefetch tasks that both see the cache miss and write to the same .tmp file, leading to corruption of the temporary cache file.

Because the exists() check and submission are not synchronized, two threads can both decide to prefetch and use the same temp file path concurrently.

Consider tracking in-progress prefetches using a concurrent set so that only the first request for a given JAR submits a task. Remove the resource from the set once prefetch completes or fails.

+  private static final Set<String> prefetchInProgress = ConcurrentHashMap.newKeySet();
+  private void prefetchJarFile(String pomResourceName) {
+    if (!pomResourceName.endsWith(".pom")) {
+      return;
+    }
+    String jarResourceName = pomResourceName.substring(0, pomResourceName.length() - 4) + ".jar";
+    if (!prefetchInProgress.add(jarResourceName)) {
+      return;
+    }
+    File cachedJarFile = getCachedJarFile(jarResourceName);
@@
       jarPrefetchExecutor.submit(() -> {
         try {
@@
         System.out.println("PREFETCHED JAR: " + jarResourceName);
+        prefetchInProgress.remove(jarResourceName);
@@
       } catch (Exception e) {
@@
         if (tempFile.exists()) {
           tempFile.delete();
         }
+        prefetchInProgress.remove(jarResourceName);
       }
     });
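
The same idea can be sketched in isolation with the cleanup in a `finally` block, so the entry is released on every exit path. The class `PrefetchDeduplicator` and the method `submitOnce` are hypothetical names for illustration and do not appear in the PR.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;

/** Hypothetical helper that allows one in-flight prefetch per resource name. */
final class PrefetchDeduplicator {
  private final Set<String> inProgress = ConcurrentHashMap.newKeySet();
  private final ExecutorService executor;

  PrefetchDeduplicator(ExecutorService executor) {
    this.executor = executor;
  }

  /**
   * Submits the download unless one for the same resource is already running.
   * Returns true if a task was submitted.
   */
  boolean submitOnce(String jarResourceName, Runnable download) {
    // add() is atomic: only the first caller for a given name sees true.
    if (!inProgress.add(jarResourceName)) {
      return false;
    }
    try {
      executor.submit(() -> {
        try {
          download.run();
        } finally {
          // Runs on success and on failure, so the name is always released.
          inProgress.remove(jarResourceName);
        }
      });
      return true;
    } catch (RuntimeException e) {
      // For example RejectedExecutionException after shutdown.
      inProgress.remove(jarResourceName);
      throw e;
    }
  }
}
```

Any early return taken after the name is added, such as finding the JAR already cached, also has to remove it, otherwise later prefetches for that resource are skipped permanently.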

