
[Experiment] Introduce DependencyFile#precedence to control graph generation #12816


Open, wants to merge 3 commits into base: main

brrygrdn
Contributor

@brrygrdn brrygrdn commented Aug 11, 2025

What are you trying to accomplish?

The Dependabot updater needs to know about quite a lot of files to do its job, often needing information from lockfiles about dependencies in order to effect changes in manifests.

When it comes to graphing the dependencies used by a project, we generally only need to examine lockfiles for the relevant information, as they know things manifests do not provide, like:

  • the exact versions used
  • any transitive dependencies that are pulled in by direct dependencies
  • the relationships between dependencies, i.e. their ancestors and descendants

Example

If we look at the snapshots being generated for a Ruby project by Dependency Graph's existing source-based parser for the Sinatra example in our tests, we see 28 dependencies, all of which are attributed to the Gemfile.lock:
Screenshot 2025-08-11 at 15 03 27

If we take the submission payload generated by our experiment so far we see 32 dependencies, with 4 duplicates attributed to the Gemfile:
Screenshot 2025-08-11 at 14 30 34

Which is the right approach?

The Dependency Submission API is mainly used by projects which do not have exhaustive lockfiles, to fill in blanks our source-based parsing cannot cover. As a result Bundler is in a funny spot: we accept snapshots for it, but until now there wasn't much real-world incentive to build them, as our source-based parsing was good enough and used Bundler internals in much the same way Dependabot does.

Bundler is also something of a bystander in this experiment as we're less interested in making it work fully than other ecosystems, but all things being equal it makes most sense to align with the current source-based parsing in terms of which files are submitted.

Anything you want to highlight for special attention from reviewers?

Rather than bubble Bundler-specific knowledge up into the submission components, I've introduced a simple precedence flag on Dependabot::DependencyFile. When we encounter a lockfile with 1 or more dependencies, we set its precedence to 1 so it supersedes any manifests.

This is an effort to have a simple but generic way for us to limit the files used for graphing as required within the ecosystem's file parsers.
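To make the idea concrete, here is a minimal sketch of how a precedence flag could be used to narrow the files considered for graphing. It is an illustration only: `SketchFile` and `files_for_graph` are hypothetical names, not Dependabot's actual classes or methods, and the default precedence of 0 is an assumption.

```ruby
# Hypothetical stand-in for Dependabot::DependencyFile; only the fields
# needed for this illustration are modelled.
class SketchFile
  attr_reader :name, :directory
  attr_accessor :precedence

  # A default precedence of 0 is an assumption made for this sketch.
  def initialize(name:, directory: "/", precedence: 0)
    @name = name
    @directory = directory
    @precedence = precedence
  end
end

# Keep, per directory, only the files that share the highest precedence.
def files_for_graph(files)
  files.group_by(&:directory).flat_map do |_dir, group|
    top = group.map(&:precedence).max
    group.select { |file| file.precedence == top }
  end
end

gemfile  = SketchFile.new(name: "Gemfile")
lockfile = SketchFile.new(name: "Gemfile.lock")
lockfile.precedence = 1 # set once the lockfile is known to have dependencies

puts files_for_graph([gemfile, lockfile]).map(&:name).inspect
```

With both files in the same directory, only the lockfile is selected; if no file had been promoted, every file would share precedence 0 and all of them would be kept.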

Dependabot currently fails fast when attempting to parse Bundler projects without a Gemfile (or gemspec), so I haven't added tests for these scenarios and have just updated our existing tests to validate that only the lockfiles make it into the submission.

How will you know you've accomplished your goal?

When we merge this, the payloads we submit will only show 28 dependencies instead of 32, omitting the Gemfile records.

Checklist

  • I have run the complete test suite to ensure all tests and linters pass.
  • I have thoroughly tested my code changes to ensure they work as expected, including adding additional tests for new functionality.
  • I have written clear and descriptive commit messages.
  • I have provided a detailed description of the changes in the pull request, including the problem it addresses, how it fixes the problem, and any relevant details about the implementation.
  • I have ensured that the code is well-documented and easy to understand.

@brrygrdn brrygrdn requested a review from a team as a code owner August 11, 2025 14:13
@github-actions github-actions bot added the L: ruby:bundler RubyGems via bundler label Aug 11, 2025
@brrygrdn brrygrdn force-pushed the brrygrdn/dg-7449-ignore-gemfiles branch 2 times, most recently from edffa0a to d8ee1c9 Compare August 11, 2025 14:20
@@ -217,6 +217,9 @@ def lockfile_dependencies
dependencies << dep
end

# If the lockfile has at least one dependency, then it should take precedence over any other files
# for its directory when building a graph.
T.must(lockfile).precedence = 1 if T.must(lockfile).dependencies.any?
Member


thoughts:

  1. as we discussed over zoom, precedence is a bit of a hack / proxy for lockfile but without having to use that name which in this context is confusing. that said it's still what it means – i feel like this ought to be set when the DepFile is initialized, because we should know at init whether a file is a lockfile or not. In this file we can set it in the lockfile method i think.
  2. this is straightforward for bundler and maybe not genericizable to other ecosystems
  3. also, thoughts on maybe calling it priority? cos then we could be like,
def priority?
  dependencies.any? && self.priority
end
     (Obv this method still works if we call it precedence)

Contributor Author


I've addressed this in d4e3195, but it has caused quite a lot of collateral test failures on first pass

i feel like this ought to be set when the DepFile is initialized

This ends up being the complexity - the DependencyFile is actually initialised in a separate process and then results of it are serialized into the job_definition and then deserialised back into an object in the Updater process.

Relaying the priority via this serde roundtrip is totally doable and sort of feels like the 'right' way, but I'm nervous about making this deep change. This is something we only really care about at parse time [1], so we can just assign it via a setter and punt on dealing with the side effects of the 'right' way.

In this file we can set it in the lockfile method i think

This is probably the better option. It's slightly less precise in that the priority remains unset from fetch-time to parse-time, but other than it feeling weird to set an attribute via a setter vs init, it's set just in time for us to use it 🤷🏻

def priority?
  dependencies.any? && self.priority
end

This method turned out to be tricky to implement due to the serde roundtrip where we need to pass the @priority / self.priority carefully since dependencies.any? is false until the parsing is actually done.
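The roundtrip concern can be sketched roughly as follows. This is a simplified illustration, not Dependabot's real serialization code: `RoundTripFile`, its `to_h` / `from_h` pair, and the default priority of 0 are all assumptions. The point is only that an attribute which is not written into the serialized hash comes back as its default on the other side.

```ruby
require "json"

# Hypothetical file object used only to illustrate the serde roundtrip.
class RoundTripFile
  attr_reader :name
  attr_accessor :priority

  def initialize(name:, priority: 0)
    @name = name
    @priority = priority
  end

  # Only the fields listed here survive serialization; priority is
  # deliberately left out to show what gets lost.
  def to_h
    { "name" => name }
  end

  def self.from_h(hash)
    new(name: hash.fetch("name"), priority: hash.fetch("priority", 0))
  end
end

# Serialize to JSON in one "process" and rebuild the object in another.
def round_trip(file)
  RoundTripFile.from_h(JSON.parse(JSON.generate(file.to_h)))
end

original = RoundTripFile.new(name: "Gemfile.lock", priority: 1)
restored = round_trip(original)
puts "before: #{original.priority}, after: #{restored.priority}"
```

Carrying the value across would mean adding it to both the serialized hash and the initializer, which is the 'deep change' discussed above.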

This wrinkle does make me think that 'promoting' the lockfile to priority 1 once we know it is a/ there and b/ has dependencies is kind of useful semantically, but this is offset by the advantages of just filtering out any file without dependencies assigned to it in my follow-up commit: fc41afd

This ends up giving us vendor/support file filtering 'for free' without the grapher having to start to learn about those concepts - if it doesn't have dependencies assigned, we don't see it.
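As a rough sketch of that filtering (not the code in fc41afd), assuming each parsed file exposes a `dependencies` collection; `ParsedFile`, `graphable_files`, and the example file names are hypothetical:

```ruby
# Hypothetical parsed-file object: a name plus whatever dependencies
# the parser attributed to it.
class ParsedFile
  attr_reader :name, :dependencies

  def initialize(name:, dependencies: [])
    @name = name
    @dependencies = dependencies
  end
end

# Files with nothing attributed to them never reach the graph, so the
# grapher does not need to know why a file is empty (vendored, support, etc.).
def graphable_files(files)
  files.select { |file| file.dependencies.any? }
end

files = [
  ParsedFile.new(name: "Gemfile.lock", dependencies: ["sinatra"]),
  ParsedFile.new(name: "vendor/cache/example.gem") # illustrative support file
]

puts graphable_files(files).map(&:name).inspect
```

Here only the file with dependencies attributed to it is kept.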

Footnotes

  1. It's a sidebar / deep cut, but separate fetch/update processes aren't necessary anymore and could be gotten rid of, so it's also slightly throwaway work.

Member


omg okay yeah way too much work to do it this way. maybe split the baby in half for now? i.e.

  • don't modify the initializer because that's a big annoying change
  • but maybe still set the priority when the @lockfile is first fetched & cached? that way we're not setting it within the dependency parser method

either way i approve
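The middle-ground suggestion above could look something like the following sketch, where the priority is assigned once, at the point the lockfile is first looked up and memoized, rather than in the initializer or inside the dependency-parsing method. `PriorityFile` and `SketchParser` are hypothetical names, and matching on the literal name "Gemfile.lock" is a simplification.

```ruby
# Hypothetical file object with a mutable priority.
PriorityFile = Struct.new(:name, :priority)

# Hypothetical parser that looks the lockfile up lazily and caches it.
class SketchParser
  def initialize(files)
    @files = files
  end

  # Find the lockfile once, mark it as higher priority, and memoize the
  # result (including nil when there is no lockfile).
  def lockfile
    return @lockfile if defined?(@lockfile)

    @lockfile = @files.find { |file| file.name == "Gemfile.lock" }
    @lockfile&.priority = 1
    @lockfile
  end
end

parser = SketchParser.new([
  PriorityFile.new("Gemfile", 0),
  PriorityFile.new("Gemfile.lock", 0)
])

puts parser.lockfile.priority
```

Note that this variant promotes the lockfile as soon as it is found, without waiting to learn whether it has any dependencies, which is the trade-off raised earlier in the thread.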

@brrygrdn brrygrdn force-pushed the brrygrdn/dg-7449-ignore-gemfiles branch from f3d7a8f to f41a8cc Compare August 13, 2025 15:24
@brrygrdn brrygrdn force-pushed the brrygrdn/dg-7449-ignore-gemfiles branch from 4bc309d to d4e3195 Compare August 13, 2025 16:19
2 participants