-
Notifications
You must be signed in to change notification settings - Fork 1.2k
[Experiment] Introduce DependencyFile#precedence to control graph generation #12816
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
edffa0a
to
d8ee1c9
Compare
d56da06
to
f3d7a8f
Compare
@@ -217,6 +217,9 @@ def lockfile_dependencies | |||
dependencies << dep | |||
end | |||
|
|||
# If the lockfile has at least one dependency, then it should precedent over any other files | |||
# for its directory when building a graph. | |||
T.must(lockfile).precedence = 1 if T.must(lockfile).dependencies.any? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thoughts:
- as we discussed over zoom,
precedence
is a bit of a hack / proxy forlockfile
but without having to use that name which in this context is confusing. that said it's still what it means – i feel like this ought to be set when the DepFile is initialized, because we should know at init whether a file is a lockfile or not. In this file we can set it in thelockfile
method i think. - this is straightforward for bundler and maybe not genericizable to other ecosystems
- also, thoughts on maybe calling it
priority
? cos then we could be like,
def priority?
dependencies.any? && self.priority
end
- (Obv this method still works if we call it precendence)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've addressed this in d4e3195, but it has caused quite a lot of collateral test failures on first pass
i feel like this ought to be set when the DepFile is initialized
This ends up being the complexity - the DependencyFile is actually initialised in a separate process and then results of it are serialized into the job_definition
and then deserialised back into an object in the Updater process.
Relaying the priority
via this serde roundtrip is totally doable and sort of feels like the 'right' way, but I'm nervous about making this deep change. This is something we only really care about parse time1 so we can just assign it via a setter and punt on dealing with the side effects of the 'right' way.
In this file we can set it in the lockfile method i think
This is probably the better option, it's slightly less precise in that the priority remains unset from fetch-time to parse-time but other than it feeling weird to set an attribute via a setter vs init, it's the just-in-time for us using it 🤷🏻
def priority?
dependencies.any? && self.priority
end
This method turned out to be tricky to implement due to the serde roundtrip where we need to pass the @priority
/ self.priority
carefully since dependencies.any?
is false until the parsing is actually done.
This wrinkle does make me think that 'promoting' the lockfile to priority 1 once we know it is a/ there and b/ has dependencies is kind of useful semantically but, this is offset by the advantages of just filtering out any file without dependencies assigned to it in my follow-up commit: fc41afd
This ends up giving us vendor/support file filtering 'for free' without the grapher having to start to learn about those concepts - if it doesn't have dependencies assigned, we don't see it.
Footnotes
-
It's a sidebar / deep cut, but the separate fetch/update processes isn't necessary anymore and could be gotten rid of so it's also slightly throwaway work. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
omg okay yeah way too much work to do it this way. maybe split the baby in half for now? i.e.
- don't modify the initializer because that's a big annoying change
- but maybe still set the priority when the
@lockfile
is first fetched & cached? that way we're not setting it within the dependency parser method
either way i approve
f3d7a8f
to
f41a8cc
Compare
4bc309d
to
d4e3195
Compare
What are you trying to accomplish?
The Dependabot updater needs to know about quite a lot of files to do its job, often needing information from lockfiles about dependencies in order to affect changes in manifests.
When it comes to graphing the dependencies used by a project, we generally only need to examine lockfiles for the relevant information as they will know things manifest do not provide like:
Example
If we look at the snapshots being generated for a Ruby project by Dependency Graph's existing source-based parser for the Sinatra example in our tests, we see 28 dependencies, all of which are attributed to the Gemfile.lock:

If we take the submission payload generated by our experiment so far we see 32 dependencies, with 4 duplicates attributed to the Gemfile:

Which is the right approach?
The Dependency Submission API is mainly used by projects which do not have exhaustive lockfiles to fill in blanks our source-based parsing cannot cover - as a result Bundler is in a funny spot where we accept snapshots for it but until now there wasn't much real-world incentive build them so as our source-based parsing was good enough and used bundler internals in much the same way Dependabot does.
Bundler is also something of a bystander in this experiment as we're less interested in making it work fully than other ecosystems, but all things being equal it makes most sense to align with the current source-based parsing in terms of which files are submitted.
Anything you want to highlight for special attention from reviewers?
Rather than bubble Bundler-specific knowledge up into the submission components, I've introduced a simple precedence flag on
Dependabot::DependencyFile
. When we encounter a lockfile with 1 or more dependencies, we set it to precedence of1
so it supersedes any manifests.This is an effort to have a simple but generic way for us to limit the files used for graphing as required within the ecosystem's file parsers.
Dependabot currently fails fast when attempting to parse Bundler projects without a Gemfile ( or gemspec ), so I haven't added tests for these scenarios and just updated our existing tests to validate that only the lockfiles make it into the submission.
How will you know you've accomplished your goal?
When we merge this, the payloads we submit will only show 28 dependencies instead of 32, omitting the Gemfile records.
Checklist