Prototype for source repository support in cabal.project #4289
@alexbiehl, thanks for your PR! By analyzing the history of the files in this pull request, we identified @dcoutts, @ezyang and @Ericson2314 to be potential reviewers.
This allows one to write:

```
source-repository-package
  type: git
  location: https://github.com/....
  subdir: somesubdirinyourrepository
  tag: 481cabcab15501f300728bfa4f6de517d4492777
```
The patch looks pleasingly short, though I don't know this part of the codebase well enough. In any case, we're going to need tests and documentation. It would also probably be wise to refactor the brancher functions out of Distribution.Client.Get into their own module.
One question: can I have the same repository specified twice (with a different subdir and/or different commits)?
@ezyang I have put the brancher in its own module. One thing that bothers me is that I think I do too much in the package-reading code. Another point is the calculation of the checkout directory: currently I derive it from the repository location. And one more: I think we should allow the same repository with different subdirs.
@phadej I guess so. Also see my last answer above for open questions.
@alexbiehl yeah, that solves the different subdirectories, but not different commits. Maybe there should be a comment somewhere about where the repositories are cloned: a hash-based directory would make sense. I wouldn't worry about cleaning them up and/or needing to re-clone on e.g. a tag change (that's a very rare operation and acceptable, but I would mention it in the docs).
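The hash-based directory naming suggested here could look something like the following sketch. `cloneDirFor` and the inline FNV-1a stand-in hash are purely illustrative assumptions, not cabal's actual API (the real implementation would go through cabal's own `HashValue` machinery):

```haskell
import Data.Bits (xor)
import Data.Char (ord)
import Data.Word (Word64)
import Numeric (showHex)

-- Illustrative stand-in hash (FNV-1a); cabal would use its own HashValue.
fnv1a :: String -> Word64
fnv1a = foldl step 0xcbf29ce484222325
  where step h c = (h `xor` fromIntegral (ord c)) * 0x100000001b3

-- Derive a clone directory from the repo location *and* the requested tag,
-- so the same repository pinned at two different commits gets two distinct
-- checkout directories that never clobber each other.
cloneDirFor :: String -> Maybe String -> FilePath
cloneDirFor location mtag =
  "scm-" ++ showHex (fnv1a (location ++ maybe "" ("@" ++) mtag)) ""
```

Under this scheme, editing the tag in cabal.project simply selects a different directory; the old clone is left behind rather than cleaned up, matching the "wouldn't worry about cleaning them up" point above.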
Yes, this is interesting. I did some more code reading, and here is how I think this works:
If we go by how the code is structured, we definitely should NOT be fetching the code from the repo at this point.

By the way, why don't remote tarballs from package repositories have this problem? In that case, we are expected to have an index which actually contains all of the Cabal files, so that the tarball can be discovered at all. So we already have the package description without any extra downloading.
I see you pushed a commit to hash it. I think this is reasonable as long as we don't expect users to ever want to interact with these checked out repos. Stack supports Git repos, no? I wonder what they do.
Yes, I agree. This shouldn't be a difficult fix!
Edit: sorry, I misunderstood; I basically repeated what you said, @ezyang. So my current understanding is that we have to fetch the repository before we can even read its .cabal file.
Yeah. So I think we've agreed: the existing code assumed this functionality should be implemented in a particular way, but that strategy doesn't actually work out of the box...

...but now I've just thought of a dastardly idea! Let's think about the remote Hackage repository case: how did we get the package description there? We downloaded an external index using "cabal update". So, what if we introduced a similar command for fetching package descriptions from remote tarballs / source repos / etc.? The external motivation for this command is, "Download all the package source info necessary, so that I can do the rest of my development offline." I don't really know where you'd store this info (maybe it'll end up looking like the repo hashes you are doing here).
@ezyang sounds like a nice symmetry! What would such a command's name be?

Edit: I would like to give it a shot. What you suggest would be something like "fork-my-repositories-and-download-my-remote-tarballs-so-I-can-use-new-build-on-my-project", right? Sounds like "new-fetch".
Also, the freeze file should track the mapping of git refs (branches, tags) to revs (commit hashes) and tree hashes, since refs can move over time. Do we already track the Hackage index revision similarly? Granted, that is far less important, as Hackage only grows bigger.
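A freeze file could pin a ref by resolving it at freeze time. A minimal sketch, assuming `git` is on PATH; the function names here are hypothetical, not part of cabal:

```haskell
import System.Process (readProcess)

-- "git ls-remote <url> <ref>" prints lines of the form "<hash>\t<refname>".
-- parseLsRemote is pure, so the freeze logic can be tested without git.
parseLsRemote :: String -> [(String, String)]
parseLsRemote out =
  [ (hash, drop 1 ref)
  | l <- lines out
  , let (hash, ref) = break (== '\t') l
  , not (null ref)
  ]

-- Resolve a branch or tag name to the commit hash a freeze file would record.
resolveRef :: String -> String -> IO (Maybe String)
resolveRef url ref = do
  out <- readProcess "git" ["ls-remote", url, ref] ""
  pure $ case parseLsRemote out of
    ((h, _) : _) -> Just h
    []           -> Nothing
```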
@alexbiehl I like `new-fetch`.
Indeed, the existing design allows for remote tarballs, and I think remote SCMs can fit that same pattern here. It does mean we have to download the thing first even to extract the .cabal file. Only local dirs and Hackage archives can short-cut that part, by cheaply being able to read the .cabal file. I'm not convinced this needs splitting out into a separate fetch step/command, but I'm prepared to be convinced. As far as I can see, if we follow the remote tarball pattern then it should be the same. And @Ericson2314 makes a good point about freezing.
Ok, I think this is going in the right direction, but there's a few things we need to think about.
As I said in a comment already, I think it's right that we do have to fetch the repo up front just to be able to read the .cabal file. I don't see any sane alternative, and it's the approach I was imagining for both source repos and remote tarballs.
But we do need to handle the case of the user editing the source repo input info better.
Now currently the strategy here seems to be to hash the source repo input info and do a fresh clone. And if the hash has not changed and the local checkout exists then it is assumed to be immutable.
But that means if the user edits the tag/checkout or other info then we'll fetch a complete new copy of the repo. That has a certain simplicity to it, but I don't think it's sustainable; it's too expensive. So I think we need to be able to handle the incremental case: there's already some checkout in the expected location and we need to synchronise it with the current info for that repo (e.g. changed checkout, changed subdirs, even a changed URL).
Clearly this is more complicated and will need more features in the branchers. My initial suggestion is that we think of it as a synchronise operation: we pass the brancher impl the location of the current checkout and the info about what we want it to be (URL, tag etc) and ask it to cheaply check if it's already there and if not to synchronise it to the target info.
We also need to extend the branchers to be able to give us content hashes, as we currently do for hackage packages. We want to be able to treat source repo packages as non-local dependencies, including installing them into the store (and thus sharing them between runs in CI systems or between multiple projects). This means we need to be able to compute nix style hashes. For git that should be easy: the current checkout hash is perfect. But this obviously is brancher specific, so we'd need to extend the brancher API for that.
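The synchronise operation described above might be captured by a small pure "planning" step. Everything here (`RepoTarget`, `SyncPlan`, `planSync`) is a hypothetical sketch of the decision logic, not an actual brancher API:

```haskell
-- What the project file asks for, and what (if anything) is on disk.
data RepoTarget = RepoTarget
  { targetUrl :: String
  , targetRef :: String  -- branch, tag or commit
  } deriving (Eq, Show)

data SyncPlan
  = UpToDate       -- checkout already matches; do nothing
  | FetchAndReset  -- same repo, different ref: cheap incremental sync
  | FreshClone     -- nothing there yet, or the URL itself changed
  deriving (Eq, Show)

planSync :: Maybe RepoTarget -> RepoTarget -> SyncPlan
planSync Nothing _ = FreshClone
planSync (Just current) wanted
  | current == wanted                     = UpToDate
  | targetUrl current == targetUrl wanted = FetchAndReset
  | otherwise                             = FreshClone
```

On this view a brancher would only need two operations beyond a clean clone: a cheap "am I already at this target?" check, and a fetch-and-reset.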
```haskell
-- hashing.
hashSourceRepo :: SourceRepo -> HashValue
hashSourceRepo sourceRepo =
  hashValue $ Binary.encode (sourceRepo { repoSubdir = Nothing })
```
So if I understand correctly, this is just to pick the location of the repository, or is it to calculate the source hash for the content of the repo? In the latter case I don't think it belongs in this module, which is about the package id hashes that are based on content hashes.
```haskell
module Distribution.Client.Get.Brancher where
```
I think we prefer explicit export lists. People introduce private utilities during refactoring, and they shouldn't have to think about introducing an export list at the same time.
```haskell
readSourcePackage verbosity distDirLayout (ProjectPackageRemoteRepo repo) = do
  let
    scmSrcDir = "scm-" ++ showHashValue (hashSourceRepo repo)
```
I'm not convinced that this is a good idea. I think we ought to be able to pick generally non-clashing names here without having to use huge hashes.

So what's the issue? People said they wanted to be able to have the same repo but with two different checkouts (and presumably two different subdirs, otherwise they're selecting two versions of the same package, which isn't supported with local dirs either).

So I can see that picking non-clashing names is an issue, but on the other hand we don't want to have to do a fresh checkout when someone updates the project file to point to a different tag. In those cases we should simply be able to fetch & reset. Note that this may imply more functionality in the branchers than we have right now, since currently we can only do a fresh checkout, right?
```haskell
    -- we only need this for error messages
    repoloc = fromMaybe "" (repoLocation repo)
```
```haskell
  repoExists <- liftIO $ doesDirectoryExist destDir
```
Ok, here is where we will need to think about the incremental case. Notice that we are in a `Rebuild` monad. Currently, this `readSourcePackage` will get re-run every time the cabal.project file changes. That could be adjusted so we only re-run when the input information for the source repo changes.

BTW, notice that for the local file case we do a `monitorFiles [monitorFileHashed cabalFile]` to force this code to get re-run when the user edits that file. For source repos we don't need to worry about the user editing things under us, I think.
```haskell
  -> readSourcePackage
       verbosity
       distDirLayout
       (ProjectPackageLocalDirectory pkgdir cabalFile)
```
I don't think we want to convert this into a project local package, we want to preserve the information that this package is from a source repo and not necessarily treat it as local. That is we should be able to treat it like a package from hackage.
If we do this then it means we can put packages built from repos into the store and share them between multiple projects. This would be a big win. But it also means we need a mechanism to obtain a hash of the sources in the repo. This does not have to be a content hash of the unpacked dir, any stable content hash will do. So for git for example, the current git checkout hash is perfect. For other repo types other methods may be needed.
One thing that might help here is if one of us implements the local or remote tarball case, since that covers the general pattern of when we download, dealing with changes in the remote tarball URL (the only info for a tarball; obviously source repos have more), and also the issue of getting source hashes so we can treat these tarball packages as non-local deps.
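For the git case, obtaining the nix-style source hash could be as simple as asking the checkout for its HEAD commit. A sketch under assumptions: `gitSourceHash` is an illustrative name, not a cabal function, and `git` (1.8.5+, for `-C`) is assumed to be on PATH:

```haskell
import System.Process (readProcess)

-- A full git commit hash is 40 lowercase hex characters; useful as a sanity
-- check before treating the output as a stable content identifier.
isFullCommitHash :: String -> Bool
isFullCommitHash h = length h == 40 && all (`elem` "0123456789abcdef") h

-- The commit hash of HEAD pins the exact tree contents, so it can act as the
-- stable content hash needed to install repo packages into the store.
gitSourceHash :: FilePath -> IO String
gitSourceHash checkoutDir =
  takeWhile (/= '\n') <$> readProcess "git" ["-C", checkoutDir, "rev-parse", "HEAD"] ""
```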
This isn't great for git's garbage collecting, but it's perfectly possible to just fetch everything into the same local repo, and let git discover all possible incremental downloading that way.
Related: the branch that #4399 is part of.
Only just noticed this PR. I've been using a thin wrapper around cabal called mafia for about a year now, and I think the way it handles code not on Hackage is somewhat better than what is proposed here. First and foremost, adding a project does not require any changes to the host project's cabal file.
What is the status of this? I would also really like to have this feature. |
Looks stalled. |
This would also be extremely useful for Clash. I mentioned this to @dcoutts recently, and IIRC he suggested that he'd like part of the work to land first.
extra-packages is in
The provenance-qualified imports proposal (ghc-proposals/ghc-proposals#115) is relevant. Note that even if it passes, nearly all the work on this branch remains relevant, as the core trickiness is in the fetching logic.
This is a try at #2189.

Please let's have a discussion about this. I would like to have this feature!

This allows you to write the following in your cabal.project file, and it gets picked up as a dependency of your project.