
Prototype for source repository support in cabal.project #4289


Closed
wants to merge 3 commits

Conversation

alexbiehl
Member

This is an attempt at #2189.

Please, let's have a discussion about this. I would really like to have this feature!

This allows you to write the following in your cabal.project file:

```
source-repository-package
  type: git
  location: https://github.com/....
  subdir: somesubdirinyourrepository
  tag: 481cabcab15501f300728bfa4f6de517d4492777
```

And it gets picked up as a dependency of your project.

@mention-bot

@alexbiehl, thanks for your PR! By analyzing the history of the files in this pull request, we identified @dcoutts, @ezyang and @Ericson2314 to be potential reviewers.

@ezyang
Contributor

ezyang commented Feb 2, 2017

The patch looks pleasingly short, but I don't know enough about readSourcePackage to assess if the implementation is correct. I'm inclined to accept this patch.

However, we're going to need tests and documentation. It would also probably be wise to refactor the brancher functions out from Distribution.Client.Get into their own module.

@phadej
Collaborator

phadej commented Feb 2, 2017

One question: can I have the same repository specified twice (with different subdirs and/or different commits)?

@alexbiehl
Member Author

alexbiehl commented Feb 2, 2017

@ezyang I have put the brancher in its own module. One thing that bothers me is that I think I do too much in readSourcePackage: forking the repository and searching for a cabal file. Ideally I would fork the repositories (and later maybe download any remote tarballs) before findProjectPackages, which then does all the heavy lifting with finding cabal files and such, as it does now.

Another point is the calculation of the directory. Currently I take the repoTag as the directory name, which I think is not ideal: if any parameter other than the tag changes, we are not going to fork again! I would want a function hashSourceRepo :: SourceRepo -> HashValue.

And one more: I think we should allow a SourceRepo to take a list of sub directories. This way we can import multiple projects from a source repository with just one stanza in cabal.project. (When creating the hash for a SourceRepo we should exclude the repoSubdir field, so we can add new sub directories without forking again.)
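
To illustrate the multi-subdir idea, a stanza could then look something like this (illustrative only, reusing the example from the description; the exact syntax for multiple sub directories was not settled here, and anothersubdirectory is a made-up name):

```
source-repository-package
  type: git
  location: https://github.com/....
  tag: 481cabcab15501f300728bfa4f6de517d4492777
  subdir: somesubdirinyourrepository
          anothersubdirectory
```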

@alexbiehl
Member Author

alexbiehl commented Feb 2, 2017

@phadej I guess so. Also see my last answer above for open questions.

@phadej
Collaborator

phadej commented Feb 2, 2017

@alexbiehl yeah, that solves the different-subdirectories case, but not different commits. Maybe there should be a comment somewhere about where the repositories are cloned: a hash-based directory would make sense. I wouldn't worry about cleaning them up and/or needing to re-clone on e.g. a tag change (that's a very rare operation and acceptable, but I would mention it in the docs).

@ezyang
Contributor

ezyang commented Feb 3, 2017

> One thing that bothers me is that I think I do too much in readSourcePackage: forking the repository and searching for a cabal file. Ideally I would fork the repositories (and later maybe download any remote tarballs) before findProjectPackages, which then does all the heavy lifting with finding cabal files and such, as it does now.

Yes, this is interesting. I did some more code reading and here is how I think this works:

  • readSourcePackage is called from phaseReadLocalPackages to get some basic information about the local dependency (as recorded by SourcePackage). This includes the package description and the package ID, but NOT necessarily the source code: this is why it's an UnresolvedSourcePackage. Instead, readSourcePackage can say "this package is available from a remote tarball (RemoteTarballPackage) or a RepoTarballPackage", and then there's some code later (FetchUtils, I think) which actually goes ahead and downloads it.

  • This results in some awkward tension with source-repository-package, since the most convenient mechanism for getting the Cabal package description is... to also get the rest of the code. This is why your suggestion is the REVERSE of what the code is nudging you towards: you want to manage downloading the repos before reading out the Cabal description files! Furthermore, PackageLocation doesn't even have a case for requesting a download of an SCM package, although there is a TODO block in the data structure for this case.

If we go by how the code is structured, we definitely should NOT be fetching the code from the repo at readSourcePackage; rather, we should return an unresolved source package with a pointer to the real place, and then let the fetcher code later handle it. But then, how do we get our hands on the actual Cabal file? Perhaps we can implement a "single file" fetcher for these source repositories, where we grab just the Cabal file, with the intention of downloading the full repository later. But this will probably break offline mode if we actually ping the remote repo, even when we already have a cached local copy hanging around. So, I don't think the code is structured correctly, and some refactoring is necessary: perhaps, as you suggest, we do need some pre-download step to get the branches where they need to be. Maybe @dcoutts has some ideas.
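
For orientation, the type in question looked roughly like this at the time (a simplified sketch of PackageLocation from cabal-install's Distribution.Client.Types; details may differ):

```
-- Simplified sketch. 'local' is instantiated to Maybe FilePath for
-- unresolved packages: Nothing means "not downloaded yet", which is what
-- makes an UnresolvedSourcePackage unresolved.
data PackageLocation local =
    LocalUnpackedPackage FilePath              -- a local directory
  | LocalTarballPackage  FilePath              -- a local tarball
  | RemoteTarballPackage URI local             -- fetched later (FetchUtils)
  | RepoTarballPackage   Repo PackageId local  -- tarball from a package repo
  -- no constructor for source repositories yet, only a TODO in the source

type UnresolvedPkgLoc = PackageLocation (Maybe FilePath)
```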

BY THE WAY, why don't remote tarballs from package repositories have this problem? Because there we are expected to have an index which actually contains all of the Cabal files (that is how we discover the tarball in the first place), so we already have the package description without any extra downloading.

@ezyang
Contributor

ezyang commented Feb 3, 2017

> Another point is the calculation of the directory

I see you pushed a commit to hash it. I think this is reasonable as long as we don't expect users to ever want to interact with these checked-out repos. Stack supports Git repos, no? I wonder what they do.

> I think we should allow a SourceRepo to take a list of sub directories. This way we can import multiple projects from a source repository with just one stanza in cabal.project. (When creating the hash for a SourceRepo we should exclude the repoSubdir field, so we can add new sub directories without forking again.)

Yes, I agree. This shouldn't be a difficult fix!

@alexbiehl
Member Author

alexbiehl commented Feb 3, 2017

Edit: sorry, I misunderstood; I basically repeated what you said.

@ezyang It's even worse in the case of ProjectPackageLocalTarball, ProjectPackageRemoteTarball and ProjectPackageRemoteRepo: there isn't even a package id! Just a FilePath, a URI or a SourceRepo. So we have to download and unpack these before we call findProjectPackages to find the cabal files we need.
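
Concretely, those constructors look roughly like this (a simplified sketch of ProjectPackageLocation from Distribution.Client.ProjectConfig; exact fields may differ):

```
-- Rough sketch: for the last three cases there is no package id available
-- up front, only a location to download from.
data ProjectPackageLocation =
    ProjectPackageLocalCabalFile FilePath
  | ProjectPackageLocalDirectory FilePath FilePath  -- dir and its .cabal file
  | ProjectPackageLocalTarball   FilePath
  | ProjectPackageRemoteTarball  URI
  | ProjectPackageRemoteRepo     SourceRepo
```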

So my current understanding is:

  • Fetching just one file out of an SCM repository or from a tarball seems awfully complicated.
  • Forking and unpacking for these cases needs to be done before we look for project packages.

And: the last time I checked, stack clones the repository but lets users tamper with the cloned source without any restrictions.

@ezyang
Contributor

ezyang commented Feb 3, 2017

Yeah. So I think we've agreed: the existing code thought this functionality should be implemented in a particular way, but that strategy doesn't actually work out of the box...

...but now I've just thought of a dastardly idea! So, let's think about the remote Hackage repository case: how did we get the package description here? It's because we downloaded an external index using "cabal update". So, what if we introduced a similar command for fetching package descriptions from remote tarballs / source repos / etc.? The external motivation for this command is: "Download all the package source info necessary, so that I can do the rest of my development offline." I don't really know where you'd store this info (maybe it'll end up looking like the repo hashes you are doing here), and you'd probably want new-build to automatically invoke this if it finds that it can't find the revision it needs, but it's one interesting idea about how to structure the logic here.

@alexbiehl
Member Author

alexbiehl commented Feb 4, 2017

@ezyang sounds like a nice symmetry! What would such a command's name be?

Edit: I would like to give it a shot. What you suggest would be something like "fork-my-repositories-and-download-my-remote-tarballs-so-I-can-use-new-build-on-my-project", right? Sounds like "new-fetch".

@Ericson2314
Collaborator

Also, the freeze file should track the mapping of git refs (branches, tags) to revs (commit hashes) and tree hashes (since .git should be thrown away). [Btw, I am stealing this idea from Rust's Cargo.]

Do we already track the Hackage index revision similarly? Granted, that is far less important as Hackage only grows bigger.
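
For illustration, one could picture such a freeze entry for a source-repository-package as something like this (a purely hypothetical sketch; nothing of the sort exists in this patch):

```
-- Hypothetical freeze-file entry: the user-facing ref plus the resolved,
-- immutable identifiers it pointed to at freeze time.
data FrozenSourceRepo = FrozenSourceRepo
  { frozenLocation :: String  -- repository URL
  , frozenRef      :: String  -- branch or tag as written by the user
  , frozenCommit   :: String  -- resolved commit hash
  , frozenTreeHash :: String  -- tree hash, since .git is to be thrown away
  }
```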

@ezyang
Contributor

ezyang commented Feb 6, 2017

@alexbiehl I like fetch; and we already have a cabal fetch command which is intended to download sources for later installation, so this would just be added functionality.

@dcoutts
Contributor

dcoutts commented Feb 21, 2017

Indeed, the existing design allows for remote tarballs, and I think remote SCMs can fit that same pattern here. It does mean we have to download the thing first even to extract the .cabal file. Only local dirs and hackage archives can short-cut that part by being able to read the .cabal file cheaply.

I'm not convinced this needs splitting out into a separate fetch step/command, but I'm prepared to be convinced. As far as I can see, if we follow the remote tarball pattern then it should be the same.

And @Ericson2314 makes a good point about freezing.

Contributor

@dcoutts dcoutts left a comment


Ok, I think this is going in the right direction, but there are a few things we need to think about.

As I said in a comment already, I think it's right that we do have to fetch the repo up front just to be able to read the .cabal file. I don't see any sane alternative, and it's the approach I was imagining for both source repos and remote tarballs.

But we do need to handle the case of the user editing the source repo input info better.

Currently the strategy seems to be to hash the source repo input info and do a fresh clone; if the hash has not changed and the local checkout exists, then it is assumed to be immutable.

But that means that if the user edits the tag/checkout or other info then we'll fetch a complete new copy of the repo. That does have a certain simplicity to it, but I don't think it's sustainable; it's too expensive. So I think we need to be able to handle the incremental case: there's already some checkout in the expected location and we need to synchronise that with the current info for that repo (e.g. changed checkout, changed subdirs, even a changed URL).

Clearly this is more complicated and will need more features in the branchers. My initial suggestion is that we think of it as a synchronise operation: we pass the brancher impl the location of the current checkout and the info about what we want it to be (URL, tag etc) and ask it to cheaply check if it's already there and if not to synchronise it to the target info.

We also need to extend the branchers to be able to give us content hashes, as we currently do for hackage packages. We want to be able to treat source repo packages as non-local dependencies, including installing them into the store (and thus sharing them between runs in CI systems or between multiple projects). This means we need to be able to compute nix style hashes. For git that should be easy: the current checkout hash is perfect. But this obviously is brancher specific, so we'd need to extend the brancher API for that.
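
A minimal sketch of what such an extended brancher API could look like (the names here are made up for illustration, not part of the patch):

```
-- Hypothetical extension of the brancher interface: an idempotent sync
-- operation plus a stable content hash per checkout.
data Brancher = Brancher
  { -- Cheaply check whether the checkout at the given path already matches
    -- the desired SourceRepo (URL, tag, ...) and, if not, synchronise it.
    brancherSync       :: FilePath -> SourceRepo -> IO ()
    -- A stable, VCS-specific hash of the checked-out sources, e.g. the
    -- commit hash for git, usable as a nix-style store key.
  , brancherSourceHash :: FilePath -> IO HashValue
  }
```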

```
-- hashing.
hashSourceRepo :: SourceRepo -> HashValue
hashSourceRepo sourceRepo =
    hashValue $ Binary.encode (sourceRepo { repoSubdir = Nothing })
```
Contributor


So if I understand correctly, is this just to pick the location of the repository, or is this to calculate the source hash for the content of the repo? In the latter case I don't think this belongs in this module, which is about the package id hashes, which are based on content hashes.

```
@@ -0,0 +1,155 @@
module Distribution.Client.Get.Brancher where
```
Contributor


I think we prefer explicit export lists. People introduce private utils during refactoring and shouldn't have to think about introducing an export list at the same time.


```
readSourcePackage verbosity distDirLayout (ProjectPackageRemoteRepo repo) = do
    let
      scmSrcDir = "scm-" ++ showHashValue (hashSourceRepo repo)
```
Contributor


I'm not convinced that this is a good idea. I think we ought to be able to pick generally non-clashing names here without having to use huge hashes.

So what's the issue? People said they wanted to be able to have the same repo but with two different checkouts (and presumably two different subdirs, otherwise they'd be selecting two versions of the same package, which isn't supported with local dirs either).

So I can see that picking non-clashing names is an issue, but on the other hand we don't want to have to do a fresh checkout when someone updates the project file to point to a different tag. In those cases we should simply be able to fetch & reset. Note that this may imply more functionality in the branchers than we have right now, since currently we can only do a fresh checkout, I think, right?
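
For git, such a fetch & reset could be as simple as the following (a hypothetical helper using System.Process; error handling and other VCSs omitted):

```
import System.Process (callProcess)

-- Reuse an existing clone: point it at the (possibly changed) URL, fetch,
-- and hard-reset the working tree to the requested tag or commit.
syncGitRepo :: FilePath -> String -> String -> IO ()
syncGitRepo dir url tag = do
  callProcess "git" ["-C", dir, "remote", "set-url", "origin", url]
  callProcess "git" ["-C", dir, "fetch", "origin", "--tags"]
  callProcess "git" ["-C", dir, "reset", "--hard", tag]
```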

```
      -- we only need this for error messages
      repoloc = fromMaybe "" (repoLocation repo)

    repoExists <- liftIO $ doesDirectoryExist destDir
```
Contributor


Ok, here is where we will need to think about the incremental case. Notice that we are in a Rebuild monad. Currently, this readSourcePackage will get re-run every time the cabal.project file changes. That could be adjusted so we only re-run when the input information for the source repo changes.

BTW, notice that for the local file case we do a monitorFiles [monitorFileHashed cabalFile] to force this code to get re-run when the user edits that file. For source repos I think we don't need to worry about the user editing things under us.

```
  -> readSourcePackage
       verbosity
       distDirLayout
       (ProjectPackageLocalDirectory pkgdir cabalFile)
```
Contributor


I don't think we want to convert this into a project-local package; we want to preserve the information that this package is from a source repo and not necessarily treat it as local. That is, we should be able to treat it like a package from Hackage.

If we do this then it means we can put packages built from repos into the store and share them between multiple projects. This would be a big win. But it also means we need a mechanism to obtain a hash of the sources in the repo. This does not have to be a content hash of the unpacked dir, any stable content hash will do. So for git for example, the current git checkout hash is perfect. For other repo types other methods may be needed.
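
For the git case, obtaining that hash could be a one-liner (a hypothetical helper; other repo types would need their own mechanism):

```
import System.Process (readProcess)

-- The commit hash of the current checkout is a stable content hash for git.
gitSourceHash :: FilePath -> IO String
gitSourceHash dir =
  takeWhile (/= '\n') <$> readProcess "git" ["-C", dir, "rev-parse", "HEAD"] ""
```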

@dcoutts
Contributor

dcoutts commented Feb 22, 2017

One thing that might help here is if one of us implements the local or remote tarball case, since that covers the general pattern of when we download, how to deal with changes in the remote tarball URL (the only info for a tarball; obviously source repos have more), and also the issue of getting source hashes so we can treat these tarball packages as non-local deps.

@Ericson2314
Collaborator

This isn't great for git's garbage collection, but it's perfectly possible to just fetch everything into the same local repo, and let git discover all possible incremental downloading that way.

@dcoutts
Contributor

dcoutts commented Mar 15, 2017

Related: the branch that #4399 is part of.

@erikd
Member

erikd commented May 2, 2017

Only just noticed this PR. I've been using a thin wrapper around cabal called mafia for about a year now, and I think the way it handles code not on Hackage is somewhat better than what is proposed here. First and foremost, adding a project does not require any changes to the host project's cabal file.

@bgamari
Contributor

bgamari commented Sep 27, 2017

What is the status of this? I would also really like to have this feature.

@23Skidoo
Member

Looks stalled.

@thoughtpolice
Member

This would also be extremely useful for Clash, since I recently added .project files for it -- we have a few GHC plugins that currently need to be cloned manually where they could otherwise be handled by this.

I mentioned this to @dcoutts recently and IIRC he suggested that he'd like part of the work for extra-packages to land as well for this feature (currently in wip/extra-packages). @dcoutts -- thoughts on what needs to happen next? Now that the Summer of Code is over we have less bandwidth...

@fgaz
Member

fgaz commented Oct 31, 2017

extra-packages is in

@gbaz
Collaborator

gbaz commented Mar 24, 2018

The provenance-qualified imports proposal (ghc-proposals/ghc-proposals#115) is relevant. Note that even if that passes, nearly all the work on this branch remains relevant as the core trickiness is in the fetching logic.
