Skip to content

Remove special casing for PSEUDOREF updates #673

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from
Closed

Remove special casing for PSEUDOREF updates #673

wants to merge 3 commits into from

Conversation

hanwen
Copy link

@hanwen hanwen commented Jul 6, 2020

This gets rid of the special casing code for pseudorefs in refs.c

This is in preparation for reftable.

v4

  • Address problems on case-insensitive file systems reported by Dscho.

@hanwen hanwen changed the title Consider HEAD a PSEUDOREF rather than PER_WORKTREE. Remove special casing PSEUDOREF updates Jul 6, 2020
@hanwen
Copy link
Author

hanwen commented Jul 6, 2020

/submit

@hanwen hanwen changed the title Remove special casing PSEUDOREF updates Remove special casing for PSEUDOREF updates Jul 6, 2020
@gitgitgadget
Copy link

gitgitgadget bot commented Jul 6, 2020

Submitted as [email protected]

@@ -739,102 +739,6 @@ long get_files_ref_lock_timeout_ms(void)
return timeout_ms;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Han-Wen Nienhuys via GitGitGadget" <[email protected]> writes:

> Subject: Re: [PATCH 1/2] Modify pseudo refs through ref backend storage
> From: Han-Wen Nienhuys <[email protected]>

With what definition of "pseudo refs" has this change been made?
Those things like HEAD, CHERRY_PICK_HEAD, FETCH_HEAD etc. that have
traditionally been written as a plain text file in $GIT_DIR and are
used to name objects by having a full object name in it?  

Or the entities that behave like refs and stored in ref backends,
with all-uppercase-names but do not sit inside refs/ hierarchy?

I think it is OK (and possibly a good move in the longer term, but
that is just my gut feeling) to make ref backends resopnsible for
enumerating, reading and writing them (i.e. I think we want to use
the latter definition for the longer term health of the project).
And we would want to ...

> The previous behavior was introduced in commit 74ec19d4be
> ("pseudorefs: create and use pseudoref update and delete functions",
> Jul 31, 2015), with the justification "alternate ref backends still
> need to store pseudorefs in GIT_DIR".

... declare that justification invalid for that purpose.

Is that what is going on?  I just want to make sure I am following
your flow of thought.

> Refs such as REBASE_HEAD are read through the ref backend. This can
> only work consistently if they are written through the ref backend as
> well. Tooling that works directly on files under .git should be
> updated to use git commands to read refs instead.

OK.

> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 6516c7bc8c..9951c2e403 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -1228,7 +1228,6 @@ static int files_delete_refs(struct ref_store *ref_store, const char *msg,
>  
>  	for (i = 0; i < refnames->nr; i++) {
>  		const char *refname = refnames->items[i].string;
> -
>  		if (refs_delete_ref(&refs->base, msg, refname, NULL, flags))
>  			result |= error(_("could not remove reference %s"), refname);
>  	}

Unreleated change?

> @@ -2436,7 +2435,9 @@ static int lock_ref_for_update(struct files_ref_store *refs,
>  	update->backend_data = lock;
>  
>  	if (update->type & REF_ISSYMREF) {
> -		if (update->flags & REF_NO_DEREF) {
> +		if (update->flags & REF_NO_DEREF ||
> +		    (ref_type(update->refname) == REF_TYPE_PSEUDOREF &&
> +		     strcmp(update->refname, "HEAD"))) {
>  			/*
>  			 * We won't be reading the referent as part of
>  			 * the transaction, so we have to read it here

The old "if we are not dereferencing" condition in if() exactly
matched the comment, but the condition in if() after this change is
not "if we are not dereferencing".  Even if we are dereferencing,
under some new condition, we would still drop into this block and do
not follow the "else" side that creates a new update for the
referent.  Is this part of "modify pseudo refs via backend" topic,
or should it be a separate modification?  Why is this change needed?

It seems that no matter where in the refs/ hierarchy (or even
outside) a symbolic ref resides, the way to update itself (not the
referent through the symbolic ref) should be the same, which is what
the original says, and we want to change that reasoning here, but it
is not quite clear to me why.

> @@ -2782,8 +2783,10 @@ static int files_transaction_finish(struct ref_store *ref_store,
>  		struct ref_update *update = transaction->updates[i];
>  		struct ref_lock *lock = update->backend_data;
>  
> -		if (update->flags & REF_NEEDS_COMMIT ||
> -		    update->flags & REF_LOG_ONLY) {
> +		if ((ref_type(lock->ref_name) != REF_TYPE_PSEUDOREF ||
> +		     !strcmp(lock->ref_name, "HEAD")) &&
> +		    (update->flags & REF_NEEDS_COMMIT ||
> +		     update->flags & REF_LOG_ONLY)) {

And this one stops the files backend from touching all pseudorefs
other than HEAD with this codepath.  That somehow feels totally
opposite from what the log message explained above---weren't we
updating the code to write these pseudorefs through the individual
backends, which the files backend is one example of?  Isn't this
change stopping the backend from writing the pseudorefs other than
HEAD instead?

Puzzled.

>  			if (files_log_ref_write(refs,
>  						lock->ref_name,
>  						&lock->old_oid,


> diff --git a/t/t1400-update-ref.sh b/t/t1400-update-ref.sh
> index 27171f8261..6b8030e8fe 100755
> --- a/t/t1400-update-ref.sh
> +++ b/t/t1400-update-ref.sh
> @@ -476,7 +476,7 @@ test_expect_success 'git cat-file blob master@{2005-05-26 23:42}:F (expect OTHER
>  test_expect_success 'given old value for missing pseudoref, do not create' '
>  	test_must_fail git update-ref PSEUDOREF $A $B 2>err &&
>  	test_path_is_missing .git/PSEUDOREF &&

The reason why I asked what this patch thinks the definition of
pseudoref is is because of this thing.  Shouldn't this line be fixed
not to depend on the files backend?  Likewise for $(cat .git/PSEUDOREF)
in the remaining two hunks.

> -	test_i18ngrep "could not read ref" err
> +	test_i18ngrep "unable to resolve reference" err
>  '
>  
>  test_expect_success 'create pseudoref' '
> @@ -497,7 +497,7 @@ test_expect_success 'overwrite pseudoref with correct old value' '
>  test_expect_success 'do not overwrite pseudoref with wrong old value' '
>  	test_must_fail git update-ref PSEUDOREF $D $E 2>err &&
>  	test $C = $(cat .git/PSEUDOREF) &&
> -	test_i18ngrep "unexpected object ID" err
> +	test_i18ngrep "cannot lock ref.*expected" err
>  '
>  
>  test_expect_success 'delete pseudoref' '
> @@ -509,7 +509,7 @@ test_expect_success 'do not delete pseudoref with wrong old value' '
>  	git update-ref PSEUDOREF $A &&
>  	test_must_fail git update-ref -d PSEUDOREF $B 2>err &&
>  	test $A = $(cat .git/PSEUDOREF) &&
> -	test_i18ngrep "unexpected object ID" err
> +	test_i18ngrep "cannot lock ref.*expected" err
>  '
>  
>  test_expect_success 'delete pseudoref with correct old value' '

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Han-Wen Nienhuys wrote (reply to this):

On Mon, Jul 6, 2020 at 10:27 PM Junio C Hamano <[email protected]> wrote:
>
> "Han-Wen Nienhuys via GitGitGadget" <[email protected]> writes:
>
> > Subject: Re: [PATCH 1/2] Modify pseudo refs through ref backend storage
> > From: Han-Wen Nienhuys <[email protected]>
>
> With what definition of "pseudo refs" has this change been made?
> Those things like HEAD, CHERRY_PICK_HEAD, FETCH_HEAD etc. that have
> traditionally been written as a plain text file in $GIT_DIR and are
> used to name objects by having a full object name in it?
>
> Or the entities that behave like refs and stored in ref backends,
> with all-uppercase-names but do not sit inside refs/ hierarchy?

I thought these were the same? - see below :)

> I think it is OK (and possibly a good move in the longer term, but
> that is just my gut feeling) to make ref backends resopnsible for
> enumerating, reading and writing them (i.e. I think we want to use
> the latter definition for the longer term health of the project).
> And we would want to ...
>
> > The previous behavior was introduced in commit 74ec19d4be
> > ("pseudorefs: create and use pseudoref update and delete functions",
> > Jul 31, 2015), with the justification "alternate ref backends still
> > need to store pseudorefs in GIT_DIR".
>
> ... declare that justification invalid for that purpose.
>
> Is that what is going on?  I just want to make sure I am following
> your flow of thought.

yep.

> >               const char *refname = refnames->items[i].string;
> > -
> >               if (refs_delete_ref(&refs->base, msg, refname, NULL, flags))
> >                       result |= error(_("could not remove reference %s"), refname);
> >       }
>
> Unreleated change?

oops.

> > @@ -2436,7 +2435,9 @@ static int lock_ref_for_update(struct files_ref_store *refs,
> >       update->backend_data = lock;
> >
> >       if (update->type & REF_ISSYMREF) {
> > -             if (update->flags & REF_NO_DEREF) {
> > +             if (update->flags & REF_NO_DEREF ||
> > +                 (ref_type(update->refname) == REF_TYPE_PSEUDOREF &&
> > +                  strcmp(update->refname, "HEAD"))) {
> >                       /*
> >                        * We won't be reading the referent as part of
> >                        * the transaction, so we have to read it here
>
> The old "if we are not dereferencing" condition in if() exactly
> matched the comment, but the condition in if() after this change is
> not "if we are not dereferencing".  Even if we are dereferencing,
> under some new condition, we would still drop into this block and do
> not follow the "else" side that creates a new update for the
> referent.  Is this part of "modify pseudo refs via backend" topic,
> or should it be a separate modification?  Why is this change needed?

There is a  test (mentioned in the commit message), t1405, which does

$RUN create-symref FOO refs/heads/master nothing &&
# calls refs_create_symref()
..
$RUN delete-refs 0 nothing FOO refs/tags/new-tag &&
# calls refs_delete_refs, which calls delete_pseudoref
..
git rev-parse master >expected &&

Previously, the delete_pseudoref code would delete just .git/FOO file.
Without the change here, going through the ref backend ends up in the
!REF_NO_DEREF case, deleting the master branch.

I don't know what the correct semantics are (are symrefs used for
anything except HEAD ?), so


> It seems that no matter where in the refs/ hierarchy (or even
> outside) a symbolic ref resides, the way to update itself (not the
> referent through the symbolic ref) should be the same, which is what
> the original says, and we want to change that reasoning here, but it
> is not quite clear to me why.
>
> > @@ -2782,8 +2783,10 @@ static int files_transaction_finish(struct ref_store *ref_store,
> >               struct ref_update *update = transaction->updates[i];
> >               struct ref_lock *lock = update->backend_data;
> >
> > -             if (update->flags & REF_NEEDS_COMMIT ||
> > -                 update->flags & REF_LOG_ONLY) {
> > +             if ((ref_type(lock->ref_name) != REF_TYPE_PSEUDOREF ||
> > +                  !strcmp(lock->ref_name, "HEAD")) &&
> > +                 (update->flags & REF_NEEDS_COMMIT ||
> > +                  update->flags & REF_LOG_ONLY)) {
>
> And this one stops the files backend from touching all pseudorefs
> other than HEAD with this codepath.  That somehow feels totally
> opposite from what the log message explained above---weren't we
> updating the code to write these pseudorefs through the individual
> backends, which the files backend is one example of?  Isn't this
> change stopping the backend from writing the pseudorefs other than
> HEAD instead?

There is a test

  'core.logAllRefUpdates=always creates no reflog for ORIG_HEAD'

that wants there to not be reflog updates for ORIG_HEAD. I don't see
why that behavior exists, and would be happy to change it, but I'm too
much of a newbie to decide what is right here.

> > --- a/t/t1400-update-ref.sh
> > +++ b/t/t1400-update-ref.sh
> > @@ -476,7 +476,7 @@ test_expect_success 'git cat-file blob master@{2005-05-26 23:42}:F (expect OTHER
> >  test_expect_success 'given old value for missing pseudoref, do not create' '
> >       test_must_fail git update-ref PSEUDOREF $A $B 2>err &&
> >       test_path_is_missing .git/PSEUDOREF &&
>
> The reason why I asked what this patch thinks the definition of
> pseudoref is is because of this thing.  Shouldn't this line be fixed
> not to depend on the files backend?  Likewise for $(cat .git/PSEUDOREF)
> in the remaining two hunks.

This patch doesn't introduce reftable yet, so both definitions are
equivalent for the sake of this  patch.

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Han-Wen Nienhuys <[email protected]> writes:

>> Or the entities that behave like refs and stored in ref backends,
>> with all-uppercase-names but do not sit inside refs/ hierarchy?
>
> I thought these were the same? - see below :)

>> > --- a/t/t1400-update-ref.sh
>> > +++ b/t/t1400-update-ref.sh
>> > @@ -476,7 +476,7 @@ test_expect_success 'git cat-file blob master@{2005-05-26 23:42}:F (expect OTHER
>> >  test_expect_success 'given old value for missing pseudoref, do not create' '
>> >       test_must_fail git update-ref PSEUDOREF $A $B 2>err &&
>> >       test_path_is_missing .git/PSEUDOREF &&
>>
>> The reason why I asked what this patch thinks the definition of
>> pseudoref is is because of this thing.  Shouldn't this line be fixed
>> not to depend on the files backend?  Likewise for $(cat .git/PSEUDOREF)
>> in the remaining two hunks.
>
> This patch doesn't introduce reftable yet, so both definitions are
> equivalent for the sake of this patch.

But at the end of the tunnel, we do want to see people stop
expecting that .git/PSEUDOREF file is what PSEUDOREF ref is, no?  I
think that is what "pseudo refs are not just files directly under
$GIT_DIR, but managed by the backend" means, and that in turn is a
valid justification for other changes introduced in this patch.
Expecting the effect of modifying pseudo refs _after_ the code is
changed that such modifications are properly done through the ref
API to appear on the filesystem feels like it goes against the
reason why we are making this change, compared to checking to see if
the pseudoref really got updated as desired via the ref API, at
least to me.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Han-Wen Nienhuys wrote (reply to this):

On Tue, Jul 7, 2020 at 5:20 PM Junio C Hamano <[email protected]> wrote:

> >> The reason why I asked what this patch thinks the definition of
> >> pseudoref is is because of this thing.  Shouldn't this line be fixed
> >> not to depend on the files backend?  Likewise for $(cat .git/PSEUDOREF)
> >> in the remaining two hunks.
> >
> > This patch doesn't introduce reftable yet, so both definitions are
> > equivalent for the sake of this patch.
>
> But at the end of the tunnel, we do want to see people stop
> expecting that .git/PSEUDOREF file is what PSEUDOREF ref is, no?  I
> think that is what "pseudo refs are not just files directly under
> $GIT_DIR, but managed by the backend" means, and that in turn is a
> valid justification for other changes introduced in this patch.
> Expecting the effect of modifying pseudo refs _after_ the code is
> changed that such modifications are properly done through the ref
> API to appear on the filesystem feels like it goes against the
> reason why we are making this change, compared to checking to see if
> the pseudoref really got updated as desired via the ref API, at
> least to me.

I can fix this specific instance here (which is what I think you
want), for this commit to practice what it preaches. At the same time
there are probably about 100 or so other places where the tests check
the file system directly for ref(log) existence, so it would never be
totally consistent.

The only way to systematically find the offending places is to
introduce a new ref backend and then fix all the tests, and I think
that goes outside the scope of this small series.

--
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Han-Wen Nienhuys <[email protected]> writes:

> I can fix this specific instance here (which is what I think you
> want), for this commit to practice what it preaches. At the same time
> there are probably about 100 or so other places where the tests check
> the file system directly for ref(log) existence, so it would never be
> totally consistent.
>
> The only way to systematically find the offending places is to
> introduce a new ref backend and then fix all the tests, and I think
> that goes outside the scope of this small series.

Good.  I think we are on the same page.  I suggested to make the
patch internally consistent, nothing more.  And fixing everything in
the world is outside the scope of these two patches.

Thanks.

@@ -676,10 +676,9 @@ int dwim_log(const char *str, int len, struct object_id *oid, char **log)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Han-Wen Nienhuys via GitGitGadget" <[email protected]> writes:

> From: Han-Wen Nienhuys <[email protected]>
>
> This is consistent with the definition of REF_TYPE_PSEUDOREF
> (uppercase in the root ref namespace).

I wonder if some special casing we saw for "HEAD" in 1/2 (e.g. we no
longer do this to pseudorefs in this codepath but HEAD is different
and will continue to be dealt with in the codepath) needs adjustment
after this change?  HEAD still needs to be able to have a different
value (either detached or pointing to another ref) per worktree, but
it is unclear where that knowledge now resides in the new code.  Are
we declaring that all pseudorefs are per worktree (which I think is
not wrong)?

> Signed-off-by: Han-Wen Nienhuys <[email protected]>
> ---
>  refs.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/refs.c b/refs.c
> index a2fd42364f..265767a234 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -676,10 +676,9 @@ int dwim_log(const char *str, int len, struct object_id *oid, char **log)
>  
>  static int is_per_worktree_ref(const char *refname)
>  {
> -	return !strcmp(refname, "HEAD") ||
> -		starts_with(refname, "refs/worktree/") ||
> -		starts_with(refname, "refs/bisect/") ||
> -		starts_with(refname, "refs/rewritten/");
> +	return starts_with(refname, "refs/worktree/") ||
> +	       starts_with(refname, "refs/bisect/") ||
> +	       starts_with(refname, "refs/rewritten/");
>  }
>  
>  static int is_pseudoref_syntax(const char *refname)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Han-Wen Nienhuys wrote (reply to this):

On Mon, Jul 6, 2020 at 10:30 PM Junio C Hamano <[email protected]> wrote:
> > This is consistent with the definition of REF_TYPE_PSEUDOREF
> > (uppercase in the root ref namespace).
>
> I wonder if some special casing we saw for "HEAD" in 1/2 (e.g. we no
> longer do this to pseudorefs in this codepath but HEAD is different
> and will continue to be dealt with in the codepath) needs adjustment
> after this change?  HEAD still needs to be able to have a different
> value (either detached or pointing to another ref) per worktree, but
> it is unclear where that knowledge now resides in the new code.  Are
> we declaring that all pseudorefs are per worktree (which I think is
> not wrong)?

the source code already says

enum ref_type {
 REF_TYPE_PER_WORKTREE, /* refs inside refs/ but not shared */
 REF_TYPE_PSEUDOREF, /* refs outside refs/ in current worktree */

but considers HEAD a REF_TYPE_PER_WORKTREE, counter to the comment.

IIRC, this change actually has almost no effect, because most code
treats both enum values as equivalent. (see eg. builtin/reflog.c),
except for the code that the preceding commit changes.

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--
Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 9, 2020

There are issues in commit a25092600d1ce184697cecf0d968aa9db5926cd1:
Prefixed commit message must be in lower case: t1400: Use git rev-parse for testing PSEUDOREF existence
Commit not signed off

@hanwen
Copy link
Author

hanwen commented Jul 9, 2020

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 9, 2020

Submitted as [email protected]

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 10, 2020

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Han-Wen Nienhuys via GitGitGadget" <[email protected]> writes:

> This gets rid of the special casing code for pseudorefs in refs.c
>
> This is in preparation for reftable.
>
> v2
>
>  * remove special casing of non-HEAD pseudorefs; update t1400 and t1405
>    accordingly
>  * open question: should git-update-ref.txt be updated, when it talks about
>    logAllRefUpdates?

This does make writing ORIG_HEAD and friends a lot more heavyweight
that just "write a 40-byte text file on the filesystem to remember
it for a short while", and it becomes even so with the whole reflog
thing, but I think it is a good longer term direction to go.  

Others may disagree, though.

In any case, if we are changing the behaviour with these patches, we
should update the documentation.

Thanks.


@hanwen
Copy link
Author

hanwen commented Jul 16, 2020

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 16, 2020

Submitted as [email protected]

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 16, 2020

This branch is now known as hn/reftable-prep-part-2.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 16, 2020

This patch series was integrated into seen via git@d3bbd5b.

@gitgitgadget gitgitgadget bot added the seen label Jul 16, 2020
@gitgitgadget
Copy link

gitgitgadget bot commented Jul 16, 2020

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Han-Wen Nienhuys via GitGitGadget" <[email protected]> writes:

> This gets rid of the special casing code for pseudorefs in refs.c
>
> This is in preparation for reftable.
>
> v3
>
>  * tweak git-update-ref.txt description for logAllRefUpdates
>
> Han-Wen Nienhuys (3):
>   t1400: use git rev-parse for testing PSEUDOREF existence
>   Modify pseudo refs through ref backend storage
>   Make HEAD a PSEUDOREF rather than PER_WORKTREE.

I reviewed some codepaths that deal with FETCH_HEAD recently.  

As the file is quite different from all the other pseudo references
in that it needs to store more than one object name and in that each
ref in it needs more than just the object name, I doubt that it
makes much sense to enhance the refs API so that its requirements
can be covered.

It would probably make sense for FETCH_HEAD to stay "a text file
directly underneath $GIT_DIR", but get_oid() needs to be able to
read it and return the first object name recorded in the file.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 16, 2020

This patch series was integrated into seen via git@c062f2a.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 17, 2020

This patch series was integrated into seen via git@16d8606.

@@ -148,12 +148,13 @@ still see a subset of the modifications.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Han-Wen,

On Thu, 16 Jul 2020, Han-Wen Nienhuys via GitGitGadget wrote:

> From: Han-Wen Nienhuys <[email protected]>
>
> The previous behavior was introduced in commit 74ec19d4be
> ("pseudorefs: create and use pseudoref update and delete functions",
> Jul 31, 2015), with the justification "alternate ref backends still
> need to store pseudorefs in GIT_DIR".
>
> Refs such as REBASE_HEAD are read through the ref backend. This can
> only work consistently if they are written through the ref backend as
> well. Tooling that works directly on files under .git should be
> updated to use git commands to read refs instead.
>
> The following behaviors change:
>
> * Updates to pseudorefs (eg. ORIG_HEAD) with
>   core.logAllRefUpdates=always will create reflogs for the pseudoref.
>
> * non-HEAD pseudoref symrefs are also dereferenced on deletion. Update
>   t1405 accordingly.

Unfortunately, there is still a breakage in t1405, although it only really
shows up on Windows and macOS because of case-insensitive filesystems.
https://github.com/gitgitgadget/git/runs/879772942?check_suite_focus=true
shows the concrete problem: t1405.17 'delete_ref(refs/heads/foo)' will
fail thusly:

	++ test-tool ref-store main delete-ref msg refs/heads/foo c90e4dc5e12224a428dedfbd45ba11e5531706a2 0
	error: cannot lock ref 'refs/heads/foo': is at 1e995a94b5a60488b6ebaf78ec779d85a55ea700 but expected c90e4dc5e12224a428dedfbd45ba11e5531706a2
	error: last command exited with $?=1

The reason is that there exists a `refs/heads/foo` _and_ the symref `FOO`
(which points to `refs/heads/master`, which is at 1e995a).

An earlier symptom of this bug is visible on Linux, too, though: when
you run t1405.4 'delete_refs(FOO, refs/tags/new-tag)', it shows this:

	+ git rev-parse FOO --
	warning: ignoring dangling symref FOO
	fatal: bad revision 'FOO'

(Seeing this warning suggests to me that this should probably be caught
_by the regression test_, i.e. stderr should be redirected, and it should
be verified via `test_i18ngrep` that that warning is not shown.)

The reason for this warning is that the symref `.git/FOO` is no longer
removed as part of t1405.4 'delete_refs(FOO, refs/tags/new-tag)'. I've
used this patch to bisect the breakage down to this here patch:

-- snip --
diff --git a/t/t1405-main-ref-store.sh b/t/t1405-main-ref-store.sh
index a8d695cd4f1..779906d607f 100755
--- a/t/t1405-main-ref-store.sh
+++ b/t/t1405-main-ref-store.sh
@@ -34,6 +34,7 @@ test_expect_success 'delete_refs(FOO, refs/tags/new-tag)' '
 	m=$(git rev-parse master) &&
 	$RUN delete-refs 0 nothing FOO refs/tags/new-tag &&
 	test_must_fail git rev-parse FOO -- &&
+	test_path_is_missing .git/FOO &&
 	test_must_fail git rev-parse refs/tags/new-tag --&&
 	test_must_fail git rev-parse master -- &&
 	git update-ref refs/heads/master $m
-- snap --

Hopefully you can figure out what is going wrong.

Ciao,
Dscho

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Han-Wen Nienhuys wrote (reply to this):

On Fri, Jul 17, 2020 at 12:18 PM Johannes Schindelin
<[email protected]> wrote:
> The reason is that there exists a `refs/heads/foo` _and_ the symref `FOO`
> (which points to `refs/heads/master`, which is at 1e995a).
>
> An earlier symptom of this bug is visible on Linux, too, though: when
> you run t1405.4 'delete_refs(FOO, refs/tags/new-tag)', it shows this:
>
>         + git rev-parse FOO --
>         warning: ignoring dangling symref FOO
>         fatal: bad revision 'FOO'

Thanks, this was very helpful for debugging.

The reason is that we do split_symref_update(), where a transaction
that modifies SYMREF (pointing to REFERENT) without REF_NO_DEREF is
changed to a transaction that modifies REFERENT and a reflog-only
update for SYMREF. In this case, that leaves the FOO symref in place.

This seems very specific to the test (is there any use of symrefs
beyond HEAD?) , and I fixed it by passing REF_NO_DEREF to the
delete-refs call.

It would also be nice if this behavior was lifted out of the files
backend and into the generic layer, but it's not yet clear to me if
this is possible.

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--
Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 20, 2020

This patch series was integrated into seen via git@8f85217.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 21, 2020

This patch series was integrated into seen via git@1cda0c2.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 21, 2020

This patch series was integrated into seen via git@342854c.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 23, 2020

This patch series was integrated into seen via git@5b2ba95.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 23, 2020

This patch series was integrated into seen via git@93c2228.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 24, 2020

This patch series was integrated into seen via git@94c31ad.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 24, 2020

This patch series was integrated into seen via git@88cae4f.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 24, 2020

This patch series was integrated into seen via git@111b36a.

wa268 referenced this pull request Jul 25, 2020
The "reference transaction" hook was introduced in commit 6754159
(refs: implement reference transaction hook, 2020-06-19). The name of
the hook is declared as "reference-transaction" in "refs.c" and
testcases, but the name declared in "githooks.txt" is different.

Signed-off-by: Bojun Chen <[email protected]>
Reviewed-by: Patrick Steinhardt <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
wa268 referenced this pull request Jul 25, 2020
A new hook.

* ps/ref-transaction-hook:
  githooks.txt: use correct "reference-transaction" hook name
@gitgitgadget
Copy link

gitgitgadget bot commented Aug 5, 2020

This patch series was integrated into seen via git@3a41dd3.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 5, 2020

This patch series was integrated into seen via git@750f130.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 6, 2020

This patch series was integrated into seen via git@77580fb.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 6, 2020

This patch series was integrated into seen via git@641b6fa.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 6, 2020

This patch series was integrated into seen via git@c15b717.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 7, 2020

This patch series was integrated into seen via git@ddbf547.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 7, 2020

This patch series was integrated into seen via git@50b124c.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 10, 2020

On the Git mailing list, Han-Wen Nienhuys wrote (reply to this):

On Wed, Aug 5, 2020 at 5:54 PM Junio C Hamano <[email protected]> wrote:
> >> All of which means FETCH_HEAD is special and we may not want to
> >> burden the special casing of it to newer backends.
> >
> > Can you confirm that FETCH_HEAD is the only thing that can store more
> > than just a symref / SHA1 ? Based on the name, and a comment in the
> > JGit source, I thought that MERGE_HEAD might contain more than one
> > SHA1 at a time.
>
> No I cannot.  I do not think MERGE_HEAD is something I added with as
> deep a thought as I did with FETCH_HEAD.  If it mimics FETCH_HEAD,
> then perhaps it does, but I somehow doubt it.
>
> As can be seen in builtin/merge.c::collect_parents(), we do special
> case FETCH_HEAD when grabbing what commit*S* to merge, but all
> others are read with get_merge_parent() that makes a single call to
> get_oid(), i.e. anything other than FETCH_HEAD cannot have more than
> one object recorded.

Understood. Is there anything you'd like me to do with this patch
series so it can be merged?

Dealing with FETCH_HEAD generically isn't possible unless we extend
the API of the ref backend: the generic ref_store instance doesn't
offer a way to get at the path that corresponds to FETCH_HEAD, so we
can't handle it in refs_read_raw_ref(). In the current reftable
series, FETCH_HEAD is dealt with in the backends separately.

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 10, 2020

On the Git mailing list, Junio C Hamano wrote (reply to this):

Han-Wen Nienhuys <[email protected]> writes:

> Dealing with FETCH_HEAD generically isn't possible unless we extend
> the API of the ref backend: the generic ref_store instance doesn't
> offer a way to get at the path that corresponds to FETCH_HEAD, so we
> can't handle it in refs_read_raw_ref(). In the current reftable
> series, FETCH_HEAD is dealt with in the backends separately.

I am not sure what the best way would be, but I do not think any
existing code writes into it using the refs API at all, even though
it may be read only for the first object name using the refs API.

And I am not sure if we want to extend the write side API so that
the callers can express the full flexibility of that single file.

So perhaps the best way forward would be to ensure that anybody who
tries to read from FETCH_HEAD using the ref API reads the first
object name in it from $GIT_DIR/FETCH_HEAD file as we've always done
since the beginning of time, regardless of what ref backend is used,
that anybody who tries to write FETCH_HEAD using the ref API gets an
error, and letting the writers write into it directly?



@gitgitgadget
Copy link

gitgitgadget bot commented Aug 10, 2020

This patch series was integrated into seen via git@8091ebc.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 10, 2020

This patch series was integrated into seen via git@1269f83.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 10, 2020

This patch series was integrated into seen via git@924658a.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 11, 2020

On the Git mailing list, Han-Wen Nienhuys wrote (reply to this):

On Mon, Aug 10, 2020 at 6:04 PM Junio C Hamano <[email protected]> wrote:
>
> Han-Wen Nienhuys <[email protected]> writes:
>
> > Dealing with FETCH_HEAD generically isn't possible unless we extend
> > the API of the ref backend: the generic ref_store instance doesn't
> > offer a way to get at the path that corresponds to FETCH_HEAD, so we
> > can't handle it in refs_read_raw_ref(). In the current reftable
> > series, FETCH_HEAD is dealt with in the backends separately.
>
> I am not sure what the best way would be, but I do not think any
> existing code writes into it using the refs API at all, even though
> it may be read only for the first object name using the refs API.
>
> And I am not sure if we want to extend the write side API so that
> the callers can express the full flexibility of that single file.

That's not what I am getting at. I am just interested in how to handle
FETCH_HEAD in refs_read_raw_ref.

> So perhaps the best way forward would be to ensure that anybody who
> tries to read from FETCH_HEAD using the ref API reads the first
> object name in it from $GIT_DIR/FETCH_HEAD file as we've always done
> since the beginning of time, regardless of what ref backend is used,

Right, but how do we get at the value of $GIT_DIR given a struct
ref_store? We can either push that out to the ref store backend,
because each backend knows what $GIT_DIR is, or we can make $GIT_DIR a
property of the generic ref_store.

I suppose it's cleaner to make the latter API extension.

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 11, 2020

On the Git mailing list, Junio C Hamano wrote (reply to this):

Han-Wen Nienhuys <[email protected]> writes:

> On Mon, Aug 10, 2020 at 6:04 PM Junio C Hamano <[email protected]> wrote:
>>
>> Han-Wen Nienhuys <[email protected]> writes:
>>
>> > Dealing with FETCH_HEAD generically isn't possible unless we extend
>> > the API of the ref backend: the generic ref_store instance doesn't
>> > offer a way to get at the path that corresponds to FETCH_HEAD, so we
>> > can't handle it in refs_read_raw_ref(). In the current reftable
>> > series, FETCH_HEAD is dealt with in the backends separately.
>>
>> I am not sure what the best way would be, but I do not think any
>> existing code writes into it using the refs API at all, even though
>> it may be read only for the first object name using the refs API.
>>
>> And I am not sure if we want to extend the write side API so that
>> the callers can express the full flexibility of that single file.
>
> That's not what I am getting at. I am just interested in how to handle
> FETCH_HEAD in refs_read_raw_ref.
>
>> So perhaps the best way forward would be to ensure that anybody who
>> tries to read from FETCH_HEAD using the ref API reads the first
>> object name in it from $GIT_DIR/FETCH_HEAD file as we've always done
>> since the beginning of time, regardless of what ref backend is used,
>
> Right, but how do we get at the value of $GIT_DIR given a struct
> ref_store? We can either push that out to the ref store backend,
> because each backend knows what $GIT_DIR is, or we can make $GIT_DIR a
> property of the generic ref_store.
>
> I suppose it's cleaner to make the latter API extension.

Yup.  Sounds like the right direction to go.

Thanks.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 11, 2020

This patch series was integrated into seen via git@a2b1cda.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 12, 2020

This patch series was integrated into seen via git@277ca30.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 12, 2020

This patch series was integrated into next via git@43ac0bc.

@gitgitgadget gitgitgadget bot added the next label Aug 12, 2020
@gitgitgadget
Copy link

gitgitgadget bot commented Aug 13, 2020

This patch series was integrated into seen via git@335aff3.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 14, 2020

This patch series was integrated into seen via git@76d03bf.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 18, 2020

This patch series was integrated into seen via git@95c687b.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 18, 2020

This patch series was integrated into next via git@95c687b.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 18, 2020

This patch series was integrated into master via git@95c687b.

@gitgitgadget gitgitgadget bot added the master label Aug 18, 2020
@gitgitgadget gitgitgadget bot closed this Aug 18, 2020
@gitgitgadget
Copy link

gitgitgadget bot commented Aug 18, 2020

Closed via 95c687b.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 19, 2020

On the Git mailing list, Han-Wen Nienhuys wrote (reply to this):

On Wed, Aug 5, 2020 at 5:54 PM Junio C Hamano <[email protected]> wrote:
> >> All of which means FETCH_HEAD is special and we may not want to
> >> burden the special casing of it to newer backends.
> >
> > Can you confirm that FETCH_HEAD is the only thing that can store more
> > than just a symref / SHA1 ? Based on the name, and a comment in the
> > JGit source, I thought that MERGE_HEAD might contain more than one
> > SHA1 at a time.
>
> No I cannot.  I do not think MERGE_HEAD is something I added with as
> deep a thought as I did with FETCH_HEAD.  If it mimics FETCH_HEAD,
> then perhaps it does, but I somehow doubt it.

blame.c has an append_merge_parents() which reads multiple lines from
MERGE_HEAD, and cmd_commit does so too. It's populated from
write_merge_heads(). So the special casing should be extended to
MERGE_HEAD.

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--
Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 19, 2020

On the Git mailing list, Junio C Hamano wrote (reply to this):

Han-Wen Nienhuys <[email protected]> writes:

> blame.c has an append_merge_parents() which reads multiple lines from
> MERGE_HEAD, and cmd_commit does so too. It's populated from
> write_merge_heads(). So the special casing should be extended to
> MERGE_HEAD.

Ahh.  That makes sense.

@hanwen hanwen deleted the pseudoref branch April 23, 2021 08:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant