[GSoC] Implement Corrected Commit Date #676

abhishekkumar2718 · 2020-07-14T03:10:29Z

This patch series implements the corrected commit date offsets as generation number v2, along with the prerequisites it needs.

Git uses topological levels in the commit-graph file for commit-graph traversal operations like 'git log --graph'. Unfortunately, topological levels can perform worse than committer date when parents of a commit differ greatly in generation numbers [1]. For example, 'git merge-base v4.8 v4.9' on the Linux repository walks 635,579 commits using topological levels and 167,468 commits using committer date. Because of this performance hit, since 091f4cf (commit: don't use generation numbers if not needed, 2018-08-30) 'git merge-base' has used the committer date heuristic unless there is a cutoff.

[1] https://lore.kernel.org/git/efa3720fb40638e5d61c6130b55e3348d8e4339e.1535633886.git.gitgitgadget@gmail.com/

Thus, the need for generation number v2 was born. As Git used to die when the graph version it understood differed from the one in the commit-graph file [2], we needed a way to distinguish between the old and new generation numbers without incrementing the graph version.

[2] https://lore.kernel.org/git/[email protected]/

The following candidates were proposed (https://github.com/derrickstolee/gen-test, https://github.com/abhishekkumar2718/git/pull/1):

- (Epoch, Date) Pairs.
- Maximum Generation Numbers.
- Corrected Commit Date.
- FELINE Index.
- Corrected Commit Date with Monotonically Increasing Offsets.

Based on performance, local computability, and immutability (along with the introduction of an additional commit-graph chunk, which relieved the backwards-compatibility requirement), Corrected Commit Date was chosen as generation number v2 and is defined as follows:

For a commit C, its corrected commit date is the maximum of the commit date of C and one more than the largest corrected commit date among its parents. The corrected commit date offset is the difference between the corrected commit date of C and the commit date of C. As a special case, a root commit with timestamp zero has a corrected commit date of 1, to distinguish it from GENERATION_NUMBER_ZERO (that is, an uncomputed generation number).
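The definition above can be sketched in C as a single pass over commits ordered so that every parent precedes its children. This is a minimal illustration, not the code from the series: `struct toy_commit` and both helpers are hypothetical, and Git itself keeps the computed value in the generation member of `struct commit_graph_data` rather than in the commit record.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified commit record (not Git's struct commit). */
struct toy_commit {
	uint64_t date;           /* committer date, seconds since epoch */
	size_t nr_parents;
	const size_t *parents;   /* indices of parents in the same array */
	uint64_t corrected_date; /* output of toy_compute_corrected_dates() */
};

/*
 * Fill corrected_date for commits[0..nr), assuming the array is in
 * topological order (parents before children).
 */
static void toy_compute_corrected_dates(struct toy_commit *commits, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		struct toy_commit *c = &commits[i];
		uint64_t max = c->date;

		for (size_t j = 0; j < c->nr_parents; j++) {
			size_t p = c->parents[j];

			assert(p < i); /* parent must already be computed */
			if (commits[p].corrected_date + 1 > max)
				max = commits[p].corrected_date + 1;
		}
		/* Only a root dated 0 can reach here with max == 0. */
		c->corrected_date = max ? max : 1;
	}
}

/* The value the series stores on disk: corrected date minus commit date. */
static uint64_t toy_offset(const struct toy_commit *c)
{
	return c->corrected_date - c->date;
}
```

When a parent is dated later than its child (clock skew), the child's corrected date is bumped to the parent's value plus one, so the generation still increases along every parent-to-child edge.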

It was initially proposed to store corrected commit date offsets within the Commit Data (CDAT) chunk, but storing them in a new chunk did not measurably affect performance. The new chunk is the "Generation DATa (GDAT) chunk": it stores corrected commit date offsets, while the CDAT chunk continues to store topological levels. Old versions of Git ignore the GDAT chunk and use topological levels from the CDAT chunk. New versions of Git use corrected commit dates, falling back to topological levels if the GDAT chunk is absent from the commit-graph file.
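A reader's choice between the two chunks can be sketched as follows. The row layout is an assumption made for illustration: it keeps the topological level and commit date (both of which CDAT stores) next to a 32-bit GDAT offset, and it deliberately leaves out overflowed entries, which need the GDOV chunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-commit view of one commit-graph layer. */
struct toy_graph_row {
	uint32_t topo_level;  /* generation number v1, from CDAT */
	uint64_t commit_date; /* committer date, also from CDAT */
	uint32_t gdat_offset; /* corrected date offset, from GDAT if present */
};

/* Generation to compare: v2 with a GDAT chunk, topological level without. */
static uint64_t toy_generation(const struct toy_graph_row *row, bool has_gdat)
{
	if (!has_gdat)
		return row->topo_level;
	/* Entries with the MSB set point into GDOV; not modelled here. */
	assert(!(row->gdat_offset & 0x80000000u));
	return row->commit_date + row->gdat_offset;
}
```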

Storing corrected commit date offsets instead of corrected commit dates saves 4 bytes per commit, but an offset can overflow the 4 bytes allocated for it. To handle such cases, we introduce a new chunk, Generation Data Overflow (GDOV), that stores the full corrected commit date. For an overflowing offset, we set the MSB of the GDAT entry and store the position into the GDOV chunk in the remaining bits, a mechanism similar to the Extra Edges list chunk.
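The overflow scheme can be sketched as a pair of encode/decode helpers. All names here are hypothetical stand-ins (the series calls the limit GENERATION_NUMBER_V2_OFFSET_MAX); the limit of 2^31 - 1 follows from reserving the MSB of a 4-byte entry, and the GDOV chunk is modelled as a plain array of full corrected dates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_OFFSET_OVERFLOW ((uint32_t)1 << 31)       /* MSB flags overflow */
#define TOY_OFFSET_MAX      (TOY_OFFSET_OVERFLOW - 1) /* largest inline offset */

/*
 * Encode one commit's GDAT entry. Offsets that fit are stored inline;
 * otherwise the full corrected date is appended to gdov[] and the entry
 * holds MSB | position.
 */
static uint32_t toy_encode_gdat(uint64_t commit_date, uint64_t corrected_date,
				uint64_t *gdov, size_t *gdov_nr)
{
	uint64_t offset = corrected_date - commit_date;
	size_t pos;

	if (offset <= TOY_OFFSET_MAX)
		return (uint32_t)offset;
	pos = (*gdov_nr)++;
	gdov[pos] = corrected_date;
	return TOY_OFFSET_OVERFLOW | (uint32_t)pos;
}

/* Decode a GDAT entry back into a corrected commit date. */
static uint64_t toy_decode_gdat(uint64_t commit_date, uint32_t entry,
				const uint64_t *gdov, size_t gdov_nr)
{
	if (entry & TOY_OFFSET_OVERFLOW) {
		size_t pos = entry & ~TOY_OFFSET_OVERFLOW;

		/* Git dies if the overflow bit is set but GDOV is missing. */
		assert(pos < gdov_nr);
		return gdov[pos];
	}
	return commit_date + entry;
}
```

Only overflowing commits land in GDOV, so the extra indirection on decode is paid by those rare entries alone.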

In a mixed generation number environment (for example, new Git on the command line and old Git used by a GUI client), we can encounter a mixed-chain commit-graph: a commit-graph chain where some of the split commit-graph files have a GDAT chunk and others do not. As backward compatibility is one of the goals, we define the following behavior:

When reading a mixed-chain commit-graph, we fall back on topological levels, because corrected commit dates and topological levels cannot be compared directly.

When adding a new layer to the split commit-graph file, and when merging some or all layers (replacing them in the latter case), the new layer will have a GDAT chunk if and only if, in the final result, there is no layer without a GDAT chunk just below it.
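Both rules can be modelled with one flag per layer. This is a hypothetical sketch, not Git's implementation: `has_gdat[0]` is the base layer, and the write rule assumes the writer would otherwise produce a GDAT chunk (new Git with GIT_TEST_COMMIT_GRAPH_NO_GDAT unset).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reading: use corrected commit dates only if every layer has GDAT. */
static bool toy_chain_reads_v2(const bool *has_gdat, size_t nr_layers)
{
	for (size_t i = 0; i < nr_layers; i++)
		if (!has_gdat[i])
			return false; /* mixed chain: use topological levels */
	return true;
}

/*
 * Writing: the top nr_replaced layers are merged into one new layer
 * (nr_replaced == 0 means the new layer is simply appended). The new
 * layer gets GDAT iff no layer without GDAT remains just below it.
 */
static bool toy_new_layer_gets_gdat(const bool *has_gdat, size_t nr_layers,
				    size_t nr_replaced)
{
	size_t nr_kept;

	assert(nr_replaced <= nr_layers);
	nr_kept = nr_layers - nr_replaced;
	if (!nr_kept)
		return true;              /* nothing left below the new layer */
	return has_gdat[nr_kept - 1]; /* follow the layer just below */
}
```

Appending to a chain whose tip was written by old Git therefore keeps producing layers without GDAT, while a rewrite that replaces that tip can bring GDAT back.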

Thanks to Dr. Stolee, Dr. Narębski, Taylor Blau and SZEDER Gábor for their reviews.

I look forward to everyone's reviews!

Thanks

Abhishek

Improvements left for a future series:

Save commits with generation data overflow and extra edge commits instead of looping over all commits. cf. [email protected]
Verify both topological levels and corrected commit dates when present. cf. [email protected]

Changes in version 7:

Moved the documentation patch ahead of "commit-graph: implement corrected commit date" and elaborated on the introduction of generation number v2.

Changes in version 6:

Fixed typos in commit message for "commit-graph: implement corrected commit date".
Removed an unnecessary else-block in "commit-graph: implement corrected commit date".
Validate mixed generation chain correctly while writing in "commit-graph: use generation v2 only if the entire chain does".
Die if the GDAT chunk indicates that data has overflowed but there is no generation data overflow chunk.

Changes in version 5:

Explained a possible reason why performance did not change in "commit-graph: fix regression when computing bloom-filters".
Clarified the addition of a new test for 11-digit octal implementations of ustar.
Fixed duplicate test names in "commit-graph: consolidate fill_commit_graph_info".
Swapped the order of "commit-graph: return 64-bit generation number" and "commit-graph: add a slab to store topological levels" to minimize the lines changed.
Fixed the mismerge in "commit-graph: return 64-bit generation number"
Clarified that the preparatory steps serve the larger goal of implementing generation number v2 in "commit-graph: return 64-bit generation number".
Moved the rename of "run_three_modes()" to "run_all_modes()" into a new patch "t6600-test-reach: generalize *_three_modes".
Explained and removed the checks for GENERATION_NUMBER_INFINITY that can never be true in "commit-graph: add a slab to store topological levels".
Fixed incorrect logic for verifying commit-graph in "commit-graph: implement corrected commit date".
Added minor improvements to commit message of "commit-graph: implement generation data chunk".
Added '--date ' option to test_commit() in 'test-lib-functions.sh' in "commit-graph: implement generation data chunk".
Improved coding style (also in tests) for "commit-graph: use generation v2 only if entire chain does".
Simplified test repository structure in "commit-graph: use generation v2 only if entire chain does" as only the number of commits in a split commit-graph layer is relevant.
Added a new test in "commit-graph: use generation v2 only if entire chain does" to check if the layers are merged correctly.
Explicitly mentioned commit "091f4cf3" in the commit-message of "commit-graph: use corrected commit dates in paint_down_to_common()".
Minor corrections to documentation in "doc: add corrected commit date info".
Minor corrections to coding style.

Changes in version 4:

Added GDOV to handle overflows in generation data.
Added a test for writing tip graph for a generation number v2 graph chain in t5324-split-commit-graph.sh
Added a section on how mixed generation number chains are handled in Documentation/technical/commit-graph-format.txt
Reverted unimportant whitespace and style changes in commit-graph.c
Added header comments about the order of comparison for compare_commits_by_gen_then_commit_date in commit.h, compare_commits_by_gen in commit-graph.h
Elaborated on why t6404 fails with corrected commit date and must be run with GIT_TEST_COMMIT_GRAPH=1 in the commit "commit-reach: use corrected commit dates in paint_down_to_common()"
Elaborated on write behavior for mixed generation number chains in the commit "commit-graph: use generation v2 only if entire chain does"
Added notes about adding the topo_level slab to struct write_commit_graph_context as well as struct commit_graph.
Clarified commit message for "commit-graph: consolidate fill_commit_graph_info"
Removed the claim "GDAT can store future generation numbers" because it hasn't been tested yet.

Changes in version 3:

Reordered patches to implement corrected commit date before generation data chunk [3].
Split "implement corrected commit date" into two patches: one introducing the topo level slab and the other implementing corrected commit dates.
Extended split-commit-graph tests to verify at the end of test.
Use topological levels as generation number if any of the split commit-graph files does not have a generation data chunk.

[3] https://lore.kernel.org/git/[email protected]/

Changes in version 2:

Add tests for generation data chunk.
Add an option GIT_TEST_COMMIT_GRAPH_NO_GDAT to control whether to write generation data chunk.
Compare commits with corrected commit dates if present in paint_down_to_common().
Update technical documentation.
Handle mixed generation commit chains.
Improve commit messages for "commit-graph: fix regression when computing bloom filter" and "commit-graph: consolidate fill_commit_graph_info".
Revert unnecessary whitespace changes.
Split uint32_t -> timestamp_t change into a new commit.

cc: Derrick Stolee [email protected], Jakub Narębski [email protected]
cc: Abhishek Kumar [email protected]
cc: SZEDER Gábor [email protected]
cc: Taylor Blau [email protected]

gitgitgadget · 2020-07-14T03:11:14Z

Welcome to GitGitGadget

Hi @abhishekkumar2718, and welcome to GitGitGadget, the GitHub App to send patch series to the Git mailing list from GitHub Pull Requests.

Please make sure that your Pull Request has a good description, as it will be used as cover letter.

Also, it is a good idea to review the commit messages one last time, as the Git project expects them in a quite specific form:

the lines should not exceed 76 columns,
the first line should be like a header and typically start with a prefix like "tests:" or "commit:", and
the commit messages' body should be describing the "why?" of the change.
Finally, the commit messages should end in a Signed-off-by: line matching the commits' author.

It is in general a good idea to await the automated test ("Checks") in this Pull Request before contributing the patches, e.g. to avoid trivial issues such as unportable code.

Contributing the patches

Before you can contribute the patches, your GitHub username needs to be added to the list of permitted users. Any already-permitted user can do that, by adding a comment to your PR of the form /allow. A good way to find other contributors is to locate recent pull requests where someone has been /allowed:

Search: is:pr is:open "/allow"

Both the person who commented /allow and the PR author are able to /allow you.

An alternative is the channel #git-devel on the FreeNode IRC network:

<newcontributor> I've just created my first PR, could someone please /allow me? https://github.com/gitgitgadget/git/pull/12345
<veteran> newcontributor: it is done
<newcontributor> thanks!

Once on the list of permitted usernames, you can contribute the patches to the Git mailing list by adding a PR comment /submit.

If you want to see what email(s) would be sent for a /submit request, add a PR comment /preview to have the email(s) sent to you. You must have a public GitHub email address for this.

After you submit, GitGitGadget will respond with another comment that contains the link to the cover letter mail in the Git mailing list archive. Please make sure to monitor the discussion in that thread and to address comments and suggestions (while the comments and suggestions will be mirrored into the PR by GitGitGadget, you will still want to reply via mail).

If you do not want to subscribe to the Git mailing list just to be able to respond to a mail, you can download the mbox from the Git mailing list archive (click the (raw) link), then import it into your mail program. If you use GMail, you can do this via:

curl -g --user "<EMailAddress>:<Password>" \
    --url "imaps://imap.gmail.com/INBOX" -T /path/to/raw.txt

To iterate on your change, i.e. send a revised patch or patch series, you will first want to (force-)push to the same branch. You probably also want to modify your Pull Request description (or title). It is a good idea to summarize the revision by adding something like this to the cover letter (read: by editing the first comment on the PR, i.e. the PR description):

Changes since v1:
- Fixed a typo in the commit message (found by ...)
- Added a code comment to ... as suggested by ...
...

To send a new iteration, just add another PR comment with the contents: /submit.

Need help?

New contributors who want advice are encouraged to join [email protected], where volunteers who regularly contribute to Git are willing to answer newbie questions, give advice, or otherwise provide mentoring to interested contributors. You must join in order to post or view messages, but anyone can join.

You may also be able to find help in real time in the developer IRC channel, #git-devel on Freenode. Remember that IRC does not support offline messaging, so if you send someone a private message and log out, they cannot respond to you. The scrollback of #git-devel is archived, though.

gitgitgadget · 2020-07-14T03:11:14Z

There is an issue in commit 7193ebf81003b2916fa961bfb6e8600ad51d5d59:
Commit not signed off

gitgitgadget · 2020-07-14T03:11:15Z

There is an issue in commit a739d05b697b75bbb42fb544cec85b97cf156b88:
Commit not signed off

gitgitgadget · 2020-07-14T03:11:16Z

There is an issue in commit 5a3f76b14423df00b6171734135f6b06fd272513:
Commit checks stopped - the message is too short

derrickstolee · 2020-07-14T13:28:36Z

@abhishekkumar2718 thanks for opening this so I can see your progress! Please @-mention me when you have working builds so I can take a quick review before you submit.

dscho · 2020-07-14T14:22:49Z

/allow

gitgitgadget · 2020-07-14T14:23:36Z

User abhishekkumar2718 is now allowed to use GitGitGadget.

WARNING: abhishekkumar2718 has no public email address set on GitHub

derrickstolee · 2020-07-14T14:42:06Z

> User abhishekkumar2718 is now allowed to use GitGitGadget.
>
> WARNING: abhishekkumar2718 has no public email address set on GitHub

@abhishekkumar2718 please fix the public email address so GitGitGadget can CC you properly.

gitgitgadget · 2020-07-16T16:01:59Z

There is an issue in commit dd7a19b322302173f2e03921a9fcc51f85caa864:
Commit not signed off

derrickstolee

I think updating your struct to make the generation comparisons simpler will help. You will need to update a lot of code in commit-reach.c to use a uint64_t min_generation whenever you see a uint32_t min_generation or similar.


abhishekkumar2718 · 2020-07-28T09:13:01Z

/submit

gitgitgadget · 2020-07-28T09:14:22Z

Submitted as [email protected]

gitgitgadget · 2021-01-31T23:26:58Z

This patch series was integrated into seen via git@66841f6.

gitgitgadget · 2021-02-01T00:01:17Z

This patch series was integrated into seen via git@218525a.

Git uses topological levels in the commit-graph file for commit-graph traversal operations like 'git log --graph'. Unfortunately, topological levels can perform worse than committer date when parents of a commit differ greatly in generation numbers [1]. For example, 'git merge-base v4.8 v4.9' on the Linux repository walks 635,579 commits using topological levels and 167,468 commits using committer date. Because of this performance hit, since 091f4cf (commit: don't use generation numbers if not needed, 2018-08-30) 'git merge-base' has used the committer date heuristic unless there is a cutoff.

[1] https://lore.kernel.org/git/efa3720fb40638e5d61c6130b55e3348d8e4339e.1535633886.git.gitgitgadget@gmail.com/

Thus, the need for generation number v2 was born. As Git used to die when the graph version it understood differed from the one in the commit-graph file [2], we needed a way to distinguish between the old and new generation numbers without incrementing the graph version.

[2] https://lore.kernel.org/git/[email protected]/

The following candidates were proposed (https://github.com/derrickstolee/gen-test, https://github.com/abhishekkumar2718/git/pull/1):

- (Epoch, Date) Pairs.
- Maximum Generation Numbers.
- Corrected Commit Date.
- FELINE Index.
- Corrected Commit Date with Monotonically Increasing Offsets.

Based on performance, local computability, and immutability (along with the introduction of an additional commit-graph chunk, which relieved the backwards-compatibility requirement), Corrected Commit Date was chosen as generation number v2 and is defined as follows:

For a commit C, its corrected commit date is the maximum of the commit date of C and one more than the largest corrected commit date among its parents. The corrected commit date offset is the difference between the corrected commit date of C and the commit date of C. As a special case, a root commit with timestamp zero has a corrected commit date of 1, to distinguish it from GENERATION_NUMBER_ZERO (that is, an uncomputed generation number).

It was initially proposed to store corrected commit date offsets within the Commit Data (CDAT) chunk, but storing them in a new chunk did not measurably affect performance. The new chunk is the "Generation DATa (GDAT) chunk": it stores corrected commit date offsets, while the CDAT chunk continues to store topological levels. Old versions of Git ignore the GDAT chunk and use topological levels from the CDAT chunk. New versions of Git use corrected commit dates, falling back to topological levels if the GDAT chunk is absent from the commit-graph file.

Storing corrected commit date offsets instead of corrected commit dates saves 4 bytes per commit, but an offset can overflow the 4 bytes allocated for it. To handle such cases, we introduce a new chunk, _Generation Data Overflow_ (GDOV), that stores the full corrected commit date. For an overflowing offset, we set the MSB of the GDAT entry and store the position into the GDOV chunk in the remaining bits, a mechanism similar to the Extra Edges list chunk.

In a mixed generation number environment (for example, new Git on the command line and old Git used by a GUI client), we can encounter a mixed-chain commit-graph: a commit-graph chain where some of the split commit-graph files have a GDAT chunk and others do not. As backward compatibility is one of the goals, we define the following behavior:

When reading a mixed-chain commit-graph, we fall back on topological levels, because corrected commit dates and topological levels cannot be compared directly.

When adding a new layer to the split commit-graph file, and when merging some or all layers (replacing them in the latter case), the new layer will have a GDAT chunk if and only if, in the final result, there is no layer without a GDAT chunk just below it.

Signed-off-by: Abhishek Kumar <[email protected]>

With most of the preparations done, let's implement corrected commit date. The corrected commit date for a commit is defined as:

* A commit with no parents (a root commit) has corrected commit date equal to its committer date.
* A commit with at least one parent has corrected commit date equal to the maximum of its commit date and one more than the largest corrected commit date among its parents.

As a special case, a root commit with a timestamp of zero (01.01.1970 00:00:00Z) has a corrected commit date of one, to distinguish it from GENERATION_NUMBER_ZERO (that is, an uncomputed corrected commit date).

To minimize the space required to store corrected commit dates, Git stores corrected commit date offsets in the commit-graph file. The corrected commit date offset for a commit is defined as the difference between its corrected commit date and its actual commit date.

Storing a corrected commit date requires sizeof(timestamp_t) bytes, which in most cases is 64 bits (uintmax_t). However, corrected commit date offsets can be safely stored using only 32 bits. This halves the size of the GDAT chunk, a reduction of around 6% in the size of the commit-graph file.

However, using offsets can be problematic if a commit is malformed but valid and has a committer date of 0 Unix time, as the offset would then be the same as the corrected commit date and thus require 64 bits to be stored properly.

While Git does not write out offsets at this stage, it stores the corrected commit dates in the generation member of struct commit_graph_data. It will begin writing commit date offsets with the introduction of the generation data chunk.

Signed-off-by: Abhishek Kumar <[email protected]>
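The "around 6%" figure can be checked with back-of-the-envelope arithmetic. This sketch is my own calculation, not a measurement from the series. It assumes SHA-1 object ids (20 bytes) and counts only per-commit data: one OID Lookup entry, one CDAT row (root tree id, two 4-byte parent positions, 8 bytes of level and date), and one GDAT entry. It ignores the header, chunk table, fanout, and optional chunks such as extra edges or Bloom filters, all of which would make the percentage slightly smaller.

```c
#include <assert.h>

enum {
	TOY_HASH_BYTES = 20,                         /* SHA-1 object id */
	TOY_OIDL_BYTES = TOY_HASH_BYTES,             /* OID Lookup entry */
	TOY_CDAT_BYTES = TOY_HASH_BYTES + 4 + 4 + 8, /* tree, 2 parents, level+date */
};

static_assert(TOY_CDAT_BYTES == 36, "CDAT row is H + 16 bytes");

/* Fraction of per-commit bytes saved by 4-byte offsets vs 8-byte dates. */
static double toy_gdat_saving(void)
{
	double with_dates = TOY_OIDL_BYTES + TOY_CDAT_BYTES + 8;   /* 64 bytes */
	double with_offsets = TOY_OIDL_BYTES + TOY_CDAT_BYTES + 4; /* 60 bytes */

	return 1.0 - with_offsets / with_dates; /* 4 / 64 = 0.0625 */
}
```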

As discovered by Ævar, we cannot increment the graph version to distinguish between generation numbers v1 and v2 [1]. Thus, one of the prerequisites before implementing generation number v2 was to distinguish between graph versions in a backwards-compatible manner.

We are going to introduce a new chunk called the Generation DATa chunk (or GDAT). GDAT will store corrected committer date offsets, whereas CDAT will still store topological levels.

Old Git does not understand the GDAT chunk and would ignore it, reading topological levels from CDAT. New Git can parse GDAT and take advantage of newer generation numbers, falling back to topological levels when the GDAT chunk is missing (as happens with a commit-graph written by old Git).

We introduce a test environment variable 'GIT_TEST_COMMIT_GRAPH_NO_GDAT' which forces the commit-graph file to be written without a generation data chunk, to emulate a commit-graph file written by old Git.

To minimize the space required to store corrected commit dates, Git stores corrected commit date offsets in the commit-graph file instead of corrected commit dates. This saves 4 bytes per commit, halving the GDAT chunk size, but it is possible for an offset to overflow the 4 bytes allocated for storage. As such overflows are and should be exceedingly rare, we use the following overflow management scheme:

We introduce a new commit-graph chunk, Generation Data OVerflow ('GDOV'), to store corrected commit dates for commits with offsets greater than GENERATION_NUMBER_V2_OFFSET_MAX.

If the offset is greater than GENERATION_NUMBER_V2_OFFSET_MAX, we set the MSB of the offset and the other bits store the position of the corrected commit date in the GDOV chunk, similar to how the Extra Edge List is maintained.
We test the overflow-related code with the following repo history:

              F - N - U
             /         \
    U - N - U           N
             \         /
              N - F - N

where the commits denoted by U have a committer date of zero seconds since the Unix epoch, the commits denoted by N have a committer date of 1112354055 (the default committer date for the test suite) seconds since the Unix epoch, and the commits denoted by F have a committer date of (2 ^ 31 - 2) seconds since the Unix epoch.

The largest offset observed is 2 ^ 31, just large enough to overflow.

[1]: https://lore.kernel.org/git/[email protected]/

Signed-off-by: Abhishek Kumar <[email protected]>
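The claim that the largest offset is 2^31 can be checked by replaying the definition over the history above. This is a hypothetical, self-contained sketch; the commit order and parent indices are my reading of the history (oldest on the left, the rightmost N being a merge of both branches).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DATE_U 0ULL               /* committer date 0 (Unix epoch) */
#define DATE_N 1112354055ULL      /* test-suite default committer date */
#define DATE_F ((1ULL << 31) - 2) /* 2^31 - 2 */

/* Commits listed parents-first; -1 means "no parent". */
static const struct {
	uint64_t date;
	int p1, p2;
} toy_history[] = {
	{ DATE_U, -1, -1 }, { DATE_N, 0, -1 }, { DATE_U, 1, -1 }, /* U - N - U */
	{ DATE_F, 2, -1 }, { DATE_N, 3, -1 }, { DATE_U, 4, -1 },  /* upper F - N - U */
	{ DATE_N, 2, -1 }, { DATE_F, 6, -1 }, { DATE_N, 7, -1 },  /* lower N - F - N */
	{ DATE_N, 5, 8 },                                         /* merge N */
};
#define TOY_NR (sizeof(toy_history) / sizeof(toy_history[0]))

/* Compute corrected commit dates and return the largest offset. */
static uint64_t toy_max_offset(void)
{
	uint64_t corrected[TOY_NR];
	uint64_t max_offset = 0;

	for (size_t i = 0; i < TOY_NR; i++) {
		int parents[2] = { toy_history[i].p1, toy_history[i].p2 };
		uint64_t c = toy_history[i].date;

		for (int j = 0; j < 2; j++) {
			if (parents[j] < 0)
				continue;
			assert((size_t)parents[j] < i); /* parents come first */
			if (corrected[parents[j]] + 1 > c)
				c = corrected[parents[j]] + 1;
		}
		corrected[i] = c ? c : 1; /* root dated 0 gets 1 */
		if (corrected[i] - toy_history[i].date > max_offset)
			max_offset = corrected[i] - toy_history[i].date;
	}
	return max_offset;
}
```

The overflowing commit is the U at the end of the upper branch: the preceding F pushes its parent N to 2^31 - 1, so that U gets a corrected date of 2^31 with a committer date of 0.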

Since there are released versions of Git that understand generation numbers in the commit-graph's CDAT chunk but do not understand the GDAT chunk, the following scenario is possible:

1. "New" Git writes a commit-graph with the GDAT chunk.
2. "Old" Git writes a split commit-graph on top without a GDAT chunk.

If each layer of the split commit-graph is treated independently, as was the case before this commit, with Git inspecting only the current layer for the chunk_generation_data pointer, commits in the lower layer (the one with GDAT) would have corrected commit dates as their generation numbers, while commits in the upper layer would have topological levels as their generation numbers. Corrected commit dates usually have much larger values than topological levels. This means that if we take two commits, one from the upper layer and one reachable from it in the lower layer, then the expectation that the generation of a parent is smaller than the generation of a child would be violated.

It is difficult to expose this issue in a test. Since we _start_ with artificially low generation numbers, any commit walk that prioritizes generation numbers will walk all of the commits with high generation numbers before walking the commits with low generation numbers. In all the cases I tried, the commit-graph layers themselves "protect" against any incorrect behavior, since none of the commits in the lower layer can reach the commits in the upper layer.

This issue would manifest itself as a performance problem in this case, especially with something like "git log --graph", since the low generation numbers would cause the in-degree queue to walk all of the commits in the lower layer before allowing the topo-order queue to write anything to output (depending on the size of the upper layer).

Therefore, when writing the new layer in a split commit-graph, we write a GDAT chunk only if the topmost layer has a GDAT chunk. This guarantees that if a layer has a GDAT chunk, all lower layers must have a GDAT chunk as well.
Rewriting layers follows a similar approach: if the topmost layer below the set of layers being rewritten (in the split commit-graph chain) exists and does not contain a GDAT chunk, then the result of the rewrite does not have GDAT chunks either.

Signed-off-by: Derrick Stolee <[email protected]>
Signed-off-by: Abhishek Kumar <[email protected]>

091f4cf (commit: don't use generation numbers if not needed, 2018-08-30) changed paint_down_to_common() to use commit dates instead of generation numbers v1 (topological levels), as performance regressed on certain topologies. With generation number v2 (corrected commit dates) implemented, we no longer have to rely on commit dates and can use generation numbers.

For example, the command `git merge-base v4.8 v4.9` on the Linux repository walks 167468 commits, taking 0.135s, with committer date, and 167496 commits, taking 0.157s, with corrected committer date. While Git walks nearly the same number of commits with corrected commit dates as with commit dates, the process is slower, as each comparison has to access a commit-slab (for corrected committer date) instead of a struct member (for committer date).

This change incidentally broke the fragile t6404-recursive-merge test. t6404-recursive-merge sets up a unique repository where all commits have the same committer date without a well-defined merge-base.

While running tests with GIT_TEST_COMMIT_GRAPH unset, we use committer date as a heuristic in paint_down_to_common(). 6404.1 'combined merge conflicts' merges commits in the order:
- Merge C with B to form an intermediate commit.
- Merge the intermediate commit with A.

With GIT_TEST_COMMIT_GRAPH=1, we write a commit-graph and subsequently use the corrected committer date, which changes the order in which commits are merged:
- Merge A with B to form an intermediate commit.
- Merge the intermediate commit with C.

While the resulting repositories are equivalent, 6404.4 'virtual trees were processed' fails with GIT_TEST_COMMIT_GRAPH=1, as we select different merge-bases and thus have different object ids for the intermediate commits.

As this has already caused problems (as noted in 859fdc0 (commit-graph: define GIT_TEST_COMMIT_GRAPH, 2018-08-29)), we disable the commit-graph within t6404-recursive-merge.
Signed-off-by: Abhishek Kumar <[email protected]>
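The ordering a generation-aware walk relies on can be sketched as a qsort()-style comparator: larger generation first, committer date only to break ties. This is a hypothetical standalone version loosely modelled on compare_commits_by_gen_then_commit_date(); as I understand it, the real function works on struct commit and reads the generation from the commit-graph data, which is where the slab access mentioned above comes in.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the fields the comparison needs. */
struct toy_queued_commit {
	uint64_t generation; /* corrected commit date when GDAT is usable */
	uint64_t date;       /* committer date */
};

/* Larger generation first; among equal generations, later date first. */
static int toy_cmp_gen_then_date(const void *a_, const void *b_)
{
	const struct toy_queued_commit *a = a_, *b = b_;

	if (a->generation != b->generation)
		return a->generation < b->generation ? 1 : -1;
	if (a->date != b->date)
		return a->date < b->date ? 1 : -1;
	return 0;
}
```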

abhishekkumar2718 · 2021-02-01T06:57:42Z

/submit

gitgitgadget · 2021-02-01T06:59:17Z

Submitted as [email protected]

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git pr-676/abhishekkumar2718/corrected_commit_date-v7

To fetch this version to local tag pr-676/abhishekkumar2718/corrected_commit_date-v7:

git fetch --no-tags https://github.com/gitgitgadget/git tag pr-676/abhishekkumar2718/corrected_commit_date-v7

gitgitgadget · 2021-02-01T13:18:28Z

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 2/1/2021 1:58 AM, Abhishek Kumar via GitGitGadget wrote:

> Changes in version 7:
> 
>  * Moved the documentation patch ahead of "commit-graph: implement corrected
>    commit date" and elaborated on the introduction of generation number v2.

The only change in this version is this commit message:

>  11:  e571f03d8bd !  7:  8647b5d2e38 doc: add corrected commit date info
>      @@ Metadata
>       Author: Abhishek Kumar <[email protected]>
>       
>        ## Commit message ##
>      -    doc: add corrected commit date info
>      +    commit-graph: document generation number v2
>       
>      -    With generation data chunk and corrected commit dates implemented, let's
>      -    update the technical documentation for commit-graph.
>      +    Git uses topological levels in the commit-graph file for commit-graph
>      +    traversal operations like 'git log --graph'. Unfortunately, topological
>      +    levels can perform worse than committer date when parents of a commit
>      +    differ greatly in generation numbers [1]. For example, 'git merge-base
>      +    v4.8 v4.9' on the Linux repository walks 635,579 commits using
>      +    topological levels and walks 167,468 using committer date. Since
>      +    091f4cf3 (commit: don't use generation numbers if not needed,
>      +    2018-08-30), 'git merge-base' uses committer date heuristic unless there
>      +    is a cutoff because of the performance hit.
>      +
>      +    [1] https://lore.kernel.org/git/efa3720fb40638e5d61c6130b55e3348d8e4339e.1535633886.git.gitgitgadget@gmail.com/
>      +
>      +    Thus, the need for generation number v2 was born. As Git used to die
>      +    when graph version understood by it and in the commit-graph file are
>      +    different [2], we needed a way to distinguish between the old and new
>      +    generation number without incrementing the graph version.
>      +
>      +    [2] https://lore.kernel.org/git/[email protected]/
>      +
>      +    The following candidates were proposed (https://github.com/derrickstolee/gen-test,
>      +    https://github.com/abhishekkumar2718/git/pull/1):
>      +    - (Epoch, Date) Pairs.
>      +    - Maximum Generation Numbers.
>      +    - Corrected Commit Date.
>      +    - FELINE Index.
>      +    - Corrected Commit Date with Monotonically Increasing Offsets.
>      +
>      +    Based on performance, local computability, and immutability (along with
>      +    the introduction of an additional commit-graph chunk which relieved the
>      +    requirement of backwards-compatibility) Corrected Commit Date was chosen
>      +    as generation number v2 and is defined as follows:
>      +
>      +    For a commit C, let its corrected commit date  be the maximum of the
>      +    commit date of C and the corrected commit dates of its parents plus 1.
>      +    Then corrected commit date offset is the difference between corrected
>      +    commit date of C and commit date of C. As a special case, a root commit
>      +    with the timestamp zero has corrected commit date of 1 to distinguish it
>      +    from GENERATION_NUMBER_ZERO (that is, an uncomputed generation number).
>      +
>      +    While it was proposed initially to store corrected commit date offsets
>      +    within Commit Data Chunk, storing the offsets in a new chunk did not
>      +    affect the performance measurably. The new chunk is "Generation DATa
>      +    (GDAT) chunk" and it stores corrected commit date offsets while CDAT
>      +    chunk stores topological level. The old versions of Git would ignore
>      +    GDAT chunk, using topological levels from CDAT chunk. In contrast, new
>      +    versions of Git would use corrected commit dates, falling back to
>      +    topological level if the generation data chunk is absent in the
>      +    commit-graph file.
>      +
>      +    While storing corrected commit date offsets saves us 4 bytes per commit
>      +    (as compared with storing corrected commit dates directly), it's however
>      +    possible for the offset to overflow the space allocated. To handle such
>      +    cases, we introduce a new chunk, _Generation Data Overflow_ (GDOV) that
>      +    stores the corrected commit date. For overflowing offsets, we set MSB
>      +    and store the position into the GDOV chunk, in a mechanism similar to
>      +    the Extra Edges list chunk.
>      +
>      +    In a mixed generation number environment (for example, new Git on the
>      +    command line and old Git used by a GUI client), we can encounter a
>      +    mixed-chain commit-graph (a commit-graph chain where some of the split
>      +    commit-graph files have a GDAT chunk and others do not). As backward
>      +    compatibility is one of the goals, we can define the following behavior:
>      +
>      +    While reading a mixed-chain commit-graph version, we fall back on
>      +    topological levels because corrected commit dates and topological
>      +    levels cannot be compared directly.
>      +
>      +    When adding new layer to the split commit-graph file, and when merging
>      +    some or all layers (replacing them in the latter case), the new layer
>      +    will have GDAT chunk if and only if in the final result there would be
>      +    no layer without GDAT chunk just below it.

While that is a quality message, v6 has landed in 'next' and I've begun
working off of that version. As Taylor attempted to say [1], this topic
should be considered final and updates should be follow-ups on top.

[1] https://lore.kernel.org/git/[email protected]/

(Of course, if Junio says differently, then listen to him.)

Thanks,
-Stolee
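
To make the corrected commit date definition quoted above concrete, here is a minimal C sketch that computes a corrected commit date and its offset. `struct sketch_commit`, `corrected_commit_date()` and `corrected_commit_date_offset()` are hypothetical names used only for illustration; this is not Git's `struct commit` or its commit-graph writer, and it recurses over parents for brevity, where a deep real history would need an explicit stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory commit, for illustration only. */
struct sketch_commit {
	uint64_t date;                  /* committer date, in seconds */
	size_t nr_parents;
	struct sketch_commit **parents;
	uint64_t corrected_date;        /* 0 means "not computed yet" */
};

/*
 * Corrected commit date: the maximum of the commit's own date and
 * (corrected commit date of each parent + 1). Results are memoized.
 */
static uint64_t corrected_commit_date(struct sketch_commit *c)
{
	uint64_t result;
	size_t i;

	if (c->corrected_date)
		return c->corrected_date;

	result = c->date;
	for (i = 0; i < c->nr_parents; i++) {
		uint64_t candidate = corrected_commit_date(c->parents[i]) + 1;
		if (candidate > result)
			result = candidate;
	}

	/*
	 * Only a root commit with timestamp zero can get here with
	 * result == 0; bump it to 1 so that 0 keeps meaning "uncomputed".
	 */
	if (!result)
		result = 1;

	c->corrected_date = result;
	return result;
}

/* The value to store: corrected commit date minus committer date. */
static uint64_t corrected_commit_date_offset(struct sketch_commit *c)
{
	uint64_t corrected = corrected_commit_date(c);

	assert(corrected >= c->date); /* never negative by construction */
	return corrected - c->date;
}
```

For example, with a root commit A dated 100 and its child B dated 50 (a skewed clock), B's corrected commit date is 101 and its offset is 51; a grandchild C dated 200 gets a corrected commit date of 200 and an offset of 0.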
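
The MSB-plus-overflow-list encoding quoted above for the GDAT and GDOV chunks can be sketched in the same style. The names (`SKETCH_OFFSET_OVERFLOW`, `struct sketch_overflow_list`, `encode_gdat_entry()`, `decode_gdat_entry()`) are hypothetical, the overflow list is a fixed-size in-memory array rather than an on-disk chunk, and, following the quoted description, each overflow entry holds the corrected commit date itself. This is an illustration of the idea, not Git's reader or writer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* MSB of a 32-bit GDAT entry flags "look in the overflow list". */
#define SKETCH_OFFSET_OVERFLOW 0x80000000u

/* Hypothetical stand-in for the GDOV chunk (fixed capacity for brevity). */
struct sketch_overflow_list {
	uint64_t values[16];
	size_t nr;
};

/* Produce the 32-bit GDAT entry for one commit. */
static uint32_t encode_gdat_entry(uint64_t date, uint64_t corrected,
				  struct sketch_overflow_list *gdov)
{
	uint64_t offset = corrected - date;
	size_t pos;

	/* Offsets that fit in the low 31 bits are stored inline. */
	if (offset < SKETCH_OFFSET_OVERFLOW)
		return (uint32_t)offset;

	/* Otherwise append to the overflow list; store its position with the MSB set. */
	assert(gdov->nr < sizeof(gdov->values) / sizeof(gdov->values[0]));
	pos = gdov->nr++;
	gdov->values[pos] = corrected;
	return SKETCH_OFFSET_OVERFLOW | (uint32_t)pos;
}

/* Recover the corrected commit date from a GDAT entry. */
static uint64_t decode_gdat_entry(uint64_t date, uint32_t entry,
				  const struct sketch_overflow_list *gdov)
{
	if (entry & SKETCH_OFFSET_OVERFLOW) {
		size_t pos = entry & ~SKETCH_OFFSET_OVERFLOW;

		assert(pos < gdov->nr);
		return gdov->values[pos];
	}
	return date + entry;
}
```

In this sketch most commits cost a single 4-byte entry, and only offsets of 2^31 or more add an 8-byte overflow value, which is where the 4-bytes-per-commit saving mentioned in the quote comes from.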

dscho · 2021-02-01T13:25:03Z

/submit

Ummm. @abhishekkumar2718 did you miss what I wrote about next?

gitgitgadget · 2021-02-01T18:30:05Z

On the Git mailing list, Junio C Hamano wrote (reply to this):

Derrick Stolee <[email protected]> writes:

>>      +    When adding new layer to the split commit-graph file, and when merging
>>      +    some or all layers (replacing them in the latter case), the new layer
>>      +    will have GDAT chunk if and only if in the final result there would be
>>      +    no layer without GDAT chunk just below it.
>
> While that is a quality message, v6 has landed in 'next' and I've begun
> working off of that version. As Taylor attempted to say [1], this topic
> should be considered final and updates should be follow-ups on top.
>
> [1] https://lore.kernel.org/git/[email protected]/

Sounds sensible, modulo s/final/solid enough/ ;-)

I would imagine that the "quality message" has something of value to
keep to help future developers, and if that is the case, a follow-up
patch to add to the Documentation/technical/ would be appropriate.

Thanks all, for a quality series.
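
The mixed-chain behavior quoted in this thread (fall back to topological levels when reading a chain in which some layers lack GDAT; give a new layer GDAT only if the layer left just below it has one) can be sketched as two small predicates. `struct sketch_layer` and both function names are hypothetical, layers are listed base first, and `kept` is an assumed parameter: the number of existing layers that stay below the new layer, with the rest merged into it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one split commit-graph layer, base layer first. */
struct sketch_layer {
	bool has_gdat;
};

/*
 * Reading: corrected commit dates are used only when every layer has a
 * GDAT chunk; otherwise the whole chain falls back to topological levels,
 * since the two kinds of values cannot be compared directly.
 */
static bool chain_uses_corrected_dates(const struct sketch_layer *chain,
				       size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (!chain[i].has_gdat)
			return false;
	return true;
}

/*
 * Writing: the first `kept` layers stay and the remaining ones are merged
 * into the new layer. The new layer gets GDAT iff the layer just below it
 * has GDAT, or there is no such layer (the new layer becomes the base).
 */
static bool new_layer_gets_gdat(const struct sketch_layer *chain,
				size_t nr, size_t kept)
{
	assert(kept <= nr);
	return kept == 0 || chain[kept - 1].has_gdat;
}
```

Assuming every writer follows the second rule (older Git simply never writes GDAT), a layer with GDAT can only sit on top of layers that also have it, so checking the layer just below is enough to keep GDAT layers from ending up above a layer without it.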

gitgitgadget · 2021-02-02T00:15:28Z

This patch series was integrated into seen via git@deae33e.

gitgitgadget · 2021-02-02T00:45:45Z

This patch series was integrated into seen via git@7862ac8.

gitgitgadget · 2021-02-03T23:15:38Z

This patch series was integrated into seen via git@cd2bfee.

gitgitgadget · 2021-02-06T01:59:59Z

This patch series was integrated into seen via git@e502eab.

gitgitgadget · 2021-02-08T23:57:37Z

This patch series was integrated into seen via git@5fb6127.

gitgitgadget · 2021-02-10T01:06:32Z

This patch series was integrated into seen via git@deae33e.

gitgitgadget · 2021-02-10T01:27:45Z

This patch series was integrated into seen via git@59d4dae.

gitgitgadget · 2021-02-10T23:39:55Z

This patch series was integrated into seen via git@7943c06.

gitgitgadget · 2021-02-11T01:30:57Z

This patch series was integrated into seen via git@cf9fbab.

gitgitgadget · 2021-02-11T23:02:06Z

This patch series was integrated into seen via git@2e3ad28.

gitgitgadget · 2021-02-12T22:57:01Z

This patch series was integrated into seen via git@0cf0dc9.

gitgitgadget · 2021-02-19T00:12:21Z

This patch series was integrated into seen via git@8b4701a.

gitgitgadget · 2021-02-19T00:12:23Z

This patch series was integrated into next via git@8b4701a.

gitgitgadget · 2021-02-19T00:12:23Z

This patch series was integrated into master via git@8b4701a.

gitgitgadget · 2021-02-19T00:12:27Z

Closed via 8b4701a.

derrickstolee self-requested a review July 14, 2020 13:28

abhishekkumar2718 force-pushed the corrected_commit_date branch from 5a3f76b to dd7a19b on July 16, 2020 16:01

abhishekkumar2718 force-pushed the corrected_commit_date branch 6 times, most recently from e151734 to 171c05e on July 20, 2020 17:03

derrickstolee reviewed Jul 21, 2020

abhishekkumar2718 force-pushed the corrected_commit_date branch 7 times, most recently from b784e44 to 647290d on July 28, 2020 07:17

abhishekkumar2718 marked this pull request as ready for review July 28, 2020 09:14

gitgitgadget bot reviewed Jul 28, 2020

revision.c (resolved)

gitgitgadget bot reviewed Jul 28, 2020

commit-graph.c (resolved)

abhishekkumar2718 added 5 commits February 1, 2021 11:48

abhishekkumar2718 force-pushed the corrected_commit_date branch from 9b40d5d to 523e2d4 on February 1, 2021 06:24

gitgitgadget bot added the master label Feb 19, 2021

gitgitgadget bot closed this Feb 19, 2021

abhishekkumar2718 commented Jul 14, 2020 (edited)

gitgitgadget bot commented Jul 14, 2020

Welcome to GitGitGadget

Contributing the patches

Need help?

gitgitgadget bot commented Jul 14, 2020

gitgitgadget bot commented Jul 14, 2020

gitgitgadget bot commented Jul 14, 2020

derrickstolee commented Jul 14, 2020

dscho commented Jul 14, 2020

gitgitgadget bot commented Jul 14, 2020

derrickstolee commented Jul 14, 2020

gitgitgadget bot commented Jul 16, 2020

derrickstolee left a comment

abhishekkumar2718 commented Jul 28, 2020

gitgitgadget bot commented Jul 28, 2020

gitgitgadget bot commented Jan 31, 2021

gitgitgadget bot commented Feb 1, 2021

abhishekkumar2718 commented Feb 1, 2021

gitgitgadget bot commented Feb 1, 2021

gitgitgadget bot commented Feb 1, 2021

dscho commented Feb 1, 2021

gitgitgadget bot commented Feb 1, 2021

gitgitgadget bot commented Feb 2, 2021

gitgitgadget bot commented Feb 2, 2021

gitgitgadget bot commented Feb 3, 2021

gitgitgadget bot commented Feb 6, 2021

gitgitgadget bot commented Feb 8, 2021

gitgitgadget bot commented Feb 10, 2021

gitgitgadget bot commented Feb 10, 2021

gitgitgadget bot commented Feb 10, 2021

gitgitgadget bot commented Feb 11, 2021

gitgitgadget bot commented Feb 11, 2021

gitgitgadget bot commented Feb 12, 2021