You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Git uses topological levels in the commit-graph file for commit-graph
traversal operations like 'git log --graph'. Unfortunately, topological
levels can perform worse than committer date when parents of a commit
differ greatly in generation numbers [1]. For example, 'git merge-base
v4.8 v4.9' on the Linux repository walks 635,579 commits using
topological levels and walks 167,468 using committer date. Since
091f4cf (commit: don't use generation numbers if not needed,
2018-08-30), 'git merge-base' uses committer date heuristic unless there
is a cutoff because of the performance hit.
[1] https://lore.kernel.org/git/efa3720fb40638e5d61c6130b55e3348d8e4339e.1535633886.git.gitgitgadget@gmail.com/
Thus, the need for generation number v2 was born. As Git used to die
when graph version understood by it and in the commit-graph file are
different [2], we needed a way to distinguish between the old and new
generation number without incrementing the graph version.
[2] https://lore.kernel.org/git/[email protected]/
The following candidates were proposed (https://github.com/derrickstolee/gen-test,
#1):
- (Epoch, Date) Pairs.
- Maximum Generation Numbers.
- Corrected Commit Date.
- FELINE Index.
- Corrected Commit Date with Monotonically Increasing Offsets.
Based on performance, local computability, and immutability (along with
the introduction of an additional commit-graph chunk which relieved the
requirement of backwards-compatibility) Corrected Commit Date was chosen
as generation number v2 and is defined as follows:
For a commit C, let its corrected commit date be the maximum of the
commit date of C and the corrected commit dates of its parents plus 1.
Then corrected commit date offset is the difference between corrected
commit date of C and commit date of C. As a special case, a root commit
with the timestamp zero has corrected commit date of 1 to distinguish it
from GENERATION_NUMBER_ZERO (that is, an uncomputed generation number).
While it was proposed initially to store corrected commit date offsets
within Commit Data Chunk, storing the offsets in a new chunk did not
affect the performance measurably. The new chunk is "Generation DATa
(GDAT) chunk" and it stores corrected commit date offsets while CDAT
chunk stores topological level. The old versions of Git would ignore
GDAT chunk, using topological levels from CDAT chunk. In contrast, new
versions of Git would use corrected commit dates, falling back to
topological level if the generation data chunk is absent in the
commit-graph file.
While storing corrected commit date offsets saves us 4 bytes per commit
(as compared with storing corrected commit dates directly), it's however
possible for the offset to overflow the space allocated. To handle such
cases, we introduce a new chunk, _Generation Data Overflow_ (GDOV) that
stores the corrected commit date. For overflowing offsets, we set MSB
and store the position into the GDOV chunk, in a mechanism similar to
the Extra Edges list chunk.
For mixed generation number environment (for example new Git on the
command line, old Git used by GUI client), we can encounter a
mixed-chain commit-graph (a commit-graph chain where some of split
commit-graph files have GDAT chunk and others do not). As backward
compatibility is one of the goals, we can define the following behavior:
While reading a mixed-chain commit-graph version, we fall back on
topological levels as corrected commit dates and topological levels
cannot be compared directly.
When adding new layer to the split commit-graph file, and when merging
some or all layers (replacing them in the latter case), the new layer
will have GDAT chunk if and only if in the final result there would be
no layer without GDAT chunk just below it.
Signed-off-by: Abhishek Kumar <[email protected]>
0 commit comments