
Commit 6defbaa

jeffhostetler authored and dscho committed
status: add status serialization mechanism
Teach STATUS to optionally serialize the results of a status computation to a file.

Teach STATUS to optionally read an existing serialization file and simply print the results, rather than actually scanning.

This is intended for immediate status results on extremely large repos and assumes the use of a service/daemon to maintain a fresh current status snapshot.

Signed-off-by: Jeff Hostetler <[email protected]>
1 parent 1c63bb2 commit 6defbaa

13 files changed, +1315 -3 lines changed

Documentation/config/status.txt

Lines changed: 6 additions & 0 deletions
@@ -70,3 +70,9 @@ status.submoduleSummary::
 	the --ignore-submodules=dirty command-line option or the 'git
 	submodule summary' command, which shows a similar output but does
 	not honor these settings.
+
+status.deserializePath::
+	EXPERIMENTAL, Pathname to a file containing cached status results
+	generated by `--serialize`. This will be overridden by
+	`--deserialize=<path>` on the command line. If the cache file is
+	invalid or stale, git will fall-back and compute status normally.

Documentation/git-status.txt

Lines changed: 33 additions & 0 deletions
@@ -152,6 +152,19 @@ ignored, then the directory is not shown, but all contents are shown.
 	update it afterwards if any changes were detected. Defaults to
 	`--lock-index`.

+--serialize[=<version>]::
+	(EXPERIMENTAL) Serialize raw status results to stdout in a
+	format suitable for use by `--deserialize`. Valid values for
+	`<version>` are "1" and "v1".
+
+--deserialize[=<path>]::
+	(EXPERIMENTAL) Deserialize raw status results from a file or
+	stdin rather than scanning the worktree. If `<path>` is omitted
+	and `status.deserializePath` is unset, input is read from stdin.
+--no-deserialize::
+	(EXPERIMENTAL) Disable implicit deserialization of status results
+	from the value of `status.deserializePath`.
+
 <pathspec>...::
 	See the 'pathspec' entry in linkgit:gitglossary[7].

@@ -397,6 +410,26 @@ quoted as explained for the configuration variable `core.quotePath`
 (see linkgit:git-config[1]).


+SERIALIZATION and DESERIALIZATION (EXPERIMENTAL)
+------------------------------------------------
+
+The `--serialize` option allows git to cache the result of a
+possibly time-consuming status scan to a binary file. A local
+service/daemon watching file system events could use this to
+periodically pre-compute a fresh status result.
+
+Interactive users could then use `--deserialize` to simply
+(and immediately) print the last-known-good result without
+waiting for the status scan.
+
+The binary serialization file format includes some worktree state
+information allowing `--deserialize` to reject the cached data
+and force a normal status scan if, for example, the commit, branch,
+or status modes/options change. The format cannot, however, indicate
+when the cached data is otherwise stale -- that coordination belongs
+to the task driving the serializations.
+
+
 CONFIGURATION
 -------------
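The rejection rule described in the SERIALIZATION section can be pictured with a small sketch. It compares a few header values recorded in a cache against the values for the current invocation and rejects the cache on any mismatch. The struct `cached_header` and the function `cache_matches` are hypothetical names chosen for illustration; which fields git really compares is decided in the deserialization code, which this excerpt does not show.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical subset of the recorded state; not git's real struct. */
struct cached_header {
	const char *branch;        /* NULL when the value is absent */
	const char *commit_hex;    /* hex OID of the commit the scan saw */
	int show_untracked_files;  /* status options in effect for the scan */
	int show_ignored_files;
};

/* Compare two possibly-NULL strings for equality. */
static int same_str(const char *a, const char *b)
{
	if (!a || !b)
		return a == b;
	return strcmp(a, b) == 0;
}

/*
 * Return 1 if the cached header matches the current state, 0 if the
 * cache should be rejected so that a normal scan runs instead.
 */
int cache_matches(const struct cached_header *cached,
		  const struct cached_header *current)
{
	return same_str(cached->branch, current->branch) &&
	       same_str(cached->commit_hex, current->commit_hex) &&
	       cached->show_untracked_files == current->show_untracked_files &&
	       cached->show_ignored_files == current->show_ignored_files;
}
```

A check like this only catches changes that are visible in the header. As the documentation notes, it cannot tell whether files changed after the snapshot was written.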

Lines changed: 107 additions & 0 deletions
@@ -0,0 +1,107 @@
+Git status serialization format
+===============================
+
+Git status serialization enables git to dump the results of a status scan
+to a binary file. This file can then be loaded by later status invocations
+to print the cached status results.
+
+The file contains the essential fields from:
+() the index
+() the "struct wt_status" for the overall results
+() the contents of "struct wt_status_change_data" for tracked changed files
+() the list of untracked and ignored files
+
+Version 1 Format:
+=================
+
+The V1 file begins with a required header section followed by optional
+sections for each type of item (changed, untracked, ignored). Individual
+item sections are only present if necessary. Each item section begins
+with an item-type header with the number of items in the section.
+
+Each "line" in the format is encoded using pkt-line with a final LF.
+Flush packets are used to terminate sections.
+
+-----------------
+PKT-LINE("version" SP "1")
+<v1-header-section>
+[<v1-changed-item-section>]
+[<v1-untracked-item-section>]
+[<v1-ignored-item-section>]
+-----------------
+
+
+V1 Header
+---------
+
+The v1-header-section fields are taken directly from "struct wt_status".
+Each field is printed on a separate pkt-line. Lines for NULL string
+values are omitted. All integers are printed with "%d". OIDs are
+printed in hex.
+
+    v1-header-section    = <v1-index-headers>
+                           <v1-wt-status-headers>
+                           PKT-LINE(<flush>)
+
+    v1-index-headers     = PKT-LINE("index_mtime" SP <sec> SP <nsec> LF)
+
+    v1-wt-status-headers = PKT-LINE("is_initial" SP <integer> LF)
+                           [ PKT-LINE("branch" SP <branch-name> LF) ]
+                           [ PKT-LINE("reference" SP <reference-name> LF) ]
+                           PKT-LINE("show_ignored_files" SP <integer> LF)
+                           PKT-LINE("show_untracked_files" SP <integer> LF)
+                           PKT-LINE("show_ignored_directory" SP <integer> LF)
+                           [ PKT-LINE("ignore_submodule_arg" SP <string> LF) ]
+                           PKT-LINE("detect_rename" SP <integer> LF)
+                           PKT-LINE("rename_score" SP <integer> LF)
+                           PKT-LINE("rename_limit" SP <integer> LF)
+                           PKT-LINE("detect_break" SP <integer> LF)
+                           PKT-LINE("sha1_commit" SP <oid> LF)
+                           PKT-LINE("committable" SP <integer> LF)
+                           PKT-LINE("workdir_dirty" SP <integer> LF)
+
+
+V1 Changed Items
+----------------
+
+The v1-changed-item-section lists all of the changed items with one
+item per pkt-line. Each pkt-line contains: a binary block of data
+from "struct wt_status_serialize_data_fixed" in a fixed header where
+integers are in network byte order and OIDs are in raw (non-hex) form.
+This is followed by one or two raw pathnames (not c-quoted) with NUL
+terminators (both NULs are always present even if there is no rename).
+
+    v1-changed-item-section = PKT-LINE("changed" SP <count> LF)
+                              [ PKT-LINE(<changed_item> LF) ]+
+                              PKT-LINE(<flush>)
+
+    changed_item = <byte[4] worktree_status>
+                   <byte[4] index_status>
+                   <byte[4] stagemask>
+                   <byte[4] score>
+                   <byte[4] mode_head>
+                   <byte[4] mode_index>
+                   <byte[4] mode_worktree>
+                   <byte[4] dirty_submodule>
+                   <byte[4] new_submodule_commits>
+                   <byte[20] oid_head>
+                   <byte[20] oid_index>
+                   <byte[*] path>
+                   NUL
+                   [ <byte[*] src_path> ]
+                   NUL
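To make the `changed_item` layout concrete, here is a minimal sketch that packs one item into a byte buffer: nine 4-byte integers in network byte order, two raw 20-byte OIDs, then the path and the optional source path, each followed by a NUL. The struct `changed_item` and the function `pack_changed_item` are hypothetical and are not the commit's `struct wt_status_serialize_data_fixed`; the trailing LF and the pkt-line length prefix are left to the framing layer.

```c
#include <arpa/inet.h>  /* htonl */
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define OID_RAWSZ 20  /* matches byte[20] in the grammar above */
#define NR_FIXED 9    /* worktree_status ... new_submodule_commits */

/* Hypothetical in-memory form of one changed item. */
struct changed_item {
	uint32_t fixed[NR_FIXED];
	unsigned char oid_head[OID_RAWSZ];
	unsigned char oid_index[OID_RAWSZ];
	const char *path;
	const char *src_path;  /* NULL when there is no rename */
};

/*
 * Pack the item into out. Returns the number of bytes written,
 * or 0 if out is too small.
 */
size_t pack_changed_item(const struct changed_item *it,
			 unsigned char *out, size_t outsz)
{
	size_t path_len = strlen(it->path);
	size_t src_len = it->src_path ? strlen(it->src_path) : 0;
	size_t need = NR_FIXED * 4 + 2 * OID_RAWSZ + path_len + 1 + src_len + 1;
	size_t pos = 0;
	int i;

	if (need > outsz)
		return 0;

	/* Fixed integers, each converted to network byte order. */
	for (i = 0; i < NR_FIXED; i++) {
		uint32_t be = htonl(it->fixed[i]);
		memcpy(out + pos, &be, 4);
		pos += 4;
	}

	/* OIDs are stored raw, not as hex. */
	memcpy(out + pos, it->oid_head, OID_RAWSZ);
	pos += OID_RAWSZ;
	memcpy(out + pos, it->oid_index, OID_RAWSZ);
	pos += OID_RAWSZ;

	/* Both NUL terminators are written even without a rename. */
	memcpy(out + pos, it->path, path_len);
	pos += path_len;
	out[pos++] = '\0';
	if (src_len)
		memcpy(out + pos, it->src_path, src_len);
	pos += src_len;
	out[pos++] = '\0';

	return pos;
}
```

For an item with the path `a.c` and no rename, the packed size is 36 + 40 + 3 + 1 + 1 = 81 bytes.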
+
+
+V1 Untracked and Ignored Items
+------------------------------
+
+These sections are simple lists of pathnames. They ARE NOT
+c-quoted.
+
+    v1-untracked-item-section = PKT-LINE("untracked" SP <count> LF)
+                                [ PKT-LINE(<pathname> LF) ]+
+                                PKT-LINE(<flush>)
+
+    v1-ignored-item-section = PKT-LINE("ignored" SP <count> LF)
+                              [ PKT-LINE(<pathname> LF) ]+
+                              PKT-LINE(<flush>)
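Every record in the format above is a pkt-line, so a short sketch of the framing may help when reading a dump by hand. A pkt-line is four lowercase hex digits giving the total length (payload plus the four length bytes) followed by the payload; a flush packet is the literal `0000`. The helper names `pkt_line_frame` and `pkt_line_flush` are made up for this illustration and are not the functions in git's pkt-line.c.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PKT_MAX 65520  /* same value as git's LARGE_PACKET_MAX */

/*
 * Frame payload as one pkt-line in out. Returns the number of bytes
 * written, or 0 if the result would not fit.
 */
size_t pkt_line_frame(const char *payload, size_t len,
		      char *out, size_t outsz)
{
	char hdr[5];
	size_t total = len + 4;

	if (total > PKT_MAX || total > outsz)
		return 0;
	snprintf(hdr, sizeof(hdr), "%04x", (unsigned int)total);
	memcpy(out, hdr, 4);
	memcpy(out + 4, payload, len);
	return total;
}

/* Write a flush packet, which terminates a section. */
size_t pkt_line_flush(char *out, size_t outsz)
{
	if (outsz < 4)
		return 0;
	memcpy(out, "0000", 4);
	return 4;
}
```

For example, framing the 10-byte payload `"version 1\n"` yields the 14 bytes `000eversion 1\n`, since 14 is `000e` in hex.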

Makefile

Lines changed: 2 additions & 0 deletions
@@ -1044,6 +1044,8 @@ LIB_OBJS += wrapper.o
 LIB_OBJS += write-or-die.o
 LIB_OBJS += ws.o
 LIB_OBJS += wt-status.o
+LIB_OBJS += wt-status-deserialize.o
+LIB_OBJS += wt-status-serialize.o
 LIB_OBJS += xdiff-interface.o
 LIB_OBJS += zlib.o

builtin/commit.c

Lines changed: 119 additions & 1 deletion
@@ -143,6 +143,67 @@ static int opt_parse_porcelain(const struct option *opt, const char *arg, int un
 	return 0;
 }

+static int do_serialize = 0;
+static int do_implicit_deserialize = 0;
+static int do_explicit_deserialize = 0;
+static char *deserialize_path = NULL;
+
+/*
+ * --serialize | --serialize=1 | --serialize=v1
+ *
+ * Request that we serialize our output rather than printing in
+ * any of the established formats. Optionally specify serialization
+ * version.
+ */
+static int opt_parse_serialize(const struct option *opt, const char *arg, int unset)
+{
+	enum wt_status_format *value = (enum wt_status_format *)opt->value;
+	if (unset || !arg)
+		*value = STATUS_FORMAT_SERIALIZE_V1;
+	else if (!strcmp(arg, "v1") || !strcmp(arg, "1"))
+		*value = STATUS_FORMAT_SERIALIZE_V1;
+	else
+		die("unsupported serialize version '%s'", arg);
+
+	if (do_explicit_deserialize)
+		die("cannot mix --serialize and --deserialize");
+	do_implicit_deserialize = 0;
+
+	do_serialize = 1;
+	return 0;
+}
+
+/*
+ * --deserialize | --deserialize=<path> |
+ * --no-deserialize
+ *
+ * Request that we deserialize status data from some existing resource
+ * rather than performing a status scan.
+ *
+ * The input source can come from stdin or a path given here -- or be
+ * inherited from the config settings.
+ */
+static int opt_parse_deserialize(const struct option *opt, const char *arg, int unset)
+{
+	if (unset) {
+		do_implicit_deserialize = 0;
+		do_explicit_deserialize = 0;
+	} else {
+		if (do_serialize)
+			die("cannot mix --serialize and --deserialize");
+		/* override config or stdin */
+		deserialize_path = xstrdup_or_null(arg);
+		if (deserialize_path && *deserialize_path
+		    && (access(deserialize_path, R_OK) != 0))
+			die("cannot find serialization file '%s'",
+			    deserialize_path);
+
+		do_explicit_deserialize = 1;
+	}
+
+	return 0;
+}
+
 static int opt_parse_m(const struct option *opt, const char *arg, int unset)
 {
 	struct strbuf *buf = opt->value;
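The argument handling in the two callbacks above boils down to two small rules, sketched here in isolation. The functions `parse_serialize_version` and `modes_conflict` are hypothetical names for this illustration; the real callbacks call `die()` instead of returning an error and also update the file-scope flags.

```c
#include <assert.h>
#include <string.h>

/*
 * Map a --serialize argument to a format version. A missing argument
 * selects version 1. Returns -1 for an unsupported value.
 */
int parse_serialize_version(const char *arg)
{
	if (!arg)
		return 1;
	if (!strcmp(arg, "v1") || !strcmp(arg, "1"))
		return 1;
	return -1;
}

/*
 * --serialize and an explicit --deserialize cannot be combined.
 * Returns nonzero when both were requested.
 */
int modes_conflict(int want_serialize, int want_explicit_deserialize)
{
	return want_serialize && want_explicit_deserialize;
}
```

Note that only an explicit `--deserialize` conflicts with `--serialize`; an implicit one coming from `status.deserializePath` is simply switched off by `opt_parse_serialize`.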
@@ -1038,6 +1099,8 @@ static void handle_untracked_files_arg(struct wt_status *s)
 		s->show_untracked_files = SHOW_NORMAL_UNTRACKED_FILES;
 	else if (!strcmp(untracked_files_arg, "all"))
 		s->show_untracked_files = SHOW_ALL_UNTRACKED_FILES;
+	else if (!strcmp(untracked_files_arg,"complete"))
+		s->show_untracked_files = SHOW_COMPLETE_UNTRACKED_FILES;
 	else
 		die(_("Invalid untracked files mode '%s'"), untracked_files_arg);
 }
@@ -1266,6 +1329,19 @@ static int git_status_config(const char *k, const char *v, void *cb)
 		s->relative_paths = git_config_bool(k, v);
 		return 0;
 	}
+	if (!strcmp(k, "status.deserializepath")) {
+		/*
+		 * Automatically assume deserialization if this is
+		 * set in the config and the file exists. Do not
+		 * complain if the file does not exist, because we
+		 * silently fall back to normal mode.
+		 */
+		if (v && *v && access(v, R_OK) == 0) {
+			do_implicit_deserialize = 1;
+			deserialize_path = xstrdup(v);
+		}
+		return 0;
+	}
 	if (!strcmp(k, "status.showuntrackedfiles")) {
 		if (!v)
 			return config_error_nonbool(k);
@@ -1308,7 +1384,8 @@ int cmd_status(int argc, const char **argv, const char *prefix)
 	static int show_ignored_directory = 0;
 	static struct wt_status s;
 	unsigned int progress_flag = 0;
-	int fd;
+	int try_deserialize;
+	int fd = -1;
 	struct object_id oid;
 	static struct option builtin_status_options[] = {
 		OPT__VERBOSE(&verbose, N_("be verbose")),
@@ -1323,6 +1400,12 @@ int cmd_status(int argc, const char **argv, const char *prefix)
 		{ OPTION_CALLBACK, 0, "porcelain", &status_format,
 		  N_("version"), N_("machine-readable output"),
 		  PARSE_OPT_OPTARG, opt_parse_porcelain },
+		{ OPTION_CALLBACK, 0, "serialize", &status_format,
+		  N_("version"), N_("serialize raw status data to stdout"),
+		  PARSE_OPT_OPTARG | PARSE_OPT_NONEG, opt_parse_serialize },
+		{ OPTION_CALLBACK, 0, "deserialize", NULL,
+		  N_("path"), N_("deserialize raw status data from file"),
+		  PARSE_OPT_OPTARG, opt_parse_deserialize },
 		OPT_SET_INT(0, "long", &status_format,
 			    N_("show status in long format (default)"),
 			    STATUS_FORMAT_LONG),
@@ -1383,10 +1466,26 @@ int cmd_status(int argc, const char **argv, const char *prefix)
 	    s.show_untracked_files == SHOW_NO_UNTRACKED_FILES)
 		die(_("Unsupported combination of ignored and untracked-files arguments"));

+	if (s.show_untracked_files == SHOW_COMPLETE_UNTRACKED_FILES &&
+	    s.show_ignored_mode == SHOW_NO_IGNORED)
+		die(_("Complete Untracked only supported with ignored files"));
+
 	parse_pathspec(&s.pathspec, 0,
 		       PATHSPEC_PREFER_FULL,
 		       prefix, argv);

+	/*
+	 * If we want to try to deserialize status data from a cache file,
+	 * we need to re-order the initialization code. The problem is that
+	 * this makes for a very nasty diff and causes merge conflicts as we
+	 * carry it forward. And it easy to mess up the merge, so we
+	 * duplicate some code here to hopefully reduce conflicts.
+	 */
+	try_deserialize = (!do_serialize &&
+			   (do_implicit_deserialize || do_explicit_deserialize));
+	if (try_deserialize)
+		goto skip_init;
+
 	enable_fscache(0);
 	if (status_format != STATUS_FORMAT_PORCELAIN &&
 	    status_format != STATUS_FORMAT_PORCELAIN_V2)
@@ -1401,6 +1500,7 @@ int cmd_status(int argc, const char **argv, const char *prefix)
 	else
 		fd = -1;

+skip_init:
 	s.is_initial = get_oid(s.reference, &oid) ? 1 : 0;
 	if (!s.is_initial)
 		hashcpy(s.sha1_commit, oid.hash);
@@ -1417,6 +1517,24 @@ int cmd_status(int argc, const char **argv, const char *prefix)
 		s.rename_score = parse_rename_score(&rename_score_arg);
 	}

+	if (try_deserialize) {
+		if (s.relative_paths)
+			s.prefix = prefix;
+
+		if (wt_status_deserialize(&s, deserialize_path) == DESERIALIZE_OK)
+			return 0;
+
+		/* deserialize failed, so force the initialization we skipped above. */
+		enable_fscache(1);
+		read_cache_preload(&s.pathspec);
+		refresh_index(&the_index, REFRESH_QUIET|REFRESH_UNMERGED, &s.pathspec, NULL, NULL);
+
+		if (use_optional_locks())
+			fd = hold_locked_index(&index_lock, 0);
+		else
+			fd = -1;
+	}
+
 	wt_status_collect(&s);

 	if (0 <= fd)
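The control flow that `cmd_status()` gains here is a "try the cache, otherwise scan" pattern. The sketch below shows that pattern on its own, with function pointers standing in for the cache loader and the full scan. All names (`status_with_cache`, `load_fn`, `scan_fn`, and the stub functions) are hypothetical; the real code uses `goto skip_init` and repeats the index initialization before falling through to `wt_status_collect()`.

```c
#include <assert.h>
#include <stddef.h>

enum load_result { LOAD_OK, LOAD_REJECTED };

/* Hypothetical stand-ins for loading a cache file and for a full scan. */
typedef enum load_result (*load_fn)(const char *path);
typedef int (*scan_fn)(void);

/*
 * Serve the result from the cache when that was requested and the
 * cache is accepted; otherwise run the scan. Returns 0 when the
 * cache was used, or the scan's return value.
 */
int status_with_cache(int try_cache, const char *path,
		      load_fn load, scan_fn scan)
{
	if (try_cache && load(path) == LOAD_OK)
		return 0;
	return scan();
}

/* Stubs used only to exercise the sketch. */
enum load_result load_always_ok(const char *path)
{
	(void)path;
	return LOAD_OK;
}

enum load_result load_always_rejected(const char *path)
{
	(void)path;
	return LOAD_REJECTED;
}

int scan_stub(void)
{
	return 1;  /* distinguishes "scanned" from "served from cache" */
}
```

The point of the pattern is that a rejected or unreadable cache never produces an error for the user; it only costs the time of the normal scan.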

pkt-line.c

Lines changed: 1 addition & 1 deletion
@@ -185,7 +185,7 @@ int packet_write_fmt_gently(int fd, const char *fmt, ...)
 	return status;
 }

-static int packet_write_gently(const int fd_out, const char *buf, size_t size)
+int packet_write_gently(const int fd_out, const char *buf, size_t size)
 {
 	static char packet_write_buffer[LARGE_PACKET_MAX];
 	size_t packet_size;

pkt-line.h

Lines changed: 1 addition & 0 deletions
@@ -30,6 +30,7 @@ void packet_buf_write(struct strbuf *buf, const char *fmt, ...) __attribute__((f
 void packet_buf_write_len(struct strbuf *buf, const char *data, size_t len);
 int packet_flush_gently(int fd);
 int packet_write_fmt_gently(int fd, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
+int packet_write_gently(const int fd_out, const char *buf, size_t size);
 int write_packetized_from_fd(int fd_in, int fd_out);
 int write_packetized_from_buf(const char *src_in, size_t len, int fd_out);
