Commit 4135e3a

Merge pull request #293: Merge in current ds/maintenance-part-3
This includes all changes from #292, but then also `ds/maintenance-part-3` from upstream. This is _not_ the final maintenance builtin, but is very close. It's time to start making full updates in Scalar that depend on them.
2 parents 606ed55 + 1bbf20e commit 4135e3a

22 files changed

Lines changed: 1307 additions & 28 deletions

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -67,6 +67,7 @@
 /git-filter-branch
 /git-fmt-merge-msg
 /git-for-each-ref
+/git-for-each-repo
 /git-format-patch
 /git-fsck
 /git-fsck-objects

Documentation/config/core.txt

Lines changed: 2 additions & 2 deletions
@@ -627,8 +627,8 @@ core.useReplaceRefs::
 
 core.multiPackIndex::
 Use the multi-pack-index file to track multiple packfiles using a
-single index. See link:technical/multi-pack-index.html[the
-multi-pack-index design document].
+single index. See linkgit:git-multi-pack-index[1] for more
+information. Defaults to true.
 
 core.gvfs::
 Enable the features needed for GVFS. This value can be set to true

Documentation/config/maintenance.txt

Lines changed: 28 additions & 0 deletions
@@ -1,10 +1,20 @@
+maintenance.auto::
+This boolean config option controls whether some commands run
+`git maintenance run --auto` after doing their normal work. Defaults
+to true.
+
 maintenance.<task>.enabled::
 This boolean config option controls whether the maintenance task
 with name `<task>` is run when no `--task` option is specified to
 `git maintenance run`. These config values are ignored if a
 `--task` option exists. By default, only `maintenance.gc.enabled`
 is true.
 
+maintenance.<task>.schedule::
+This config option controls whether or not the given `<task>` runs
+during a `git maintenance run --schedule=<frequency>` command. The
+value must be one of "hourly", "daily", or "weekly".
+
 maintenance.commit-graph.auto::
 This integer config option controls how often the `commit-graph` task
 should be run as part of `git maintenance run --auto`. If zero, then
@@ -14,3 +24,21 @@ maintenance.commit-graph.auto::
 reachable commits that are not in the commit-graph file is at least
 the value of `maintenance.commit-graph.auto`. The default value is
 100.
+
+maintenance.loose-objects.auto::
+This integer config option controls how often the `loose-objects` task
+should be run as part of `git maintenance run --auto`. If zero, then
+the `loose-objects` task will not run with the `--auto` option. A
+negative value will force the task to run every time. Otherwise, a
+positive value implies the command should run when the number of
+loose objects is at least the value of `maintenance.loose-objects.auto`.
+The default value is 100.
+
+maintenance.incremental-repack.auto::
+This integer config option controls how often the `incremental-repack`
+task should be run as part of `git maintenance run --auto`. If zero,
+then the `incremental-repack` task will not run with the `--auto`
+option. A negative value will force the task to run every time.
+Otherwise, a positive value implies the command should run when the
+number of pack-files not in the multi-pack-index is at least the value
+of `maintenance.incremental-repack.auto`. The default value is 10.
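
Note on the `maintenance.<task>.auto` options: the text describes one shared rule for these settings. Zero disables the task under `--auto`, a negative value forces it to run every time, and a positive value acts as a minimum count. The following is a minimal C sketch of that rule only, not Git's actual implementation; the function name and both parameter names are hypothetical.

```c
#include <stdbool.h>

/*
 * Decide whether a task should run under `--auto`.
 *
 * threshold: stands in for a maintenance.<task>.auto value (hypothetical name)
 * count:     whatever the task measures, e.g. loose objects or pack-files
 *            not in the multi-pack-index (hypothetical name)
 */
bool auto_task_should_run(int threshold, long count)
{
	if (threshold == 0)
		return false;		/* zero: never run with --auto */
	if (threshold < 0)
		return true;		/* negative: run every time */
	return count >= threshold;	/* positive: "at least the value" */
}
```

With the documented default of 100 for `maintenance.loose-objects.auto`, this sketch would skip the task at 99 loose objects and run it at 100.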

Documentation/git-for-each-repo.txt

Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
+git-for-each-repo(1)
+====================
+
+NAME
+----
+git-for-each-repo - Run a Git command on a list of repositories
+
+
+SYNOPSIS
+--------
+[verse]
+'git for-each-repo' --config=<config> [--] <arguments>
+
+
+DESCRIPTION
+-----------
+Run a Git command on a list of repositories. The arguments after the
+known options or `--` indicator are used as the arguments for the Git
+subprocess.
+
+THIS COMMAND IS EXPERIMENTAL. THE BEHAVIOR MAY CHANGE.
+
+For example, we could run maintenance on each of a list of repositories
+stored in a `maintenance.repo` config variable using
+
+-------------
+git for-each-repo --config=maintenance.repo maintenance run
+-------------
+
+This will run `git -C <repo> maintenance run` for each value `<repo>`
+in the multi-valued config variable `maintenance.repo`.
+
+
+OPTIONS
+-------
+--config=<config>::
+Use the given config variable as a multi-valued list storing
+absolute path names. Iterate on that list of paths to run
+the given arguments.
++
+These config values are loaded from system, global, and local Git config,
+as available. If `git for-each-repo` is run in a directory that is not a
+Git repository, then only the system and global config is used.
+
+
+SUBPROCESS BEHAVIOR
+-------------------
+
+If any `git -C <repo> <arguments>` subprocess returns a non-zero exit code,
+then the `git for-each-repo` process returns that exit code without running
+more subprocesses.
+
+Each `git -C <repo> <arguments>` subprocess inherits the standard file
+descriptors `stdin`, `stdout`, and `stderr`.
+
+
+GIT
+---
+Part of the linkgit:git[1] suite
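
Note on SUBPROCESS BEHAVIOR: the documented contract is that the first non-zero exit code ends the iteration and becomes the exit code of `git for-each-repo`. A minimal, self-contained C sketch of that contract follows. It uses a plain array and a callback in place of Git's config list and subprocess machinery; `repo_command_fn`, `for_each_repo_sketch`, and `count_and_fail_on_bad` are hypothetical names introduced only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: runs one command in `repo`, returns its exit code. */
typedef int (*repo_command_fn)(const char *repo, void *data);

/*
 * Call `fn` on each of the `nr` paths in order. Stop at the first
 * non-zero exit code and return it; return 0 if every call succeeds.
 */
int for_each_repo_sketch(const char *const *repos, size_t nr,
			 repo_command_fn fn, void *data)
{
	int result = 0;
	size_t i;

	for (i = 0; !result && i < nr; i++)
		result = fn(repos[i], data);
	return result;
}

/* Example callback: counts calls and fails (code 3) on a path named "bad". */
int count_and_fail_on_bad(const char *repo, void *data)
{
	int *calls = data;

	(*calls)++;
	return strcmp(repo, "bad") == 0 ? 3 : 0;
}
```

Given the paths "a", "bad", "c" and the example callback, the sketch invokes the callback twice and returns 3; the third path is never visited.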

Documentation/git-maintenance.txt

Lines changed: 144 additions & 1 deletion
@@ -29,13 +29,53 @@ Git repository.
 SUBCOMMANDS
 -----------
 
+register::
+Initialize Git config values so any scheduled maintenance will
+start running on this repository. This adds the repository to the
+`maintenance.repo` config variable in the current user's global
+config and enables some recommended configuration values for
+`maintenance.<task>.schedule`. The tasks that are enabled are safe
+for running in the background without disrupting foreground
+processes.
++
+If your repository has no `maintenance.<task>.schedule` configuration
+values set, then Git will use a recommended default schedule that performs
+background maintenance that will not interrupt foreground commands. The
+default schedule is as follows:
++
+* `gc`: disabled.
+* `commit-graph`: hourly.
+* `prefetch`: hourly.
+* `loose-objects`: daily.
+* `incremental-repack`: daily.
++
+`git maintenance register` will also disable foreground maintenance by
+setting `maintenance.auto = false` in the current repository. This config
+setting will remain after a `git maintenance unregister` command.
+
 run::
 Run one or more maintenance tasks. If one or more `--task` options
 are specified, then those tasks are run in that order. Otherwise,
 the tasks are determined by which `maintenance.<task>.enabled`
 config options are true. By default, only `maintenance.gc.enabled`
 is true.
 
+start::
+Start running maintenance on the current repository. This performs
+the same config updates as the `register` subcommand, then updates
+the background scheduler to run `git maintenance run --scheduled`
+on an hourly basis.
+
+stop::
+Halt the background maintenance schedule. The current repository
+is not removed from the list of maintained repositories, in case
+the background maintenance is restarted later.
+
+unregister::
+Remove the current repository from background maintenance. This
+only removes the repository from the configured list. It does not
+stop the background maintenance processes from running.
+
 TASKS
 -----
 
@@ -47,6 +87,21 @@ commit-graph::
 `commit-graph-chain` file. They will be deleted by a later run based
 on the expiration delay.
 
+prefetch::
+The `prefetch` task updates the object directory with the latest
+objects from all registered remotes. For each remote, a `git fetch`
+command is run. The refmap is custom to avoid updating local or remote
+branches (those in `refs/heads` or `refs/remotes`). Instead, the
+remote refs are stored in `refs/prefetch/<remote>/`. Also, tags are
+not updated.
++
+This is done to avoid disrupting the remote-tracking branches. The end users
+expect these refs to stay unmoved unless they initiate a fetch. With prefetch
+task, however, the objects necessary to complete a later real fetch would
+already be obtained, so the real fetch would go faster. In the ideal case,
+it will just become an update to bunch of remote-tracking branches without
+any object transfer.
+
 gc::
 Clean up unnecessary files and optimize the local repository. "GC"
 stands for "garbage collection," but this task performs many
@@ -55,14 +110,58 @@ gc::
 be disruptive in some situations, as it deletes stale data. See
 linkgit:git-gc[1] for more details on garbage collection in Git.
 
+loose-objects::
+The `loose-objects` job cleans up loose objects and places them into
+pack-files. In order to prevent race conditions with concurrent Git
+commands, it follows a two-step process. First, it deletes any loose
+objects that already exist in a pack-file; concurrent Git processes
+will examine the pack-file for the object data instead of the loose
+object. Second, it creates a new pack-file (starting with "loose-")
+containing a batch of loose objects. The batch size is limited to 50
+thousand objects to prevent the job from taking too long on a
+repository with many loose objects. The `gc` task writes unreachable
+objects as loose objects to be cleaned up by a later step only if
+they are not re-added to a pack-file; for this reason it is not
+advisable to enable both the `loose-objects` and `gc` tasks at the
+same time.
+
+incremental-repack::
+The `incremental-repack` job repacks the object directory
+using the `multi-pack-index` feature. In order to prevent race
+conditions with concurrent Git commands, it follows a two-step
+process. First, it calls `git multi-pack-index expire` to delete
+pack-files unreferenced by the `multi-pack-index` file. Second, it
+calls `git multi-pack-index repack` to select several small
+pack-files and repack them into a bigger one, and then update the
+`multi-pack-index` entries that refer to the small pack-files to
+refer to the new pack-file. This prepares those small pack-files
+for deletion upon the next run of `git multi-pack-index expire`.
+The selection of the small pack-files is such that the expected
+size of the big pack-file is at least the batch size; see the
+`--batch-size` option for the `repack` subcommand in
+linkgit:git-multi-pack-index[1]. The default batch-size is zero,
+which is a special case that attempts to repack all pack-files
+into a single pack-file.
+
 OPTIONS
 -------
 --auto::
 When combined with the `run` subcommand, run maintenance tasks
 only if certain thresholds are met. For example, the `gc` task
 runs when the number of loose objects exceeds the number stored
 in the `gc.auto` config setting, or when the number of pack-files
-exceeds the `gc.autoPackLimit` config setting.
+exceeds the `gc.autoPackLimit` config setting. Not compatible with
+the `--schedule` option.
+
+--schedule::
+When combined with the `run` subcommand, run maintenance tasks
+only if certain time conditions are met, as specified by the
+`maintenance.<task>.schedule` config value for each `<task>`.
+This config value specifies a number of seconds since the last
+time that task ran, according to the `maintenance.<task>.lastRun`
+config value. The tasks that are tested are those provided by
+the `--task=<task>` option(s) or those with
+`maintenance.<task>.enabled` set to true.
 
 --quiet::
 Do not report progress or other information over `stderr`.
@@ -74,6 +173,50 @@ OPTIONS
 `maintenance.<task>.enabled` configured as `true` are considered.
 See the 'TASKS' section for the list of accepted `<task>` values.
 
+
+TROUBLESHOOTING
+---------------
+The `git maintenance` command is designed to simplify the repository
+maintenance patterns while minimizing user wait time during Git commands.
+A variety of configuration options are available to allow customizing this
+process. The default maintenance options focus on operations that complete
+quickly, even on large repositories.
+
+Users may find some cases where scheduled maintenance tasks do not run as
+frequently as intended. Each `git maintenance run` command takes a lock on
+the repository's object database, and this prevents other concurrent
+`git maintenance run` commands from running on the same repository. Without
+this safeguard, competing processes could leave the repository in an
+unpredictable state.
+
+The background maintenance schedule runs `git maintenance run` processes
+on an hourly basis. Each run executes the "hourly" tasks. At midnight,
+that process also executes the "daily" tasks. At midnight on the first day
+of the week, that process also executes the "weekly" tasks. A single
+process iterates over each registered repository, performing the scheduled
+tasks for that frequency. Depending on the number of registered
+repositories and their sizes, this process may take longer than an hour.
+In this case, multiple `git maintenance run` commands may run on the same
+repository at the same time, colliding on the object database lock. This
+results in one of the two tasks not running.
+
+If you find that some maintenance windows are taking longer than one hour
+to complete, then consider reducing the complexity of your maintenance
+tasks. For example, the `gc` task is much slower than the
+`incremental-repack` task. However, this comes at a cost of a slightly
+larger object database. Consider moving more expensive tasks to be run
+less frequently.
+
+Expert users may consider scheduling their own maintenance tasks using a
+different schedule than is available through `git maintenance start` and
+Git configuration options. These users should be aware of the object
+database lock and how concurrent `git maintenance run` commands behave.
+Further, the `git gc` command should not be combined with
+`git maintenance run` commands. `git gc` modifies the object database
+but does not take the lock in the same way as `git maintenance run`. If
+possible, use `git maintenance run --task=gc` instead of `git gc`.
+
+
 GIT
 ---
 Part of the linkgit:git[1] suite
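
Note on the TROUBLESHOOTING schedule: the text says each hourly run executes the "hourly" tasks, the midnight run adds the "daily" tasks, and the midnight run on the first day of the week adds the "weekly" tasks. A minimal C sketch of that selection follows. It is an illustration of the described behavior, not code from this commit; the enum, the function name, and the convention that `wday == 0` is the first day of the week are all assumptions.

```c
#include <stdbool.h>

/* Hypothetical encoding of a maintenance.<task>.schedule value. */
enum frequency { FREQ_HOURLY, FREQ_DAILY, FREQ_WEEKLY };

/*
 * Report whether a task with the given schedule is due in a run that
 * starts at local hour `hour` (0-23) on weekday `wday` (0-6).
 * Treating wday 0 as the first day of the week is an assumption; the
 * documentation does not name the day.
 */
bool frequency_is_due(enum frequency task_freq, int hour, int wday)
{
	switch (task_freq) {
	case FREQ_HOURLY:
		return true;			/* every hourly run */
	case FREQ_DAILY:
		return hour == 0;		/* midnight only */
	case FREQ_WEEKLY:
		return hour == 0 && wday == 0;	/* midnight, first day of week */
	}
	return false;
}
```

Under these assumptions, a run at midnight on the first day of the week would execute tasks of all three frequencies, which is the window most likely to exceed one hour and collide with the next run on the object database lock.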

Makefile

Lines changed: 2 additions & 0 deletions
@@ -695,6 +695,7 @@ TEST_BUILTINS_OBJS += test-bloom.o
 TEST_BUILTINS_OBJS += test-chmtime.o
 TEST_BUILTINS_OBJS += test-cmp.o
 TEST_BUILTINS_OBJS += test-config.o
+TEST_BUILTINS_OBJS += test-crontab.o
 TEST_BUILTINS_OBJS += test-ctype.o
 TEST_BUILTINS_OBJS += test-date.o
 TEST_BUILTINS_OBJS += test-delta.o
@@ -1096,6 +1097,7 @@ BUILTIN_OBJS += builtin/fetch-pack.o
 BUILTIN_OBJS += builtin/fetch.o
 BUILTIN_OBJS += builtin/fmt-merge-msg.o
 BUILTIN_OBJS += builtin/for-each-ref.o
+BUILTIN_OBJS += builtin/for-each-repo.o
 BUILTIN_OBJS += builtin/fsck.o
 BUILTIN_OBJS += builtin/gc.o
 BUILTIN_OBJS += builtin/get-tar-commit-id.o

builtin.h

Lines changed: 1 addition & 0 deletions
@@ -155,6 +155,7 @@ int cmd_fetch(int argc, const char **argv, const char *prefix);
 int cmd_fetch_pack(int argc, const char **argv, const char *prefix);
 int cmd_fmt_merge_msg(int argc, const char **argv, const char *prefix);
 int cmd_for_each_ref(int argc, const char **argv, const char *prefix);
+int cmd_for_each_repo(int argc, const char **argv, const char *prefix);
 int cmd_format_patch(int argc, const char **argv, const char *prefix);
 int cmd_fsck(int argc, const char **argv, const char *prefix);
 int cmd_gc(int argc, const char **argv, const char *prefix);

builtin/for-each-repo.c

Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
+#include "cache.h"
+#include "config.h"
+#include "builtin.h"
+#include "parse-options.h"
+#include "run-command.h"
+#include "string-list.h"
+
+static const char * const for_each_repo_usage[] = {
+	N_("git for-each-repo --config=<config> <command-args>"),
+	NULL
+};
+
+static int run_command_on_repo(const char *path,
+			       void *cbdata)
+{
+	int i;
+	struct child_process child = CHILD_PROCESS_INIT;
+	struct strvec *args = (struct strvec *)cbdata;
+
+	child.git_cmd = 1;
+	strvec_pushl(&child.args, "-C", path, NULL);
+
+	for (i = 0; i < args->nr; i++)
+		strvec_push(&child.args, args->v[i]);
+
+	return run_command(&child);
+}
+
+int cmd_for_each_repo(int argc, const char **argv, const char *prefix)
+{
+	static const char *config_key = NULL;
+	int i, result = 0;
+	const struct string_list *values;
+	struct strvec args = STRVEC_INIT;
+
+	const struct option options[] = {
+		OPT_STRING(0, "config", &config_key, N_("config"),
+			   N_("config key storing a list of repository paths")),
+		OPT_END()
+	};
+
+	argc = parse_options(argc, argv, prefix, options, for_each_repo_usage,
+			     PARSE_OPT_STOP_AT_NON_OPTION);
+
+	if (!config_key)
+		die(_("missing --config=<config>"));
+
+	for (i = 0; i < argc; i++)
+		strvec_push(&args, argv[i]);
+
+	values = repo_config_get_value_multi(the_repository,
+					     config_key);
+
+	for (i = 0; !result && i < values->nr; i++)
+		result = run_command_on_repo(values->items[i].string, &args);
+
+	return result;
+}
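
Note on `cmd_for_each_repo()`: the loop reads `values->nr` without checking `values`. In this version of Git, `repo_config_get_value_multi()` appears to return NULL when the config key has no values, so running the command with an unset key would likely dereference a NULL pointer. One possible guard is sketched below in self-contained C. The struct and function names (`item_sketch`, `list_sketch`, `run_on_each_value`, `count_paths`) are hypothetical stand-ins, not Git's own types, and treating an empty list as success is an assumption about the desired behavior.

```c
#include <stddef.h>

/* Minimal stand-ins for Git's string_list types, for illustration only. */
struct item_sketch { const char *string; };
struct list_sketch { struct item_sketch *items; size_t nr; };

typedef int (*path_fn)(const char *path, void *data);

/*
 * Same loop shape as cmd_for_each_repo(), but a NULL list (config key
 * with no values) is treated as "nothing to do" and returns 0.
 */
int run_on_each_value(const struct list_sketch *values, path_fn fn, void *data)
{
	int result = 0;
	size_t i;

	if (!values)
		return 0;

	for (i = 0; !result && i < values->nr; i++)
		result = fn(values->items[i].string, data);
	return result;
}

/* Example callback: counts the paths it is given and always succeeds. */
int count_paths(const char *path, void *data)
{
	(void)path;
	++*(int *)data;
	return 0;
}
```

Using `size_t` for the index also avoids the signed/unsigned comparison that `int i` against `values->nr` produces in the original loop.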
