
[RFC] Create 'core.featureAdoptionRate' setting to update config defaults #254


Closed


derrickstolee

@derrickstolee derrickstolee commented Jun 3, 2019

Here is a second run at this RFC, which aims to create a "meta" config setting that automatically turns on other settings according to a user's willingness to trade new Git behavior or new feature risk for performance benefits. The new name for the setting is "core.featureAdoptionRate" and is an integer scale from 0 to 10. There will be multiple "categories" of settings, and the intention is to allow more granular levels as necessary.

The first category is "3 or higher" which means that the user is willing to adopt features that have been tested in multiple major releases. The settings to include here are core.commitGraph=true, gc.writeCommitGraph=true, and index.version=4.

The second category is "5 or higher" which means the user is willing to adopt features that have not been out for multiple major releases. The setting included here is pack.useSparse=true.

In the future, I would add a "7 or higher" setting which means the user is willing to have a change of behavior in exchange for performance benefits. The two settings to place here are 'status.aheadBehind=false' and 'fetch.showForcedUpdates=false'. Instead of including these settings in the current series, I've submitted them independently for full review [1, 2].

Hopefully this direction is amenable to allowing "early adopters" to gain access to new performance features even if they are not necessarily reading every line of the release notes.

Thanks,
-Stolee

[1] https://public-inbox.org/git/[email protected]/

[2] https://public-inbox.org/git/[email protected]/

Cc: [email protected], [email protected]

@derrickstolee derrickstolee force-pushed the config-large/upstream branch from 28cbf45 to d4ff987 Compare June 3, 2019 20:12
@derrickstolee
Author

/submit

@gitgitgadget

gitgitgadget bot commented Jun 3, 2019

Submitted as [email protected]

@gitgitgadget

gitgitgadget bot commented Jun 3, 2019

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 6/3/2019 4:18 PM, Derrick Stolee via GitGitGadget wrote:
>  1. (Patches 1-3) Introduce a new 'core.size' config setting that takes
>     'large' as a value. 

I do want to point out that this "core.size=large" option is probably a
terrible name and could easily be replaced with something better. Please
consider alternatives that better describe the goals at hand (helping users
get performance boosts on upgrade without needing to pay close attention).

Thanks,
-Stolee

@@ -577,8 +577,9 @@ the `GIT_NOTES_REF` environment variable. See linkgit:git-notes[1].


On the Git mailing list, Jeff Hostetler wrote (reply to this):



On 6/3/2019 4:18 PM, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <[email protected]>
> 
> Several advanced config settings are highly recommended for clients
> using large repositories. Power users learn these one-by-one and
> enable them as they see fit. This could be made simpler, to allow
> more users to have access to these almost-always beneficial features
> (and more beneficial in larger repos).
> 
> Create a 'repo.size' config setting whose only accepted value is
> 'large'. When a repo.size=large is given, change the default values
> of some config settings. If the setting is given explicitly, then
> take the explicit value.
> 
> This change adds these two defaults to the repo.size=large setting:
> 
>   * core.commitGraph=true
>   * gc.writeCommitGraph=true
> 
> To centralize these config options and properly set the defaults,
> create a repo_settings that contains chars for each config variable.
> Use -1 as "unset", with 0 for false and 1 for true.
> 
> The prepare_repo_settings() method ensures that this settings
> struct has been initialized, and avoids double-scanning the config
> settings.
> 
> Signed-off-by: Derrick Stolee <[email protected]>
[...]
> diff --git a/repo-settings.c b/repo-settings.c
> new file mode 100644
> index 0000000000..6f5e18d92e
> --- /dev/null
> +++ b/repo-settings.c
> @@ -0,0 +1,44 @@
> +#include "cache.h"
> +#include "repository.h"
> +#include "config.h"
> +#include "repo-settings.h"
> +
> +
> +#define UPDATE_DEFAULT(s,v) if (s != -1) { s = v; }

We should guard this with a "do { ... } while (0)"
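
A minimal sketch of that guard, not the form the patch ends up with. Without the `do { ... } while (0)` wrapper, the bare `if` inside the macro can capture an `else` that follows the macro at the call site. The sketch also assumes the intent stated in the commit message (-1 means "unset", and an explicit value wins), so it tests `== -1` where the quoted patch tests `!= -1`. `struct demo_settings` and `apply_large_defaults()` are hypothetical names used only for illustration.

```c
/* Hypothetical stand-in for the patch's struct repo_settings. */
struct demo_settings {
	int core_commit_graph;     /* -1 = unset, 0 = false, 1 = true */
	int gc_write_commit_graph; /* same encoding */
};

/*
 * The do { ... } while (0) wrapper makes the macro expand to exactly one
 * statement, so it composes with if/else at the call site.
 * Assumption: a default may only fill in a value that is still unset (-1).
 */
#define UPDATE_DEFAULT(s, v) do { if ((s) == -1) { (s) = (v); } } while (0)

/* Apply the "large repo" defaults without clobbering explicit values. */
static void apply_large_defaults(struct demo_settings *rs, int is_large)
{
	if (is_large)
		UPDATE_DEFAULT(rs->core_commit_graph, 1);
	else
		return; /* binds to "if (is_large)", not to the macro's if */

	UPDATE_DEFAULT(rs->gc_write_commit_graph, 1);
}
```

With the unguarded macro, the `else` above would either attach to the macro's inner `if` or fail to compile, depending on whether the call site ends in a semicolon.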

> +
> +static int git_repo_config(const char *key, const char *value, void *cb)
> +{
> +	struct repo_settings *rs = (struct repo_settings *)cb;
> +
> +	if (!strcmp(key, "core.size")) {
> +		if (!strcmp(value, "large")) {
> +			UPDATE_DEFAULT(rs->core_commit_graph, 1);
> +			UPDATE_DEFAULT(rs->gc_write_commit_graph, 1);
> +		}
> +		return 0;
> +	}
> +	if (!strcmp(key, "core.commitgraph")) {
> +		rs->core_commit_graph = git_config_bool(key, value);
> +		return 0;
> +	}
> +	if (!strcmp(key, "gc.writecommitgraph")) {
> +		rs->gc_write_commit_graph = git_config_bool(key, value);
> +		return 0;
> +	}
> +
> +	return 1;
> +}
[...]

Jeff

@gitgitgadget

gitgitgadget bot commented Jun 4, 2019

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Stolee,

On Mon, 3 Jun 2019, Derrick Stolee via GitGitGadget wrote:

>  1. (Patches 1-3) Introduce a new 'core.size' config setting that takes
>     'large' as a value. This enables several config values that are
>     beneficial for large repos.

I find `core.size` a bit non-descriptive. Maybe `repository.size` instead?

Ciao,
Dscho

@gitgitgadget

gitgitgadget bot commented Jun 4, 2019

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 6/4/2019 10:43 AM, Johannes Schindelin wrote:
> Hi Stolee,
> 
> On Mon, 3 Jun 2019, Derrick Stolee via GitGitGadget wrote:
> 
>>  1. (Patches 1-3) Introduce a new 'core.size' config setting that takes
>>     'large' as a value. This enables several config values that are
>>     beneficial for large repos.
> 
> I find `core.size` a bit non-descriptive. Maybe `repository.size` instead?

Thanks for the suggestion! If the "repository." doesn't make sense as a top-
level category, then maybe "core.repositorySize" would work?

A thought I had overnight that may broaden our options would be to think of
this as a tolerance for experimental features. Maybe "core.adoptionRing" with
options for "slow" and "fast", where "slow" takes things that have been cooking
a long while (index.version=4, core.commitGraph=gc.writeCommitGraph=1) and the
"fast" option gets all of those values plus the more experimental options
(status.aheadBehind=false, fetch.showForcedUpdates=false).

Alternate names with this slow/fast idea could be:

* core.experimentTolerance={none,low,high}

* core.autoConfig={none,some,all}

Hopefully these options can trigger some creativity to decide on a good
name that an experienced Git user could understand.

Thanks,
-Stolee

@gitgitgadget

gitgitgadget bot commented Jun 5, 2019

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <[email protected]> writes:

> This patch series includes a few new config options we created to speed up
> certain critical commands in VFS for Git. On their own, they would
> contribute little value as it is hard to discover new config variables.
> Instead, I've created this RFC as a goal for probably three sequential patch
> series:
>
>  1. (Patches 1-3) Introduce a new 'core.size' config setting that takes
>     'large' as a value. This enables several config values that are
>     beneficial for large repos. We use a certain set in VFS for Git (see
>     [1]), and most of those are applicable to any repo. This 'core.size'
>     setting is intended for users to automatically receive performance
>     updates as soon as they are stable, but they must opt-in to the setting
>     and can always explicitly set their own config values. The settings to
>     include here are core.commitGraph=true, gc.writeCommitGraph=true,
>     index.version=4, pack.useSparse=true.

... and not the configuration introduced by the other two points in
this list?

"If you set this, these other configuration variables are set to
these default values" is a very valuable usability feature.  It
looks a lot more "meta" or "macro", and certainly is not a good idea
to call it as if it sits next to variables in any existing hierarchy.

I also wonder if this is something we would want to support in
general; random things that come to mind are:

 - should such a "macro" configuration be limited to boolean
   (e.g. the above core.size that takes 'large' is a boolean between
   'large' and 'not large'), or can it be an enum (e.g. choose among
   'large', 'medium' and 'small', and core.bigFileThreshold will be
   set to 1G, 512M and 128M respectively---this silly example is for
   illustration purposes only), and if so, can we express what these
   default values are for each choice without writing a lot of code?

 - if we were to have more than just this 'core.size' macro, can two
   otherwise orthogonal macros both control the same underlying
   variable, and if so, how do we express their interactions?
   "using these two at the same time is forbidden" is a perfectly
   acceptable answer for the first round until we figure out the
   desired semantics, of course.

 - perhaps we may eventually want to allow end users (via their
   ~/.gitconfig) and system administrators (via /etc/gitconfig)
   define such a macro setting (e.g. setting macro.largeRepoSetting
   sets pack.usebitmaps=true, pack.useSparse=true, etc.) *after* we
   figure out what we want to do to the other points in this list.

 - even if we do not allow end users and system administrators futz
   with custom macros, can we specify the macros we ship without
   casting them in code?

@gitgitgadget

gitgitgadget bot commented Jun 6, 2019

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 6/5/2019 4:39 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <[email protected]> writes:
> 
>> This patch series includes a few new config options we created to speed up
>> certain critical commands in VFS for Git. On their own, they would
>> contribute little value as it is hard to discover new config variables.
>> Instead, I've created this RFC as a goal for probably three sequential patch
>> series:
>>
>>  1. (Patches 1-3) Introduce a new 'core.size' config setting that takes
>>     'large' as a value. This enables several config values that are
>>     beneficial for large repos. We use a certain set in VFS for Git (see
>>     [1]), and most of those are applicable to any repo. This 'core.size'
>>     setting is intended for users to automatically receive performance
>>     updates as soon as they are stable, but they must opt-in to the setting
>>     and can always explicitly set their own config values. The settings to
>>     include here are core.commitGraph=true, gc.writeCommitGraph=true,
>>     index.version=4, pack.useSparse=true.
> 
> ... and not the configuration introduced by the other two points in
> this list?

They are added to the config setting after they are introduced. See
patches 7 (status.aheadBehind) and 11 (fetch.showForcedUpdates).

> "If you set this, these other configuration variables are set to
> these default values" is a very valuable usability feature.  It
> looks a lot more "meta" or "macro", and certainly is not a good idea
> to call it as if it sits next to variables in any existing hierarchy.
> 
> I also wonder if this is something we would want to support in
> general; random things that come to mind are:
> 
>  - should such a "macro" configuration be limited to boolean
>    (e.g. the above core.size that takes 'large' is a boolean between
>    'large' and 'not large'), or can it be an enum (e.g. choose among
>    'large', 'medium' and 'small', and core.bigFileThreshold will be
>    set to 1G, 512M and 128M respectively---this silly example is for
>    illustration purposes only), and if so, can we express what these
>    default values are for each choice without writing a lot of code?

That's a good point that we could include recommended values for
other non-boolean variables if our "meta" config setting is also
non-boolean. This fits in with the "ring" ideas discussed earlier [1].
Taking in a few ideas from your message, perhaps we create a new "meta"
category for this setting and use an integer value for "how big do I
think my repo is?" and we can apply different settings based on thresholds:

 0: no config defaults changed
 3: safe defaults (core.commitGraph, index.version=4)
 6: behavior-modifying defaults (status.aheadBehind, fetch.noShowForcedUpdates)

Using 3 and 6 here to allow for finer gradients at a later date.
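
As a rough illustration of the threshold idea (a sketch only; the struct, field, and function names are hypothetical and not taken from the patches), each tier could fill in defaults for values the user has not set, with higher tiers including the lower ones:

```c
/* Hypothetical settings struct; -1 means "not set explicitly". */
struct ring_settings {
	int core_commit_graph;
	int index_version;
	int status_ahead_behind;
};

/* Fill in a default only when the user has not set the value. */
static void set_default(int *slot, int value)
{
	if (*slot == -1)
		*slot = value;
}

/*
 * Apply defaults for an integer "meta" setting. Each tier includes the
 * tiers below it, so 6 also gets the tier-3 defaults, and in-between
 * values such as 4 or 5 behave like 3 until a finer gradient exists.
 */
static void apply_level_defaults(struct ring_settings *rs, int level)
{
	if (level >= 3) {
		set_default(&rs->core_commit_graph, 1);
		set_default(&rs->index_version, 4);
	}
	if (level >= 6)
		set_default(&rs->status_ahead_behind, 0);
}
```

Comparing with `>=` rather than `==` is what leaves room for the unused numbers between tiers.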

[1] https://public-inbox.org/git/[email protected]/T/#m8dbaedc016ce7301b9d80e5ceb6a82edfa7bafac

>  - if we were to have more than just this 'core.size' macro, can two
>    otherwise orthogonal macros both control the same underlying
>    variable, and if so, how do we express their interactions?
>    "using these two at the same time is forbidden" is a perfectly
>    acceptable answer for the first round until we figure out the
>    desired semantics, of course.

To borrow from linear algebra, I would recommend that two orthogonal
config settings have disjoint _bases_ (i.e. the set of config settings
they use are disjoint). Of course, this can be discussed in more
detail when someone suggests a second meta-config setting. Such a
second setting would need justification for why it doesn't work with
our first setting.
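
To make "disjoint bases" concrete, here is a small sketch of the check, under the assumption that each meta-config setting is described by a NULL-terminated list of the (already lowercased) config keys it touches. The function name is made up for illustration.

```c
#include <string.h>

/* Return 1 if the two NULL-terminated key lists share no entry, else 0. */
static int macros_are_disjoint(const char *const *a, const char *const *b)
{
	for (size_t i = 0; a[i]; i++)
		for (size_t j = 0; b[j]; j++)
			if (!strcmp(a[i], b[j]))
				return 0;
	return 1;
}
```

Two settings that pass this check cannot disagree about a default, so no interaction rules are needed between them.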
 
>  - perhaps we may eventually want to allow end users (via their
>    ~/.gitconfig) and system administrators (via /etc/gitconfig)
>    define such a macro setting (e.g. setting macro.largeRepoSetting
>    sets pack.usebitmaps=true, pack.useSparse=true, etc.) *after* we
>    figure out what we want to do to the other points in this list.
>
>  - even if we do not allow end users and system administrators futz
>    with custom macros, can we specify the macros we ship without
>    casting them in code?

Are you suggesting that we allow some config values to be pulled from
the repo contents? If we could identify some config options as "safe"
to include in the Git data, then a repo administrator could commit a
"/.gitconfig" file _and_ some existing config option says "look at the
config in the repo".

I see value in making some "safe" settings available in the repo, but
also see that it can be very tricky to get right. Further, I think it
is independent of the current direction. In fact, I would imagine the
meta-config setting be one of the "safe" settings that we could put in
this committed config file.

Thanks,
-Stolee
 

@gitgitgadget

gitgitgadget bot commented Jun 6, 2019

On the Git mailing list, Junio C Hamano wrote (reply to this):

Derrick Stolee <[email protected]> writes:

>>  - perhaps we may eventually want to allow end users (via their
>>    ~/.gitconfig) and system administrators (via /etc/gitconfig)
>>    define such a macro setting (e.g. setting macro.largeRepoSetting
>>    sets pack.usebitmaps=true, pack.useSparse=true, etc.) *after* we
>>    figure out what we want to do to the other points in this list.
>>
>>  - even if we do not allow end users and system administrators futz
>>    with custom macros, can we specify the macros we ship without
>>    casting them in code?
>
> Are you suggesting that we allow some config values to be pulled from
> the repo contents?

Not at all.  As far as the configuration is concerned, what project
ships is tainted data that should not be used blindly.

What I had in mind is parallel to the idea of pushing "static struct
userdiff_driver builtin_drivers[]" out of the compiled-in code and
instead have a text file shipped in /usr/share/git/ somewhere.  So,
instead of having "core.size==large means these other four variables
are set to these values" in the code, we invent a general mechanism
to read such "macro" specification out of a text file, and that
would be the only code change---the specific "core.size==large
affects X, Y and Z" would not be in the code, but would be in the
text file we ship and read by the mechanism.

If the list of allowed "meta" configuration variables and the
configuration variables whose default each of them affects can be
expressed in our usual ".gitconfig" file format, then the system
administrators can add their own in /etc/gitconfig, too, to help
their users.
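
A sketch of what "not in the code, but in data" could look like. The table layout and names are hypothetical; in the idea described above the rows would be read from a shipped text file (or from /etc/gitconfig) rather than compiled in, and only the lookup would live in code.

```c
#include <string.h>

/* One implied default: when `macro` has `value`, `key` defaults to `dflt`. */
struct macro_rule {
	const char *macro;
	const char *value;
	const char *key;
	const char *dflt;
};

/* Hypothetical rows; a real mechanism would load these from a file. */
static const struct macro_rule macro_rules[] = {
	{ "core.size", "large", "core.commitgraph", "true" },
	{ "core.size", "large", "gc.writecommitgraph", "true" },
	{ "core.size", "large", "index.version", "4" },
	{ "core.size", "large", "pack.usesparse", "true" },
};

/* Return the implied default for `key`, or NULL if the macro implies none. */
static const char *macro_default(const char *macro, const char *value,
				 const char *key)
{
	size_t n = sizeof(macro_rules) / sizeof(macro_rules[0]);

	for (size_t i = 0; i < n; i++) {
		const struct macro_rule *r = &macro_rules[i];

		if (!strcmp(r->macro, macro) && !strcmp(r->value, value) &&
		    !strcmp(r->key, key))
			return r->dflt;
	}
	return NULL;
}
```

Adding a new macro, or a new implied default, then means adding rows rather than changing the lookup.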

That is what I meant by the last item.  Note that I was "wondering
if it makes sense" and what I wrote above in this message is merely
clarifying what I meant---I am not making further/more arguments to
claim it is a good idea (at least not yet).

Thanks.

@derrickstolee derrickstolee force-pushed the config-large/upstream branch from d4ff987 to 4abb634 Compare June 19, 2019 13:50
@derrickstolee derrickstolee changed the title [RFC] Create 'core.size=large' setting to update config defaults [RFC] Create 'core.featureAdoptionRate' setting to update config defaults Jun 19, 2019
@derrickstolee derrickstolee force-pushed the config-large/upstream branch from 4abb634 to 5bba906 Compare June 19, 2019 14:12
@derrickstolee
Author

/submit

@gitgitgadget

gitgitgadget bot commented Jun 19, 2019

Submitted as [email protected]

@gitgitgadget

gitgitgadget bot commented Jun 20, 2019

This branch is now known as ds/early-access.

@gitgitgadget

gitgitgadget bot commented Jun 20, 2019

This patch series was integrated into pu via git@5956c60.

@gitgitgadget gitgitgadget bot added the pu label Jun 20, 2019
@gitgitgadget

gitgitgadget bot commented Jun 21, 2019

This patch series was integrated into pu via git@824356a.

@gitgitgadget

gitgitgadget bot commented Jun 24, 2019

This patch series was integrated into pu via git@e8a49a8.

@gitgitgadget

gitgitgadget bot commented Jun 25, 2019

This patch series was integrated into pu via git@08638cf.

@gitgitgadget

gitgitgadget bot commented Jun 26, 2019

This patch series was integrated into pu via git@426fac7.

@gitgitgadget

gitgitgadget bot commented Jun 26, 2019

This patch series was integrated into pu via git@35c55df.

@gitgitgadget

gitgitgadget bot commented Jun 26, 2019

This patch series was integrated into pu via git@0f9ba5f.

@gitgitgadget

gitgitgadget bot commented Jun 27, 2019

This patch series was integrated into pu via git@76c9e34.

@gitgitgadget

gitgitgadget bot commented Jun 28, 2019

This patch series was integrated into pu via git@612dc94.

@gitgitgadget

gitgitgadget bot commented Jun 28, 2019

This patch series was integrated into pu via git@489feba.

@gitgitgadget

gitgitgadget bot commented Jun 28, 2019

This patch series was integrated into pu via git@ae81f0a.

If a repo is large, it likely has many paths in its working directory.
This means the index could be compressed using version 4. Set this as
a default when core.featureAdoptionRate is at least three.

Since the index version is written to a file, this is an excellent
opportunity to test that the config settings are working correctly
with the different precedence rules. Adapt a test from t1600-index.sh
to verify the version is set properly with different values of
index.version config, core.featureAdoptionRate, and GIT_INDEX_VERSION.

Signed-off-by: Derrick Stolee <[email protected]>
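
The precedence being tested can be sketched as a pure function. This is an illustration under stated assumptions, not the code in the patch: it assumes GIT_INDEX_VERSION outranks an explicit index.version, which in turn outranks the default implied by core.featureAdoptionRate. The function and parameter names are hypothetical, and the built-in fallback is passed in rather than hard-coded.

```c
/*
 * Pick the index version to write. Inputs use -1 for "not set".
 * Assumed precedence, highest first:
 *   1. GIT_INDEX_VERSION environment variable (env_version)
 *   2. explicit index.version config (config_version)
 *   3. core.featureAdoptionRate default (version 4 when the rate is >= 3)
 *   4. the built-in default (builtin_default)
 */
static int pick_index_version(int env_version, int config_version,
			      int adoption_rate, int builtin_default)
{
	if (env_version != -1)
		return env_version;
	if (config_version != -1)
		return config_version;
	if (adoption_rate >= 3)
		return 4;
	return builtin_default;
}
```

For example, `pick_index_version(-1, 2, 5, 3)` yields 2, because the explicit index.version beats the adoption-rate default.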

If a repo is large, then it probably has a very large working
directory. In this case, a typical developer's edits usually impact
many fewer paths than the full path set. The sparse treewalk
algorithm is optimized for this case, speeding up 'git push' calls.

Use pack.useSparse=true when core.featureAdoptionRate is at least
five. This is the first setting where the feature has only been
out for a single major version. This could be moved to the "at
least three" category after another major version.

Signed-off-by: Derrick Stolee <[email protected]>

@derrickstolee derrickstolee force-pushed the config-large/upstream branch from 5bba906 to d080065 Compare July 1, 2019 13:12
@derrickstolee
Author

/submit

@gitgitgadget

gitgitgadget bot commented Jul 1, 2019

Submitted as [email protected]

@gitgitgadget

gitgitgadget bot commented Jul 1, 2019

This patch series was integrated into pu via git@7ffae11.

@@ -577,7 +577,8 @@ the `GIT_NOTES_REF` environment variable. See linkgit:git-notes[1].


On the Git mailing list, Carlo Arenas wrote (reply to this):

On Mon, Jul 1, 2019 at 8:32 AM Derrick Stolee via GitGitGadget
<[email protected]> wrote:
>
> To centralize these config options and properly set the defaults,
> create a repo_settings that contains chars for each config variable.
> Use -1 as "unset", with 0 for false and 1 for true.

minor nitpick that hopefully Junio can fix: s/chars/ints

> +* `gc.writeCommitGraph=true` eneables writing commit-graph files during

typo: s/eneables/enable

Carlo

@@ -577,7 +577,8 @@ the `GIT_NOTES_REF` environment variable. See linkgit:git-notes[1].


On the Git mailing list, Duy Nguyen wrote (reply to this):

On Mon, Jul 1, 2019 at 10:32 PM Derrick Stolee via GitGitGadget
<[email protected]> wrote:
> @@ -601,3 +602,22 @@ core.abbrev::
>         in your repository, which hopefully is enough for
>         abbreviated object names to stay unique for some time.
>         The minimum length is 4.
> +
> +core.featureAdoptionRate::
> +       Set an integer value on a scale from 0 to 10 describing your
> +       desire to adopt new performance features. Defaults to 0. As
> +       the value increases, features are enabled by changing the
> +       default values of other config settings. If a config variable
> +       is specified explicitly, the explicit value will override these
> +       defaults:

This is because I'd like to keep core.* from growing too big (it's
already big), hard to read, search and maintain. Perhaps this should
belong to a separate group? Something like tuning.something or
defaults.something.

> +If the value is at least 3, then the following defaults are modified.
> +These represent relatively new features that have existed for multiple
> +major releases, and may present performance benefits. These benefits
> +depend on the amount and kind of data in your repo and how you use it.

Then instead of numeric values, maybe the user should write some sort
description about the repo and we optimize for that, similar to gcc
-Os optimized for size, -Ofast for compiler speed (-O<n> is all about
execution speed).

We could write, for example, tuning.commitHistory = {small, medium,
large} and tuning.worktree = {small, large, medium} and maybe
tuning.refSize and use that to optimize. We can still have different
optimization levels (probably just "none", "recommended" vs
"aggressive" where aggressive enables most new stuff).
-- 
Duy


On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Tue, Jul 02 2019, Duy Nguyen wrote:

> On Mon, Jul 1, 2019 at 10:32 PM Derrick Stolee via GitGitGadget
> <[email protected]> wrote:
>> @@ -601,3 +602,22 @@ core.abbrev::
>>         in your repository, which hopefully is enough for
>>         abbreviated object names to stay unique for some time.
>>         The minimum length is 4.
>> +
>> +core.featureAdoptionRate::
>> +       Set an integer value on a scale from 0 to 10 describing your
>> +       desire to adopt new performance features. Defaults to 0. As
>> +       the value increases, features are enabled by changing the
>> +       default values of other config settings. If a config variable
>> +       is specified explicitly, the explicit value will override these
>> +       defaults:
>
> This is because I'd like to keep core.* from growing too big (it's
> already big), hard to read, search and maintain. Perhaps this should
> belong to a separate group? Something like tuning.something or
> defaults.something.

The main thing users look at is "man git-config" (or its web rendering)
which renders it all in one page anyway.

I think in general adding more things to core.* sucks less than
explaining the special-case that "tuning.*" isn't a config for
git-tuning(1) (although we have some of that already, e.g. with
trace2.*).

Documentation/config/core.txt is ~600 lines. Maybe it would be a good
idea to split it up, similar to your split of
Documentation/config/*.txt, but let's not conflate how we'd like to
maintain stuff in git.git with a config interface we expose externally.

It's going to be very confusing for users if some settings that
otherwise would be in core aren't there because a file in git.git was
"too big" at the time. Users (mostly) aren't going to know/care in what
chronological order we added config keys.


On the Git mailing list, Jakub Narebski wrote (reply to this):

Duy Nguyen <[email protected]> writes:
> On Mon, Jul 1, 2019 at 10:32 PM Derrick Stolee via GitGitGadget
> <[email protected]> wrote:
>> @@ -601,3 +602,22 @@ core.abbrev::
>>         in your repository, which hopefully is enough for
>>         abbreviated object names to stay unique for some time.
>>         The minimum length is 4.
>> +
>> +core.featureAdoptionRate::
>> +       Set an integer value on a scale from 0 to 10 describing your
>> +       desire to adopt new performance features. Defaults to 0. As
>> +       the value increases, features are enabled by changing the
>> +       default values of other config settings. If a config variable
>> +       is specified explicitly, the explicit value will override these
>> +       defaults:
>
> This is because I'd like to keep core.* from growing too big (it's
> already big), hard to read, search and maintain. Perhaps this should
> belong to a separate group? Something like tuning.something or
> defaults.something.

I'm not sure if I consider core.* too big.  Well, there are 55 or more
entries in this namespace.

>> +If the value is at least 3, then the following defaults are modified.
>> +These represent relatively new features that have existed for multiple
>> +major releases, and may present performance benefits. These benefits
>> +depend on the amount and kind of data in your repo and how you use it.
>
> Then instead of numeric values, maybe the user should write some sort
> description about the repo and we optimize for that, similar to gcc
> -Os optimized for size, -Ofast for compiler speed (-O<n> is all about
> execution speed).

I also do not like those magic numbers.

>
> We could write, for example, tuning.commitHistory = {small, medium,
> large} and tuning.worktree = {small, large, medium} and maybe
> tuning.refSize and use that to optimize. We can still have different
> optimization levels (probably just "none", "recommended" vs
> "aggressive" where aggressive enables most new stuff).

I think we have three different things that are currently conflated in
one config variable and one value.

First is what we want to optimize for; is it on-disk repository size,
command performance / execution speed, or maybe convenient information.

Second is what type of repository we are dealing with.  Is there a
problem with long history, large number of files in checkout, large
and/or binary files, or all together?  The original `core.size=large`
(or proposed core.repositorySize) was all about this issue.  Another
issue that might be important is whether it is a leaf developer
repository or a maintainer repository, etc. (which affects, for
example, what a push looks like).

Third is what tradeoffs we are willing to accept to get required
performance.  Are we willing to use additional stable optional features;
are we willing to use new experimental optional features; are we
willing to sacrifice convenience (ahead/behind information in status,
information about forced updates in push output, etc.) for performance?
This is what the current proposal is about.

It may not need to be a separate config variable for each aspect;
it may be enough to have a value that is a space-separated list, or
something like that.
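
A sketch of how a space-separated value could be turned into independent flags. The aspect names ("long-history", "many-files", "experimental") and the function name are invented for illustration; nothing in this thread fixes a vocabulary.

```c
#include <string.h>

enum {
	ASPECT_LONG_HISTORY = 1 << 0,
	ASPECT_MANY_FILES   = 1 << 1,
	ASPECT_EXPERIMENTAL = 1 << 2
};

/* Parse "word word ..." into a bit set; unknown words are ignored. */
static unsigned parse_aspects(const char *value)
{
	static const struct {
		const char *name;
		unsigned bit;
	} known[] = {
		{ "long-history", ASPECT_LONG_HISTORY },
		{ "many-files", ASPECT_MANY_FILES },
		{ "experimental", ASPECT_EXPERIMENTAL },
	};
	unsigned flags = 0;

	while (*value) {
		size_t len;

		value += strspn(value, " ");  /* skip separators */
		len = strcspn(value, " ");    /* length of the next word */
		for (size_t i = 0; i < sizeof(known) / sizeof(known[0]); i++)
			if (strlen(known[i].name) == len &&
			    !strncmp(value, known[i].name, len))
				flags |= known[i].bit;
		value += len;
	}
	return flags;
}
```

Each flag can then drive its own group of defaults, which keeps the three aspects separate while still using a single config variable.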

Best,
--
Jakub Narębski

@gitgitgadget

gitgitgadget bot commented Jul 2, 2019

This patch series was integrated into pu via git@15e2861.

@gitgitgadget

gitgitgadget bot commented Jul 3, 2019

This patch series was integrated into pu via git@091ac6a.

@gitgitgadget

gitgitgadget bot commented Jul 3, 2019

This patch series was integrated into pu via git@1ac2906.

@gitgitgadget

gitgitgadget bot commented Jul 8, 2019

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 7/1/2019 10:29 AM, Derrick Stolee via GitGitGadget wrote:
> Here is a second run at this RFC, which aims to create a "meta" config
> setting that automatically turns on other settings according to a user's
> willingness to trade new Git behavior or new feature risk for performance
> benefits. The new name for the setting is "core.featureAdoptionRate" and is
> an integer scale from 0 to 10. There will be multiple "categories" of
> settings, and the intention is to allow more granular levels as necessary.

(Adding people who contributed feedback to CC line.)

It seems that this "Feature Adoption Rate" idea was too simplistic, and
had several issues. Time to take a different stab at this direction, but
with these clear goals in mind:

 1. We want intermediate users to be able to take advantage of new config
    options without watching every release for new config options.

 2. The config name should match the general effect of the implied
    settings.

 3. There are orthogonal settings that may not apply beneficially to
    all repos.

With this in mind, I propose instead a set of "feature.*" config settings
that form groups of "community recommended" settings (with some caveats).
In the space below, I'll list a set of possible feature names and the
implied config options.

First, the main two categories we've discussed so far: many commits and
many files. These two feature sets are for when your repo is large in
one of these dimensions. Perhaps there are other settings to include
in these?

	feature.manyFiles:
		index.version = 4
		index.threads = true
		core.untrackedCache = true

	feature.manyCommits:
		core.commitGraph = true
		gc.writeCommitGraph = true
		(future: fetch.writeSplitCommitGraph = true)

Note: the `fetch.writeSplitCommitGraph` does not exist yet, but could
be introduced in a later release to write a new commit-graph (with --split)
on fetch.
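
One way such feature groups could be applied, sketched under assumptions: the struct and function names are hypothetical, keys are assumed to arrive lowercased, and boolean parsing is reduced to comparing against "true". Recording values first and filling defaults afterward keeps the result independent of the order in which config lines are read.

```c
#include <stdlib.h>
#include <string.h>

struct feature_settings {
	int many_files;        /* feature.manyFiles seen as true */
	int many_commits;      /* feature.manyCommits seen as true */
	int index_version;     /* -1 = unset */
	int core_commit_graph; /* -1 = unset, 0 = false, 1 = true */
};

/* Phase 1: record one key/value pair as it is read. */
static void record_config(struct feature_settings *fs,
			  const char *key, const char *value)
{
	int is_true = !strcmp(value, "true"); /* simplified bool parsing */

	if (!strcmp(key, "feature.manyfiles"))
		fs->many_files = is_true;
	else if (!strcmp(key, "feature.manycommits"))
		fs->many_commits = is_true;
	else if (!strcmp(key, "index.version"))
		fs->index_version = atoi(value);
	else if (!strcmp(key, "core.commitgraph"))
		fs->core_commit_graph = is_true;
}

/* Phase 2: after all config is read, fill in whatever is still unset. */
static void finish_config(struct feature_settings *fs)
{
	if (fs->many_files && fs->index_version == -1)
		fs->index_version = 4;
	if (fs->many_commits && fs->core_commit_graph == -1)
		fs->core_commit_graph = 1;
}
```

An explicit `index.version = 2` survives `feature.manyFiles = true` regardless of which line comes first in the config file.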

The other category that has been discussed already is that of "experimental
features that we generally think are helpful but change behavior slightly in
some cases".

	feature.experimental:
		pack.useSparse = true
		status.aheadBehind = false
		fetch.showForcedUpdates = false
		merge.directoryRenames = true
		protocol.version = 2
		fetch.negotiationAlgorithm = skipping

We have not discussed anything like the next category, but Dscho thought
a set of configs to make pretty diffs could be a fun "meta-config" setting:

	feature.prettyDiff:
		diff.color = auto
		ui.color = auto
		diff.context = 5
		diff.colorMoved = true
		diff.colorMovedWs = allow-indentation-change
		diff.algorithm = minimal

These are just a first round of suggestions. I'm sure we would enjoy a
debate around an optimal set of diff settings.

Finally, here is a kind of feature that I could imagine being helpful
in the future, but maybe is not a good idea to pursue right now. In
some cases users use "gc.auto = 0" to prevent all user-time blocking
maintenance. This can degrade performance over time as loose objects
and pack-files accumulate. The performance could mostly be recovered
by using a multi-pack-index, but there is no current way to automatically
write the file. This would not solve the space issues that happen here.

	feature.noGC:
		gc.auto = 0
		core.multiPackIndex = true
		(future: fetch.writeMultiPackIndex = true)

What do people think about this general idea? Are there any other
feature.* settings that could be useful? Any additional settings
to add to these groups?

Thanks,
-Stolee

@gitgitgadget

gitgitgadget bot commented Jul 9, 2019

On the Git mailing list, Taylor Blau wrote (reply to this):

Hi Derrick,

I'm a little bit late to the party, but I think that this is a really
interesting feature with a lot of really interesting discussion so far.

I hope you don't mind me throwing in my $.02 as well :-).

On Mon, Jul 08, 2019 at 03:22:49PM -0400, Derrick Stolee wrote:
> On 7/1/2019 10:29 AM, Derrick Stolee via GitGitGadget wrote:
> > Here is a second run at this RFC, which aims to create a "meta" config
> > setting that automatically turns on other settings according to a user's
> > willingness to trade new Git behavior or new feature risk for performance
> > benefits. The new name for the setting is "core.featureAdoptionRate" and is
> > an integer scale from 0 to 10. There will be multiple "categories" of
> > settings, and the intention is to allow more granular levels as necessary.
>
> (Adding people who contributed feedback to CC line.)
>
> It seems that this "Feature Adoption Rate" idea was too simplistic, and
> had several issues. Time to take a different stab at this direction, but
> with these clear goals in mind:
>
>  1. We want intermediate users to be able to take advantage of new config
>     options without watching every release for new config options.
>
>  2. The config name should match the general effect of the implied
>     settings.
>
>  3. There are orthogonal settings that may not apply beneficially to
>     all repos.

I think that this is a clear representation of the initial reaction I
had to the 'core.featureAdoptionRate' idea. I had drafted a response to
advance these concerns before realizing that this subsequent RFC
existed, which does a nice job highlighting the concerns that I had.

> With this in mind, I propose instead a set of "feature.*" config settings
> that form groups of "community recommended" settings (with some caveats).
> In the space below, I'll list a set of possible feature names and the
> implied config options.

I think that 'feature.*' configuration settings are a good idea. They
address each of the three concerns above:

  1. They can be easily adopted by even novice-level users. Perhaps
     novice users will not be setting 'feature.manyFiles = 1', but they
     can easily opt-in to organization-level features that have been
     defined to handle organization-specific concerns.

  2. This one is straightforward: I think that setting
     'feature.manyFiles = 1' is clearer than 'core.featureAdoptionRate = 3'.

  3. Right. Windows developers may be interested in adopting a different
     set of features than, say, everyday users, and likewise for
     kernel developers.

> First, the main two categories we've discussed so far: many commits and
> many files. These two feature sets are for when your repo is large in
> one of these dimensions. Perhaps there are other settings to include
> in these?
>
> 	feature.manyFiles:
> 		index.version = 4
> 		index.threads = true
> 		core.untrackedCache = true
>
> 	feature.manyCommits:
> 		core.commitGraph = true
> 		gc.writeCommitGraph = true
> 		(future: fetch.writeSplitCommitGraph = true)

I think that for this *feature* (pun mostly unintended) to really shine,
we ought to adopt Junio's suggestion in [1] that we allow users to:

  * use pre-baked features that are defined within and shipped with
    Git itself.

  * define their own features and second-order features that can
    reference both pre-baked and user-defined feature groups.

I think that this will let, say, folks at Microsoft define a set of
features that are interesting to Windows developers, that are separate
from the features that core Git thinks will be interesting to every-day
users.
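
As a rough illustration of that two-layer idea, the Python sketch below resolves a user-defined group that references pre-baked ones. Everything here is hypothetical: the `include`/`set` layout, the `resolve` helper, and the group name `bigMonorepo` are invented for the example and are not part of any proposed syntax.

```python
# Hypothetical model: pre-baked groups plus a user-defined, second-order group.
GROUPS = {
    "manyFiles": {
        "set": {
            "index.version": "4",
            "index.threads": "true",
            "core.untrackedCache": "true",
        }
    },
    "manyCommits": {
        "set": {"core.commitGraph": "true", "gc.writeCommitGraph": "true"}
    },
    # User-defined group (made-up name) that builds on the two above.
    "bigMonorepo": {
        "include": ["manyFiles", "manyCommits"],
        "set": {"pack.useSparse": "true"},
    },
}


def resolve(name, groups, stack=()):
    """Flatten a group into plain settings, following includes depth-first."""
    if name in stack:
        raise ValueError("cycle in feature groups: " + " -> ".join(stack + (name,)))
    group = groups[name]
    settings = {}
    for included in group.get("include", []):
        settings.update(resolve(included, groups, stack + (name,)))
    # A group's own settings are applied after its includes, so they win.
    settings.update(group.get("set", {}))
    return settings


resolved = resolve("bigMonorepo", GROUPS)
print(sorted(resolved))
```

The cycle check matters once users can define groups in terms of other groups, since nothing else would stop a group from including itself indirectly.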

>
> <snip>
>
> Thanks,
> -Stolee

Thanks,
Taylor

[1]: https://public-inbox.org/git/[email protected]/

@gitgitgadget

gitgitgadget bot commented Jul 9, 2019

On the Git mailing list, Junio C Hamano wrote (reply to this):

Derrick Stolee <[email protected]> writes:

> The other category that has been discussed already is that of "experimental
> features that we generally think are helpful but change behavior slightly in
> some cases".
>
> 	feature.experimental:
> 		pack.useSparse = true
> 		status.aheadBehind = false
> 		fetch.showForcedUpdates = false
> 		merge.directoryRenames = true
> 		protocol.version = 2
> 		fetch.negotiationAlgorithm = skipping

Other classes you listed I can easily support, but I have trouble
deciding if this concept itself is bad, or merely that some/many of
the sample knobs you listed above are not exactly appropriate.
Either way, I have a hard time swallowing this one as-is.  You may
think aheadBehind==false is helpful, but I don't, for example, and
there may be people for and against each of the experimental knobs.

But there may be a clear set of "this is agreed to be the way to the
future, but the implementation currently is too convoluted and
suspected of bugs, so we'll let early adopters opt into the feature,
and when that happens, eventually this knob will go away (i.e. you
won't be able to turn it off)" type of knobs.  Or it may change the
behaviour drastically, but as long as it is agreed that the future
lies in that direction, I think it is OK to throw such a knob into
this class.  The key points are (1) we are committed that in the
future everybody will be forced to have it and (2) it is not merely
"we generally think", but "the decision about the future has been
made---there won't be any other way".  The feature.experimental
becomes merely a way to let early adopters in.  If you limit the
individual features governed by feature.experimental to that kind of
knobs, I can be easily convinced that this class is a good idea.

@gitgitgadget

gitgitgadget bot commented Jul 9, 2019

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 7/9/2019 3:21 PM, Junio C Hamano wrote:
> Derrick Stolee <[email protected]> writes:
> 
>> The other category that has been discussed already is that of "experimental
>> features that we generally think are helpful but change behavior slightly in
>> some cases".
>>
>> 	feature.experimental:
>> 		pack.useSparse = true
>> 		status.aheadBehind = false
>> 		fetch.showForcedUpdates = false
>> 		merge.directoryRenames = true
>> 		protocol.version = 2
>> 		fetch.negotiationAlgorithm = skipping
> 
> Other classes you listed I can easily support, but I have trouble
> deciding if this concept itself is bad, or merely that some/many of
> the sample knobs you listed above are not exactly appropriate.
> Either way, I have a hard time swallowing this one as-is.  You may
> think aheadBehind==false is helpful, but I don't, for example, and
> there may be people for and against each of the experimental knobs.

Thanks for the specific note about aheadBehind. I'll drop that one
from consideration.

I suppose that fetch.showForcedUpdates is in the same category, and
it has a self-discovery mechanism (a warning message) for users who
feel the pain of checking for forced updates (i.e. it takes >10s).

> But there may be a clear set of "this is agreed to be the way to the
> future, but the implementation currently is too convoluted and
> suspected of bugs, so we'll let early adopters opt into the feature,
> and when that happens, eventually this knob will go away (i.e. you
> won't be able to turn it off)" type of knobs.  Or it may change the
> behaviour drastically, but as long as it is agreed that the future
> lies in that direction, I think it is OK to throw such a knob into
> this class.  The key points are (1) we are committed that in the
> future everybody will be forced to have it and (2) it is not merely
> "we generally think", but "the decision about the future has been
> made---there won't be any other way".  The feature.experimental
> becomes merely a way to let early adopters in.  If you limit the
> individual features governed by feature.experimental to that kind of
> knobs, I can be easily convinced that this class is a good idea.

From this list, do you think any of these settings are likely to
become defaults? It seems that protocol.version = 2 may be a default
now that _most_ services have an implementation, and it always falls
back to protocol v1 without extra cost.

When pack.useSparse was first introduced, I considered making it true
by default after a while. But you protested, saying you want people
knocking at the door saying it is useful. What if it lived here?

fetch.negotiationAlgorithm and merge.directoryRenames seem like
valuable features and maybe just need more time out in the world
before they could be considered defaults.

I appreciate all of the feedback, and to drive the discussion forward
I'm trying to tease out very specific opinions.

Thanks,
-Stolee

@gitgitgadget

gitgitgadget bot commented Jul 9, 2019

This patch series was integrated into pu via git@f852a7a.

@gitgitgadget

gitgitgadget bot commented Jul 9, 2019

On the Git mailing list, Junio C Hamano wrote (reply to this):

Derrick Stolee <[email protected]> writes:

> From this list, do you think any of these settings are likely to
> become defaults? It seems that protocol.version = 2 may be a default
> now that _most_ services have an implementation, and it always falls
> back to protocol v1 without extra cost.
>
> When pack.useSparse was first introduced, I considered making it true
> by default after a while. But you protested, saying you want people
> knocking at the door saying it is useful. What if it lived here?
>
> fetch.negotiationAlgorithm and merge.directoryRenames seem like
> valuable features and maybe just need more time out in the world
> before they could be considered defaults.

I mostly agree with the categorization you gave above.

I think it is perfectly fine for a knob, after proving its worth by
existing in the world without being a part of any feature.* set, to
become part of feature.experimental, and then later be ejected
without ever becoming the default in response to reactions by real
world users.  This would be easier to arrange if we had at least two
experiment levels.  One class would be "we are firmly committed to
make these default in the future and ironing kinks out---please help
by setting feature.experimental on" and is more for early adopter
testing.  The other class may be "we try this on users to see if
there are some populations of them with usage patterns we did not
anticipate, and will yank it out if it turns out to be problematic
to some users."  The more guinea pig users opt into the latter
"Highly Experimental" category, the more help they can give us to
prevent an ill-thought-out feature that does not universally help to
become a new default.

@gitgitgadget

gitgitgadget bot commented Jul 10, 2019

This patch series was integrated into pu via git@9424e2b.

@gitgitgadget

gitgitgadget bot commented Jul 11, 2019

On the Git mailing list, Jakub Narebski wrote (reply to this):

Derrick Stolee <[email protected]> writes:
> On 7/1/2019 10:29 AM, Derrick Stolee via GitGitGadget wrote:
>>
>> Here is a second run at this RFC, which aims to create a "meta" config
>> setting that automatically turns on other settings according to a user's
>> willingness to trade new Git behavior or new feature risk for performance
>> benefits. The new name for the setting is "core.featureAdoptionRate" and is
>> an integer scale from 0 to 10. There will be multiple "categories" of
>> settings, and the intention is to allow more granular levels as necessary.
>
> (Adding people who contributed feedback to CC line.)
>
> It seems that this "Feature Adoption Rate" idea was too simplistic, and
> had several issues. Time to take a different stab at this direction, but
> with these clear goals in mind:
>
>  1. We want intermediate users to be able to take advantage of new config
>     options without watching every release for new config options.
>
>  2. The config name should match the general effect of the implied
>     settings.
>
>  3. There are orthogonal settings that may not apply beneficially to
>     all repos.
>
> With this in mind, I propose instead a set of "feature.*" config settings
> that form groups of "community recommended" settings (with some caveats).
> In the space below, I'll list a set of possible feature names and the
> implied config options.

A bit of bikeshed painting: I am unsure if "feature.*" is the best name
for this category of config (meta)settings.  Perhaps "defaults.*" or
"presets.*" would be a better name -- they would certainly be more
indicative of what setting this config variable actually *does*.

> First, the main two categories we've discussed so far: many commits and
> many files. These two feature sets are for when your repo is large in
> one of these dimensions. Perhaps there are other settings to include
> in these?
>
> 	feature.manyFiles:
> 		index.version = 4
> 		index.threads = true
> 		core.untrackedCache = true
>
> 	feature.manyCommits:
> 		core.commitGraph = true
> 		gc.writeCommitGraph = true
> 		(future: fetch.writeSplitCommitGraph = true)
>
> Note: the `fetch.writeSplitCommitGraph` does not exist yet, but could
> be introduced in a later release to write a new commit-graph (with --split)
> on fetch.

That looks really nice (for a built-in set of defaults).

It would be good if the above format, or something like it, could be
used as a source of truth for this feature.

> The other category that has been discussed already is that of "experimental
> features that we generally think are helpful but change behavior slightly in
> some cases".
>
> 	feature.experimental:
> 		pack.useSparse = true
> 		status.aheadBehind = false
> 		fetch.showForcedUpdates = false
> 		merge.directoryRenames = true
> 		protocol.version = 2
> 		fetch.negotiationAlgorithm = skipping

Well... turning off status.aheadBehind and fetch.showForcedUpdates by
default makes sense only if the repository is also large. Otherwise it
is not useful, and may even be a bad thing.

Both of status.aheadBehind and fetch.showForcedUpdates are discoverable;
as far as I remember Git would show a hint about those config options
when the related task takes too long (either 'git status' in case of
status.aheadBehind, or 'git fetch' in the latter case).

I don't know if we have discoverability in the opposite direction: do we
show some advice (which as all advice can be turned off in the config)
if either of status.aheadBehind or fetch.showForcedUpdates is false?

> We have not discussed anything like the next category, but Dscho thought
> a set of configs to make pretty diffs could be a fun "meta-config" setting:
>
> 	feature.prettyDiff:
> 		diff.color = auto
> 		ui.color = auto
> 		diff.context = 5
> 		diff.colorMoved = true
> 		diff.colorMovedWs = allow-indentation-change
> 		diff.algorithm = minimal
>
> These are just a first round of suggestions. I'm sure we would enjoy a
> debate around an optimal set of diff settings.

Maybe Git for Windows defaults would be shipped in a similar form; though
I wonder if in this case it is better than simple system-wide settings.

> Finally, here is a kind of feature that I could imagine being helpful
> in the future, but maybe is not a good idea to pursue right now. In
> some cases users use "gc.auto = 0" to prevent all user-time blocking
> maintenance.

Don't we have a hook for that?

>              This can degrade performance over time as loose objects
> and pack-files accumulate. The performance could mostly be recovered
> by using a multi-pack-index, but there is no current way to automatically
> write the file. This would not solve the space issues that happen here.
>
> 	feature.noGC:
> 		gc.auto = 0
> 		core.multiPackIndex = true
> 		(future: fetch.writeMultiPackIndex = true)
>
> What do people think about this general idea? Are there any other
> feature.* settings that could be useful? Any additional settings
> to add to these groups?

Maybe feature.slowFilesystem / defaults.slowFilesystem?  Or maybe
feature.server?

Best regards,
-- 
Jakub Narębski

@gitgitgadget

gitgitgadget bot commented Jul 12, 2019

This patch series was integrated into pu via git@c0d71ed.

@gitgitgadget

gitgitgadget bot commented Jul 12, 2019

This patch series was integrated into pu via git@0522edb.

@gitgitgadget

gitgitgadget bot commented Jul 15, 2019

This patch series was integrated into pu via git@365cd2f.

@gitgitgadget

gitgitgadget bot commented Jul 15, 2019

This patch series was integrated into pu via git@6ec1ead.

@gitgitgadget

gitgitgadget bot commented Jul 18, 2019

This patch series was integrated into pu via git@e2c26bc.

@gitgitgadget

gitgitgadget bot commented Jul 19, 2019

This patch series was integrated into pu via git@af5be84.

@gitgitgadget

gitgitgadget bot commented Jul 22, 2019

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 7/9/2019 6:05 PM, Junio C Hamano wrote:
> Derrick Stolee <[email protected]> writes:
> 
>> From this list, do you think any of these settings are likely to
>> become defaults? It seems that protocol.version = 2 may be a default
>> now that _most_ services have an implementation, and it always falls
>> back to protocol v1 without extra cost.
>>
>> When pack.useSparse was first introduced, I considered making it true
>> by default after a while. But you protested, saying you want people
>> knocking at the door saying it is useful. What if it lived here?
>>
>> fetch.negotiationAlgorithm and merge.directoryRenames seem like
>> valuable features and maybe just need more time out in the world
>> before they could be considered defaults.
> 
> I mostly agree with the categorization you gave above.
> 
> I think it is perfectly fine for a knob, after proving its worth by
> existing in the world without being a part of any feature.* set, to
> become part of feature.experimental, and then later be ejected
> without ever becoming the default in response to reactions by real
> world users.  This would be easier to arrange if we had at least two
> experiment levels.  One class would be "we are firmly committed to
> make these default in the future and ironing kinks out---please help
> by setting feature.experimental on" and is more for early adopter
> testing.  The other class may be "we try this on users to see if
> there are some populations of them with usage patterns we did not
> anticipate, and will yank it out if it turns out to be problematic
> to some users."  The more guinea pig users opt into the latter
> "Highly Experimental" category, the more help they can give us to
> prevent an ill-thought-out feature that does not universally help to
> become a new default.

How about "feature.preview" for defaults we expect to change in a later
version, while "feature.experimental" is for defaults we are not sure
about?

-Stolee
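
For illustration only, the two tiers could be modeled along these lines in Python. The assignment of knobs to tiers below is a placeholder, since the thread does not settle which knob belongs where, and `TIERS` and `implied_defaults` are hypothetical names.

```python
# Placeholder tier assignments; the thread leaves the real split undecided.
TIERS = {
    # Defaults expected to change in a later version.
    "feature.preview": {"protocol.version": "2"},
    # Defaults the project is not yet sure about.
    "feature.experimental": {
        "pack.useSparse": "true",
        "merge.directoryRenames": "true",
        "fetch.negotiationAlgorithm": "skipping",
    },
}


def implied_defaults(enabled):
    """Collect the defaults implied by the tier names the user enabled."""
    defaults = {}
    for tier in enabled:
        defaults.update(TIERS[tier])
    return defaults


preview_only = implied_defaults(["feature.preview"])
both = implied_defaults(["feature.preview", "feature.experimental"])
print(preview_only)
print(sorted(both))
```

Keeping the tiers independent means a user can opt into the committed-to defaults without also volunteering for the less certain ones.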
