-
Notifications
You must be signed in to change notification settings - Fork 140
[GSoC] Improvements to ref-filter #684
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
/preview |
Preview email sent as [email protected] |
/submit |
Submitted as [email protected] |
ref-filter.c
Outdated
@@ -102,6 +102,10 @@ static struct ref_to_worktree_map { | |||
struct worktree **worktrees; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Hariom Verma via GitGitGadget" <[email protected]> writes:
> From: Hariom Verma <[email protected]>
>
> Currently, ref-filter only supports printing email with arrow brackets.
This is the first time I heard the term "arrow bracket". Aren't
they more commonly called angle brackets?
> Let's add support for two more email options.
> - trim : print email without arrow brackets.
Why would this be useful?
> - localpart : prints the part before the @ sign
Meaning I'd get "<gitster" for me?
Building small pieces of new feature in each patch is good, and
adding tests to each step is also good. Why not do the same for
docs?
> +static struct email_option{
Missing SP.
> + enum { EO_INVALID, EO_RAW, EO_TRIM, EO_LOCALPART } option;
> +} email_option;
> +
> @@ -1040,10 +1044,26 @@ static const char *copy_email(const char *buf)
> const char *eoemail;
> if (!email)
> return xstrdup("");
> - eoemail = strchr(email, '>');
The original code prepares to see NULL from this strchr(); that is
why it checks eoemail for NULL and returns an empty string---the
data is broken (i.e. not an address in angle brackets), which this
code cannot do anything about---in the later part of the code.
> + switch (email_option.option) {
> + case EO_RAW:
> + eoemail = strchr(email, '>') + 1;
And this breaks the carefully laid out error handling by the
original code. Adding 1 to NULL is quite undefined.
> + break;
> + case EO_TRIM:
> + email++;
> + eoemail = strchr(email, '>');
> + break;
> + case EO_LOCALPART:
> + email++;
> + eoemail = strchr(email, '@');
The undocumented design here is that you want to return "hariom" for
"<[email protected]>", i.e. out of the "trimmed" e-mail, the part
before the at-sign is returned.
If the data were "<hariom>", you'd still want to return "hariom" no?
But because you do not check for NULL, you end up returning an empty
string.
I think you want to cut at the first '@' or '>', whichever comes
first.
> + break;
> + case EO_INVALID:
> + default:
Invalid and unhandled ones are silently ignored and not treated as
an error? I would have thought that at least the "default" one
would be a BUG(), as you covered all the possible values for the
enum with case arms. I wouldn't be surprised if seeing EO_INVALID
is also a BUG(), i.e. the control flow that led to the caller to
call this function with EO_INVALID in email_option.option is likely
to be broken. It's not like you return "" to protect yourself when
fed a bad data from objects---a bad value in .option can only come
here if the parser you wrote for "--format=<string>" produced a
wrong result.
> + return xstrdup("");
> + }
> +
> if (!eoemail)
> return xstrdup("");
> - return xmemdupz(email, eoemail + 1 - email);
> + return xmemdupz(email, eoemail - email);
> }
>
> static char *copy_subject(const char *buf, unsigned long len)
> @@ -1113,7 +1133,7 @@ static void grab_person(const char *who, struct atom_value *val, int deref, void
> continue;
> if (name[wholen] != 0 &&
> strcmp(name + wholen, "name") &&
> - strcmp(name + wholen, "email") &&
> + !starts_with(name + wholen, "email") &&
> !starts_with(name + wholen, "date"))
> continue;
> if (!wholine)
> @@ -1124,8 +1144,16 @@ static void grab_person(const char *who, struct atom_value *val, int deref, void
> v->s = copy_line(wholine);
> else if (!strcmp(name + wholen, "name"))
> v->s = copy_name(wholine);
> - else if (!strcmp(name + wholen, "email"))
> + else if (starts_with(name + wholen, "email")) {
> + email_option.option = EO_INVALID;
> + if (!strcmp(name + wholen, "email"))
> + email_option.option = EO_RAW;
> + if (!strcmp(name + wholen, "email:trim"))
> + email_option.option = EO_TRIM;
> + if (!strcmp(name + wholen, "email:localpart"))
> + email_option.option = EO_LOCALPART;
The ref-filter formatting language already knows many "colon plus
modifier" suffix like "refname:short" and "contents:body", but I do
not think we have ugly repetition like the above code to parse them.
Perhaps the addition for "email:<whatever>" can benefit from
studying and mimicking existing practices a bit more?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Hariom verma wrote (reply to this):
Hi,
On Tue, Jul 28, 2020 at 4:21 AM Junio C Hamano <[email protected]> wrote:
>
> "Hariom Verma via GitGitGadget" <[email protected]> writes:
>
> > From: Hariom Verma <[email protected]>
> >
> > Currently, ref-filter only supports printing email with arrow brackets.
>
> This is the first time I heard the term "arrow bracket". Aren't
> they more commonly called angle brackets?
Yeah. Sorry about that.
> > Let's add support for two more email options.
> > - trim : print email without arrow brackets.
>
> Why would this be useful?
It might be useful for using the ref-filter's logic in pretty.c
(especially for `--pretty` formats like `%an` and `%cn`)
> > - localpart : prints the part before the @ sign
>
> Meaning I'd get "<gitster" for me?
No, you'll get "gitster".
> Building small pieces of new feature in each patch is good, and
> adding tests to each step is also good. Why not do the same for
> docs?
Yeah, I agree with you. I should have focused on documentation too.
> > +static struct email_option{
>
> Missing SP.
I'll fix it.
> > + enum { EO_INVALID, EO_RAW, EO_TRIM, EO_LOCALPART } option;
> > +} email_option;
> > +
> > @@ -1040,10 +1044,26 @@ static const char *copy_email(const char *buf)
> > const char *eoemail;
> > if (!email)
> > return xstrdup("");
> > - eoemail = strchr(email, '>');
>
> The original code prepares to see NULL from this strchr(); that is
> why it checks eoemail for NULL and returns an empty string---the
> data is broken (i.e. not an address in angle brackets), which this
> code cannot do anything about---in the later part of the code.
I think this commit still takes care of NULL.
After the switch-case statements, code consists of:
```
if (!eoemail)
return xstrdup("");
```
Which checks eoemail for NULL. And will return empty string if address
is not in angle brackets.
Same applies for local-part too. If '@' is not present in email
address, it will still return empty string.
> > + switch (email_option.option) {
> > + case EO_RAW:
> > + eoemail = strchr(email, '>') + 1;
>
> And this breaks the carefully laid out error handling by the
> original code. Adding 1 to NULL is quite undefined.
Yeah. I'll take care of it in the next version.
> > + break;
> > + case EO_TRIM:
> > + email++;
> > + eoemail = strchr(email, '>');
> > + break;
> > + case EO_LOCALPART:
> > + email++;
> > + eoemail = strchr(email, '@');
>
> The undocumented design here is that you want to return "hariom" for
> "<[email protected]>", i.e. out of the "trimmed" e-mail, the part
> before the at-sign is returned.
>
> If the data were "<hariom>", you'd still want to return "hariom" no?
> But because you do not check for NULL, you end up returning an empty
> string.
I never heard of email address without '@' symbol. Thats why I
returned empty string.
Will fix that too.
> I think you want to cut at the first '@' or '>', whichever comes
> first.
If email data can be without '@' symbol, then I guess "yes".
> > + break;
> > + case EO_INVALID:
> > + default:
>
> Invalid and unhandled ones are silently ignored and not treated as
> an error? I would have thought that at least the "default" one
> would be a BUG(), as you covered all the possible values for the
> enum with case arms. I wouldn't be surprised if seeing EO_INVALID
> is also a BUG(), i.e. the control flow that led to the caller to
> call this function with EO_INVALID in email_option.option is likely
> to be broken. It's not like you return "" to protect yourself when
> fed a bad data from objects---a bad value in .option can only come
> here if the parser you wrote for "--format=<string>" produced a
> wrong result.
Christian <[email protected]> also suggested me the same. Will
fix this too.
BTW, on master "{author,committer,tagger}email:<xyz>" does not print any error.
> > + return xstrdup("");
> > + }
> > +
> > if (!eoemail)
> > return xstrdup("");
> > - return xmemdupz(email, eoemail + 1 - email);
> > + return xmemdupz(email, eoemail - email);
> > }
> >
> > static char *copy_subject(const char *buf, unsigned long len)
> > @@ -1113,7 +1133,7 @@ static void grab_person(const char *who, struct atom_value *val, int deref, void
> > continue;
> > if (name[wholen] != 0 &&
> > strcmp(name + wholen, "name") &&
> > - strcmp(name + wholen, "email") &&
> > + !starts_with(name + wholen, "email") &&
> > !starts_with(name + wholen, "date"))
> > continue;
> > if (!wholine)
> > @@ -1124,8 +1144,16 @@ static void grab_person(const char *who, struct atom_value *val, int deref, void
> > v->s = copy_line(wholine);
> > else if (!strcmp(name + wholen, "name"))
> > v->s = copy_name(wholine);
> > - else if (!strcmp(name + wholen, "email"))
> > + else if (starts_with(name + wholen, "email")) {
> > + email_option.option = EO_INVALID;
> > + if (!strcmp(name + wholen, "email"))
> > + email_option.option = EO_RAW;
> > + if (!strcmp(name + wholen, "email:trim"))
> > + email_option.option = EO_TRIM;
> > + if (!strcmp(name + wholen, "email:localpart"))
> > + email_option.option = EO_LOCALPART;
>
> The ref-filter formatting language already knows many "colon plus
> modifier" suffix like "refname:short" and "contents:body", but I do
> not think we have ugly repetition like the above code to parse them.
> Perhaps the addition for "email:<whatever>" can benefit from
> studying and mimicking existing practices a bit more?
>
For "email:<whatever>",
even If I parse that <whatever>. I still make comparison something like:
```
if (!modifier)
email_option.option = EO_RAW;
else if (!strcmp(modifier, "trim"))
email_option.option = EO_TRIM;
else if (!strcmp(arg, "localpart"))
email_option.option = EO_LOCALPART;
```
Thanks,
Hariom
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
Hariom verma <[email protected]> writes:
>> The ref-filter formatting language already knows many "colon plus
>> modifier" suffix like "refname:short" and "contents:body", but I do
>> not think we have ugly repetition like the above code to parse them.
>> Perhaps the addition for "email:<whatever>" can benefit from
>> studying and mimicking existing practices a bit more?
>>
>
> For "email:<whatever>",
> even If I parse that <whatever>. I still make comparison something like:
> ```
> if (!modifier)
> email_option.option = EO_RAW;
> else if (!strcmp(modifier, "trim"))
> email_option.option = EO_TRIM;
> else if (!strcmp(arg, "localpart"))
> email_option.option = EO_LOCALPART;
> ```
Somebody needs to do a comparison, but it should be done at parsing
phase when the --format is grokked, not in grab phase that is run
for each and every ref to be shown.
These patches should only be done after looking at existing
"<basicatom>:<modifiers>" like "objectname:short" etc are handled,
not before, I think.
ref-filter.c
Outdated
@@ -102,6 +102,10 @@ static struct ref_to_worktree_map { | |||
struct worktree **worktrees; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Hariom Verma via GitGitGadget" <[email protected]> writes:
> - if (!strcmp(name, "tree")) {
> + if (!strcmp(name, "tree"))
> v->s = xstrdup(oid_to_hex(get_commit_tree_oid(commit)));
> - }
> + else if (!strcmp(name, "tree:short"))
> + v->s = xstrdup(find_unique_abbrev(get_commit_tree_oid(commit), DEFAULT_ABBREV));
Again, isn't this going in totally unacceptable direction?
By the time grab_foo() helper functions are reached, the requested
format should have been parsed to atom->u.foo.option and the only
thing grab_foo() helper functions should look at are the option.
Perhaps studying how "objectname" and its ":"-modified forms are
handled before writing this series would be beneficial.
- objectname_atom_parser() is called when the parser for --format
notices "objectname:modifier"; it is responsible for setting up
atom->u.objectname.option. Note that this is done only once at
the very begining of processing
- grab_objectname() is called for each and every object ref-filter
iterates over and "objectname" and/or its modified form
(e.g. "objectname:short") is requested. Since the modifier is
already parsed, it can do a simple switch on the value in
.option.
I do not know if patches [3-5/5] follow the pattern used in [1-2/5],
but if they do, then they all need to be fixed, I think.
Thanks.
ref-filter.c
Outdated
@@ -102,6 +102,10 @@ static struct ref_to_worktree_map { | |||
struct worktree **worktrees; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Đoàn Trần Công Danh wrote (reply to this):
[this is a resent, my previous mail couldn't reach the archive]
On 2020-07-27 20:43:04+0000, Hariom Verma via GitGitGadget <[email protected]> wrote:
> From: Hariom Verma <[email protected]>
>
> Currently, ref-filter only supports printing email with arrow brackets.
>
> Let's add support for two more email options.
> - trim : print email without arrow brackets.
> - localpart : prints the part before the @ sign
>
> Mentored-by: Christian Couder <[email protected]>
> Mentored-by: Heba Waly <[email protected]>
> Signed-off-by: Hariom Verma <[email protected]>
> ---
> ref-filter.c | 36 ++++++++++++++++++++++++++++++++----
> t/t6300-for-each-ref.sh | 16 ++++++++++++++++
> 2 files changed, 48 insertions(+), 4 deletions(-)
>
> diff --git a/ref-filter.c b/ref-filter.c
> index 8447cb09be..8563088eb1 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -102,6 +102,10 @@ static struct ref_to_worktree_map {
> struct worktree **worktrees;
> } ref_to_worktree_map;
>
> +static struct email_option{
> + enum { EO_INVALID, EO_RAW, EO_TRIM, EO_LOCALPART } option;
> +} email_option;
> +
> /*
> * An atom is a valid field atom listed below, possibly prefixed with
> * a "*" to denote deref_tag().
> @@ -1040,10 +1044,26 @@ static const char *copy_email(const char *buf)
> const char *eoemail;
> if (!email)
> return xstrdup("");
> - eoemail = strchr(email, '>');
> + switch (email_option.option) {
> + case EO_RAW:
> + eoemail = strchr(email, '>') + 1;
> + break;
> + case EO_TRIM:
> + email++;
> + eoemail = strchr(email, '>');
> + break;
> + case EO_LOCALPART:
> + email++;
> + eoemail = strchr(email, '@');
> + break;
This is not correct.
RFC-822 allows @ in local part,
albeit, that localpart must be quoted:
addr-spec = local-part "@" domain
local-part = dot-atom / quoted-string / obs-local-part
quoted-string = [CFWS]
DQUOTE *([FWS] qcontent) [FWS] DQUOTE
[CFWS]
qcontent = qtext / quoted-pair
qtext = NO-WS-CTL / ; Non white space
qtext = NO-WS-CTL / ; Non white space controls
%d33 / ; The rest of the US-ASCII
%d35-91 / ; characters not including "\"
%d93-126 ; or the quote character
quoted-pair = ("\" text) / obs-qp
IOW, those below email addresses are valid email address,
and the local part is `quoted@local'
"quoted@local"@example.com
quoted\@[email protected]
Anyway, it seems like current Git strips first `"'
from `"quoted@local"@example.com'
--
Danh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
Đoàn Trần Công Danh <[email protected]> writes:
> This is not correct.
> RFC-822 allows @ in local part,
> albeit, that localpart must be quoted:
We do care about truly local e-mail addresses, without '@' anywhere
inside <braket>, simply because they are common in the result of SCM
conversion from CVS/SVN.
But I do not think we are pedantic/academic enough to have cared
about any local part that is unusual enough to require quoting; we
instead relied on the fact that we live in real world with practical
people who would avoid such an address ;-).
ced66ab
to
33f1404
Compare
Yeah I will explain it in commit message properly. |
There is an issue in commit 4595203: |
bc74985
to
feace82
Compare
/submit |
Submitted as [email protected] |
On the Git mailing list, Junio C Hamano wrote (reply to this):
|
On the Git mailing list, Hariom verma wrote (reply to this):
|
Prepares `grab_objectname()` for more generic usage. This change will allow us to reuse `grab_objectname()` for the `tree` and `parent` atoms in a following commit. Mentored-by: Christian Couder <[email protected]> Mentored-by: Heba Waly <[email protected]> Signed-off-by: Hariom Verma <[email protected]>
As we plan to use `grab_objectname()` for `tree` and `parent` atom, it's better to parameterize the error messages in the function `grab_objectname()` where "objectname" is hard coded. Mentored-by: Christian Couder <[email protected]> Mentored-by: Heba Waly <[email protected]> Signed-off-by: Hariom Verma <[email protected]>
In previous commits, we prepared some `objectname` related functions for more generic usage, so that these functions can be used for `tree` and `parent` atom. But the name of some functions and fields may mislead someone. For ex: function `objectname_atom_parser()` implies that it is for atom `objectname`. Let's rename all such functions and fields. Mentored-by: Christian Couder <[email protected]> Mentored-by: Heba Waly <[email protected]> Signed-off-by: Hariom Verma <[email protected]>
Sometimes while using 'tree' atom, user might want to see abbrev hash instead of full 40 character hash. Just like 'objectname', it might be convenient for users to have the `:short` and `:short=<length>` option for printing 'tree' hash. Let's introduce `short` option to 'tree' atom. Mentored-by: Christian Couder <[email protected]> Mentored-by: Heba Waly <[email protected]> Signed-off-by: Hariom Verma <[email protected]>
Sometimes while using 'parent' atom, user might want to see abbrev hash instead of full 40 character hash. Just like 'objectname', it might be convenient for users to have the `:short` and `:short=<length>` option for printing 'parent' hash. Let's introduce `short` option to 'parent' atom. Mentored-by: Christian Couder <[email protected]> Mentored-by: Heba Waly <[email protected]> Signed-off-by: Hariom Verma <[email protected]>
The function 'format_sanitized_subject()' is responsible for sanitized subject line in pretty.c e.g. the subject line the-sanitized-subject-line It would be a nice enhancement to `subject` atom to have the same feature. So in the later commits, we plan to add this feature to ref-filter. Refactor `format_sanitized_subject()`, so it can be reused in ref-filter.c for adding new modifier `sanitize` to "subject" atom. Currently, the loop inside `format_sanitized_subject()` runs until `\n` is found. But now, we stored the first occurrence of `\n` in a variable `eol` and passed it in `format_sanitized_subject()`. And the loop runs upto `eol`. Mentored-by: Christian Couder <[email protected]> Mentored-by: Heba Waly <[email protected]> Signed-off-by: Hariom Verma <[email protected]>
Currently, subject does not take any arguments. This commit introduce `sanitize` formatting option to 'subject' atom. `subject:sanitize` - print sanitized subject line, suitable for a filename. e.g. %(subject): "the subject line" %(subject:sanitize): "the-subject-line" Mentored-by: Christian Couder <[email protected]> Mentored-by: Heba Waly <[email protected]> Signed-off-by: Hariom Verma <[email protected]>
This patch series was integrated into seen via git@b055fe9. |
/submit |
Submitted as [email protected] To fetch this version into
To fetch this version to local tag
|
This patch series was integrated into seen via git@a545a18. |
This patch series was integrated into seen via git@1939e23. |
This patch series was integrated into seen via git@a759e52. |
This patch series was integrated into seen via git@a22e343. |
This patch series was integrated into next via git@c015fa6. |
This patch series was integrated into seen via git@c758384. |
This patch series was integrated into seen via git@2611854. |
This patch series was integrated into seen via git@4d58a7c. |
This patch series was integrated into next via git@9a8bb84. |
This patch series was integrated into seen via git@313cff1. |
This patch series was integrated into seen via git@53a8aa4. |
This patch series was integrated into seen via git@c25fba9. |
This patch series was integrated into next via git@c25fba9. |
This patch series was integrated into master via git@c25fba9. |
Closed via c25fba9. |
This is the first patch series that introduces some improvements and features to file ref-filter.{c,h}. These changes are useful to
ref-filter
, but in near future also will allow us to use ref-filter's logic in pretty.cI plan to add more to format-support.{c,h} in the upcoming patch series. That will lead to more improved and feature-rich ref-filter.c