Skip to content

[GSoC] Improvements to ref-filter #684

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 8 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 9 additions & 1 deletion Documentation/git-for-each-ref.txt
Original file line number Diff line number Diff line change
Expand Up @@ -222,6 +222,8 @@ worktreepath::
In addition to the above, for commit and tag objects, the header
field names (`tree`, `parent`, `object`, `type`, and `tag`) can
be used to specify the value in the header field.
Fields `tree` and `parent` can also be used with modifier `:short` and
`:short=<length>` just like `objectname`.

For commit and tag objects, the special `creatordate` and `creator`
fields will correspond to the appropriate date or name-email-date tuple
Expand All @@ -230,7 +232,10 @@ These are intended for working on a mix of annotated and lightweight tags.

Fields that have name-email-date tuple as its value (`author`,
`committer`, and `tagger`) can be suffixed with `name`, `email`,
and `date` to extract the named component.
and `date` to extract the named component. For email fields (`authoremail`,
`committeremail` and `taggeremail`), `:trim` can be appended to get the email
without angle brackets, and `:localpart` to get the part before the `@` symbol
out of the trimmed email.

The message in a commit or a tag object is `contents`, from which
`contents:<part>` can be used to extract various parts out of:
Expand All @@ -242,6 +247,9 @@ contents:subject::
The first paragraph of the message, which typically is a
single line, is taken as the "subject" of the commit or the
tag message.
Instead of `contents:subject`, field `subject` can also be used to
obtain same results. `:sanitize` can be appended to `subject` for
subject line suitable for filename.

contents:body::
The remainder of the commit or the tag message that follows
Expand Down
20 changes: 11 additions & 9 deletions pretty.c
Original file line number Diff line number Diff line change
Expand Up @@ -839,21 +839,22 @@ static int istitlechar(char c)
(c >= '0' && c <= '9') || c == '.' || c == '_';
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Hariom Verma via GitGitGadget" <[email protected]> writes:

> -static void format_sanitized_subject(struct strbuf *sb, const char *msg)
> +static void format_sanitized_subject(struct strbuf *sb, const char *msg, size_t len)
>  {
> +	char *r = xmemdupz(msg, len);
>  	size_t trimlen;
>  	size_t start_len = sb->len;
>  	int space = 2;
> +	int i;
>  
> -	for (; *msg && *msg != '\n'; msg++) {
> -		if (istitlechar(*msg)) {
> +	for (i = 0; i < len; i++) {
> +		if (r[i] == '\n')
> +			r[i] = ' ';

Copying the whole string only for this one looks very wasteful.
Can't you do

	for (i = 0; i < len; i++) {
		char r = msg[i];
		if (isspace(r))
			r = ' ';
		if (istitlechar(r)) {
			...
	}

or something like that instead?  

> +		if (istitlechar(r[i])) {
>  			if (space == 1)
>  				strbuf_addch(sb, '-');
>  			space = 0;
> -			strbuf_addch(sb, *msg);
> -			if (*msg == '.')
> -				while (*(msg+1) == '.')
> -					msg++;
> +			strbuf_addch(sb, r[i]);
> +			if (r[i] == '.')
> +				while (r[i+1] == '.')
> +					i++;
>  		} else
>  			space |= 1;
>  	}
> +	free(r);

Also, because neither LF or SP is a titlechar(), wouldn't the "if
r[i] is LF, replace it with SP" a no-op wrt what will be in sb at
the end?


>  	case 'f':	/* sanitized subject */
> -		format_sanitized_subject(sb, msg + c->subject_off);
> +		eol = strchrnul(msg + c->subject_off, '\n');
> +		format_sanitized_subject(sb, msg + c->subject_off, eol - (msg + c->subject_off));

This original caller expected the helper to stop reading at the end
of the first line, but the updated helper needs to be told where to
stop, so we do so with some extra computation.  Makes sense.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Hariom verma wrote (reply to this):

Hi Junio,

On Tue, Aug 18, 2020 at 12:59 AM Junio C Hamano <[email protected]> wrote:
>
> "Hariom Verma via GitGitGadget" <[email protected]> writes:
>
> > -static void format_sanitized_subject(struct strbuf *sb, const char *msg)
> > +static void format_sanitized_subject(struct strbuf *sb, const char *msg, size_t len)
> >  {
> > +     char *r = xmemdupz(msg, len);
> >       size_t trimlen;
> >       size_t start_len = sb->len;
> >       int space = 2;
> > +     int i;
> >
> > -     for (; *msg && *msg != '\n'; msg++) {
> > -             if (istitlechar(*msg)) {
> > +     for (i = 0; i < len; i++) {
> > +             if (r[i] == '\n')
> > +                     r[i] = ' ';
>
> Copying the whole string only for this one looks very wasteful.
> Can't you do
>
>         for (i = 0; i < len; i++) {
>                 char r = msg[i];
>                 if (isspace(r))
>                         r = ' ';
>                 if (istitlechar(r)) {
>                         ...
>         }
>
> or something like that instead?

Ok, that sounds better. Noted for the next version.

> > +             if (istitlechar(r[i])) {
> >                       if (space == 1)
> >                               strbuf_addch(sb, '-');
> >                       space = 0;
> > -                     strbuf_addch(sb, *msg);
> > -                     if (*msg == '.')
> > -                             while (*(msg+1) == '.')
> > -                                     msg++;
> > +                     strbuf_addch(sb, r[i]);
> > +                     if (r[i] == '.')
> > +                             while (r[i+1] == '.')
> > +                                     i++;
> >               } else
> >                       space |= 1;
> >       }
> > +     free(r);
>
> Also, because neither LF or SP is a titlechar(), wouldn't the "if
> r[i] is LF, replace it with SP" a no-op wrt what will be in sb at
> the end?

Maybe its better to directly replace LF with hyphen? [Instead of first
replacing LF with SP and then replacing SP with '-'.]

> >       case 'f':       /* sanitized subject */
> > -             format_sanitized_subject(sb, msg + c->subject_off);
> > +             eol = strchrnul(msg + c->subject_off, '\n');
> > +             format_sanitized_subject(sb, msg + c->subject_off, eol - (msg + c->subject_off));
>
> This original caller expected the helper to stop reading at the end
> of the first line, but the updated helper needs to be told where to
> stop, so we do so with some extra computation.  Makes sense.

Yeah.

Thanks,
Hariom

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Hariom verma <[email protected]> writes:

>> Also, because neither LF or SP is a titlechar(), wouldn't the "if
>> r[i] is LF, replace it with SP" a no-op wrt what will be in sb at
>> the end?
>
> Maybe its better to directly replace LF with hyphen? [Instead of first
> replacing LF with SP and then replacing SP with '-'.]

Why do you think LF is so special?

Everything other than titlechar() including HT, '#', '*', SP is
treated in the same way as the body of that loop.  It does not
directly contribute to the final contents of sb, but just leaves
the marker in the variable "space" the fact that when adding the
next titlechar() to the resulting sb, we need a SP to wordbreak.

LF now happens to be in the set due to the way you extended the
function (it wasn't fed to this function by its sole caller), but
other than that, it is no more special than HT, SP or '*'.  And they
are not replaced with SP or replaced with '-'.

So it would be the most sensible to just drop 'if LF, replace it
with SP before doing anything else' you added.  The existing 'if
titlechar, add it to sb but if we saw non-title, add a SP before
doing so to wordbreak, and if not titlechar, just remember the fact
that we saw one' should work fine as-is without special casing LF at
all.

Or am I missing something subtle?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Junio C Hamano <[email protected]> writes:

> Hariom verma <[email protected]> writes:
>
>>> Also, because neither LF or SP is a titlechar(), wouldn't the "if
>>> r[i] is LF, replace it with SP" a no-op wrt what will be in sb at
>>> the end?
>>
>> Maybe its better to directly replace LF with hyphen? [Instead of first
>> replacing LF with SP and then replacing SP with '-'.]
>
> Why do you think LF is so special?
>
> Everything other than titlechar() including HT, '#', '*', SP is
> treated in the same way as the body of that loop.  It does not
> directly contribute to the final contents of sb, but just leaves
> the marker in the variable "space" the fact that when adding the
> next titlechar() to the resulting sb, we need a SP to wordbreak.

I was undecided between mentioning and not mentioning the variable
name "space" here.  On one hand, one _could_ argue that "space" is
used to remember we saw "space and the like" and if it were named
"seen_non_title_char", then such a confusion to treat LF so
specially might not have occurred.  But on the other hand, "space"
is what the variable exactly keeps track of; it is just the need for
space on the output side, i.e. we remember that "space needed before
the next output" with that variable.

I am inclined not to suggest renaming "space" at all, but it won't
be the end of the world if it were renamed to "need_space" (before
the next output), or "seen_nontitle".  If we were to actually
rename, I have moderately strong preference to the "need_space" over
"seen_nontitle", as it won't have to be renamed again when the logic
to require a space before the next output has to be updated to
include cases other than just "we saw a nontitle character".

Thanks.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Hariom verma wrote (reply to this):

Hi,

On Wed, Aug 19, 2020 at 9:31 PM Junio C Hamano <[email protected]> wrote:
>
> Hariom verma <[email protected]> writes:
>
> >> Also, because neither LF or SP is a titlechar(), wouldn't the "if
> >> r[i] is LF, replace it with SP" a no-op wrt what will be in sb at
> >> the end?
> >
> > Maybe its better to directly replace LF with hyphen? [Instead of first
> > replacing LF with SP and then replacing SP with '-'.]
>
> Why do you think LF is so special?
>
> Everything other than titlechar() including HT, '#', '*', SP is
> treated in the same way as the body of that loop.  It does not
> directly contribute to the final contents of sb, but just leaves
> the marker in the variable "space" the fact that when adding the
> next titlechar() to the resulting sb, we need a SP to wordbreak.
>
> LF now happens to be in the set due to the way you extended the
> function (it wasn't fed to this function by its sole caller), but
> other than that, it is no more special than HT, SP or '*'.  And they
> are not replaced with SP or replaced with '-'.
>
> So it would be the most sensible to just drop 'if LF, replace it
> with SP before doing anything else' you added.  The existing 'if
> titlechar, add it to sb but if we saw non-title, add a SP before
> doing so to wordbreak, and if not titlechar, just remember the fact
> that we saw one' should work fine as-is without special casing LF at
> all.
>
> Or am I missing something subtle?

You actually got it all right. Thanks for the insight.

Now, I got it. There is no need to give special attention to LF. I
missed to see that `titlechar()` is already taking care of everything.

Thanks,
Hariom

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Hariom verma wrote (reply to this):

Hi,

On Wed, Aug 19, 2020 at 9:38 PM Junio C Hamano <[email protected]> wrote:
>
> Junio C Hamano <[email protected]> writes:
>
> > Hariom verma <[email protected]> writes:
> >
> >>> Also, because neither LF or SP is a titlechar(), wouldn't the "if
> >>> r[i] is LF, replace it with SP" a no-op wrt what will be in sb at
> >>> the end?
> >>
> >> Maybe its better to directly replace LF with hyphen? [Instead of first
> >> replacing LF with SP and then replacing SP with '-'.]
> >
> > Why do you think LF is so special?
> >
> > Everything other than titlechar() including HT, '#', '*', SP is
> > treated in the same way as the body of that loop.  It does not
> > directly contribute to the final contents of sb, but just leaves
> > the marker in the variable "space" the fact that when adding the
> > next titlechar() to the resulting sb, we need a SP to wordbreak.
>
> I was undecided between mentioning and not mentioning the variable
> name "space" here.  On one hand, one _could_ argue that "space" is
> used to remember we saw "space and the like" and if it were named
> "seen_non_title_char", then such a confusion to treat LF so
> specially might not have occurred.  But on the other hand, "space"
> is what the variable exactly keeps track of; it is just the need for
> space on the output side, i.e. we remember that "space needed before
> the next output" with that variable.
>
> I am inclined not to suggest renaming "space" at all, but it won't
> be the end of the world if it were renamed to "need_space" (before
> the next output), or "seen_nontitle".  If we were to actually
> rename, I have moderately strong preference to the "need_space" over
> "seen_nontitle", as it won't have to be renamed again when the logic
> to require a space before the next output has to be updated to
> include cases other than just "we saw a nontitle character".

Yeah, if it was named "seen_non_title_char", I might not get confused.
But now as you have already explained its working pretty well, "space"
makes more sense to me.
Well, I'm okay with both "space" and "need_space".

I wonder what others have to say on this? "space" or "need_space"?

Thanks,
Hariom

}

static void format_sanitized_subject(struct strbuf *sb, const char *msg)
void format_sanitized_subject(struct strbuf *sb, const char *msg, size_t len)
{
size_t trimlen;
size_t start_len = sb->len;
int space = 2;
int i;

for (; *msg && *msg != '\n'; msg++) {
if (istitlechar(*msg)) {
for (i = 0; i < len; i++) {
if (istitlechar(msg[i])) {
if (space == 1)
strbuf_addch(sb, '-');
space = 0;
strbuf_addch(sb, *msg);
if (*msg == '.')
while (*(msg+1) == '.')
msg++;
strbuf_addch(sb, msg[i]);
if (msg[i] == '.')
while (msg[i+1] == '.')
i++;
} else
space |= 1;
}
Expand Down Expand Up @@ -1155,7 +1156,7 @@ static size_t format_commit_one(struct strbuf *sb, /* in UTF-8 */
const struct commit *commit = c->commit;
const char *msg = c->message;
struct commit_list *p;
const char *arg;
const char *arg, *eol;
size_t res;
char **slot;

Expand Down Expand Up @@ -1405,7 +1406,8 @@ static size_t format_commit_one(struct strbuf *sb, /* in UTF-8 */
format_subject(sb, msg + c->subject_off, " ");
return 1;
case 'f': /* sanitized subject */
format_sanitized_subject(sb, msg + c->subject_off);
eol = strchrnul(msg + c->subject_off, '\n');
format_sanitized_subject(sb, msg + c->subject_off, eol - (msg + c->subject_off));
return 1;
case 'b': /* body */
strbuf_addstr(sb, msg + c->body_off);
Expand Down
3 changes: 3 additions & 0 deletions pretty.h
Original file line number Diff line number Diff line change
Expand Up @@ -139,4 +139,7 @@ const char *format_subject(struct strbuf *sb, const char *msg,
/* Check if "cmit_fmt" will produce an empty output. */
int commit_format_is_empty(enum cmit_fmt);

/* Make subject of commit message suitable for filename */
void format_sanitized_subject(struct strbuf *sb, const char *msg, size_t len);

#endif /* PRETTY_H */
158 changes: 103 additions & 55 deletions ref-filter.c
Original file line number Diff line number Diff line change
Expand Up @@ -127,8 +127,8 @@ static struct used_atom {
unsigned int nobracket : 1, push : 1, push_remote : 1;
} remote_ref;
struct {
enum { C_BARE, C_BODY, C_BODY_DEP, C_LENGTH,
C_LINES, C_SIG, C_SUB, C_TRAILERS } option;
enum { C_BARE, C_BODY, C_BODY_DEP, C_LENGTH, C_LINES,
C_SIG, C_SUB, C_SUB_SANITIZE, C_TRAILERS } option;
struct process_trailer_options trailer_opts;
unsigned int nlines;
} contents;
Expand All @@ -139,7 +139,10 @@ static struct used_atom {
struct {
enum { O_FULL, O_LENGTH, O_SHORT } option;
unsigned int length;
} objectname;
} oid;
struct email_option {
enum { EO_RAW, EO_TRIM, EO_LOCALPART } option;
} email_option;
struct refname_atom refname;
char *head;
} u;
Expand Down Expand Up @@ -298,9 +301,12 @@ static int body_atom_parser(const struct ref_format *format, struct used_atom *a
static int subject_atom_parser(const struct ref_format *format, struct used_atom *atom,
const char *arg, struct strbuf *err)
{
if (arg)
return strbuf_addf_ret(err, -1, _("%%(subject) does not take arguments"));
atom->u.contents.option = C_SUB;
if (!arg)
atom->u.contents.option = C_SUB;
else if (!strcmp(arg, "sanitize"))
atom->u.contents.option = C_SUB_SANITIZE;
else
return strbuf_addf_ret(err, -1, _("unrecognized %%(subject) argument: %s"), arg);
return 0;
}

Expand Down Expand Up @@ -358,22 +364,36 @@ static int contents_atom_parser(const struct ref_format *format, struct used_ato
return 0;
}

static int objectname_atom_parser(const struct ref_format *format, struct used_atom *atom,
const char *arg, struct strbuf *err)
static int oid_atom_parser(const struct ref_format *format, struct used_atom *atom,
const char *arg, struct strbuf *err)
{
if (!arg)
atom->u.objectname.option = O_FULL;
atom->u.oid.option = O_FULL;
else if (!strcmp(arg, "short"))
atom->u.objectname.option = O_SHORT;
atom->u.oid.option = O_SHORT;
else if (skip_prefix(arg, "short=", &arg)) {
atom->u.objectname.option = O_LENGTH;
if (strtoul_ui(arg, 10, &atom->u.objectname.length) ||
atom->u.objectname.length == 0)
return strbuf_addf_ret(err, -1, _("positive value expected objectname:short=%s"), arg);
if (atom->u.objectname.length < MINIMUM_ABBREV)
atom->u.objectname.length = MINIMUM_ABBREV;
atom->u.oid.option = O_LENGTH;
if (strtoul_ui(arg, 10, &atom->u.oid.length) ||
atom->u.oid.length == 0)
return strbuf_addf_ret(err, -1, _("positive value expected '%s' in %%(%s)"), arg, atom->name);
if (atom->u.oid.length < MINIMUM_ABBREV)
atom->u.oid.length = MINIMUM_ABBREV;
} else
return strbuf_addf_ret(err, -1, _("unrecognized %%(objectname) argument: %s"), arg);
return strbuf_addf_ret(err, -1, _("unrecognized argument '%s' in %%(%s)"), arg, atom->name);
return 0;
}

static int person_email_atom_parser(const struct ref_format *format, struct used_atom *atom,
const char *arg, struct strbuf *err)
{
if (!arg)
atom->u.email_option.option = EO_RAW;
else if (!strcmp(arg, "trim"))
atom->u.email_option.option = EO_TRIM;
else if (!strcmp(arg, "localpart"))
atom->u.email_option.option = EO_LOCALPART;
else
return strbuf_addf_ret(err, -1, _("unrecognized email option: %s"), arg);
return 0;
}

Expand Down Expand Up @@ -478,25 +498,25 @@ static struct {
{ "refname", SOURCE_NONE, FIELD_STR, refname_atom_parser },
{ "objecttype", SOURCE_OTHER, FIELD_STR, objecttype_atom_parser },
{ "objectsize", SOURCE_OTHER, FIELD_ULONG, objectsize_atom_parser },
{ "objectname", SOURCE_OTHER, FIELD_STR, objectname_atom_parser },
{ "objectname", SOURCE_OTHER, FIELD_STR, oid_atom_parser },
{ "deltabase", SOURCE_OTHER, FIELD_STR, deltabase_atom_parser },
{ "tree", SOURCE_OBJ },
{ "parent", SOURCE_OBJ },
{ "tree", SOURCE_OBJ, FIELD_STR, oid_atom_parser },
{ "parent", SOURCE_OBJ, FIELD_STR, oid_atom_parser },
{ "numparent", SOURCE_OBJ, FIELD_ULONG },
{ "object", SOURCE_OBJ },
{ "type", SOURCE_OBJ },
{ "tag", SOURCE_OBJ },
{ "author", SOURCE_OBJ },
{ "authorname", SOURCE_OBJ },
{ "authoremail", SOURCE_OBJ },
{ "authoremail", SOURCE_OBJ, FIELD_STR, person_email_atom_parser },
{ "authordate", SOURCE_OBJ, FIELD_TIME },
{ "committer", SOURCE_OBJ },
{ "committername", SOURCE_OBJ },
{ "committeremail", SOURCE_OBJ },
{ "committeremail", SOURCE_OBJ, FIELD_STR, person_email_atom_parser },
{ "committerdate", SOURCE_OBJ, FIELD_TIME },
{ "tagger", SOURCE_OBJ },
{ "taggername", SOURCE_OBJ },
{ "taggeremail", SOURCE_OBJ },
{ "taggeremail", SOURCE_OBJ, FIELD_STR, person_email_atom_parser },
{ "taggerdate", SOURCE_OBJ, FIELD_TIME },
{ "creator", SOURCE_OBJ },
{ "creatordate", SOURCE_OBJ, FIELD_TIME },
Expand Down Expand Up @@ -901,21 +921,27 @@ int verify_ref_format(struct ref_format *format)
return 0;
}

static int grab_objectname(const char *name, const struct object_id *oid,
struct atom_value *v, struct used_atom *atom)
static const char *do_grab_oid(const char *field, const struct object_id *oid,
struct used_atom *atom)
{
if (starts_with(name, "objectname")) {
if (atom->u.objectname.option == O_SHORT) {
v->s = xstrdup(find_unique_abbrev(oid, DEFAULT_ABBREV));
return 1;
} else if (atom->u.objectname.option == O_FULL) {
v->s = xstrdup(oid_to_hex(oid));
return 1;
} else if (atom->u.objectname.option == O_LENGTH) {
v->s = xstrdup(find_unique_abbrev(oid, atom->u.objectname.length));
return 1;
} else
BUG("unknown %%(objectname) option");
switch (atom->u.oid.option) {
case O_FULL:
return oid_to_hex(oid);
case O_LENGTH:
return find_unique_abbrev(oid, atom->u.oid.length);
case O_SHORT:
return find_unique_abbrev(oid, DEFAULT_ABBREV);
default:
BUG("unknown %%(%s) option", field);
}
}

static int grab_oid(const char *name, const char *field, const struct object_id *oid,
struct atom_value *v, struct used_atom *atom)
{
if (starts_with(name, field)) {
v->s = xstrdup(do_grab_oid(field, oid, atom));
return 1;
}
return 0;
}
Expand Down Expand Up @@ -943,7 +969,7 @@ static void grab_common_values(struct atom_value *val, int deref, struct expand_
} else if (!strcmp(name, "deltabase"))
v->s = xstrdup(oid_to_hex(&oi->delta_base_oid));
else if (deref)
grab_objectname(name, &oi->oid, v, &used_atom[i]);
grab_oid(name, "objectname", &oi->oid, v, &used_atom[i]);
}
}

Expand Down Expand Up @@ -982,21 +1008,20 @@ static void grab_commit_values(struct atom_value *val, int deref, struct object
continue;
if (deref)
name++;
if (!strcmp(name, "tree")) {
v->s = xstrdup(oid_to_hex(get_commit_tree_oid(commit)));
}
else if (!strcmp(name, "numparent")) {
if (grab_oid(name, "tree", get_commit_tree_oid(commit), v, &used_atom[i]))
continue;
if (!strcmp(name, "numparent")) {
v->value = commit_list_count(commit->parents);
v->s = xstrfmt("%lu", (unsigned long)v->value);
}
else if (!strcmp(name, "parent")) {
else if (starts_with(name, "parent")) {
struct commit_list *parents;
struct strbuf s = STRBUF_INIT;
for (parents = commit->parents; parents; parents = parents->next) {
struct commit *parent = parents->item;
struct object_id *oid = &parents->item->object.oid;
if (parents != commit->parents)
strbuf_addch(&s, ' ');
strbuf_addstr(&s, oid_to_hex(&parent->object.oid));
strbuf_addstr(&s, do_grab_oid("parent", oid, &used_atom[i]));
}
v->s = strbuf_detach(&s, NULL);
}
Expand Down Expand Up @@ -1037,16 +1062,35 @@ static const char *copy_name(const char *buf)
return xstrdup("");
}

static const char *copy_email(const char *buf)
static const char *copy_email(const char *buf, struct used_atom *atom)
{
const char *email = strchr(buf, '<');
const char *eoemail;
if (!email)
return xstrdup("");
eoemail = strchr(email, '>');
switch (atom->u.email_option.option) {
case EO_RAW:
eoemail = strchr(email, '>');
if (eoemail)
eoemail++;
break;
case EO_TRIM:
email++;
eoemail = strchr(email, '>');
break;
case EO_LOCALPART:
email++;
eoemail = strchr(email, '@');
if (!eoemail)
eoemail = strchr(email, '>');
break;
default:
BUG("unknown email option");
}

if (!eoemail)
return xstrdup("");
return xmemdupz(email, eoemail + 1 - email);
return xmemdupz(email, eoemail - email);
}

static char *copy_subject(const char *buf, unsigned long len)
Expand Down Expand Up @@ -1116,7 +1160,7 @@ static void grab_person(const char *who, struct atom_value *val, int deref, void
continue;
if (name[wholen] != 0 &&
strcmp(name + wholen, "name") &&
strcmp(name + wholen, "email") &&
!starts_with(name + wholen, "email") &&
!starts_with(name + wholen, "date"))
continue;
if (!wholine)
Expand All @@ -1127,8 +1171,8 @@ static void grab_person(const char *who, struct atom_value *val, int deref, void
v->s = copy_line(wholine);
else if (!strcmp(name + wholen, "name"))
v->s = copy_name(wholine);
else if (!strcmp(name + wholen, "email"))
v->s = copy_email(wholine);
else if (starts_with(name + wholen, "email"))
v->s = copy_email(wholine, &used_atom[i]);
else if (starts_with(name + wholen, "date"))
grab_date(wholine, v, name);
}
Expand Down Expand Up @@ -1241,8 +1285,8 @@ static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf)
continue;
if (deref)
name++;
if (strcmp(name, "subject") &&
strcmp(name, "body") &&
if (strcmp(name, "body") &&
!starts_with(name, "subject") &&
!starts_with(name, "trailers") &&
!starts_with(name, "contents"))
continue;
Expand All @@ -1254,7 +1298,11 @@ static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf)

if (atom->u.contents.option == C_SUB)
v->s = copy_subject(subpos, sublen);
else if (atom->u.contents.option == C_BODY_DEP)
else if (atom->u.contents.option == C_SUB_SANITIZE) {
struct strbuf sb = STRBUF_INIT;
format_sanitized_subject(&sb, subpos, sublen);
v->s = strbuf_detach(&sb, NULL);
} else if (atom->u.contents.option == C_BODY_DEP)
v->s = xmemdupz(bodypos, bodylen);
else if (atom->u.contents.option == C_LENGTH)
v->s = xstrfmt("%"PRIuMAX, (uintmax_t)strlen(subpos));
Expand Down Expand Up @@ -1704,7 +1752,7 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err)
v->s = xstrdup(buf + 1);
}
continue;
} else if (!deref && grab_objectname(name, &ref->objectname, v, atom)) {
} else if (!deref && grab_oid(name, "objectname", &ref->objectname, v, atom)) {
continue;
} else if (!strcmp(name, "HEAD")) {
if (atom->u.head && !strcmp(ref->refname, atom->u.head))
Expand Down
Loading