Skip to content

Commit ddb14ff

Browse files
author
tnfchris
committed
AArch64: Fix command line options canonicalization version gcc-mirror#2. (PR target/88530)
Commandline options on AArch64 don't get canonicalized into the smallest possible set before output to the assembler. This means that overlapping feature sets are emitted with superfluous parts. Normally this isn't an issue, but in the case of crypto we have retro-actively split it into aes and sha2. We need to emit only +crypto to the assembler so old assemblers continue to work. Because of how -mcpu=native and -march=native work they end up enabling all feature bits. Instead we need to get the smallest possible set, which would also fix the problem with older the assemblers and the retro-active split. The function that handles this is called quite often. It is called for any push/pop options or attribute that changes arch, cpu etc. In order to not make this search for the smallest set too expensive we sort the options based on the number of features (bits) they enable. This allows us to process the list linearly instead of quadratically (Once we have enabled a feature, we know that anything else that enables it can be ignored. By sorting we'll get the biggest groups first and thus the smallest combination of commandline flags). The Option handling structures have been extended to have a boolean to indicate whether the option is synthetic, with that I mean if the option flag itself enables a new feature. e.g. +crypto isn't an actual feature, it just enables other features, but others like +rdma enable multiple dependent features but is itself also a feature. There are two ways to solve this. 1) Either have the options that are feature bits also turn themselves on, e.g. change rdma to turn on FP, SIMD and RDMA as dependency bits. 2) Make a distinction between these two different type of features and have the framework handle it correctly. Even though it's more code I went for the second approach, as it's the one that'll be less fragile (people can't forget it) and gives the least surprises. Effectively this patch changes the following: The values before the => are the old compiler and after the => the new code. -march=armv8.2-a+crypto+sha2 => -march=armv8.2-a+crypto -march=armv8.2-a+sha2+aes => -march=armv8.2-a+crypto The remaining behaviors stay the same. gcc/ChangeLog: PR target/88530 * common/config/aarch64/aarch64-common.c (struct aarch64_option_extension): Add is_synthetic. (all_extensions): Use it. (TARGET_OPTION_INIT_STRUCT): Define hook. (struct gcc_targetm_common): Moved to end. (all_extensions_by_on): New. (opt_ext_cmp, typedef opt_ext): New. (aarch64_option_init_struct): New. (aarch64_contains_opt): New. (aarch64_get_extension_string_for_isa_flags): Output smallest set. * config/aarch64/aarch64-option-extensions.def (AARCH64_OPT_EXTENSION): Explicitly include AES and SHA2 in crypto. (fp, simd, crc, lse, fp16, rcpc, rdma, dotprod, aes, sha2, sha3, sm4, fp16fml, sve, profile, rng, memtag, sb, ssbs, predres): Set is_synthetic to false. (crypto): Set is_synthetic to true. * config/aarch64/driver-aarch64.c (AARCH64_OPT_EXTENSION): Add SYNTHETIC. gcc/testsuite/ChangeLog: PR target/88530 * gcc.target/aarch64/options_set_1.c: New test. * gcc.target/aarch64/options_set_2.c: New test. * gcc.target/aarch64/options_set_3.c: New test. * gcc.target/aarch64/options_set_4.c: New test. * gcc.target/aarch64/options_set_5.c: New test. * gcc.target/aarch64/options_set_6.c: New test. * gcc.target/aarch64/options_set_7.c: New test. * gcc.target/aarch64/options_set_8.c: New test. * gcc.target/aarch64/options_set_9.c: New test. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@269193 138bc75d-0d04-0410-961f-82ee72b054a4
1 parent bff28e0 commit ddb14ff

File tree

14 files changed

+379
-62
lines changed

14 files changed

+379
-62
lines changed

gcc/ChangeLog

Lines changed: 22 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,25 @@
1+
2019-02-25 Tamar Christina <[email protected]>
2+
3+
PR target/88530
4+
* common/config/aarch64/aarch64-common.c
5+
(struct aarch64_option_extension): Add is_synthetic.
6+
(all_extensions): Use it.
7+
(TARGET_OPTION_INIT_STRUCT): Define hook.
8+
(struct gcc_targetm_common): Moved to end.
9+
(all_extensions_by_on): New.
10+
(opt_ext_cmp, typedef opt_ext): New.
11+
(aarch64_option_init_struct): New.
12+
(aarch64_contains_opt): New.
13+
(aarch64_get_extension_string_for_isa_flags): Output smallest set.
14+
* config/aarch64/aarch64-option-extensions.def
15+
(AARCH64_OPT_EXTENSION): Explicitly include AES and SHA2 in crypto.
16+
(fp, simd, crc, lse, fp16, rcpc, rdma, dotprod, aes, sha2, sha3,
17+
sm4, fp16fml, sve, profile, rng, memtag, sb, ssbs, predres):
18+
Set is_synthetic to false.
19+
(crypto): Set is_synthetic to true.
20+
* config/aarch64/driver-aarch64.c (AARCH64_OPT_EXTENSION): Add
21+
SYNTHETIC.
22+
123
2019-02-25 Tamar Christina <[email protected]>
224

325
* config/arm/arm_neon.h (vfmlal_low_u32, vfmlsl_low_u32,

gcc/common/config/aarch64/aarch64-common.c

Lines changed: 178 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -46,6 +46,8 @@
4646
#define TARGET_OPTION_DEFAULT_PARAMS aarch64_option_default_params
4747
#undef TARGET_OPTION_VALIDATE_PARAM
4848
#define TARGET_OPTION_VALIDATE_PARAM aarch64_option_validate_param
49+
#undef TARGET_OPTION_INIT_STRUCT
50+
#define TARGET_OPTION_INIT_STRUCT aarch64_option_init_struct
4951

5052
/* Set default optimization options. */
5153
static const struct default_options aarch_option_optimization_table[] =
@@ -164,24 +166,35 @@ aarch64_handle_option (struct gcc_options *opts,
164166
}
165167
}
166168

167-
struct gcc_targetm_common targetm_common = TARGETM_COMMON_INITIALIZER;
168-
169169
/* An ISA extension in the co-processor and main instruction set space. */
170170
struct aarch64_option_extension
171171
{
172172
const char *const name;
173173
const unsigned long flag_canonical;
174174
const unsigned long flags_on;
175175
const unsigned long flags_off;
176+
const bool is_synthetic;
176177
};
177178

178179
/* ISA extensions in AArch64. */
179180
static const struct aarch64_option_extension all_extensions[] =
180181
{
181-
#define AARCH64_OPT_EXTENSION(NAME, FLAG_CANONICAL, FLAGS_ON, FLAGS_OFF, Z) \
182-
{NAME, FLAG_CANONICAL, FLAGS_ON, FLAGS_OFF},
182+
#define AARCH64_OPT_EXTENSION(NAME, FLAG_CANONICAL, FLAGS_ON, FLAGS_OFF, \
183+
SYNTHETIC, Z) \
184+
{NAME, FLAG_CANONICAL, FLAGS_ON, FLAGS_OFF, SYNTHETIC},
185+
#include "config/aarch64/aarch64-option-extensions.def"
186+
{NULL, 0, 0, 0, false}
187+
};
188+
189+
/* A copy of the ISA extensions list for AArch64 sorted by the popcount of
190+
bits and extension turned on. Cached for efficiency. */
191+
static struct aarch64_option_extension all_extensions_by_on[] =
192+
{
193+
#define AARCH64_OPT_EXTENSION(NAME, FLAG_CANONICAL, FLAGS_ON, FLAGS_OFF, \
194+
SYNTHETIC, Z) \
195+
{NAME, FLAG_CANONICAL, FLAGS_ON, FLAGS_OFF, SYNTHETIC},
183196
#include "config/aarch64/aarch64-option-extensions.def"
184-
{NULL, 0, 0, 0}
197+
{NULL, 0, 0, 0, false}
185198
};
186199

187200
struct processor_name_to_arch
@@ -298,6 +311,76 @@ aarch64_get_all_extension_candidates (auto_vec<const char *> *candidates)
298311
candidates->safe_push (opt->name);
299312
}
300313

314+
/* Comparer to sort aarch64's feature extensions by population count. Largest
315+
first. */
316+
317+
typedef const struct aarch64_option_extension opt_ext;
318+
319+
int opt_ext_cmp (const void* a, const void* b)
320+
{
321+
opt_ext *opt_a = (opt_ext *)a;
322+
opt_ext *opt_b = (opt_ext *)b;
323+
324+
/* We consider the total set of bits an options turns on to be the union of
325+
the singleton set containing the option itself and the set of options it
326+
turns on as a dependency. As an example +dotprod turns on FL_DOTPROD and
327+
FL_SIMD. As such the set of bits represented by this option is
328+
{FL_DOTPROD, FL_SIMD}. */
329+
unsigned long total_flags_a = opt_a->flag_canonical & opt_a->flags_on;
330+
unsigned long total_flags_b = opt_b->flag_canonical & opt_b->flags_on;
331+
int popcnt_a = popcount_hwi ((HOST_WIDE_INT)total_flags_a);
332+
int popcnt_b = popcount_hwi ((HOST_WIDE_INT)total_flags_b);
333+
int order = popcnt_b - popcnt_a;
334+
335+
/* If they have the same amount of bits set, give it a more
336+
deterministic ordering by using the value of the bits themselves. */
337+
if (order == 0)
338+
return total_flags_b - total_flags_a;
339+
340+
return order;
341+
}
342+
343+
/* Implement TARGET_OPTION_INIT_STRUCT. */
344+
345+
static void
346+
aarch64_option_init_struct (struct gcc_options *opts ATTRIBUTE_UNUSED)
347+
{
348+
/* Sort the extensions based on how many bits they set, order the larger
349+
counts first. We sort the list because this makes processing the
350+
feature bits O(n) instead of O(n^2). While n is small, the function
351+
to calculate the feature strings is called on every options push,
352+
pop and attribute change (arm_neon headers, lto etc all cause this to
353+
happen quite frequently). It is a trade-off between time and space and
354+
so time won. */
355+
int n_extensions
356+
= sizeof (all_extensions) / sizeof (struct aarch64_option_extension);
357+
qsort (&all_extensions_by_on, n_extensions,
358+
sizeof (struct aarch64_option_extension), opt_ext_cmp);
359+
}
360+
361+
/* Checks to see if enough bits from the option OPT are enabled in
362+
ISA_FLAG_BITS to be able to replace the individual options with the
363+
canonicalized version of the option. This is done based on two rules:
364+
365+
1) Synthetic groups, such as +crypto we only care about the bits that are
366+
turned on. e.g. +aes+sha2 can be replaced with +crypto.
367+
368+
2) Options that themselves have a bit, such as +rdma, in this case, all the
369+
feature bits they turn on must be available and the bit for the option
370+
itself must be. In this case it's effectively a reduction rather than a
371+
grouping. e.g. +fp+simd is not enough to turn on +rdma, for that you would
372+
need +rdma+fp+simd which is reduced down to +rdma.
373+
*/
374+
375+
static bool
376+
aarch64_contains_opt (unsigned long isa_flag_bits, opt_ext *opt)
377+
{
378+
unsigned long flags_check
379+
= opt->is_synthetic ? opt->flags_on : opt->flag_canonical;
380+
381+
return (isa_flag_bits & flags_check) == flags_check;
382+
}
383+
301384
/* Return a string representation of ISA_FLAGS. DEFAULT_ARCH_FLAGS
302385
gives the default set of flags which are implied by whatever -march
303386
we'd put out. Our job is to figure out the minimal set of "+" and
@@ -311,26 +394,97 @@ aarch64_get_extension_string_for_isa_flags (unsigned long isa_flags,
311394
const struct aarch64_option_extension *opt = NULL;
312395
std::string outstr = "";
313396

314-
/* Pass one: Find all the things we need to turn on. As a special case,
315-
we always want to put out +crc if it is enabled. */
316-
for (opt = all_extensions; opt->name != NULL; opt++)
317-
if ((isa_flags & opt->flag_canonical
318-
&& !(default_arch_flags & opt->flag_canonical))
319-
|| (default_arch_flags & opt->flag_canonical
320-
&& opt->flag_canonical == AARCH64_ISA_CRC))
321-
{
322-
outstr += "+";
323-
outstr += opt->name;
324-
}
397+
unsigned long isa_flag_bits = isa_flags;
325398

326-
/* Pass two: Find all the things we need to turn off. */
327-
for (opt = all_extensions; opt->name != NULL; opt++)
328-
if ((~isa_flags) & opt->flag_canonical
329-
&& !((~default_arch_flags) & opt->flag_canonical))
399+
/* Pass one: Minimize the search space by reducing the set of options
400+
to the smallest set that still turns on the same features as before in
401+
conjunction with the bits that are turned on by default for the selected
402+
architecture. */
403+
for (opt = all_extensions_by_on; opt->name != NULL; opt++)
404+
{
405+
/* If the bit is on by default, then all the options it turns on are also
406+
on by default due to the transitive dependencies.
407+
408+
If the option is enabled explicitly in the set then we need to emit
409+
an option for it. Since this list is sorted by extensions setting the
410+
largest number of featers first, we can be sure that nothing else will
411+
ever need to set the bits we already set. Consider the following
412+
situation:
413+
414+
Feat1 = A + B + C
415+
Feat2 = A + B
416+
Feat3 = A + D
417+
Feat4 = B + C
418+
Feat5 = C
419+
420+
The following results are expected:
421+
422+
A + C = A + Feat5
423+
B + C = Feat4
424+
Feat4 + A = Feat1
425+
Feat2 + Feat5 = Feat1
426+
Feat1 + C = Feat1
427+
Feat3 + Feat4 = Feat1 + D
428+
429+
This search assumes that all invidual feature bits are use visible,
430+
in other words the user must be able to do +A, +B, +C and +D. */
431+
if (aarch64_contains_opt (isa_flag_bits | default_arch_flags, opt))
330432
{
331-
outstr += "+no";
332-
outstr += opt->name;
433+
/* We remove all the dependent bits, to prevent them from being turned
434+
on twice. This only works because we assume that all there are
435+
individual options to set all bits standalone. */
436+
isa_flag_bits &= ~opt->flags_on;
437+
isa_flag_bits |= opt->flag_canonical;
333438
}
439+
}
440+
441+
/* By toggling bits on and off, we may have set bits on that are already
442+
enabled by default. So we mask the default set out so we don't emit an
443+
option for them. Instead of checking for this each time during Pass One
444+
we just mask all default bits away at the end. */
445+
isa_flag_bits &= ~default_arch_flags;
446+
447+
/* We now have the smallest set of features we need to process. A subsequent
448+
linear scan of the bits in isa_flag_bits will allow us to print the ext
449+
names. However as a special case if CRC was enabled before, always print
450+
it. This is required because some CPUs have an incorrect specification
451+
in older assemblers. Even though CRC should be the default for these
452+
cases the -mcpu values won't turn it on. */
453+
if (isa_flags & AARCH64_ISA_CRC)
454+
isa_flag_bits |= AARCH64_ISA_CRC;
455+
456+
/* Pass Two:
457+
Print the option names that we're sure we must turn on. These are only
458+
optional extension names. Mandatory ones have already been removed and
459+
ones we explicitly want off have been too. */
460+
for (opt = all_extensions_by_on; opt->name != NULL; opt++)
461+
{
462+
if (isa_flag_bits & opt->flag_canonical)
463+
{
464+
outstr += "+";
465+
outstr += opt->name;
466+
}
467+
}
468+
469+
/* Pass Three:
470+
Print out a +no for any mandatory extension that we are
471+
turning off. By this point aarch64_parse_extension would have ensured
472+
that any optional extensions are turned off. The only things left are
473+
things that can't be turned off usually, e.g. something that is on by
474+
default because it's mandatory and we want it off. For turning off bits
475+
we don't guarantee the smallest set of flags, but instead just emit all
476+
options the user has specified.
477+
478+
The assembler requires all +<opts> to be printed before +no<opts>. */
479+
for (opt = all_extensions_by_on; opt->name != NULL; opt++)
480+
{
481+
if ((~isa_flags) & opt->flag_canonical
482+
&& !((~default_arch_flags) & opt->flag_canonical))
483+
{
484+
outstr += "+no";
485+
outstr += opt->name;
486+
}
487+
}
334488

335489
return outstr;
336490
}
@@ -411,5 +565,7 @@ aarch64_rewrite_mcpu (int argc, const char **argv)
411565
return aarch64_rewrite_selected_cpu (argv[argc - 1]);
412566
}
413567

568+
struct gcc_targetm_common targetm_common = TARGETM_COMMON_INITIALIZER;
569+
414570
#undef AARCH64_CPU_NAME_LENGTH
415571

0 commit comments

Comments
 (0)