
Speeding up builds for multiple translations #4295


Open
dhalbert opened this issue Mar 1, 2021 · 1 comment

Comments

@dhalbert
Collaborator

dhalbert commented Mar 1, 2021

Currently our builds are a Cartesian product of boards * translations. We save time on the translation builds by starting with a previous build, so that only files with translation strings need to be recompiled. This saves a lot of time, but I think we could save more.

A small build like trinket_m0 takes an additional 14-15 seconds for each translation. A large build like metro_m4_express takes about 40 seconds for each.

Right now the translate() call is a true function call. It calls a generated routine that is a huge if-else-else-else... statement that does strcmp() against the passed-in string. The compiler does compile-time constant folding to reduce this to a single reference to a macro invocation TRANSLATION() that contains the compressed string data. So this call must be recompiled for each language.
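For readers unfamiliar with the current scheme, here is a simplified sketch of the general shape of such a generated routine. The type name, the function name `translate_sketch`, and the data bytes are placeholders invented for illustration; this is not the actual generated code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the compressed string type. */
typedef struct {
    const unsigned char *data;
    size_t length;
} compressed_string_sketch_t;

/* Placeholder bytes; real data would come from the per-language compressor. */
static const unsigned char invalid_pin_data[] = {0x12, 0x34, 0x56};
static const unsigned char out_of_range_data[] = {0x9a, 0xbc};

static const compressed_string_sketch_t invalid_pin = {
    invalid_pin_data, sizeof invalid_pin_data
};
static const compressed_string_sketch_t out_of_range = {
    out_of_range_data, sizeof out_of_range_data
};

/* One strcmp() per known message. When the argument is a string literal and
   the call is inlined, an optimizing compiler can usually fold the chain
   down to a single reference, but that reference is language-specific, so
   every caller has to be recompiled for every language. */
static inline const compressed_string_sketch_t *translate_sketch(const char *original) {
    if (strcmp(original, "Invalid pin") == 0) {
        return &invalid_pin;
    } else if (strcmp(original, "Value out of range") == 0) {
        return &out_of_range;
    }
    return NULL; /* unknown message */
}
```

The point of the sketch is the dependency: the compressed data is baked into each call site at compile time, which is what forces the per-language recompilation.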

Instead of recompiling for each language, it would be faster to just do (1) a simple link (not LTO), or even (2) just concatenate the translation data onto the original binary. All the rest of the code would then be compiled only once per board, for all languages.

For case (1), the linker would snap links for a symbol for each unique translation string, pointing to the translation data for that string. Or, for either case (1) or (2), there could be a table of translations that specified offsets into the compressed string data. So for instance, a particular compressed string would be message #3 in this table, for all languages. Each translation would have a separate table of compressed string data, so that for instance for de_DE, the data for message #3 might start at offset #21.

The offsets and the string numbers could be 16-bit values, since there are certainly no more than 64k messages and I think it's very unlikely the combined compressed string data is > 64kB. Using 16-bit values would, I think, offset the size cost of this indirection table, because the values are 16-bit, whereas pointers would be 32-bit.
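A minimal sketch of the offset-table idea, assuming a layout in which each language supplies an array of 16-bit offsets indexed by message number plus a blob of compressed bytes. The type `translation_table_t`, the function `message_data`, and all the data values are hypothetical; only the "message #3 starts at offset 21 for de_DE" example is taken from the description above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-language table: offsets[n] is where message n starts
   inside the compressed data for that language. */
typedef struct {
    const uint16_t *offsets;
    uint16_t message_count;
    const uint8_t *data;
} translation_table_t;

/* Placeholder compressed bytes for de_DE (24 bytes of made-up data). */
static const uint8_t de_DE_data[24] = {
    0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08,
    0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f, 0x10,
    0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0x18,
};

/* Message #3 starts at offset 21, as in the example in the text. */
static const uint16_t de_DE_offsets[] = {0, 7, 15, 21};

static const translation_table_t de_DE = {
    de_DE_offsets,
    sizeof de_DE_offsets / sizeof de_DE_offsets[0],
    de_DE_data,
};

/* Map a language-independent message number to that language's data. */
static const uint8_t *message_data(const translation_table_t *table, uint16_t message) {
    if (message >= table->message_count) {
        return NULL; /* out-of-range message number */
    }
    return table->data + table->offsets[message];
}
```

With this layout the compiled code only ever refers to the message number (3), so it is the same for every language; each table entry costs 2 bytes, versus 4 bytes for a pointer on a 32-bit target.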

The question then arises about how to get the string numbers or symbol names for each message. The classic way of doing this is to #define constants by hand with unique numbers for all messages, e.g., #define MSG_INVALID_PIN 3. This is somewhat painful, and we have avoided it. I would like to try to automate this, and continue to use the English-language string as a key. But the C preprocessor is very weak and doesn't provide table lookup, etc. We could transform each C source file before its compilation with our own translation preprocessor and compile the transformed version instead of the original file, but that is awkward, messes up debug information, etc.

Instead, I have the idea of using the __LINE__ number of each translate() invocation to define a unique value. In each file that has translations, there would be a unique cpp macro definition. For example, in shared-bindings/busio/I2C.c, we'd define these macros at the top (the extra TR_LINE level is needed so that __LINE__ is expanded to the line number before ## pastes it):

#define TR_shared_bindings__busio__I2C
#define TR_LINE_(line)     TR_shared_bindings__busio__I2C ## _ ## line
#define TR_LINE(line)      TR_LINE_(line)
#define translate(s)       TR_LINE(__LINE__)

So for instance, this call on, say, line 132:

    mp_raise_ValueError(translate("Invalid pin"));

would expand into:

    mp_raise_ValueError(TR_shared_bindings__busio__I2C_132);

This name would be constant across compilations.
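One preprocessor detail is worth spelling out: the ## operator suppresses macro expansion of its operands, so __LINE__ has to pass through an extra macro level before it is pasted. Without that level the pasted name would end in the literal token `__LINE__` rather than the line number. The following is a small sketch that checks the expansion; the helper names `TR_LINE`, `TR_LINE_`, `TR_STR`, and `TR_STR_` are illustrative, and `#line` is used only to pin the line number to 132 for the demonstration.

```c
/* Marker macro for this file, as in the proposal (empty definition). */
#define TR_shared_bindings__busio__I2C

/* Two-level paste: TR_LINE expands __LINE__ to a number first, then
   TR_LINE_ pastes it. The prefix is a direct operand of ##, so the empty
   marker macro above is not expanded away before pasting. */
#define TR_LINE_(line) TR_shared_bindings__busio__I2C ## _ ## line
#define TR_LINE(line)  TR_LINE_(line)
#define translate(s)   TR_LINE(__LINE__)

/* Stringify helpers, used here only to inspect the generated name. */
#define TR_STR_(x) #x
#define TR_STR(x)  TR_STR_(x)

/* Pretend the next source line is line 132 of the file. */
#line 132
static const char *const expanded_name = TR_STR(translate("Invalid pin"));
/* expanded_name is "TR_shared_bindings__busio__I2C_132" */
```

Since the name depends only on the file and the line number, it stays the same no matter which language is being built.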

Separately from this, we'd run a translate()-finder program over all the source files. It would extract the strings from the translate() calls, remember what files and lines they were on, deduplicate them, generate the compressed string data, and remember the offset for each compressed string. It would produce a compressed string data file that would be compiled for each language. It would also produce a single include file, looking something like this (for case (2) above):

...
#ifdef TR_shared_bindings__busio__I2C
#define TR_shared_bindings__busio__I2C_132 21
#define TR_shared_bindings__busio__I2C_247 23
#endif

#ifdef TR_shared_bindings__busio__SPI
...

Or these definitions could be const uint16_t TR_shared_bindings__busio__I2C_132 = 21;, etc. I'm not sure it matters. If we are snapping links, case (1), they might be actual definitions instead of #defines.
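If the const uint16_t form were used for case (1), one plausible arrangement (an assumption on my part, not something settled in this issue) is to put `extern` declarations in the shared include file and the actual definitions in a single generated file. In C a file-scope `const` object has external linkage, so definitions repeated in a header included by many sources would collide at link time. The sketch below shows both halves in one file for brevity; the accessor `invalid_pin_value` is hypothetical.

```c
#include <stdint.h>

/* Would live in the shared include file: declarations only, so objects
   compiled against it do not depend on the values. */
extern const uint16_t TR_shared_bindings__busio__I2C_132;
extern const uint16_t TR_shared_bindings__busio__I2C_247;

/* Would live in one generated .c file, linked in at the end. The values
   are the ones from the example include file above. */
const uint16_t TR_shared_bindings__busio__I2C_132 = 21;
const uint16_t TR_shared_bindings__busio__I2C_247 = 23;

/* Hypothetical call site: the value is read through the symbol, which the
   linker resolves, rather than being substituted by the preprocessor. */
static uint16_t invalid_pin_value(void) {
    return TR_shared_bindings__busio__I2C_132;
}
```

The trade-off is that a linked symbol costs a load at run time and a little storage, while a #define is a compile-time constant in the instruction stream.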

The #ifdef stuff is optional, but it saves compilation time. If it were omitted, the #define TR_shared_bindings__busio__I2C kind of definition at the top of each source could also be omitted.

The compressed string table, as mentioned above, could be linked (without LTO) against the entire rest of the program, as in case (1), or even just pasted on the end of the .bin produced, as in case (2).

Thanks to @jepler for a long discussion about earlier thoughts along this line. He has already mentioned on Discord some minor per-translation variations due to compile-time choices that depend on the results of the string compression.

Comments welcome.

@dhalbert
Collaborator Author

dhalbert commented Mar 1, 2021

Cautionary note: each build+translation needs its own compressed string table, because there are shared dictionaries among all the strings that depend on the exact strings chosen for the build. So something like this wins if the cost of doing the translation preprocessing is substantially less than recompiling on each build.
