Testing Tectonic on the arXiv paper dataset #397

Mrmaxmeier · 2019-06-20T20:06:52Z

I'm experimenting with a few low-level changes in the engine. Tectonic's test suite is a nice early warning, but in my experience passing the tests says little about being able to build real LaTeX documents.
The arXiv archive provides TeX sources for most of the published papers.
I've hacked up a proof of concept of a crater-like regression testing tool using parts of the arXiv dataset.

Here's a quick rundown:
It'll run the engine on each source file and store logs and output files.
This takes about 30 minutes for 2500 papers.
~~It currently relies on libfaketime to get as close to reproducible builds as possible.~~ Tectonic's output is now reproducible when SOURCE_DATE_EPOCH is set. This means we're able to detect regressions by comparing output hashes. Visual comparisons of output PDFs might still be interesting in the future.
Outputs are not fully deterministic yet, because some samples trigger undefined behavior.
I've borrowed some of crater's CSS and whipped up a simple front-end for the test results.
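
The hash-based regression check described above can be sketched roughly as follows. This is only an illustration, not the code the tool actually uses: the function names `hash_outputs` and `diff_runs` and the one-directory-per-run layout are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def hash_outputs(run_dir):
    """Map each file path (relative to run_dir) to the SHA-256 of its bytes."""
    root = Path(run_dir)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_runs(baseline, candidate):
    """Compare two hash maps and report changed, added, and removed paths."""
    shared = baseline.keys() & candidate.keys()
    return {
        "changed": sorted(k for k in shared if baseline[k] != candidate[k]),
        "added": sorted(candidate.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - candidate.keys()),
    }
```

With reproducible builds, any entry under "changed" points at either a real regression or a remaining source of nondeterminism, which is why the undefined-behavior cases show up as false positives.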

I'll open separate issues for the false positives caused by undefined behavior and will keep this one open as a kind of tracking issue on reproducibility and regression testing.

Ideas / TODOs:

valgrind / asan / msan
visual diff for PDFs
remove libfaketime hacks
track performance
run all samples in-process (might uncover some state corruption)
automated runs for PRs (CI Bot Mrmaxmeier/tectonic-on-arXiv#5)

pkgw · 2019-06-20T20:12:14Z

I've wanted to do this for so long, this is amazing!!!

pkgw · 2019-06-20T20:19:43Z

Let me know if you're ever in the Boston area and I'll buy you the beverage(s) of your choice.

XVilka · 2019-09-30T03:28:05Z

It would also be interesting to test the Oxidized fork/branch: https://github.com/crlf0710/tectonic/tree/oxidize

Mrmaxmeier · 2019-10-01T12:26:18Z

> Would be interesting to test also the Oxidized fork/branch:
>
> https://github.com/crlf0710/tectonic/tree/oxidize

Here are the current results with crlf0710#122 applied.
https://tt.ente.ninja/#/compare/master-v0.1.11-326-g926360a/oxidize-v0.1.11-670-g8519cfc4-dirty

pkgw · 2019-10-01T12:39:26Z

@Mrmaxmeier Thank you, that's super helpful!

burrbull · 2019-10-01T15:21:04Z

404 on right column.

cormacrelf · 2019-10-01T15:38:57Z

Where are the sources for the test runner & frontend?

burrbull · 2019-10-01T15:42:38Z

On arxiv.org, under the "Other formats" link.

cormacrelf · 2019-10-01T16:12:10Z

@burrbull not the TeX sources, the code that runs the tests and displays them on https://tt.ente.ninja.

Mrmaxmeier · 2019-10-01T16:55:23Z

@cormacrelf
It's a collection of hacky python scripts and a web page that consumes .json reports:
https://github.com/Mrmaxmeier/tectonic-on-arXiv

It might make sense to rewrite this to consume tectonic as a library. It currently just spawns a bunch of tectonic processes.
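
For readers who don't want to dig through the repo, the process-spawning approach looks roughly like the sketch below. It is not the actual runner: `build_one`, `build_all`, the timeout, the worker count, and the result dictionary layout are all hypothetical. The build command is passed in as a list, so nothing here assumes particular tectonic flags.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_one(cmd, source_date_epoch="0", timeout=300):
    """Run one build command with SOURCE_DATE_EPOCH pinned and capture its log."""
    env = dict(os.environ, SOURCE_DATE_EPOCH=str(source_date_epoch))
    try:
        proc = subprocess.run(
            cmd, env=env, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        # A hung build is recorded as a result rather than aborting the whole run.
        return {"cmd": cmd, "returncode": None, "log": "timed out"}
    return {"cmd": cmd, "returncode": proc.returncode, "log": proc.stdout + proc.stderr}


def build_all(cmds, workers=4):
    """Run many build commands concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(build_one, cmds))


# Hypothetical usage, one command per paper:
# results = build_all([["tectonic", "paper.tex"]])
```

Each result is a plain dictionary, so it can be dumped with `json.dump` into the kind of report the web page consumes.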

burrbull · 2019-10-01T17:24:52Z

@Mrmaxmeier can you rerun the tests with crlf0710#123?

pkgw · 2021-01-16T19:22:27Z

Now that these tests are running automatically in the main repo, I am going to say that this issue is closed. Thank you for your work to set up this service @Mrmaxmeier, it is hugely valuable!

Mrmaxmeier mentioned this issue Jun 20, 2019

Fix an off-by-one in usages of the name_of_file variable. #398

Merged

This was referenced Jun 20, 2019

xdvipdfmx's PNG loader does not handle images with color bit depth < 8 #399

Closed

Rare segfault in isOpenTypeMathFont #400

Closed

Mrmaxmeier mentioned this issue Jun 28, 2019

Fix uninitialized reads encountered in the arXiv dataset #401

Merged

XVilka mentioned this issue Aug 20, 2019

Exporting to PDF and ePub rust-lang/mdBook#815

Open

XVilka mentioned this issue Oct 2, 2019

Use c2rust for the rest of the C code. remacs/remacs#1544

Open

pkgw closed this as completed Jan 16, 2021
