
Commit 93662e7

aarch64: add vector routines (among other goodies)
This PR doesn't just add `aarch64`-specific code; it refactors pretty much everything about how the code is organized. There are big perf wins for `aarch64` (see benchmark results below), and also latency improvements across the board. A brief summary of the changes in this PR:

* I've added `aarch64` NEON vector implementations for `memchr`, `memrchr`, `memchr2`, `memrchr2`, `memchr3`, `memrchr3` and `memmem`. This should lead to massive speed improvements on an increasingly popular target, due in large part to Apple silicon.
* I've added `wasm32` simd128 vector implementations for `memchr`, `memrchr`, `memchr2`, `memrchr2`, `memchr3` and `memrchr3`. (alexcrichton previously contributed a vector implementation for `memmem` and that remains.)
* `x86_64` has no real additions other than the `memchr_iter(needle, haystack).count()` specialization. It already has SSE2 and AVX2 implementations of `memchr` (and friends) and `memmem`. It uses AVX2 automatically via runtime inspection of what the current CPU supports, so there is no need to compile with the `avx2` target feature enabled.
* I've replaced the Criterion benchmark suite with one built on [rebar](https://github.com/BurntSushi/rebar). While I designed rebar for regex engines, it can be used for [any substring or multi-substring search task](https://github.com/BurntSushi/rebar/blob/45afe89f437173d2dd970fee7d7f1db5d0e05588/BYOB.md).
* I've added a new `arch` sub-module that exposes a lot of the internal routines (including target-specific routines) used to implement `memchr` and `memmem`. This module is part of a major refactoring of how this crate is organized, and it seemed prudent to expose the internals since their APIs are pretty straightforward. That is, there isn't a huge API design space IMO. This module includes scalar substring search implementations of Shift-Or, Rabin-Karp and Two-Way.
* As a result of the refactoring mentioned above, most of the conditional compilation stuff has been pushed down and mostly abstracted away. Moreover, since each implementation now has its own proper API surface that is uniform across other implementations, each one can easily be tested independently. Because of this, I was able to remove the reliance on the variety of custom `cfg` knobs that the previous version of `memchr` set up in its build script. This in turn **allowed me to remove the build script entirely.** Given the ubiquity of this crate, this may lead to compile time improvements downstream. (Likely small in each individual case but perhaps large in aggregate.) I can't promise that a build script will never re-appear, but I'll try to resist adding one in the future if possible.
* Despite the above, compile times for this crate have sadly increased slightly. Namely, a fresh `time rebar build -e '^rust/memchrold/memmem/prebuilt$'` reports 0.944 seconds on my system while a fresh `time rebar build -e '^rust/memchr/memmem/prebuilt$'` reports 1.164 seconds. This is on `x86_64`, where no real additional code was added. This could be because of the "nicer" abstractions now present in the `arch` sub-module or perhaps how the internals are structured. (Previously there were multiple monomorphic implementations of `memchr`, for example, and now there is a single generic implementation that the compiler monomorphizes automatically. Perhaps that is more expensive?)
* I've specialized `memchr_iter(needle, haystack).count()` to use a different vector implementation that only counts matches instead of reporting the offset of each match. This can make *huge* (potentially over an order of magnitude) differences when counting the number of matches of a frequently (even semi-frequently) occurring byte in a large haystack. This is effectively what the [`bytecount`](https://crates.io/crates/bytecount) crate does (which is what ripgrep currently uses to compute line numbers for matches), but the marginal cost of adding it to the `memchr` crate was very low. So I did. And I plan to move ripgrep to using `memchr_iter(needle, haystack).count()`. (Also, the benchmarks below suggest that the counting implementation I wrote is faster than the one in `bytecount` in some cases that look relevant for ripgrep. This was surprising to me.)
* I've added an `alloc` feature which permits compiling this crate without the standard library but with the `alloc` crate. This crate is designed through-and-through to work in a core-only context, so this doesn't unlock much compared to just disabling the `std` feature. It adds a couple of APIs requiring allocation (like `memmem::Finder::into_owned`) and other things like `arch::all::shiftor`, which really wants an allocation to store its bit-parallel state machine.
* The `libc` feature is **DEPRECATED** and is now a no-op. I don't think there is any real benefit to it any more.
* A new disabled-by-default `logging` feature has been added. When enabled, this crate will emit a smattering of log messages. Usually these messages indicate what kind of strategy is selected, for example, whether a vector or scalar algorithm is used for substring search.

On `x86_64`, here are the differences across the board from the status quo, showing only measurements with a 1.2x (or greater) difference.
```
$ rebar diff tmp/old.csv tmp/new.csv -t 1.2 -e memmem -E oneshot
benchmark                                         engine                       tmp/old.csv          tmp/new.csv
---------                                         ------                       -----------          -----------
memmem/code/rust-library-never-fn-strength        rust/memchr/memmem/prebuilt  42.8 GB/s (1.25x)    53.6 GB/s (1.00x)
memmem/code/rust-library-never-fn-strength-paren  rust/memchr/memmem/prebuilt  40.8 GB/s (1.32x)    53.8 GB/s (1.00x)
memmem/code/rust-library-never-fn-quux            rust/memchr/memmem/prebuilt  40.5 GB/s (1.37x)    55.6 GB/s (1.00x)
memmem/code/rust-library-rare-fn-from-str         rust/memchr/memmem/prebuilt  39.3 GB/s (1.37x)    53.8 GB/s (1.00x)
memmem/code/rust-library-common-fn-is-empty       rust/memchr/memmem/prebuilt  40.5 GB/s (1.30x)    52.6 GB/s (1.00x)
memmem/code/rust-library-common-fn                rust/memchr/memmem/prebuilt  21.6 GB/s (1.27x)    27.5 GB/s (1.00x)
memmem/pathological/rare-repeated-huge-tricky     rust/memchr/memmem/prebuilt  40.9 GB/s (1.55x)    63.4 GB/s (1.00x)
memmem/pathological/rare-repeated-small-match     rust/memchr/memmem/prebuilt  1468.7 MB/s (1.23x)  1811.4 MB/s (1.00x)
memmem/sliceslice/short                           rust/memchr/memmem/prebuilt  14.74ms (2.08x)      7.08ms (1.00x)
memmem/sliceslice/seemingly-random                rust/memchr/memmem/prebuilt  9.1 MB/s (1.23x)     11.2 MB/s (1.00x)
memmem/sliceslice/i386                            rust/memchr/memmem/prebuilt  41.4 MB/s (1.35x)    55.8 MB/s (1.00x)
memmem/subtitles/common/huge-en-you               rust/memchr/memmem/prebuilt  10.7 GB/s (1.26x)    13.5 GB/s (1.00x)
memmem/subtitles/common/huge-zh-that              rust/memchr/memmem/prebuilt  25.2 GB/s (1.49x)    37.5 GB/s (1.00x)
memmem/subtitles/never/huge-en-john-watson        rust/memchr/memmem/prebuilt  42.9 GB/s (1.48x)    63.6 GB/s (1.00x)
memmem/subtitles/never/huge-en-all-common-bytes   rust/memchr/memmem/prebuilt  41.9 GB/s (1.26x)    52.7 GB/s (1.00x)
memmem/subtitles/never/teeny-en-all-common-bytes  rust/memchr/memmem/prebuilt  1161.0 MB/s (1.53x)  1780.2 MB/s (1.00x)
memmem/subtitles/never/teeny-en-some-rare-bytes   rust/memchr/memmem/prebuilt  1161.0 MB/s (1.53x)  1780.2 MB/s (1.00x)
memmem/subtitles/never/teeny-en-two-space         rust/memchr/memmem/prebuilt  1161.0 MB/s (1.53x)  1780.2 MB/s (1.00x)
memmem/subtitles/never/huge-ru-john-watson        rust/memchr/memmem/prebuilt  40.6 GB/s (1.56x)    63.5 GB/s (1.00x)
memmem/subtitles/never/teeny-ru-john-watson       rust/memchr/memmem/prebuilt  1741.5 MB/s (1.44x)  2.4 GB/s (1.00x)
memmem/subtitles/never/huge-zh-john-watson        rust/memchr/memmem/prebuilt  41.1 GB/s (1.46x)    59.9 GB/s (1.00x)
memmem/subtitles/never/teeny-zh-john-watson       rust/memchr/memmem/prebuilt  1285.4 MB/s (1.53x)  1970.9 MB/s (1.00x)
memmem/subtitles/rare/huge-en-sherlock-holmes     rust/memchr/memmem/prebuilt  41.9 GB/s (1.52x)    63.5 GB/s (1.00x)
memmem/subtitles/rare/huge-en-sherlock            rust/memchr/memmem/prebuilt  41.9 GB/s (1.46x)    61.3 GB/s (1.00x)
memmem/subtitles/rare/huge-en-medium-needle       rust/memchr/memmem/prebuilt  38.3 GB/s (1.46x)    55.9 GB/s (1.00x)
memmem/subtitles/rare/huge-en-long-needle         rust/memchr/memmem/prebuilt  2.5 GB/s (17.34x)    44.0 GB/s (1.00x)
memmem/subtitles/rare/huge-en-huge-needle         rust/memchr/memmem/prebuilt  2.3 GB/s (20.24x)    45.7 GB/s (1.00x)
memmem/subtitles/rare/teeny-en-sherlock-holmes    rust/memchr/memmem/prebuilt  1068.1 MB/s (1.47x)  1570.8 MB/s (1.00x)
memmem/subtitles/rare/teeny-en-sherlock           rust/memchr/memmem/prebuilt  953.7 MB/s (1.27x)   1213.8 MB/s (1.00x)
memmem/subtitles/rare/teeny-ru-sherlock-holmes    rust/memchr/memmem/prebuilt  1430.5 MB/s (1.47x)  2.1 GB/s (1.00x)
memmem/subtitles/rare/teeny-ru-sherlock           rust/memchr/memmem/prebuilt  1213.8 MB/s (1.32x)  1602.2 MB/s (1.00x)
memmem/subtitles/rare/huge-zh-sherlock-holmes     rust/memchr/memmem/prebuilt  41.8 GB/s (1.33x)    55.5 GB/s (1.00x)
memmem/subtitles/rare/huge-zh-sherlock            rust/memchr/memmem/prebuilt  43.0 GB/s (1.38x)    59.4 GB/s (1.00x)
memmem/subtitles/rare/teeny-zh-sherlock           rust/memchr/memmem/prebuilt  895.9 MB/s (1.27x)   1137.1 MB/s (1.00x)
```

A comparison with the [`sliceslice`](https://crates.io/crates/sliceslice) crate for just substring search. We only include measurements with a 1.2x difference or greater.
```
$ rebar cmp benchmarks/record/x86_64/2023-08-26.csv -e sliceslice/memmem/prebuilt -e rust/memchr/memmem/prebuilt -t 1.2
benchmark                                                   rust/memchr/memmem/prebuilt  rust/sliceslice/memmem/prebuilt
---------                                                   ---------------------------  -------------------------------
memmem/byterank/binary                                      4.4 GB/s (1.32x)             5.8 GB/s (1.00x)
memmem/code/rust-library-never-fn-strength                  53.6 GB/s (1.00x)            39.8 GB/s (1.35x)
memmem/code/rust-library-never-fn-strength-paren            53.8 GB/s (1.00x)            39.7 GB/s (1.35x)
memmem/code/rust-library-never-fn-quux                      55.6 GB/s (1.00x)            38.7 GB/s (1.44x)
memmem/code/rust-library-rare-fn-from-str                   53.8 GB/s (2.65x)            142.7 GB/s (1.00x)
memmem/pathological/md5-huge-no-hash                        50.1 GB/s (1.00x)            25.7 GB/s (1.95x)
memmem/pathological/md5-huge-last-hash                      47.6 GB/s (1.00x)            27.7 GB/s (1.72x)
memmem/pathological/rare-repeated-huge-tricky               63.4 GB/s (1.00x)            41.9 GB/s (1.51x)
memmem/pathological/rare-repeated-small-tricky              25.2 GB/s (1.32x)            33.3 GB/s (1.00x)
memmem/pathological/defeat-simple-vector-alphabet           4.1 GB/s (1.65x)             6.7 GB/s (1.00x)
memmem/pathological/defeat-simple-vector-freq-alphabet      19.2 GB/s (1.00x)            2.6 GB/s (7.33x)
memmem/pathological/defeat-simple-vector-repeated-alphabet  1234.5 MB/s (1.00x)          508.7 MB/s (2.43x)
memmem/sliceslice/short                                     7.08ms (1.00x)               14.10ms (1.99x)
memmem/sliceslice/i386                                      55.8 MB/s (1.00x)            39.6 MB/s (1.41x)
memmem/subtitles/never/huge-en-john-watson                  63.6 GB/s (1.00x)            41.7 GB/s (1.53x)
memmem/subtitles/never/huge-en-all-common-bytes             52.7 GB/s (1.00x)            42.6 GB/s (1.24x)
memmem/subtitles/never/teeny-en-john-watson                 1027.0 MB/s (2.17x)          2.2 GB/s (1.00x)
memmem/subtitles/never/teeny-en-all-common-bytes            1780.2 MB/s (1.25x)          2.2 GB/s (1.00x)
memmem/subtitles/never/teeny-en-some-rare-bytes             1780.2 MB/s (1.25x)          2.2 GB/s (1.00x)
memmem/subtitles/never/teeny-en-two-space                   1780.2 MB/s (1.25x)          2.2 GB/s (1.00x)
memmem/subtitles/never/huge-ru-john-watson                  63.5 GB/s (1.00x)            12.7 GB/s (4.99x)
memmem/subtitles/never/teeny-ru-john-watson                 2.4 GB/s (1.23x)             3.0 GB/s (1.00x)
memmem/subtitles/never/huge-zh-john-watson                  59.9 GB/s (1.00x)            41.1 GB/s (1.46x)
memmem/subtitles/never/teeny-zh-john-watson                 1970.9 MB/s (1.25x)          2.4 GB/s (1.00x)
memmem/subtitles/rare/huge-en-sherlock-holmes               63.5 GB/s (1.00x)            41.6 GB/s (1.53x)
memmem/subtitles/rare/huge-en-sherlock                      61.3 GB/s (1.00x)            43.0 GB/s (1.42x)
memmem/subtitles/rare/huge-en-medium-needle                 55.9 GB/s (1.00x)            25.7 GB/s (2.17x)
memmem/subtitles/rare/huge-en-long-needle                   44.0 GB/s (1.00x)            25.9 GB/s (1.70x)
memmem/subtitles/rare/huge-en-huge-needle                   45.7 GB/s (1.00x)            29.3 GB/s (1.56x)
memmem/subtitles/rare/teeny-en-sherlock                     1213.8 MB/s (1.37x)          1668.9 MB/s (1.00x)
memmem/subtitles/rare/huge-ru-sherlock-holmes               40.7 GB/s (1.00x)            15.2 GB/s (2.67x)
memmem/subtitles/rare/teeny-ru-sherlock                     1602.2 MB/s (1.56x)          2.4 GB/s (1.00x)
memmem/subtitles/rare/huge-zh-sherlock-holmes               55.5 GB/s (1.00x)            26.6 GB/s (2.09x)
memmem/subtitles/rare/huge-zh-sherlock                      59.4 GB/s (1.00x)            42.4 GB/s (1.40x)
memmem/subtitles/rare/teeny-zh-sherlock-holmes              1055.9 MB/s (1.87x)          1970.9 MB/s (1.00x)
memmem/subtitles/rare/teeny-zh-sherlock                     1137.1 MB/s (1.86x)          2.1 GB/s (1.00x)
```

Differences between this crate's substring search implementation and `memmem` as provided by GNU libc, showing only measurements with a 2x difference or greater.
```
$ rebar cmp benchmarks/record/x86_64/2023-08-26.csv -e libc/memmem/oneshot -e rust/memchr/memmem/oneshot -t 2
benchmark                                         libc/memmem/oneshot  rust/memchr/memmem/oneshot
---------                                         -------------------  --------------------------
memmem/code/rust-library-never-fn-strength        11.4 GB/s (4.75x)    54.1 GB/s (1.00x)
memmem/code/rust-library-never-fn-strength-paren  12.4 GB/s (4.36x)    54.0 GB/s (1.00x)
memmem/code/rust-library-never-fn-quux            8.1 GB/s (6.91x)     55.8 GB/s (1.00x)
memmem/code/rust-library-rare-fn-from-str         15.0 GB/s (3.59x)    53.8 GB/s (1.00x)
memmem/code/rust-library-common-fn-is-empty       12.5 GB/s (4.16x)    51.9 GB/s (1.00x)
memmem/code/rust-library-common-fn                2.2 GB/s (5.89x)     13.0 GB/s (1.00x)
memmem/code/rust-library-common-let               3.2 GB/s (2.65x)     8.5 GB/s (1.00x)
memmem/pathological/rare-repeated-huge-tricky     17.8 GB/s (3.56x)    63.3 GB/s (1.00x)
memmem/pathological/rare-repeated-huge-match      718.0 MB/s (1.00x)   289.1 MB/s (2.48x)
memmem/pathological/rare-repeated-small-match     707.1 MB/s (1.00x)   303.1 MB/s (2.33x)
memmem/subtitles/common/huge-en-that              3.7 GB/s (4.22x)     15.7 GB/s (1.00x)
memmem/subtitles/common/huge-en-one-space         1543.9 MB/s (1.00x)  541.6 MB/s (2.85x)
memmem/subtitles/common/huge-ru-that              2.7 GB/s (4.22x)     11.6 GB/s (1.00x)
memmem/subtitles/common/huge-ru-not               2.0 GB/s (2.47x)     5.0 GB/s (1.00x)
memmem/subtitles/common/huge-ru-one-space         2.9 GB/s (1.00x)     1081.0 MB/s (2.71x)
memmem/subtitles/common/huge-zh-that              4.2 GB/s (3.20x)     13.4 GB/s (1.00x)
memmem/subtitles/common/huge-zh-do-not            2.6 GB/s (2.40x)     6.3 GB/s (1.00x)
memmem/subtitles/common/huge-zh-one-space         5.7 GB/s (1.00x)     2.4 GB/s (2.38x)
memmem/subtitles/never/huge-en-john-watson        15.4 GB/s (4.12x)    63.3 GB/s (1.00x)
memmem/subtitles/never/huge-en-all-common-bytes   11.9 GB/s (4.41x)    52.2 GB/s (1.00x)
memmem/subtitles/never/huge-en-some-rare-bytes    11.0 GB/s (5.77x)    63.6 GB/s (1.00x)
memmem/subtitles/never/huge-en-two-space          2.3 GB/s (27.77x)    63.5 GB/s (1.00x)
memmem/subtitles/never/huge-ru-john-watson        5.2 GB/s (11.56x)    59.9 GB/s (1.00x)
memmem/subtitles/never/huge-zh-john-watson        20.7 GB/s (2.86x)    59.2 GB/s (1.00x)
memmem/subtitles/rare/huge-en-sherlock-holmes     17.0 GB/s (3.71x)    63.1 GB/s (1.00x)
memmem/subtitles/rare/huge-en-sherlock            11.8 GB/s (5.18x)    60.9 GB/s (1.00x)
memmem/subtitles/rare/huge-en-huge-needle         19.3 GB/s (2.02x)    38.9 GB/s (1.00x)
memmem/subtitles/rare/huge-ru-sherlock-holmes     6.5 GB/s (9.47x)     61.5 GB/s (1.00x)
memmem/subtitles/rare/huge-ru-sherlock            3.8 GB/s (16.23x)    61.6 GB/s (1.00x)
memmem/subtitles/rare/huge-zh-sherlock            10.8 GB/s (5.48x)    59.1 GB/s (1.00x)
```

Differences with the [`bytecount`](https://crates.io/crates/bytecount) crate, now that `memchr_iter(needle, haystack).count()` is specialized to its own vector implementation that just counts the number of matches (instead of reporting the offset of each match). The throughput improvements over `bytecount` on large haystacks are the most interesting IMO. (I was somewhat surprised by this, as `bytecount` seems to do something clever while `memchr_iter(needle, haystack).count()` is basically just `memchr` with the branching for reporting matches removed.) Either way, I expect this to translate directly to improvements in ripgrep, although I haven't measured that yet.
```
$ rebar cmp benchmarks/record/x86_64/2023-08-26.csv -e '^rust/bytecount/memchr/oneshot$' -e '^rust/memchr/memchr/onlycount$'
benchmark                          rust/bytecount/memchr/oneshot  rust/memchr/memchr/onlycount
---------                          -----------------------------  ----------------------------
memchr/sherlock/common/huge1       28.5 GB/s (1.94x)              55.3 GB/s (1.00x)
memchr/sherlock/common/small1      17.7 GB/s (1.25x)              22.1 GB/s (1.00x)
memchr/sherlock/common/tiny1       4.3 GB/s (1.00x)               3.8 GB/s (1.13x)
memchr/sherlock/never/huge1        28.4 GB/s (2.09x)              59.3 GB/s (1.00x)
memchr/sherlock/never/small1       17.7 GB/s (1.25x)              22.1 GB/s (1.00x)
memchr/sherlock/never/tiny1        4.3 GB/s (1.00x)               3.8 GB/s (1.13x)
memchr/sherlock/never/empty1       11.00ns (1.00x)                11.00ns (1.00x)
memchr/sherlock/rare/huge1         28.5 GB/s (1.94x)              55.2 GB/s (1.00x)
memchr/sherlock/rare/small1        17.7 GB/s (1.25x)              22.1 GB/s (1.00x)
memchr/sherlock/rare/tiny1         4.3 GB/s (1.00x)               3.8 GB/s (1.13x)
memchr/sherlock/uncommon/huge1     26.9 GB/s (2.20x)              59.3 GB/s (1.00x)
memchr/sherlock/uncommon/small1    17.7 GB/s (1.25x)              22.1 GB/s (1.00x)
memchr/sherlock/uncommon/tiny1     4.3 GB/s (1.00x)               3.8 GB/s (1.13x)
memchr/sherlock/verycommon/huge1   28.4 GB/s (2.09x)              59.3 GB/s (1.00x)
memchr/sherlock/verycommon/small1  17.7 GB/s (1.25x)              22.1 GB/s (1.00x)
```

On `aarch64`, here are the differences across the board from the status quo. Note that here, I've only included measurements with a 4x (or greater) difference from the old `memchr` crate. Otherwise, pretty much every benchmark would show up, since nearly all of them improved sizeably over the old version. (Previously, `aarch64` had no vector implementations at all.)
```
$ rebar diff tmp/old-aarch64.csv tmp/new-aarch64.csv -t 4 -E oneshot
benchmark                                         engine                       tmp/old-aarch64.csv   tmp/new-aarch64.csv
---------                                         ------                       -------------------   -------------------
memchr/sherlock/never/huge2                       rust/memchr/memchr2          10.8 GB/s (4.27x)     46.3 GB/s (1.00x)
memchr/sherlock/never/small1                      rust/memchr/memchr/prebuilt  15.1 GB/s (41.00x)    618.4 GB/s (1.00x)
memchr/sherlock/never/small1                      rust/memchr/memrchr          14.7 GB/s (42.00x)    618.4 GB/s (1.00x)
memchr/sherlock/never/small2                      rust/memchr/memchr2          7.5 GB/s (83.00x)     618.4 GB/s (1.00x)
memchr/sherlock/never/small2                      rust/memchr/memrchr2         7.5 GB/s (83.00x)     618.4 GB/s (1.00x)
memchr/sherlock/never/small3                      rust/memchr/memchr3          7.5 GB/s (83.00x)     618.4 GB/s (1.00x)
memchr/sherlock/never/small3                      rust/memchr/memrchr3         7.5 GB/s (83.00x)     618.4 GB/s (1.00x)
memchr/sherlock/rare/small1                       rust/memchr/memchr/prebuilt  14.7 GB/s (42.00x)    618.4 GB/s (1.00x)
memchr/sherlock/rare/small1                       rust/memchr/memrchr          14.7 GB/s (42.00x)    618.4 GB/s (1.00x)
memchr/sherlock/rare/small2                       rust/memchr/memchr2          7.5 GB/s (83.00x)     618.4 GB/s (1.00x)
memchr/sherlock/rare/small2                       rust/memchr/memrchr2         7.5 GB/s (83.00x)     618.4 GB/s (1.00x)
memchr/sherlock/uncommon/tiny1                    rust/memchr/memchr/prebuilt  1605.0 MB/s (41.00x)  64.3 GB/s (1.00x)
memchr/sherlock/uncommon/tiny1                    rust/memchr/memrchr          1605.0 MB/s (41.00x)  64.3 GB/s (1.00x)
memmem/code/rust-library-never-fn-strength        rust/memchr/memmem/prebuilt  7.1 GB/s (4.17x)      29.6 GB/s (1.00x)
memmem/code/rust-library-never-fn-strength-paren  rust/memchr/memmem/prebuilt  6.9 GB/s (4.19x)      29.0 GB/s (1.00x)
memmem/code/rust-library-rare-fn-from-str         rust/memchr/memmem/prebuilt  6.5 GB/s (4.42x)      28.7 GB/s (1.00x)
memmem/code/rust-library-common-fn                rust/memchr/memmem/prebuilt  3.2 GB/s (5.58x)      18.0 GB/s (1.00x)
memmem/code/rust-library-common-let               rust/memchr/memmem/prebuilt  2012.9 MB/s (6.45x)   12.7 GB/s (1.00x)
memmem/pathological/md5-huge-no-hash              rust/memchr/memmem/prebuilt  1070.2 MB/s (24.69x)  25.8 GB/s (1.00x)
memmem/pathological/md5-huge-last-hash            rust/memchr/memmem/prebuilt  1148.2 MB/s (22.85x)  25.6 GB/s (1.00x)
memmem/pathological/rare-repeated-huge-tricky     rust/memchr/memmem/prebuilt  1299.3 MB/s (23.87x)  30.3 GB/s (1.00x)
memmem/pathological/rare-repeated-small-tricky    rust/memchr/memmem/prebuilt  1146.0 MB/s (19.83x)  22.2 GB/s (1.00x)
memmem/sliceslice/seemingly-random                rust/memchr/memmem/prebuilt  1485.7 KB/s (4.13x)   6.0 MB/s (1.00x)
memmem/sliceslice/i386                            rust/memchr/memmem/prebuilt  6.0 MB/s (5.07x)      30.3 MB/s (1.00x)
memmem/subtitles/common/huge-en-that              rust/memchr/memmem/prebuilt  1418.2 MB/s (11.50x)  15.9 GB/s (1.00x)
memmem/subtitles/common/huge-ru-that              rust/memchr/memmem/prebuilt  1389.1 MB/s (13.44x)  18.2 GB/s (1.00x)
memmem/subtitles/common/huge-ru-not               rust/memchr/memmem/prebuilt  1482.7 MB/s (7.06x)   10.2 GB/s (1.00x)
memmem/subtitles/never/huge-en-all-common-bytes   rust/memchr/memmem/prebuilt  1813.7 MB/s (12.81x)  22.7 GB/s (1.00x)
memmem/subtitles/never/huge-en-two-space          rust/memchr/memmem/prebuilt  1370.2 MB/s (25.23x)  33.8 GB/s (1.00x)
memmem/subtitles/never/teeny-en-two-space         rust/memchr/memmem/prebuilt  651.3 MB/s (41.00x)   26.1 GB/s (1.00x)
memmem/subtitles/rare/huge-en-sherlock            rust/memchr/memmem/prebuilt  7.0 GB/s (4.40x)      30.6 GB/s (1.00x)
memmem/subtitles/rare/huge-en-medium-needle       rust/memchr/memmem/prebuilt  6.4 GB/s (4.43x)      28.3 GB/s (1.00x)
memmem/subtitles/rare/huge-en-long-needle         rust/memchr/memmem/prebuilt  7.1 GB/s (4.64x)      32.8 GB/s (1.00x)
memmem/subtitles/rare/teeny-en-sherlock-holmes    rust/memchr/memmem/prebuilt  651.3 MB/s (41.00x)   26.1 GB/s (1.00x)
memmem/subtitles/rare/teeny-en-sherlock           rust/memchr/memmem/prebuilt  651.3 MB/s (41.00x)   26.1 GB/s (1.00x)
memmem/subtitles/rare/teeny-ru-sherlock-holmes    rust/memchr/memmem/prebuilt  953.7 MB/s (42.00x)   39.1 GB/s (1.00x)
memmem/subtitles/rare/teeny-ru-sherlock           rust/memchr/memmem/prebuilt  976.9 MB/s (41.00x)   39.1 GB/s (1.00x)
memmem/subtitles/rare/huge-zh-sherlock-holmes     rust/memchr/memmem/prebuilt  4.1 GB/s (7.06x)      28.8 GB/s (1.00x)
memmem/subtitles/rare/huge-zh-sherlock            rust/memchr/memmem/prebuilt  6.1 GB/s (4.81x)      29.6 GB/s (1.00x)
memmem/subtitles/rare/teeny-zh-sherlock-holmes    rust/memchr/memmem/prebuilt  721.1 MB/s (41.00x)   28.9 GB/s (1.00x)
memmem/subtitles/rare/teeny-zh-sherlock           rust/memchr/memmem/prebuilt  721.1 MB/s (41.00x)   28.9 GB/s (1.00x)
```

A comparison with the [`sliceslice`](https://crates.io/crates/sliceslice) crate, which has its own custom `aarch64` vector implementation of substring search. We only show measurements with a 1.2x or greater difference.

```
$ rebar cmp benchmarks/record/aarch64/2023-08-26.csv -e sliceslice/memmem/prebuilt -e rust/memchr/memmem/prebuilt -t 1.2
benchmark                                                   rust/memchr/memmem/prebuilt  rust/sliceslice/memmem/prebuilt
---------                                                   ---------------------------  -------------------------------
memmem/byterank/binary                                      3.1 GB/s (1.00x)             1586.4 MB/s (2.01x)
memmem/code/rust-library-never-fn-strength                  29.6 GB/s (1.00x)            16.1 GB/s (1.84x)
memmem/code/rust-library-never-fn-strength-paren            29.0 GB/s (1.00x)            15.6 GB/s (1.86x)
memmem/code/rust-library-never-fn-quux                      30.2 GB/s (1.00x)            15.1 GB/s (2.00x)
memmem/code/rust-library-rare-fn-from-str                   28.7 GB/s (1.93x)            55.5 GB/s (1.00x)
memmem/pathological/md5-huge-no-hash                        25.8 GB/s (1.00x)            13.6 GB/s (1.89x)
memmem/pathological/md5-huge-last-hash                      25.6 GB/s (1.00x)            13.5 GB/s (1.90x)
memmem/pathological/rare-repeated-huge-tricky               30.3 GB/s (1.00x)            16.6 GB/s (1.83x)
memmem/pathological/rare-repeated-small-tricky              22.2 GB/s (1.00x)            11.2 GB/s (1.98x)
memmem/pathological/defeat-simple-vector-alphabet           3.0 GB/s (1.00x)             1114.1 MB/s (2.77x)
memmem/pathological/defeat-simple-vector-freq-alphabet      14.8 GB/s (1.00x)            2.2 GB/s (6.72x)
memmem/pathological/defeat-simple-vector-repeated-alphabet  835.1 MB/s (1.00x)           173.8 MB/s (4.80x)
memmem/sliceslice/short                                     7.33ms (1.00x)               36.55ms (4.99x)
memmem/sliceslice/seemingly-random                          6.0 MB/s (1.00x)             3.6 MB/s (1.67x)
memmem/sliceslice/i386                                      30.3 MB/s (1.00x)            15.1 MB/s (2.00x)
memmem/subtitles/never/huge-en-john-watson                  30.9 GB/s (1.00x)            16.6 GB/s (1.86x)
memmem/subtitles/never/huge-en-all-common-bytes             22.7 GB/s (1.00x)            13.8 GB/s (1.64x)
memmem/subtitles/never/huge-en-some-rare-bytes              30.9 GB/s (1.00x)            16.6 GB/s (1.86x)
memmem/subtitles/never/huge-en-two-space                    33.8 GB/s (1.00x)            16.6 GB/s (2.03x)
memmem/subtitles/never/huge-ru-john-watson                  30.3 GB/s (1.00x)            7.1 GB/s (4.25x)
memmem/subtitles/never/huge-zh-john-watson                  29.2 GB/s (1.00x)            16.0 GB/s (1.83x)
memmem/subtitles/rare/huge-en-sherlock-holmes               30.3 GB/s (1.00x)            16.3 GB/s (1.86x)
memmem/subtitles/rare/huge-en-sherlock                      30.6 GB/s (1.00x)            16.6 GB/s (1.85x)
memmem/subtitles/rare/huge-en-medium-needle                 28.3 GB/s (1.00x)            12.4 GB/s (2.28x)
memmem/subtitles/rare/huge-en-long-needle                   32.8 GB/s (1.00x)            15.7 GB/s (2.08x)
memmem/subtitles/rare/huge-en-huge-needle                   32.9 GB/s (1.00x)            16.1 GB/s (2.05x)
memmem/subtitles/rare/huge-ru-sherlock-holmes               30.3 GB/s (1.00x)            8.0 GB/s (3.80x)
memmem/subtitles/rare/huge-ru-sherlock                      30.2 GB/s (1.00x)            10.1 GB/s (3.00x)
memmem/subtitles/rare/huge-zh-sherlock-holmes               28.8 GB/s (1.00x)            14.7 GB/s (1.95x)
memmem/subtitles/rare/huge-zh-sherlock                      29.6 GB/s (1.00x)            14.0 GB/s (2.12x)
```

Differences between this crate's substring search implementation and `memmem` as provided by macOS's libc, showing only measurements with a 2x difference or greater. This is what utter destruction looks like. (I'm not sure what's going on in benchmarks like `memmem/subtitles/rare/teeny-en-sherlock-holmes`. It's a tiny haystack and macOS seems to measure either 1ns or 41ns. I wonder if there's something odd about time precision on macOS? You can see the reverse happen in `memmem/subtitles/rare/teeny-zh-sherlock`.)
```
$ rebar cmp benchmarks/record/aarch64/2023-08-26.csv -e libc/memmem/oneshot -e rust/memchr/memmem/oneshot -t 2
benchmark                                                   libc/memmem/oneshot   rust/memchr/memmem/oneshot
---------                                                   -------------------   --------------------------
memmem/byterank/binary                                      626.1 MB/s (5.11x)    3.1 GB/s (1.00x)
memmem/code/rust-library-never-fn-strength                  1320.8 MB/s (22.98x)  29.6 GB/s (1.00x)
memmem/code/rust-library-never-fn-strength-paren            1320.8 MB/s (22.49x)  29.0 GB/s (1.00x)
memmem/code/rust-library-never-fn-quux                      1332.0 MB/s (23.25x)  30.2 GB/s (1.00x)
memmem/code/rust-library-rare-fn-from-str                   1442.0 MB/s (20.37x)  28.7 GB/s (1.00x)
memmem/code/rust-library-common-fn-is-empty                 1320.8 MB/s (22.02x)  28.4 GB/s (1.00x)
memmem/code/rust-library-common-fn                          1320.8 MB/s (11.44x)  14.8 GB/s (1.00x)
memmem/code/rust-library-common-let                         1114.7 MB/s (8.59x)   9.4 GB/s (1.00x)
memmem/pathological/md5-huge-no-hash                        994.0 MB/s (26.39x)   25.6 GB/s (1.00x)
memmem/pathological/md5-huge-last-hash                      994.3 MB/s (26.39x)   25.6 GB/s (1.00x)
memmem/pathological/rare-repeated-huge-tricky               1670.8 MB/s (18.56x)  30.3 GB/s (1.00x)
memmem/pathological/rare-repeated-huge-match                1353.0 MB/s (1.00x)   378.5 MB/s (3.57x)
memmem/pathological/rare-repeated-small-tricky              1637.4 MB/s (13.88x)  22.2 GB/s (1.00x)
memmem/pathological/rare-repeated-small-match               1348.3 MB/s (1.00x)   394.5 MB/s (3.42x)
memmem/pathological/defeat-simple-vector-alphabet           568.1 MB/s (5.43x)    3.0 GB/s (1.00x)
memmem/pathological/defeat-simple-vector-freq-alphabet      1027.2 MB/s (14.55x)  14.6 GB/s (1.00x)
memmem/pathological/defeat-simple-vector-repeated-alphabet  173.8 MB/s (4.80x)    834.2 MB/s (1.00x)
memmem/subtitles/common/huge-en-that                        841.6 MB/s (13.19x)   10.8 GB/s (1.00x)
memmem/subtitles/common/huge-en-you                         1161.7 MB/s (4.00x)   4.5 GB/s (1.00x)
memmem/subtitles/common/huge-ru-that                        590.9 MB/s (19.48x)   11.2 GB/s (1.00x)
memmem/subtitles/common/huge-ru-not                         334.3 MB/s (18.62x)   6.1 GB/s (1.00x)
memmem/subtitles/common/huge-zh-that                        1340.1 MB/s (11.49x)  15.0 GB/s (1.00x)
memmem/subtitles/common/huge-zh-do-not                      858.5 MB/s (9.15x)    7.7 GB/s (1.00x)
memmem/subtitles/never/huge-en-john-watson                  1648.3 MB/s (19.14x)  30.8 GB/s (1.00x)
memmem/subtitles/never/huge-en-all-common-bytes             1075.4 MB/s (21.65x)  22.7 GB/s (1.00x)
memmem/subtitles/never/huge-en-some-rare-bytes              1655.7 MB/s (19.10x)  30.9 GB/s (1.00x)
memmem/subtitles/never/huge-en-two-space                    541.6 MB/s (63.83x)   33.8 GB/s (1.00x)
memmem/subtitles/never/teeny-en-two-space                   651.3 MB/s (41.00x)   26.1 GB/s (1.00x)
memmem/subtitles/never/huge-ru-john-watson                  427.0 MB/s (72.56x)   30.3 GB/s (1.00x)
memmem/subtitles/never/huge-zh-john-watson                  1155.4 MB/s (25.81x)  29.1 GB/s (1.00x)
memmem/subtitles/rare/huge-en-sherlock-holmes               1577.4 MB/s (19.60x)  30.2 GB/s (1.00x)
memmem/subtitles/rare/huge-en-sherlock                      1577.4 MB/s (19.78x)  30.5 GB/s (1.00x)
memmem/subtitles/rare/huge-en-medium-needle                 1155.6 MB/s (24.95x)  28.2 GB/s (1.00x)
memmem/subtitles/rare/huge-en-long-needle                   1488.8 MB/s (20.77x)  30.2 GB/s (1.00x)
memmem/subtitles/rare/huge-en-huge-needle                   1609.5 MB/s (17.27x)  27.1 GB/s (1.00x)
memmem/subtitles/rare/teeny-en-sherlock-holmes              26.1 GB/s (1.00x)     651.3 MB/s (41.00x)
memmem/subtitles/rare/huge-ru-sherlock-holmes               427.0 MB/s (72.41x)   30.2 GB/s (1.00x)
memmem/subtitles/rare/huge-ru-sherlock                      348.2 MB/s (91.21x)   31.0 GB/s (1.00x)
memmem/subtitles/rare/huge-zh-sherlock-holmes               955.8 MB/s (31.66x)   29.6 GB/s (1.00x)
memmem/subtitles/rare/huge-zh-sherlock                      853.4 MB/s (35.46x)   29.6 GB/s (1.00x)
memmem/subtitles/rare/teeny-zh-sherlock-holmes              28.9 GB/s (1.00x)     721.1 MB/s (41.00x)
memmem/subtitles/rare/teeny-zh-sherlock                     721.1 MB/s (41.00x)   28.9 GB/s (1.00x)
```

Differences with the [`bytecount`](https://crates.io/crates/bytecount) crate, now that `memchr_iter(needle, haystack).count()` is specialized to its own vector implementation that just counts the number of matches (instead of reporting the offset of each match).
```
$ rebar cmp benchmarks/record/aarch64/2023-08-26.csv -e '^rust/bytecount/memchr/oneshot$' -e '^rust/memchr/memchr/onlycount$'
benchmark                          rust/bytecount/memchr/oneshot  rust/memchr/memchr/onlycount
---------                          -----------------------------  ----------------------------
memchr/sherlock/common/huge1       29.5 GB/s (1.40x)              41.4 GB/s (1.00x)
memchr/sherlock/common/small1      618.4 GB/s (1.00x)             618.4 GB/s (1.00x)
memchr/sherlock/common/tiny1       64.3 GB/s (1.00x)              64.3 GB/s (1.00x)
memchr/sherlock/never/huge1        29.5 GB/s (1.40x)              41.4 GB/s (1.00x)
memchr/sherlock/never/small1       618.4 GB/s (1.00x)             618.4 GB/s (1.00x)
memchr/sherlock/never/tiny1        64.3 GB/s (1.00x)              64.3 GB/s (1.00x)
memchr/sherlock/never/empty1       1.00ns (1.00x)                 1.00ns (1.00x)
memchr/sherlock/rare/huge1         29.5 GB/s (1.40x)              41.4 GB/s (1.00x)
memchr/sherlock/rare/small1        618.4 GB/s (1.00x)             618.4 GB/s (1.00x)
memchr/sherlock/rare/tiny1         64.3 GB/s (1.00x)              64.3 GB/s (1.00x)
memchr/sherlock/uncommon/huge1     29.5 GB/s (1.40x)              41.4 GB/s (1.00x)
memchr/sherlock/uncommon/small1    618.4 GB/s (1.00x)             618.4 GB/s (1.00x)
memchr/sherlock/uncommon/tiny1     64.3 GB/s (1.00x)              64.3 GB/s (1.00x)
memchr/sherlock/verycommon/huge1   28.7 GB/s (1.44x)              41.4 GB/s (1.00x)
memchr/sherlock/verycommon/small1  618.4 GB/s (1.00x)             618.4 GB/s (1.00x)
```
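The counting specialization is described as basically `memchr` with the match-reporting branches removed. To illustrate why that helps, here is a minimal portable sketch under our own assumptions. It is not the crate's NEON/SSE2/AVX2 code. It swaps in a SWAR (SIMD-within-a-register) trick on 64-bit words, and `count_byte` is a hypothetical name, not part of the `memchr` API. Each 8-byte chunk is XORed with the needle broadcast to every byte, an exact "this byte is zero" mask is computed, and its set bits are popcounted. No offsets are produced and there is no per-match branch.

```rust
/// Counts how many times `needle` occurs in `haystack` without ever
/// computing match offsets. Hypothetical helper: a SWAR sketch only,
/// not the vector code used by the `memchr` crate.
fn count_byte(needle: u8, haystack: &[u8]) -> usize {
    // Low 7 bits of every byte set.
    const LO7: u64 = 0x7F7F_7F7F_7F7F_7F7F;
    // 0x01 in every byte; multiplying by it broadcasts a byte to all 8 lanes.
    const ONES: u64 = 0x0101_0101_0101_0101;

    let splat = ONES * u64::from(needle);
    let mut count = 0usize;
    let mut chunks = haystack.chunks_exact(8);
    for chunk in &mut chunks {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(chunk);
        // A byte of `x` is zero exactly where the haystack byte equals `needle`.
        let x = u64::from_ne_bytes(buf) ^ splat;
        // Per byte, (x & 0x7F) + 0x7F sets the high bit iff the low 7 bits are
        // nonzero, and the sum never carries into the next byte. OR-ing in `x`
        // and LO7 and then inverting leaves only the high bit of each zero byte.
        let zero_bytes = !(((x & LO7) + LO7) | x | LO7);
        count += zero_bytes.count_ones() as usize;
    }
    // Handle the final 0..=7 bytes one at a time.
    count += chunks.remainder().iter().filter(|&&b| b == needle).count();
    count
}
```

The loop body is the same whether a chunk holds zero matches or eight, which is what makes this style of counting fast on haystacks with frequent matches. In application code, the specialized path described in this PR is reached simply by calling `memchr::memchr_iter(b'\n', haystack).count()`.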
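The summary says the new `arch` module includes a scalar Shift-Or substring search, and that `arch::all::shiftor` wants an allocation for its bit-parallel state machine. The textbook Shift-Or (bitap) algorithm below shows what that state looks like. It is a sketch under our own assumptions: needles of 1 to 64 bytes, a heap-allocated table of 256 `u64` masks, and hypothetical names (`ShiftOr`, `find`). It is not the crate's implementation, whose details may differ.

```rust
/// Textbook Shift-Or (bitap) substring search. Hypothetical type, not memchr's API.
struct ShiftOr {
    /// masks[b] has bit i cleared iff needle[i] == b.
    masks: Vec<u64>,
    len: usize,
}

impl ShiftOr {
    /// Returns None for an empty needle or one longer than 64 bytes,
    /// since every needle position needs its own bit in a u64.
    fn new(needle: &[u8]) -> Option<ShiftOr> {
        if needle.is_empty() || needle.len() > 64 {
            return None;
        }
        let mut masks = vec![!0u64; 256];
        for (i, &b) in needle.iter().enumerate() {
            masks[usize::from(b)] &= !(1u64 << i);
        }
        Some(ShiftOr { masks, len: needle.len() })
    }

    /// Returns the start offset of the first occurrence of the needle.
    fn find(&self, haystack: &[u8]) -> Option<usize> {
        // Bit i of `state` is 0 iff needle[..=i] ends at the current byte.
        let mut state = !0u64;
        let accept = 1u64 << (self.len - 1);
        for (i, &b) in haystack.iter().enumerate() {
            state = (state << 1) | self.masks[usize::from(b)];
            if state & accept == 0 {
                return Some(i + 1 - self.len);
            }
        }
        None
    }
}
```

Each haystack byte costs one shift, one table lookup and one OR regardless of needle length, which is why this approach suits short needles. The 64-byte limit in this sketch comes from keeping the whole state in a single `u64`.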
1 parent abcc473 commit 93662e7


203 files changed (+41273, -397169 lines changed)


.github/workflows/ci.yml

Lines changed: 171 additions & 95 deletions
```diff
@@ -8,46 +8,63 @@ on:
     - master
   schedule:
   - cron: '00 01 * * *'
+
+# The section is needed to drop write-all permissions that are granted on
+# `schedule` event. By specifying any permission explicitly all others are set
+# to none. By using the principle of least privilege the damage a compromised
+# workflow can do (because of an injection or compromised third party tool or
+# action) is restricted. Currently the worklow doesn't need any additional
+# permission except for pulling the code. Adding labels to issues, commenting
+# on pull-requests, etc. may need additional permissions:
+#
+# Syntax for this section:
+# https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#permissions
+#
+# Reference for how to assign permissions on a job-by-job basis:
+# https://docs.github.com/en/actions/using-jobs/assigning-permissions-to-jobs
+#
+# Reference for available permissions that we can enable if needed:
+# https://docs.github.com/en/actions/security-guides/automatic-token-authentication#permissions-for-the-github_token
+permissions:
+  # to fetch code (actions/checkout)
+  contents: read
+
 jobs:
+  # Baseline testing across a number of different targets.
   test:
-    name: test
     env:
       # For some builds, we use cross to test on 32-bit and big-endian
       # systems.
       CARGO: cargo
       # When CARGO is set to CROSS, TARGET is set to `--target matrix.target`.
+      # Note that we only use cross on Linux, so setting a target on a
+      # different OS will just use normal cargo.
       TARGET:
+      # Bump this as appropriate. We pin to a version to make sure CI
+      # continues to work as cross releases in the past have broken things
+      # in subtle ways.
+      CROSS_VERSION: v0.2.5
       # Make quickcheck run more tests for hopefully better coverage.
       QUICKCHECK_TESTS: 100000
     runs-on: ${{ matrix.os }}
     strategy:
+      fail-fast: false
       matrix:
         build:
-        - pinned
         - stable
-        - stable-32
-        - stable-mips
-        - wasm
         - beta
         - nightly
         - macos
         - win-msvc
         - win-gnu
+        - stable-x86
+        - stable-aarch64
+        - stable-powerpc64
+        - stable-s390x
         include:
-        - build: pinned
-          os: ubuntu-latest
-          rust: 1.41.1
         - build: stable
           os: ubuntu-latest
           rust: stable
-        - build: stable-32
-          os: ubuntu-latest
-          rust: stable
-          target: i686-unknown-linux-gnu
-        - build: stable-mips
-          os: ubuntu-latest
-          rust: stable
-          target: mips64-unknown-linux-gnuabi64
         - build: beta
           os: ubuntu-latest
           rust: beta
```

```diff
@@ -63,10 +80,24 @@ jobs:
         - build: win-gnu
           os: windows-latest
           rust: stable-x86_64-gnu
-        - build: wasm
+        - build: stable-x86
           os: ubuntu-latest
-          rust: stable-x86_64-gnu
-          wasm: true
+          rust: stable
+          target: i686-unknown-linux-gnu
+        # This is kind of a stand-in for Apple silicon since we can't currently
+        # use GitHub Actions with Apple silicon.
+        - build: stable-aarch64
+          os: ubuntu-latest
+          rust: stable
+          target: aarch64-unknown-linux-gnu
+        - build: stable-powerpc64
+          os: ubuntu-latest
+          rust: stable
+          target: powerpc64-unknown-linux-gnu
+        - build: stable-s390x
+          os: ubuntu-latest
+          rust: stable
+          target: s390x-unknown-linux-gnu
     steps:
     - name: Checkout repository
       uses: actions/checkout@v3
```
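The matrix now runs `powerpc64-unknown-linux-gnu` and `s390x-unknown-linux-gnu` through `cross`; both are big-endian targets. That coverage matters for word-at-a-time code, because which end of a `u64` holds the first byte of a slice depends on byte order. The hedged sketch below uses a hypothetical `first_match_in_word` helper (not memchr's code) to show the kind of endian-dependent branch that only a big-endian CI job would ever execute.

```rust
/// Hypothetical helper: index of the first byte in `chunk` equal to `needle`.
fn first_match_in_word(chunk: [u8; 8], needle: u8) -> Option<usize> {
    const LO7: u64 = 0x7F7F_7F7F_7F7F_7F7F;
    // from_ne_bytes keeps the in-memory byte order, so the bit positions
    // holding chunk[0] differ between little- and big-endian CPUs.
    let x = u64::from_ne_bytes(chunk) ^ u64::from_ne_bytes([needle; 8]);
    // Exact zero-byte mask: 0x80 in every byte of `x` that is zero.
    let mask = !(((x & LO7) + LO7) | x | LO7);
    if mask == 0 {
        return None;
    }
    // Little-endian: chunk[0] is the least significant byte, so count from
    // the bottom. Big-endian: chunk[0] is the most significant byte.
    let bit = if cfg!(target_endian = "little") {
        mask.trailing_zeros()
    } else {
        mask.leading_zeros()
    };
    Some((bit / 8) as usize)
}
```

On x86-64 and aarch64 Linux the `else` branch is compiled but never taken, so without a big-endian target in the matrix a bug there would go unnoticed.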

```diff
@@ -75,83 +106,78 @@ jobs:
       with:
         toolchain: ${{ matrix.rust }}
     - name: Use Cross
-      if: matrix.target != ''
+      if: matrix.os == 'ubuntu-latest' && matrix.target != ''
       run: |
-        # We used to install 'cross' from master, but it kept failing. So now
-        # we build from a known-good version until 'cross' becomes more stable
-        # or we find an alternative. Notably, between v0.2.1 and current
-        # master (2022-06-14), the number of Cross's dependencies has doubled.
-        cargo install --bins --git https://github.com/rust-embedded/cross --tag v0.2.1
+        # In the past, new releases of 'cross' have broken CI. So for now, we
+        # pin it. We also use their pre-compiled binary releases because cross
+        # has over 100 dependencies and takes a bit to compile.
+        dir="$RUNNER_TEMP/cross-download"
+        mkdir "$dir"
+        echo "$dir" >> $GITHUB_PATH
+        cd "$dir"
+        curl -LO "https://github.com/cross-rs/cross/releases/download/$CROSS_VERSION/cross-x86_64-unknown-linux-musl.tar.gz"
+        tar xf cross-x86_64-unknown-linux-musl.tar.gz
         echo "CARGO=cross" >> $GITHUB_ENV
         echo "TARGET=--target ${{ matrix.target }}" >> $GITHUB_ENV
-    - name: Download Wasmtime
-      if: matrix.wasm
-      run: |
-        rustup target add wasm32-wasi
-        echo "CARGO_BUILD_TARGET=wasm32-wasi" >> $GITHUB_ENV
-        echo "RUSTFLAGS=-Ctarget-feature=+simd128" >> $GITHUB_ENV
-        curl -LO https://github.com/bytecodealliance/wasmtime/releases/download/v0.32.0/wasmtime-v0.32.0-x86_64-linux.tar.xz
-        tar xvf wasmtime-v0.32.0-x86_64-linux.tar.xz
-        echo `pwd`/wasmtime-v0.32.0-x86_64-linux >> $GITHUB_PATH
-        echo "CARGO_TARGET_WASM32_WASI_RUNNER=wasmtime run --wasm-features simd --" >> $GITHUB_ENV
     - name: Show command used for Cargo
       run: |
         echo "cargo command is: ${{ env.CARGO }}"
         echo "target flag is: ${{ env.TARGET }}"
     - name: Show CPU info for debugging
       if: matrix.os == 'ubuntu-latest'
       run: lscpu
-    - run: ${{ env.CARGO }} build --verbose $TARGET
-    - run: ${{ env.CARGO }} build --verbose $TARGET --no-default-features
-    - run: ${{ env.CARGO }} doc --verbose $TARGET
-    # Our dev dependencies evolve more rapidly than we'd like, so only run
-    # tests when we aren't pinning the Rust version.
-    - if: matrix.build != 'pinned'
-      name: Show byte order for debugging
+    - name: Basic build
+      run: ${{ env.CARGO }} build --verbose $TARGET
+    - name: Build docs
+      run: ${{ env.CARGO }} doc --verbose $TARGET
+    - name: Show byte order for debugging
       run: ${{ env.CARGO }} test --verbose $TARGET byte_order -- --nocapture
-    - if: matrix.build != 'pinned'
-      name: Run tests under default configuration
-      run: ${{ env.CARGO }} test --verbose $TARGET
-    - if: matrix.build != 'pinned'
-      name: Run tests with just alloc feature
-      run: ${{ env.CARGO }} test --verbose --no-default-features --features alloc $TARGET
-    - if: matrix.build == 'stable'
-      name: Run under different SIMD configurations
-      run: |
-        set -x
-
-        # Enable libc while using SIMD, libc won't be used.
-        # (This is to ensure valid logic in the picking process.)
-        cargo test --verbose --features libc
-
-        preamble="--cfg memchr_disable_auto_simd"
+    - name: Run tests
+      run: cargo test --verbose
+    - name: Run with only 'alloc' enabled
+      run: cargo test --verbose --no-default-features --features alloc
+    - name: Run tests without any features enabled (core-only)
+      run: cargo test --verbose --no-default-features
+    - name: Run tests with miscellaneous features
+      run: cargo test --verbose --features logging
 
-        # Force use of fallback without libc.
-        RUSTFLAGS="$preamble" cargo test --verbose
-
-        # Force use of libc.
-        RUSTFLAGS="$preamble" cargo test --verbose --features libc
-
-        preamble="$preamble --cfg memchr_runtime_simd"
-        # Force use of fallback even when SIMD is enabled.
-        RUSTFLAGS="$preamble" cargo test --verbose
-
-        # For some reason, cargo seems to get confused which results in
-        # link errors. So wipe the slate clean.
-        cargo clean
-        # Force use of sse2 only
-        RUSTFLAGS="$preamble --cfg memchr_runtime_sse2" cargo test --verbose
-
-        # ... and wipe it again. So weird.
-        cargo clean
-        # Force use of avx only
-        RUSTFLAGS="$preamble --cfg memchr_runtime_avx" cargo test --verbose
-    - if: matrix.build == 'nightly'
-      name: Run benchmarks as tests
-      run: cargo bench --manifest-path bench/Cargo.toml --verbose -- --test
+  # Setup and run tests on the wasm32-wasi target via wasmtime.
+  wasm:
+    runs-on: ubuntu-latest
+    env:
+      # The version of wasmtime to download and install.
+      WASMTIME_VERSION: 12.0.1
+    steps:
+    - name: Checkout repository
+      uses: actions/checkout@v3
+    - name: Install Rust
+      uses: dtolnay/rust-toolchain@master
+      with:
+        toolchain: stable
+    - name: Add wasm32-wasi target
+      run: rustup target add wasm32-wasi
+    - name: Download and install Wasmtime
+      run: |
+        echo "CARGO_BUILD_TARGET=wasm32-wasi" >> $GITHUB_ENV
+        echo "RUSTFLAGS=-Ctarget-feature=+simd128" >> $GITHUB_ENV
+        curl -LO https://github.com/bytecodealliance/wasmtime/releases/download/v$WASMTIME_VERSION/wasmtime-v$WASMTIME_VERSION-x86_64-linux.tar.xz
+        tar xvf wasmtime-v$WASMTIME_VERSION-x86_64-linux.tar.xz
+        echo `pwd`/wasmtime-v$WASMTIME_VERSION-x86_64-linux >> $GITHUB_PATH
+        echo "CARGO_TARGET_WASM32_WASI_RUNNER=wasmtime run --wasm-features simd --" >> $GITHUB_ENV
+    - name: Basic build
+      run: cargo build --verbose
+    - name: Run tests
+      run: cargo test --verbose
+    - name: Run with only 'alloc' enabled
+      run: cargo test --verbose --no-default-features --features alloc
+    - name: Run tests without any features enabled (core-only)
+      run: cargo test --verbose --no-default-features
 
-  build-for-non_sse-target:
-    name: build for non-SSE target
+  # This job uses a custom target file to build the memchr crate on x86-64
+  # but *without* SSE/AVX target features. This is a somewhat strange
+  # configuration, but it pops up now and then. Particularly in kernels that
+  # don't support SSE/AVX registers.
+  build-for-x86-64-but-non-sse-target:
     runs-on: ubuntu-latest
     steps:
     - name: Checkout repository
```
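The non-SSE job exists because code that assumes SSE2 on x86-64 can break on targets that disable it, and the `wasm` job only reaches simd128 code paths because it sets `RUSTFLAGS=-Ctarget-feature=+simd128`. The hedged sketch below shows the general split between compile-time `cfg!` checks and runtime CPU detection; the `Backend` enum and `pick_backend` function are hypothetical and are not memchr's dispatch code.

```rust
/// Hypothetical set of search backends, for illustration only.
#[derive(Debug, PartialEq)]
enum Backend {
    Avx2,
    Sse2,
    Neon,
    Simd128,
    Fallback,
}

/// Picks a backend from compile-time target features plus, on x86-64,
/// runtime CPU detection (which requires `std`).
fn pick_backend() -> Backend {
    #[cfg(target_arch = "x86_64")]
    {
        // SSE2 is in the default x86-64 baseline, but a custom target (such
        // as a soft-float target file) can turn it off, so check at compile time.
        if cfg!(target_feature = "sse2") {
            // AVX2 is not in the baseline, so ask the running CPU.
            if is_x86_feature_detected!("avx2") {
                return Backend::Avx2;
            }
            return Backend::Sse2;
        }
    }
    if cfg!(all(target_arch = "aarch64", target_feature = "neon")) {
        return Backend::Neon;
    }
    // Only true when compiled with `-Ctarget-feature=+simd128`.
    if cfg!(all(target_arch = "wasm32", target_feature = "simd128")) {
        return Backend::Simd128;
    }
    Backend::Fallback
}
```

Since `is_x86_feature_detected!` lives in `std`, a `no_std` (core-only) build can rely only on compile-time checks like the `cfg!` calls here, which is one reason it is worth building the core-only and non-SSE configurations separately in CI.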

```diff
@@ -163,25 +189,78 @@ jobs:
         components: rust-src
     - run: cargo build -Z build-std=core --target=src/tests/x86_64-soft_float.json --verbose --no-default-features
 
-  test-with-miri:
-    name: test with miri
+  # This job runs a stripped down version of CI to test the MSRV. The specific
+  # reason for doing this is that dev-dependencies tend to evolve more quickly.
+  # There isn't as tight of a control on them because, well, they're only used
+  # in tests and their MSRV doesn't matter as much.
+  #
+  # It is a bit unfortunate that our MSRV test is basically just "build it"
+  # and pass if that works. But usually MSRV is broken by compilation problems
+  # and not runtime behavior. So this is in practice good enough.
+  msrv:
     runs-on: ubuntu-latest
     steps:
     - name: Checkout repository
       uses: actions/checkout@v3
     - name: Install Rust
       uses: dtolnay/rust-toolchain@master
       with:
+        toolchain: 1.60.0
+    - name: Basic build
+      run: cargo build --verbose
+    - name: Build docs
+      run: cargo doc --verbose
+
+  # Runs miri on memchr's test suite. This doesn't quite cover everything. Some
+  # tests (especially quickcheck) are disabled when building with miri because
+  # of how slow miri runs. But it still gives us decent coverage.
+  miri:
+    runs-on: ubuntu-latest
+    steps:
+    - name: Checkout repository
+      uses: actions/checkout@v3
+    - name: Install Rust
+      uses: dtolnay/rust-toolchain@master
+      with:
+        # We use nightly here so that we can use miri I guess?
         toolchain: nightly
         components: miri
-    - name: Show CPU info for debugging
-      run: lscpu
-    - run: cargo miri test --verbose
-    - run: cargo miri test --verbose --no-default-features
-    - run: cargo miri test --verbose --features libc
+    - name: Run full test suite
+      run: cargo miri test --verbose
 
+  # Tests that memchr's benchmark suite builds and passes all tests.
+  rebar:
+    runs-on: ubuntu-latest
+    env:
+      # The version of wasmtime to download and install.
+      WASMTIME_VERSION: 12.0.1
+    steps:
+    - name: Checkout repository
+      uses: actions/checkout@v3
+    - name: Install Rust
+      uses: dtolnay/rust-toolchain@master
+      with:
+        toolchain: stable
+    - name: Add wasm32-wasi target
+      run: rustup target add wasm32-wasi
+    - name: Download and install Wasmtime
+      run: |
+        # Note that we don't have to set CARGO_BUILD_TARGET and other
+        # environment variables like we do for the `wasm` job. This is because
+        # `rebar` knows how to set them itself and only when running the wasm
+        # engines.
+        curl -LO https://github.com/bytecodealliance/wasmtime/releases/download/v$WASMTIME_VERSION/wasmtime-v$WASMTIME_VERSION-x86_64-linux.tar.xz
+        tar xvf wasmtime-v$WASMTIME_VERSION-x86_64-linux.tar.xz
+        echo `pwd`/wasmtime-v$WASMTIME_VERSION-x86_64-linux >> $GITHUB_PATH
+    - name: Install rebar
+      run: cargo install --git https://github.com/BurntSushi/rebar rebar
+    - name: Build all rebar engines
+      run: rebar build
+    - name: Run all benchmarks as tests
+      run: rebar measure --test
+
+  # Tests that everything is formatted correctly.
   rustfmt:
-    name: rustfmt
     runs-on: ubuntu-latest
     steps:
     - name: Checkout repository
```
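The `miri` job comment notes that some tests, especially quickcheck, are disabled under Miri because it runs slowly. One hedged way to keep a randomized test while shrinking it under Miri is to branch on the `miri` cfg, which `cargo miri` sets. The sketch checks a hypothetical chunked reverse search, `rfind_byte`, against `Iterator::rposition` as an oracle; the PRNG, case counts, and all function names are illustrative assumptions, not memchr's test suite.

```rust
/// Fewer random cases under Miri, which interprets the program and is far
/// slower than native execution. `cfg!(miri)` is true only under `cargo miri`.
fn num_cases() -> usize {
    if cfg!(miri) { 50 } else { 10_000 }
}

/// Minimal xorshift64 PRNG so the sketch needs no external crates.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Hypothetical implementation under test: reverse search in 8-byte chunks.
fn rfind_byte(needle: u8, haystack: &[u8]) -> Option<usize> {
    let mut end = haystack.len();
    while end >= 8 {
        let start = end - 8;
        if let Some(i) = haystack[start..end].iter().rposition(|&b| b == needle) {
            return Some(start + i);
        }
        end = start;
    }
    // Fewer than 8 bytes remain at the front of the haystack.
    haystack[..end].iter().rposition(|&b| b == needle)
}

/// Randomized check of `rfind_byte` against a simple oracle.
fn check_rfind_against_oracle() {
    let mut rng = XorShift64(0x9E37_79B9_7F4A_7C15);
    for _ in 0..num_cases() {
        let len = (rng.next_u64() % 64) as usize;
        // A 4-letter alphabet makes matches common; needle 'e' never matches.
        let haystack: Vec<u8> =
            (0..len).map(|_| b'a' + (rng.next_u64() % 4) as u8).collect();
        let needle = b'a' + (rng.next_u64() % 5) as u8;
        let expected = haystack.iter().rposition(|&b| b == needle);
        assert_eq!(rfind_byte(needle, &haystack), expected);
    }
}
```

Scaling the case count rather than skipping the test keeps at least some Miri coverage of the code path, at the cost of weaker randomized coverage in that configuration.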

```diff
@@ -193,7 +272,4 @@ jobs:
         components: rustfmt
     - name: Check formatting
       run: |
-        cargo fmt -- --check
-    - name: Check formatting on benchmarks
-      run: |
-        cargo fmt --manifest-path bench/Cargo.toml -- --check
+        cargo fmt --all -- --check
```

.vim/coc-settings.json

Lines changed: 13 additions & 1 deletion
```diff
@@ -1,3 +1,15 @@
 {
-  "rust-analyzer.cargo.allFeatures": false
+  "rust-analyzer.cargo.allFeatures": false,
+  "rust-analyzer.linkedProjects": [
+    "benchmarks/engines/libc/Cargo.toml",
+    "benchmarks/engines/rust-bytecount/Cargo.toml",
+    "benchmarks/engines/rust-jetscii/Cargo.toml",
+    "benchmarks/engines/rust-memchr/Cargo.toml",
+    "benchmarks/engines/rust-memchrold/Cargo.toml",
+    "benchmarks/engines/rust-sliceslice/Cargo.toml",
+    "benchmarks/engines/rust-std/Cargo.toml",
+    "benchmarks/shared/Cargo.toml",
+    "fuzz/Cargo.toml",
+    "Cargo.toml"
+  ]
 }
```
