v0.45.0
Welcome to v0.45.0! This is a big update, much of it carried over from rc.1 last week. The changelog there has more context on the size of the update.
The biggest changes for library users are ergonomic: Node<'a> instead of &'a AstNode<'a> is nice, and likewise node.data() instead of node.data.borrow(). They're small, but I appreciate them a lot in my own work.
You'll also notice more bovine creatures in the Comrak pasture: there are a few Cow<str> instead of String, such as in NodeValue::Text. At most an extra .into() will be required; take note if you use any &'static str, as they'll no longer need to be heap-allocated. Some Boxes have been added, too, to reduce the size of every NodeValue. Let the types guide you.
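To make the Cow change concrete, here is a minimal sketch using only the standard library. The `Value` enum and the helper functions are hypothetical stand-ins, not Comrak's actual types; the point is only how `.into()`, reading, and `.to_mut()` behave on a `Cow<'static, str>`.

```rust
use std::borrow::Cow;

// Hypothetical stand-in for a node payload; not Comrak's actual NodeValue.
#[derive(Debug)]
enum Value {
    Text(Cow<'static, str>),
}

// A string literal is stored as a borrow, so nothing is copied to the heap.
fn static_text() -> Value {
    Value::Text("moo".into())
}

// An owned String converts with the same `.into()` call.
fn owned_text(s: String) -> Value {
    Value::Text(s.into())
}

// Reports whether the text is still a borrow (true) or heap-owned (false).
fn is_borrowed(v: &Value) -> bool {
    match v {
        Value::Text(t) => matches!(t, Cow::Borrowed(_)),
    }
}

// In-place edits go through `to_mut()`, which copies into a String only if
// the Cow was borrowed.
fn shout(v: &mut Value) {
    match v {
        Value::Text(t) => t.to_mut().push('!'),
    }
}

// Reading needs no conversion: a Cow<str> derefs to str.
fn text_len(v: &Value) -> usize {
    match v {
        Value::Text(t) => t.len(),
    }
}

fn main() {
    let mut v = static_text();
    println!("borrowed before edit: {}", is_borrowed(&v));
    shout(&mut v);
    println!("borrowed after edit: {}", is_borrowed(&v));
    println!("{:?} has length {}", v, text_len(&v));
    println!("{:?}", owned_text(String::from("owned")));
}
```

The first write is the only point where a static string gets copied; text that is only ever read stays borrowed.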
Other than this, the options have been put in their own module (comrak::options), and a lot of things generally cleaned up. Read below for all the deets! Here's the final performance comparison to v0.44.0 on aarch64:
Benchmark 1: ./bench.sh ./comrak-0.44.0
Time (mean ± σ): 88.1 ms ± 1.9 ms [User: 71.2 ms, System: 17.8 ms]
Range (min … max): 86.2 ms … 93.2 ms 31 runs
Benchmark 2: ./bench.sh ./comrak-0.45.0
Time (mean ± σ): 67.0 ms ± 1.2 ms [User: 51.2 ms, System: 17.0 ms]
Range (min … max): 65.2 ms … 70.0 ms 42 runs
Summary
./bench.sh ./comrak-0.45.0 ran
1.32 ± 0.04 times faster than ./bench.sh ./comrak-0.44.0
Be well!
Parser changes:
- Runs of more than two ~ are no longer recognised as valid delimiters, meaning they will not prevent strikethrough recognition when they occur within correct delimiters. See the PR for discussion. (by @miketheman in #635)
  - This does not impact spec compatibility, matches cmark-gfm, and follows the intent of the original implementation and implementor (hi!).
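As a rough illustration of the rule, the sketch below finds every maximal run of tildes in a line and keeps only runs of one or two as delimiter candidates. It is not Comrak's parser code: the function names are made up, and the real inline parser also has to pair openers with closers, which is left out here.

```rust
// Returns (byte offset, length) for each maximal run of '~' in the input.
// Scanning bytes is safe here because '~' is ASCII and never appears inside
// a multi-byte UTF-8 sequence.
fn tilde_runs(input: &str) -> Vec<(usize, usize)> {
    let bytes = input.as_bytes();
    let mut runs = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'~' {
            let start = i;
            while i < bytes.len() && bytes[i] == b'~' {
                i += 1;
            }
            runs.push((start, i - start));
        } else {
            i += 1;
        }
    }
    runs
}

// Assumption for this sketch: only runs of one or two tildes may act as
// strikethrough delimiters; longer runs are treated as plain text.
fn is_delimiter_run(len: usize) -> bool {
    len == 1 || len == 2
}

// Keeps only the runs that may act as delimiters.
fn delimiter_runs(input: &str) -> Vec<(usize, usize)> {
    tilde_runs(input)
        .into_iter()
        .filter(|&(_, len)| is_delimiter_run(len))
        .collect()
}

fn main() {
    let line = "~~a ~~~ b~~";
    println!("all runs:       {:?}", tilde_runs(line));
    println!("delimiter runs: {:?}", delimiter_runs(line));
}
```

In the sample line, the run of three in the middle is dropped from the candidates, so the outer `~~` pair is free to match around it.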
Changed APIs:
- r#unsafe is used instead of unsafe_. (by @kivikakk in #640)
- --gemojis is renamed to --gemoji. (by @kivikakk in #641)
- NodeValue::Text now contains a Cow<'static, str> instead of a String. This is a pretty major change, but means we can now create text nodes with static content without duplicating the string on the heap. This particularly benefits smart quotes and HTML entity resolution. (by @kivikakk in #627)
  - Adapting to this change usually means nothing on the read-only side (you can use it as a &str without issues); to write in-place, use .to_mut() on the Cow to get a &mut String. To assign, use .into() on a &str or String, like NodeValue::Text("moo".into()).
  - NodeValue::text() now returns a &str. It used to return a &String (!).
  - NodeValue::text_mut() now returns a &mut Cow<'static, str>, instead of a &mut String. This permits writing a borrowed reference.
  - I am experimenting with parameterising the lifetime on the Cow; it'd be amazing to refer continuously to the input where possible.
- NodeValue's CodeBlock, Table, Link, Image, ShortCode and Alert variants' payloads are now boxed. (by @kivikakk in #632)
  - Adapting to this change usually means adding a Box::new call when constructing these nodes, and on matches, pulling the box out and then just dereferencing it directly (e.g. NodeValue::Table(nt) => &nt.alignments instead of NodeValue::Table(NodeTable { ref alignments, .. })).
  - These payloads were larger than average, increasing the size of every node considerably. The changes reduce an Ast to 128 bytes, and a full AstNode<'_> to 176 bytes.
  - This produces a performance sweet spot: boxing the whole NodeValue results in worse performance than doing nothing at all. This change appreciably improves matters.
  - We now assert the size of a node during build to ensure future payload changes don't increase the total size of an Ast.
- Options now live in comrak::options. Structs have been renamed to remove Options from their name: comrak::RenderOptions is now comrak::options::Render, etc. The old names are marked deprecated. (by @kivikakk in #636)
  - Traits cannot be aliased yet :( URLRewriter and BrokenLinkCallback have been moved, without a deprecation period.
- SyntaxHighlighterAdapter's attributes arguments now take HashMap<&'static str, Cow<'s, str>>; they used to take HashMap<String, String>. (by @kivikakk in #633)
- html::write_opening_tag can now take different AsRef<str> types for the attribute key and value.
- parse_document_with_broken_link_callback has been removed! This entrypoint has been deprecated since 0.25.0. (by @kivikakk in #623)
- options.render.ignore_setext was moved to options.parse.ignore_setext, as its effect takes place only in the parse stage. (by @kivikakk in #623)
- nodes::can_contain_type is now Node::can_contain_type. (by @kivikakk in #625)
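The boxing change and the build-time size assertion can be sketched with the standard library alone. Everything below is a stand-in: `TablePayload`, `Inline` and `Boxed` are hypothetical types, and only the `alignments` field name is taken from the notes above. The sketch shows why boxing a large variant shrinks the whole enum, how construction and matching change, and one way to guard the size at compile time.

```rust
use std::mem::size_of;

// Hypothetical payload: only `alignments` is named in the notes above;
// the other fields are made up to give the struct some bulk.
#[allow(dead_code)]
struct TablePayload {
    alignments: Vec<u8>,
    columns: usize,
    rows: usize,
    cells: usize,
}

// Payload stored inline: every value is as large as the largest variant.
#[allow(dead_code)]
enum Inline {
    Text(String),
    Table(TablePayload),
}

// Payload behind a Box: the large variant shrinks to a single pointer.
#[allow(dead_code)]
enum Boxed {
    Text(String),
    Table(Box<TablePayload>),
}

// Build-time guard: compilation fails if the boxed enum is not smaller.
const _: () = assert!(size_of::<Boxed>() < size_of::<Inline>());

// Constructing a boxed variant needs an explicit Box::new.
fn make_table(alignments: Vec<u8>) -> Boxed {
    let columns = alignments.len();
    Boxed::Table(Box::new(TablePayload { alignments, columns, rows: 0, cells: 0 }))
}

// Matching binds the Box; field access dereferences it automatically.
fn alignments(value: &Boxed) -> Option<&[u8]> {
    match value {
        Boxed::Table(t) => Some(t.alignments.as_slice()),
        _ => None,
    }
}

// Returns (inline size, boxed size) in bytes for the two enums.
fn sizes() -> (usize, usize) {
    (size_of::<Inline>(), size_of::<Boxed>())
}

fn main() {
    let (inline, boxed) = sizes();
    println!("inline: {inline} bytes, boxed: {boxed} bytes");
    let table = make_table(vec![0, 1, 2]);
    println!("{:?}", alignments(&table));
}
```

The exact byte counts depend on the target and compiler, which is why the guard compares the two enums instead of hard-coding a number.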
New APIs:
- node.data() and node.data_mut() are added as short-hand for node.data.borrow() and node.data.borrow_mut() respectively. (by @kivikakk in #643)
- comrak::nodes::Node<'a> is introduced as an alias for &'a comrak::nodes::AstNode<'a>. (by @kivikakk in #627)
- options.parse.tasklist_in_table added: parse a tasklist item if it's the only content of a table cell. (by @kivikakk in #622)
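For readers who want to see the shape of the alias and the accessors, here is a miniature using only the standard library. The `AstNode` below is a hypothetical two-field struct, not Comrak's arena-backed node, and `describe` is an invented helper; the sketch only shows how a `Node<'a>` alias and `data()`/`data_mut()` wrappers around a `RefCell` read in practice.

```rust
use std::cell::{Ref, RefCell, RefMut};

// Hypothetical miniature of an AST node; the real AstNode carries far more.
struct AstNode<'a> {
    data: RefCell<String>,
    parent: Option<&'a AstNode<'a>>,
}

// The alias hides the repeated lifetime in function signatures.
type Node<'a> = &'a AstNode<'a>;

impl<'a> AstNode<'a> {
    fn new(text: &str, parent: Option<Node<'a>>) -> Self {
        AstNode { data: RefCell::new(text.to_string()), parent }
    }

    // Short-hand for `self.data.borrow()`.
    fn data(&self) -> Ref<'_, String> {
        self.data.borrow()
    }

    // Short-hand for `self.data.borrow_mut()`.
    fn data_mut(&self) -> RefMut<'_, String> {
        self.data.borrow_mut()
    }
}

// Takes `Node<'a>` where the signature would otherwise read `&'a AstNode<'a>`.
fn describe<'a>(node: Node<'a>) -> String {
    match node.parent {
        Some(p) => format!("{} > {}", p.data(), node.data()),
        None => node.data().to_string(),
    }
}

fn main() {
    let root = AstNode::new("document", None);
    let child = AstNode::new("text", Some(&root));
    child.data_mut().push('!');
    println!("{}", describe(&child));
}
```

As with any `RefCell`, holding a `data_mut()` guard while calling `data()` on the same node panics at runtime, so the shorthand does not change the borrowing rules.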
Performance:
- Inline content is transferred to Text nodes without copying where possible. (by @kivikakk in #642)
- Have you looked at your 7 year old code lately? A detail in the C-to-Rust translation meant essentially every line of input was being copied completely unnecessarily at the very beginning of the line processing stage. This no longer happens. We regret the error. (by @kivikakk in #629)
- Preprocess entity data at build-time so we don't spend time doing a linear search over an unsorted array, some of which we will never match. (by @kivikakk in #631)
- Inline content is consumed by the inline processor, instead of being borrowed by it and retained in memory indefinitely. (by @kivikakk in #631)
- Don't try to do better than the stdlib at guessing buffer sizes; it's very good at it. (by @kivikakk in #626)
- Use str internally in block and inline processing, eliminating many UTF-8 rechecks. The strings module actually operates on strings now. (by @kivikakk in #626)
- Many, many needless clones have been removed in almost every subsystem.
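The entity-lookup item lends itself to a small illustration. The notes do not say which data structure the preprocessing produces, so the sketch below assumes the simplest option: a table sorted by name ahead of time and searched with a binary search. The five-entry table and both function names are hypothetical.

```rust
// Hypothetical, tiny entity table, kept sorted by name. The assumption in
// this sketch is that a build step emits the table already sorted.
const ENTITIES: &[(&str, &str)] = &[
    ("amp", "&"),
    ("copy", "©"),
    ("gt", ">"),
    ("lt", "<"),
    ("quot", "\""),
];

// Checks the precondition that binary search relies on.
fn is_sorted_by_name() -> bool {
    ENTITIES.windows(2).all(|pair| pair[0].0 < pair[1].0)
}

// Binary search over the sorted table: O(log n) comparisons per lookup.
fn lookup_sorted(name: &str) -> Option<&'static str> {
    ENTITIES
        .binary_search_by(|(key, _)| (*key).cmp(name))
        .ok()
        .map(|index| ENTITIES[index].1)
}

// Linear scan for comparison: O(n) comparisons per lookup.
fn lookup_linear(name: &str) -> Option<&'static str> {
    ENTITIES
        .iter()
        .find(|(key, _)| *key == name)
        .map(|(_, value)| *value)
}

fn main() {
    assert!(is_sorted_by_name());
    println!("{:?}", lookup_sorted("copy"));
    println!("{:?}", lookup_sorted("nope"));
    println!("{:?}", lookup_linear("lt"));
}
```

Both functions return the same answers; the difference is that the sorted variant does the ordering work once, before any document is parsed.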
Dependency updates:
- memchr removed from Cargo.toml; it wasn't used directly, though it still is included unconditionally due to caseless. (by @kivikakk in #630)
- slug is moved to a development-only dependency; it's only used in an example. (by @kivikakk in #630)
- jetscii is added for faster string searching, including SIMD on x86_64. (by @kivikakk in #630)
  - I'm experimenting with aarch64 SIMD.
Documentation:
- The CLI help text has been copy-edited to a consistent style. (by @kivikakk in #641)
- The README example code is updated to build with recent API changes. (by @kivikakk in #621)
Build changes:
- shortcodes is enabled by default (but still optional) for CLI builds. (by @kivikakk in #641)
- syntect is now optional (but still default) in CLI builds. (by @kivikakk in #624)
Behind the scenes:
- Much of the block parser code has been re-organised, and many C-isms from the original port have been refactored into readable Rust. (by @kivikakk in #627)
- Likewise the inline parser has been re-organised. (by @kivikakk in #644)
- All unsafe blocks now have a SAFETY comment describing why their actions are safe.
New Contributors
- @miketheman made their first contribution in #635
Diff: v0.44.0...v0.45.0