Skip to content

Conversation

@l0ngvh
Copy link
Contributor

@l0ngvh l0ngvh commented May 3, 2025

Summary

Allow the YAML lexer to lex plain token.

Test Plan

Added new tests for the lexer.

@github-actions github-actions bot added the A-Parser Area: parser label May 3, 2025
@codspeed-hq
Copy link

codspeed-hq bot commented May 3, 2025

CodSpeed Performance Report

Merging #5870 will not alter performance

Comparing vohoanglong0107:lex_plain_token_with_context (50d8ce3) with main (14551ba)

Summary

✅ 95 untouched benchmarks

@l0ngvh l0ngvh force-pushed the lex_plain_token_with_context branch from d324c06 to 9627d10 Compare May 3, 2025 15:32
@l0ngvh l0ngvh marked this pull request as ready for review May 3, 2025 23:35
@l0ngvh l0ngvh requested review from a team May 3, 2025 23:36
Copy link
Contributor

@dyc3 dyc3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love to see work on yaml! While I have mostly compliments, I do have some things that need to be addressed.

Comment on lines +361 to +374
// https://yaml.org/spec/1.2.2/#rule-ns-char
fn is_non_space_char(c: u8) -> bool {
!is_space(c) && !is_break(c)
}

// https://yaml.org/spec/1.2.2/#rule-s-white
fn is_space(c: u8) -> bool {
c == b' ' || c == b'\t'
}

// https://yaml.org/spec/1.2.2/#rule-b-char
fn is_break(c: u8) -> bool {
c == b'\n' || c == b'\r'
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the attention to detail here with the spec links. It's nice that these things are well defined.

Comment on lines 331 to 339
// Inside block key context
BlockKey,
// Outside flow context
FlowOut,
// Inside flow context
FlowIn,
// Inside flow key context
FlowKey,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should be doc comments, and these should have some code samples to visually explain what these mean.

// and make sure the tokens yielded are fully lossless and the source can be reconstructed from only the tokens
macro_rules! assert_lex {
($src:expr, $($kind:ident:$len:expr $(,)?)*) => {{
($context:expr, $src:expr, $($kind:ident:$len:expr $(,)?)*) => {{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you've taken inspiration from the HTML lexer tests. :)

}

fn next_non_trivia_token(&mut self, first_token: bool) {
fn next_non_trivia_token(&mut self, first_token: bool, context: YamlLexContext) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the equivalent for this in the html parser has these arguments reversed: context: HtmlLexContext, first_token: bool and it would be better if we stay consistent. (so if the html parser is actually the one at fault we would change it to match the other languages, of course)

Comment on lines 76 to 80
fn bump(&mut self) {
if self.current() != EOF {
self.next_non_trivia_token(false)
self.next_non_trivia_token(false, YamlLexContext::Regular)
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not really related to this PR, but this should be a call to bump_with_context

fn bump(&mut self) {
    self.bump_with_context(HtmlLexContext::Regular)
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I have another PR in store which implements BumpWithContext for the Token Source. Let me address this then.

Comment on lines 375 to 376

fn is_indicator(c: u8) -> bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fn is_indicator(c: u8) -> bool {
// https://yaml.org/spec/1.2.2/#53-indicator-characters
fn is_indicator(c: u8) -> bool {

Would it be useful to add the YAML spec link here as well?

@l0ngvh l0ngvh requested a review from dyc3 May 4, 2025 05:56
@l0ngvh l0ngvh merged commit 345d2dc into biomejs:main May 4, 2025
13 checks passed
@l0ngvh l0ngvh deleted the lex_plain_token_with_context branch May 5, 2025 01:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-Parser Area: parser

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants