Rewrite lexer and parser #146

Open
Schamper wants to merge 1 commit into main from rewrite-parser

Schamper commented Mar 3, 2026

Closes #85, partially #142, and will make #86 and #138 a lot easier to implement.

This PR will (finally) replace the shoddy C syntax parser I originally wrote many moons ago, when I discovered the existence of re.Scanner and ran with it. This PR aims to add a somewhat decent lexer and separate parser. I'm still not a compsci 1337coder, so this is just what I came up with (with some help) and definitely not a textbook implementation. All feedback is welcome.

  • New lexer
  • New C syntax parser that utilizes the new lexer
  • Expression parser re-uses the new lexer
  • Reworked how sizeof works in the expression parser, and added offsetof
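
To illustrate the lexer/parser split described above, here is a minimal sketch of a standalone regex-based lexer whose tokens a separate parser could consume. It is not the code in this PR: `Token`, `TOKEN_SPEC` and `tokenize` are hypothetical names, and the token set is deliberately tiny.

```python
from __future__ import annotations

import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str
    value: str
    pos: int


# Hypothetical token set. Order matters: earlier alternatives win.
TOKEN_SPEC = [
    ("COMMENT", r"//[^\n]*|/\*.*?\*/"),
    ("NUMBER", r"0[xX][0-9a-fA-F]+|\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("PUNCT", r"[{}\[\]();,*=]"),
    ("WS", r"\s+"),
]
# One combined pattern with a named group per token kind.
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.DOTALL,  # lets /* ... */ comments span multiple lines
)


def tokenize(source: str) -> Iterator[Token]:
    """Yield tokens in source order, dropping whitespace and comments."""
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        pos = match.end()
        if match.lastgroup not in ("WS", "COMMENT"):
            yield Token(match.lastgroup, match.group(), match.start())


tokens = list(tokenize("struct a { uint32 x[4]; }; // trailing comment"))
```

Because the lexer only produces `(kind, value, pos)` tuples, the C syntax parser and the expression parser can share it, which is the kind of reuse the list above describes.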

The new parser has made changing parsing behavior a lot easier. As such, this PR already makes the following changes:

  • The new parser is slightly stricter; for example, it requires proper semicolon endings. We'll need to fix any dissect code that omits them.

  • An important semantic change is how named nested structures are handled. In my infinite wisdom, I originally figured that named nested structures do not "exist" in the top level scope. That's not true, so now named nested structures get properly registered with the cstruct instance:

struct a {
    struct b {
        ...
    };
};

// Will register both `a` and `b`
  • Another important change is how we deal with struct { ... } name;. We used to parse this first as an anonymous struct, then capture name as the structure type name. That's not strictly correct: name is a variable of an anonymous struct type, so we now treat it as such. We don't error on this; instead we silently ignore name and skip ahead until we reach a ;.
  • typedef enum ... is now allowed
  • Probably some other things I'm forgetting
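
The behavior changes listed above can be sketched with a toy recursive-descent parser. This is only an illustration under simplifying assumptions, not the parser in this PR: `parse`, `peek`, `take` and the returned registry dict are hypothetical, fields are limited to `type name;`, and characters outside identifiers and `{`, `}`, `;` are silently ignored.

```python
from __future__ import annotations

import re


def parse(source: str) -> dict[str, list[str]]:
    """Return {struct name: [field names]} for every named struct, nested or not."""
    tokens = re.findall(r"[A-Za-z_]\w*|[{};]", source)
    registry: dict[str, list[str]] = {}
    pos = 0

    def peek() -> str | None:
        return tokens[pos] if pos < len(tokens) else None

    def take(expected: str | None = None) -> str:
        nonlocal pos
        token = peek()
        if token is None or (expected is not None and token != expected):
            raise SyntaxError(f"expected {expected or 'a token'!r}, found {token!r}")
        pos += 1
        return token

    def parse_struct() -> None:
        take("struct")
        name = take() if peek() != "{" else None  # None means anonymous
        take("{")
        fields = []
        while peek() not in ("}", None):
            if peek() == "struct":
                parse_struct()  # nested definitions land in the same registry
            else:
                take()  # field type
                fields.append(take())  # field name
                take(";")  # strict: a missing semicolon is an error
        take("}")
        while peek() not in (";", None):
            take()  # ignore a trailing declarator such as `name`
        take(";")
        if name is not None:
            registry[name] = fields

    while peek() is not None:
        parse_struct()
    return registry


result = parse("struct a { struct b { int x; }; int y; }; struct { int z; } name;")
```

Here `result` contains both `a` and `b`, while the anonymous struct and its `name` declarator are skipped without an error. Dropping a semicolon, as in `struct a { int x }`, raises `SyntaxError`.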

This probably warrants a major version bump, so it may be good to pair this with #114, #144 and what we discussed in #142.

codecov bot commented Mar 3, 2026

Codecov Report

❌ Patch coverage is 0% with 740 lines in your changes missing coverage. Please review.
✅ Project coverage is 0.00%. Comparing base (0e27409) to head (5f43faa).

Files with missing lines         Patch %   Lines
dissect/cstruct/lexer.py         0.00%     354 Missing ⚠️
dissect/cstruct/parser.py        0.00%     275 Missing ⚠️
dissect/cstruct/expression.py    0.00%     95 Missing ⚠️
dissect/cstruct/utils.py         0.00%     12 Missing ⚠️
dissect/cstruct/cstruct.py       0.00%     2 Missing ⚠️
dissect/cstruct/exceptions.py    0.00%     2 Missing ⚠️
Additional details and impacted files
@@          Coverage Diff          @@
##            main    #146   +/-   ##
=====================================
  Coverage   0.00%   0.00%           
=====================================
  Files         21      22    +1     
  Lines       2435    2526   +91     
=====================================
- Misses      2435    2526   +91     
Flag        Coverage Δ
unittests   0.00% <0.00%> (ø)

codspeed-hq bot commented Mar 3, 2026

Merging this PR will degrade performance by 32.62%

⚡ 3 improved benchmarks
❌ 1 regressed benchmark
✅ 8 untouched benchmarks
🆕 2 new benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

Benchmark                                      BASE       HEAD       Efficiency
test_benchmark_expression_parse_and_evaluate   393.3 µs   270 µs     +45.68%
test_benchmark_expression_evaluate             82.2 µs    122 µs     -32.62%
test_benchmark_expression_parse                347.5 µs   192.3 µs   +80.7%
🆕 test_benchmark_parser                       N/A        10.7 ms    N/A
test_benchmark_lexer_and_parser                15.7 ms    12.2 ms    +28.8%
🆕 test_benchmark_lexer                        N/A        1.5 ms     N/A

Comparing rewrite-parser (5f43faa) with main (0e27409)
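
The Efficiency figures in the report are consistent with BASE / HEAD - 1. That relationship is inferred from the numbers shown here rather than taken from CodSpeed's documentation, so treat the sketch below as an assumption; `relative_change` is a hypothetical helper.

```python
def relative_change(base: float, head: float) -> float:
    """Speed-up of HEAD relative to BASE in percent (negative means slower)."""
    return (base / head - 1) * 100


# (BASE, HEAD) in µs, copied from the rounded values in the report
timings = {
    "test_benchmark_expression_evaluate": (82.2, 122.0),
    "test_benchmark_expression_parse": (347.5, 192.3),
}
changes = {
    name: round(relative_change(base, head), 1)
    for name, (base, head) in timings.items()
}
```

Under this reading, the headline 32.62% degradation is the single regressed benchmark (`test_benchmark_expression_evaluate`), not an average over all benchmarks.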


Schamper commented Mar 3, 2026

@sMezaOrellana would be interested in your thoughts on these changes!

Development

Successfully merging this pull request may close these issues.

Incorrect parsing of multiple declarators in a single declaration
