Crashes during escaped Unicode surrogate pairs parsing #855

Closed
@RazrFalcon

> ruby-parse -v
ruby-parse based on parser version 3.1.2.0

> ruby-parse --32 -E -e '"\\u{D800}"'
Failed on: (fragment:0)
/Library/Ruby/Gems/2.6.0/gems/parser-3.1.2.0/lib/parser/lexer.rb:17506:in `chr': invalid codepoint 0xD800 in UTF-8 (RangeError)
	from /Library/Ruby/Gems/2.6.0/gems/parser-3.1.2.0/lib/parser/lexer.rb:17506:in `block in advance'
	from /Library/Ruby/Gems/2.6.0/gems/parser-3.1.2.0/lib/parser/lexer.rb:17494:in `each'
	from /Library/Ruby/Gems/2.6.0/gems/parser-3.1.2.0/lib/parser/lexer.rb:17494:in `advance'
	from /Library/Ruby/Gems/2.6.0/gems/parser-3.1.2.0/lib/parser/lexer/explanation.rb:19:in `advance'
	from /Library/Ruby/Gems/2.6.0/gems/parser-3.1.2.0/lib/parser/base.rb:252:in `next_token'
	from /System/Library/Frameworks/Ruby.framework/Versions/2.6/usr/lib/ruby/2.6.0/racc/parser.rb:259:in `_racc_do_parse_c'
	from /System/Library/Frameworks/Ruby.framework/Versions/2.6/usr/lib/ruby/2.6.0/racc/parser.rb:259:in `do_parse'
	from /Library/Ruby/Gems/2.6.0/gems/parser-3.1.2.0/lib/parser/base.rb:190:in `parse'
	from /Library/Ruby/Gems/2.6.0/gems/parser-3.1.2.0/lib/parser/runner/ruby_parse.rb:141:in `process'
	from /Library/Ruby/Gems/2.6.0/gems/parser-3.1.2.0/lib/parser/runner.rb:254:in `process_buffer'
	from /Library/Ruby/Gems/2.6.0/gems/parser-3.1.2.0/lib/parser/runner.rb:231:in `block in process_fragments'
	from /Library/Ruby/Gems/2.6.0/gems/parser-3.1.2.0/lib/parser/runner.rb:225:in `each'
	from /Library/Ruby/Gems/2.6.0/gems/parser-3.1.2.0/lib/parser/runner.rb:225:in `each_with_index'
	from /Library/Ruby/Gems/2.6.0/gems/parser-3.1.2.0/lib/parser/runner.rb:225:in `process_fragments'
	from /Library/Ruby/Gems/2.6.0/gems/parser-3.1.2.0/lib/parser/runner.rb:215:in `block in process_all_input'
	from /System/Library/Frameworks/Ruby.framework/Versions/2.6/usr/lib/ruby/2.6.0/benchmark.rb:293:in `measure'
	from /Library/Ruby/Gems/2.6.0/gems/parser-3.1.2.0/lib/parser/runner.rb:214:in `process_all_input'
	from /Library/Ruby/Gems/2.6.0/gems/parser-3.1.2.0/lib/parser/runner/ruby_parse.rb:137:in `process_all_input'
	from /Library/Ruby/Gems/2.6.0/gems/parser-3.1.2.0/lib/parser/runner.rb:35:in `execute'
	from /Library/Ruby/Gems/2.6.0/gems/parser-3.1.2.0/lib/parser/runner.rb:13:in `go'
	from /Library/Ruby/Gems/2.6.0/gems/parser-3.1.2.0/bin/ruby-parse:7:in `<top (required)>'
	from /usr/local/bin/ruby-parse:23:in `load'
	from /usr/local/bin/ruby-parse:23:in `<main>'

> ruby -v
ruby 2.6.8p205 (2021-07-07 revision 67951) [universal.arm64e-darwin21]

> ruby -e '"\\u{D800}"'
-e:1: invalid Unicode codepoint
"\u{D800}"

I would assume that escaped code points in the surrogate range U+D800...U+DFFF should be ignored instead of crashing the lexer with a RangeError.
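For illustration, the crash in the trace can be reproduced outside the parser: `Integer#chr` with `Encoding::UTF_8` raises `RangeError` for any surrogate code point. Below is a minimal sketch of a guard that returns `nil` for such values so the caller can skip them or report a diagnostic. The names `SURROGATE_RANGE`, `codepoint_to_utf8` and `chr_error_message` are hypothetical and are not part of the parser gem; how the lexer should actually handle this case is up to the maintainers.

```ruby
# Hypothetical helpers for illustration only; not the parser gem's API.

# UTF-16 surrogate code points, which are not valid Unicode scalar values.
SURROGATE_RANGE = (0xD800..0xDFFF).freeze

# Converts a code point to a UTF-8 string, or returns nil when the value
# is a surrogate or lies beyond U+10FFFF, instead of raising RangeError.
def codepoint_to_utf8(codepoint)
  return nil if SURROGATE_RANGE.cover?(codepoint) || codepoint > 0x10FFFF

  codepoint.chr(Encoding::UTF_8)
end

# Returns the RangeError message raised by an unguarded Integer#chr call,
# or nil when the conversion succeeds.
def chr_error_message(codepoint)
  codepoint.chr(Encoding::UTF_8)
  nil
rescue RangeError => e
  e.message
end

p codepoint_to_utf8(0x41)      # regular code point
p codepoint_to_utf8(0xD800)    # surrogate, guarded
p chr_error_message(0xD800)    # same failure as in the trace above
```

Returning `nil` is only one option. Raising a syntax error, as `ruby -e` does above, would be closer to MRI's behavior.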
