Skip to content

Multiple entries with intuitively equal keys in PackratReader cache #45

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
tom--k opened this issue Feb 4, 2015 · 9 comments
Closed

Comments

@tom--k
Copy link

tom--k commented Feb 4, 2015

Even though we use PackratParsers, we've observed exponential parsing time on one of our bit more complicated inputs. It turned out that strange entries get into PackratReader cache; it contained more than one entry for seemingly equal (Parser,Position) pairs.

We've fixed this with a workaround which implements equals method in the OffsetPosition case class so that it does not compare CharSequences. Another working option which seems to be more safe is to compare toString representations of the CharSequences, but that introduces quite significant performance penalty.

@liskin
Copy link
Contributor

liskin commented Aug 3, 2015

The problem is that we used PagedSeqReader and it's not exactly usable with Packrat. Whenever it's constructed, source is assigned seq but since PagedSeq is not a subclass of java.lang.CharSequence, an implicit conversion takes place via Predef#SeqCharSequence, creating a new object. The problem is that this happens every time PagedSeqReader#rest is called, breaking packrat caching entirely.

Luckily PagedSeqReader seems to be entirely replaceable by CharSequenceReader, so I think I'll submit a pull request that deprecates PagedSeqReader and makes it an equivalent of CharSequenceReader. Any objections?

@liskin
Copy link
Contributor

liskin commented Aug 4, 2015

Well, seems like it's not entirely replaceable as SeqCharSequence calls length and forces the PagedSeq out of laziness. I submitted a different fix then.

@SethTisue
Copy link
Member

Could there be test coverage for this?

@liskin
Copy link
Contributor

liskin commented Aug 6, 2015

I can surely write a test that checks for rd.source == rd.rest.source and likewise for drop, but is that a good test? Any idea how to test this more thoroughly?

@SethTisue
Copy link
Member

I don't know. Maybe @gourlaysama wants to weigh in? If you don't see a good way to test it, I'm not opposed to merging the change anyway.

@liskin
Copy link
Contributor

liskin commented Aug 7, 2015

I've given it some more thought and I could probably create a parser that fails when applied twice. It wouldn't necessarily be a more thorough test in terms of coverage -- it may not test both rest and drop methods, but it would test that it behaves as intended. I'll give it a try tomorrow.

@liskin
Copy link
Contributor

liskin commented Aug 8, 2015

This is the best I can do, I think. I hope it's good enough. :-)

@SethTisue
Copy link
Member

@gourlaysama ? (maybe he's on a lengthy European-style vacation...)

@gourlaysama
Copy link
Contributor

Fixed by #65.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants