Multiple entries with intuitively equal keys in PackratReader cache #45

tom--k · 2015-02-04T12:32:23Z

Even though we use PackratParsers, we've observed exponential parsing time on one of our somewhat more complicated inputs. It turned out that unexpected entries end up in the PackratReader cache: it contained more than one entry for seemingly equal (Parser, Position) pairs.

We've fixed this with a workaround that overrides the equals method in the OffsetPosition case class so that it does not compare the CharSequences. Another working option, which seems safer, is to compare the toString representations of the CharSequences, but that introduces a quite significant performance penalty.
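To illustrate the symptom, here is a minimal sketch that does not use the parser library at all. SeqChars, Pos, OffsetOnlyPos and CacheKeyDemo are hypothetical names invented for this example; Pos only models a case class that holds a CharSequence and an offset, and the map only models a cache keyed on (parser, position).

```scala
import scala.collection.mutable

// Hypothetical wrapper: it does not override equals/hashCode, so two wrappers
// over the same data are equal only if they are the same object.
final class SeqChars(chars: IndexedSeq[Char]) extends CharSequence {
  def length(): Int = chars.length
  def charAt(index: Int): Char = chars(index)
  def subSequence(start: Int, end: Int): CharSequence =
    new SeqChars(chars.slice(start, end))
  override def toString: String = chars.mkString
}

// Simplified model of a position: the generated equals compares source and offset.
case class Pos(source: CharSequence, offset: Int)

// Variant in the spirit of the workaround: equality looks at the offset only.
case class OffsetOnlyPos(source: CharSequence, offset: Int) {
  override def equals(other: Any): Boolean = other match {
    case that: OffsetOnlyPos => offset == that.offset
    case _                   => false
  }
  override def hashCode: Int = offset
}

object CacheKeyDemo {
  private val data = Vector('a', 'b', 'c')

  // Entries left in the cache after storing the "same" key twice,
  // each time with a freshly created wrapper around the same data.
  def entriesWithDefaultEquals: Int = {
    val cache = mutable.HashMap.empty[(String, Pos), Int]
    cache.put(("expr", Pos(new SeqChars(data), 0)), 1)
    cache.put(("expr", Pos(new SeqChars(data), 0)), 2)
    cache.size
  }

  // Same experiment with the offset-only equality.
  def entriesWithOffsetOnlyEquals: Int = {
    val cache = mutable.HashMap.empty[(String, OffsetOnlyPos), Int]
    cache.put(("expr", OffsetOnlyPos(new SeqChars(data), 0)), 1)
    cache.put(("expr", OffsetOnlyPos(new SeqChars(data), 0)), 2)
    cache.size
  }

  def main(args: Array[String]): Unit = {
    println(entriesWithDefaultEquals)
    println(entriesWithOffsetOnlyEquals)
  }
}
```

With the default equality the map ends up with two entries for what looks like one key; with the offset-only equality it ends up with one. The offset-only variant would also treat positions from two different inputs as equal, which is presumably why comparing the text itself seems safer.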

liskin · 2015-08-03T22:16:32Z

The problem is that we used PagedSeqReader, and it's not really usable with Packrat. Whenever a PagedSeqReader is constructed, its source member is assigned seq, but since PagedSeq is not a subtype of java.lang.CharSequence, an implicit conversion takes place via Predef#SeqCharSequence, creating a new object. This happens every time PagedSeqReader#rest is called, so the sources never compare equal, which breaks packrat caching entirely.
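The mechanism can be sketched without the library. The classes below (CharsView, RewrappingReader, SharingReader) are hypothetical stand-ins, not the library's readers: the first reader wraps its sequence anew on every construction, which is what an implicit conversion at the assignment amounts to, while the second wraps once and passes the same CharSequence on. The second is just one way to keep the identity stable and is not necessarily how the actual fix works.

```scala
// Hypothetical wrapper with reference equality only (no equals override).
final class CharsView(chars: IndexedSeq[Char]) extends CharSequence {
  def length(): Int = chars.length
  def charAt(index: Int): Char = chars(index)
  def subSequence(start: Int, end: Int): CharSequence =
    new CharsView(chars.slice(start, end))
  override def toString: String = chars.mkString
}

// Creates a new wrapper each time a reader is constructed, so every call
// to rest yields a reader whose source is a different object.
final class RewrappingReader(seq: IndexedSeq[Char], val offset: Int) {
  val source: CharSequence = new CharsView(seq)
  def rest: RewrappingReader = new RewrappingReader(seq, offset + 1)
}

// Wraps once at the outside and hands the same CharSequence to derived readers.
final class SharingReader(val source: CharSequence, val offset: Int) {
  def rest: SharingReader = new SharingReader(source, offset + 1)
}

object ReaderIdentityDemo {
  private val data = Vector('a', 'b', 'c')

  def rewrappingKeepsSource: Boolean = {
    val rd = new RewrappingReader(data, 0)
    rd.source == rd.rest.source
  }

  def sharingKeepsSource: Boolean = {
    val rd = new SharingReader(new CharsView(data), 0)
    rd.source == rd.rest.source
  }

  def main(args: Array[String]): Unit = {
    println(rewrappingKeepsSource)
    println(sharingKeepsSource)
  }
}
```

Here rewrappingKeepsSource is false and sharingKeepsSource is true, because == on these wrappers falls back to reference equality.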

Luckily PagedSeqReader seems to be entirely replaceable by CharSequenceReader, so I think I'll submit a pull request that deprecates PagedSeqReader and makes it an equivalent of CharSequenceReader. Any objections?

liskin · 2015-08-04T14:17:21Z

Well, it seems it's not entirely replaceable, as SeqCharSequence calls length, which forces the PagedSeq out of laziness. I submitted a different fix instead.

SethTisue · 2015-08-05T18:24:19Z

Could there be test coverage for this?

liskin · 2015-08-06T07:56:19Z

I can certainly write a test that checks that rd.source == rd.rest.source, and likewise for drop, but is that a good test? Any idea how to test this more thoroughly?

SethTisue · 2015-08-07T18:26:08Z

I don't know. Maybe @gourlaysama wants to weigh in? If you don't see a good way to test it, I'm not opposed to merging the change anyway.

liskin · 2015-08-07T21:11:28Z

I've given it some more thought and I could probably create a parser that fails when applied twice. It wouldn't necessarily be a more thorough test in terms of coverage (it may not exercise both the rest and drop methods), but it would test that caching behaves as intended. I'll give it a try tomorrow.
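A rough sketch of that kind of behavioural check, again without the parser library: instead of a parser that fails on its second application, a hypothetical Memo class counts how often its body runs, and a simplified Pos case class stands in for the position used as a cache key. All names are invented for the example, and java.lang.StringBuilder is used only because it is a CharSequence with reference equality.

```scala
import scala.collection.mutable

// Simplified position: the generated case-class equality includes the source.
case class Pos(source: CharSequence, offset: Int)

// Hypothetical memoizer standing in for a packrat-style cache.
final class Memo[A](body: Pos => A) {
  private val cache = mutable.HashMap.empty[Pos, A]
  var applications = 0

  // Runs body only when no equal position has been seen before.
  def apply(pos: Pos): A =
    cache.getOrElseUpdate(pos, { applications += 1; body(pos) })
}

object ApplyOnceCheck {
  // How many times the body ran for two lookups at offset 0.
  def runs(first: CharSequence, second: CharSequence): Int = {
    val memo = new Memo[Int](pos => pos.offset)
    memo(Pos(first, 0))
    memo(Pos(second, 0))
    memo.applications
  }

  def main(args: Array[String]): Unit = {
    val shared: CharSequence = new java.lang.StringBuilder("ab")
    // Same source object both times: the second lookup hits the cache.
    println(runs(shared, shared))
    // Two distinct source objects with equal content: the cache misses.
    println(runs(new java.lang.StringBuilder("ab"), new java.lang.StringBuilder("ab")))
  }
}
```

A test built this way asserts that the count stays at 1 when the source keeps its identity, which checks the caching behaviour rather than the individual rest and drop methods.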

liskin · 2015-08-08T10:02:25Z

This is the best I can do, I think. I hope it's good enough. :-)

SethTisue · 2015-08-24T20:17:28Z

@gourlaysama ? (maybe he's on a lengthy European-style vacation...)

gourlaysama · 2015-09-09T21:18:49Z

Fixed by #65.

liskin mentioned this issue Aug 4, 2015

Fix packrat caching with PagedSeqReader #65

Merged

gourlaysama closed this as completed Sep 9, 2015
