Skip to content

SI-8879 fix quadratic reading time in StreamReader #33

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 14, 2014

Conversation

gourlaysama
Copy link
Contributor

StreamReader.nextEol used to loop all the way to Eol every time an
element was read. That's very costly when lines are long.

Furthermore, it used to call PagedSeq.length, forcing PagedSeq to
load the whole input in memory, even when a single character was read.

nextEol is now saved as part of the state of StreamReader, and is passed
to child readers when created (as long as we do not read past the end of
the line). Thus it computed only once per line, whatever the length.

With the example in the ticket (SI-8879), we get:

  • before:
    User time (seconds): 82.12
    System time (seconds): 0.07
    Elapsed (wall clock) time (h:mm:ss or m:ss): 1:21.52
  • after:
    User time (seconds): 1.05
    System time (seconds): 0.06
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.68
  • for comparison, using PagedSeqReader directly:
    User time (seconds): 1.06
    System time (seconds): 0.06
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.69

isDefinedAt is used instead of length so that pages beyond the
tested index do not need to be read. The test only tests this part.

StreamReader.nextEol used to loop all the way to Eol every time an
element was read. That's very costly when lines are long.

Furthermore, it used to call PagedSeq.length, forcing PagedSeq to
load the whole input in memory, even when a single character was read.

nextEol is now saved as part of the state of StreamReader, and is passed
to child readers when created (as long as we do not read past the end of
the line). Thus it computed only once per line, whatever the length.

With the example in the ticket (SI-8879), we get:

* before:

    User time (seconds): 82.12
    System time (seconds): 0.07
    Elapsed (wall clock) time (h:mm:ss or m:ss): 1:21.52

* after:

    User time (seconds): 1.05
    System time (seconds): 0.06
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.68

* for comparison, using PagedSeqReader directly:

    User time (seconds): 1.06
    System time (seconds): 0.06
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.69

`isDefinedAt` is used instead of `length` so that pages beyond the
tested index do not need to be read. The test only tests this part.
gourlaysama added a commit that referenced this pull request Oct 14, 2014
SI-8879 fix quadratic reading time in StreamReader
@gourlaysama gourlaysama merged commit 45ed52f into scala:master Oct 14, 2014
@gourlaysama gourlaysama deleted the wip/t8879-streamreader branch November 12, 2014 14:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant