Various small optimizations #9605

Merged
merged 8 commits into from
Aug 21, 2020

Contributor

@odersky odersky commented Aug 20, 2020

None of these will move the needle much, but they don't make the code more complicated either. So it is just attention to detail.

This gets Context size down to 72 bytes from 80, assuming a 12-byte header and 8-byte rounding.
In the type/* benchmark this gives a saving of ~700K contexts * 8 bytes = 5.4M vs a store allocation increase of
11K * ~90 bytes = 1M (approx).
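The arithmetic behind that estimate can be sketched as follows. This is only an illustration: the field payloads (68 and 60 bytes) are hypothetical values picked so that the totals match the 80 and 72 bytes quoted above, and `ContextSizeSketch` and `alignedSize` are made-up names, not part of the compiler.

```scala
object ContextSizeSketch {
  // Round header + fields up to the next multiple of the allocation alignment.
  def alignedSize(headerBytes: Int, fieldBytes: Int, alignment: Int = 8): Int = {
    val raw = headerBytes + fieldBytes
    (raw + alignment - 1) / alignment * alignment
  }

  // Hypothetical payloads: 12 + 68 = 80 bytes, 12 + 60 = 72 bytes.
  val before: Int = alignedSize(headerBytes = 12, fieldBytes = 68)
  val after: Int  = alignedSize(headerBytes = 12, fieldBytes = 60)

  // ~700K contexts, each smaller by (before - after) bytes.
  val savedBytes: Long = 700000L * (before - after)

  // ~11K extra store allocations at ~90 bytes each.
  val extraStoreBytes: Long = 11000L * 90
}
```

With decimal units this gives about 5.6 million bytes saved against about 1 million bytes of extra store allocation, which is the same order as the figures in the description.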
It's fairly hot code, and eliminating the closure also
avoids Int boxing.
The previously optimized apply function was not tail recursive since
it was not final.
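The point about `final` follows from a general Scala rule: a self-recursive call is only rewritten into a loop when the method cannot be overridden (it is `final`, `private`, local, or a member of an object). A minimal sketch, using a hypothetical `Summer` class rather than the accumulator changed in this PR:

```scala
import scala.annotation.tailrec

object TailrecSketch {
  class Summer {
    // Without `final`, @tailrec is rejected here: a subclass could override
    // `sum`, so the recursive call cannot safely be turned into a jump.
    @tailrec
    final def sum(xs: List[Int], acc: Long): Long = xs match {
      case x :: rest => sum(rest, acc + x)
      case Nil       => acc
    }
  }
}
```

Adding the `@tailrec` annotation is a cheap way to have the compiler report the problem instead of silently emitting a real recursive call.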
@odersky
Contributor Author

odersky commented Aug 20, 2020

test performance please

@dottybot
Member

performance test scheduled: 13 job(s) in queue, 1 running.

Contributor

@liufengyun liufengyun left a comment

LGTM

        def fold(x: X, trees: List[Tree]): X = trees match
          case tree :: rest => fold(apply(x, tree), rest)
          case Nil => x
        fold(x, trees)
Contributor

In ee4b125, I'm trying:

        var acc = x
        var list = trees
        while (!list.isEmpty) do
          acc = apply(acc, list.head)
          list = list.tail
        acc

After tail-call optimization, the two versions are almost the same. Let's see if there is a difference in the benchmarks.

Contributor Author

I don't think there will be. If anything, the tail recursive version should be faster since it does a single type test per iteration instead of one each in isEmpty, head, and tail.
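For readers comparing the two shapes, here is a self-contained sketch of both. It assumes a plain function `op` in place of the accumulator's `apply` method and generic element types in place of `Tree`; `FoldSketch`, `foldRec`, and `foldLoop` are hypothetical names.

```scala
import scala.annotation.tailrec

object FoldSketch {
  // Tail-recursive shape: one pattern match on the list per step.
  @tailrec
  def foldRec[X, T](x: X, items: List[T])(op: (X, T) => X): X = items match {
    case item :: rest => foldRec(op(x, item), rest)(op)
    case Nil          => x
  }

  // Loop shape: isEmpty, head, and tail are three separate calls per step.
  def foldLoop[X, T](x: X, items: List[T])(op: (X, T) => X): X = {
    var acc = x
    var list = items
    while (!list.isEmpty) {
      acc = op(acc, list.head)
      list = list.tail
    }
    acc
  }
}
```

Both compute the same left fold; any performance difference comes only from how each step inspects the list, which is what the benchmark runs in this thread are meant to check.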

Contributor

In #9565, we get a slight speedup for Dotty.

@dottybot
Member

Performance test finished successfully:

Visit http://dotty-bench.epfl.ch/9605/ to see the changes.

Benchmarks are based on merging with master (f2018f0)

@odersky
Contributor Author

odersky commented Aug 20, 2020

test performance please

@dottybot
Member

performance test scheduled: 15 job(s) in queue, 1 running.

@dottybot
Member

Performance test finished successfully:

Visit http://dotty-bench.epfl.ch/9605/ to see the changes.

Benchmarks are based on merging with master (1431be2)

@odersky odersky merged commit d50bd76 into scala:master Aug 21, 2020
@odersky odersky deleted the various-ops-1 branch August 21, 2020 06:29
3 participants