Skip to content

Improve Int{Map,Set} construction from asc lists #1123

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 23, 2025

Conversation

meooow25
Copy link
Contributor

@meooow25 meooow25 commented Mar 16, 2025

  • Switch to an explicit stack based approach, like we had before 77c8e5f, and like we have for Set and Map today.
  • This is a little faster (-13%) on in-memory lists, but much faster (-43%) when list fusion applies because we are now a good consumer. For a dense IntSet in the fusion case it is greatly faster (-81%).
  • Avoid a case of undefined behavior in IntMap.fromDistinctAscList due to calling branchMask on equal keys if the caller did not follow the distinct precondition.
  • Add property tests for the functions

Closes #951


Benchmarks on GHC 9.10.1:

IntSet:

Name                         Time - - - - - - - -    Allocated - - - - -
                                  A       B     %         A       B     %
fromAscList                   16 μs   17 μs   +0%    3.6 KB  5.6 KB  +56%
fromAscList:fusion            25 μs  4.7 μs  -81%    292 KB  5.5 KB  -98%
fromAscList:sparse            44 μs   38 μs  -13%    224 KB  352 KB  +57%
fromAscList:sparse:fusion     63 μs   35 μs  -43%    512 KB  352 KB  -31%

IntMap:

Name                  Time - - - - - - - -    Allocated - - - - -
                           A       B     %         A       B     %
fromAscList            47 μs   39 μs  -17%    224 KB  352 KB  +57%
fromAscList:fusion     65 μs   31 μs  -52%    608 KB  352 KB  -42%

@meooow25
Copy link
Contributor Author

meooow25 commented Mar 16, 2025

That allocated number for IntSet is strangely high. Logically, the fusion and non-fusion cases should have the same allocations, like IntMap does. I'll check what's happening.

Edit: Updated the description so preserving the old results here:

Name                         Time - - - - - - - -    Allocated - - - - -
                                  A       B     %         A       B     %
fromAscList                   16 μs   16 μs   +0%    3.6 KB  134 KB  +3648%
fromAscList:fusion            25 μs  4.7 μs  -81%    292 KB  5.5 KB  -98%
fromAscList:sparse            44 μs   45 μs   +2%    224 KB  480 KB  +114%
fromAscList:sparse:fusion     63 μs   35 μs  -44%    512 KB  352 KB  -31%

@meooow25
Copy link
Contributor Author

So here's what's happening.

First, fromAscList in Data.IntSet.Internal is perfectly fine.

Show Core
Rec {
-- RHS size: {terms: 68, types: 16, coercions: 0, joins: 0/1}
$sgo1_r4Xcz :: Int# -> Word# -> Stack -> [Int] -> IntSet
[GblId[StrictWorker([~, ~, !, !])],
 Arity=4,
 Str=<L><L><1L><1L>,
 Unf=OtherCon []]
$sgo1_r4Xcz
  = \ (sc_s4X4z :: Int#)
      (sc1_s4X4A :: Word#)
      (sc2_s4X4B :: Stack)
      (sc3_s4X4y :: [Int]) ->
      case sc2_s4X4B of sc4_Xh { __DEFAULT ->
      case sc3_s4X4y of {
        [] -> $wascLinkStack sc4_Xh sc_s4X4z (Tip sc_s4X4z sc1_s4X4A);
        : y_a6MK ys_a6ML ->
          case y_a6MK of { I# ipv_s4VSn ->
          let {
            py_s4VYU :: Int#
            [LclId]
            py_s4VYU = andI# ipv_s4VSn -64# } in
          case ==# sc_s4X4z py_s4VYU of {
            __DEFAULT ->
              case $wascLinkTop
                     sc4_Xh
                     sc_s4X4z
                     (Tip sc_s4X4z sc1_s4X4A)
                     (uncheckedIShiftL#
                        1#
                        (-# 63# (word2Int# (clz# (int2Word# (xorI# sc_s4X4z py_s4VYU))))))
              of conrep_a4RRL
              { __DEFAULT ->
              $sgo1_r4Xcz
                py_s4VYU
                (uncheckedShiftL# 1## (andI# ipv_s4VSn 63#))
                conrep_a4RRL
                ys_a6ML
              };
            1# ->
              $sgo1_r4Xcz
                py_s4VYU
                (or# sc1_s4X4A (uncheckedShiftL# 1## (andI# ipv_s4VSn 63#)))
                sc4_Xh
                ys_a6ML
          }
          }
      }
      }
end Rec }

-- RHS size: {terms: 20, types: 9, coercions: 0, joins: 0/0}
fromAscList [InlPrag=INLINE (sat-args=1)] :: [Key] -> IntSet
[GblId,
 Arity=1,
 Str=<1L>,
 Unf=Unf{Src=StableUser, TopLvl=True,
         Value=True, ConLike=True, WorkFree=True, Expandable=True,
         Guidance=ALWAYS_IF(arity=1,unsat_ok=False,boring_ok=False)
         Tmpl= \ (xs_a4RLr [Occ=Once1] :: [Key]) ->
                 ...<unfolding omitted>...}]
fromAscList
  = \ (xs_a4RLr :: [Key]) ->
      case xs_a4RLr of {
        [] -> Nil;
        : y_a6MK ys_a6ML ->
          case y_a6MK of { I# ipv_s4VSn ->
          $sgo1_r4Xcz
            (andI# ipv_s4VSn -64#)
            (uncheckedShiftL# 1## (andI# ipv_s4VSn 63#))
            Nada
            ys_a6ML
          }
      }

But the one in the benchmark loop isn't as good.

Show Core
Rec {
-- RHS size: {terms: 105, types: 43, coercions: 0, joins: 0/2}
main_$s$wfuncToBenchLoop30 [Occ=LoopBreaker]
  :: [Key] -> Word64# -> State# RealWorld -> State# RealWorld
[GblId, Arity=3, Str=<L><1L><L>, Unf=OtherCon []]
main_$s$wfuncToBenchLoop30
  = \ (x_s7th :: [Key])
      (ww_s7tk :: Word64#)
      (eta_s7tm [OS=OneShot] :: State# RealWorld) ->
      case ww_s7tk of wild_X1G {
        __DEFAULT ->
          case seq#
                 @IntSet
                 @RealWorld
                 (letrec {
                    go15_X1H [Occ=LoopBreaker, Dmd=SC(S,C(1,L))]
                      :: [Int] -> MonoState -> MonoState
                    [LclId, Arity=2, Str=<1L><1L>, Unf=OtherCon []]
                    go15_X1H
                      = \ (ds_i7jd :: [Int]) (eta5_B0 :: MonoState) ->
                          case ds_i7jd of {
                            [] -> eta5_B0;
                            : y_i7jg ys_i7jh ->
                              case eta5_B0 of z_a5rC { __DEFAULT ->
                              go15_X1H
                                ys_i7jh
                                (case y_i7jg of { I# ipv_a5rF ->
                                 let {
                                   py_a5rE :: Int#
                                   [LclId]
                                   py_a5rE = andI# ipv_a5rF -64# } in
                                 case z_a5rC of {
                                   MSNada ->
                                     MSPush
                                       py_a5rE (uncheckedShiftL# 1## (andI# ipv_a5rF 63#)) Nada;
                                   MSPush bx_a5rV bx1_a5rW stk_a5rX ->
                                     case ==# bx_a5rV py_a5rE of {
                                       __DEFAULT ->
                                         case $wascLinkTop
                                                stk_a5rX
                                                bx_a5rV
                                                (Tip bx_a5rV bx1_a5rW)
                                                (uncheckedIShiftL#
                                                   1#
                                                   (-#
                                                      63#
                                                      (word2Int#
                                                         (clz#
                                                            (int2Word# (xorI# bx_a5rV py_a5rE))))))
                                         of conrep_a5rM
                                         { __DEFAULT ->
                                         MSPush
                                           py_a5rE
                                           (uncheckedShiftL# 1## (andI# ipv_a5rF 63#))
                                           conrep_a5rM
                                         };
                                       1# ->
                                         MSPush
                                           py_a5rE
                                           (or#
                                              bx1_a5rW (uncheckedShiftL# 1## (andI# ipv_a5rF 63#)))
                                           stk_a5rX
                                     }
                                 }
                                 })
                              }
                          }; } in
                  case go15_X1H x_s7th MSNada of {
                    MSNada -> Nil;
                    MSPush bx_a5s4 bx1_a5s5 stk_a5s6 ->
                      $wascLinkStack stk_a5s6 bx_a5s4 (Tip bx_a5s4 bx1_a5s5)
                  })
                 eta_s7tm
          of
          { (# ipv_i5QE, ipv1_i5QF #) ->
          main_$s$wfuncToBenchLoop30
            x_s7th (subWord64# wild_X1G 1#Word64) ipv_i5QE
          };
        0#Word64 -> eta_s7tm
      }
end Rec }

The MonoState has stuck around and causes all the allocations. No matter what I tweak in the original code, I cannot get SpecConstr to act on the go loop and get rid of it.

Then I realized that fromAscList is inlined inside the benchmark loop. Looking at the code in tasty-bench, it seems that SpecConstr specializes the benchmark loop for fromAscList and inlines it, but for some reason fails to optimize it. So maybe this deserves a GHC issue.

But wait, why does tasty-bench specialize the loop at all? This doesn't seem desirable for benchmarking, you would instead want it to not specialize, and apply the unknown function to the unknown argument every iteration. I'll open an issue on tasty-bench to see if there's a reason for this.

@treeowl
Copy link
Contributor

treeowl commented Mar 17, 2025

Double wacky.

@treeowl
Copy link
Contributor

treeowl commented Mar 17, 2025

Did you play around with GHC.Exts.SPEC, by any chance? Be sure to pattern match on it if you use it.

@meooow25
Copy link
Contributor Author

Yes, it did not help sadly. I've made it NOINLINE in the benchmarks for now.

tasty-bench issue: Bodigrim/tasty-bench#66
GHC issue: GHC #25878

@meooow25
Copy link
Contributor Author

@treeowl any comments?

@meooow25 meooow25 force-pushed the intmap-from-asc-list branch 3 times, most recently from af7981f to 418f96a Compare March 23, 2025 06:17
* Switch to an explicit stack based approach, like we had before 77c8e5f,
  and like we have for Set and Map today.
* This is a little faster (-13%) on in-memory lists, but much faster
  (-43%) when list fusion applies because we are now a good consumer.
  For a dense IntSet in the fusion case it is greatly faster (-81%).
* Avoid a case of undefined behavior in IntMap.fromDistinctAscList due to
  calling branchMask on equal keys if the caller did not follow the
  distinct precondition.
* Add property tests for the functions
@meooow25 meooow25 force-pushed the intmap-from-asc-list branch from 418f96a to 319d0b9 Compare March 23, 2025 06:33
@meooow25
Copy link
Contributor Author

Alright, merging.

@meooow25 meooow25 merged commit f602764 into haskell:master Mar 23, 2025
13 checks passed
@meooow25 meooow25 deleted the intmap-from-asc-list branch March 23, 2025 06:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fusible IntSet.fromDistinctAscList definition
2 participants