Member

sjakobi commented Dec 15, 2025

Addresses #578.


TODO:

  • Check whether anything at all got faster:
    • lookup
    • delete
    • adjust
    • size
  • Check the performance of combining functions, maybe starting with disjoint.
  • Add a (provisional) invariant: any empty HashMap must be identical to empty.
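
A minimal sketch of the representation and the proposed invariant, using a hypothetical toy Tree type (lists stand in for arrays; Tree, Empty, empty and valid are illustrative names, not the library's definitions). Empty becomes a pattern synonym for a BitmapIndexed node whose bitmap is 0. Pointer identity can't be checked portably, so valid only checks a weaker structural property: a zero bitmap may appear only at the root.

```haskell
{-# LANGUAGE PatternSynonyms #-}

import Data.Bits (popCount)

-- Hypothetical toy model (not unordered-containers' types):
-- lists stand in for arrays, and Int keys act as their own hashes.
data Tree v
  = BitmapIndexed !Word [Tree v] -- bitmap plus one child per set bit
  | Leaf !Int v

-- An empty map is a BitmapIndexed node with an all-zero bitmap.
-- The pattern synonym lets existing code keep matching on Empty.
pattern Empty :: Tree v
pattern Empty <- BitmapIndexed 0 _
  where
    Empty = BitmapIndexed 0 []

-- The one shared empty value that every empty map should be.
empty :: Tree v
empty = Empty

-- Structural stand-in for the invariant: a zero bitmap is only allowed
-- at the root, and every bitmap must have exactly one set bit per child.
valid :: Tree v -> Bool
valid (BitmapIndexed 0 ts) = null ts
valid t = go t
  where
    go (Leaf _ _) = True
    go (BitmapIndexed b ts) =
      b /= 0 && popCount b == length ts && all go ts
```

Load it in GHCi to experiment; a nested BitmapIndexed 0 [] node makes valid return False.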

Comment on lines -471 to -473
-- empty vs. anything
go !_ t1 Empty = t1
go _ Empty t2 = t2
Member Author

This is probably just one of many performance bugs:

  • We want to be strict in the Shift argument!
  • All BitmapIndexed vs. BitmapIndexed cases (including empties) are now handled by unionArrayBy, which doesn't have any shortcuts for empty maps yet!
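
Roughly, both fixes could look like this on a hypothetical toy model (16-way nodes, lists for arrays, Int keys as their own hashes; unionWith, unionArrayBy and the other names are stand-ins, not the library's functions). Empty operands are dispatched before any child lists are merged, and the bang pattern keeps go strict in the shift argument.

```haskell
{-# LANGUAGE BangPatterns #-}

import Data.Bits (bit, popCount, shiftR, testBit, (.&.), (.|.))
import Data.List (sortOn)

-- Hypothetical toy model (not unordered-containers' code): 16-way nodes,
-- lists stand in for arrays, and Int keys act as their own hashes.
data Tree v = BitmapIndexed !Word [Tree v] | Leaf !Int v

empty :: Tree v
empty = BitmapIndexed 0 []

-- The 4-bit chunk of the key that selects a child at the given shift.
index :: Int -> Int -> Int
index s k = (k `shiftR` s) .&. 0xF

-- Wrap a leaf in a one-child node at shift s.
toNode :: Int -> Tree v -> Tree v
toNode s l@(Leaf k _) = BitmapIndexed (bit (index s k)) [l]
toNode _ t = t

unionWith :: (v -> v -> v) -> Tree v -> Tree v -> Tree v
unionWith f = go 0
  where
    -- Shortcuts for empty operands; the bangs keep go strict in the shift.
    go !_ t1 (BitmapIndexed 0 _) = t1
    go !_ (BitmapIndexed 0 _) t2 = t2
    go !s l1@(Leaf k1 v1) l2@(Leaf k2 v2)
      | k1 == k2 = Leaf k1 (f v1 v2)
      | otherwise = go s (toNode s l1) (toNode s l2)
    go !s l1@(Leaf _ _) t2 = go s (toNode s l1) t2
    go !s t1 l2@(Leaf _ _) = go s t1 (toNode s l2)
    go !s (BitmapIndexed b1 ts1) (BitmapIndexed b2 ts2) =
      BitmapIndexed (b1 .|. b2) (unionArrayBy (go (s + 4)) b1 ts1 b2 ts2)

-- Merge two sparse child lists; only positions present on both sides recurse.
unionArrayBy :: (Tree v -> Tree v -> Tree v) -> Word -> [Tree v] -> Word -> [Tree v] -> [Tree v]
unionArrayBy g b1 ts1 b2 ts2 = [pick i | i <- [0 .. 15], testBit (b1 .|. b2) i]
  where
    child b ts i = ts !! popCount (b .&. (bit i - 1))
    pick i
      | testBit b1 i && testBit b2 i = g (child b1 ts1 i) (child b2 ts2 i)
      | testBit b1 i = child b1 ts1 i
      | otherwise = child b2 ts2 i

fromList :: [(Int, v)] -> Tree v
fromList = foldr (\(k, v) acc -> unionWith const (Leaf k v) acc) empty

toAscList :: Tree v -> [(Int, v)]
toAscList = sortOn fst . go
  where
    go (Leaf k v) = [(k, v)]
    go (BitmapIndexed _ ts) = concatMap go ts
```

Placing the empty cases first means an empty operand never reaches the bitmap merge at all.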

new :: Int -> a -> ST s (MArray s a)
new _n@(I# n#) b =
CHECK_GT("new",_n,(0 :: Int))
CHECK_GE("new",_n,(0 :: Int))
Collaborator

I suggest leaving this as it was and defining empty from first principles. We really don't want to create any other empty arrays!
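
One way to follow this suggestion, sketched with hypothetical stand-ins for the library's Array and MArray wrappers (only the GHC.Exts primops are real; the size check is a plain guard standing in for the CHECK_GT macro): new keeps rejecting size 0, and the single empty array is a top-level value built directly from primops, so new can never produce another one.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

import GHC.Exts (Array#, Int (I#), MutableArray#, newArray#, sizeofArray#, unsafeFreezeArray#)
import GHC.ST (ST (..), runST)

-- Hypothetical stand-ins for the library's Array and MArray wrappers.
data Array a = Array (Array# a)

data MArray s a = MArray (MutableArray# s a)

arrayLength :: Array a -> Int
arrayLength (Array arr) = I# (sizeofArray# arr)

-- new keeps its "size must be positive" check (an always-on stand-in
-- for CHECK_GT), so it can never allocate an empty array.
new :: Int -> a -> ST s (MArray s a)
new n@(I# n#) x
  | n <= 0 = error "new: size must be positive"
  | otherwise = ST $ \s -> case newArray# n# x s of
      (# s1, marr #) -> (# s1, MArray marr #)

unsafeFreeze :: MArray s a -> ST s (Array a)
unsafeFreeze (MArray marr) = ST $ \s -> case unsafeFreezeArray# marr s of
  (# s1, arr #) -> (# s1, Array arr #)

-- The one empty array, built from primops rather than via new.
-- As a top-level CAF it is allocated at most once and then shared.
emptyArray :: Array a
emptyArray = runST (ST build)
  where
    build s = case newArray# 0# undefinedElem s of
      (# s1, marr #) -> case unsafeFreezeArray# marr s1 of
        (# s2, arr #) -> (# s2, Array arr #)
    undefinedElem = error "emptyArray: there are no elements"
```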

in HM.bitmapIndexedOrFull (b .|. m) ary'
let !l = leaf h k x
in if b == 0
then l
Collaborator

Hrmmm... The extent of the special-casing is making me skeptical about this whole endeavor, especially if I'm right that we can't safely use pointer equality.

Member Author

IMHO this should still improve branch prediction. We've simply moved the Empty alternative from the outermost "layer" to this inner branch.

However, we still check for the empty case at every level of the tree, which is suboptimal.

What we can now do more easily is split off an inner function that doesn't need to check for empties.

Something like:

insertWith f k v Empty = Leaf ...
insertWith f k v m = go ...
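
Fleshed out on a hypothetical toy model (16-way nodes, lists for arrays, Int keys as their own hashes; Tree, index and toAscList are illustrative, not the library's code), the split might look like this: the wrapper checks for the empty map exactly once, and the inner go can assume its argument is non-empty.

```haskell
{-# LANGUAGE BangPatterns #-}

import Data.Bits (bit, popCount, shiftR, (.&.), (.|.))
import Data.List (sortOn)

-- Hypothetical toy model (not unordered-containers' code): 16-way nodes,
-- lists stand in for arrays, and Int keys act as their own hashes.
data Tree v = BitmapIndexed !Word [Tree v] | Leaf !Int v

empty :: Tree v
empty = BitmapIndexed 0 []

index :: Int -> Int -> Int
index s k = (k `shiftR` s) .&. 0xF

-- The wrapper handles the empty map once; go never sees an empty node,
-- so it needs no bitmap == 0 test at each level of the tree.
insertWith :: (v -> v -> v) -> Int -> v -> Tree v -> Tree v
insertWith _ k v (BitmapIndexed 0 _) = Leaf k v
insertWith f k v t0 = go 0 t0
  where
    go !s l@(Leaf k' v')
      | k == k' = Leaf k (f v v')
      -- Different key: push the old leaf one level down and retry.
      | otherwise = go s (BitmapIndexed (bit (index s k')) [l])
    go !s (BitmapIndexed b ts)
      | b .&. m == 0 = BitmapIndexed (b .|. m) (before ++ Leaf k v : after)
      | otherwise = case after of
          c : rest -> BitmapIndexed b (before ++ go (s + 4) c : rest)
          [] -> error "insertWith: bitmap and children out of sync"
      where
        m = bit (index s k)
        (before, after) = splitAt (popCount (b .&. (m - 1))) ts

toAscList :: Tree v -> [(Int, v)]
toAscList = sortOn fst . go
  where
    go (Leaf k v) = [(k, v)]
    go (BitmapIndexed _ ts) = concatMap go ts
```

As in the library, the combining function receives the new value first and the old value second.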

Collaborator

Previously, we knew which branch to take from the pointer tag alone, before loading the heap object. Now we have to wait for that load. How does that improve matters?

sjakobi force-pushed the sjakobi/issue578-empty-repr branch from 8021eb6 to 81fbd2e on December 15, 2025 09:13
Member Author

sjakobi commented Dec 15, 2025

So far (at 81fbd2e), the HashMap.Strict.lookup (1000x) benchmark has become a bit faster.

For present keys, the speedups are around 5–10% for sizes >= 10.

For absent keys, there is, of course, a slowdown of about 10% for the empty map, but all other sizes are faster, especially the smaller ones.

Member Author

sjakobi commented Dec 15, 2025

size also seems to be faster, except for size 0, which is ~20% slower! :/

size 1 is about 10% faster, and for sizes >= 50 I see a speedup of ~3–8%.

EDIT: After a small refactoring (f02530b), size is significantly faster, now even for sizes 0 and 1!

    $ cabal run fine-grained -- -p size --stdev 1
    All
      HashMap.Strict
        size
          Int
            0:      OK
              2.04 ns ±  22 ps
            1:      OK
              3.01 ns ±  40 ps
            5:      OK
              20.0 ns ± 128 ps
            10:     OK
              45.3 ns ± 568 ps
            50:     OK
              186  ns ± 1.1 ns
            100:    OK
              396  ns ± 4.1 ns
            500:    OK
              1.88 μs ± 5.8 ns
            1000:   OK
              4.59 μs ±  84 ns
            5000:   OK
              28.3 μs ± 483 ns
            10000:  OK
              56.2 μs ± 885 ns
            50000:  OK
              387  μs ± 4.4 μs
            100000: OK
              735  μs ± 4.9 μs
            500000: OK
              5.36 ms ±  51 μs
sjakobi force-pushed the sjakobi/issue578-empty-repr branch from 81fbd2e to f02530b on December 15, 2025 15:50
...in order to reduce code size.
Comment on lines +1244 to +1271
(# st #) ->
let !st' = go collPos (nextSH shiftedHash) k st
-- These let-bindings help GHC form join points in order to
-- prevent code duplication.
deletion = BitmapIndexed (b .&. complement m) (A.delete ary i)
update_ = BitmapIndexed b (A.update ary i st')
{-# NOINLINE update_ #-}
in case st' of
Empty | A.length ary == 2
, (# l #) <- A.index# ary (otherOfOneOrZero i)
, isLeafOrCollision l
-> l
| otherwise
-> deletion
_ | isLeafOrCollision st' && A.length ary == 1 -> st'
| otherwise -> update_
where m = maskSH shiftedHash
i = sparseIndex b m
go collPos shiftedHash k (Full ary) =
case A.index# ary i of
(# st #) -> case go collPos (nextSH shiftedHash) k st of
Empty ->
(# st #) ->
let !st' = go collPos (nextSH shiftedHash) k st
-- This let-binding helps GHC form a join point in order to
-- prevent code duplication.
update_ = Full (updateFullArray ary i st')
{-# NOINLINE update_ #-}
in if null st'
then
Member Author

This code duplication issue is pretty annoying. Apparently, when a constructor check is replaced by a constructor check and an additional field comparison (bitmap == 0), GHC tends to duplicate the code for the fallthrough case.

Member Author

An alternative way to handle this would be to change the return type of go to (# (# #) | HashMap k v #), where the "Nothing" case, (# (# #) | #), represents an empty map.
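
A sketch of that alternative on a hypothetical toy model (16-way nodes, lists for arrays, Int keys as their own hashes; none of this is the library's delete). Here go returns the unboxed sum (# (# #) | Tree a #), so callers learn that a subtree became empty from the sum's tag rather than by inspecting a returned node, and only the top-level wrapper turns that case back into the shared empty map. Unlike the code under review, this toy doesn't collapse single-leaf nodes.

```haskell
{-# LANGUAGE BangPatterns, UnboxedSums, UnboxedTuples #-}

import Data.Bits (bit, complement, popCount, shiftR, (.&.))
import Data.List (sortOn)

-- Hypothetical toy model (not unordered-containers' code): 16-way nodes,
-- lists stand in for arrays, and Int keys act as their own hashes.
data Tree v = BitmapIndexed !Word [Tree v] | Leaf !Int v

empty :: Tree v
empty = BitmapIndexed 0 []

isEmpty :: Tree v -> Bool
isEmpty (BitmapIndexed 0 _) = True
isEmpty _ = False

index :: Int -> Int -> Int
index s k = (k `shiftR` s) .&. 0xF

delete :: Int -> Tree v -> Tree v
delete k t0 = case go 0 t0 of
  (# (# #) | #) -> empty -- "Nothing": the whole map became empty
  (# | t #) -> t
  where
    -- The left alternative means "this subtree is now empty", so callers
    -- branch on the sum's tag instead of inspecting the returned node.
    go :: Int -> Tree a -> (# (# #) | Tree a #)
    go !_ l@(Leaf k' _)
      | k == k' = (# (# #) | #)
      | otherwise = (# | l #)
    go !s n@(BitmapIndexed b ts)
      | b .&. m == 0 = (# | n #) -- key absent (also covers the empty root)
      | otherwise = case after of
          [] -> error "delete: bitmap and children out of sync"
          c : rest -> case go (s + 4) c of
            (# (# #) | #) ->
              if b' == 0
                then (# (# #) | #)
                else (# | BitmapIndexed b' (before ++ rest) #)
            (# | c' #) -> (# | BitmapIndexed b (before ++ c' : rest) #)
      where
        m = bit (index s k)
        b' = b .&. complement m
        (before, after) = splitAt (popCount (b .&. (m - 1))) ts

toAscList :: Tree v -> [(Int, v)]
toAscList = sortOn fst . walk
  where
    walk (Leaf k v) = [(k, v)]
    walk (BitmapIndexed _ ts) = concatMap walk ts
```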
