Use custom hash table as node lookup table #11

rintaro · 2018-09-14T09:32:31Z

Attempt to address the problem raised in #3 .

Use a Set<T>-like custom hash table that holds weak references to RawSyntax nodes.
This hash table doesn't hold ids separately; instead, it uses T.id via the Identifiable protocol.
Freeing a RawSyntax node therefore turns its bucket in the table into just nil, so the bucket can be reused to reference another object.
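
A minimal sketch of the bucket idea described above, not the code from this PR. `IDProviding`, `Node` and `WeakBucket` are hypothetical names chosen for illustration (`IDProviding` stands in for the Identifiable protocol mentioned above, `Node` for RawSyntax).

```swift
// Hypothetical stand-in for the Identifiable protocol used by the table.
protocol IDProviding: AnyObject {
  associatedtype Identifier: Hashable
  var id: Identifier { get }
}

// Hypothetical stand-in for RawSyntax.
final class Node: IDProviding {
  let id: Int
  init(id: Int) { self.id = id }
}

// One bucket: a weak reference and nothing else. The id is read from the
// referenced object, so a freed object leaves no stale key behind.
struct WeakBucket<Element: IDProviding> {
  weak var value: Element?
}

var bucket = WeakBucket<Node>()
var node: Node? = Node(id: 42)
bucket.value = node
let idWhileAlive = bucket.value?.id   // read through the weak reference
node = nil                            // drop the only strong reference
let isEmptyAfterFree = bucket.value == nil

// The empty bucket can now reference a different object.
let replacement = Node(id: 7)
bucket.value = replacement
// Keep `replacement` alive while reading it back through the weak reference.
let idAfterReuse = withExtendedLifetime(replacement) { bucket.value?.id }
```

Because the bucket stores no key of its own, there is nothing to clean up when the object goes away: the weak reference reading as nil is the whole "removal".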

rintaro · 2018-09-14T09:33:13Z

@ahoppen @nkcsgexi
What do you think?

ahoppen · 2018-09-14T15:11:15Z

I haven’t looked at the implementation in detail yet but the idea sounds great. Thanks for addressing this.

nkcsgexi · 2018-09-14T16:56:37Z

It's a great idea to use HashTable! My concern is: why do we need to maintain our own implementation of it? Can we define extensions on some existing stdlib or Foundation types to achieve the same goal?

nkcsgexi · 2018-09-14T16:59:12Z

I guess the custom part is "if the bucket contains nil, feel free to reuse it". I wonder if existing data structures can allow us to hack it.

rintaro · 2018-09-14T17:25:15Z

Generally, erasing an element in an open-addressing hash table needs extra work if any hash-colliding objects exist. But with these weakly referencing buckets, we can't do that work because we don't know when the references become nil. So the lookup logic is very different from a normal Set<Element> (see the subscript implementation).

I don't think we can reasonably re-use existing data structures.
CC: @lorentey I'd appreciate it if you could give us your opinion about this. Please let me know if you need more context.

lorentey · 2018-09-14T17:48:41Z

Yeah, the stdlib's hash table implementation does not currently support weak keys -- they need modified low-level lookup/insert/removal logic that the stdlib doesn't implement yet. It is not feasible to build a weak-keyed set on top of Set. (However, we should add a WeakSet and multiple WeakDictionary variants to the stdlib at some point!)

Dealloc'd entries need to work like tombstones -- lookups need to ignore them, while insertions can reuse their space, and removals/rehashes can compress them away. Implementation note: it's usually a bad idea to mutate storage on lookup operations -- people expect lookups to be thread-safe, and guaranteeing that could be difficult. (It would also mean that lookups could invalidate indices, which would be highly peculiar.)
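
A rough sketch of those rules under simplifying assumptions, not the PR's WeakLookupTable: `TinyWeakTable`, `WeakSlot` and `Node` are hypothetical names, the table has a fixed size, and the id itself is used as the hash. The lookup is non-mutating and skips dead slots; insertion reuses the first dead or empty slot on the probe path.

```swift
// Hypothetical element type; stands in for RawSyntax.
final class Node {
  let id: Int
  init(id: Int) { self.id = id }
}

struct WeakSlot { weak var value: Node? }

// Fixed-size, linear-probing table. Illustration only: no resizing.
struct TinyWeakTable {
  private var slots: [WeakSlot]
  private let mask: Int

  init(bucketCount: Int) {
    precondition(bucketCount > 0 && bucketCount & (bucketCount - 1) == 0,
                 "bucketCount must be a power of two")
    slots = Array(repeating: WeakSlot(), count: bucketCount)
    mask = bucketCount - 1
  }

  // Non-mutating lookup: dead slots are skipped, never cleaned up here.
  // A nil slot cannot end the search, because the wanted object may have
  // been inserted past a slot that was occupied at the time.
  func lookup(id: Int) -> Node? {
    var pos = id & mask
    for _ in 0..<slots.count {
      if let obj = slots[pos].value, obj.id == id { return obj }
      pos = (pos + 1) & mask
    }
    return nil
  }

  // Insertion takes the first slot on the probe path whose reference is nil.
  // Returns the slot index, or nil if every slot holds a live object.
  // Assumes `node.id` is not already in the table.
  mutating func insert(_ node: Node) -> Int? {
    var pos = node.id & mask
    for _ in 0..<slots.count {
      if slots[pos].value == nil {
        slots[pos].value = node
        return pos
      }
      pos = (pos + 1) & mask
    }
    return nil
  }
}

var table = TinyWeakTable(bucketCount: 4)
var a: Node? = Node(id: 0)
let b = Node(id: 4)                      // 4 & 3 == 0: collides with `a`
let slotA = table.insert(a!)             // slot 0
let slotB = table.insert(b)              // slot 1, probed past `a`
a = nil                                  // slot 0 now reads as nil
let foundB = table.lookup(id: 4) === b   // still found behind the dead slot
let c = Node(id: 8)                      // also maps to slot 0
let slotC = table.insert(c)              // slot 0 is reused
```

Keeping `lookup` free of writes means concurrent readers never race on the storage; the cost is that dead slots stay in place until an insertion or a rehash reclaims them.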

In swiftlang/swift#19213, I'm working on a new _HashTable construct to unify low-level hash table operations across Set and Dictionary. (I expect to land that next week, after some additional work.) That struct would simplify creating a WeakSet/WeakDictionary within the standard library, but since it's internal to the stdlib, unfortunately it won't be directly helpful for SwiftSyntax. :(

nkcsgexi · 2018-09-14T18:15:38Z

Thank you for the explanation, @rintaro and @lorentey! If we have to include a non-trivial custom HashTable implementation in SwiftSyntax, I would prefer that we hold off on landing this until we're sure the ever-increasing nodeLookupTable is an actual problem. That being said, even if it is an actual problem, I would prefer a simpler solution.

One potential solution is for the server side to periodically send the syntax tree in full, with a bit telling the clients to clear the nodeLookupTable altogether. That way we don't need to maintain the table forever on the SwiftSyntax side. What do you think, @ahoppen?

nkcsgexi · 2018-09-14T18:16:05Z

I didn't mean to close it. Pushed the wrong button. Sorry!

akyrtzi · 2018-09-14T18:46:52Z

The hashtable implementation doesn't seem to be a huge amount of code and once the stdlib provides a WeakSet we'll just replace it with the stdlib's version and keep the same mode of operation.
If we introduce a different mode of operation, like periodically starting from scratch, and adding the heuristics to figure out when to do that, then this mode of operation should be replaced with the WeakSet model once the stdlib provides it.

Long-term it seems better to me to go with the weak-ref hashtable since it is a mode of operation that will be future-proof.

lorentey · 2018-09-15T10:42:58Z

I forgot to mention that Foundation already provides NSHashTable, which could be exactly what you need!

rintaro · 2018-09-15T16:36:32Z

> Implementation note: it's usually a bad idea to mutate storage on lookup operations -- people expect lookups to be thread-safe, and guaranteeing that could be difficult.

Makes sense. I removed mutation part from subscript.

> NSHashTable, which could be exactly what you need!

Thank you! I wasn't aware of NSHashTable. It seems corelibs-foundation doesn't have an implementation of it, though. So perhaps what we should do is implement NSHashTable in corelibs-foundation...

We will discuss this next week.

rintaro · 2018-09-20T00:58:33Z

We discussed this and decided to merge it for now to address the immediate problem.
I will write unit tests for WeakLookupTable before merging.


ahoppen

Sorry, I only now found the time to review the PR in detail.

Overall looking very good. I've got a few minor comments. In particular I'd like to document whenever we are using &+ etc. just because of performance reasons because I usually expect to see an expected overflow whenever I see the unsafe operators. Also, do these actually make a performance difference?

ahoppen · 2018-09-20T11:27:48Z

Sources/SwiftSyntax/SwiftSyntax.swift

+  /// `nodeLookupTable`. Because `nodeLookupTable` only holds a weak reference 
+  /// to the RawSyntax nodes, all retired `RawSyntax` nodes will be deallocated
+  /// once we set a new tree. The weak references in `nodeLookupTable` will then
+  /// become `nil` but will also never be accessed again.


I think this comment needs to be updated for WeakLookupTable

ahoppen · 2018-09-20T11:38:32Z

Sources/SwiftSyntax/WeakLookupTable.swift

+  private static func _bucketCount(for capacity: Int,
+                                   from current: Int = 2) -> Int {
+    // Make sure it's representable.
+    precondition(capacity <= (Int.max >> 1) + 1)


Why can't we have e.g. capacity == Int.max - 1?

Sorry, the condition was wrong.
The condition here should be _minimalBucketCount(for: capacity) <= (Int.max >> 1) + 1.
As I noted in the added comments, the max bucket count here is 0b0100_0000_... because 0b1000_0000_... is out of bounds.
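
A sketch of that bound, written as free functions rather than the PR's static methods. The load factor of 0.75 is an assumption made for illustration (the thread only establishes that it is below 1), and `minimalBucketCount`, `maxBucketCount` and `bucketCount` are simplified stand-ins. The Double-based rounding is only meant for modest capacities.

```swift
// ASSUMPTION: 0.75 is an illustrative load factor, not a value confirmed
// by this thread.
let maxLoadFactor = 0.75

// Smallest bucket count that keeps `capacity` elements within the load factor.
func minimalBucketCount(for capacity: Int) -> Int {
  return Int((Double(capacity) / maxLoadFactor).rounded(.up))
}

// Largest power of two that Int can represent: 0b0100_0000_...
// (2^62 on 64-bit platforms). The next power of two does not fit in Int.
let maxBucketCount = (Int.max >> 1) + 1

// Doubles `current` until the table can hold `capacity` elements.
func bucketCount(for capacity: Int, from current: Int = 2) -> Int {
  let minimal = minimalBucketCount(for: capacity)
  // Checked once up front, so the doubling below can never overflow.
  precondition(minimal <= maxBucketCount, "capacity is not representable")
  var count = current
  while count < minimal {
    // '&*=' skips the overflow check: count < minimal <= maxBucketCount,
    // so count * 2 still fits in Int.
    count &*= 2
  }
  return count
}

let forOne = bucketCount(for: 1)            // minimal is 2
let forThirtyTwo = bucketCount(for: 32)     // minimal is 43, rounded up to 64
let forFortyEight = bucketCount(for: 48)    // minimal is exactly 64
```

Under the assumed load factor, reserving capacity for 32 elements yields 64 buckets, which matches the bucket count mentioned later in this review.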

ahoppen · 2018-09-20T11:39:43Z

Sources/SwiftSyntax/WeakLookupTable.swift

+    let minimalBucketCount = _minimalBucketCount(for: capacity)
+    var bucketCount = current
+    while bucketCount < minimalBucketCount {
+      bucketCount &*= 2


We don't expect any overflow here, right? Is the &*= just for performance reasons?

ahoppen · 2018-09-20T11:40:15Z

Sources/SwiftSyntax/WeakLookupTable.swift

+
+  private var _bucketMask: Int {
+    @inline(__always) get {
+      return bucketCount &- 1


Again is this &- just for performance reasons?
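
For context, a small sketch of how Swift's overflow operators behave compared with checked arithmetic, and why `bucketCount &- 1` works as a mask when the bucket count is a power of two. The values are illustrative only.

```swift
// Checked arithmetic detects overflow; the plain '+' operator would trap here.
let checked = Int.max.addingReportingOverflow(1)   // overflow is true

// The '&' variants skip the check and wrap around silently.
let wrapped = Int.max &+ 1                          // wraps to Int.min

// With a power-of-two bucket count, 'bucketCount &- 1' has all low bits set.
let bucketCount = 8
let bucketMask = bucketCount &- 1                   // 0b0111

// Masking the incremented position wraps the probe back to the first bucket.
let next = (7 &+ 1) & bucketMask
```

When the operands are known to be in range, the wrapping never actually happens and the only effect of the `&` operators is to drop the overflow check, which is why a comment stating that invariant is useful next to each use.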

ahoppen · 2018-09-20T11:42:46Z

Sources/SwiftSyntax/WeakLookupTable.swift

+
+  /// Finds the bucket where the object with the specified id should be stored
+  /// to.
+  private func _findHole(_ id: Element.Identifier) -> (pos: Int, found: Bool) {


I think alreadyExists would be clearer in the return type than found. found implies that we found a hole which we will always do.

ahoppen · 2018-09-20T11:45:50Z

Sources/SwiftSyntax/WeakLookupTable.swift

+  /// resizing happened.
+  private func _ensurePlusOneCapacity() -> Bool {
+    if bucketCount >= WeakLookupTable<Element>
+                        ._minimalBucketCount(for: estimatedCount &+ 1) {


ahoppen · 2018-09-20T11:46:01Z

Sources/SwiftSyntax/WeakLookupTable.swift

+
+    // Slow path.
+    estimatedCount = _countOccupiedBuckets()
+    return reserveCapacity(estimatedCount &+ 1)


ahoppen · 2018-09-20T11:46:12Z

Sources/SwiftSyntax/WeakLookupTable.swift

+      pos = _findHole(obj.id).pos
+    }
+    buckets[pos].value = obj
+    estimatedCount &+= 1


ahoppen · 2018-09-20T11:53:34Z

Sources/SwiftSyntax/WeakLookupTable.swift

+    // know), we can't stop iteration at a hole. So in the worst case (i.e. if
+    // the object doesn't exist in the table), full linear search is needed.
+    // However, since we assume the object exists and hasn't been freed yet,
+    // we expect it's stored near the 'idealBucket' anyway.


We could think about adding a tombstone property to WeakReference so that we can detect when we have found a hole.
I.e.
WeakReference starts off as {occupied: false, value: nil}. Once a value is set, it becomes {occupied: true, value: someValue}. When someValue is freed, it becomes {occupied: true, value: nil}. That way we can stop the search whenever we reach a WeakReference with occupied == false and thus avoid the full linear search.

However, since in our use case, as you mentioned, we always expect to find the element, I think it's worth making the trade-off: save an extra bit/byte in exchange for a linear worst-case complexity that we should never encounter in practice.

TL;DR: No need to change anything unless we are planning to merge this into corelibs-foundation.
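
The three states described in this comment could be sketched as follows. This is only an illustration of the suggestion, which was not adopted in this PR; `SlotState`, `set(_:)` and the `Node` class are hypothetical names.

```swift
// Hypothetical element type; stands in for RawSyntax.
final class Node {
  let id: Int
  init(id: Int) { self.id = id }
}

enum SlotState { case neverUsed, live, tombstone }

struct WeakReference {
  // Set once on first store and never cleared by deallocation.
  private(set) var occupied = false
  private(set) weak var value: Node?

  mutating func set(_ node: Node) {
    value = node
    occupied = true
  }

  var state: SlotState {
    if !occupied { return .neverUsed }         // a probe may stop here
    return value == nil ? .tombstone : .live   // a probe must continue past a tombstone
  }
}

var slot = WeakReference()
let initialState = slot.state     // {occupied: false, value: nil}
var node: Node? = Node(id: 1)
slot.set(node!)
let liveState = slot.state        // {occupied: true, value: someValue}
node = nil
let freedState = slot.state       // {occupied: true, value: nil}
```

The extra flag is what distinguishes "never used" from "used and since freed"; without it both read as nil, which is why the lookup in this PR cannot stop at a hole.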

Yeah, I don't intend to merge this into corelibs-foundation. :)
NSHashTable has very different API anyway.

ahoppen · 2018-09-20T11:54:02Z

Sources/SwiftSyntax/WeakLookupTable.swift

+      if let obj = buckets[bucket].value, obj.id == id {
+        return obj
+      }
+      bucket = (bucket &+ 1) & _bucketMask


rintaro · 2018-09-21T05:54:21Z

Thanks for the detailed feedback, @ahoppen!

> In particular I'd like to document whenever we are using &+ etc. just because of performance reasons because I usually expect to see an expected overflow whenever I see the unsafe operators.

All &+ family operations in this PR are only for performance. I added comments to justify the use of them.

> Also, do these actually make a performance difference?

Yes, it does affect performance. I'm not sure by how much, but using the '&' operations is low-hanging fruit for minimizing the overflow-check penalty.

ahoppen · 2018-09-21T06:03:33Z

Sources/SwiftSyntax/WeakLookupTable.swift

+  /// resizing happened.
+  private func _ensurePlusOneCapacity() -> Bool {
+    // '&+' for performance. 'estimatedCount' is always less than 'bucketCount'
+    // which is 0x4000_... or below.


This is not true. estimatedCount is always less than bucketCount / maxLoadFactor (it ends up giving the same guarantees for a maxLoadFactor > 0.5, but still).

Consider the following example:

    class Foo: Identifiable {
      let id: Int = 1
    }
    let lookupTable = WeakLookupTable<Foo>()
    lookupTable.reserveCapacity(32)
    while true {
      let newValue = Foo()
      lookupTable.insert(newValue)
      print(lookupTable.estimatedCount) /* need to make estimatedCount public for this test */
    }

With each insert, estimatedCount is increased until _minimalBucketCount(for: estimatedCount + 1), i.e. (estimatedCount + 1) / maxLoadFactor, is greater than bucketCount. At that point the slow path is taken, and the lookup table realises that all the previous elements have been released and recounts.

This might influence some of the other comments as well. I haven't looked at them in detail.

> estimatedCount is always less than bucketCount / maxLoadFactor (it ends up giving the same guarantees for a maxLoadFactor > 0.5, but still).

I believe this should be "estimatedCount is always less than or equal to bucketCount * maxLoadFactor". So "'estimatedCount' is always less than 'bucketCount'" is true, because maxLoadFactor is less than 1.

Either way, what we want to justify here is that estimatedCount + 1 doesn't overflow. i.e. estimatedCount < Int.max, right?

Ah, yes. I see. Sorry for the confusion. I somehow assumed that bucketCount is 32 in the example above, but it's 64. You're right, and it definitely doesn't overflow.

Use a Set<T>-like custom hash table (WeakLookupTable) that holds weak references to 'RawSyntax' nodes. This hash table doesn't hold ids separately; instead, it uses 'T.id' via the Identifiable protocol. Freeing a 'RawSyntax' node therefore turns its bucket in the table into just 'nil', so the bucket can be reused to reference another object.

rintaro · 2018-09-22T10:18:05Z

Squashed my commits.
@swift-ci Please test

swift-ci · 2018-09-22T10:18:27Z

Build failed
Swift Test OS X Platform
Git Sha - 9e00b74

rintaro · 2018-09-22T10:21:15Z

Linux: https://ci.swift.org/job/swift-PR-Linux/7666/

ahoppen · 2018-09-24T05:41:39Z

Could you also close rdar://43516167 if you haven't done so already?


rintaro force-pushed the lookuptable-custom branch 2 times, most recently from 7db8475 to 2155561 Compare September 14, 2018 13:44

rintaro force-pushed the lookuptable-custom branch 2 times, most recently from 5610c34 to f49811e Compare September 14, 2018 16:43

nkcsgexi closed this Sep 14, 2018

nkcsgexi reopened this Sep 14, 2018

rintaro force-pushed the lookuptable-custom branch from f49811e to 041e793 Compare September 15, 2018 16:19

rintaro force-pushed the lookuptable-custom branch from 041e793 to 0a92126 Compare September 15, 2018 16:51

nkcsgexi approved these changes Sep 20, 2018


ahoppen added 2 commits September 20, 2018 18:01

Make RawSyntax a class

af573bf

[swiftSyntax] Store weak references to RawSyntax nodes in the nodeLoo…

226a978

…kupTable This way we don't continue to retain RawSyntax nodes that are no longer needed for incremental transfer.

ahoppen reviewed Sep 20, 2018


rintaro force-pushed the lookuptable-custom branch 2 times, most recently from 2b0c155 to c89372d Compare September 21, 2018 05:25

ahoppen reviewed Sep 21, 2018


ahoppen approved these changes Sep 21, 2018


rintaro force-pushed the lookuptable-custom branch from c89372d to 336d1d5 Compare September 22, 2018 10:17

rintaro mentioned this pull request Sep 22, 2018

[DO NOT MERGE] dummy PR for swift-syntax PR testing (master) swiftlang/swift#19240

Closed

rintaro changed the title ~~[WIP][Experiment] Use custom hash table as node lookup table~~ Use custom hash table as node lookup table Sep 22, 2018

rintaro merged commit a997013 into swiftlang:master Sep 22, 2018

rintaro deleted the lookuptable-custom branch September 22, 2018 12:45

rintaro mentioned this pull request Sep 22, 2018

Remove unused nodes from SyntaxTreeDeserializer.nodeLookupTable #3

Closed

adevress pushed a commit to adevress/swift-syntax that referenced this pull request Jan 14, 2024

Merge pull request swiftlang#11 from kitasuke/SR-11115/type-annotation

2ce1c25

[SR-11115] Missing type annotation on multiple declarations

gfusee mentioned this pull request Sep 17, 2024

The package doesn't compile using the Swift 6.0 open source toolchain swiftlang/swift#76534

Open
