-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Add allMutationsAffectingDocumentKeys to FSTMutationQueue #1479
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
5e934b2
to
76bf96a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this!
@@ -396,6 +396,39 @@ - (nullable FSTMutationBatch *)nextMutationBatchAfterBatchID:(FSTBatchID)batchID | |||
return result; | |||
} | |||
|
|||
- (NSArray<FSTMutationBatch *> *)allMutationBatchesAffectingDocumentKeys: | |||
(const DocumentKeySet &)documentKeys { | |||
NSString *userID = self.userID; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: is storing this value in a variable intended as an optimization or readability improvement? In the latter case, I think moving it inline would be slightly better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a minor optimization. This is done fairly consistently throughout so I prefer to leave it as is.
As we rewrite to C++ this will become an instance variable reference.
* | ||
* Note that because of this requirement implementations are free to return mutation batches that | ||
* don't contain any of the given document keys at all if it's convenient. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the implementations return batches in sorted order, perhaps it's worth it to call it out in this method's description?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that's a contractual thing we care about, and isn't something that can necessarily be guaranteed on the other platforms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wilhuff Sorry to reopen old thread, but it has relevant context. @mikelehen Michael pointed out that it would matter for batches to be in order when applying them. On Android, it's more natural to return batches unsorted, although of course it's possible to sort them as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah right. Of course.
I think it's reasonable to add a comment that indicates these come out in sorted order. I don't think it's worth changing to a data structure that necessarily enforces that order though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh! I guess the implementation does actually do the right thing (I thought it didn't guarantee order) since std::set is apparently a sorted set. So yeah, I guess we just need to update the comment.
@@ -115,10 +116,21 @@ NS_ASSUME_NONNULL_BEGIN | |||
* don't contain the document key at all if it's convenient. | |||
*/ | |||
// TODO(mcg): This should really return an NSEnumerator | |||
// also for b/32992024, all backing stores should really index by document key | |||
- (NSArray<FSTMutationBatch *> *)allMutationBatchesAffectingDocumentKey: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: perhaps this function can now be removed or reimplemented in terms of the new one? (I can do that if you think it's a good idea)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the other can be removed since it's an anti-pattern. I purely went with the addition of this new method here to give you the chance to figure out how to integrate it into the reworked higher-level algorithm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I'll do that in a follow-up.
for (const DocumentKey &documentKey : documentKeys) { | ||
std::string indexPrefix = | ||
[FSTLevelDBDocumentMutationKey keyPrefixWithUserID:userID resourcePath:documentKey.path()]; | ||
indexIterator->Seek(indexPrefix); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very optional nit: I think you can move this expression to the initialization of for
loop to avoid having a missing clause.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
// the index rows for documentKey contiguously. In particular, all the rows for documentKey | ||
// will occur before any rows for documents nested in a subcollection beneath documentKey so | ||
// we can stop as soon as we hit any such row. | ||
if (!absl::StartsWith(indexIterator->key(), indexPrefix) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There seems to be four variations of this check throughout the file. One could go away if the single-key version of this function were to be removed or rewritten to delegate to this function. Nevertheless, perhaps this check deserves its own named function? I can do this in a subsequent PR if you think it's worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The trouble is that these aren't all the same. In three out of four cases we examine other parts of the row key after decoding it. In one case we consult the size of the key rather than an exact match. I'm also not keen on hiding the decoding the key too much.
The best way to solve this would probably be to add some mechanism for constructing the prefix including the marker that would start the next segment. That way we could do something like:
if (![rowKey decodeKey:indexIterator->key() withExactPrefix:suffixedIndexPrefix]) {
break;
}
This would effectively combine the cheap indexPrefix check we have now with the exact DocumentKey check.
However, I'm struggling with what to call that thing, since we need both kinds: we need exact matches as we do in this case, but also any path matching the prefix as we need for allMutationBatchesAffectingQuery
.
Given everything we have to do I'm inclined to just leave it as is. I'd rather not obsess about this too much--we need to make progress on the larger goals we have and this, while verbose, avoids being magical.
[FSTLevelDBDocumentMutationKey keyPrefixWithUserID:userID resourcePath:documentKey.path()]; | ||
indexIterator->Seek(indexPrefix); | ||
for (; indexIterator->Valid(); indexIterator->Next()) { | ||
// Only consider rows matching exactly the specific key of interest. Note that because we |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional: it's probably too much trouble, but I think a hypothetical example of how rows are laid out would make this comment easier to grasp and perhaps allow making it shorter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. This helped tighten the text too.
|
||
self.persistence.run("testAllMutationBatchesAffectingDocumentKey", [&]() { | ||
NSArray<FSTMutation *> *mutations = @[ | ||
FSTTestSetMutation(@"fob/bar", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ultranit: could you replace the b
in fob
with a letter that has less resemblance to an o
, say, x
or k
or y
? It took me three passes to finally spot the difference between this element and the next one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry about that. The idea is that this key precedes the others so I've chosen i
.
* add a method to find batches affecting a set of keys (port of [1479](firebase/firebase-ios-sdk#1479)); * use the newly-added method to avoid rereading batches when getting documents in `LocalDocumentsView` (port of [1505](firebase/firebase-ios-sdk#1505)); * avoid rereading batches when searching for documents in a collection (port of [1533](firebase/firebase-ios-sdk#1533)). Speedup was measured by running tests in browser and checking time spent writing 10 batches of 500 mutations each, and then querying the resulting 5K docs collection from cache in offline mode. For this case, the writing speedup is about 3x, and querying speedup is about 6x (see PR for more details).
* Implement global resume token (#1052) * Add a spec test that shows correct global resume token handling * Minimum implementation to handle global resume tokens * Remove unused QueryView.resumeToken * Avoid persisting the resume token unless required * Persist the resume token on unlisten * Add a type parameter to Persistence (#1047) * Cherry pick sequence number starting point * Working on typed transactions * Start plumbing in sequence number * Back out sequence number changes * [AUTOMATED]: Prettier Code Styling * Fix tests * [AUTOMATED]: Prettier Code Styling * Fix lint * [AUTOMATED]: Prettier Code Styling * Uncomment line * MemoryPersistenceTransaction -> MemoryTransaction * [AUTOMATED]: Prettier Code Styling * Review updates * Style * Lint and style * Review feedback * [AUTOMATED]: Prettier Code Styling * Revert some unintentional import churn * Line 44 should definitely be empty * Checkpoint before adding helper function for stores * Use a helper for casting PersistenceTransaction to IndexedDbTransaction * [AUTOMATED]: Prettier Code Styling * Remove errant generic type * Lint * Fix typo * Port optimizations to LocalDocumentsView from iOS (#1055) * add a method to find batches affecting a set of keys (port of [1479](firebase/firebase-ios-sdk#1479)); * use the newly-added method to avoid rereading batches when getting documents in `LocalDocumentsView` (port of [1505](firebase/firebase-ios-sdk#1505)); * avoid rereading batches when searching for documents in a collection (port of [1533](firebase/firebase-ios-sdk#1533)). Speedup was measured by running tests in browser and checking time spent writing 10 batches of 500 mutations each, and then querying the resulting 5K docs collection from cache in offline mode. For this case, the writing speedup is about 3x, and querying speedup is about 6x (see PR for more details). * Add a CHANGELOG entry for #1052 (#1071) * Add a CHANGELOG entry for #1052 * Add notes for #1055 * Rename idleTimer and fix comments. (#1068) * Merge (#1073)
* Merging PersistentStream refactor * [AUTOMATED]: Prettier Code Styling * Typo * Remove canUseNetwork state. (#1076) * Merging the latest merge into the previous merge (#1077) * Implement global resume token (#1052) * Add a spec test that shows correct global resume token handling * Minimum implementation to handle global resume tokens * Remove unused QueryView.resumeToken * Avoid persisting the resume token unless required * Persist the resume token on unlisten * Add a type parameter to Persistence (#1047) * Cherry pick sequence number starting point * Working on typed transactions * Start plumbing in sequence number * Back out sequence number changes * [AUTOMATED]: Prettier Code Styling * Fix tests * [AUTOMATED]: Prettier Code Styling * Fix lint * [AUTOMATED]: Prettier Code Styling * Uncomment line * MemoryPersistenceTransaction -> MemoryTransaction * [AUTOMATED]: Prettier Code Styling * Review updates * Style * Lint and style * Review feedback * [AUTOMATED]: Prettier Code Styling * Revert some unintentional import churn * Line 44 should definitely be empty * Checkpoint before adding helper function for stores * Use a helper for casting PersistenceTransaction to IndexedDbTransaction * [AUTOMATED]: Prettier Code Styling * Remove errant generic type * Lint * Fix typo * Port optimizations to LocalDocumentsView from iOS (#1055) * add a method to find batches affecting a set of keys (port of [1479](firebase/firebase-ios-sdk#1479)); * use the newly-added method to avoid rereading batches when getting documents in `LocalDocumentsView` (port of [1505](firebase/firebase-ios-sdk#1505)); * avoid rereading batches when searching for documents in a collection (port of [1533](firebase/firebase-ios-sdk#1533)). Speedup was measured by running tests in browser and checking time spent writing 10 batches of 500 mutations each, and then querying the resulting 5K docs collection from cache in offline mode. For this case, the writing speedup is about 3x, and querying speedup is about 6x (see PR for more details). * Add a CHANGELOG entry for #1052 (#1071) * Add a CHANGELOG entry for #1052 * Add notes for #1055 * Rename idleTimer and fix comments. (#1068) * Merge (#1073)
* Catch invalid provider id error (#1064) * RxFire: Api Change and documentation (#1066) * api changes and doc updates * fixes * Refactor PersistentStream (no behavior changes). (#1041) This breaks out a number of changes I made as prep for b/80402781 (Continue retrying streams for 1 minute (idle delay)). PersistentStream changes: * Rather than providing a stream event listener to every call of start(), the stream listener is now provided once to the constructor and cannot be changed. * Streams can now be restarted indefinitely, even after a call to stop(). * PersistentStreamState.Stopped was removed and we just return to 'Initial' after a stop() call. * Added `closeCount` member to PersistentStream in order to avoid bleedthrough issues with auth and stream events once stop() has been called. * Calling stop() now triggers the onClose() event listener, which simplifies stream cleanup. * PersistentStreamState.Auth renamed to 'Starting' to better reflect that it encompasses both authentication and opening the stream. RemoteStore changes: * Creates streams once and just stop() / start()s them as necessary, never recreating them completely. * Added networkEnabled flag to track whether the network is enabled or not, since we no longer null out the streams. * Refactored disableNetwork() / enableNetwork() to remove stream re-creation. Misc: * Comment improvements including a state diagram on PersistentStream. * Fixed spec test shutdown to schedule via the AsyncQueue to fix sequencing order I ran into. * Merging Persistent Stream refactor (#1069) * Merging PersistentStream refactor * [AUTOMATED]: Prettier Code Styling * Typo * Remove canUseNetwork state. (#1076) * Merging the latest merge into the previous merge (#1077) * Implement global resume token (#1052) * Add a spec test that shows correct global resume token handling * Minimum implementation to handle global resume tokens * Remove unused QueryView.resumeToken * Avoid persisting the resume token unless required * Persist the resume token on unlisten * Add a type parameter to Persistence (#1047) * Cherry pick sequence number starting point * Working on typed transactions * Start plumbing in sequence number * Back out sequence number changes * [AUTOMATED]: Prettier Code Styling * Fix tests * [AUTOMATED]: Prettier Code Styling * Fix lint * [AUTOMATED]: Prettier Code Styling * Uncomment line * MemoryPersistenceTransaction -> MemoryTransaction * [AUTOMATED]: Prettier Code Styling * Review updates * Style * Lint and style * Review feedback * [AUTOMATED]: Prettier Code Styling * Revert some unintentional import churn * Line 44 should definitely be empty * Checkpoint before adding helper function for stores * Use a helper for casting PersistenceTransaction to IndexedDbTransaction * [AUTOMATED]: Prettier Code Styling * Remove errant generic type * Lint * Fix typo * Port optimizations to LocalDocumentsView from iOS (#1055) * add a method to find batches affecting a set of keys (port of [1479](firebase/firebase-ios-sdk#1479)); * use the newly-added method to avoid rereading batches when getting documents in `LocalDocumentsView` (port of [1505](firebase/firebase-ios-sdk#1505)); * avoid rereading batches when searching for documents in a collection (port of [1533](firebase/firebase-ios-sdk#1533)). Speedup was measured by running tests in browser and checking time spent writing 10 batches of 500 mutations each, and then querying the resulting 5K docs collection from cache in offline mode. For this case, the writing speedup is about 3x, and querying speedup is about 6x (see PR for more details). * Add a CHANGELOG entry for #1052 (#1071) * Add a CHANGELOG entry for #1052 * Add notes for #1055 * Rename idleTimer and fix comments. (#1068) * Merge (#1073)
This is a building block for eliminating quadratic behavior found recently when recomputing the view for large batches.