-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Minimize rereading of batches in FSTLocalDocumentsView documentsMatchingCollectionQuery #1533
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
A couple of notes:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically LGTM
// Query the remote documents and overlay mutations. | ||
// TODO(mikelehen): There may be significant overlap between the mutations affecting these | ||
// remote documents and the allMutationBatchesAffectingQuery mutations. Consider optimizing. | ||
// Get the remote documents in the state in which the backend last acknowledged them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment and the one below are very close to merely restating what the code is doing. You might consider just dropping them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
for (FSTMutation *mutation in batch.mutations) { | ||
// TODO(mikelehen): PERF: Check if this mutation actually affects the query to reduce work. | ||
// Only process documents belonging to the collection. | ||
if (mutation.key.path().PopLast() != query.path) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PopLast() requires a full copy of the underlying array (less 1 segment). A better way to do this would be to add a method on Path that allows us to ask query.path if mutation.key.Path() is an immediate child.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. The numbers I'm seeing in my local performance test are similar, but logically the dedicated method makes more sense.
Feel free to suggest changes on the naming. To quickly recap my reasoning: I think IsChild
is slightly confusing because, to me at least, it's not clear whether it's the left or right side that is being checked, so that leaves IsChildOf
vs IsParentOf
(I can't think of a clear non-convoluted name to represent the opposite). Since you mentioned asking query.path
, I settled on IsParentOf
.
@wilhuff Gil, any idea on what's causing the CMake build to fail? From Travis log:
|
CMake is broken on master in CI: #1544 fixes it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
You can merge master to get an updated travis configuration.
@wilhuff Thanks a lot for suggesting this change! |
* add a method to find batches affecting a set of keys (port of [1479](firebase/firebase-ios-sdk#1479)); * use the newly-added method to avoid rereading batches when getting documents in `LocalDocumentsView` (port of [1505](firebase/firebase-ios-sdk#1505)); * avoid rereading batches when searching for documents in a collection (port of [1533](firebase/firebase-ios-sdk#1533)). Speedup was measured by running tests in browser and checking time spent writing 10 batches of 500 mutations each, and then querying the resulting 5K docs collection from cache in offline mode. For this case, the writing speedup is about 3x, and querying speedup is about 6x (see PR for more details).
* Implement global resume token (#1052) * Add a spec test that shows correct global resume token handling * Minimum implementation to handle global resume tokens * Remove unused QueryView.resumeToken * Avoid persisting the resume token unless required * Persist the resume token on unlisten * Add a type parameter to Persistence (#1047) * Cherry pick sequence number starting point * Working on typed transactions * Start plumbing in sequence number * Back out sequence number changes * [AUTOMATED]: Prettier Code Styling * Fix tests * [AUTOMATED]: Prettier Code Styling * Fix lint * [AUTOMATED]: Prettier Code Styling * Uncomment line * MemoryPersistenceTransaction -> MemoryTransaction * [AUTOMATED]: Prettier Code Styling * Review updates * Style * Lint and style * Review feedback * [AUTOMATED]: Prettier Code Styling * Revert some unintentional import churn * Line 44 should definitely be empty * Checkpoint before adding helper function for stores * Use a helper for casting PersistenceTransaction to IndexedDbTransaction * [AUTOMATED]: Prettier Code Styling * Remove errant generic type * Lint * Fix typo * Port optimizations to LocalDocumentsView from iOS (#1055) * add a method to find batches affecting a set of keys (port of [1479](firebase/firebase-ios-sdk#1479)); * use the newly-added method to avoid rereading batches when getting documents in `LocalDocumentsView` (port of [1505](firebase/firebase-ios-sdk#1505)); * avoid rereading batches when searching for documents in a collection (port of [1533](firebase/firebase-ios-sdk#1533)). Speedup was measured by running tests in browser and checking time spent writing 10 batches of 500 mutations each, and then querying the resulting 5K docs collection from cache in offline mode. For this case, the writing speedup is about 3x, and querying speedup is about 6x (see PR for more details). * Add a CHANGELOG entry for #1052 (#1071) * Add a CHANGELOG entry for #1052 * Add notes for #1055 * Rename idleTimer and fix comments. (#1068) * Merge (#1073)
* Merging PersistentStream refactor * [AUTOMATED]: Prettier Code Styling * Typo * Remove canUseNetwork state. (#1076) * Merging the latest merge into the previous merge (#1077) * Implement global resume token (#1052) * Add a spec test that shows correct global resume token handling * Minimum implementation to handle global resume tokens * Remove unused QueryView.resumeToken * Avoid persisting the resume token unless required * Persist the resume token on unlisten * Add a type parameter to Persistence (#1047) * Cherry pick sequence number starting point * Working on typed transactions * Start plumbing in sequence number * Back out sequence number changes * [AUTOMATED]: Prettier Code Styling * Fix tests * [AUTOMATED]: Prettier Code Styling * Fix lint * [AUTOMATED]: Prettier Code Styling * Uncomment line * MemoryPersistenceTransaction -> MemoryTransaction * [AUTOMATED]: Prettier Code Styling * Review updates * Style * Lint and style * Review feedback * [AUTOMATED]: Prettier Code Styling * Revert some unintentional import churn * Line 44 should definitely be empty * Checkpoint before adding helper function for stores * Use a helper for casting PersistenceTransaction to IndexedDbTransaction * [AUTOMATED]: Prettier Code Styling * Remove errant generic type * Lint * Fix typo * Port optimizations to LocalDocumentsView from iOS (#1055) * add a method to find batches affecting a set of keys (port of [1479](firebase/firebase-ios-sdk#1479)); * use the newly-added method to avoid rereading batches when getting documents in `LocalDocumentsView` (port of [1505](firebase/firebase-ios-sdk#1505)); * avoid rereading batches when searching for documents in a collection (port of [1533](firebase/firebase-ios-sdk#1533)). Speedup was measured by running tests in browser and checking time spent writing 10 batches of 500 mutations each, and then querying the resulting 5K docs collection from cache in offline mode. For this case, the writing speedup is about 3x, and querying speedup is about 6x (see PR for more details). * Add a CHANGELOG entry for #1052 (#1071) * Add a CHANGELOG entry for #1052 * Add notes for #1055 * Rename idleTimer and fix comments. (#1068) * Merge (#1073)
* Catch invalid provider id error (#1064) * RxFire: Api Change and documentation (#1066) * api changes and doc updates * fixes * Refactor PersistentStream (no behavior changes). (#1041) This breaks out a number of changes I made as prep for b/80402781 (Continue retrying streams for 1 minute (idle delay)). PersistentStream changes: * Rather than providing a stream event listener to every call of start(), the stream listener is now provided once to the constructor and cannot be changed. * Streams can now be restarted indefinitely, even after a call to stop(). * PersistentStreamState.Stopped was removed and we just return to 'Initial' after a stop() call. * Added `closeCount` member to PersistentStream in order to avoid bleedthrough issues with auth and stream events once stop() has been called. * Calling stop() now triggers the onClose() event listener, which simplifies stream cleanup. * PersistentStreamState.Auth renamed to 'Starting' to better reflect that it encompasses both authentication and opening the stream. RemoteStore changes: * Creates streams once and just stop() / start()s them as necessary, never recreating them completely. * Added networkEnabled flag to track whether the network is enabled or not, since we no longer null out the streams. * Refactored disableNetwork() / enableNetwork() to remove stream re-creation. Misc: * Comment improvements including a state diagram on PersistentStream. * Fixed spec test shutdown to schedule via the AsyncQueue to fix sequencing order I ran into. * Merging Persistent Stream refactor (#1069) * Merging PersistentStream refactor * [AUTOMATED]: Prettier Code Styling * Typo * Remove canUseNetwork state. (#1076) * Merging the latest merge into the previous merge (#1077) * Implement global resume token (#1052) * Add a spec test that shows correct global resume token handling * Minimum implementation to handle global resume tokens * Remove unused QueryView.resumeToken * Avoid persisting the resume token unless required * Persist the resume token on unlisten * Add a type parameter to Persistence (#1047) * Cherry pick sequence number starting point * Working on typed transactions * Start plumbing in sequence number * Back out sequence number changes * [AUTOMATED]: Prettier Code Styling * Fix tests * [AUTOMATED]: Prettier Code Styling * Fix lint * [AUTOMATED]: Prettier Code Styling * Uncomment line * MemoryPersistenceTransaction -> MemoryTransaction * [AUTOMATED]: Prettier Code Styling * Review updates * Style * Lint and style * Review feedback * [AUTOMATED]: Prettier Code Styling * Revert some unintentional import churn * Line 44 should definitely be empty * Checkpoint before adding helper function for stores * Use a helper for casting PersistenceTransaction to IndexedDbTransaction * [AUTOMATED]: Prettier Code Styling * Remove errant generic type * Lint * Fix typo * Port optimizations to LocalDocumentsView from iOS (#1055) * add a method to find batches affecting a set of keys (port of [1479](firebase/firebase-ios-sdk#1479)); * use the newly-added method to avoid rereading batches when getting documents in `LocalDocumentsView` (port of [1505](firebase/firebase-ios-sdk#1505)); * avoid rereading batches when searching for documents in a collection (port of [1533](firebase/firebase-ios-sdk#1533)). Speedup was measured by running tests in browser and checking time spent writing 10 batches of 500 mutations each, and then querying the resulting 5K docs collection from cache in offline mode. For this case, the writing speedup is about 3x, and querying speedup is about 6x (see PR for more details). * Add a CHANGELOG entry for #1052 (#1071) * Add a CHANGELOG entry for #1052 * Add notes for #1055 * Rename idleTimer and fix comments. (#1068) * Merge (#1073)
Previously in
documentsMatchingCollectionQuery
, write batches were read three times:localDocuments
(which calledallMutationBatchesAffectingDocumentKey
);allMutationBatchesAffectingQuery
;documentsForKeys
(which callsallMutationBatchesAffectingDocumentKey
).In fact, all relevant batches will always be contained inside
allMutationBatchesAffectingQuery
(which is also more efficient); the other two calls were redundant.The new algorithm is:
This also allows doing away with
localDocument
andlocalDocuments
.For a case when there are many batches in the mutation queue, the performance improvements appear to be significant. To test, I tried creating a large collection by writing N batches of 500 mutations each (all mutations are creations of very small documents) and manually measuring the time it takes to get all documents from the collection. Timings are greatly affected by the number of batches:
(this is in release mode, using iPhone 8 Plus simulator from XCode 9.4)
It appears that the old version exhibited quadratic growth.