Minimize rereading of batches in FSTLocalDocumentsView documentsMatchingCollectionQuery #1533

var-const · 2018-07-13T23:20:42Z

Previously in documentsMatchingCollectionQuery, write batches were read three times:

inside a call to localDocuments (which calls allMutationBatchesAffectingDocumentKey);
by calling allMutationBatchesAffectingQuery;
inside a call to documentsForKeys (which also calls allMutationBatchesAffectingDocumentKey).

In fact, all relevant batches are always contained in the result of allMutationBatchesAffectingQuery (which is also more efficient); the other two calls were redundant.

The new algorithm is:

get remote documents from remote cache;
get write batches affecting the query;
build a set of resulting documents by playing all mutations from those batches that affect the query, modifying remote documents or creating new documents (for those documents that haven't yet been written to the backend);
filter out from the set of resulting documents those that don't satisfy the query.
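
The four steps above can be sketched in plain C++. This is only an illustration under simplifying assumptions, not the SDK's implementation: `Document`, `Mutation`, `Batch`, `IsInCollection` and `DocumentsMatchingCollectionQuery` are hypothetical stand-ins, a mutation is reduced to "set a value" or "delete", and the remote documents and batches are passed in rather than read from the cache and the mutation queue.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical, heavily simplified types (not the SDK's classes).
struct Document {
  std::string key;  // e.g. "rooms/a"
  int value;
};
struct Mutation {
  std::string key;
  bool is_delete;  // true: remove the document; false: set `new_value`
  int new_value;
};
struct Batch {
  std::vector<Mutation> mutations;
};
using DocumentMap = std::map<std::string, Document>;

// True if `key` names a document directly inside `collection`.
bool IsInCollection(const std::string& key, const std::string& collection) {
  const std::string::size_type slash = key.rfind('/');
  return slash != std::string::npos && slash == collection.size() &&
         key.compare(0, slash, collection) == 0;
}

DocumentMap DocumentsMatchingCollectionQuery(
    const DocumentMap& remote_documents,  // step 1: read once from the remote cache
    const std::vector<Batch>& batches,    // step 2: batches affecting the query
    const std::string& collection,
    const std::function<bool(const Document&)>& matches) {
  DocumentMap results = remote_documents;

  // Step 3: replay every mutation that targets the collection, in order,
  // modifying remote documents or creating documents not yet on the backend.
  for (const Batch& batch : batches) {
    for (const Mutation& mutation : batch.mutations) {
      if (!IsInCollection(mutation.key, collection)) continue;
      if (mutation.is_delete) {
        results.erase(mutation.key);
      } else {
        results[mutation.key] = Document{mutation.key, mutation.new_value};
      }
    }
  }

  // Step 4: drop documents that no longer satisfy the query.
  for (DocumentMap::iterator it = results.begin(); it != results.end();) {
    if (matches(it->second)) {
      ++it;
    } else {
      it = results.erase(it);
    }
  }
  return results;
}
```

Each batch is visited exactly once here, which is the point of the change: the work grows with the total number of mutations rather than with repeated per-document batch lookups.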

This also allows doing away with localDocument and localDocuments.

When there are many batches in the mutation queue, the performance improvement appears to be significant. To test, I created a large collection by writing N batches of 500 mutations each (every mutation creates a very small document) and manually measured the time it takes to get all documents from the collection. Timings are greatly affected by the number of batches:

1 batch of 500 mutations -- old ~90ms, new ~70ms;
10 batches of 500 mutations -- old ~2200ms, new ~800ms;
50 batches of 500 mutations -- old ~90000ms, new ~4300ms.

(Measured in release mode on the iPhone 8 Plus simulator from Xcode 9.4.)

It appears that the old version exhibited roughly quadratic (or worse) growth in the number of batches, while the new version grows approximately linearly.
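
One rough way to read the growth rate off the timings above is to assume time scales as c * n^k between two measurements and solve for k. The helper below is a hypothetical illustration (the name `ScalingExponent` is not from the PR), and with only three manually measured data points the result is a ballpark figure at best.

```cpp
#include <cmath>

// Empirical scaling exponent k, assuming t = c * n^k holds between the
// measurements (n1, t1) and (n2, t2). Solving t2 / t1 = (n2 / n1)^k gives
// k = log(t2 / t1) / log(n2 / n1).
double ScalingExponent(double n1, double t1, double n2, double t2) {
  return std::log(t2 / t1) / std::log(n2 / n1);
}
```

Applied to the numbers above, going from 10 to 50 batches gives k of about 2.3 for the old code (2200ms to 90000ms) and about 1.05 for the new code (800ms to 4300ms). Going from 1 to 10 batches gives about 1.4 and 1.06 respectively, presumably because fixed overhead dominates at small sizes.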

var-const · 2018-07-13T23:23:07Z

A couple of notes:

I'd like to add a performance test in a follow-up, using the Google Benchmark library (see "Add Google Benchmark library and benchmark leveldb", #1137);
I don't think it's possible to remove documentForKey/documentsForKeys, because they are also called from FSTLocalStore.

wilhuff

Basically LGTM

wilhuff · 2018-07-14T19:15:02Z

Firestore/Source/Local/FSTLocalDocumentsView.mm

-  // Query the remote documents and overlay mutations.
-  // TODO(mikelehen): There may be significant overlap between the mutations affecting these
-  // remote documents and the allMutationBatchesAffectingQuery mutations. Consider optimizing.
+  // Get the remote documents in the state in which the backend last acknowledged them.


This comment and the one below are very close to merely restating what the code is doing. You might consider just dropping them.

wilhuff · 2018-07-14T19:17:22Z

Firestore/Source/Local/FSTLocalDocumentsView.mm

    for (FSTMutation *mutation in batch.mutations) {
-      // TODO(mikelehen): PERF: Check if this mutation actually affects the query to reduce work.
+      // Only process documents belonging to the collection.
+      if (mutation.key.path().PopLast() != query.path) {


PopLast() requires a full copy of the underlying array (less 1 segment). A better way to do this would be to add a method on Path that allows us to ask query.path if mutation.key.path() is an immediate child.

var-const · 2018-07-16

Done. The numbers I'm seeing in my local performance test are similar, but logically the dedicated method makes more sense.

Feel free to suggest changes on the naming. To quickly recap my reasoning: I think IsChild is slightly confusing because, to me at least, it's not clear whether it's the left or right side that is being checked, so that leaves IsChildOf vs IsParentOf (I can't think of a clear non-convoluted name to represent the opposite). Since you mentioned asking query.path, I settled on IsParentOf.
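
To make the trade-off in this thread concrete, here is a minimal sketch of the two approaches. The `Path` class below is a hypothetical stand-in built on `std::vector<std::string>`, not the SDK's path class; `IsParentOf` follows the name settled on in the comment above, and its exact signature is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a segment-based path (not the SDK's class).
class Path {
 public:
  explicit Path(std::vector<std::string> segments)
      : segments_(std::move(segments)) {}

  // Returns a new path without the last segment. This copies every
  // remaining segment, which is the cost pointed out in the review.
  Path PopLast() const {
    assert(!segments_.empty());
    return Path(
        std::vector<std::string>(segments_.begin(), segments_.end() - 1));
  }

  // True if `child` has exactly one more segment than this path and this
  // path is a prefix of it. Compares in place, with no allocation.
  bool IsParentOf(const Path& child) const {
    return child.segments_.size() == segments_.size() + 1 &&
           std::equal(segments_.begin(), segments_.end(),
                      child.segments_.begin());
  }

  bool operator==(const Path& other) const {
    return segments_ == other.segments_;
  }
  bool operator!=(const Path& other) const { return !(*this == other); }

 private:
  std::vector<std::string> segments_;
};
```

With this sketch, `key_path.PopLast() != query_path` and `!query_path.IsParentOf(key_path)` give the same answer for any non-empty `key_path`, but only the first one builds a temporary path on every mutation.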

var-const · 2018-07-16T21:17:17Z

@wilhuff Gil, any idea on what's causing the CMake build to fail? From Travis log:

/Users/travis/.rvm/rubies/ruby-2.3.1/lib/ruby/site_ruby/2.3.0/rubygems/core_ext/kernel_require.rb:55:in `require': cannot load such file -- cocoapods (LoadError)
	from /Users/travis/.rvm/rubies/ruby-2.3.1/lib/ruby/site_ruby/2.3.0/rubygems/core_ext/kernel_require.rb:55:in `require'
	from /Users/travis/build/firebase/firebase-ios-sdk/Firestore/../cmake/podspec_cmake.rb:17:in `<main>'
CMake Error at /Users/travis/build/firebase/firebase-ios-sdk/cmake/podspec_rules.cmake:49 (include):
  include could not find load file:
    /Users/travis/build/firebase/firebase-ios-sdk/build/Firestore/GoogleUtilities.cmake
Call Stack (most recent call first):
  CMakeLists.txt:129 (podspec_framework)
/Users/travis/.rvm/rubies/ruby-2.3.1/lib/ruby/site_ruby/2.3.0/rubygems/core_ext/kernel_require.rb:55:in `require': cannot load such file -- cocoapods (LoadError)
	from /Users/travis/.rvm/rubies/ruby-2.3.1/lib/ruby/site_ruby/2.3.0/rubygems/core_ext/kernel_require.rb:55:in `require'
	from /Users/travis/build/firebase/firebase-ios-sdk/Firestore/../cmake/podspec_cmake.rb:17:in `<main>'
CMake Error at /Users/travis/build/firebase/firebase-ios-sdk/cmake/podspec_rules.cmake:49 (include):
  include could not find load file:
    /Users/travis/build/firebase/firebase-ios-sdk/build/Firestore/FirebaseCore.cmake
Call Stack (most recent call first):
  CMakeLists.txt:134 (podspec_framework)

wilhuff · 2018-07-16T21:36:00Z

CMake is broken on master in CI: #1544 fixes it.

wilhuff

LGTM

You can merge master to get an updated travis configuration.

var-const · 2018-07-16T22:18:46Z

@wilhuff Thanks a lot for suggesting this change!


From firebase/firebase-js-sdk#1055 (Port optimizations to LocalDocumentsView from iOS):

* add a method to find batches affecting a set of keys (port of [1479](firebase/firebase-ios-sdk#1479));
* use the newly-added method to avoid rereading batches when getting documents in `LocalDocumentsView` (port of [1505](firebase/firebase-ios-sdk#1505));
* avoid rereading batches when searching for documents in a collection (port of [1533](firebase/firebase-ios-sdk#1533)).

Speedup was measured by running tests in a browser and checking the time spent writing 10 batches of 500 mutations each, and then querying the resulting 5K-document collection from cache in offline mode. For this case, the writing speedup is about 3x and the querying speedup is about 6x (see PR for more details).


wilhuff and others added 25 commits June 28, 2018 17:55

c6c2a9c Pod updates for Cocapods 1.5.3
76bf96a Add allMutationsAffectingDocumentKeys
36d8596 Initial
4d18e47 Initial
158877c Add comment
866d443 Initial
b6fdbcd Merge branch 'varconst/write-batch-opt-2' into varconst/coll-opt
9434a99 Fix tests
9c4382f wip test
09b87dd Merge branch 'master' into varconst/coll-opt
bb58f9b test wip
f69dbe5 test wip
506de63 test done
5de82a5 Refactoring
9b6c334 small fixes
06eb18b pseudorevert
c05ce4c cleanup
7035bf6 more cleanup
0b98d5c revert, pt.1
7b90f46 Merge branch 'master' into varconst/coll-opt
8ccb281 revert, pt.2
471a97e Cleanup test
8857346 style.sh
97b7d6b Remove the not-really-working performance test for now
8be649b Add/Retain comments where applicable

var-const added the api: firestore label Jul 13, 2018

var-const requested a review from wilhuff July 13, 2018 23:20

googlebot added the cla: yes label Jul 13, 2018

var-const assigned wilhuff Jul 13, 2018

This was referenced Jul 13, 2018

[For review only] Minimize rereading of batches in [FSTLocalDocumentsView documentsMatchingCollectionQuery] #1527

Closed

Slow firestore listener performance with large data sets #1477

Closed

wilhuff mentioned this pull request Jul 14, 2018

Update changelog to mention performance improvements in write batches #1534

Merged

wilhuff reviewed Jul 14, 2018

Review feedback

bf10402

wilhuff approved these changes Jul 16, 2018

Merge branch 'master' into varconst/coll-opt

35ea6b9

var-const merged commit 6ab0195 into master Jul 16, 2018

var-const added a commit that referenced this pull request Jul 17, 2018

d6d273a Update changelog to mention performance improvements in write batches (#1534). Reflects #1505, #1507, #1533.

var-const mentioned this pull request Jul 27, 2018

Port optimizations to LocalDocumentsView from iOS firebase/firebase-js-sdk#1055

Merged

Salakar mentioned this pull request Aug 2, 2018

Optimistic updates slow? invertase/react-native-firebase#1225

Closed

paulb777 deleted the varconst/coll-opt branch May 26, 2019 20:47

firebase locked and limited conversation to collaborators Oct 30, 2019
