Skip to content

Minimize rereading of batches in FSTLocalDocumentsView documentsMatchingCollectionQuery #1533

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 27 commits into from
Jul 16, 2018

Conversation

var-const
Copy link
Contributor

@var-const var-const commented Jul 13, 2018

Previously in documentsMatchingCollectionQuery, write batches were read three times:

  • inside a call to localDocuments (which called allMutationBatchesAffectingDocumentKey);
  • by calling allMutationBatchesAffectingQuery;
  • inside a call to documentsForKeys (which calls allMutationBatchesAffectingDocumentKey).

In fact, all relevant batches will always be contained inside allMutationBatchesAffectingQuery (which is also more efficient); the other two calls were redundant.

The new algorithm is:

  • get remote documents from remote cache;
  • get write batches affecting the query;
  • build a set of resulting documents by playing all mutations from those batches that affect the query, modifying remote documents or creating new documents (for those documents that haven't yet been written to the backend);
  • filter out from the set of resulting documents those that don't satisfy the query.

This also allows doing away with localDocument and localDocuments.

For a case when there are many batches in the mutation queue, the performance improvements appear to be significant. To test, I tried creating a large collection by writing N batches of 500 mutations each (all mutations are creations of very small documents) and manually measuring the time it takes to get all documents from the collection. Timings are greatly affected by the number of batches:

  • 1 batch of 500 mutations -- old ~90ms, new ~70ms;
  • 10 batches of 500 mutations -- old ~2200ms, new ~800ms;
  • 50 batches of 500 mutations -- old ~90000ms, new ~4300ms.

(this is in release mode, using iPhone 8 Plus simulator from XCode 9.4)

It appears that the old version exhibited quadratic growth.

@var-const
Copy link
Contributor Author

A couple of notes:

Copy link
Contributor

@wilhuff wilhuff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically LGTM

// Query the remote documents and overlay mutations.
// TODO(mikelehen): There may be significant overlap between the mutations affecting these
// remote documents and the allMutationBatchesAffectingQuery mutations. Consider optimizing.
// Get the remote documents in the state in which the backend last acknowledged them.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment and the one below are very close to merely restating what the code is doing. You might consider just dropping them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

for (FSTMutation *mutation in batch.mutations) {
// TODO(mikelehen): PERF: Check if this mutation actually affects the query to reduce work.
// Only process documents belonging to the collection.
if (mutation.key.path().PopLast() != query.path) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PopLast() requires a full copy of the underlying array (less 1 segment). A better way to do this would be to add a method on Path that allows us to ask query.path if mutation.key.Path() is an immediate child.

Copy link
Contributor Author

@var-const var-const Jul 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. The numbers I'm seeing in my local performance test are similar, but logically the dedicated method makes more sense.

Feel free to suggest changes on the naming. To quickly recap my reasoning: I think IsChild is slightly confusing because, to me at least, it's not clear whether it's the left or right side that is being checked, so that leaves IsChildOf vs IsParentOf (I can't think of a clear non-convoluted name to represent the opposite). Since you mentioned asking query.path, I settled on IsParentOf.

@var-const
Copy link
Contributor Author

@wilhuff Gil, any idea on what's causing the CMake build to fail? From Travis log:

/Users/travis/.rvm/rubies/ruby-2.3.1/lib/ruby/site_ruby/2.3.0/rubygems/core_ext/kernel_require.rb:55:in `require': cannot load such file -- cocoapods (LoadError)
	from /Users/travis/.rvm/rubies/ruby-2.3.1/lib/ruby/site_ruby/2.3.0/rubygems/core_ext/kernel_require.rb:55:in `require'
	from /Users/travis/build/firebase/firebase-ios-sdk/Firestore/../cmake/podspec_cmake.rb:17:in `<main>'
CMake Error at /Users/travis/build/firebase/firebase-ios-sdk/cmake/podspec_rules.cmake:49 (include):
  include could not find load file:
    /Users/travis/build/firebase/firebase-ios-sdk/build/Firestore/GoogleUtilities.cmake
Call Stack (most recent call first):
  CMakeLists.txt:129 (podspec_framework)
/Users/travis/.rvm/rubies/ruby-2.3.1/lib/ruby/site_ruby/2.3.0/rubygems/core_ext/kernel_require.rb:55:in `require': cannot load such file -- cocoapods (LoadError)
	from /Users/travis/.rvm/rubies/ruby-2.3.1/lib/ruby/site_ruby/2.3.0/rubygems/core_ext/kernel_require.rb:55:in `require'
	from /Users/travis/build/firebase/firebase-ios-sdk/Firestore/../cmake/podspec_cmake.rb:17:in `<main>'
CMake Error at /Users/travis/build/firebase/firebase-ios-sdk/cmake/podspec_rules.cmake:49 (include):
  include could not find load file:
    /Users/travis/build/firebase/firebase-ios-sdk/build/Firestore/FirebaseCore.cmake
Call Stack (most recent call first):
  CMakeLists.txt:134 (podspec_framework)

@wilhuff
Copy link
Contributor

wilhuff commented Jul 16, 2018

CMake is broken on master in CI: #1544 fixes it.

Copy link
Contributor

@wilhuff wilhuff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

You can merge master to get an updated travis configuration.

@var-const
Copy link
Contributor Author

@wilhuff Thanks a lot for suggesting this change!

@var-const var-const merged commit 6ab0195 into master Jul 16, 2018
var-const added a commit to firebase/firebase-js-sdk that referenced this pull request Jul 31, 2018
* add a method to find batches affecting a set of keys (port of [1479](firebase/firebase-ios-sdk#1479));
* use the newly-added method to avoid rereading batches when getting documents in `LocalDocumentsView` (port of [1505](firebase/firebase-ios-sdk#1505));
* avoid rereading batches when searching for documents in a collection (port of [1533](firebase/firebase-ios-sdk#1533)).

Speedup was measured by running tests in browser and checking time spent writing 10 batches of 500 mutations each, and then querying the resulting 5K docs collection from cache in offline mode. For this case, the writing speedup is about 3x, and querying speedup is about 6x (see PR for more details).
schmidt-sebastian added a commit to firebase/firebase-js-sdk that referenced this pull request Aug 1, 2018
* Implement global resume token (#1052)

* Add a spec test that shows correct global resume token handling

* Minimum implementation to handle global resume tokens

* Remove unused QueryView.resumeToken

* Avoid persisting the resume token unless required

* Persist the resume token on unlisten

* Add a type parameter to Persistence (#1047)

* Cherry pick sequence number starting point

* Working on typed transactions

* Start plumbing in sequence number

* Back out sequence number changes

* [AUTOMATED]: Prettier Code Styling

* Fix tests

* [AUTOMATED]: Prettier Code Styling

* Fix lint

* [AUTOMATED]: Prettier Code Styling

* Uncomment line

* MemoryPersistenceTransaction -> MemoryTransaction

* [AUTOMATED]: Prettier Code Styling

* Review updates

* Style

* Lint and style

* Review feedback

* [AUTOMATED]: Prettier Code Styling

* Revert some unintentional import churn

* Line 44 should definitely be empty

* Checkpoint before adding helper function for stores

* Use a helper for casting PersistenceTransaction to IndexedDbTransaction

* [AUTOMATED]: Prettier Code Styling

* Remove errant generic type

* Lint

* Fix typo

* Port optimizations to LocalDocumentsView from iOS (#1055)

* add a method to find batches affecting a set of keys (port of [1479](firebase/firebase-ios-sdk#1479));
* use the newly-added method to avoid rereading batches when getting documents in `LocalDocumentsView` (port of [1505](firebase/firebase-ios-sdk#1505));
* avoid rereading batches when searching for documents in a collection (port of [1533](firebase/firebase-ios-sdk#1533)).

Speedup was measured by running tests in browser and checking time spent writing 10 batches of 500 mutations each, and then querying the resulting 5K docs collection from cache in offline mode. For this case, the writing speedup is about 3x, and querying speedup is about 6x (see PR for more details).

* Add a CHANGELOG entry for #1052 (#1071)

* Add a CHANGELOG entry for #1052

* Add notes for #1055

* Rename idleTimer and fix comments. (#1068)

* Merge (#1073)
schmidt-sebastian added a commit to firebase/firebase-js-sdk that referenced this pull request Aug 1, 2018
* Merging PersistentStream refactor

* [AUTOMATED]: Prettier Code Styling

* Typo

* Remove canUseNetwork state. (#1076)

* Merging the latest merge into the previous merge (#1077)

* Implement global resume token (#1052)

* Add a spec test that shows correct global resume token handling

* Minimum implementation to handle global resume tokens

* Remove unused QueryView.resumeToken

* Avoid persisting the resume token unless required

* Persist the resume token on unlisten

* Add a type parameter to Persistence (#1047)

* Cherry pick sequence number starting point

* Working on typed transactions

* Start plumbing in sequence number

* Back out sequence number changes

* [AUTOMATED]: Prettier Code Styling

* Fix tests

* [AUTOMATED]: Prettier Code Styling

* Fix lint

* [AUTOMATED]: Prettier Code Styling

* Uncomment line

* MemoryPersistenceTransaction -> MemoryTransaction

* [AUTOMATED]: Prettier Code Styling

* Review updates

* Style

* Lint and style

* Review feedback

* [AUTOMATED]: Prettier Code Styling

* Revert some unintentional import churn

* Line 44 should definitely be empty

* Checkpoint before adding helper function for stores

* Use a helper for casting PersistenceTransaction to IndexedDbTransaction

* [AUTOMATED]: Prettier Code Styling

* Remove errant generic type

* Lint

* Fix typo

* Port optimizations to LocalDocumentsView from iOS (#1055)

* add a method to find batches affecting a set of keys (port of [1479](firebase/firebase-ios-sdk#1479));
* use the newly-added method to avoid rereading batches when getting documents in `LocalDocumentsView` (port of [1505](firebase/firebase-ios-sdk#1505));
* avoid rereading batches when searching for documents in a collection (port of [1533](firebase/firebase-ios-sdk#1533)).

Speedup was measured by running tests in browser and checking time spent writing 10 batches of 500 mutations each, and then querying the resulting 5K docs collection from cache in offline mode. For this case, the writing speedup is about 3x, and querying speedup is about 6x (see PR for more details).

* Add a CHANGELOG entry for #1052 (#1071)

* Add a CHANGELOG entry for #1052

* Add notes for #1055

* Rename idleTimer and fix comments. (#1068)

* Merge (#1073)
schmidt-sebastian added a commit to firebase/firebase-js-sdk that referenced this pull request Aug 1, 2018
* Catch invalid provider id error (#1064)

* RxFire: Api Change and documentation (#1066)

* api changes and doc updates

* fixes

* Refactor PersistentStream (no behavior changes). (#1041)

This breaks out a number of changes I made as prep for b/80402781 (Continue
retrying streams for 1 minute (idle delay)).

PersistentStream changes:
* Rather than providing a stream event listener to every call of start(),
  the stream listener is now provided once to the constructor and cannot
  be changed.
* Streams can now be restarted indefinitely, even after a call to stop().
  * PersistentStreamState.Stopped was removed and we just return to
    'Initial' after a stop() call.
  * Added `closeCount` member to PersistentStream in order to avoid
    bleedthrough issues with auth and stream events once stop() has
    been called.
  * Calling stop() now triggers the onClose() event listener, which
    simplifies stream cleanup.
* PersistentStreamState.Auth renamed to 'Starting' to better reflect that
  it encompasses both authentication and opening the stream.

RemoteStore changes:
* Creates streams once and just stop() / start()s them as necessary,
  never recreating them completely.
  * Added networkEnabled flag to track whether the network is
    enabled or not, since we no longer null out the streams.
  * Refactored disableNetwork() / enableNetwork() to remove stream
    re-creation.

Misc:
* Comment improvements including a state diagram on PersistentStream.
* Fixed spec test shutdown to schedule via the AsyncQueue to fix
  sequencing order I ran into.

* Merging Persistent Stream refactor (#1069)

* Merging PersistentStream refactor

* [AUTOMATED]: Prettier Code Styling

* Typo

* Remove canUseNetwork state. (#1076)

* Merging the latest merge into the previous merge (#1077)

* Implement global resume token (#1052)

* Add a spec test that shows correct global resume token handling

* Minimum implementation to handle global resume tokens

* Remove unused QueryView.resumeToken

* Avoid persisting the resume token unless required

* Persist the resume token on unlisten

* Add a type parameter to Persistence (#1047)

* Cherry pick sequence number starting point

* Working on typed transactions

* Start plumbing in sequence number

* Back out sequence number changes

* [AUTOMATED]: Prettier Code Styling

* Fix tests

* [AUTOMATED]: Prettier Code Styling

* Fix lint

* [AUTOMATED]: Prettier Code Styling

* Uncomment line

* MemoryPersistenceTransaction -> MemoryTransaction

* [AUTOMATED]: Prettier Code Styling

* Review updates

* Style

* Lint and style

* Review feedback

* [AUTOMATED]: Prettier Code Styling

* Revert some unintentional import churn

* Line 44 should definitely be empty

* Checkpoint before adding helper function for stores

* Use a helper for casting PersistenceTransaction to IndexedDbTransaction

* [AUTOMATED]: Prettier Code Styling

* Remove errant generic type

* Lint

* Fix typo

* Port optimizations to LocalDocumentsView from iOS (#1055)

* add a method to find batches affecting a set of keys (port of [1479](firebase/firebase-ios-sdk#1479));
* use the newly-added method to avoid rereading batches when getting documents in `LocalDocumentsView` (port of [1505](firebase/firebase-ios-sdk#1505));
* avoid rereading batches when searching for documents in a collection (port of [1533](firebase/firebase-ios-sdk#1533)).

Speedup was measured by running tests in browser and checking time spent writing 10 batches of 500 mutations each, and then querying the resulting 5K docs collection from cache in offline mode. For this case, the writing speedup is about 3x, and querying speedup is about 6x (see PR for more details).

* Add a CHANGELOG entry for #1052 (#1071)

* Add a CHANGELOG entry for #1052

* Add notes for #1055

* Rename idleTimer and fix comments. (#1068)

* Merge (#1073)
@paulb777 paulb777 deleted the varconst/coll-opt branch May 26, 2019 20:47
@firebase firebase locked and limited conversation to collaborators Oct 30, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants