Optimisation for schemas with many fields #4567

steven-supersolid · 2018-02-15T12:40:54Z

When there are many (1000+) fields then updates to objects take a long time even when very few values are changed. The slow code is in SchemaController.validateObject and can be traced back to SchemaController.injectDefaultSchema via SchemaController.reloadData.

The previous code in injectDefaultSchema when transpiled becomes:

fields: Object.assign({}, defaultColumns._Default, defaultColumns[className] || {}, fields)

This creates a new fields object and does not modify the fields parameter but copies each field, so is safe but slow.

The downside of the optimisation is that the fields parameter is modified, which could be unexpected (although it seems to work OK with the current code). To make this clearer we could refactor this into a function, e.g. injectDefaultFields, with no return value, so that it is obvious at the call site that the argument is being modified.
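The trade-off between the two forms can be sketched roughly as below. The contents of `defaultColumns`, the `Player` class and the helper names `injectByCopy` / `injectInPlace` are made up for illustration; only the two `Object.assign` shapes are taken from this PR.

```javascript
// Hypothetical, heavily reduced stand-in for the real defaultColumns table.
const defaultColumns = {
  _Default: { objectId: { type: 'String' }, createdAt: { type: 'Date' } },
  Player: { score: { type: 'Number' } }, // made-up class
};

// Previous behaviour: build a fresh object, so `fields` is never touched,
// but every property of `fields` is copied on each call.
function injectByCopy(className, fields) {
  return Object.assign({}, defaultColumns._Default, defaultColumns[className] || {}, fields);
}

// Proposed optimisation: use `fields` as the target, so only the defaults
// are copied, but the caller's object is modified in place.
function injectInPlace(className, fields) {
  return Object.assign(fields || {}, defaultColumns._Default, defaultColumns[className] || {});
}

const original = { nickname: { type: 'String' } };

const copied = injectByCopy('Player', original);
const keysAfterCopy = Object.keys(original).length; // `original` is unchanged here

const mutated = injectInPlace('Player', original);
const keysAfterInPlace = Object.keys(original).length; // `original` has grown
```

Note that the argument order also changes which side wins when a key exists in both places: in the copying form `fields` is applied last, while in the in-place form the defaults are applied last and would overwrite a same-named entry in `fields`.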

codecov · 2018-02-15T12:53:35Z

Codecov Report

Merging #4567 into master will decrease coverage by 0.03%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #4567      +/-   ##
==========================================
- Coverage    92.9%   92.86%   -0.04%     
==========================================
  Files         118      118              
  Lines        8453     8491      +38     
==========================================
+ Hits         7853     7885      +32     
- Misses        600      606       +6

Impacted Files	Coverage Δ
src/Controllers/SchemaController.js	`96.47% <100%> (ø)`	⬆️
src/Routers/PushRouter.js	`92.85% <0%> (-3.58%)`	⬇️
src/RestWrite.js	`93.1% <0%> (-0.55%)`	⬇️
src/Adapters/Storage/Mongo/MongoStorageAdapter.js	`95.54% <0%> (-0.24%)`	⬇️
...dapters/Storage/Postgres/PostgresStorageAdapter.js	`97.11% <0%> (-0.01%)`	⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update cac14bc...cbfb976.

dplewis · 2018-02-15T15:16:22Z

So the goal here:

1. Not copy the fields parameter (make it the target of Object.assign instead of a source)
2. Not mutate the fields parameter (storing it in a new variable is not an option)

steven-supersolid · 2018-02-15T16:33:14Z

Yes, so the code change does the first option - makes fields the target of Object.assign but in that case it will be mutated.

fields: Object.assign(fields || {}, defaultColumns._Default, defaultColumns[className] || {})

or equivalently:

Object.assign(fields || {}, defaultColumns._Default, defaultColumns[className] || {})
...
fields: fields

If we don't mutate then the object has to be copied but that is what is slow when there are a lot of properties.

A new variable won't work because that will just be a reference to fields, so adding properties to the new variable will add properties to fields too.

P.S. the fields || {} is required because in some cases fields is undefined
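Both points (a new variable being only a reference, and the need for the `fields || {}` guard) can be checked with a small sketch. The field names are invented for the example.

```javascript
const fields = { nickname: { type: 'String' } };

// A new variable is only another reference to the same object.
const alias = fields;
alias.objectId = { type: 'String' };
const aliasIsSameObject = alias === fields;
const fieldsWasModified = 'objectId' in fields;

// Object.assign throws a TypeError when its target is undefined or null,
// hence the `fields || {}` guard.
let errorWithoutGuard = null;
try {
  Object.assign(undefined, { objectId: { type: 'String' } });
} catch (e) {
  errorWithoutGuard = e;
}

// With the guard, an undefined `fields` falls back to a fresh empty object.
const guarded = Object.assign(undefined || {}, { objectId: { type: 'String' } });
```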


This needs more investigation before we can safely merge

flovilmart · 2018-02-16T21:14:41Z

@dplewis sorry to have dismissed the review, but we originally implemented the copy of fields to make sure we don't generate side effects into the schema table. I need to take some time to wrap my head around it. We need to be utterly careful when changing those behaviors as they can have unintended side effects.

flovilmart · 2018-02-16T21:18:07Z

@steven-supersolid could you also provide examples of those slow calls, so we can measure improvements and find a proper workaround?

dplewis · 2018-02-16T23:07:05Z

@flovilmart No worries, don't want a repeat of #4409

flovilmart · 2018-02-16T23:12:58Z

Ahaha true that, no worries man! I'll just check that the fields / schema never come back to the DB layer once injected.

steven-supersolid · 2018-02-17T14:58:18Z

Agree we need to be really careful with this. It's my understanding that the default fields get injected into each schema entry (if missing) and then saved to the database anyway, so I did not see a security concern, but I may have overlooked something.

In my integration test I created an object with 50 properties to set the schema entry at the same size. Then I created a new object with those same 50 properties and saved, timing this operation. For comparison I created an object with 1000 properties to update the schema, then repeated the step of creating a new object with the original 50 properties and timing the save of that. The time taken was approximately double even though saving the same data (and schema was cached by this point). I tracked the slow area of code to injectDefaultSchema.

In hindsight it stands to reason that copying an object with 1000 properties is not going to be quick. The matter is made worse because when modifying multiple properties then the default schema is injected for every field modified, in series. So in my test 50 x 1000 properties were copied for the update operation.
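As a rough illustration of the arithmetic, the sketch below counts property copies when the whole fields object is copied once per modified field. It is not the actual test from this PR: `makeFields` and `countCopies` are hypothetical helpers, and the default columns are ignored for simplicity.

```javascript
// Build an object with `n` distinct properties: field0, field1, ...
function makeFields(n) {
  const fields = {};
  for (let i = 0; i < n; i++) {
    fields['field' + i] = { type: 'String' };
  }
  return fields;
}

// Copy the whole schema fields object once per modified field and
// return the total number of properties that were copied.
function countCopies(schemaFieldCount, modifiedFieldCount) {
  const schemaFields = makeFields(schemaFieldCount);
  let copied = 0;
  for (let i = 0; i < modifiedFieldCount; i++) {
    const copy = Object.assign({}, schemaFields);
    copied += Object.keys(copy).length;
  }
  return copied;
}

const smallSchemaCopies = countCopies(50, 50);   // 50 x 50 properties
const largeSchemaCopies = countCopies(1000, 50); // 50 x 1000 properties
```

The work grows with the size of the schema entry rather than with the size of the object being saved, which matches the observation that saving the same 50 properties became slower once the schema had 1000 fields.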

If we want to look at a larger refactor then I suggest that the default fields are added when a schema entry is created only, as it seems redundant to add them on every update.

flovilmart · 2018-02-17T15:03:13Z

> If we want to look at a larger refactor then I suggest that the default fields are added when a schema entry is created only, as it seems redundant to add them on every update.

This makes way more sense. Reducing the number of overall calls is the way to go, and more long term if we identify this operation as a bottleneck.


steven-supersolid · 2018-02-19T10:12:51Z

Examining the code, it seems reloadData always calls getAllClasses. There, if the schema is not cached, injectDefaultSchema is called anyway after the schema is fetched from _dbAdapter and before it is added to the cache. So I think the call to injectDefaultSchema in reloadData is redundant and can be removed.
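The redundancy can be modelled with a simplified, synchronous sketch. Everything here is a stand-in written for illustration: the real functions are asynchronous and do more, and the `injectAgain` flag, the call counter and the `Player` schema are assumptions, not part of the codebase.

```javascript
// Counts how often defaults are injected; for illustration only.
let injectCalls = 0;

// Hypothetical stand-in for injectDefaultSchema: returns a copy with defaults added.
function injectDefaultSchema(schema) {
  injectCalls++;
  const defaults = { objectId: { type: 'String' } };
  return { className: schema.className, fields: Object.assign({}, defaults, schema.fields) };
}

// Hypothetical stand-in for the database adapter (the real one is asynchronous).
const dbAdapter = {
  getAllClasses: () => [{ className: 'Player', fields: { score: { type: 'Number' } } }],
};

let cachedSchemas = null;

// Injects defaults before caching, as described in the comment above.
function getAllClasses() {
  if (!cachedSchemas) {
    cachedSchemas = dbAdapter.getAllClasses().map(injectDefaultSchema);
  }
  return cachedSchemas;
}

// `injectAgain === true` models reloadData with the extra call,
// `false` models it with the call removed.
function reloadData(injectAgain) {
  const schemas = getAllClasses();
  return injectAgain ? schemas.map(injectDefaultSchema) : schemas;
}

const withExtraCall = reloadData(true);
const callsWithExtra = injectCalls; // one from getAllClasses, one from reloadData

cachedSchemas = null;
injectCalls = 0;
const withoutExtraCall = reloadData(false);
const callsWithoutExtra = injectCalls; // only the one from getAllClasses
```

In this model both variants return schemas with the same fields; the second injection only repeats the copy.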

I have reverted the original change and removed this redundant call. Also left the minor code cleanup in.

steven-supersolid · 2018-02-19T10:14:20Z

src/Controllers/SchemaController.js

@@ -330,26 +330,26 @@ const injectDefaultSchema = ({className, fields, classLevelPermissions, indexes}

 const _HooksSchema =  {className: "_Hooks", fields: defaultColumns._Hooks};
 const _GlobalConfigSchema = { className: "_GlobalConfig", fields: defaultColumns._GlobalConfig }
-const _PushStatusSchema = convertSchemaToAdapterSchema(injectDefaultSchema({


Don't need to call injectDefaultSchema here as it is called inside convertSchemaToAdapterSchema

steven-supersolid · 2018-02-19T10:15:08Z

src/Controllers/SchemaController.js

  className: "_Audience",
-  fields: defaultColumns._Audience,


Inconsistent and not required as convertSchemaToAdapterSchema calls injectDefaultSchema which adds these fields anyway

steven-supersolid · 2018-03-06T15:03:58Z

Any thoughts on the cut-down changes? I have tried this in production and observed no problems.

flovilmart · 2018-03-06T15:44:56Z

@steven-supersolid I just thought of something, we could leverage Object.freeze in order to prevent mutations on the schema. The same way local mutations would not affect the original fields object, using Object.freeze(fields) would throw an error if ever we'd attempt to mutate the fields. What do you think?

steven-supersolid · 2018-03-06T16:59:11Z

Freeze could work as long as a TypeError is not thrown.

I think in that case we can:

Keep the current change which avoids calling injectDefaultSchema in reloadData
Freeze fields in getAllClasses or perhaps in injectDefaultSchema
Change injectDefaultSchema to copy the fields
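For reference, a minimal sketch of how `Object.freeze` behaves in this situation. The `schema` object and the `addField` helper are invented for the example and are not taken from SchemaController.

```javascript
// Made-up schema entry whose fields object is frozen.
const schema = { className: 'Player', fields: { score: { type: 'Number' } } };
Object.freeze(schema.fields);

// Hypothetical helper that adds a field; strict mode is enabled so that
// writing to a frozen object throws instead of failing silently.
function addField(fields, name, type) {
  'use strict';
  fields[name] = { type };
}

// Adding a property to the frozen object throws a TypeError in strict mode.
let thrown = null;
try {
  addField(schema.fields, 'level', 'Number');
} catch (e) {
  thrown = e;
}

// Freezing is shallow: nested objects can still be changed.
schema.fields.score.type = 'String';

// Code that needs to modify the fields can work on a copy instead.
const workingCopy = Object.assign({}, schema.fields);
addField(workingCopy, 'level', 'Number');
```

Outside strict mode the same assignment is silently ignored rather than throwing, so whether a TypeError appears depends on how the calling code is compiled; ES modules and class bodies are always strict.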

flovilmart · 2018-03-06T17:00:25Z

YES this makes sense :)

steven-supersolid · 2018-04-13T14:34:49Z

I've done an experiment by adding a freeze to SchemaController.reloadData(), adding on line 410 Object.freeze(schema.fields); and adding a test. The test passes but the added code breaks existing functionality when DatabaseController.transformAuthData() is called, i.e.

DatabaseController.create() -> DatabaseController.transformAuthData()
DatabaseController.update() -> DatabaseController.transformAuthData()

So not sure how to proceed with this now

flovilmart · 2018-04-13T14:36:15Z

@steven-supersolid how does it break and how badly?

steven-supersolid · 2018-04-13T14:51:14Z

Those functions no longer work because they are trying to modify the schema fields, which have been frozen. I can push some breaking code if that will help?

flovilmart · 2018-04-16T13:20:42Z

Yep go ahead! This way we’ll probably be able to fix it up!


stale · 2018-09-18T06:17:27Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Modify fields instead of creating new object and copying fields

0391176

steven-supersolid added 2 commits February 16, 2018 10:14

Remove redundant call to injectDefaultSchema. Fix modification of defaultColumns._Audience

ef82929

Merge branch 'master' into steven.large.schema.optimisation

68bc669

dplewis previously approved these changes Feb 16, 2018


dplewis added the needs investigation label Feb 17, 2018

Revert change to injectDefaultSchema. Remove additional call to injectDefaultSchema in reloadData

1a6e3c1


steven-supersolid added a commit to supersolid/parse-server that referenced this pull request Feb 19, 2018

Cherry pick parse-community#4567 optimisation

42243aa

Add test for modifying schema. Demonstrate failure of other tests when freezing schema fields (do not merge)

cbfb976

stale bot added the wontfix label Sep 18, 2018

stale bot closed this Sep 25, 2018

This was referenced Aug 18, 2022

refactor: upgrade @graphql-tools/utils from 8.6.13 to 8.9.0 #8129

Merged

refactor: upgrade @graphql-tools/schema from 8.5.0 to 8.5.1 #8130

Merged

refactor: upgrade @graphql-tools/merge from 8.3.0 to 8.3.1 #8131

Merged
