Skip to content

Optimisation for schemas with many fields #4567

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

steven-supersolid
Copy link
Contributor

When there are many (1000+) fields then updates to objects take a long time even when very few values are changed. The slow code is in SchemaController.validateObject and can be traced back to SchemaController.injectDefaultSchema via SchemaController.reloadData.

The previous code in injectDefaultSchema when transpiled becomes:

fields: Object.assign({}, defaultColumns._Default, defaultColumns[className] || {}, fields)

This creates a new fields object and does not modify the fields parameter but copies each field, so is safe but slow.

The downside of the optimisation is that the fields parameter will be modified and this could be unexpected (although seems to work OK with the current code). To make this clearer we could refactor to add a function to injectDefaultFields with no return type and where injection here should be clear that modification is occurring.

@codecov
Copy link

codecov bot commented Feb 15, 2018

Codecov Report

Merging #4567 into master will decrease coverage by 0.03%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4567      +/-   ##
==========================================
- Coverage    92.9%   92.86%   -0.04%     
==========================================
  Files         118      118              
  Lines        8453     8491      +38     
==========================================
+ Hits         7853     7885      +32     
- Misses        600      606       +6
Impacted Files Coverage Δ
src/Controllers/SchemaController.js 96.47% <100%> (ø) ⬆️
src/Routers/PushRouter.js 92.85% <0%> (-3.58%) ⬇️
src/RestWrite.js 93.1% <0%> (-0.55%) ⬇️
src/Adapters/Storage/Mongo/MongoStorageAdapter.js 95.54% <0%> (-0.24%) ⬇️
...dapters/Storage/Postgres/PostgresStorageAdapter.js 97.11% <0%> (-0.01%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cac14bc...cbfb976. Read the comment docs.

@dplewis
Copy link
Member

dplewis commented Feb 15, 2018

So the goal here

  • Not copy the field parameter (make it the target of Object.assign instead of source)
  • Not mutate the field parameter (store it in a new variable is not an option)

@steven-supersolid
Copy link
Contributor Author

steven-supersolid commented Feb 15, 2018

Yes, so the code change does the first option - makes fields the target of Object.assign but in that case it will be mutated.

fields: Object.assign(fields || {}, defaultColumns._Default, defaultColumns[className] || {})

or equivalently:

Object.assign(fields || {}, defaultColumns._Default, defaultColumns[className] || {})
...
fields: fields

If we don't mutate then the object has to be copied but that is what is slow when there are a lot of properties.

A new variable won't work because that will just be a reference to fields, so adding properties to the new variable will add properties to fields too.

P.S. the fields || {} is required because in some cases fields is undefined

dplewis
dplewis previously approved these changes Feb 16, 2018
@flovilmart flovilmart dismissed dplewis’s stale review February 16, 2018 21:13

This needs more investigation before we can safely merge

@flovilmart
Copy link
Contributor

@dplewis sorry to have dismiss the review, but we originally implemented the copy of fields to make sure we don't generate side effects into the schema table. I need to take some time to wrap my head around it. We need to be utterly careful when changing those behaviors as they can have unintended side effects.

@flovilmart
Copy link
Contributor

@steven-supersolid could you provide examples also of those slow calls, examples etc... so we can measure improvements and find a proper workaround?

@dplewis
Copy link
Member

dplewis commented Feb 16, 2018

@flovilmart No worries, don't want a repeat of #4409

@flovilmart
Copy link
Contributor

Ahaha true that no worries man! I'll just check that the fields / schema never come back to the DB layer once injectected.

@steven-supersolid
Copy link
Contributor Author

Agree we need to be really careful with this. It's my understanding the default fields get injected to each schema entry (if missing) and then saved to the database anyway so did not see a security concern but I may have overlooked something.

In my integration test I created an object with 50 properties to set the schema entry at the same size. Then I created a new object with those same 50 properties and saved, timing this operation. For comparison I created an object with 1000 properties to update the schema, then repeated the step of creating a new object with the original 50 properties and timing the save of that. The time taken was approximately double even though saving the same data (and schema was cached by this point). I tracked the slow area of code to injectDefaultSchema.

In hindsight it stands to reason that copying an object with 1000 properties is not going to be quick. The matter is made worse because when modifying multiple properties then the default schema is injected for every field modified, in series. So in my test 50 x 1000 properties were copied for the update operation.

If we want to look at a larger refactor then I suggest that the default fields are added when a schema entry is created only, as it seems redundant to add them on every update.

@flovilmart
Copy link
Contributor

If we want to look at a larger refactor then I suggest that the default fields are added when a schema entry is created only, as it seems redundant to add them on every update.

This makes way more sense. Reducing the number of overall calls is the way to go, and more long term if we identify this operation as a bottleneck.

@steven-supersolid
Copy link
Contributor Author

Examining the code it seems reloadData always calls getAllClasses and there if the schema is not cached then after getting from _dbAdapter, injectDefaultSchema is called anyway before adding to the cache. So I think the call to injectDefaultSchema in reloadData is redundant and can be removed.

I have reverted the original change and removed this redundant call. Also left the minor code cleanup in.

@@ -330,26 +330,26 @@ const injectDefaultSchema = ({className, fields, classLevelPermissions, indexes}

const _HooksSchema = {className: "_Hooks", fields: defaultColumns._Hooks};
const _GlobalConfigSchema = { className: "_GlobalConfig", fields: defaultColumns._GlobalConfig }
const _PushStatusSchema = convertSchemaToAdapterSchema(injectDefaultSchema({
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need to call injectDefaultSchema here as it is called inside convertSchemaToAdapterSchema

className: "_Audience",
fields: defaultColumns._Audience,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent and not required as convertSchemaToAdapterSchema calls injectDefaultSchema which adds these fields anyway

steven-supersolid added a commit to supersolid/parse-server that referenced this pull request Feb 19, 2018
@steven-supersolid
Copy link
Contributor Author

Any thoughts on the cut down changes? I have tried this in production and no problems observed

@flovilmart
Copy link
Contributor

@steven-supersolid I just thought of something, we could leverage Object.freeze in order to prevent mutations on the schema. The same way local mutations would not affect the original fields object, using Object.freeze(fields) would throw an error if ever we'd attempt to mutate the fields. What do you think?

@steven-supersolid
Copy link
Contributor Author

Freeze could work as long as a TypeError is not thrown.

I think in that case we can:

  1. Keep the current change which avoids calling injectDefaultSchema in reloadData
  2. Freeze fields in getAllClasses or perhaps in injectDefaultSchema
  3. Change injectDefaultSchema to copy the fields

@flovilmart
Copy link
Contributor

YES this makes sense :)

@steven-supersolid
Copy link
Contributor Author

I've done an experiment by adding a freeze to SchemaController.reloadData(), adding on line 410 Object.freeze(schema.fields); and adding a test. The test passes but the added code breaks existing functionality when DatabaseController.transformAuthData() is called, i.e.

DatabaseController.create() -> DatabaseController.transformAuthData()
DatabaseController.update() -> DatabaseController.transformAuthData()

So not sure how to proceed with this now

@flovilmart
Copy link
Contributor

@steven-supersolid how does it break and how badly?

@steven-supersolid
Copy link
Contributor Author

Those functions no longer work because they are trying to modify the schema fields, which have been frozen. I can push some breaking code if that will help?

@flovilmart
Copy link
Contributor

Yep go ahead! This way we’ll probably be able to fix it up!

@stale
Copy link

stale bot commented Sep 18, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants