Optimisation for schemas with many fields #4567
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master    #4567     +/-   ##
==========================================
- Coverage    92.9%   92.86%    -0.04%
==========================================
  Files         118      118
  Lines        8453     8491       +38
==========================================
+ Hits         7853     7885       +32
- Misses        600      606        +6
==========================================

Continue to review full report at Codecov.
So the goal here is to avoid copying the fields object on every call - either by mutating it in place or by assigning it to a new variable?

Yes, so the code change does the first option - makes the defaults get injected into the existing fields object, or equivalently, mutates it in place. If we don't mutate then the object has to be copied, but that is what is slow when there are a lot of properties. A new variable won't work because that will just be a reference to the same object.
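To illustrate the reference-vs-copy point, a standalone sketch (not code from this PR; the field names are made up):

```js
const fields = { a: { type: 'String' } };

const alias = fields;          // a new variable is only a reference
alias.b = { type: 'Number' };
console.log('b' in fields);    // true: the original object changed too

const copy = Object.assign({}, fields); // a real (shallow) copy
copy.c = { type: 'Date' };
console.log('c' in fields);    // false: but copying every field is the slow part
```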
This needs more investigation before we can safely merge
@dplewis sorry to have dismissed the review, but we originally implemented the copy of fields to make sure we don't generate side effects in the schema table. I need to take some time to wrap my head around it. We need to be utterly careful when changing those behaviors as they can have unintended side effects.

@steven-supersolid could you also provide examples of those slow calls, etc., so we can measure improvements and find a proper workaround?

@flovilmart No worries, don't want a repeat of #4409

Ahaha true that, no worries man! I'll just check that the fields / schema never come back to the DB layer once injected.
Agree we need to be really careful with this. It's my understanding that the default fields get injected into each schema entry (if missing) and then saved to the database anyway, so I did not see a security concern, but I may have overlooked something.

In my integration test I created an object with 50 properties to set the schema entry at the same size. Then I created a new object with those same 50 properties and saved it, timing this operation. For comparison I created an object with 1000 properties to update the schema, then repeated the step of creating a new object with the original 50 properties and timed the save of that. The time taken was approximately double, even though the same data was being saved (and the schema was cached by this point).

I tracked the slow area of code to injectDefaultSchema. In hindsight it stands to reason that copying an object with 1000 properties is not going to be quick. The matter is made worse because, when modifying multiple properties, the default schema is injected for every modified field, in series. So in my test 50 x 1000 properties were copied for the update operation; the sketch below approximates this cost.

If we want to look at a larger refactor then I suggest the default fields are added only when a schema entry is created, as it seems redundant to add them on every update.
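A minimal, self-contained sketch (not code from this PR) that approximates the 50 x 1000 copies described above; the field names, default columns, and counts are illustrative only:

```js
// Build a schema-like fields object with 1000 properties.
const fields = {};
for (let i = 0; i < 1000; i++) {
  fields['field' + i] = { type: 'String' };
}
const defaults = { objectId: { type: 'String' }, createdAt: { type: 'Date' } };

// Copy-style injection, as the old code did: a fresh object per modified field.
console.time('copy x 50');
for (let i = 0; i < 50; i++) {
  Object.assign({}, defaults, fields);
}
console.timeEnd('copy x 50');

// Mutate-style injection: only write the defaults that are missing.
console.time('mutate x 50');
for (let i = 0; i < 50; i++) {
  for (const key of Object.keys(defaults)) {
    if (fields[key] === undefined) fields[key] = defaults[key];
  }
}
console.timeEnd('mutate x 50');
```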
This makes way more sense. Reducing the number of overall calls is the way to go, and more long term if we identify this operation as a bottleneck. |
…tDefaultSchema in reloadData
Examining the code, it seems I have reverted the original change and removed this redundant call. Also left the minor code cleanup in.
@@ -330,26 +330,26 @@ const injectDefaultSchema = ({className, fields, classLevelPermissions, indexes}

const _HooksSchema = {className: "_Hooks", fields: defaultColumns._Hooks};
const _GlobalConfigSchema = { className: "_GlobalConfig", fields: defaultColumns._GlobalConfig }
const _PushStatusSchema = convertSchemaToAdapterSchema(injectDefaultSchema({
Don't need to call injectDefaultSchema here as it is called inside convertSchemaToAdapterSchema
className: "_Audience", | ||
fields: defaultColumns._Audience, |
Inconsistent and not required, as convertSchemaToAdapterSchema calls injectDefaultSchema, which adds these fields anyway.
Any thoughts on the cut-down changes? I have tried this in production and no problems were observed.

@steven-supersolid I just thought of something, we could leverage Object.freeze
Freeze could work as long as a TypeError is not thrown. I think in that case we can:
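For reference, a standalone sketch (not code from this PR) of how Object.freeze behaves on a fields-like object; whether a write throws a TypeError or fails silently depends on strict mode, and the freeze is shallow:

```js
'use strict';

const fields = Object.freeze({
  objectId: { type: 'String' },
});

try {
  fields.createdAt = { type: 'Date' }; // in strict mode this throws
} catch (e) {
  console.log(e instanceof TypeError); // true
}

// In non-strict code the same write would fail silently instead of throwing.
// Note the freeze is shallow: nested objects remain mutable.
fields.objectId.type = 'Number'; // still allowed
console.log(fields.objectId.type); // 'Number'
```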
YES this makes sense :)
I've done an experiment by adding a freeze to the schema fields, and it breaks some existing code. So not sure how to proceed with this now.

@steven-supersolid how does it break and how badly?

Those functions no longer work because they are trying to modify the schema fields, which have been frozen. I can push some breaking code if that will help?

Yep go ahead! This way we’ll probably be able to fix it up!
…n freezing schema fields (do not merge)
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
When there are many (1000+) fields, updates to objects take a long time even when very few values are changed. The slow code is in SchemaController.validateObject and can be traced back to SchemaController.injectDefaultSchema via SchemaController.reloadData.

The previous code in injectDefaultSchema, when transpiled, becomes a full copy of the fields (see the sketch below). This creates a new fields object and does not modify the fields parameter but copies each field, so it is safe but slow.
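The original snippet did not survive this page capture; what follows is a plausible reconstruction, assuming the source used object spread, which Babel transpiles into an Object.assign-style copy (the defaultColumns values here are illustrative stubs):

```js
// Minimal stand-ins so the sketch runs on its own (hypothetical values).
const defaultColumns = {
  _Default: { objectId: { type: 'String' }, createdAt: { type: 'Date' } },
};

// Transpiled shape of the old injectDefaultSchema: every default column and
// every existing user field is copied into a brand-new object on each call.
const injectDefaultSchema = ({ className, fields, classLevelPermissions, indexes }) => ({
  className,
  fields: Object.assign(
    {},
    defaultColumns._Default,
    defaultColumns[className] || {},
    fields // with 1000+ user fields, this copy dominates the cost
  ),
  classLevelPermissions,
  indexes,
});

const out = injectDefaultSchema({ className: 'Game', fields: { score: { type: 'Number' } } });
console.log(Object.keys(out.fields)); // [ 'objectId', 'createdAt', 'score' ]
```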
The downside of the optimisation is that the fields parameter will be modified, and this could be unexpected (although it seems to work OK with the current code). To make this clearer we could refactor the injection into a function, injectDefaultFields, with no return value, whose name makes it clear that modification is occurring.
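A sketch of that mutating variant, under the same assumptions as above (injectDefaultFields is the hypothetical name proposed in the description, and defaultColumns is the stub from the earlier sketch):

```js
// Mutating variant: no copy is made; missing defaults are written directly
// into the caller's fields object, so callers must expect the side effect.
function injectDefaultFields(className, fields) {
  for (const source of [defaultColumns._Default, defaultColumns[className]]) {
    if (!source) continue;
    for (const key of Object.keys(source)) {
      if (fields[key] === undefined) {
        fields[key] = source[key]; // only missing defaults are written
      }
    }
  }
}

const fields = { score: { type: 'Number' } };
injectDefaultFields('Game', fields);
console.log(Object.keys(fields)); // [ 'score', 'objectId', 'createdAt' ]
```

Because it only iterates the handful of default columns, the cost no longer scales with the number of user fields.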