(There is a more general task about caching, #69, but I want to eat this pie piece by piece.)
We're going to add the ddl.bucket_id() function (see #76). The function may be called quite frequently, so it is worth taking care of its performance.
The ddl.bucket_id() function needs to know the sharding function. It is costly to obtain the function declaration / definition stored in the _ddl_sharding_func space, mainly because of the following:
- MsgPack decoding.
- loadstring() if the function is declared as code ({body = <...>}).
- Extra Lua GC pressure from re-creating objects that are usually the same.
Ideally, obtaining the function should be just a Lua table lookup, and that is achievable.
The only way to track _ddl_sharding_func changes is to set a trigger on the space to track modifications (on_replace) [1]. Since it is not always possible to set a trigger when the module has just been loaded [2], I propose a trick.
The key idea is to generate an initial cache value and set the trigger when we access the sharding function information for the first time. After this, the cache is updated 'in the background' (by the trigger) and we can just access the cache.
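A minimal sketch of the key idea, written in plain Lua so it can be run outside tarantool. The fake space, the tuple layout ({space_name, func_name, func_body}) and the names cache_on_replace and get_sharding_func_info are assumptions made for illustration; a real implementation would work with box.space._ddl_sharding_func and would also have to deal with fibers, transactions and hot reload, which are discussed separately.

```lua
-- Stand-in for box.space._ddl_sharding_func: it only imitates
-- on_replace() and replace(), nothing else.
local function new_fake_space()
    local space = {tuples = {}, triggers = {}}
    function space:on_replace(trigger)
        table.insert(self.triggers, trigger)
    end
    function space:replace(tuple)
        local old = self.tuples[tuple[1]]
        self.tuples[tuple[1]] = tuple
        for _, trigger in ipairs(self.triggers) do
            trigger(old, tuple)
        end
    end
    return space
end

local cache = nil -- nil means 'not generated yet'

-- The trigger keeps the cache in sync with the space.
local function cache_on_replace(old, new)
    if new ~= nil then
        cache[new[1]] = {name = new[2], body = new[3]}
    elseif old ~= nil then
        cache[old[1]] = nil
    end
end

-- On the first access: generate the initial cache and set the trigger.
-- On later accesses: just a Lua table lookup.
local function get_sharding_func_info(space, space_name)
    if cache == nil then
        cache = {}
        for _, tuple in pairs(space.tuples) do
            cache_on_replace(nil, tuple)
        end
        space:on_replace(cache_on_replace)
    end
    return cache[space_name]
end

-- Usage.
local space = new_fake_space()
space:replace({'customers', 'dot.notation'})
local first = get_sharding_func_info(space, 'customers')
space:replace({'customers', 'other.func'}) -- the trigger updates the cache
local second = get_sharding_func_info(space, 'customers')
```

Here first.name is 'dot.notation' and second.name is 'other.func', while the trigger is set only once.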
What to consider:
- Take care of access from different fibers (see the locked() function and the synchronized module for examples).
- But implement this locking in a way that works well with hot reload (including cartridge's one, which cleans up all globals).
- How to remove the old trigger if our code was unloaded (especially if _G was cleaned up by cartridge)? There are some ideas on how to keep state between reloads in https://github.com/tarantool/conf/issues/2 (at the end of the issue description). A Lua/C module may also use the Lua registry, but that is not our case.
- Don't use the cache in a transaction? That's slow, but how else can we ensure that the cache corresponds to the state visible in the transaction?
- Perform loadstring() once when the function is defined as code (so that it is not done on each call).
- But don't save the function itself if it is defined by name: a user may replace it. Maybe just parse 'dot.notation' into something like {'dot', 'notation'}.
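The last point can be sketched as follows. The helper names parse_func_name and resolve_func are hypothetical, and the root table is passed explicitly (it would normally be _G) so the sketch is easy to try in plain Lua. The parsed path is what goes into the cache; the function itself is looked up on each call, so a replaced function is picked up.

```lua
-- Split 'dot.notation' into {'dot', 'notation'}. Done once, when the
-- cache entry is generated.
local function parse_func_name(name)
    local path = {}
    for part in string.gmatch(name, '[^.]+') do
        table.insert(path, part)
    end
    return path
end

-- Walk the parsed path starting from the root table. Done on each call,
-- so the current value is returned even if a user replaced the function.
-- Returns nil if some component of the path is missing.
local function resolve_func(path, root)
    local obj = root
    for _, part in ipairs(path) do
        if type(obj) ~= 'table' then
            return nil
        end
        obj = obj[part]
    end
    return obj
end

-- Usage.
local root = {dot = {notation = function(key) return key % 10 end}}
local path = parse_func_name('dot.notation')
local before = resolve_func(path, root)(25)

-- A user replaces the function; the cached path stays valid.
root.dot.notation = function(key) return key % 3 end
local after = resolve_func(path, root)(25)
```

Whether the resolved value is callable is left to the caller in this sketch.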
Optimization trick:
We can use the trick with two implementations of the cache access function (see src/box/lua/load_cfg.lua in tarantool for an example). The first function does all the work: it checks whether the trigger is set (and the initial cache is generated), sets the trigger and generates the cache if necessary, accesses the cache, and replaces itself with the second function. The second function skips the extra checks and just accesses the cache.
I'll note that the first function must not set the trigger unconditionally, because it may be called after hot reload. See tarantool/tarantool#5826.
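A rough sketch of the two-implementation trick in plain Lua. Everything here is hypothetical: the persistent table stands for whatever state survives a hot reload (how to keep it is an open question above), and init stands for 'generate the initial cache and set the trigger'.

```lua
local function new_cache_accessor(persistent, init)
    local accessor = {}

    -- Second implementation: no checks, just a table lookup.
    local function get_fast(key)
        return persistent.cache[key]
    end

    -- First implementation: initialize if necessary, then replace itself.
    local function get_slow(key)
        -- Not unconditional: after a hot reload the trigger may already
        -- be set by the previous incarnation of the module.
        if not persistent.trigger_is_set then
            init(persistent)
            persistent.trigger_is_set = true
        end
        accessor.get = get_fast
        return get_fast(key)
    end

    accessor.get = get_slow
    return accessor
end

-- Usage.
local init_calls = 0
local function init(persistent)
    init_calls = init_calls + 1
    persistent.cache = {customers = 'dot.notation'}
end

local persistent = {}
local a = new_cache_accessor(persistent, init)
local r1 = a.get('customers') -- first implementation, calls init()
local r2 = a.get('customers') -- second implementation

-- Imitate a hot reload: the module code is re-created, the state is kept.
local b = new_cache_accessor(persistent, init)
local r3 = b.get('customers') -- first implementation again, init() is skipped
```

Callers have to go through accessor.get each time; a caller that saved the function in a local variable before the first call would keep using the first implementation.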
Looks a bit tricky, but doable. Opinions?
Footnotes
1. I filed https://github.com/tarantool/tarantool/issues/6544 and https://github.com/tarantool/tarantool/issues/6545 to track it using the database schema version in the future. I think it may simplify some future code.
2. box may be unconfigured (or not fully loaded) when the module is required for the first time.