Description
Two things I want to explore to improve the performance of XPath (and, transitively, CSS) searches:
1. re-use `XPathContext` objects, which are somewhat expensive to create
2. expose libxml2's ability to compile XPath expressions
For (1), we need to be a bit careful:
- `XPathContext` is not thread-safe
- there is some state we need to set or un-set appropriately:
  - namespaces (via `XPathContext#register_namespaces`)
  - variables (via `XPathContext#register_variable`)
- while preserving other state:
  - the `nokogiri:` prefix used for dynamic function binding
  - the `nokogiri-builtin:` prefix used for our performance-optimized builtin functions
  - the built-in xpath functions themselves
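A minimal sketch of the bookkeeping that list implies, in plain Ruby. `ReusableXPathContext` is a hypothetical stand-in that only records state; it is not Nokogiri's `XPathContext`, and the namespace URIs are placeholders. `reset!` forgets per-query namespaces and variables but always restores the reserved `nokogiri` and `nokogiri-builtin` prefixes, even if a caller overwrote one.

```ruby
# Hypothetical stand-in that only tracks the state an XPathContext carries.
# It is NOT Nokogiri's XPathContext; the URIs below are placeholders.
class ReusableXPathContext
  # Prefixes that must survive every reset (dynamic function binding and
  # the performance-optimized builtins).
  RESERVED_NAMESPACES = {
    "nokogiri" => "urn:example:nokogiri",
    "nokogiri-builtin" => "urn:example:nokogiri-builtin",
  }.freeze

  attr_reader :namespaces, :variables

  def initialize
    @namespaces = RESERVED_NAMESPACES.dup
    @variables = {}
  end

  # Per-query namespaces, e.g. { "foo" => "http://example.com/foo" }.
  def register_namespaces(mapping)
    mapping.each { |prefix, uri| @namespaces[prefix.to_s] = uri }
    self
  end

  # Per-query variables, referenced as $name in the expression.
  def register_variable(name, value)
    @variables[name.to_s] = value
    self
  end

  # Forget per-query state so the next caller starts clean; rebuilding from
  # RESERVED_NAMESPACES also undoes any overwrite of a reserved prefix.
  def reset!
    @namespaces = RESERVED_NAMESPACES.dup
    @variables.clear
    self
  end
end

ctx = ReusableXPathContext.new
ctx.register_namespaces("foo" => "http://example.com/foo").register_variable("id", 42)
ctx.reset!
p ctx.namespaces.keys # => ["nokogiri", "nokogiri-builtin"]
p ctx.variables       # => {}
```

One thing to watch on the C side: libxml2's `xmlXPathRegisteredNsCleanup` frees every registered prefix, Nokogiri's own included, so an implementation built on it would have to re-register the reserved prefixes afterward; `xmlXPathRegisteredVariablesCleanup` is the corresponding call for variables.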
But the performance improvement could be significant: see this response from the current libxml2 maintainer indicating that "best practice" is to keep one XPathContext per thread and re-use it.
The benchmark submitted by a user in #760 indicates a 4x(!) speedup on simple expressions by avoiding re-initializing an XPathContext object. It seems likely that the real-world speedup will be less (since cleaning up registered namespaces and variables will have some overhead), but it still seems like it would be a pretty decent speedup.
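The "one context per thread, re-used" practice could be sketched like this in plain Ruby. `PerThreadPool` and its `factory:`/`reset:` parameters are hypothetical names; in Nokogiri the pooled object would be the C-level context and the reset step would clear namespaces and variables. Because `Thread#[]` is fiber-local in Ruby, this actually keeps one object per fiber, which is at least as strict as one per thread. It also keeps a small free list so that a nested query on the same fiber (for example, from inside a custom XPath function handler) gets its own object rather than one that is already in use.

```ruby
# One reusable object per fiber, handed out through #with.
class PerThreadPool
  def initialize(key, factory:, reset:)
    @key = key          # fiber-local storage key
    @factory = factory  # builds a new object when none is free
    @reset = reset      # clears per-use state before the object is reused
  end

  def with
    free = (Thread.current[@key] ||= [])  # Thread#[] is fiber-local
    obj = free.pop || @factory.call       # nested calls get a distinct object
    begin
      yield obj
    ensure
      @reset.call(obj)  # clean up even if the block raised
      free.push(obj)
    end
  end
end

# Usage with a hash standing in for a context object:
pool = PerThreadPool.new(:xpath_ctx, factory: -> { { vars: {} } }, reset: ->(c) { c[:vars].clear })
first  = pool.with { |c| c[:vars]["id"] = 1; c }
second = pool.with { |c| c }
p first.equal?(second) # => true
p second[:vars]        # => {}
```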
For (2), we'll need a new Ruby class to wrap the compiled expression represented by `xmlXPathCompExprPtr`, and a way to pass that into `#xpath`, but that seems like relatively straightforward work. (Note this API won't be available in JRuby.)
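A compiled expression only pays off if something holds on to it between searches. One hypothetical way to do that is a small least-recently-used cache keyed by the expression string. `CompiledExpression` and `ExpressionCache` are illustrative names, and the compiler block is a placeholder for whatever would wrap libxml2's `xmlXPathCompile` (evaluation would then go through `xmlXPathCompiledEval`). The mutex guards only the cache itself; whether a single compiled expression may be evaluated concurrently from several threads is an assumption that would need checking against libxml2.

```ruby
# Hypothetical Ruby-side wrapper; `compiled` would hold the native handle.
CompiledExpression = Struct.new(:source, :compiled)

# Bounded LRU cache: Ruby hashes keep insertion order, so the first key is
# always the least recently used one.
class ExpressionCache
  def initialize(capacity: 128, &compiler)
    @capacity = capacity
    @compiler = compiler  # placeholder for the real compile step
    @entries = {}
    @mutex = Mutex.new
  end

  def fetch(source)
    @mutex.synchronize do
      if (expr = @entries.delete(source))
        @entries[source] = expr  # re-insert to mark as most recently used
        return expr
      end
      expr = CompiledExpression.new(source, @compiler.call(source)).freeze
      @entries[source] = expr
      @entries.shift if @entries.size > @capacity  # evict the oldest entry
      expr
    end
  end

  def size
    @mutex.synchronize { @entries.size }
  end
end

compiles = 0
cache = ExpressionCache.new(capacity: 2) { |src| compiles += 1; "compiled(#{src})" }
cache.fetch("//a")
cache.fetch("//a")
p compiles # => 1
```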
I'd like to get a rough benchmark ahead of time to see how much time this will save us, for simple and for complex expressions -- after a brief search I couldn't find any prior results here.
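A rough harness for that benchmark, using Ruby's standard `benchmark` library. It times each strategy against each expression, so simple and complex expressions are reported separately. The strategy names, the sample expressions, and `sample.xml` in the commented wiring are placeholders; the commented lines use `Nokogiri::XML::XPathContext.new` and `XPathContext#evaluate` and are not executed here.

```ruby
require "benchmark"

# Runs every strategy against every expression and returns
# { [strategy_name, expression] => elapsed_seconds }.
def compare_strategies(strategies, expressions, iterations: 10_000)
  results = {}
  expressions.each do |expr|
    strategies.each do |name, run|
      run.call(expr)  # one warm-up call so first-use setup is not timed
      results[[name, expr]] = Benchmark.realtime do
        iterations.times { run.call(expr) }
      end
    end
  end
  results
end

def print_report(results)
  results.each do |(name, expr), seconds|
    printf("%-16s %-40s %9.4fs\n", name, expr, seconds)
  end
end

# Example wiring (needs the nokogiri gem; not executed here):
#   require "nokogiri"
#   doc = Nokogiri::XML(File.read("sample.xml"))
#   shared = Nokogiri::XML::XPathContext.new(doc)
#   strategies = {
#     "new context"    => ->(xp) { doc.xpath(xp) },       # context built per call
#     "shared context" => ->(xp) { shared.evaluate(xp) }, # one context re-used
#   }
#   print_report(compare_strategies(strategies, ["//a", "//div[@class='x']/p[position() < 3]"]))
```

Warming up each strategy once before timing keeps one-off costs out of the numbers, which matters when comparing a per-call context against a re-used one.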