Description
Use Case
The get() method on a provider is currently called too many times. The operations performed by get() are often time-consuming (e.g. a slow command or a slow API). The Resource API specification at https://github.com/puppetlabs/puppet-specifications/blob/master/language/resource-api/README.md acknowledges this when it states that calls to get() should be minimal:
- "The runtime environment calls get with a minimal set of names, and keeps track of additional instances returned to avoid double querying. To gain the most benefits from batching implementations, the runtime minimizes the number of calls into get." from https://github.com/puppetlabs/puppet-specifications/blob/master/language/resource-api/README.md#provider-feature-simple_get_filter
- "A transaction will usually call get once" from: https://github.com/puppetlabs/puppet-specifications/blob/master/language/resource-api/README.md#runtime-environment
The problem, however, is that in practice the calls to get() are not always minimal.
Here are the problems with the current implementation:
- The instances() method is called in primarily (only?) two scenarios: 1) when "puppet resource" is used, and 2) by resources which use "resource generation" (probably to support purging: see "resources" and "crayfishx/purge"). instances() attempts to cache results with the "cache_current_state" method, but that caches state into each resource instance, and those instances are not preserved between calls to instances(), so the cached state is available neither to other instances() calls nor to Puppet for the normal apply transaction.
- When the provider does not use "simple_get_filter", the state is retrieved in refresh_current_state per resource instance. However, only the result for the sought resource is saved; everything else returned by get() is discarded. get() is called this way once per resource: https://github.com/puppetlabs/puppet-resource_api/blob/main/lib/puppet/resource_api.rb#L256
The case where get() is called in a type which does not support "simple_get_filter", and therefore returns all resources, is particularly bad: for a heavily used resource type, get() enumerates all the resources perhaps hundreds (or thousands) of times, and each time all the results but one are discarded.
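To make the cost concrete, here is a minimal sketch of that access pattern. It is not the actual Resource API code: CountingProvider and this simplified refresh_current_state are hypothetical stand-ins that only assume get(context) returns an array of hashes keyed by :name.

```ruby
# Hypothetical provider without simple_get_filter: get() always enumerates
# every resource. The call counter exists only for this illustration.
class CountingProvider
  attr_reader :get_calls

  def initialize(names)
    @names = names
    @get_calls = 0
  end

  # Same shape as a Resource API get(context); the context is unused here.
  def get(_context)
    @get_calls += 1
    @names.map { |name| { name: name, ensure: 'present' } }
  end
end

# Simplified stand-in for the per-resource refresh: call get(), keep the one
# matching hash, and discard everything else that was returned.
def refresh_current_state(provider, name)
  provider.get(nil).find { |state| state[:name] == name }
end

names = (1..500).map { |i| "item#{i}" }
provider = CountingProvider.new(names)
states = names.map { |name| refresh_current_state(provider, name) }

puts "get() calls: #{provider.get_calls}"
puts "hashes built: #{provider.get_calls * names.size}, hashes kept: #{states.size}"
```

With 500 resources in the catalog, this sketch calls get() 500 times and builds 250,000 state hashes in order to keep 500 of them.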
Describe the Solution You Would Like
Per the Resource API specification, the Resource API should actually attempt to minimize the calls to get(). Once a resource is retrieved, it should not be retrieved a second time.
The Resource API should establish and manage a cache containing any values fetched from the provider's get(). The cache should persist for the duration of a single "puppet resource", "puppet apply", or "puppet agent" transaction.
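One possible shape for such a cache is sketched below, for the case where the provider does not use "simple_get_filter". The names TransactionGetCache, fetch, and invalidate are hypothetical and do not exist in the Resource API today; SlowProvider is a stand-in used only to count calls.

```ruby
# Hypothetical stand-in for a provider whose get() enumerates everything.
class SlowProvider
  attr_reader :get_calls

  def initialize(names)
    @names = names
    @get_calls = 0
  end

  def get(_context)
    @get_calls += 1
    @names.map { |name| { name: name, ensure: 'present' } }
  end
end

# Hypothetical transaction-scoped cache; NOT an existing Resource API class.
class TransactionGetCache
  def initialize(provider)
    @provider = provider
    @states = nil
  end

  # Returns the cached state hash for name, or nil if get() did not report it.
  # get() is called only on the first lookup; later lookups hit the cache.
  def fetch(context, name)
    @states ||= @provider.get(context).to_h { |state| [state[:name], state] }
    @states[name]
  end

  # Forget everything, e.g. when the transaction ends or after a change.
  def invalidate
    @states = nil
  end
end

provider = SlowProvider.new(%w[alpha beta gamma])
cache = TransactionGetCache.new(provider)
%w[alpha beta gamma alpha].each { |name| cache.fetch(nil, name) }
puts "get() calls with cache: #{provider.get_calls}"
```

A real implementation would also need to decide when cached entries become stale (for example after set() changes a resource) and how to handle providers that do use "simple_get_filter"; this sketch leaves both open.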
Describe Alternatives You've Considered
Each provider could implement its own cache at the provider level. Alternatively, the Resource API could provide an enhancement or alternative to the SimpleProvider which implements caching entirely at the provider layer as a type of wrapper. However, if the goal is to call get() only once (or no more than once per resource), then this should probably be implemented within the Resource API, so that the person implementing a new RSAPI provider need not concern themselves with the particulars of how Puppet/RSAPI gets resources.
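For comparison, the provider-level alternative might look roughly like the sketch below. MemoizedGet and ExampleProvider are hypothetical names, and the approach only helps under the assumption that the same provider instance is reused for every resource of the type within a run.

```ruby
# Hypothetical mixin that memoizes get() on the provider instance.
module MemoizedGet
  def get(context)
    @memoized_get ||= super
  end

  def set(context, changes)
    @memoized_get = nil # state is about to change, so drop the cached copy
    super
  end
end

# Hypothetical provider used only for illustration; a real provider would
# typically build on Puppet::ResourceApi::SimpleProvider instead.
class ExampleProvider
  prepend MemoizedGet
  attr_reader :get_calls

  def initialize
    @get_calls = 0
    @resources = { 'alpha' => 'present' }
  end

  def get(_context)
    @get_calls += 1
    @resources.map { |name, ensure_value| { name: name, ensure: ensure_value } }
  end

  # changes maps each resource name to a hash carrying the desired state
  # under :should.
  def set(_context, changes)
    changes.each { |name, change| @resources[name] = change[:should][:ensure] }
  end
end

provider = ExampleProvider.new
3.times { provider.get(nil) }
puts "get() calls before set: #{provider.get_calls}"
provider.set(nil, { 'beta' => { should: { name: 'beta', ensure: 'present' } } })
provider.get(nil)
puts "get() calls after set: #{provider.get_calls}"
```

Every provider author would have to opt in to (and correctly invalidate) something like this, which is the burden the paragraph above argues should sit in the Resource API instead.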