[STORE] Use Singleton pattern for Repository #822
Hi – one viewpoint against this change, with an alternative suggested pattern: Composition is often preferable to inheritance, and singletons bring about dependency and testing issues (happy to go into more detail if needed). A pattern I've used recently goes along the lines of:

    class FooSearchService
      module Errors
        class SearchUnavailable < StandardError; end
      end

      # Optional
      def self.instance
        # Instantiate and cache a global instance if needed
      end

      attr_reader :repository

      def initialize es_client:, index_name:
        @repository = initialize_repository(
          es_client: es_client,
          index_name: index_name
        )
      end

      def reindex_all force: false
        if force
          @repository.delete_index! if @repository.index_exists?
          @repository.create_index!
        else
          ensure_available
        end
        # Can reindex all items here or trigger a separate background job
      end

      def index_item item
        ensure_available
        @repository.save item # or custom mapping from item to ES document can be done here
      end

      def delete_item item
        ensure_available
        @repository.delete item
      end

      def search query
        ensure_available
        results = @repository.search query
        results # or custom parsing can be done here into domain model
      end

      private

      def initialize_repository es_client:, index_name:
        Elasticsearch::Persistence::Repository.new do
          client es_client
          index index_name
          type :_doc
          settings number_of_shards: 1, number_of_replicas: 0 do
            mapping do
              # …
            end
          end
        end
      end

      def ensure_available
        raise Errors::SearchUnavailable unless @repository.index_exists?
      end
    end

I can see some potential benefits to doing it this way:
What do you think? Could this be the kind of pattern encouraged in the docs? |
Hi @jits, thank you for your feedback! I’m open to having the Repository be a module to be mixed into a custom class instead but I'm not sure if it's preferable to using inheritance from a base class. There are some issues with using a module instead:
In the second link you provided, it says “Singletons may often be modeled as a server within the application that accepts requests to send, store, or retrieve data and configure the resource state.” This is exactly what a repository class is supposed to represent, so it seems that using a class inherited from a Singleton is appropriate. With this pattern, I think it’s much cleaner where your business logic and app logic live. In your example, you are holding onto a

    class FooSearchService < Elasticsearch::Persistence::Repository::Base
      module Errors
        class SearchUnavailable < StandardError; end
      end

      settings number_of_shards: 1, number_of_replicas: 0 do
        mapping do
          # …
        end
      end

      def reindex_all force: false
        if force
          delete_index! if index_exists?
          create_index!
        else
          ensure_available
        end
        # Can reindex all items here or trigger a separate background job
      end

      def index_item item
        ensure_available
        save item # or custom mapping from item to ES document can be done here
      end

      def delete_item item
        ensure_available
        delete item
      end

      def search query
        ensure_available
        # Call the inherited search; a bare `search query` here would call this method again
        results = super(query)
        results # or custom parsing can be done here into domain model
      end

      private

      def ensure_available
        raise Errors::SearchUnavailable unless index_exists?
      end
    end

    # in a test:
    FooSearchService.client = es_test_client
    FooSearchService.index_name = test_index_name
    # run test

Is there something that you aren't able to accomplish using the above example? Lastly, can you tell me why it’s easier to test the search service in case I'm missing something? In a test using the changes in this pull request, can you just mock the client on your FooSearchService, if you need to? Thanks again! |
Hi @estolfo – thanks for your quick response! I guess it boils down to how much the library enforces for you versus how much flexibility and control it provides – it seems to me that the library should allow as much flexibility as possible, with some suggested patterns in the docs, and avoiding pitfalls as much as possible (I realise what I'm saying is probably an obvious thing to say, but I hope it gives some context to my suggestion(s)). To help clarify things in my head, I think we're talking about three patterns here:
My strong preference is for option (3), but it's totally fair for the library to allow the other two options as well. The reason why I prefer composition here is:
It's possible that I'm just not understanding the intended repository pattern here (or its actual implementation) – the above comes from a more theoretical viewpoint (that can be applied to a lot of different types of services). Is there something special about the Elasticsearch library that requires a singleton pattern to be enforced out of the box (with no way to not be a singleton)? Also, just to check something: with the change in this PR, can you only ever have one repository in your app? I.e. you couldn't have

I'll try to address some of your comments specifically (apologies if I miss something):
As mentioned above, this seems to be a bug in the underlying implementation? Or is this behaviour intended?
I would argue otherwise: with the inheritance + singleton pattern, you're essentially mixing your business logic with the repository logic directly in the same code namespace. (note: I'm not familiar with this "gateway" pattern that's going on under the hood, so my comment is purely in terms of how "normal" inheritance and singletons work). Holding on to an internal
Hopefully my above points/examples cover this, but I can elaborate some more on why I would not prefer to use a singleton pattern.
Certainly – I would like to test the search service without hitting Elasticsearch, and without stubbing HTTP calls. As you suggest, I could mock the client out. But that means I have to stub/mock out the client's methods that the repository is using, which I would have to dig out, and which seems like a leaky abstraction to me. Instead, I use an internal

Hope this helps. Ultimately, I think all 3 patterns mentioned earlier are valid and have pros and cons, and should be equally supported by this library. |
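A minimal sketch of the testing approach described in the comment above, assuming the service receives an already-built repository object. `SearchService`, `FakeRepository`, and the `repository:` keyword are hypothetical names chosen for illustration; they are not part of elasticsearch-persistence, and the fake implements only the handful of methods this sketch calls.

```ruby
# Hypothetical in-memory stand-in for a repository. It implements only the
# methods the service below calls, so no Elasticsearch client is involved.
class FakeRepository
  attr_reader :saved

  def initialize(index_exists: true)
    @index_exists = index_exists
    @saved = []
  end

  def index_exists?
    @index_exists
  end

  def save(item)
    @saved << item
  end

  # Naive substring match, good enough for exercising the service logic.
  def search(query)
    @saved.select { |item| item.to_s.include?(query.to_s) }
  end
end

# Hypothetical service that composes a repository instead of inheriting from one.
class SearchService
  class SearchUnavailable < StandardError; end

  def initialize(repository:)
    @repository = repository
  end

  def index_item(item)
    ensure_available
    @repository.save(item)
  end

  def search(query)
    ensure_available
    @repository.search(query)
  end

  private

  def ensure_available
    raise SearchUnavailable unless @repository.index_exists?
  end
end
```

With this shape, a test can pass `FakeRepository.new(index_exists: false)` to exercise the unavailable path without stubbing any client methods. Whether the fake stays faithful to the real repository's behaviour is a separate concern that this sketch does not address.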
@jits thanks for your thorough responses and interest in this gem! I think the fundamental issue here is that you’re looking to use the repository class provided by the elasticsearch-persistence gem in a way that isn’t consistent with the theoretical Repository pattern. You’re using the repository more like a “client” with a Service Layer pattern.

I’m not convinced that the library should provide as much flexibility as possible. I think instead the library should provide an API that makes most sense for the majority of users, that abstracts the API of Elasticsearch into an idiomatic Ruby interface, and provides a clean pattern for users to incorporate into their applications. If the library does none of the above, it will be difficult to use, will allow users to misuse the classes, and ultimately will lead to more issues opened in GitHub. For example, the leaking repository settings issues I linked above are a result of the library providing too much flexibility. My goal is to prevent this unexpected and problematic behavior through improved library design.

That said, changing the elasticsearch-persistence gem doesn’t prevent you from continuing to take the approach that you do in

If you’d like a concrete example of how the pull request changes are intended to be used, you can see this test, in particular. Changing the pull request to provide the base repository as a mixin doesn’t change any of the above (though there are tradeoffs). The intention would still be that users have a 1-1-1 mapping between a repository class, domain object, and index. I’m going to try out refactoring to use a repository mixin anyway to see if it seems more natural and perhaps you’ll find it makes more sense as well. |
@estolfo – thanks for the detailed explanation – I agree with pretty much all you're saying/suggesting. My original concern was only with making these repository instances singletons and enforcing this in the library, not with the repository pattern itself. My request for flexibility (those 3 options above) is purely in terms of how to extend and construct repository instances, not to provide three different patterns of data access. The fact that the 3rd option (composition) wraps things within a service layer is more of an implementation detail, but one which I think the library should allow, in the isolated fashion I described. (And I would even say that this should be recommended in the documentation). In the link you mentioned, and looking at further literature, I'm struggling to see where it says that repository instances need to be singletons.
I completely agree with this! I'm not sure if anything I've suggested goes against this? Ultimately I want the library to allow me to create repository instances that I can then use. But the way in which these repository instances can be extended and instantiated should be flexible to cater for different use cases. Note that flexibility != complexity – it's possible to achieve this flexibility with just one provided class (I think), as I'll propose below…
Please note I'm not strongly advocating for this mixin approach – I state this because it's twice now where you've mentioned this approach as though it's the main request from me. I do think the library should allow it, but my personal preference for this library is to allow you to instantiate a simple and isolated repository instance that I can then wrap in a service. But others may have differing use cases (e.g. someone may want to extend the save behaviour). Ultimately though it should always be about having repository instances that are isolated and as lean as possible.

I would argue that the most likely use case will be to create an instance of a generic repository and have it configured to point to a particular index, with particular mappings, and then just use that repository in a simple way (as repositories are intended to be, I think?). This would be akin to the "Generic Repository Implementation" in the link you mentioned. My suspicion is that extending and creating your own repository class is less likely, but still possible (hence the need for the library to at least allow it). In terms of having singletons for repository instances (another option the library should allow) maybe this is as simple as suggesting in the documentation that you can do

Perhaps naive, but my approach for this library would be to provide a simple generic

    my_repository = Repository.new(
      client: my_client,
      index_name: 'foo',
      settings: {...},
      mappings: {...}
    )

… or 2. be extended into your own specialised repository class:

    class FooRepository < Repository
      client ...
      index_name 'foo'
      settings do
        ...
      end
      mappings do
        ...
      end
    end

… or 3. be extended AND made into a singleton:

    class FooRepository < Repository
      include Singleton
      # rest as above
    end

… but maybe this solution misses out on something else that the library and/or repository pattern provides? Thanks again for listening to these thoughts – ultimately, the only big concern I have is enforcing singleton behaviour. Otherwise, I want to use the repository pattern as a way of accessing my collection of search items. |
Can you tell me what exactly having the class be a Singleton prevents you from doing in your example?

    def initialize_repository es_client:, index_name:
      Class.new(Elasticsearch::Persistence::Repository::Base) do # This is the only line you have to change
        client es_client
        index index_name
        type :_doc
        settings number_of_shards: 1, number_of_replicas: 0 do
          mapping do
            # …
          end
        end
      end
    end

|
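To illustrate the `Class.new` technique used in the snippet above, here is a minimal sketch, assuming a base class that stores its configuration at the class level. `BaseRepository`, its `index_name` DSL method, and `build_repository_class` are hypothetical stand-ins, not the gem's actual classes.

```ruby
# Hypothetical base class that keeps configuration in class-level instance
# variables, loosely mirroring the class-level DSL discussed in this thread.
class BaseRepository
  class << self
    # Acts as a getter with no argument and as a setter with one.
    def index_name(name = nil)
      @index_name = name if name
      @index_name
    end
  end

  def index_name
    self.class.index_name
  end
end

# Build a fresh anonymous subclass per configuration. The block is evaluated
# in the new class's context, yet it still closes over the local `name`.
def build_repository_class(name)
  Class.new(BaseRepository) do
    index_name(name)
  end
end

articles_class = build_repository_class("articles")
users_class    = build_repository_class("users")
```

Because class-level instance variables belong to each class object separately, `articles_class` and `users_class` do not share settings with each other or with `BaseRepository`. The trade-off is that every distinct configuration needs its own (unnamed) class.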
@estolfo – I just think the singleton pattern is too restrictive and excessive to be enforced by this library out of the box, without a strong enough reason to do so. It's not about fulfilling my one use case only, it's about the flexibility for other/future use cases and the openness of the library.

One example I've brought up multiple times above is using the same repository class to instantiate multiple repository instances that can access different ES clusters and/or indexes (e.g. for a sharding-esque pattern). Yes you can do this by creating a new anonymous class for each repository instance (using

Another example I've alluded to is in tests: singletons carry state throughout the lifetime of the process, so if you use a singleton instance across multiple tests then you carry state in between these test cases. Yes, you can reset the whole state of the singleton instance in between each test case, but a) you have to remember to set this up, and b) it just seems unnecessary overhead.

There's a lot of literature around to suggest that singletons are a dangerous pattern (another example) – just like global variables. I do believe they can be used judiciously, and for very high level things (like the

You mentioned earlier: "idiomatic Ruby interface" – I don't think restricting to having singletons, and having these anonymous classes is very idiomatic Ruby. I've certainly not seen this in any other client library that provides store/repo abstractions. At this point, I'm not sure I have more to add. So I'll hand over back to you and to anyone else that has an opinion about this. |
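The two concerns raised in the comment above (state carried between tests, and one class serving several indexes) can be sketched with Ruby's standard-library `Singleton` module. This is only an illustration under assumptions: `SingletonRepository`, `PlainRepository`, and the index names are hypothetical, and the pull request's own singleton mechanism may differ from stdlib `Singleton`.

```ruby
require "singleton"

# Hypothetical singleton repository holding in-memory state.
class SingletonRepository
  include Singleton

  attr_reader :documents

  def initialize
    @documents = []
  end

  def save(doc)
    @documents << doc
  end
end

# Hypothetical plain repository: each instance has its own state and index.
class PlainRepository
  attr_reader :index_name, :documents

  def initialize(index_name:)
    @index_name = index_name
    @documents = []
  end

  def save(doc)
    @documents << doc
  end
end

# "Test 1" writes through the singleton...
SingletonRepository.instance.save("from test 1")
# ...and "test 2" still sees that document unless state is reset explicitly.
leaked = SingletonRepository.instance.documents.dup

# Plain instances are isolated, and several can coexist for different indexes.
first  = PlainRepository.new(index_name: "items_shard_a")
second = PlainRepository.new(index_name: "items_shard_b")
first.save("only in shard a")
```

Here `leaked` contains the document written by the earlier "test", while `second.documents` stays empty. With stdlib `Singleton`, `new` is also made private, so a second instance pointing at another index cannot be created from the same class.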
Hi @jits Thanks so much for all your feedback and for sharing your opinion. I'm not sure if you've seen it, but I've opened another PR to refactor the code using a mixin. You can see an example in the PR description or following along with the specs in this file. |
Hi @estolfo – thanks for pointing me to this – I've had a quick look, but am struggling with spare time at the moment, so will do my best to provide feedback as best I can… This looks great! I'll leave some comments/thoughts on that PR now :) |
Closing in favor of #824 |
I propose we change the way the Repository persistence pattern is used in this pull request. With these changes, users will inherit from the base repository class and can define their own custom settings, or use the default ones. The Singleton pattern is enforced so that different repository instances cannot be created with varying settings that leak into each other. See this issue for an example of what this design will help avoid.
Example use: