Skip to content

[STORE] Use Singleton pattern for Repository #822

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from

Conversation

estolfo
Copy link
Contributor

@estolfo estolfo commented Aug 1, 2018

I propose we change the way the Repository persistence pattern is used in this pull request. With these changes, users will inherit from the base repository class and can define their own custom settings, or use the default ones. The Singleton pattern is enforced so that different repository instances cannot be created with varying settings that leak into each other. See this issue for an example of what this design will help avoid.

Example use:

class NotesRepository < Elasticsearch::Persistence::Repository::Base
  client Elasticsearch::Persistence::Repository::Base.client
  index_name 'notes_repository'
  document_type 'notes'

  def deserialize(document)
    document.delete(:created_at)
    Note.new(document)
  end
end

@estolfo estolfo force-pushed the refactor-repository branch from 1d5f8b1 to 658db5b Compare August 1, 2018 12:03
@jits
Copy link

jits commented Aug 1, 2018

Hi – one viewpoint against this change, with an alternative suggested pattern:

Composition is often preferable to inheritance and singletons bring about dependency and testing issues (happy to go in to more detail if needed).

A pattern I've used recently goes along the lines of:

class FooSearchService
  module Errors
    class SearchUnavailable < StandardError; end
  end

  # Optional
  def self.instance
    # Instantiate and cache a global instance if needed
  end

  attr_reader :repository

  def initialize es_client:, index_name:
    @repository = initialize_repository(
      es_client: es_client,
      index_name: index_name
    )
  end

  def reindex_all force: false
    if force
      @repository.delete_index! if @repository.index_exists?
      @repository.create_index!
    else
      ensure_available
    end

    # Can reindex all items here or trigger a separate background job
  end

  def index_item item
    ensure_available

    @repository.save item  # or custom mapping from item to ES document can be done here
  end

  def delete_item item
    ensure_available

    @repository.delete item
  end

  def search query
    ensure_available

    results = @repository.search query
    results  # or custom parsing can be done here into domain model
  end
  
  private

  def initialize_repository es_client:, index_name:
    Elasticsearch::Persistence::Repository.new do
      client es_client

      index index_name

      type :_doc

      settings number_of_shards: 1, number_of_replicas: 0 do
        mapping do
          # …
        end
      end
    end
  end

  def ensure_available
    raise Errors::SearchUnavailable unless @repository.index_exists?
  end
end

Full example

I can see some potential benefits to doing it this way:

  • The search service itself is decoupled from the underlying repository, avoiding the mixing of repository logic vs app/business logic.
  • It's easy to test the search service in isolation by stubbing/mocking out the call to Elasticsearch::Persistence::Repository.new.
  • The repository is still accessible directly if needed (though should probably be discouraged).
  • Multiple instances of this search service could be used, e.g. to access different ES instances.
  • App specific errors can be thrown by catching and handling the lower level repository / ES client errors.
  • If you do want a global instance, you can still have FooSearchService.instance. But this is purely up to the consumer of this library.
  • Overall: there's more flexibility to handle custom app/business logic in the search service layer and call on the repository for ES interactions.

What do you think? Could this be the kind of pattern encouraged in the docs?

@estolfo
Copy link
Contributor Author

estolfo commented Aug 1, 2018

Hi @jits, thank you for your feedback! I’m open to having the Repository be a module to be mixed into a custom class instead but I'm not sure if it's preferable to using inheritance from a base class. There are some issues with using a module instead:

In the second link you provided, it says “Singletons may often be modeled as a server within the application that accepts requests to send, store, or retrieve data and configure the resource state.” — This is exactly what a repository class is supposed to represent so it seems that using a class inherited from a Singleton is appropriate.

With this pattern, I think it’s much cleaner where your business logic and app logic live. In your example, you are holding onto a @repository reference, which seems to be a roundabout way of calling methods on the repository, and I'm not sure what the benefit is. I'm also not sure what the implications would be if you were using your FooSearchService in different threads (like with Sidekiq, for example). With your example, you can just define a default repository as the class itself and then change settings as you need to. For example, you would rewrite your code like the following if you were to use the changes in this pull request.

class FooSearchService < Elasticsearch::Persistence::Repository::Base
  module Errors
    class SearchUnavailable < StandardError; end
  end

  settings number_of_shards: 1, number_of_replicas: 0 do
    mapping do
      # …
    end
  end

  def reindex_all force: false
    if force
      delete_index! if index_exists?
      create_index!
    else
      ensure_available
    end

    # Can reindex all items here or trigger a separate background job
  end

  def index_item item
    ensure_available
    save item  # or custom mapping from item to ES document can be done here
  end

  def delete_item item
    ensure_available
    delete item
  end

  def search query
    ensure_available
    results = search query
    results  # or custom parsing can be done here into domain model
  end

  private

  def ensure_available
    raise Errors::SearchUnavailable unless index_exists?
  end
end

# in a test:
FooSearchService.client = es_test_client
FooSearchService.index_name = test_index_name
# run test

Is there something that you aren't able to accomplish using the above example?

Lastly, can you tell me why it’s easier to test the search service in case I'm missing something? In a test using the changes in this pull request, can you just mock the client on your FooSearchService, if you need to?

Thanks again!

@jits
Copy link

jits commented Aug 1, 2018

Hi @estolfo – thanks for your quick response!

I guess it boils down to how much the library enforces for you versus how much flexibility and control it provides – it seems to me that the library should allow as much flexibility as possible, with some suggested patterns in the docs, and avoiding pitfalls as much as possible (I realise what I'm saying is probably an obvious thing to say, but I hope it gives some context to my suggestion(s)).

To help clarify things in my head, I think we're talking about three patterns here:

  1. Mixin / inheritance – already available in v5.x now via a mixin module.
  2. Mixin / inheritance with enforced singleton behaviour – the proposed change in this PR.
  3. Composition – also available right now (partially?) – my earlier suggestion of wrapping a completely independent and isolated @repository instance in higher level "search service" that other parts of the codebase interact with.

My strong preference is for option (3), but it's totally fair for the library to allow the other two options as well.

The reason why I prefer composition here is:

  • I don't want my search service to be polluted with all the methods and implementation from the repository class (hence why I have an internal @repository instance that I only control internally and in a controlled fashion) – this is to avoid leaky abstractions, and avoiding things like changing it's internal config on the fly.
  • I want to be able to instantiate my search service as many times as needed, in an isolated fashion, with different config potentially giving different behaviour / characteristics.
  • I want things to be as isolated as possible to minimise side-effects. E.g. I didn't realise that changing index_name on any repository instance currently changes the index name for all instances – that sounds rather painful to me :( and quite unpredictable behaviour.

It's possible that I'm just not understanding the intended repository pattern here (or it's actual implementation) – the above comes from a more theoretical viewpoint (that can be applied to a lot of different types of services). Is there something special about the ElasticSearch library that requires a singleton pattern to be enforced out of the box (with no way to not be a singleton)?

Also, just to check something: with the change in this PR, can you only ever have one repository ever in your app? I.e. you couldn't have FooRepository and BarRepository talking to different indexes or even different ElasticSearch instances? (Apologies, I haven't had a chance to look closely at your changes).

I'll try to address some of your comments specifically (apologies if I miss something):

If you can’t enforce that only one instance be created, then you’ll potentially have issues like these: #811, #385

As mentioned above, this seems to be a bug in the underlying implementation? Or is this behaviour intended?

With this pattern, I think it’s much cleaner where your business logic and app logic live. In your example, you are holding onto a @repository reference, which seems to be a roundabout way of calling methods on the repository, and I'm not sure what the benefit is. I'm also not sure what the implications would be if you were using your FooSearchService in different threads (like with Sidekiq, for example). With your example, you can just define a default repository as the class itself and then change settings as you need to. For example, you would rewrite your code like the following if you were to use the changes in this pull request.

I would argue otherwise: with the inheritance + singleton pattern, you're essentially mixing your business logic with the repository logic directly in the same code namespace. (note: I'm not familiar with this "gateway" pattern that's going on under the hood, so my comment is purely in terms of how "normal" inheritance and singletons work). Holding on to an internal @repository (i.e. composition) and calling methods on it is exactly the way to achieve better isolation and control over interactions with ElasticSearch (assuming first that the repository instance itself is well isolated). This then makes it easier to reason about things in multi-threaded/concurrent environments – you could either make your search service thread safe and use one instance across threads, or instantiate a separate instance per thread, potentially with different characteristics, and using different clients under the hood. Again, the control and flexibility is in the consumer of the library. And the docs can provide guidance here.

Is there something that you aren't able to accomplish using the above example?

Hopefully my above points/examples cover this, but I can elaborate some more on why I would not prefer to use a singleton pattern.

Lastly, can you tell me why it’s easier to test the search service in case I'm missing something? In a test using the changes in this pull request, can you just mock the client on your FooSearchService, if you need to?

Certainly – I would like to test the search service without hitting ElasticSearch, and without stubbing HTTP calls. As you suggest, I could mock the client out. But that means I have to stub/mock out the client's methods that the repository is using, which I would have to dig out, and which seems like leaky abstractions to me. Instead, I use an internal @repository instance, which is one of the main boundaries in my search service, so I would like to just mock out the calls that I directly make to this instance, rather than digging into the next layer below it. (Note: a better pattern would be to pass in an instance of repository to the search service, instead of creating it internally, which would further decouple things nicely. But in my case I felt that the ES mapping should be close to the rest of the search indexing and retrieval logic, so I create a new repository internally).

Hope this helps.

Ultimately, I think all 3 patterns mentioned earlier are valid and have pros and cons, and should be equally supported by this library.

@estolfo
Copy link
Contributor Author

estolfo commented Aug 2, 2018

@jits thanks for your thorough responses and interest in this gem! I think the fundamental issue here is that you’re looking to use the repository class provided by the elasticsearch-persistence gem in a way that isn’t consistent with the theoretical Repository pattern. You’re using the repository more like a “client” with a Service Layer pattern.
There are a number of elasticsearch gems available and the job of this particular one is to provide an API and pattern through which you can persist your data into Elasticsearch. We previously had two patterns: the ActiveRecord and Repository patterns available via the gem’s API but the former is deprecated because of a number of reasons. With the Repository pattern, you would have one Repository class per index. That makes a lot of sense with Elasticsearch 6.0, as you can no longer have multiple types in a single index, so the settings— in particular, the document_type setting— should be immutable for a repository instance.

I’m not convinced that the library should provide as much flexibility as possible. I think instead the library should provide an API that makes most sense for the majority of users, that abstracts the API of Elasticsearch into an idiomatic Ruby interface, and provides a clean pattern for users to incorporate into their applications. If the library does none of the above, it will be difficult to use, will allow users to misuse the classes, and ultimately will lead to more issues opened in GitHub. For example, the leaking repository settings issues I linked above are a result of the library providing too much flexibility. My goal is to prevent this unexpected and problematic behavior through improved library design.

That said, changing the elasticsearch-persistence gem doesn’t prevent you from continuing to take the approach that you do in FooSearchService. Given that you are using the repository object more like a client, you could just switch out the @repository instance variable for an Elasticsearch::Transport::Client and add some custom methods. Alternatively, you could create a class that includes the Elasticsearch::Model module and set the client there. Nevertheless, if you’re looking to use the Repository pattern, I would recommend creating a Repository class for each index you have, so there’s a 1-1-1 mapping between a client-side Domain object class, Repository class, and server-side Elasticsearch index. This is a nice reference for how the Repository pattern should be used.

If you’d like a concrete example of how the pull request changes are intended to be used, you can see this test, in particular.
This is exactly how the library was originally intended to be used, even before these proposed changes, as you can see here in the documentation.

Changing the pull request to provide the base repository as a mixin doesn’t change any of the above (though there are tradeoffs). The intention would still be that users have a 1-1-1 mapping between a repository class, domain object, and index. I’m going to try out refactoring to use a repository mixin anyway to see if it seems more natural and perhaps you’ll find it makes more sense as well.

@jits
Copy link

jits commented Aug 2, 2018

@estolfo – thanks for the detailed explanation – I agree with pretty much all you're saying/suggesting - my original concern was only with making these repository instances singletons and enforcing this in the library, not with the repository pattern itself.

My request for flexibility (those 3 options above) are purely in terms of how to extend and construct repository instances, not to provide three different patterns of data access. The fact that the 3rd option (composition) wraps things within a service layer is more of an implementation detail, but one which I think the library should allow, in the the isolated fashion I described. (And I would even say that this should be recommended in the documentation).

In the link you mentioned, and looking at further literature I'm struggling to see where it says that repository instances need to be singletons.

I’m not convinced that the library should provide as much flexibility as possible. I think instead the library should provide an API that makes most sense for the majority of users, that abstracts the API of Elasticsearch into an idiomatic Ruby interface, and provides a clean pattern for users to incorporate into their applications. If the library does none of the above, it will be difficult to use, will allow users to misuse the classes, and ultimately will lead to more issues opened in GitHub.

I completely agree with this! I'm not sure if anything I've suggested goes against this? Ultimately I want the library to allow me to create repository instances that I can then use. But the way in which these repository instances can be extended and instantiated should be flexible to cater for different use cases. Note that flexibility != complexity – it's possible to achieve this flexibility with just one provided class (I think), as I'll propose below…

Changing the pull request to provide the base repository as a mixin doesn’t change any of the above (though there are tradeoffs).

Please note I'm not strongly advocating for this mixin approach – I state this because it's twice now where you've mentioned this approach as though it's the main request from me. I do think the library should allow it, but my personal preference for this library is to allow you to instantiate a simple and isolated repository instance that I can then wrap in a service. But others may have differing use cases (e.g. someone may want to extend the save behaviour). Ultimately though it should always be about having repository instances that are isolated and as lean as possible.

I would argue that the most likely use case will be to create an instance of a generic repository and have it configured to point to a particular index, with particular mappings, and then just use that repository in a simple way (as repositories are intended to be, I think?). This would be akin to the "Generic Repository Implementation" in the link you mentioned.

My suspicion is that extending and creating your own repository class is less likely, but still possible (hence the need for the library to at least allow it).

In terms of having singletons for repository instances (another option the library should allow) maybe this is as simple as suggesting in the documentation that you can do include Singleton if you want. I think enforcing this in the library is a step too far.

Perhaps naive, but my approach for this library would be to provide a simple generic Repository class that can…

  1. Be instantiated on it's own e.g.:
my_repository = Repository.new(
  client: my_client,
  index_name: 'foo',
  settings: {...},
  mappings: {...}
)

… or 2. be extended into your own specialised repository class:

class FooRepository < Repository
  client ...
  index_name 'foo'
  settings do
    ...
  end
  mappings do
    ...
  end
end

… or 3. be extended AND made into a singleton:

class FooRepository < Repository
  include Singleton

  # rest as above
end

… but maybe this solution misses out on something else that the library and/or repository pattern provides?

Thanks again for listening to these thoughts – ultimately, the only big concern I have is enforcing singleton behaviour. Otherwise, I want to use the repository pattern as way of accessing my collection of search items.

@estolfo
Copy link
Contributor Author

estolfo commented Aug 3, 2018

Can you tell me what exactly having the class be a Singleton prevents you from doing in your example?
It sounds like your biggest concern is being able to have an isolated repository instance that you can then wrap in a service. You can still do this; you would only have to change a single line of code in your #initialize_repository method:

def initialize_repository es_client:, index_name:
  Class.new(Elasticsearch::Persistence::Repository::Base) do # This is the only line you have to change
    client es_client

    index index_name

    type :_doc

    settings number_of_shards: 1, number_of_replicas: 0 do
      mapping do
        # …
      end
    end
  end
end

@estolfo estolfo changed the title Use Singleton pattern for Repository [STORE] Use Singleton pattern for Repository Aug 3, 2018
@jits
Copy link

jits commented Aug 3, 2018

@estolfo – I just think the singleton pattern is too restrictive and excessive to be enforced by this library out of the box, without a strong enough reason to do so. It's not about fulfilling my one use case only, it's about the flexibility for other/future use cases and the openness of the library.

One example I've brought up multiple times above is using the same repository class to instantiate multiple repository instances that can access different ES clusters and/or indexes (e.g. for a sharding-esque pattern). Yes you can do this by creating a new anonymous class for each repository instance (using Class.new as you suggest) but that doesn't sound very idiomatic and plus you lose some of the benefits of having classes in the first place – e.g. if you inspect the instance's class.name you get nil since it's anonymous. In my opinion, it's unnecessary to have to do this.

Another example I've alluded to is in tests: singletons carry state throughout the lifetime of the process, so if you use a singleton instance across multiple tests then you carry state in between these test cases. Yes, you can reset the whole state of the singleton instance in between each test case, but a) you have to remember to set this up, and b) it just seems unnecessary overhead.

There's a lot of literature around to suggest that singletons are a dangerous pattern (another example) – just like global variables. I do believe they can be used judiciously, and for very high level things (like the Rails.{blah} global) – just not enforced by a library like this.

You mentioned earlier: "idiomatic Ruby interface" – I don't think restricting to having singletons, and having these anonymous classes is very idiomatic Ruby. I've certainly not seen this in any other client library that provides store/repo abstractions.

At this point, I'm not sure I have more to add. So I'll hand over back to you and to anyone else that has an opinion about this.

@estolfo
Copy link
Contributor Author

estolfo commented Aug 7, 2018

Hi @jits Thanks so much for all your feedback and for sharing your opinion. I'm not sure if you've seen it, but I've opened another PR to refactor the code using a mixin. You can see an example in the PR description or following along with the specs in this file.

@jits
Copy link

jits commented Aug 8, 2018

Hi @estolfo – thanks for pointing me to this – I've had a quick look, but am struggling with spare time at the moment, so will do my best to provide feedback as best I can…

This looks great! I'll leave some comments/thoughts on that PR now :)

@estolfo
Copy link
Contributor Author

estolfo commented Aug 13, 2018

Closing in favor of #824

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants