Skip to content

API object exposes a huge surface. #172

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
scudette opened this issue Jul 12, 2020 · 5 comments
Open

API object exposes a huge surface. #172

scudette opened this issue Jul 12, 2020 · 5 comments

Comments

@scudette
Copy link

Currently the API is exposed via an API object which initializes all the API functions in members (https://github.com/elastic/go-elasticsearch/blob/master/esapi/api._.go#L465)

Our application only needs to access the Bulk API to push to elastic but we are forced to use the API Client as obtained from NewClient(), This adds approximately 6mb to our executable size for a single linked function! Please consider making the API a bit lighter to use. We only need to replicate a few functions in a more flexible way to avoid this cost:

  1. Allow a transport to be created without an API object (Basically NewClient() first part without calling esapi.New(tp)
  2. Make the new* functions public in esapi so we could initialize just the API functions we need.

This will allow the Go compiler to remove all the unused code from the binary since it is not called. The current API design forces the Go compiler to include all code because it has no idea if it will be called at runtime through the API members.

It would actually be cleaner design to decompose the API client into a transport and distinct handlers that require the transport as parameter - i.e. rather than calling through a function pointer in the API struct, simply allow callers to call the handlers directly.

@karmi
Copy link
Contributor

karmi commented Jul 17, 2020

I understand your concern about the size. The thing to keep in mind is that the goal of the elasticsearch package is to support all the Elasticsearch APIs, as well as being consistent with the other language clients in naming and calling patterns.

I somewhat understand your suggestion to change the composition, but a more fleshed-out version, in code, would probably make it easier to discuss it further.

Another way of achieving your goal would be to use the estransport package, namely the estransport.Client component, directly, and only "vendorize" the api.bulk.go in your code. In this way, you would get all the features of the client (retries, logging, connection handling, etc), without the overhead of the large esapi package, and without the need to keep your fork in sync with the mainline. Historically, the "Bulk" API is one of the most stable ones in Elasticsearch, so keeping a "hard copy" would possibly work for a long time. I can flesh out a code based example if you would like to see one.

Also, if you're mainly interested in indexing data, the esutil.BulkIndexer component is optimized for that use case. At this point, it expects to be passed the elasticsearch.Client struct, but it should be easy to change it to support an interface, so you could pass eg. estransport.Client (or other custom client) to it.

@karmi
Copy link
Contributor

karmi commented Sep 8, 2020

Hi @scudette, I've left a comment at #177 (comment), can you have a look there?

@scudette
Copy link
Author

scudette commented Sep 8, 2020

Hi @karmi I took a very quick look at this pr.

I think the pr makes the api much simpler for a user that is not that familiar with elastic because it is very similar to the documented API and feels like a natural part of the API rather than an unintended side effect of another API.

Maybe your suggested change works (sorry I haven't tried it) but it feels to me like a hack: as a user not familiar with elastic I have no idea what a transport is and why should I jump through these hacks just to call essentially a rest API. I want to read the documentation for the library and see something like "if you only need to include one API you can compose the client like this"

I actually did consider just calling the rest API directly because it is not that hard and maybe that's actually simpler for me to understand than doing some convoluted things like you suggested. But it feels like reinvent the wheel.

Your suggestion to vendor a part of the API also feels like a hack. If I need to vendor a part of the code I might as well just fork the entire project and change the couple lines that I need.

This is what I did here https://github.com/Velocidex/go-elasticsearch

I was surprised at how essentially simple the change is but it did take me a while to figure it out so that's why I think it would be nice to include upstream. Of course this pr is much nicer and the proper way for sure, so I would love to see it merged.

@karmi
Copy link
Contributor

karmi commented Sep 28, 2020

Hello, sorry for the delay with the response. I don't consider the suggestion in #177 (comment) to be a hack. After all, the esapi and estransport packages are well documented, and the client has been decomposed into these package precisely for use cases like this. This is not a side effect of the design — this is the goal of the design.

As I observe people using the client, there's a fairly balanced split between people calling the methods on the client (es.Indices.Create()) and using the request structs (esapi.IndicesCreateRequest{...}.Do(...)) directly. As the Do() method accepts the transport interface, it is easy to pass estransport.Client, instead of elasticsearch.Client. (With a few rough edges to smooth out, which I highlight in the comment.)

I if I recall correctly, your use case was focused on bulk indexing, which is trickier — the most effective way of bulk indexing with Go is to use the helper, which currently expects the full client. This is actually something much worth looking into.

@scudette
Copy link
Author

Ok sure if there is a documented way to call the bulk index api without a full client then it should solve my issue (I think - I still have to check how much is getting linked in).

Currently the sample code builds a full client here

es, err := elasticsearch.NewClient(elasticsearch.Config{

but maybe it can be turned into an interface?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants