Enhance performance of getting information about keywords with big remote libraries #3362

Closed
MyMindWorld opened this issue Nov 1, 2019 · 8 comments
Labels
acknowledge, enhancement, priority: high, rc 1

Comments

@MyMindWorld

Hello!
We have really big remote libraries (about 50,000 keywords) and we are stuck on performance issues.

While researching this I found that when the client asks for keyword names, the server sends them all at once, but documentation, arguments etc. are requested one by one. So we end up with around 200-300k requests, and to run a single case we have to wait around 3 minutes before execution actually starts.

That may sound acceptable, but in real life, when we need to debug something or run pabot (which simply runs allocated instances of Robot), 3 minutes becomes 5 hours for 100 suites and our parallel setup becomes useless as well.

Maybe it is possible to request all of this information at once and have the server send it back the same way?

For now we have simply removed these functions from the XmlRpcRemoteClient class in Remote.py and execution now starts in less than a second.

We also researched the dynamic library API, but it seems like it's not what we need.
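
To illustrate the pattern, this is roughly what happens today when the library is imported (a minimal sketch; the address is only an example, and the method names are the standard remote interface methods):

    import xmlrpc.client

    # Rough illustration of the current call pattern when importing a big
    # remote library. The address below is just an example.
    library = xmlrpc.client.ServerProxy("http://127.0.0.1:8270")

    info = {}
    names = library.get_keyword_names()       # one call returns all names
    for name in names:                        # ~50 000 iterations in our case
        info[name] = {
            'args': library.get_keyword_arguments(name),     # one call per keyword
            'doc': library.get_keyword_documentation(name),  # and another per keyword
        }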

@pekkaklarck
Member

Sending all information in one go sounds like a good idea, and is definitely possible, but someone would need to implement it. I can help with the design but don't have time for the implementation in the near future. Are you interested? Once this feature is added to the Remote API, it also needs to be added to the various remote servers before it actually helps.

Alternatively, or additionally, we could add a feature to the Remote library to disable querying keyword information other than names and possibly arguments. That would probably be easier to implement and wouldn't require changes to remote servers. Depending on the context, some of this information may be needed during execution (especially argument names and types), so this wouldn't be as good a solution as getting all the information in one request.

@MyMindWorld
Author

MyMindWorld commented Nov 8, 2019

Yeah, I'm definitely interested in this!

Disabling the querying is what we have already done, and for now it seems to work OK, but it cuts some functionality (as you mentioned).

I will fork the project and start developing this for now, then.

According to the xmlrpc.client documentation it supports dictionaries, so for now I think we could send the data as

    {
        kw_name: {"documentation": kw_doc, "type": kw_type, ...}
        ...
    }

But how can we pass a variable to the remote library? As an argument when starting Robot, or as an argument when defining
Library    Remote ?
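
As a quick check (only an illustration, not part of any implementation), xmlrpc.client marshals such a nested structure without problems:

    import xmlrpc.client

    # Nested dicts with string keys are marshalled as XML-RPC structs.
    payload = {
        "Kw 1": {"documentation": "Example doc", "args": ["arg1", "arg2=default"]},
        "Kw 2": {"documentation": "", "args": ["**kws"]},
    }
    print(xmlrpc.client.dumps((payload,), methodresponse=True)[:200])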

@pekkaklarck
Member

pekkaklarck commented Nov 26, 2019

We can easily pass arguments to Remote itself when it's imported (currently it gets the optional URI), but there's no way we can pass such initialization information to remote servers. I'm not sure why we would need to pass any such information to implement this functionality, though. This is how I think this should be implemented:

  1. Add a new method to the remote API (and also to the dynamic API used by it) to get all information in one go. Two possibilities here:

    • Add a single get_library_information that returns all information about the library (doc, version, ...) and all its keywords (names, docs, args, ...). Basically a big dictionary in a format like

      {'name': 'ExampleLibrary', 'doc': 'Example doc', 'version': '0.0.1a1',
       'keywords': [{'name': 'Kw 1', 'doc': 'xxx', 'args': ['arg1', 'arg2=default']}, 
                    {'name': 'Kw 2', 'args': ['**kws'], 'tags': ['example']}]}
      

      This would have the benefit that a single call gets everything. An additional benefit would be being able to pass information like the version and scope that the remote API currently doesn't support. We'd need to think about whether we could somehow use possible scope information, though. A possible problem is that the returned dictionary could be huge with big libraries; it needs to be tested whether that can cause issues. A rough server-side sketch of this option is shown after this list.

    • Add a keyword-specific get_keyword_information that returns all information about that particular keyword. The return value would be a dictionary like {'name': 'Kw 1', 'doc': 'xxx', 'args': ['arg1', 'arg2=default']}. This would obviously need to be called for all keywords, meaning the number of XML-RPC calls would still depend on the number of keywords.

  2. Regardless of whether we implement get_library_information or get_keyword_information, we should simply try calling that method to see if it is implemented. If it is, individual methods like get_keyword_arguments wouldn't need to be called.

    If we implemented get_library_information, we wouldn't necessarily even need to call get_keyword_names. This would be a bit more work because at the moment we detect whether a library uses the dynamic API by checking whether it has get_keyword_names and run_keyword methods. If we add support for get_library_information, we probably need to change this logic so that this method in combination with run_keyword makes the library dynamic as well.
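
To make the first option above more concrete, here is a rough, hypothetical sketch of how a server-side library could build such a dictionary. The class, keywords and hard-coded values are placeholders, and the final key names are whatever the remote API specification ends up defining:

    class ExampleRemoteLibrary:
        # Minimal stand-in for a server-side library that already implements
        # the existing per-keyword dynamic API methods (run_keyword omitted).

        def get_keyword_names(self):
            return ['Kw 1', 'Kw 2']

        def get_keyword_arguments(self, name):
            return {'Kw 1': ['arg1', 'arg2=default'], 'Kw 2': ['**kws']}[name]

        def get_keyword_documentation(self, name):
            return {'Kw 1': 'xxx', 'Kw 2': ''}[name]

        def get_library_information(self):
            # Collect everything into one dictionary so that a single
            # XML-RPC call is enough to get all keyword information.
            keywords = [{'name': name,
                         'doc': self.get_keyword_documentation(name),
                         'args': self.get_keyword_arguments(name)}
                        for name in self.get_keyword_names()]
            return {'name': 'ExampleLibrary', 'doc': 'Example doc',
                    'version': '0.0.1a1', 'keywords': keywords}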

@pekkaklarck
Member

I was thinking about this a bit more and realized it would be fine to implement this support only in the remote interface, not in the dynamic API. Robot communicates with the dynamic API on the Python level, and I doubt there are problems even if the library has a lot of keywords. The problem is several orders of magnitude bigger when the API calls end up going over HTTP.

If we only implement this in the Remote API, the needed work is a lot smaller.
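
On the Robot side, the "just try calling it" logic from point 2 above could look roughly like this (a sketch only; the fallback mirrors the existing per-keyword calls):

    import xmlrpc.client

    def get_remote_keyword_info(uri):
        server = xmlrpc.client.ServerProxy(uri)
        try:
            # Preferred path: one call returns everything.
            return server.get_library_information()['keywords']
        except xmlrpc.client.Fault:
            # Server doesn't implement it: fall back to one call per keyword.
            return [{'name': name,
                     'doc': server.get_keyword_documentation(name),
                     'args': server.get_keyword_arguments(name)}
                    for name in server.get_keyword_names()]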

@rychem

rychem commented Sep 10, 2020

@MyMindWorld, @pekkaklarck I just found your discussion on this performance issue with big libraries.
Having exactly the same case, could you please share whether any implementation of bulk library information transfer is planned?

@pekkaklarck
Member

Let's see if we can still get this into RF 4.0. I was planning to make a release candidate already today, so it's pretty late, but I understand this would be a really important enhancement.

pekkaklarck pushed a commit that referenced this issue Mar 4, 2021
This enhances performance of getting information about keywords with big remote libraries considerably. See issue #3362 for more information.
pekkaklarck added a commit that referenced this issue Mar 4, 2021
pekkaklarck added a commit that referenced this issue Mar 4, 2021
@pekkaklarck
Member

Thanks to PR #3802 by @JFoederer, this enhancement has been done! I did some cleanup in the commits above and still need to enhance the documentation a bit. Obviously remote servers need to be updated as well before this enhancement really brings benefits.

@pekkaklarck
Member

The RoboCon sprints are running today and we just decided to talk about Python remote servers there at 12:00 UTC. The sprints are organized on Gather and you can access them via https://venue.robocon.io. This is very late notice, but it would be great if you could make it there, @JFoederer!
