robotframework/robotframework

Enhance performance of getting information about keywords with big remote libraries

MyMindWorld opened this issue · 8 comments

Hello!
We have really big remote libraries (about 50,000 keywords) and we are stuck on performance issues.

While researching it I found that when the client asks for keyword names, the server sends them all at once, but documentation, arguments, etc. are requested one by one. So we end up with around 200-300k requests, and to run a single case we need to wait around 3 minutes before execution actually starts.

That may seem acceptable, but in real life, when we need to debug something or run pabot (which simply runs allocated instances of Robot), 3 minutes becomes 5 hours for 100 suites, and our parallel setup becomes useless too.

Maybe it's possible to request all this information at once and have it sent back the same way?

For now we have simply removed these functions from the XmlRpcRemoteClient class in Remote.py, and execution now starts in less than 1 second :)

We also researched the dynamic library API, but it seems it's not what we need.

Sending all information in one go sounds like a good idea, and is definitely possible, but it would need someone to implement it. I can help with the design but don't have time for implementation in the near future. Are you interested? Note that once this feature is added to the Remote API, it would then need to be added to the various remote servers before it actually helps.

Alternatively, or additionally, we could add a feature to the Remote library to disable querying keyword information other than names and possibly arguments. It would probably be easier to implement and wouldn't require changes to remote servers. Depending on the context, some of this information may be needed during execution (especially argument names and types), so this wouldn't be as good a solution as getting all information in one request.
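A minimal sketch of what such an opt-out could look like on the client side. This is purely illustrative: the `query_metadata` flag, the class name, and the `_call_server` helper are hypothetical, not part of the actual Remote library.

```python
# Hypothetical flag letting a remote client skip per-keyword metadata
# queries entirely; only keyword names would still be fetched.
class RemoteClientStub:
    def __init__(self, query_metadata=True):
        self.query_metadata = query_metadata

    def get_keyword_documentation(self, name):
        if not self.query_metadata:
            return ''            # skip the XML-RPC round trip entirely
        return self._call_server('get_keyword_documentation', name)

    def _call_server(self, method, *args):
        # Placeholder standing in for the real XML-RPC call.
        return f'<{method} result for {args}>'
```

With `query_metadata=False`, each of the ~200-300k per-keyword requests becomes a no-op, at the cost of losing documentation and argument information during execution.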

Yeah, I'm definitely interested in this!

Disabling the querying is what we have done already. For now it seems to work OK, but as you mentioned, it cuts some functionality.

I will fork the project and start developing for now, then.

According to the xmlrpc.client documentation it supports dictionaries, so for now I think we could send the data as

{
    kw_name: {"documentation": kw_doc, "type": kw_type, ...},
    ...
}
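A minimal sketch of how a remote server might assemble such a dictionary so the client gets everything in one XML-RPC call. The bulk helper name `get_all_keyword_information` is hypothetical; the per-keyword methods it calls mirror the existing dynamic API, and `FakeLibrary` is just a stand-in for testing the idea.

```python
# Hypothetical server-side helper bundling per-keyword information into
# one dictionary, replacing one XML-RPC call per keyword per attribute.
def get_all_keyword_information(library):
    info = {}
    for name in library.get_keyword_names():
        info[name] = {
            "documentation": library.get_keyword_documentation(name),
            "arguments": library.get_keyword_arguments(name),
        }
    return info


class FakeLibrary:
    """Stand-in for a remote library implementing the dynamic API."""

    def get_keyword_names(self):
        return ["Login", "Logout"]

    def get_keyword_documentation(self, name):
        return f"Docs for {name}"

    def get_keyword_arguments(self, name):
        return ["arg1"] if name == "Login" else []


info = get_all_keyword_information(FakeLibrary())
```

All values here are strings and lists, which XML-RPC can marshal without any custom encoding.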

But how can we pass a variable to the remote library? As an argument when starting Robot, or as an argument when importing it with
Library REMOTE ?

We can easily pass arguments to Remote itself when it's imported (it currently gets an optional URI), but there's no way we can pass such initialization information to remote servers. I'm not sure why we would need to pass any such information to implement this functionality, though. This is how I think this should be implemented:

  1. Add a new method to the remote API (and also to the dynamic API used by it) to get all information in one go. Two possibilities here:

    • Add a single get_library_information that returns all information of the library (doc, version, ...) and all keywords (names, docs, args, ...). Basically a big dictionary in a format like

      {'name': 'ExampleLibrary', 'doc': 'Example doc', 'version': '0.0.1a1',
       'keywords': [{'name': 'Kw 1', 'doc': 'xxx', 'args': ['arg1', 'arg2=default']}, 
                    {'name': 'Kw 2', 'args': ['**kws'], 'tags': ['example']}]}
      

      This would have the benefit that a single call gets everything. An additional benefit would be the ability to pass information like the version and scope that the remote API currently doesn't support. We would need to think about whether we could use the possible scope information somehow, though. A possible problem is that the returned dictionary could be huge with big libraries; whether that can cause issues would need to be tested.

    • Add a keyword-specific get_keyword_information that returns all information of that particular keyword. The return value would be a dictionary like {'name': 'Kw 1', 'doc': 'xxx', 'args': ['arg1', 'arg2=default']}. This would obviously need to be called for all keywords, meaning the number of XML-RPC calls would still depend on the number of keywords.

  2. Regardless of whether we implement get_library_information or get_keyword_information, we should simply try calling that method to see if it is implemented. If it is, individual methods like get_keyword_arguments wouldn't need to be called.

    If we implemented get_library_information, then we wouldn't necessarily need to call get_keyword_names at all. This would be a bit more work, because at the moment we detect whether a library uses the dynamic API by checking that it has get_keyword_names and run_keyword methods. If we add support for get_library_information, we would probably need to change this logic so that this method in combination with run_keyword makes the library dynamic as well.
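The adjusted detection logic described above could be sketched roughly like this. The function name `is_dynamic_library` and the duck-typing check are illustrative; they only demonstrate the idea that either the classic method pair or the new bulk method together with run_keyword would qualify a library as dynamic.

```python
# Hypothetical sketch of the adjusted dynamic-library detection:
# run_keyword plus either get_keyword_names (classic) or the proposed
# get_library_information (bulk) would make a library dynamic.
def is_dynamic_library(lib):
    has_run = callable(getattr(lib, 'run_keyword', None))
    classic = callable(getattr(lib, 'get_keyword_names', None))
    bulk = callable(getattr(lib, 'get_library_information', None))
    return has_run and (classic or bulk)


class BulkOnly:
    """Library exposing only the proposed bulk method plus run_keyword."""

    def get_library_information(self):
        return {}

    def run_keyword(self, name, args):
        pass


class Static:
    """Ordinary static library: keywords are plain methods."""

    def some_keyword(self):
        pass
```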

I was thinking about this a bit more and realized it would be fine to implement this support only in the remote interface, not in the dynamic API. Robot communicates with the dynamic API at the Python level, and I doubt there would be problems even if the library had a lot of keywords. The problem is several orders of magnitude bigger when the API calls end up going over HTTP.

If we only implement this in the Remote API, the needed work is a lot smaller.
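A back-of-the-envelope illustration of why the number of HTTP round trips dominates here. The figures are assumptions for illustration only: three per-keyword queries (doc, args, tags) and ~1 ms per XML-RPC call.

```python
# Assumed: 3 per-keyword metadata queries at ~1 ms of latency each.
n_keywords = 50_000
latency_s = 0.001
per_keyword_calls = 1 + 3 * n_keywords  # get_keyword_names + 3 per keyword
bulk_calls = 1                          # a single bulk information call
startup_s = per_keyword_calls * latency_s  # roughly 150 s of round trips
```

Even with optimistic 1 ms latency, the per-keyword scheme spends minutes on round trips alone, which matches the startup delay reported at the top of this issue, while a bulk call pays the latency once.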

@MyMindWorld, @pekkaklarck I just found your discussion on this performance issue for the big-library case.
Having exactly the same case, could you please share whether any implementation of bulk library transfer is planned?

Let's see if we can still get this into RF 4.0. I was planning to make a release candidate already today, so it's pretty late, but I understand this would be a really important enhancement.

Thanks to PR #3802 by @JFoederer, this enhancement has been done! I did some cleanup in the commits above and still need to enhance the documentation a bit. Obviously, remote servers need to be updated as well before this enhancement really brings benefits.

RoboCon sprints are running today, and we just decided to talk about Python remote servers there at 12:00 UTC. The sprints are organized on Gather, and you can access them via https://venue.robocon.io. This is very late notice, but it would be great if you could make it there, @JFoederer!