ros-infrastructure/catkin_pkg

Parallel find_packages_allowing_duplicates?

mikepurvis opened this issue · 4 comments

For large workspaces (600+ packages), the serial parsing of package.xml files in find_packages_allowing_duplicates starts to take a non-trivial amount of time:

for path in package_paths:
    packages[path] = parse_package(os.path.join(basepath, path), warnings=warnings)

Without changing the interface to the function, would we consider allowing this work to be spread over multiple threads or processes, possibly triggered by some threshold in number of packages?
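
For example, a threshold-gated version might look something like this sketch (not actual catkin_pkg code; the parse_packages name and the 100-package cutoff are made up for illustration):

import multiprocessing
import os

from catkin_pkg.package import parse_package

PARALLEL_THRESHOLD = 100  # hypothetical cutoff, not a measured value

def parse_packages(basepath, package_paths):
    # find_package_paths returns paths relative to basepath, so join first.
    full_paths = [os.path.join(basepath, path) for path in package_paths]
    if len(full_paths) < PARALLEL_THRESHOLD:
        # Small workspace: process startup would dominate, so parse serially.
        parsed = [parse_package(path) for path in full_paths]
    else:
        pool = multiprocessing.Pool()
        try:
            parsed = pool.map(parse_package, full_paths)
        finally:
            pool.close()
            pool.join()
    return dict(zip(package_paths, parsed))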

Absolutely! The caller shouldn't care how the requested information is being gathered. If that loop can be parallelized, that would be great.

A naive threading implementation is actually slower than the simple loop, so the parsing is CPU-bound rather than I/O-bound (the threads just contend for the GIL). I get ~1.5s with the current implementation, and <0.5s running it with a multiprocessing map. I'll send a PR shortly.
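
For intuition, here is a minimal, self-contained demonstration of the effect (not from this issue): a CPU-bound parse function gains nothing from a thread pool because CPython's GIL serializes it, while a process pool runs truly in parallel. The parse_one function is just a stand-in for parse_package:

import time
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool
from xml.dom.minidom import parseString

DOC = '<package><name>foo</name><version>1.0.0</version></package>'

def parse_one(_):
    # CPU-bound stand-in for parse_package: pure parsing, with no blocking
    # I/O during which a thread could release the GIL.
    for _ in range(200):
        parseString(DOC)

if __name__ == '__main__':
    for pool_cls in (ThreadPool, Pool):
        pool = pool_cls(4)
        start = time.time()
        pool.map(parse_one, range(600))
        pool.close()
        pool.join()
        print('%s: %.2fs' % (pool_cls.__name__, time.time() - start))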

The simplest implementation is like so:

package_paths = find_package_paths(basepath, exclude_paths=exclude_paths, exclude_subspaces=exclude_subspaces)
# find_package_paths returns paths relative to basepath, so join before parsing.
full_paths = [os.path.join(basepath, path) for path in package_paths]
parsed_packages = multiprocessing.Pool(4).map(parse_package, full_paths)
return dict(zip(package_paths, parsed_packages))

However, preserving the behaviour of the warnings argument requires passing an extra value through the map and then manually aggregating the per-package results in the parent process, which unfortunately necessitates some additional wrapping.
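
One possible shape for that wrapping (a sketch only; the actual fix landed in the PR below and may differ) is a helper that returns each package together with the warnings it produced, with the parent process merging them afterwards:

import multiprocessing
import os

from catkin_pkg.package import parse_package

def _parse_with_warnings(full_path):
    # Each worker collects warnings into its own list; appending to a list
    # passed in from the parent would be lost across the process boundary.
    warnings = []
    return parse_package(full_path, warnings=warnings), warnings

def _parse_packages_parallel(basepath, package_paths, warnings=None):
    full_paths = [os.path.join(basepath, path) for path in package_paths]
    pool = multiprocessing.Pool()
    try:
        results = pool.map(_parse_with_warnings, full_paths)
    finally:
        pool.close()
        pool.join()
    packages = {}
    for path, (package, package_warnings) in zip(package_paths, results):
        packages[path] = package
        if warnings is not None:
            warnings.extend(package_warnings)
    return packages

Note that the helper has to live at module level so multiprocessing can pickle it for the worker processes.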

Addressed by #171.