python-rope/rope

AutoImport.generate_modules_cache can be sped up by 2x

tkrabel opened this issue · 1 comment

Describe the bug

I played around with AutoImport.generate_modules_cache, as it is perceived to be slow. I logged which packages get imported from a conda env I created and linked to the project, and it seems that many unnecessary duplicate entries are added to the database.

You can see the imports in this file: sorted_packages_with_site_packages.txt.

I see the following pattern of duplicate entries in that file.

Name(name=<import_name>, modname=<mod_name>, package=<package>, ...)
Name(name=<import_name>, modname="site-packages."<mod_name>, package='site-packages', ...)

One example of a duplicate:

Name(name='BaseName', modname='jedi.api.classes', package='jedi', source=<Source.SITE_PACKAGE: 4>, name_type=<NameType.Class: 7>)
Name(name='BaseName', modname='site-packages.jedi.api.classes', package='site-packages', source=<Source.SITE_PACKAGE: 4>, name_type=<NameType.Class: 7>)

From that, it seems the issue is that we don't exclude the top-level site-packages directory itself from our search tree. It is treated as a package in its own right, so every package inside it is counted twice: once under its real name and once under the site-packages prefix.
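
To illustrate the kind of exclusion I have in mind, here is a minimal sketch. It is not rope's actual code: `iter_top_level_packages` is a hypothetical helper, and it assumes that site-packages shows up as a child directory of another folder that is also being searched (for example both `lib` and `lib/site-packages`). Under that assumption, skipping any child that is itself a search root avoids the double counting.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Iterator, List


def iter_top_level_packages(search_roots: Iterable[Path]) -> Iterator[Path]:
    """Yield candidate package directories, skipping nested search roots.

    Hypothetical helper for illustration only; not part of rope's API.
    """
    roots = {root.resolve() for root in search_roots}
    for root in sorted(roots):
        if not root.is_dir():
            continue
        for child in sorted(root.iterdir()):
            if not child.is_dir():
                continue
            # A child that is itself a search root (e.g. site-packages)
            # is a container of packages, not a package.
            if child.resolve() in roots:
                continue
            yield child


def demo() -> List[str]:
    """Build a tiny fake environment and list the packages that are found."""
    with tempfile.TemporaryDirectory() as tmp:
        stdlib = Path(tmp) / "lib"
        site = stdlib / "site-packages"
        (stdlib / "json").mkdir(parents=True)
        (site / "jedi").mkdir(parents=True)
        return [p.name for p in iter_top_level_packages([stdlib, site])]


print(demo())
```

In this toy layout, site-packages is never reported as a package, while the packages inside it are still reported once.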

To Reproduce

  1. Apply the following change to rope/contrib/autoimport/sqlite.py
diff --git a/rope/contrib/autoimport/sqlite.py b/rope/contrib/autoimport/sqlite.py
index eb7c27de..42447edc 100644
--- a/rope/contrib/autoimport/sqlite.py
+++ b/rope/contrib/autoimport/sqlite.py
@@ -371,6 +371,7 @@ class AutoImport:
             return
         self._add_packages(packages)
         job_set = task_handle.create_jobset("Generating autoimport cache", 0)
+        end_names = []
         if single_thread:
             for package in packages:
                 for module in get_files(package, underlined):
@@ -383,9 +384,11 @@ class AutoImport:
                 get_future_names(packages, underlined, job_set)
             ):
                 self._add_names(future_name.result())
+                end_names.append(future_name.result())
                 job_set.finished_job()
 
         self.connection.commit()
+        return end_names
 
     def _get_packages_from_modules(self, modules: List[str]) -> Iterator[Package]:
         for modname in modules:
  2. Run the following code from an env that has the changes from (1) applied
from rope.base.project import Project
from rope.contrib.autoimport.sqlite import AutoImport

project = Project(".")
autoimport = AutoImport(project, memory=True)
end_names = autoimport.generate_modules_cache()  # list returned by the patched method from (1)
  3. Look at the result, i.e. the list returned by generate_modules_cache()
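
One possible way to check the result for the pattern described above is sketched below. The `Name` class here is a stand-in with only the three fields the check needs (the real records have more), `find_site_packages_duplicates` is a hypothetical helper, and I am assuming the returned value is a list of batches where each batch is a list of such records.

```python
from typing import Iterable, List, NamedTuple

PREFIX = "site-packages."


class Name(NamedTuple):
    # Stand-in record for illustration; only the fields used below.
    name: str
    modname: str
    package: str


def find_site_packages_duplicates(batches: Iterable[Iterable[Name]]) -> List[Name]:
    """Return 'site-packages.' entries that also exist without the prefix."""
    flat = [entry for batch in batches for entry in batch]
    # (name, modname) pairs that were indexed under their real package.
    seen = {(e.name, e.modname) for e in flat if e.package != "site-packages"}
    return [
        e
        for e in flat
        if e.package == "site-packages"
        and e.modname.startswith(PREFIX)
        and (e.name, e.modname[len(PREFIX):]) in seen
    ]


# Sample data shaped like the duplicate shown earlier in this issue.
sample = [
    [Name("BaseName", "jedi.api.classes", "jedi")],
    [Name("BaseName", "site-packages.jedi.api.classes", "site-packages")],
]
duplicates = find_site_packages_duplicates(sample)
print(len(duplicates), "duplicate entries")
```

The helper only uses attribute access, so it should also work on the real records as long as they expose name, modname and package.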

We also need to address this comment: #723 (comment)