datamade/usaddress

Install takes minutes

mlissner opened this issue · 2 comments

Just a heads up: the latest version that works with Python 3.10 takes a very long time to install. I haven't timed it, but it looks like it's doing something with gcc that takes a really long time, maybe 5-10 minutes.

This is well outside my wheelhouse, so not something I think I can fix, but I thought I'd alert the maintainers. Maybe the wheel needs to have a compiled version in it? Is that a thing?

I was able to do a little digging on this today. Installing usaddress takes about 20 minutes, and we're seeing it on both ARM laptops and x86 Linux machines. It looks like this is python-crfsuite again, but I don't think I can move this issue over there (not sure it matters too much?).

I'm guessing the build artifacts are being made via this:

https://github.com/scrapinghub/python-crfsuite/blob/master/.github/workflows/build_and_upload.yml

What I don't understand at the moment is why wheels appear to exist for basically every platform under the sun, and yet they seem to be getting compiled when we install usaddress. You can see all the wheels here:

https://pypi.org/project/python-crfsuite/0.9.8/

Hm. I'll see if I can figure out why the wheels aren't getting installed since they seem to exist, but I'm drawing a blank for the moment.
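For anyone puzzling over the same question: pip only uses a published wheel when one of the (python, abi, platform) tag triples encoded in its filename matches a tag the running interpreter supports. Otherwise it falls back to the source distribution and compiles it. Below is a minimal, stdlib-only sketch of that matching step, assuming the wheel naming convention from PEP 427. The helpers `wheel_tags` and `is_compatible` and the example filename are hypothetical. On a real machine the supported set would come from `packaging.tags.sys_tags()` or from the "Compatible tags" section printed by `pip debug --verbose`.

```python
from __future__ import annotations


def wheel_tags(filename: str) -> set[tuple[str, str, str]]:
    """Expand a wheel filename into its (python, abi, platform) tag triples.

    Wheel names follow {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl,
    and each tag field may list several values separated by dots.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename}")
    py_tags, abi_tags, plat_tags = parts[-3:]
    # A compressed tag set like "manylinux_2_17_x86_64.manylinux2014_x86_64"
    # expands into one triple per combination of dot-separated values.
    return {
        (py, abi, plat)
        for py in py_tags.split(".")
        for abi in abi_tags.split(".")
        for plat in plat_tags.split(".")
    }


def is_compatible(filename: str, supported: set[tuple[str, str, str]]) -> bool:
    """True if any tag triple in the wheel name is one the interpreter supports."""
    return not wheel_tags(filename).isdisjoint(supported)


if __name__ == "__main__":
    # Illustrative filename and tag sets, not copied from PyPI.
    wheel = "python_crfsuite-0.9.8-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
    print(is_compatible(wheel, {("cp310", "cp310", "manylinux2014_x86_64")}))  # True
    print(is_compatible(wheel, {("cp39", "cp39", "manylinux2014_x86_64")}))  # False
```

If no published wheel matches, pip builds from source, which is one way a long gcc step can show up even though a project's PyPI page lists many wheels.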

I did a bunch more testing on this today, and the short version is: it's not usaddress causing this at all. I'll close this issue, and apologies for taking up folks' time.

The TL;DR is that usaddress looks like the cause because it's the last dependency we install, and a different dependency of ours runs some sort of compilation step after all our other dependencies are installed. So we see a log like this:

Package operations: 42 installs, 0 updates, 0 removals

[ --- a small flotilla of dependencies --]
  • Installing python-igraph (0.9.1)
[ --- a gazillion more dependencies --]
  • Installing usaddress (0.5.10)
[ --- laptop freezes, stalls for 20 minutes --] 

I had a tough time figuring this out. Since the log looked like it was installing usaddress, I tried to reproduce this in just about every possible way: Python 3.9 vs. 3.10, Poetry vs. pip, Docker vs. host machine, etc. Finally, I went back to my full list of dependencies and bisected it until I found the cause, which was a slightly old version of python-igraph. Updating it to the latest version fixes this.
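The bisection described above can be sketched in Python. This is a minimal sketch, assuming exactly one package causes the slowdown on its own and that each check runs in a fresh virtual environment so earlier installs don't mask later ones. `find_culprit` and `install_is_slow` are hypothetical helpers, the 300-second threshold is arbitrary, and the sketch uses pip rather than Poetry for simplicity.

```python
from __future__ import annotations

import subprocess
import sys
import tempfile
import time
import venv
from pathlib import Path
from typing import Callable, Sequence


def install_is_slow(subset: Sequence[str], threshold_s: float = 300.0) -> bool:
    """Hypothetical predicate: pip-install `subset` into a throwaway venv and time it."""
    with tempfile.TemporaryDirectory() as tmp:
        venv.create(tmp, with_pip=True)
        exe = "Scripts/python.exe" if sys.platform == "win32" else "bin/python"
        python = Path(tmp) / exe
        start = time.monotonic()
        subprocess.run([str(python), "-m", "pip", "install", *subset], check=True)
        return time.monotonic() - start > threshold_s


def find_culprit(deps: Sequence[str], is_slow: Callable[[Sequence[str]], bool]) -> str:
    """Binary-search `deps` for the one package that makes `is_slow` true."""
    candidates = list(deps)
    if not is_slow(candidates):
        raise ValueError("the full dependency list does not reproduce the slowdown")
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first_half = candidates[:mid]
        # Still slow with only the first half? Then the culprit is in it.
        # Otherwise, under the single-culprit assumption, it is in the second half.
        candidates = first_half if is_slow(first_half) else candidates[mid:]
    return candidates[0]


if __name__ == "__main__":
    deps = ["requests", "python-igraph==0.9.1", "lxml", "usaddress==0.5.10"]
    # Stand-in predicate for illustration; pass install_is_slow for a real run.
    print(find_culprit(deps, lambda subset: "python-igraph==0.9.1" in subset))
    # prints: python-igraph==0.9.1
```

Each step halves the candidate list, so a list of n packages needs roughly log2(n) timed installs after the initial full-list check, instead of n.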

Anyway, apologies again for the noise. All is well here.