Miserlou/Zappa

Zappa always re-downloads "psycopg2-binary" library when building the deployment package

Closed this issue · 5 comments

Context

First of all, I am indebted to Rich Jones and contributors who made Zappa possible! It's mind-blowing to think how far Zappa has taken Amazon API Gateway and Lambda functions.

The bug itself is tiny and might not even be worthy of the attention of this audience, and there is also a workaround. I am deploying a stock Django project coupled with a PostgreSQL database.

Not sure how you guys and gals are doing this, but what worked for me was to run both pip install psycopg2 and pip install psycopg2-binary. The former installs _psycopg.cpython-38-x86_64-linux-gnu.so, a so-called "shared object" file (a dynamically linked Linux library), but it requires a host of other libraries to actually run -- so the latter provides those libraries.

It all deploys and runs well, but each time I try to run zappa update, or actually even zappa package, Zappa builds a package and re-downloads the poor psycopg2-binary library.

Those libraries are supposed to be cached on disk to avoid re-downloading each time. I inserted a debug print statement after line 817 in core.py:

print(f"Looking for file: {wheel_path}")

And when running zappa package, it prints, among other lines:

Looking for file: /tmp/cached_wheels/psycopg2-binary-2.8.5-*_x86_64.whl

Actually, there is a file with almost the same name in that location, but the dash - between psycopg2 and binary is replaced with an underscore _:

ls /tmp/cached_wheels/psycopg2_binary-2.8.5-cp38-cp38-manylinux1_x86_64.whl

The underscore is present in the original name of that file, as you can see if you download it from PyPI.

It looks like this happens because the library name in requirements.txt is written with a dash (psycopg2-binary), and that is the name Zappa uses when building the wildcard expression in line 816 of core.py, whereas the file on disk is named with an underscore.

For comparison, another package with a dash in its name, django-environ, also ships its wheel file with an underscore instead of a dash.
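The mismatch can be reproduced without Zappa. This is only a minimal sketch using Python's standard fnmatch module; the two file names are copied from the debug output and the ls output above, and the underscore variant of the pattern is my own assumption about what a matching wildcard would look like, not Zappa's code:

```python
from fnmatch import fnmatchcase

# File name as it exists in the cache (from the ls output above).
cached = "psycopg2_binary-2.8.5-cp38-cp38-manylinux1_x86_64.whl"

# Wildcard built from the name as written in requirements.txt (with a dash).
dash_pattern = "psycopg2-binary-2.8.5-*_x86_64.whl"

# Assumed alternative: the same wildcard with the dash replaced by an underscore.
underscore_pattern = "psycopg2_binary-2.8.5-*_x86_64.whl"

dash_match = fnmatchcase(cached, dash_pattern)
underscore_match = fnmatchcase(cached, underscore_pattern)

print(f"dash pattern matches:       {dash_match}")
print(f"underscore pattern matches: {underscore_match}")
```

Only the underscore pattern matches the cached file, which is consistent with Zappa concluding that nothing is cached.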

Now, I am not sure we should call it a bug, since Zappa works, it just re-downloads this file each time since it believes it's not cached. The only negative thing is that guilty feeling that someone is paying for that bandwidth.

A workaround is to make a copy of that file in /tmp/cached_wheels with the underscore in its name replaced by a dash:

cd /tmp/cached_wheels
cp psycopg2_binary-2.8.5-cp38-cp38-manylinux1_x86_64.whl psycopg2-binary-2.8.5-cp38-cp38-manylinux1_x86_64.whl

It will then find the "cached" version and not download anything:

$ zappa package

Downloading and installing dependencies..
 - psycopg2-binary==2.8.5: Using locally cached manylinux wheel

I am surprised that I didn't find anyone else complaining about this. Maybe I am doing it wrong and everyone else using Django with PostgreSQL is not having this problem.

Feel free to close this issue since there is a workaround -- this way, even if closed, it will eventually become indexed by search engines for others to find.

Expected Behavior

Zappa should find the cached wheel file despite the underscore in its name.

Actual Behavior

The file has an underscore where Zappa expects a dash, so Zappa re-downloads the file.

Possible Fix

A workaround is creating a copy of the wheel file with a dash instead of the underscore in its name (commands provided above); this way Zappa will find the copy and use it.

Steps to Reproduce

  1. pip install -r requirements.txt (packages to install are listed below)
  2. zappa package

Your Environment

  • Zappa version used: 0.51.0
  • Operating System and Python version: Ubuntu 20.04 LTS, Python 3.8.2
  • The output of pip freeze (relevant parts):
Django==3.0.7
psycopg2==2.8.5
psycopg2-binary==2.8.5
zappa==0.51.0

Thanks, everyone -- again, it's not really a bug, or at most a tiny one. I just hope that after reading this others will know how to fix it using the workaround provided.

Thank you very much. This is a bug in zappa as we (in this specific case I) forgot to apply this section of the PEP to the filenames: https://www.python.org/dev/peps/pep-0427/#escaping-and-unicode
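For readers who do not want to open the link: that PEP section says each component of a wheel filename is escaped by replacing runs of non-alphanumeric characters with an underscore. A minimal sketch of the rule follows. The function names are hypothetical, the "-*_x86_64.whl" suffix is copied from the debug output earlier in this issue, and this is not the actual patch applied to Zappa:

```python
import re


def escape_wheel_component(component: str) -> str:
    """Escape one wheel filename component the way PEP 427 describes.

    Runs of characters other than word characters, digits and "." are
    replaced by a single underscore.
    """
    return re.sub(r"[^\w\d.]+", "_", component, flags=re.UNICODE)


def cached_wheel_pattern(name: str, version: str) -> str:
    """Hypothetical helper: build a wildcard for a cached manylinux wheel."""
    return f"{escape_wheel_component(name)}-{version}-*_x86_64.whl"


if __name__ == "__main__":
    print(cached_wheel_pattern("psycopg2-binary", "2.8.5"))
    print(cached_wheel_pattern("django-environ", "0.4.5"))  # version chosen only for illustration
```

Note that the escaping rule only touches separators; it does not change letter case.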

Thank you, João -- I am impressed with the speed of your reply and your exact pinpointing of the bug! 👍 🙇

Dear João, I noticed you made a fix available, so I patched Zappa and the problem indeed went away! That was very quick -- thank you!

I am currently trying to deploy a cookiecutter-django project with Zappa and I noticed that it requires lots of libraries, including Pillow, whose wheel files start with a capital P.

So now there is no problem with wheel files with dashes in their names, but Zappa is trying to re-download the wheel file for Pillow.

Added debug print statements show that when running zappa package, Zappa is looking for a file matching this wildcard pillow-7.1.2-*_x86_64.whl, whereas the actual file name in /tmp/cached_wheels/ is Pillow-7.1.2-cp38-cp38-manylinux1_x86_64.whl.

I am not really sure it's a bug in Zappa (maybe wheel files shouldn't be starting with capital letters?), so feel free to disregard this altogether.

For those who are looking for a quick workaround, it's the same as it was with psycopg2-binary: just provide the file name that Zappa is expecting by making a copy of the wheel file starting with the lowercase letter; Zappa will not re-download it then:

cd /tmp/cached_wheels
cp Pillow-7.1.2-cp38-cp38-manylinux1_x86_64.whl pillow-7.1.2-cp38-cp38-manylinux1_x86_64.whl
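Instead of copying files, the lookup itself could compare names in a normalized form. The following is only a sketch of that idea under my own assumptions: the function names are hypothetical, the normalization (lowercase, with "-", "_" and "." treated alike) is one possible choice, and none of this is taken from Zappa's source:

```python
import re
from typing import Iterable, Optional


def normalize(name: str) -> str:
    """Lowercase a name and collapse runs of '-', '_' and '.' into '_'."""
    return re.sub(r"[-_.]+", "_", name).lower()


def match_cached_wheel(
    filenames: Iterable[str], name: str, version: str
) -> Optional[str]:
    """Return the first cached x86_64 wheel for name/version, or None.

    Wheel names look like
    {distribution}-{version}-{python tag}-{abi tag}-{platform tag}.whl,
    so the first two dash-separated fields are the distribution and version.
    """
    wanted = normalize(name)
    for filename in filenames:
        parts = filename.split("-")
        if (
            len(parts) >= 5
            and normalize(parts[0]) == wanted
            and parts[1] == version
            and filename.endswith("_x86_64.whl")
        ):
            return filename
    return None


if __name__ == "__main__":
    cached = [
        "Pillow-7.1.2-cp38-cp38-manylinux1_x86_64.whl",
        "psycopg2_binary-2.8.5-cp38-cp38-manylinux1_x86_64.whl",
    ]
    print(match_cached_wheel(cached, "pillow", "7.1.2"))
    print(match_cached_wheel(cached, "psycopg2-binary", "2.8.5"))
```

In practice the list of file names could come from os.listdir("/tmp/cached_wheels"). Because both sides are normalized before comparison, neither the capital P nor the dash/underscore difference prevents a match.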

I am having the same problem: Zappa re-downloads the pip libraries below every time it builds the package.
On my local machine it happens for these 2 libraries:
pillow==9.2.0
markupsafe==2.1.1

When I use GitHub Actions, it happens for the 4 libraries below, which makes my deployment a little slow. How can I fix this?
pillow==9.2.0
markupsafe==2.1.1
numpy==1.23.1
pandas==1.4.3

@tusharmesh can you try updating your Zappa version? From the discussion, it looks like this issue has been fixed.