MacHu-GWU/uszipcode-project

Suggest explicitly mentioning that the database is separate from the Python package

Closed this issue · 2 comments

I've been evaluating your package, and at first glance this project looks really useful.

However, the package description currently doesn't make it clear that the underlying database is not part of the distributed package and is instead downloaded separately on first use. This makes the package far less appealing for a variety of use cases. I appreciate that the docs mention dumping the SQLite database to another database server when deploying as a web service, and that the database file location can be configured, but I think it would be great if this separation and the download were mentioned up front in the README.

Just for future readers (one day I'll open a pull request on the docs; there are a lot of typos):
The database is downloaded on the first run and doesn't need to be re-downloaded afterwards. For instance, I downloaded both the simple and the non-simple databases:

$ pip install uszipcode
$ vi example.py
from uszipcode import SearchEngine
search1 = SearchEngine(db_file_dir="./tmp")
search2 = SearchEngine(db_file_dir="./tmp", simple_zipcode=False)

Running it, you will see:

(v3) 10:51:26 chai@mycomp:~/projects/uszipcode$ python example.py
Start downloading data for simple zipcode database, total size 9MB ...
  1 MB finished ...
  2 MB finished ...
  3 MB finished ...
  4 MB finished ...
  5 MB finished ...
  6 MB finished ...
  7 MB finished ...
  8 MB finished ...
  9 MB finished ...
  10 MB finished ...
  Complete!

After that, it does not need to re-download. However, you do need to point it at the same folder again; otherwise it looks for the database somewhere else and downloads it again.
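To make the "download once, then reuse" behaviour concrete, here is a minimal sketch of the caching pattern described above. It is not uszipcode's actual implementation: `ensure_db_file`, `fake_fetch`, and the file name `example.sqlite` are hypothetical names used only for illustration, and the "download" just writes a placeholder file. It shows why reusing the same folder avoids a second download while a different folder triggers one.

```python
import os
import tempfile
from typing import Callable, List


def ensure_db_file(db_file_dir: str, filename: str, fetch: Callable[[str], None]) -> str:
    """Return the path to a cached database file, fetching it only if missing.

    Hypothetical helper: mirrors the behaviour described in this thread,
    not uszipcode's real code.
    """
    os.makedirs(db_file_dir, exist_ok=True)
    path = os.path.join(db_file_dir, filename)
    if not os.path.exists(path):
        # First run for this folder: fetch the file (a real download in practice).
        fetch(path)
    return path


downloads: List[str] = []


def fake_fetch(path: str) -> None:
    # Stand-in for a real download: record the call and write a placeholder file.
    downloads.append(path)
    with open(path, "wb") as f:
        f.write(b"placeholder")


with tempfile.TemporaryDirectory() as root:
    cache_a = os.path.join(root, "a")
    cache_b = os.path.join(root, "b")
    ensure_db_file(cache_a, "example.sqlite", fake_fetch)  # first run: fetches
    ensure_db_file(cache_a, "example.sqlite", fake_fetch)  # same folder: reuses the file
    ensure_db_file(cache_b, "example.sqlite", fake_fetch)  # different folder: fetches again

print(len(downloads))  # 2
```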

from uszipcode import SearchEngine
search = SearchEngine(db_file_dir="./tmp", simple_zipcode=False)  # on the second run, this will not re-download the database
zipcode = search.by_zipcode("10001")
print(zipcode)
Zipcode(zipcode_type='Standard', major_city='New York', post_office_city='New York, NY', common_city_list=['New York'], county='New York County', state='NY', lat=40.75, lng=-73.99, timezone='Eastern', radius_in_miles=0.9090909090909091, area_code_list=['718', '917', '347', '646'], population=21102, population_density=33959.0, ...
# ... etc.

@nouyang I want to add one thing: use an absolute path for the db_file_dir parameter to avoid re-downloading the database. By default, you don't need to specify this value. If it is a relative path, db_file_dir resolves to a different folder whenever you run your code from a different directory.

search = SearchEngine(db_file_dir="/use-absolute-path-here")
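The snippet below illustrates the point about relative paths using only the standard library: the same string `"./tmp"` resolves to two different absolute folders depending on the current working directory, while an absolute path stays the same. The folder name `uszipcode-data` under the system temp directory is a hypothetical location chosen only for the demonstration; changing directories with `os.chdir` simulates launching the script from different places.

```python
import os
import tempfile

original_cwd = os.getcwd()
relative_dir = "./tmp"  # relative, like db_file_dir="./tmp" in the examples above
# Hypothetical absolute location, used only for this demonstration.
absolute_dir = os.path.join(tempfile.gettempdir(), "uszipcode-data")

relative_results = []
absolute_results = []
with tempfile.TemporaryDirectory() as dir_one, tempfile.TemporaryDirectory() as dir_two:
    try:
        for cwd in (dir_one, dir_two):
            os.chdir(cwd)  # simulate running the script from a different directory
            relative_results.append(os.path.abspath(relative_dir))
            absolute_results.append(os.path.abspath(absolute_dir))
    finally:
        os.chdir(original_cwd)  # restore the cwd before the temp dirs are removed

print(relative_results[0] == relative_results[1])  # False: two different folders
print(absolute_results[0] == absolute_results[1])  # True: always the same folder
```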