Providing a cleaned dataset of international names based on the EU Science Hub's JRC-Names dataset
Code tested on macOS 10.14.6. You need:
- gzip (should come with macOS)
- Python 3
- pandas
First run the download script. Then run the Python script to clean the data. Done.
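As a rough illustration of what the cleaning step might involve, here is a minimal sketch that reads a gzip-compressed, tab-separated file with pandas and removes blank and duplicate names. It is not the repository's actual script: the function `clean_names`, the column names `id` and `name`, and the cleaning rules themselves are all assumptions made for the example, and the real JRC-Names layout may differ.

```python
import gzip
import io

import pandas as pd


def clean_names(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning pass: trim whitespace, drop blank and duplicate names."""
    cleaned = raw.copy()
    cleaned["name"] = cleaned["name"].astype(str).str.strip()
    cleaned = cleaned[cleaned["name"] != ""]
    return cleaned.drop_duplicates(subset="name").reset_index(drop=True)


# Tiny gzip-compressed sample built in memory, standing in for the downloaded file
# (assumed layout: one record per line, tab-separated id and name).
sample = "1\tAnna Schmidt\n2\t  Anna Schmidt \n3\t\n4\tJoão Silva\n"
buffer = io.BytesIO(gzip.compress(sample.encode("utf-8")))

# Decompress on the fly and parse; keep_default_na=False keeps empty names as "".
with gzip.open(buffer, "rt", encoding="utf-8") as handle:
    raw = pd.read_csv(
        handle,
        sep="\t",
        header=None,
        names=["id", "name"],
        keep_default_na=False,
    )

cleaned = clean_names(raw)
print(cleaned)
```

To try it on a real file, replace the in-memory `buffer` with the path to the downloaded `.gz` file and adjust the column names to match its layout.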
JRC-Names is a great resource: it provides lots of different names from lots of different locales. Unfortunately, it is also a bit dirty. This repository aims to fix that.