Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0).

International Names

Providing a cleaned dataset of international names based on the EU Science Hub's JRC-Names dataset

Prerequisites

Code tested on macOS 10.14.6. You need:

  • gzip (ships with macOS)
  • Python 3
  • pandas

Instructions

First run the download script. Then run the Python script to clean the data. Done.
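
As a rough illustration of what the cleaning step might involve, here is a minimal sketch and not the repository's actual script. It assumes the downloaded file is a gzip-compressed, tab-separated table with four columns (entity ID, entity type, language code, name) and that spaces inside names are encoded as "+". The names COLUMNS, load_names and clean_names are hypothetical and chosen only for this example.

```python
import csv
import gzip
import io

import pandas as pd

# Assumed column layout; the real file may differ.
COLUMNS = ["entity_id", "entity_type", "language", "name"]


def load_names(gz_bytes: bytes) -> pd.DataFrame:
    """Decompress gzip bytes and parse them as a tab-separated table."""
    with gzip.open(io.BytesIO(gz_bytes), mode="rt", encoding="utf-8") as handle:
        return pd.read_csv(
            handle,
            sep="\t",
            names=COLUMNS,
            dtype=str,
            keep_default_na=False,   # keep names such as "NA" as plain strings
            quoting=csv.QUOTE_NONE,  # treat quote characters as ordinary text
        )


def clean_names(df: pd.DataFrame) -> pd.DataFrame:
    """Normalise name strings, then drop empty and duplicate entries."""
    out = df.copy()
    # Assumption: "+" stands for a space inside a name.
    out["name"] = out["name"].str.replace("+", " ", regex=False).str.strip()
    out = out[out["name"] != ""]
    out = out.drop_duplicates(subset=["language", "name"])
    return out.reset_index(drop=True)


if __name__ == "__main__":
    # Tiny in-memory sample standing in for the downloaded file.
    sample = "1\tP\tu\tAda+Lovelace\n1\tP\tu\tAda+Lovelace\n2\tP\tfr\t \n"
    print(clean_names(load_names(gzip.compress(sample.encode("utf-8")))))
```

Reading every column as a string with keep_default_na=False is a deliberate choice in this sketch: otherwise pandas would silently turn names such as "NA" or "null" into missing values.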

Rationale

JRC-Names is a great resource: it provides many different names from many different locales. Unfortunately, it is also a bit dirty. This repository aims to fix that.