
Organize data about countries and their subdivisions


Mundi

Mundi is a simple package that provides information about all countries in the world as a convenient set of classes and Pandas dataframes. It uses information provided by the popular pycountry package and supplements it with several other data sources using plugins.

Warning!

Mundi is still in an early stage of development and is therefore changing very quickly. New users should expect some risk of API changes and general breakage. If you are willing to take that risk, we suggest installing it from git and keeping in touch with the developers (better yet, contributing to the project).

Usage

Install Mundi using pip install mundi or your method of choice. Once installed, just import it and load the desired information. Mundi exposes collections of entries as dataframes, and single entries (rows in those dataframes) as Series objects.

>>> import mundi
>>> df = mundi.countries(); df  # doctest: +ELLIPSIS
                    name
id
AD               Andorra
AE  United Arab Emirates
AF           Afghanistan
AG   Antigua and Barbuda
AI              Anguilla
...

The mundi.countries() function is just an alias for mundi.regions(type="country"). The more generic mundi.regions() function may be used to query countries and subdivisions inside a country.

>>> br_states = mundi.regions(country="BR", type="state"); br_states  # doctest: +ELLIPSIS
                      name
id
BR-AC                 Acre
BR-AL              Alagoas
BR-AM             Amazonas
BR-AP                Amapá
BR-BA                Bahia
...
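The alias relationship can be pictured with a toy, in-memory table. This is a minimal sketch, not Mundi's implementation: the _TABLE dataframe, its four rows, and these simplified regions() and countries() functions are hypothetical, and the toy regions() simply keeps the rows whose columns match every keyword filter.

```python
import pandas as pd

# Hypothetical miniature region table (Mundi's real table is far larger).
_TABLE = pd.DataFrame(
    {
        "name": ["Brazil", "Acre", "Alagoas", "Argentina"],
        "type": ["country", "state", "state", "country"],
        "country_code": [None, "BR", "BR", None],  # empty for countries
    },
    index=pd.Index(["BR", "BR-AC", "BR-AL", "AR"], name="id"),
)


def regions(country=None, **filters):
    """Toy version: return the 'name' column of rows matching every filter."""
    mask = pd.Series(True, index=_TABLE.index)
    if country is not None:
        mask &= _TABLE["country_code"] == country
    for column, value in filters.items():
        mask &= _TABLE[column] == value
    return _TABLE.loc[mask, ["name"]]


def countries():
    """Alias: countries are simply regions whose type is 'country'."""
    return regions(type="country")


print(countries())
print(regions(country="BR", type="state"))
```

Keeping countries() as a thin wrapper means any filtering logic lives in a single place.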

If you want a single country or a single region, use the mundi.region() function, which returns a Region object that in many ways behaves like a row of a dataframe.

>>> br = mundi.region("BR"); br
Region("BR", name="Brazil")
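One way an object can behave like a dataframe row is to keep its fields in a mapping and fall back to that mapping for attribute and item access. The Region class below is a hypothetical, stdlib-only sketch that mimics the repr shown above; it is not Mundi's Region class, and the field values are taken from the examples in this README.

```python
class Region:
    """Hypothetical row-like wrapper around one region's data (illustration only)."""

    def __init__(self, region_id, **data):
        self.id = region_id
        self._data = data

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails: fall back to the row data.
        try:
            return self.__dict__["_data"][name]
        except KeyError:
            raise AttributeError(name) from None

    def __getitem__(self, key):
        # Label-based access, similar to indexing a pandas Series.
        return self._data[key]

    def __repr__(self):
        return f'Region("{self.id}", name="{self._data.get("name")}")'


br = Region("BR", name="Brazil", region="latin-america", income_group="upper-middle")
print(br)               # Region("BR", name="Brazil")
print(br.income_group)  # upper-middle
print(br["region"])     # latin-america
```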

The library creates a custom .mundi accessor that exposes additional methods not present in regular dataframes. The most important of those is the ability to extend the dataframe with additional columns available from Mundi itself or from plugins.

>>> extra = df.mundi[["region", "income_group"]]; extra  # doctest: +ELLIPSIS
                region  income_group
id
AD              europe          high
AE         middle-east          high
AF          south-asia           low
AG       latin-america          high
AI                 NaN           NaN
...
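Custom accessors like .mundi are built on pandas' extension mechanism, pd.api.extensions.register_dataframe_accessor. The sketch below registers a hypothetical accessor named toy (chosen so it cannot clash with Mundi's) that looks up columns from a hypothetical _EXTRA table by index; it only illustrates the mechanism and is not how Mundi or its plugins source their data.

```python
import pandas as pd

# Hypothetical extra data keyed by region id (values match the table above).
_EXTRA = pd.DataFrame(
    {
        "region": ["europe", "middle-east", "south-asia"],
        "income_group": ["high", "high", "low"],
    },
    index=pd.Index(["AD", "AE", "AF"], name="id"),
)


@pd.api.extensions.register_dataframe_accessor("toy")
class ToyAccessor:
    """Illustrative accessor: df.toy[cols] fetches extra columns for df's rows."""

    def __init__(self, df):
        self._df = df

    def __getitem__(self, columns):
        if isinstance(columns, str):
            columns = [columns]
        # reindex keeps the caller's row order and yields NaN for unknown ids.
        return _EXTRA.reindex(self._df.index)[list(columns)]


df = pd.DataFrame(
    {"name": ["Andorra", "United Arab Emirates", "Anguilla"]},
    index=pd.Index(["AD", "AE", "AI"], name="id"),
)
print(df.toy[["region", "income_group"]])
```

Rows missing from the lookup table (Anguilla here) come back as NaN, which matches the NaN entries in the output above.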

Each Region object also exposes those values as attributes:

>>> br.region
'latin-america'
>>> br.income_group
'upper-middle'

It is also possible to keep the columns of the original dataframe using the ellipsis syntax:

>>> df = df.mundi[..., "region", "income_group"]; df  # doctest: +ELLIPSIS
                    name         region  income_group
id
AD               Andorra         europe          high
AE  United Arab Emirates    middle-east          high
AF           Afghanistan     south-asia           low
AG   Antigua and Barbuda  latin-america          high
AI              Anguilla            NaN           NaN
...
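In Python, an expression like obj[..., "region", "income_group"] passes the tuple (Ellipsis, "region", "income_group") to obj.__getitem__, while obj[["region", "income_group"]] passes a list. The hypothetical resolve_columns helper below sketches one way such a key could be expanded into a flat column list; it is an illustration, not Mundi's actual parsing code.

```python
def resolve_columns(existing, key):
    """Expand a column selection in which ... stands for the existing columns.

    Hypothetical helper: shows one way a key such as (..., "a", "b") could be
    interpreted by an accessor's __getitem__.
    """
    if not isinstance(key, tuple):
        key = (key,)  # a single item or a list arrives un-wrapped
    resolved = []
    for item in key:
        if item is Ellipsis:
            resolved.extend(existing)  # keep the original columns, in order
        elif isinstance(item, (list, tuple)):
            resolved.extend(item)
        else:
            resolved.append(item)
    return resolved


print(resolve_columns(["name"], (..., "region", "income_group")))
# ['name', 'region', 'income_group']
print(resolve_columns(["name"], ["region", "income_group"]))
# ['region', 'income_group']
```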

The .mundi accessor can also filter rows by Mundi columns, even if those columns are not present in the original dataframe.

>>> countries = mundi.countries()
>>> countries.mundi.filter(income_group="high")  # doctest: +ELLIPSIS
                       name
id
AD                  Andorra
AE     United Arab Emirates
AG      Antigua and Barbuda
AT                  Austria
AU                Australia
...
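Filtering on a column the dataframe does not have amounts to looking that column up by index, building a boolean mask, and then discarding the lookup so only the original columns remain. The sketch below shows this with a hypothetical _EXTRA table and a hypothetical filter_rows function; it is not Mundi's filter implementation.

```python
import pandas as pd

# Hypothetical lookup data keyed by region id (values taken from this README).
_EXTRA = pd.DataFrame(
    {"income_group": ["high", "high", "low", "upper-middle"]},
    index=pd.Index(["AD", "AE", "AF", "BR"], name="id"),
)


def filter_rows(df, **criteria):
    """Keep rows of df whose looked-up values match every criterion.

    The extra columns are used only to build the mask, so the result keeps
    df's original columns. Ids missing from _EXTRA never match.
    """
    extra = _EXTRA.reindex(df.index)
    mask = pd.Series(True, index=df.index)
    for column, value in criteria.items():
        mask &= extra[column] == value
    return df[mask]


countries = pd.DataFrame(
    {"name": ["Andorra", "Afghanistan", "Anguilla", "Brazil"]},
    index=pd.Index(["AD", "AF", "AI", "BR"], name="id"),
)
print(filter_rows(countries, income_group="high"))
```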

Information

The basic data in the mundi package is centered around a table describing many world regions with the following structure:

Column        Description
id (index)    Dataframe indexes are strings and correspond to the ISO code of a region, when available.
name          Region name in English.
type          Type of region. There are too many types to list here, but it will be something like "country", "state", "municipality", etc.
subtype       A subdivision of the given type (e.g., a state can also be a "federal district").
short_code    Short code for the region. Codes are unique within a country but may repeat in other countries. For countries, this is the ISO alpha-2 code.
long_code     Alternative, longer version of the code. For countries, this is the ISO alpha-3 code. Other subregions may leave this column empty.
numeric_code  Numeric code for the region, when one exists. ISO assigns a numeric code to each country, and the official geographical bureau of each country frequently works with numeric codes too. Mundi tries to use those codes whenever possible and leaves this column empty when no numeric convention is available.
country_code  Country code of the region. If the region is itself a country, this column is empty.
parent_id     The id of the parent element. Countries are root elements and therefore leave this column empty. The parent may be an intermediate region between the current row and its country: a city, for instance, may have a parent state, which in turn has a parent country.
alt_parents   Semicolon-separated list of ids of alternative parents that do not belong to the main hierarchy.
income_group  Country classification according to the UN's income groups.
region        Region of the globe according to the UN's classification.
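Because each row stores only its immediate parent_id, the full chain up to the country is obtained by following those links until reaching a row with an empty parent_id. The stdlib sketch below does that and also splits an alt_parents field; the ids AA, AA-01, AA-01-001, AA-R1, and AA-R2 are made-up placeholders, and ancestors and split_alt_parents are hypothetical helpers, not part of Mundi.

```python
def ancestors(region_id, parents):
    """Follow parent_id links up to the root (a country has an empty parent_id).

    parents maps id -> parent_id. The 'not in chain' check stops the walk
    if the data accidentally contains a cycle.
    """
    chain = []
    parent = parents.get(region_id, "")
    while parent and parent not in chain:
        chain.append(parent)
        parent = parents.get(parent, "")
    return chain


def split_alt_parents(value):
    """Split a semicolon-separated alt_parents field into a list of ids."""
    return [item.strip() for item in value.split(";") if item.strip()]


# Hypothetical hierarchy: city -> state -> country.
parents = {"AA": "", "AA-01": "AA", "AA-01-001": "AA-01"}
print(ancestors("AA-01-001", parents))   # ['AA-01', 'AA']
print(split_alt_parents("AA-R1;AA-R2"))  # ['AA-R1', 'AA-R2']
```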