CoronaWhy/task-geo

IMPORTANT: Wrong lat, long values.

juancalvof opened this issue · 6 comments

  • Commit SHA:
    commit c4af4d7 (HEAD -> master, origin/master, origin/HEAD)
    Merge: f23824d 3de7d90
    Author: Manuel Alvarez Campo manuel@pythiac.com
    Date: Sat Apr 4 13:48:16 2020 +0200

    Merge pull request #35 from shaikh-raj/master

    Adding metadata for CDS datasource

  • Python version: 3.7

  • Operating System: Windows

  • Data source: cds

Description

Please review the lat/long values; some countries have wrong coordinates. I made this visualization to help show the problem. Click a dot to see the country and its lat/long values:
https://juancalvo.carto.com/builder/3ad41c17-bc07-4889-b047-5903300806c4/embed
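A quick first pass for spotting broken rows before plotting them is to check that every coordinate is inside the valid range (latitude in [-90, 90], longitude in [-180, 180]) and to look for rows that only become plausible when lat and long are swapped. This is a minimal sketch, assuming a DataFrame with country, lat and long columns in decimal degrees like the CDS country rows; the helper name and the sample rows are made up for illustration.

```python
import pandas as pd


def flag_bad_coordinates(df: pd.DataFrame) -> pd.DataFrame:
    """Return rows whose lat/long are missing or outside the valid ranges.

    Assumes columns named "lat" and "long" in decimal degrees
    (hypothetical helper, not part of task_geo).
    """
    # Series.between returns False for NaN, so missing values are flagged too
    lat_ok = df["lat"].between(-90, 90)
    long_ok = df["long"].between(-180, 180)
    bad = df.loc[~(lat_ok & long_ok)].copy()
    # Heuristic: values that become valid after swapping suggest that
    # lat and long were stored in each other's column
    bad["maybe_swapped"] = bad["lat"].between(-180, 180) & bad["long"].between(-90, 90)
    return bad


# Made-up sample rows
sample = pd.DataFrame({
    "country": ["A", "B", "C"],
    "lat": [40.0, 120.5, None],
    "long": [-3.7, 15.0, 10.0],
})
print(flag_bad_coordinates(sample))
```

This only catches impossible values; a coordinate that is in range but points to the wrong country still needs a comparison against reference data.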

Hi @JuanCalvoFerrandiz, thanks again for your bug report.

Would you mind telling us which data source returns these values?

Thanks!

I hope this data helps :)

Countries data.zip

This has already been reported to CoronaDataScraper: covidatlas/coronadatascraper#528

This afternoon I will try to find more cases using the data you provided, and I will also send them your visualization in case it helps.
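One way to find more cases systematically is to compare each country's coordinates with a reference point (for example the lat/lon columns of a world borders table) and flag rows that are too far away. This is a minimal sketch using the haversine great-circle distance, assuming the CDS rows have country/lat/long columns and the reference has country/lat/lon; the 500 km threshold is an arbitrary choice, since centroids of large countries can legitimately differ between sources, and the sample values are made up.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def flag_far_from_reference(data, reference, max_km=500.0):
    """Rows of `data` more than max_km away from the reference point of the
    same country (hypothetical helper; threshold is an assumption)."""
    # Only countries present in both tables are compared
    merged = data.merge(reference, on="country", how="inner", suffixes=("", "_ref"))
    merged["distance_km"] = haversine_km(merged["lat"], merged["long"],
                                         merged["lat_ref"], merged["lon"])
    return merged.loc[merged["distance_km"] > max_km,
                      ["country", "lat", "long", "lat_ref", "lon", "distance_km"]]


# Made-up values: the second row has a sign-flipped latitude
cds_rows = pd.DataFrame({"country": ["Spain", "Chile"],
                         "lat": [40.4, 35.0], "long": [-3.7, -71.5]})
reference = pd.DataFrame({"country": ["Spain", "Chile"],
                          "lat": [40.0, -35.0], "lon": [-4.0, -71.0]})
print(flag_far_from_reference(cds_rows, reference))
```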

Hi guys,

This is my exploration code for fixing the visualization: correcting the aggregation, fixing lat/long, adding ISO 3 codes and adding an official name column. Hope it helps:

import task_geo.data_sources as ds
import pandas as pd


# Returns the sorted unique values of a df column as a one-column DataFrame
def series_unique(df, column):
    unique_values = df.loc[:, column].unique()
    return pd.DataFrame(data=unique_values,
                        columns=["unique_" + column]).sort_values("unique_" + column, ignore_index=True)


# Builds a {country: value} dict from one column of df_carto
# (uses the globals df_unique_country_cl and df_carto defined below)
def create_dict(column):
    mapping = {}  # avoid shadowing the built-in dict
    for value in df_unique_country_cl.loc[:, "unique_country"]:
        mapping[value] = df_carto.loc[df_carto["country"] == value, column].iloc[0]
    return mapping

# 0_Correct aggregate values: rows with no state, county or city are country-level
data_cds = ds.cds()
data_cds.loc[(data_cds["state"].isnull()) & (data_cds["county"].isnull()) & (data_cds["city"].isnull()), "aggregate"]\
    = "country"

# Getting unique values from country column
data_cds_country_raw = data_cds.loc[(data_cds["aggregate"] == "country")]
df_unique_country = series_unique(data_cds_country_raw, "country")

# Load the world borders reference table as df_carto (raw string so the Windows backslashes stay literal)
df_carto = pd.read_csv(r"..\DATA\RAW\Countries data\world_borders.csv", sep=",")
df_carto.rename(columns={"name": "country"}, inplace=True)

# 1_Getting country_carto column
# Getting unique values from country column
df_unique_country_cl = series_unique(df_carto, "country")

# Getting values with no direct equivalence in df_carto
df_left = df_unique_country.merge(df_unique_country_cl, how='outer', indicator=True).loc[
    lambda x: x['_merge'] == 'left_only']

# CDS country names with no exact match in df_carto (sorted)
cds_names = df_left.loc[:, "unique_country"]
# Matching df_carto names; must stay in the same order as cds_names, because zip pairs them by position
carto_names = ["Brunei Darussalam", "Congo", "Czech Republic", "Cote d'Ivoire", "Timor-Leste", "Swaziland",
               "Iran (Islamic Republic of)", "Kosovo", "Lao People's Democratic Republic", "Libyan Arab Jamahiriya",
               "Republic of Moldova", "Burma", "The former Yugoslav Republic of Macedonia", "Palestine",
               "Western Sahara", "Korea, Democratic People's Republic of", "South Sudan", "Syrian Arab Republic",
               "Sao Tome and Principe", "United Republic of Tanzania", "Bahamas", "Gambia", "Holy See (Vatican City)",
               "Viet Nam"]

# Map CDS names to df_carto names (without shadowing the built-ins list/dict); unmapped names are kept as-is
name_map = dict(zip(cds_names, carto_names))
data_cds.insert(4, "country_carto", data_cds.loc[:, "country"].map(name_map).fillna(data_cds.loc[:, "country"]))

# 2_Getting iso
dict_iso = create_dict("iso3")
dict_iso["Kosovo"] = "RKS"
dict_iso["South Sudan"] = "SSD"
data_cds.insert(5, "iso3", data_cds.loc[:, "country_carto"].map(dict_iso))

# Country-level rows only; .copy() avoids pandas' SettingWithCopyWarning when lat/long are assigned below
data_cds_country = data_cds.loc[(data_cds["aggregate"] == "country")].copy()

# 3_Getting lat just in countries
dict_lat = create_dict("lat")
dict_lat["Kosovo"] = 42.667542
dict_lat["South Sudan"] = 6.8769908
data_cds_country['lat'] = data_cds_country.loc[:, "country_carto"].map(dict_lat)

# 4_Getting long just in countries
dict_long = create_dict("lon")
dict_long["Kosovo"] = 21.166191
dict_long["South Sudan"] = 31.3069782
data_cds_country['long'] = data_cds_country["country_carto"].map(dict_long)

data_cds_country.to_csv(r"C:\Users\juanc\Google Drive\CORONAWHY\DATASETS\data_cds_countries.csv", encoding="UTF-8")


[world_borders.zip](https://github.com/CoronaWhy/task-geo/files/4456383/world_borders.zip)
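In the script, the unmatched CDS names are paired with the list of world_borders names purely by position (zip), so the mapping silently shifts if the set or order of unmatched names ever changes. A minimal sketch of an alternative, assuming the corrections are kept as an explicit {cds_name: carto_name} dict; the function name harmonize_names and the sample names are hypothetical. It also reports names that still have no match, which is how gaps such as Kosovo and South Sudan (patched by hand in the script) would surface.

```python
import pandas as pd


def harmonize_names(names, overrides, reference_names):
    """Map source country names to reference names (hypothetical helper).

    Names found in `overrides` are replaced; all others are kept unchanged.
    Also returns the sorted mapped names that are still missing from
    `reference_names`, so gaps are reported instead of becoming NaN later.
    """
    mapped = names.map(overrides).fillna(names)
    missing = sorted(set(mapped) - set(reference_names))
    return mapped, missing


# Hypothetical sample data
source = pd.Series(["Vietnam", "Spain", "Kosovo"])
reference = ["Spain", "Viet Nam"]
mapped, missing = harmonize_names(source, {"Vietnam": "Viet Nam"}, reference)
print(mapped.tolist())  # ['Viet Nam', 'Spain', 'Kosovo']
print(missing)          # ['Kosovo']
```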

While reading the docs I realized that the values of the aggregation field are completely correct; we should be looking at the level field instead. More info
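If the level field is the one to trust, country rows can be selected from it directly, and comparing it with the "no state, county or city" rule from the exploration script shows where the two disagree. A minimal sketch, assuming a level column whose country rows hold the string "country"; the helper names and the sample rows are hypothetical.

```python
import pandas as pd


def country_rows(df):
    """Country-level rows selected by the level field (assumed value "country")."""
    return df.loc[df["level"] == "country"]


def level_mismatches(df):
    """Rows where the level field and the null state/county/city rule disagree."""
    heuristic = df["state"].isna() & df["county"].isna() & df["city"].isna()
    by_level = df["level"] == "country"
    return df.loc[heuristic != by_level]


# Made-up rows: the last one has no state name but is not country-level
sample = pd.DataFrame({
    "country": ["Spain", "Spain", "Spain"],
    "state": [None, "Madrid", None],
    "county": [None, None, None],
    "city": [None, None, None],
    "level": ["country", "state", "state"],
})
print(country_rows(sample))
print(level_mismatches(sample))
```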

Will upload this along with the ISO codes.
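For the ISO codes, one way to keep the manual fixes visible (the script patches Kosovo and South Sudan by hand) is to merge the lookup and the overrides into a single dict and report any country still left without a code. A minimal sketch, assuming a {country: iso3} lookup built from world_borders.csv; add_iso3 and the sample rows are hypothetical.

```python
import pandas as pd


def add_iso3(df, lookup, overrides=None):
    """Return a copy of df with an iso3 column, plus the sorted countries
    that got no code (hypothetical helper)."""
    codes = dict(lookup)
    codes.update(overrides or {})  # manual fixes win over the lookup
    out = df.copy()
    out["iso3"] = out["country"].map(codes)
    unmapped = sorted(out.loc[out["iso3"].isna(), "country"].unique())
    return out, unmapped


# Hypothetical sample: Kosovo is simply absent from this small lookup
rows = pd.DataFrame({"country": ["Spain", "South Sudan", "Kosovo"]})
out, unmapped = add_iso3(rows, {"Spain": "ESP", "Chile": "CHL"},
                         overrides={"South Sudan": "SSD"})
print(out)
print(unmapped)
```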

Update from CDS team:

@ManuelAlvarezC @JuanCalvoFerrandiz we are soon migrating to totally different coordinates, calculated in country-levels. https://github.com/hyperknot/country-levels

Please review if this issue is still present in a few days.

Source: covidatlas/coronadatascraper#528 (comment)