Small script to import the CSV files from https://datacatalog.worldbank.org/dataset/world-development-indicators into an SQLite database.
The dataset is licensed under CC-BY 4.0.
- Download the CSV zip (in the above link, go to Data & Resources, then select CSV)
- run create_db.py
The resulting database is around 600MB (7 million indicator values).
select count(*) as num_countries from wdi_country
num_countries | |
---|---|
0 | 263 |
select country_code, long_name, currency_unit from wdi_country limit 5
country_code | long_name | currency_unit | |
---|---|---|---|
0 | ABW | Aruba | Aruban florin |
1 | AFG | Islamic State of Afghanistan | Afghan afghani |
2 | AGO | People's Republic of Angola | Angolan kwanza |
3 | ALB | Republic of Albania | Albanian lek |
4 | AND | Principality of Andorra | Euro |
30 random indictaors
select series_code, indicator_name, substr(long_definition, 0, 500)||'...' as definition from wdi_series order by random() limit 30
series_code | indicator_name | definition | |
---|---|---|---|
0 | NY.GNP.MKTP.PP.KD | GNI, PPP (constant 2017 international $) | PPP GNI (formerly PPP GNP) is gross national income (GNI) converted to international dollars using purchasing power parity rates. An international dollar has the same purchasing power over GNI as a U.S. dollar has in the United States. Gross national income is the sum of value added by all resident producers plus any product taxes (less subsidies) not included in the valuation of output plus net receipts of primary income (compensation of employees and property income) from abroad. Data are in ... |
1 | TM.VAL.MRCH.CD.WT | Merchandise imports (current US$) | Merchandise imports show the c.i.f. value of goods received from the rest of the world valued in current U.S. dollars.... |
2 | SE.XPD.CPRM.ZS | Current education expenditure, primary (% of total expenditure in primary public institutions) | Current expenditure is expressed as a percentage of direct expenditure in public educational institutions (instructional and non-instructional) of the specified level of education. Financial aid to students and other transfers are excluded from direct expenditure. Current expenditure is consumed within the current year and would have to be renewed if needed in the following year. It includes staff compensation and current expenditure other than for staff compensation (ex. on teaching materials,... |
3 | DT.ODA.ODAT.GI.ZS | Net ODA received (% of gross capital formation) | Net official development assistance (ODA) consists of disbursements of loans made on concessional terms (net of repayments of principal) and grants by official agencies of the members of the Development Assistance Committee (DAC), by multilateral institutions, and by non-DAC countries to promote economic development and welfare in countries and territories in the DAC list of ODA recipients. It includes loans with a grant element of at least 25 percent (calculated at a rate of discount of 10 per... |
4 | SE.SEC.ENRL.GC.FE.ZS | Secondary education, general pupils (% female) | Secondary general pupils are the number of secondary students enrolled in general education programs, including teacher training.... |
5 | SE.SEC.CUAT.UP.ZS | Educational attainment, at least completed upper secondary, population 25+, total (%) (cumulative) | The percentage of population ages 25 and over that attained or completed upper secondary education.... |
6 | SP.URB.TOTL | Urban population | Urban population refers to people living in urban areas as defined by national statistical offices. It is calculated using World Bank population estimates and urban ratios from the United Nations World Urbanization Prospects. Aggregation of urban and rural population may not add up to total population because of different country coverages.... |
7 | SP.M15.2024.FE.ZS | Women who were first married by age 15 (% of women ages 20-24) | Women who were first married by age 15 refers to the percentage of women ages 20-24 who were first married by age 15.... |
8 | SP.RUR.TOTL.ZG | Rural population growth (annual %) | Rural population refers to people living in rural areas as defined by national statistical offices. It is calculated as the difference between total population and urban population.... |
9 | NE.GDI.FTOT.CN | Gross fixed capital formation (current LCU) | Gross fixed capital formation (formerly gross domestic fixed investment) includes land improvements (fences, ditches, drains, and so on); plant, machinery, and equipment purchases; and the construction of roads, railways, and the like, including schools, offices, hospitals, private residential dwellings, and commercial and industrial buildings. According to the 1993 SNA, net acquisitions of valuables are also considered capital formation. Data are in current local currency.... |
10 | IC.FRM.OUTG.ZS | Value lost due to electrical outages (% of sales for affected firms) | Average losses due to electrical outages, as percentage of total annual sales. The value represents average losses for all firms which reported outages (please see indicator IC.ELC.OUTG.ZS).... |
11 | EP.PMP.SGAS.CD | Pump price for gasoline (US$ per liter) | Fuel prices refer to the pump prices of the most widely sold grade of gasoline. Prices have been converted from the local currency to U.S. dollars.... |
12 | EG.ELC.LOSS.ZS | Electric power transmission and distribution losses (% of output) | Electric power transmission and distribution losses include losses in transmission between sources of supply and points of distribution and in the distribution to consumers, including pilferage.... |
13 | IQ.SCI.PRDC | Periodicity and timeliness assessment of statistical capacity (scale 0 - 100) | The periodicity and timeliness indicator assesses the availability and periodicity of key socioeconomic indicators. It measures the extent to which data are made accessible to users through transformation of source data into timely statistical outputs. The periodicity score is calculated as the weighted average of 10 underlying indicator scores. The final periodicity score contributes 1/3 of the overall Statistical Capacity Indicator score.... |
14 | SE.SEC.CUAT.UP.FE.ZS | Educational attainment, at least completed upper secondary, population 25+, female (%) (cumulative) | The percentage of population ages 25 and over that attained or completed upper secondary education.... |
15 | IC.TAX.METG | Number of visits or required meetings with tax officials (average for affected firms) | Average number of visits or required meetings with tax officials during the year. The value represents the average number of visits for all firms which reported being visited or required to meet with tax officials (please see indicator IC.FRM.METG.ZS).... |
16 | SL.TLF.0714.WK.MA.TM | Average working hours of children, working only, male, ages 7-14 (hours per week) | Average working hours of children working only refers to the average weekly working hours of those children who are involved in economic activity and not attending school.... |
17 | SP.POP.TOTL.FE.ZS | Population, female (% of total population) | Female population is the percentage of the population that is female. Population is based on the de facto definition of population, which counts all residents regardless of legal status or citizenship.... |
18 | SP.DYN.TO65.MA.ZS | Survival to age 65, male (% of cohort) | Survival to age 65 refers to the percentage of a cohort of newborn infants that would survive to age 65, if subject to age specific mortality rates of the specified year.... |
19 | EG.ELC.PETR.ZS | Electricity production from oil sources (% of total) | Sources of electricity refer to the inputs used to generate electricity. Oil refers to crude oil and petroleum products.... |
20 | SE.ADT.1524.LT.FM.ZS | Literacy rate, youth (ages 15-24), gender parity index (GPI) | Gender parity index for youth literacy rate is the ratio of females to males ages 15-24 who can both read and write with understanding a short simple statement about their everyday life.... |
21 | CM.MKT.TRAD.GD.ZS | Stocks traded, total value (% of GDP) | The value of shares traded is the total number of shares traded, both domestic and foreign, multiplied by their respective matching prices. Figures are single counted (only one side of the transaction is considered). Companies admitted to listing and admitted to trading are included in the data. Data are end of year values.... |
22 | EN.CO2.MANF.ZS | CO2 emissions from manufacturing industries and construction (% of total fuel combustion) | CO2 emissions from manufacturing industries and construction contains the emissions from combustion of fuels in industry. The IPCC Source/Sink Category 1 A 2 includes these emissions. However, in the 1996 IPCC Guidelines, the IPCC category also includes emissions from industry autoproducers that generate electricity and/or heat. The IEA data are not collected in a way that allows the energy consumption to be split by specific end-use and therefore, autoproducers are shown as a separate item (Un... |
23 | SH.TBS.DTEC.ZS | Tuberculosis case detection rate (%, all forms) | Tuberculosis case detection rate (all forms) is the number of new and relapse tuberculosis cases notified to WHO in a given year, divided by WHO's estimate of the number of incident tuberculosis cases for the same year, expressed as a percentage. Estimates for all years are recalculated as new information becomes available and techniques are refined, so they may differ from those published previously.... |
24 | FX.OWN.TOTL.FE.ZS | Account ownership at a financial institution or with a mobile-money-service provider, female (% of population ages 15+) | Account denotes the percentage of respondents who report having an account (by themselves or together with someone else) at a bank or another type of financial institution or report personally using a mobile money service in the past 12 months (female, % age 15+).... |
25 | SE.SEC.UNER.LO.FE.ZS | Adolescents out of school, female (% of female lower secondary school age) | Adolescents out of school are the percentage of lower secondary school age adolescents who are not enrolled in school.... |
26 | FX.OWN.TOTL.YG.ZS | Account ownership at a financial institution or with a mobile-money-service provider, young adults (% of population ages 15-24) | Account denotes the percentage of respondents who report having an account (by themselves or together with someone else) at a bank or another type of financial institution or report personally using a mobile money service in the past 12 months (young adults, % of population ages 15-24).... |
27 | per_lm_alllm.ben_q1_tot | Benefit incidence of unemployment benefits and ALMP to poorest quintile (% of total U/ALMP benefits) | Benefit incidence of unemployment benefits and active labor market programs (ALMP) to poorest quintile shows the percentage of total unemployment and active labor market programs benefits received by the poorest 20% of the population. Unemployment benefits and active labor market programs include unemployment compensation, severance pay, and early retirement due to labor market reasons, labor market services (intermediation), training (vocational, life skills, and cash for training), job rotati... |
28 | HD.HCI.OVRL.FE | Human capital index (HCI), female (scale 0-1) | The HCI calculates the contributions of health and education to worker productivity. The final index score ranges from zero to one and measures the productivity as a future worker of child born today relative to the benchmark of full health and complete education.... |
29 | BX.GSR.CCIS.CD | ICT service exports (BoP, current US$) | Information and communication technology service exports include computer and communications services (telecommunications and postal and courier services) and information services (computer data and news-related service transactions). Data are in current U.S. dollars.... |
Here's a more complex query: Get the newest data for the youth literacy rate indicator (from 2010 or later), then show the 20 countries with the lowest rate.
with newest_data as (
select country_code, indicator_code, max(year) as year from wdi_data
where
indicator_code = (select series_code from wdi_series where indicator_name = 'Literacy rate, youth total (% of people ages 15-24)')
and year > 2010
group by country_code
)
select c.long_name as country, printf('%.1f %%', value) as "Youth Literacy Rate"
from wdi_data, newest_data
join wdi_country c on c.country_code = wdi_data.country_code
where wdi_data.indicator_code = newest_data.indicator_code and wdi_data.country_code = newest_data.country_code and wdi_data.year = newest_data.year
order by value asc limit 20
country | Youth Literacy Rate | |
---|---|---|
0 | Republic of Chad | 30.8 % |
1 | Central African Republic | 38.3 % |
2 | Republic of Niger | 43.5 % |
3 | Republic of Guinea | 46.3 % |
4 | Republic of South Sudan | 47.9 % |
5 | Republic of Mali | 50.1 % |
6 | Republic of Liberia | 55.4 % |
7 | Burkina Faso | 58.3 % |
8 | Republic of Côte d'Ivoire | 58.4 % |
9 | Republic of Guinea-Bissau | 60.4 % |
10 | Republic of Benin | 60.9 % |
11 | Islamic Republic of Mauritania | 63.9 % |
12 | Islamic State of Afghanistan | 65.4 % |
13 | Republic of Sierra Leone | 66.6 % |
14 | Republic of The Gambia | 67.2 % |
15 | Republic of Senegal | 69.5 % |
16 | Republic of Mozambique | 70.9 % |
17 | Federal Democratic Republic of Ethiopia | 72.8 % |
18 | Republic of Malawi | 72.9 % |
19 | Republic of the Sudan | 73.0 % |
select * from sqlite_master
type | name | tbl_name | rootpage | sql | |
---|---|---|---|---|---|
0 | table | wdi_data | wdi_data | 2 | CREATE TABLE "wdi_data" ( |
"indicator_code" TEXT, | |||||
"country_code" TEXT, | |||||
"year" INTEGER, | |||||
"value" REAL | |||||
) | |||||
1 | table | wdi_country | wdi_country | 3 | CREATE TABLE "wdi_country" ( |
"country_code" TEXT, | |||||
"short_name" TEXT, | |||||
"table_name" TEXT, | |||||
"long_name" TEXT, | |||||
"2-alpha_code" TEXT, | |||||
"currency_unit" TEXT, | |||||
"special_notes" TEXT, | |||||
"region" TEXT, | |||||
"income_group" TEXT, | |||||
"wb-2_code" TEXT, | |||||
"national_accounts_base_year" TEXT, | |||||
"national_accounts_reference_year" REAL, | |||||
"sna_price_valuation" TEXT, | |||||
"lending_category" TEXT, | |||||
"other_groups" TEXT, | |||||
"system_of_national_accounts" TEXT, | |||||
"alternative_conversion_factor" TEXT, | |||||
"balance_of_payments_manual_in_use" TEXT, | |||||
"external_debt_reporting_status" TEXT, | |||||
"system_of_trade" TEXT, | |||||
"government_accounting_concept" TEXT, | |||||
"imf_data_dissemination_standard" TEXT, | |||||
"latest_population_census" TEXT, | |||||
"latest_household_survey" TEXT, | |||||
"source_of_most_recent_income_and_expenditure_data" TEXT, | |||||
"vital_registration_complete" TEXT, | |||||
"latest_agricultural_census" TEXT, | |||||
"latest_industrial_data" REAL, | |||||
"latest_trade_data" REAL | |||||
) | |||||
2 | table | wdi_country_series | wdi_country_series | 4 | CREATE TABLE "wdi_country_series" ( |
"countrycode" TEXT, | |||||
"seriescode" TEXT, | |||||
"description" TEXT | |||||
) | |||||
3 | table | wdi_footnote | wdi_footnote | 5 | CREATE TABLE "wdi_footnote" ( |
"countrycode" TEXT, | |||||
"seriescode" TEXT, | |||||
"year" TEXT, | |||||
"description" TEXT | |||||
) | |||||
4 | table | wdi_series | wdi_series | 6 | CREATE TABLE "wdi_series" ( |
"series_code" TEXT, | |||||
"topic" TEXT, | |||||
"indicator_name" TEXT, | |||||
"short_definition" TEXT, | |||||
"long_definition" TEXT, | |||||
"unit_of_measure" TEXT, | |||||
"periodicity" TEXT, | |||||
"base_period" TEXT, | |||||
"other_notes" TEXT, | |||||
"aggregation_method" TEXT, | |||||
"limitations_and_exceptions" TEXT, | |||||
"notes_from_original_source" TEXT, | |||||
"general_comments" TEXT, | |||||
"source" TEXT, | |||||
"statistical_concept_and_methodology" TEXT, | |||||
"development_relevance" TEXT, | |||||
"related_source_links" TEXT, | |||||
"related_indicators" TEXT, | |||||
"license_type" TEXT | |||||
) | |||||
5 | table | wdi_series_time | wdi_series_time | 7 | CREATE TABLE "wdi_series_time" ( |
"seriescode" TEXT, | |||||
"year" TEXT, | |||||
"description" TEXT | |||||
) | |||||
6 | index | ix_wdi_data_indicator_code_country_code_year | wdi_data | 8 | CREATE INDEX "ix_wdi_data_indicator_code_country_code_year"ON "wdi_data" ("indicator_code","country_code","year") |