A way to convert `code_muni` into any other higher level aggregations
rafalopespx opened this issue · 17 comments
Hello there,
Is there a quick way to find out, from a `code_muni`, which health region (macro or micro) a municipality belongs to?
This is tricky if you rely only on the code. Although `code_muni` carries the state and region information, you cannot derive a `code_health_region` or `code_health_macrorregion` just by taking the first 5 digits of `code_muni`.
A workaround I developed is to use one of the relational tables from DataSUS to add `code_health_region` and/or `code_health_macrorregion` to a data.frame that has `code_muni`. Maybe this could be implemented in the `read_municipality` function, as a parameter the user passes to get back codes other than `code_muni`.
Thanks for the amazing package!
Adding to this: DataSUS's FTP has a registry of municipalities that were moved from one health region to another, as well as a table that records these changes and relates health regions to municipalities by `code_muni` and `code_health_region`.
I have the relational database and can send it to anyone who will be working on this conversion; please email me.
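The workaround described above boils down to a table join. Here is a minimal base R sketch, not the actual DataSUS table: the `muni_to_region` data.frame, its `code_muni6` column, and the region codes in it are all hypothetical placeholders. It also assumes the relational table is keyed on the 6-digit municipality code (the 7-digit IBGE code without its last digit), which should be checked against the real file.

```r
# Municipalities identified by the 7-digit IBGE code
munis <- data.frame(
  code_muni = c(1100015, 1100023, 1100049),
  name_muni = c("Alta Floresta D'Oeste", "Ariquemes", "Cacoal")
)

# HYPOTHETICAL relational table: column names and region codes are placeholders,
# and the 6-digit key is an assumption about the DataSUS file
muni_to_region <- data.frame(
  code_muni6               = c(110001, 110002, 110004),
  code_health_region       = c(99001, 99002, 99001),
  code_health_macrorregion = c(9901, 9901, 9901)
)

# The state code can be read off the first two digits of code_muni ...
munis$code_state <- munis$code_muni %/% 100000

# ... but the health region cannot, so it has to come from a join.
# Drop the last digit to get the 6-digit key, then left-join.
munis$code_muni6 <- munis$code_muni %/% 10
out <- merge(munis, muni_to_region, by = "code_muni6", all.x = TRUE)

print(out)
```

With `all.x = TRUE`, municipalities missing from the relational table are kept with `NA` region codes instead of being dropped silently.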
Hi @rafalopespx. Thanks for opening this issue. I like the idea of a table that associates each `code_muni` with the codes of other geographical units. Currently, the closest we have is this:
```r
df <- geobr::lookup_muni(name_muni = 'all')
head(df)
#>   code_muni             name_muni code_state name_state abbrev_state code_micro
#> 1   1100015 Alta Floresta D'Oeste         11   Rondônia           RO      11006
#> 2   1100023             Ariquemes         11   Rondônia           RO      11003
#> 3   1100031                Cabixi         11   Rondônia           RO      11008
#> 4   1100049                Cacoal         11   Rondônia           RO      11006
#> 5   1100056            Cerejeiras         11   Rondônia           RO      11008
#> 6   1100064     Colorado do Oeste         11   Rondônia           RO      11008
#>          name_micro code_meso         name_meso code_immediate name_immediate
#> 1            Cacoal      1102 Leste Rondoniense         110005         Cacoal
#> 2         Ariquemes      1102 Leste Rondoniense         110002      Ariquemes
#> 3 Colorado do Oeste      1102 Leste Rondoniense         110006        Vilhena
#> 4            Cacoal      1102 Leste Rondoniense         110005         Cacoal
#> 5 Colorado do Oeste      1102 Leste Rondoniense         110006        Vilhena
#> 6 Colorado do Oeste      1102 Leste Rondoniense         110006        Vilhena
#>   code_intermediate name_intermediate
#> 1              1102         Ji-Paraná
#> 2              1101       Porto Velho
#> 3              1102         Ji-Paraná
#> 4              1102         Ji-Paraná
#> 5              1102         Ji-Paraná
#> 6              1102         Ji-Paraná
```
We could probably try to add the health region codes to this output. Note, however, that this output refers to the year 2010. I'm planning to update the function to include the 2022 Census soon.
Hi everyone! I am working on something similar here, trying to make all Brazilian Territorial Divisions (DTB) from IBGE compatible with each other.
https://github.com/rfsaldanha/rdtb
It is at a very early stage of development, but the goal is to track municipality changes over time and space.
Hi @rfsaldanha , thanks for the ping. It looks like you are trying to create a correspondence table for each year. Correct?
Yes, to have the corresponding DTB for each year. The problem is that the official IBGE DTB does not agree with the also official IBGE spatial dataset of municipalities of the same year. :-(
That's a problem, indeed. Which one should we trust, the spatial data or the table data?
I think the spatial data is more widely used, and therefore more trusted…
One thing I have run into as well: sometimes the spatial data do not agree with other levels or years of the same spatial data. If we take all municipalities, together they should cover the country shapefile or the state shapefile, but for some years they do not.
A "tidy" geobr with topological validation of spatial features would be interesting.
Hi both. Before we make any data available in geobr, we process it to harmonize column names, projections, etc., and we already "fix" the topology by applying `sf::st_make_valid()`. I understand this only fixes topological errors to some extent, though.
For instance, here is the problem mentioned by @rafalopespx: the total area of the country polygon is not the same as the sum of the areas of the states. This is an inconsistency (a small one, in this case) in the original IBGE data, and there is not much we can do about it. My impression is that any attempt to solve it should be made by IBGE in the original raw data.
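```r
library(geobr)
library(sf)

options(scipen = 999)  # print areas in full rather than in scientific notation

# Country and state polygons for 2010, without simplification
country <- read_country(year = 2010, simplified = FALSE)
states <- read_state(year = 2010, simplified = FALSE)

area_c <- st_area(country)
area_s <- st_area(states) |> sum()

area_c
#> 8535238245979 [m^2]
area_s
#> 8535240429377 [m^2]
```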
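To put a number on how small this inconsistency is, the two areas printed above can be compared directly. This is only arithmetic on the reported values, so it does not need geobr or sf; the variable names are illustrative.

```r
# Areas reported above, in square metres
area_country <- 8535238245979
area_states  <- 8535240429377

# Absolute gap between the sum of the states and the country polygon
diff_m2 <- area_states - area_country

# The same gap in km^2 and as a fraction of the country area
diff_km2 <- diff_m2 / 1e6
rel_diff <- diff_m2 / area_country

diff_m2
diff_km2
rel_diff
```

The gap is about 2.2 km², less than one part per million of the country's area.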
I totally agree with you on this, @rafapereirabr. As I recall, I'm not so sure the topological data is better than the table data. And I fully agree that this is a problem for IBGE to fix.
One thing that would be really helpful is table data relating any `code_muni` to any code at a higher spatial level, as mentioned before, e.g. `code_health_macrorregion` and `code_health_region`. I think the fastest solution is to add these columns to the lookup table, which would make it possible to generate and relate the codes across different aggregations.
I agree, ideally, we would have all columns added to the output of the lookup table.
P.S. Do you know where to find the list of municipalities in each health region and macro region? I haven't found this table anywhere. It is possible to determine this with a spatial join, but the original boundaries of health regions don't match those of municipalities, which creates some strange results (like a municipality from one state being 'included' in the macro health region of another state).
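The strange results mentioned above can at least be detected after the fact by comparing the state of each municipality with the state of the health region it was matched to. The sketch below uses a hypothetical `joined` data.frame standing in for the output of a spatial join; its column names and region codes are illustrative placeholders, not geobr output.

```r
# HYPOTHETICAL output of a spatial join between municipalities and health
# regions; the region codes are placeholders
joined <- data.frame(
  code_muni           = c(1100015, 1100023, 1200013),
  abbrev_state        = c("RO", "RO", "AC"),
  code_health_region  = c(99001, 99001, 99001),
  abbrev_state_region = c("RO", "RO", "RO")
)

# Flag municipalities assigned to a health region in a different state
suspicious <- joined[joined$abbrev_state != joined$abbrev_state_region, ]

print(suspicious)
```

A check like this only flags cross-state mismatches. Wrong assignments within the same state would go unnoticed, which is one more reason to prefer an official correspondence table.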
I have it. Where can I send it to you?
Thanks, @rafalopespx . You can send it to rafa.pereira.br [at] gmail.com.
However, ideally we would prefer an official document or dataset with this information, with a URL we can cite.
Okay, I have found the relational data table in a few different services from the Ministry of Health. You can download it directly from ftp://ftp.datasus.gov.br/territorio/tabelas/base_territorial.zip, on the DataSUS FTP server; that path points to the latest version.
Or you can download it from https://datasus.saude.gov.br/transferencia-de-arquivos/ by selecting Base Territorial under Fonte, then Bases Territoriais under Modalidade, and finally Bases Territoriais under Tipo de Arquivo.
If neither works, I can send you the latest one I downloaded, but I think downloading from there guarantees you get the official, latest version, and it can easily be incorporated into a function.
Hey, we keep an up-to-date directory of the codes a municipality can be related to at Base dos Dados.
It also includes other codes besides health regions.
You can either download it from the website or load it in R:
```r
install.packages("basedosdados")
library("basedosdados")

# Set your Google Cloud project
set_billing_id("<YOUR_PROJECT_ID>")

# Load the data directly into R
query <- bdplyr("br_bd_diretorios_brasil.municipio")
df <- bd_collect(query)
```