fodr
is an R package to access various French Open Data portals.
Many of those portals use the OpenDataSoft platform to make their data available and this platform can be accessed with the OpenDataSoft APIs.
fodr
wraps this API to make it easier to retrieve data directly in R.
The devtools
package is needed to install fodr
:
devtools::install_github("tutuchan/fodr")
The following portals are currently available with fodr
:
library(fodr)
list_portals()
#> # A tibble: 15 x 3
#> name portals base_urls
#> <chr> <chr> <chr>
#> 1 RATP ratp http://data.ratp.fr
#> 2 Région Ile-de-France iledefra… http://data.iledefrance.…
#> 3 Infogreffe infogref… http://datainfogreffe.fr
#> 4 Toulouse Métropole toulouse https://data.toulouse-me…
#> 5 STAR star https://data.explore.sta…
#> 6 Issy-les-Moulineaux issy http://data.issy.com
#> 7 STIF stif http://opendata.stif.info
#> 8 Paris paris http://opendata.paris.fr
#> 9 Tourisme Alpes-Maritimes 04 http://tourisme04.openda…
#> 10 Tourisme Pas-de-Calais 62 http://tourisme62.openda…
#> 11 Département des Hauts-de-Seine 92 https://opendata.hauts-d…
#> 12 Ministère de l'Education Nationale,… enesr http://data.enseignement…
#> 13 ERDF erdf https://data.erdf.fr
#> 14 RTE rte https://opendata.rte-fra…
#> 15 OpenDataSoft Public ods https://public.opendatas…
The portals have been identified from the Open Data Inception website. Many of these portals do not actually contain data and a large number of them are available via the ArcGIS Open platform. This API will be supported in a future release.
Use the fodr_portal
function with the corresponding fodr slug to
create a FODRPortal
object:
library(fodr)
portal <- fodr_portal("paris")
portal
#> FODRPortal object
#> ---------------------------------------------------------------
#> Portal: paris
#> Number of datasets: 249
#> Themes:
#> - Administration et Finances Publiques
#> - Citoyenneté
#> - Commerces
#> - Culture
#> - Environnement
#> - Equipements, Services, Social
#> - Mobilité et Espace Public
#> - Services
#> - Urbanisme et Logements
#> ---------------------------------------------------------------
The search
method allows you to find datasets on this portal (see the
function documentation for more information). By default, and contrary
to the Open Data Soft API, all elements satisfying the search are
returned.
Let’s look at the datasets that contain the word vote:
list_datasets <- portal$search(q = "vote")
#> 36 datasets found ...
list_datasets[[1]]
#> FODRDataset object
#> ---------------------------------------------------------------
#> Dataset id: secteurs-des-bureaux-de-vote
#> Theme: Citoyenneté
#> Keywords: bureau de vote, elections, votes, suffrages
#> Publisher: Mairie de Paris / Direction de la Démocratie, des Citoyens et des Territoires
#> ---------------------------------------------------------------
#> Number of records: 896
#> Number of files: 0
#> Modified: 2017-03-30
#> Sortables: objectid, nbr_elect_f, nbr_elect_e_m, nbr_elect_e_e, nbr_elect_l12, arrondissement, num_bv
#> ---------------------------------------------------------------
#> Description:
#> Sectionnement des bureaux de vote en vigueur à partir du 01 mars 2017Donnée initialement en NTF Lambert Zone I(EPSG : 27561)et reprojetée en RGF 93 Lambert 93(EPSG : 2154) Représentation du sectionnement des bureaux de vote, applicable à partir du 1emars 2017. La représentation du sectionnement ne se calque pas sur le bâti ou sur le parcellaire car c’est le rattachement au point-adresse qui est pris en considération.
#> ---------------------------------------------------------------
library(magrittr)
list_culture_datasets <- portal$search(theme = "Culture")
#> 249 datasets found ...
lapply(list_culture_datasets, function(dataset) dataset$info$metas$theme) %>%
unlist() %>%
unique()%>%
sort()
#> [1] "Culture"
dts <- list_datasets[[1]]
dts$get_records()
#> # A tibble: 896 x 14
#> shape_area objectid nbr_elect_l12 arrondissement validite shape_len
#> <dbl> <int> <int> <int> <chr> <dbl>
#> 1 0 4 0 19 oui 0
#> 2 0 5 0 19 oui 0
#> 3 0 9 0 19 oui 0
#> 4 0 10 0 19 oui 0
#> 5 0 12 0 19 oui 0
#> 6 0 17 0 19 oui 0
#> 7 0 19 0 19 oui 0
#> 8 0 21 0 19 oui 0
#> 9 0 22 0 19 non 0
#> 10 0 29 0 19 non 0
#> # ... with 886 more rows, and 8 more variables: nbr_elect_f <int>,
#> # nbr_elect_e_m <int>, nbr_elect_e_e <int>, num_bv <int>, id_bv <chr>,
#> # lng <dbl>, lat <dbl>, geo_shape <list>
dts <- list_datasets[[1]]
dts$get_records(nrows = dts$info$metas$records_count, refine = list(validite = "oui"))
#> # A tibble: 54 x 14
#> shape_area objectid nbr_elect_l12 arrondissement validite shape_len
#> <dbl> <int> <int> <int> <chr> <dbl>
#> 1 0 4 0 19 oui 0
#> 2 0 5 0 19 oui 0
#> 3 0 9 0 19 oui 0
#> 4 0 10 0 19 oui 0
#> 5 0 12 0 19 oui 0
#> 6 0 17 0 19 oui 0
#> 7 0 19 0 19 oui 0
#> 8 0 21 0 19 oui 0
#> 9 0 33 0 19 oui 0
#> 10 0 45 0 19 oui 0
#> # ... with 44 more rows, and 8 more variables: nbr_elect_f <int>,
#> # nbr_elect_e_m <int>, nbr_elect_e_e <int>, num_bv <int>, id_bv <chr>,
#> # lng <dbl>, lat <dbl>, geo_shape <list>
Some datasets have attached files in a pdf, docx, xlsx, … format. These
can be retrieved using the get_attachments
method:
dts <- fodr_dataset("erdf", "coefficients-des-profils")
dts$get_attachments("DictionnaireProfils_1JUIL18.xlsb")
Some datasets have geographical information on each data point.
For these datasets, two additional columns will be present when fetching
records: lng
and lat
that correspond to the longitude and latitude
of the coordinates of the data point. Additionally, if there are shapes
associated to data points (polygons or linestrings for example), they
will be stored in the geo_shape
column either as a list of
data.frame
s with the same two columns lng
and lat
or in a list
sf
objects if sf
package is already installed. The latter allows a
straigtforward way to plot geometric data.
See for example the following dataset:
dts <- fodr_dataset("stif", "gares-routieres-idf")
dfRecords <- dts$get_records(nrows = 10)
dfRecords
#> # A tibble: 10 x 14
#> gr_id dpt_id lda_nom gare_nom zdl_id insee_txt gr_nom comm_nom zdl_nom
#> <int> <int> <chr> <chr> <int> <chr> <chr> <chr> <chr>
#> 1 22 95 Argent… ARGENTE… 47875 95018 Argen… Argente… Argent…
#> 2 183 77 Combs-… COMBS-L… 45771 77122 Combs… Combs-l… Combs-…
#> 3 567 77 Nemour… NEMOURS… 43245 77431 Nemou… Saint-P… Nemour…
#> 4 338 78 Houill… HOUILLE… 47439 78311 Houil… Houilles Houill…
#> 5 854 91 Vigneu… VIGNEUX… 45735 91657 Vigne… Vigneux… Vigneu…
#> 6 571 94 Nogent… NOGENT-… 46552 94058 Nogen… Le Perr… Nogent…
#> 7 615 95 Persan… PERSAN-… 43178 95487 Persa… Persan Persan…
#> 8 865 77 Villep… VILLEPA… 46725 77294 Ville… Mitry-M… Villep…
#> 9 758 93 Saint-… SAINT-O… 43203 93070 Saint… Saint-O… Saint-…
#> 10 690 77 Provin… PROVINS 47181 77379 Provi… Provins Provin…
#> # ... with 5 more variables: acces_pmr <chr>, lda_id <int>, lng <dbl>,
#> # lat <dbl>, geo_shape <list>
You can then use leaflet to easily plot this data on a map, either using data points:
library(leaflet)
leaflet(dfRecords) %>%
addProviderTiles("CartoDB.Positron") %>%
addMarkers(popup = ~gare_nom)
or using the geo_shape
column:
library(sf)
dfRecords[1,] %>%
st_as_sf() %>%
leaflet() %>%
addProviderTiles("CartoDB.Positron") %>%
addPolygons(label = ~gare_nom)
Most of the data is available under the Open Licence (english PDF version) but double check if you are unsure.
- handle portals that require authentification,
- handle ArcGIS-powered portals,
- possibly handle navitia.io portals,
- ?