The main repository of this package is stored in codeberg.org/teoten/mapic
A mirror is kept under github.com/teotenn/mapic for major releases only, but it might be removed any time. We recommend using the main repository.
Mapic is an opinionated R package with the single purpose of creating geographic maps showing uni variate numerical data as dots. Time can be considered a second variable that can be visualized by creating several maps, resulting in dynamic maps as video or gif files.
Its power lies in the possibility of mapping any region in the world that can be found on open street map. This can be achieved with other packages however, each has its own limitations. The most powerful R packages for the creation of maps are also very big, and often require numerous dependencies, making them less portable. Although they can achieve the same as mapic, their purpose is different. Other more simple packages exist but they contain data for particular countries or regions only. Mapic is positioned in a middle point between these two kinds of packages.
The target of mapic is to make the process of creation of the maps simple, while keeping the quality standardized. During the creation of a map, there are many variables to consider and in many different areas (geographical, stylistic, data-related, etc.). Mapic reduces the complexity and gives the user the possibility of keeping the same standards among teams or when creating several maps.
Mapic relies on 2 main packages to achieve its purpose: maps by Richard A. Becker and Allan R. Wilks, and the well known ggplot2 by Hadley Wickham and collaborators.
The documentation for its usage is still to be improved. In the meantime, details of the development and the concept can be found in the blog https://blog.teoten.com in the series maps-app.
The project is under the GNU GENERAL PUBLIC LICENSE Version 3 (GPL-3). Your are free to copy it and reproduce it with no guarantee.
It is recommended to have the following packages pre-installed: R (>= 4.0.0), maps (>= 3.3.0), ggplot2, httr, stringr (>= 1.5.0), dplyr, RJSONIO, purrr, tidyr. Depending on the database type to be used, the following packages could be also necessary: RSQLite (>= 2.2.3), RPostgreSQL.
The easiest way to install for now is to clone the repository and install from source. You can use devtools
to do it in the simplest way:
devtools::install("/path_to_mapic/mapic", quick = TRUE, build = FALSE, keep_source = TRUE)
You can also try `devtools::install_git(“https://codeberg.org/teoten/mapic”)`, or download the compressed files from the tags and/or releases.
A basic workflow goes from the raw data to plot, still without coordinates, to the creation of the maps. The main aim of mapic is to make the process easy and simple, as you will see in the following lines. The basic workflow also shows the main usage of the package and its functions.
We start showing an example, an imaginary dataset of Mexican cities. You can call it from the library data as mexico
.
Suppose that we have a company that opened 10 offices in the country between 2001 and 2010, one each year. It went bankrupt in 2021 and all the offices had to close. We want to get a map of the country showing dots on each city where there was an office for a selected year. That is the main purpose of mapic.
Here is how the data looks like:
library(mapic)
data(mexico)
mexico[,c(-3, -9)]
id name registration_year year_end country region 1 1 org1 2001 2021 MX Mexico 2 2 org2 2002 2021 MX Baja California Norte 3 3 org3 2003 2021 MX Mexico 4 4 org4 2004 2021 MX Jalisco 5 5 org5 2005 2021 MX Queretaro 6 6 org6 2006 2021 MX Baja California Norte 7 7 org7 2007 2021 MX Mexico 8 8 org8 2008 2021 MX Morelos 9 9 org9 2009 2021 MX Mexico 10 10 org10 2010 2021 MX Estado de Mexico city 1 Ciudad de Mexico 2 Tijuana 3 Ciudad de Mexico 4 Guadalajara 5 Queretaro 6 Tijuana 7 Ciudad de Mexico 8 Cuernavaca 9 Ciudad de Mexico 10 Texcoco
First we need to specify what will be used as database. Here I chose to use SQLite
. database_configuration
creates an S3 object that can be used whenever we need to refer to the database in mapic.
mx_db <- database_configuration("SQLite",
table = "orgs",
database = "~/Downloads/mx.sqlite")
class(mx_db)
The first element that we need to make even such simple maps are the coordinates of each city. **Mapic’s** api_to_db
searches for the coordinates of each city in our data frame and send it directly to the database that we specified before, together with the rest of the data necessary to make maps.
The function and results look similar to the following:
api_to_db(mx_db,
dat = mexico,
city = "city",
country = "country",
state = "region",
year_start = "Registration_year",
year_end = "year_end",
db_backup_after = 5)
[1] "Searching entry 1" Several entries found for Ciudad de Mexico MX [1] "Searching entry 2" Found Tijuana, Municipio de Tijuana, Baja California, 22320, México [1] "Searching entry 3" Several entries found for Ciudad de Mexico MX [1] "Searching entry 4" Found Guadalajara, Jalisco, México [1] "Searching entry 5" Found Santiago de Querétaro, Municipio de Querétaro, Querétaro, México [1] "Searching entry 6" [1] "Found from memory" [1] "Searching entry 7" [1] "Found from memory" [1] "Searching entry 8" Found Cuernavaca, Morelos, 62000, México [1] "Searching entry 9" [1] "Found from memory" [1] "Searching entry 10" No results found for &city=Txcoco&state=Estado%20de%20Mexico Search finished. 10 entries searched. 1 ENTRIES NOT FOUND
db_load
help us to load the data back to R using our mapic object defined before.
mx <- db_load(mx_db)
mx[,c("id", "city", "state", "lon", "lat")]
id city state lon lat 1 1 Ciudad de Mexico Mexico -99.13316 19.43271 2 2 Tijuana Baja California Norte -117.01953 32.53174 3 3 Ciudad de Mexico Mexico -99.13316 19.43271 4 4 Guadalajara Jalisco -103.33840 20.67204 5 5 Queretaro Queretaro -100.39706 20.59547 6 6 Tijuana Baja California Norte -117.01953 32.53174 7 7 Ciudad de Mexico Mexico -99.13316 19.43271 8 8 Cuernavaca Morelos -99.23423 18.92183 9 9 Ciudad de Mexico Mexico -99.13316 19.43271
When some of the entries are not found we can add them “manually” using add_coords_manually
. The function needs a data.frame
or csv
file that contains exactly the same fields as our database. In the example the city “Txcoco” was not found due to a misspelling. we could fix that directly in the data frame and search again. Or we can add the coordinates “manually” as below.
rown <- 10
to_add <- data.frame(id = rown,
year_start = mexico$registration_year[rown],
year_end = mexico$year_end[rown],
city = "Texcoco",
country = mexico$country[rown],
region = "",
state = mexico$region[rown],
county = "",
osm_name = "",
lon = 98.88, lat = 19.51)
add_coords_manually(to_add, mx_db)
We have now the coordinates of all the cities that we need safely saved in a database. We can resume our work whenever we want.
mx <- db_load(mx_db)
mx[,c("id", "city", "state", "lon", "lat")]
id city state lon lat 1 1 Ciudad de Mexico Mexico -99.13316 19.43271 2 2 Tijuana Baja California Norte -117.01953 32.53174 3 3 Ciudad de Mexico Mexico -99.13316 19.43271 4 4 Guadalajara Jalisco -103.33840 20.67204 5 5 Queretaro Queretaro -100.39706 20.59547 6 6 Tijuana Baja California Norte -117.01953 32.53174 7 7 Ciudad de Mexico Mexico -99.13316 19.43271 8 8 Cuernavaca Morelos -99.23423 18.92183 9 9 Ciudad de Mexico Mexico -99.13316 19.43271 10 10 Texcoco Estado de Mexico 98.88000 19.51000
To start creating the maps we first we define the colors that we want to use with the function define_map_colors
. The values of the colors have to be in hex notation. Here is the list of colors to define.
my_colors <- define_map_colors(dots_orgs = "#493252",
target_country = "#8caeb4",
empty_countries = "#f3f3f3",
border_countries = "#9c9c9c",
oceans = "#4e91d2",
text_cities = "#a0a0a0",
text_legend = "#493252",
background_legend = "#ffffff",
text_copyright = "#f3f3f3")
We can as well use the default colors:
default_map_colors
[1] "The chosen colors" dots_orgs : #493252 target_country : #8caeb4 empty_countries : #f3f3f3 border_countries : #9c9c9c oceans : #4e91d2 text_cities : #a0a0a0 text_legend : #493252 background_legend : #ffffff text_copyright : #f3f3f3
Or modify some of the defaults
(my_cols <- with_default_colors(list(dots_orgs = "#D30000",
text_legend = "#ffffff",
text_cities = "#000000",
background_legend = "#000000")))
[1] "The chosen colors" dots_orgs : #D30000 target_country : #8caeb4 empty_countries : #f3f3f3 border_countries : #9c9c9c oceans : #4e91d2 text_cities : #000000 text_legend : #ffffff background_legend : #000000 text_copyright : #f3f3f3
Now we can create the maps by calling all the functions that build it up, one after the other using the pipe.
## Define limits to plot
x_lim <- c(-118, -86)
y_lim <- c(14, 34)
selected_year <- 2020
## Plot
base_map("Mexico",
x_lim,
y_lim,
map_colors = my_cols) |>
mapic_city_dots(mx,
year = selected_year) |>
mapic_city_names(c("Ciudad de Mexico", "Guadalajara", "Tijuana")) |>
mapic_year_internal(year_label = "Año") |>
mapic_totals_internal(totals_label = "Totales")
We can create the map for a different year by changing only one value in the whole pipe.
selected_year <- 2002
## Plot
base_map("Mexico",
x_lim,
y_lim,
map_colors = my_cols) |>
mapic_city_dots(mx,
year = selected_year) |>
mapic_city_names(c("Ciudad de Mexico", "Guadalajara", "Tijuana")) |>
mapic_year_internal(year_label = "Año") |>
mapic_totals_internal(totals_label = "Totales")
Now we see only two small dots, one for Mexico city and the second one for Tijuana.
The coordinates are searched through the API of open street map (OSM) nominatim using the function coords_from_city
.
coords_from_city("Houston", "US", state = "Texas")
Several entries found for Houston US lon lat osm_name 1 -95.3677 29.75894 Houston, Harris County, Texas, United States
If a particular place is not found, I recommend going directly to the nominatim web page and search there, if you cannot find it, mapic won’t either. You can alternatively add the coords “by hand” using the function add_coords_manually
as exemplified above.
The function api_to_db
uses coords_from_city
recursively over a data frame and stores the results directly in a database or equivalent (as specified in the argument mdb
). Its homologous, api_no_city
fulfills the same function but for a county or state.
NOTE: Currently mapic search of coordinates supports only English characters. Other characters containing non-english symbols such as á, ö, è, ñ, etc., will trigger an error.
The object “mdb” is an S3 object that specifies the type of database to be used and its details. It can be easily created with the function database_configuration
, see its documentation for more information.
The options supported currently are:
- SQLite (Requires library
RSQLite
) - PostgreSQL (Requires library
RPostgreSQL
) - R’s internal data frame
- csv file
We already show in the example above how to define the colors to be used in the map. Since mapic focuses on the standardization of maps, this is an important step that helps mapic to choose the same colors for all the maps.
The base function base_map
creates a map of any country found in the package maps
. It can be as simple as writing the name of the country as defined in the package. The function below renders a world wide map, highlighting Brazil in a different color. By default it shows the coordinates so that the user can have a point of reference and choose the limits.
base_map("Brazil")
There are basically 2 ways of creating the maps either using ggplot
objects or using mapic’s S3 objects of class mapicHolder
. Both start with the creation of the base_map
.
## Define limits to plot
x_lim <- c(-118, -86)
y_lim <- c(14, 34)
selected_year <- 2020
map_ggplot <- base_map(
country = "Mexico",
x_limits = x_lim,
y_limits = y_lim,
show_coords = TRUE,
return_mapic_obj = FALSE)
class(map_ggplot)
[1] "gg" "ggplot"
map_mapic <- base_map(
country = "Mexico",
x_limits = x_lim,
y_limits = y_lim,
show_coords = TRUE,
return_mapic_obj = TRUE)
class(map_mapic)
[1] "mapicHolder"
Using mapicHolder
is the recommended way of mapic because it reduces the information that each function needs to take, sharing info among them using the object as messenger. Thus, object holds the information used to create the map, including data and other values and you can access it individually.
As seen in the example of the workflow
mx_map <- base_map("Mexico",
x_lim,
y_lim,
map_colors = my_cols) |>
mapic_city_dots(mx,
year = selected_year) |>
mapic_city_names(c("Ciudad de Mexico", "Guadalajara", "Tijuana")) |>
mapic_year_internal(year_label = "Año") |>
mapic_totals_internal(totals_label = "Totales")
names(mx_map)
[1] "mapic" "base_map" "x_limits" "y_limits" [5] "colors" "legend" "theme_labels" "mapic_dots" [9] "year" "data" "mapic_city_labels" "mapic_year" [13] "mapic_totals" "totals"
Here is a short description:
- The element
mapic
contains the plot that you see when calling the function. It collectes all the elements that are piped. - Each
ggplot
element is kept separately and it is named after the function:base_map
contains the base map,mapic_dots
contains the dots, etc. - The limits selected for x and y axes are contained in
x_limits
andy_limits
respectively. - The object that we created to define the colors is in
colors
. theme_labels
is the object of classtheme
fromggplot2
used for the labels of the years and totals.year
andtotals
are the values printed in the labels.legend
is a legend that can be added outside of the map.data
contains 2 elements:base
is the original data passed to the function andmap
which is the modified data with the required wrangling.
Putting together all the elements of the map in a ggplot2
style would achieve the same results as calling the object mapic
on its own. Thus, we can choose particular elements to show on the map using this strategy.
mx_map$base_map +
mx_map$mapic_city_labels +
mx_map$mapic_totals
We can also add ggplot
elements to the main map by accessing the object $mapic
mx_map$mapic +
ggtitle("A map of Mexico")
The method above using mapic’s default objects allows us to modify elements within the object to achieve a different map. However, this is not recommended. If that is desired, the recommended method is to use default ggplot
objects as shown above (using return_mapic_obj = FALSE
in base_map
function).
base_map(
country = "Mexico",
x_limits = x_lim,
y_limits = y_lim,
show_coords = TRUE,
return_mapic_obj = FALSE) +
mapic_city_dots(mx,
year = 2020,
column_names = list(
lat = "lat",
lon = "lon",
cities = "city",
year_start = "year_start",
year_end = "year_end")) +
mapic_city_names(.df = mx,
list_cities = c("Ciudad de Mexico", "Guadalajara", "Tijuana")) +
mapic_year_internal(year = 2020,
x_limits = x_lim,
y_limits = y_lim,
year_label = "Año") +
mapic_totals_internal(totals = 10,
x_limits = x_lim,
y_limits = y_lim,
totals_label = "Totales")
As you can see, the results are the same (here I used default mapic’s colors) but the information needed for each function is more. This could lead to mistakes, for example regarding the totals that are mentioned in the label vs the actual totals shown in the map. On the other hand, this option gives us flexibility and allows to use each component independently if necessary.
Currently mapic has the following limitations with plans to be improved:
Due to the differences in the connections and queries mapic supports a limited list of databases, so far PostgreSQL and SQLite. Support for other databases is not planned but it can be easily implemented by adding certain methods. Here is an example on how to implement PostgreSQL (NOTE it is not necessary to add it since support for PostgreSQL is included, but it is rather an example for other databases).
There are basically 3 functions that require the connection to the database: db_load
, db_remove_empty
and db_append
. You can add methods to this function as any other method for S3 objects in R.
db_load.my_postgres <- function(mdb) {
require(RPostgreSQL)
table <- mdb$table
schema <- mdb$schema
driv <- DBI::dbDriver("PostgreSQL")
con <- DBI::dbConnect(driv,
dbname = mdb$database,
host = mdb$host,
port = mdb$port,
user = mdb$user,
password = mdb$password)
query_create_table <- paste0(
"CREATE TABLE IF NOT EXISTS ",
schema, ".", table,
"(id INTEGER UNIQUE,
year_start INTEGER,
year_end INTEGER,
city TEXT,
country TEXT,
region TEXT,
state TEXT,
county TEXT,
lon REAL,
lat REAL,
osm_name TEXT)"
)
DBI::dbExecute(conn = con, query_create_table)
db <- DBI::dbReadTable(con, name, c(schema, table))
DBI::dbDisconnect(con)
return(db)
}
db_remove_empty.my_postgres <- function(mdb) {
require(RPostgreSQL)
schema <- mdb$schema
table <- mdb$table
driv <- DBI::dbDriver("PostgreSQL")
con <- DBI::dbConnect(driv,
dbname = mdb$database,
host = mdb$host,
port = mdb$port,
user = mdb$user,
password = mdb$password)
dbExecute(conn = con,
paste0("DELETE FROM ", schema, ".", table, " WHERE lon IS NULL OR lat IS NULL"))
dbDisconnect(con)
}
db_append.my_postgres <- function(mdb, df) {
require(RPostgreSQL)
path_to_db <- mdb$database
table <- mdb$table
driv <- DBI::dbDriver("PostgreSQL")
con <- DBI::dbConnect(driv,
dbname = mdb$database,
host = mdb$host,
port = mdb$port,
user = mdb$user,
password = mdb$password)
dbWriteTable(con,
name = c(mdb$schema, mdb$table),
value = df,
row.names = FALSE,
append = TRUE)
dbDisconnect(con)
}
Therefore, now we need to create the S3 object of class my_postgres
that should contain all the parameters necessary for our method to work.
mdb_obj <- structure(
list(
table = "table-name",
database = "database-name",
schema = "public",
host = "localhost",
port = 5432,
user = "user",
password = "password"),
class = c("my_postgres"))
Now we can do db_load(mdb_obj)
and it will load "table-name"
or create it if it doesn’t exist. Once this methods exist they are automatically used by api_to_db
and any other function that might need them.
Currently we support data only in the long format. This means that each row is one entry and mapic will count the amount of rows automatically, in order to know the totals. We intend to extend this to summarized data that includes a particular field or column for counts/totals.
Currently the labels for totals and year placed internally in the map can be positioned only in the lower left corner of the map. We intend to extend this.
Private information to be included within the map is not yet support but it has priority to be added. This includes the possibility of adding a logo, watermark and/or copyright text. However, this can also be easily achieved through simple ggplot2
functions.
Currently the only timeline that is supported is year, and the year has to be passed as a integer numeric value. We plan to extend this to more date times.
Currently the maps can be render only by country. We plan to extend this to create maps of regions that include several countries (i.e., continents).
Lower granularity maps are not planned.
Currently the coordinates can be searched only by City, State and County. We intend to extend this to postal code as well.
Also, mapic can only recognize English characters for now. This is a limitation from the API and we will find ways to extend it. Thus, characters containing non-english symbols such as á, ö, è, ñ, etc., will trigger an error.
Additionally, when more than one result is found, mapic takes the first entry found. We expect to extend this to allow the user to select which entry to choose. This also means that the more specific the data, the more precise mapic can be.
mapic many other limitation as a generic map generator, but this is not its purpose. Please read the About section at the beginning of this document to find out what mapic is.
The package is still under development in a partially stable version. It does not implement an official changelog yet but you can always visit the file org to see future plans and old changes summarized.
Here is a summary of main releases.
Version 2.5.0 extends support for other types of databases by implementing S3 objects and methods. Now aside of SQLite, csv files and data frames can be used as a serverless option for storing data. The new objects facilitate the extension of basic functions for implementing other databases. It is planned to include at least Postgres and MariaDB for future releases.
The first stable version, 2.4.2, is able to create complete maps following the strategy of its parent project (a private version of the code).