`{homophily}`

They say that “birds of a feather flock together”, but why take their word for it?

Introduction

In social networks, actors tend to associate with others who are similar in some way, such as race, language, creed, or class. This phenomenon is called homophily.

The {homophily} package provides flexible routines to measure mixing patterns using generic methods that are compatible with <network> and <igraph> objects, including {tidygraph}’s <tbl_graph> objects.

Installation

# install.packages("remotes")
remotes::install_github("knapply/homophily")

Usage

library(homophily)

data("jemmah_islamiyah", package = "homophily") # undirected <igraph>
data("sampson", package = "ergm")               # loads `samplike`, a directed <network>

Mixing Matrices

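as_mixing_matrix() builds classical mixing matrices for both undirected and directed graphs. Each cell counts the ties between nodes in the row category and nodes in the column category. For undirected graphs, only the upper triangle is filled.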

as_mixing_matrix(jemmah_islamiyah, dim1 = "role")

#> 5 x 5 Matrix of class "dtrMatrix"
#>                      
#>                       command team operation assistant bomb maker suicide bomber Team Lima
#>   command team                   6                  16         30              2         8
#>   operation assistant            .                   2         10              2         0
#>   bomb maker                     .                   .         20             10         0
#>   suicide bomber                 .                   .          .              0         8
#>   Team Lima                      .                   .          .              .        12

as_mixing_matrix(samplike, dim1 = "group")

#> 3 x 3 Matrix of class "dgeMatrix"
#>           
#>            Turks Outcasts Loyal
#>   Turks       30        1     5
#>   Outcasts     7       10     1
#>   Loyal        9        2    23
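Conceptually, a classical mixing matrix for a directed graph is a cross-tabulation of the sender's category against the receiver's category, so the samplike matrix above sums to 88, one count per directed tie. The sketch below builds one with base R's table() from a small hypothetical edge list (`from` and `to` are node indices). It illustrates the idea only and is not how as_mixing_matrix() is implemented.

```r
# Hypothetical toy graph: each node's category and a directed edge list
group <- c("a", "a", "b", "b", "c")
from  <- c(1, 1, 2, 3, 4, 5)
to    <- c(2, 3, 3, 4, 5, 1)

# Use the same levels on both margins so the matrix is always square
lv  <- sort(unique(group))
mix <- table(sender   = factor(group[from], levels = lv),
             receiver = factor(group[to],   levels = lv))
mix
```

The diagonal holds within-category ties and everything off the diagonal holds cross-category ties, which is the split the E-I index and assortativity measures below build on.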

Remixing Mixing Matrices

We can also build generalized mixing matrices that cross one node attribute (dim1=, the rows) with another (dim2=, the columns).

For example, to explore the ties between each individual node and the groups of the nodes it is tied to, we can supply both dim1= and dim2=.

We’ll use the {network} convention of node names being stored in an attribute called "vertex.names" to see mixing patterns between each node and the "group" attribute.

as_mixing_matrix(samplike, dim1 = "vertex.names", dim2 = "group")

#> 18 x 3 Matrix of class "dgeMatrix"
#>              
#>               Turks Outcasts Loyal
#>   John Bosco      9        3     5
#>   Gregory        11        3     1
#>   Basil           3        5     0
#>   Peter           0        0     9
#>   Bonaventure     3        2     8
#>   Berthold        1        0     5
#>   Mark            8        2     1
#>   Victor          4        0     7
#>   Ambrose         2        0     6
#>   Romauld         1        1     6
#>   Louis           3        0     5
#>   Winfrid        10        0     1
#>   Amand           1        4     3
#>   Hugh            8        0     3
#>   Boniface        8        0     1
#>   Albert          6        0     2
#>   Elias           1        5     0
#>   Simplicius      3        6     0

Going further, we can also cross two categorical attributes. samplike’s "cloisterville" attribute records whether each novice attended the minor seminary at Cloisterville before coming to the monastery.

as_mixing_matrix(samplike, dim1 = "cloisterville", dim2 = "group")

#> 2 x 3 Matrix of class "dgeMatrix"
#>        
#>         Turks Outcasts Loyal
#>   TRUE     34       15    24
#>   FALSE    48       16    39

For directed graphs, ties are counted in both directions by default. Pass "out" or "in" to direction= to count only outgoing or only incoming ties. Note that the "in" result is printed with the dim2= categories on the rows.

as_mixing_matrix(samplike, dim1 = "cloisterville", dim2 = "group",
                 direction = "out")

#> 2 x 3 Matrix of class "dgeMatrix"
#>        
#>         Turks Outcasts Loyal
#>   TRUE     15        5    10
#>   FALSE    31        8    19

as_mixing_matrix(samplike, dim1 = "cloisterville", dim2 = "group",
                 direction = "in")

#> 3 x 2 Matrix of class "dgeMatrix"
#>           
#>            TRUE FALSE
#>   Turks      19    17
#>   Outcasts   10     8
#>   Loyal      14    20
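The three results above are consistent: summing the "out" and "in" counts cell by cell gives the default counts (for example, 15 + 19 = 34 ties between TRUE and Turks). A minimal sketch of that logic, assuming an edge list of node indices, is below. `mix_generalized()` is a hypothetical helper, not part of {homophily}, and unlike the package's "in" output it keeps dim1 on the rows for every direction.

```r
# Hypothetical helper: cross-tabulate attr1 of each ego against attr2 of
# the alters it is tied to, for outgoing, incoming, or all ties.
mix_generalized <- function(from, to, attr1, attr2,
                            direction = c("both", "out", "in")) {
  direction <- match.arg(direction)
  lv1 <- sort(unique(attr1))
  lv2 <- sort(unique(attr2))
  # Outgoing: ego (row, attr1) sends a tie to alter (column, attr2)
  out_ties <- table(factor(attr1[from], levels = lv1),
                    factor(attr2[to],   levels = lv2))
  # Incoming: ego (row, attr1) receives a tie from alter (column, attr2)
  in_ties <- table(factor(attr1[to],   levels = lv1),
                   factor(attr2[from], levels = lv2))
  switch(direction,
         out  = out_ties,
         "in" = in_ties,
         both = out_ties + in_ties)
}

# Toy directed triangle: node names as dim1, groups as dim2
node  <- c("n1", "n2", "n3")
group <- c("a", "a", "b")
from  <- c(1, 2, 3)
to    <- c(2, 3, 1)

m_both <- mix_generalized(from, to, node, group)
m_out  <- mix_generalized(from, to, node, group, direction = "out")
m_both
```

With direction = "both", each node's row sums to its total degree and the whole matrix sums to twice the number of ties, matching the 18-row samplike matrix above, whose rows sum to 176 = 2 x 88.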

E-I Index

ei_index(jemmah_islamiyah, node_attr_name = "role")

#> [1] 0.3650794
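The E-I index (Krackhardt and Stern) compares external ties E, between categories, with internal ties I, within categories: (E - I) / (E + I). It runs from -1 (all ties internal) to +1 (all ties external). Taking the diagonal of the mixing matrices printed earlier as I and the off-diagonal cells as E reproduces both global values in this section: (86 - 40) / 126 for jemmah_islamiyah and (25 - 63) / 88 for samplike. The sketch below assumes only those printed matrices; `ei_from_mixing()` is a hypothetical helper.

```r
# E-I index from a mixing matrix: (E - I) / (E + I), where
# I = ties within a category (diagonal) and E = all other ties.
ei_from_mixing <- function(m) {
  internal <- sum(diag(m))
  external <- sum(m) - internal
  (external - internal) / (external + internal)
}

# jemmah_islamiyah "role" matrix printed above (upper triangular)
ji_mix <- matrix(c(6, 16, 30,  2,  8,
                   0,  2, 10,  2,  0,
                   0,  0, 20, 10,  0,
                   0,  0,  0,  0,  8,
                   0,  0,  0,  0, 12), nrow = 5, byrow = TRUE)

# samplike "group" matrix printed above
samp_mix <- matrix(c(30,  1,  5,
                      7, 10,  1,
                      9,  2, 23), nrow = 3, byrow = TRUE)

ei_from_mixing(ji_mix)    # 46 / 126
ei_from_mixing(samp_mix)  # -38 / 88
```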

ei_index(jemmah_islamiyah, node_attr_name = "role", scope = "group")

#>        command team operation assistant          bomb maker      suicide bomber           Team Lima 
#>           0.8064516           0.7142857          -0.3333333           1.0000000          -1.0000000

ei_index(jemmah_islamiyah, node_attr_name = "role", scope = "node")

#>     MUKLAS     AMROZI      IMRON    SAMUDRA   DULMATIN      IDRIS    MUBAROK    AZAHARI      GHONI 
#>  0.5555556  0.5000000  1.0000000  0.7333333  0.1111111  0.6000000  0.3333333  0.1111111  0.1111111 
#>    ARNASAN       RAUF    OCTAVIA    HIDAYAT    JUNAEDI      PATEK       FERI     SARIJO 
#>  1.0000000 -0.2000000 -0.2000000 -0.2000000 -0.2000000  0.1111111  1.0000000  0.1111111

ei_index(samplike, node_attr_name = "group")

#> [1] -0.4318182

ei_index(samplike, node_attr_name = "group", scope = "group")

#>      Turks   Outcasts      Loyal 
#> -0.6666667 -0.8181818 -1.0000000

ei_index(samplike, node_attr_name = "group", scope = "node")

#>  John Bosco     Gregory       Basil       Peter Bonaventure    Berthold        Mark      Victor 
#> -0.05882353 -0.46666667 -0.25000000 -1.00000000 -0.23076923 -0.66666667 -0.45454545 -0.27272727 
#>     Ambrose     Romauld       Louis     Winfrid       Amand        Hugh    Boniface      Albert 
#> -0.50000000 -0.50000000 -0.25000000 -0.81818182  0.00000000 -0.45454545 -0.77777778 -0.50000000 
#>       Elias  Simplicius 
#> -0.66666667 -0.33333333
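At scope = "node", each node's ties are split by whether the alter shares the node's own category. The node-level values above can be recovered from the vertex.names by group matrix in the previous section: John Bosco has 9 ties with Turks and 3 + 5 = 8 with others, so (8 - 9) / 17 = -0.0588. The sketch below reproduces four of those rows. `ei_by_node()` is hypothetical, and the group memberships passed to it are an assumption, chosen because they are the ones consistent with the printed values.

```r
# Node-level E-I index from a node-by-group mixing matrix.
# m: rows = nodes, columns = categories of the alters they are tied to
# own_group: each node's own category, in the same order as the rows of m
ei_by_node <- function(m, own_group) {
  # Pick each row's count in the node's own category (internal ties)
  internal <- m[cbind(seq_len(nrow(m)), match(own_group, colnames(m)))]
  external <- rowSums(m) - internal
  setNames((external - internal) / (external + internal), rownames(m))
}

# Four rows copied from the vertex.names x group matrix above
node_mix <- matrix(c( 9, 3, 5,
                     11, 3, 1,
                      3, 5, 0,
                      0, 0, 9),
                   nrow = 4, byrow = TRUE,
                   dimnames = list(c("John Bosco", "Gregory", "Basil", "Peter"),
                                   c("Turks", "Outcasts", "Loyal")))

# Assumed memberships: two Turks, one Outcast, one Loyal
node_ei <- ei_by_node(node_mix, c("Turks", "Turks", "Outcasts", "Loyal"))
node_ei
```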

Assortativity

assort_discrete(jemmah_islamiyah, node_attr_name = "role")

#> [1] 0.09078704

assort_discrete(samplike, node_attr_name = "group")

#> [1] 0.5445606
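For a categorical attribute, Newman's (2003) assortativity coefficient is r = (sum_i e_ii - sum_i a_i b_i) / (1 - sum_i a_i b_i), where e is the mixing matrix rescaled to proportions and a, b are its row and column sums. We assume assort_discrete() follows this definition. Applying it to the symmetrized mixing matrices printed earlier (m + t(m), so each tie counts in both orientations) reproduces both values above. `assort_from_mixing()` is a hypothetical helper for illustration.

```r
# Newman's assortativity coefficient for a categorical attribute.
# The matrix is symmetrized so direction is ignored, then normalized.
assort_from_mixing <- function(m) {
  e <- m + t(m)
  e <- e / sum(e)
  # Share of ties expected on the diagonal under random mixing
  ab <- sum(rowSums(e) * colSums(e))
  (sum(diag(e)) - ab) / (1 - ab)
}

# jemmah_islamiyah "role" matrix printed above (upper triangular)
ji_mix <- matrix(c(6, 16, 30,  2,  8,
                   0,  2, 10,  2,  0,
                   0,  0, 20, 10,  0,
                   0,  0,  0,  0,  8,
                   0,  0,  0,  0, 12), nrow = 5, byrow = TRUE)

# samplike "group" matrix printed above
samp_mix <- matrix(c(30,  1,  5,
                      7, 10,  1,
                      9,  2, 23), nrow = 3, byrow = TRUE)

assort_from_mixing(ji_mix)
assort_from_mixing(samp_mix)
```

Values near 0 mean ties form about as often within categories as random mixing would predict; values near 1 mean ties stay almost entirely within categories.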

assort_degree(samplike)

#> [1] 0.05569702
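Degree assortativity is the Pearson correlation between the degrees of the nodes at the two ends of each tie. The minimal sketch below handles an undirected edge list by counting every edge in both orientations. Which degree assort_degree() uses for a directed graph such as samplike (total, in-, or out-degree) is not shown here, so this sketch does not try to reproduce the value above. `degree_assortativity()` is a hypothetical helper.

```r
# Degree assortativity for an undirected edge list: the Pearson
# correlation of the degrees found at the two ends of every edge.
degree_assortativity <- function(from, to, n_nodes = max(from, to)) {
  deg <- tabulate(c(from, to), nbins = n_nodes)
  # List each edge in both orientations so the result is symmetric
  x <- deg[c(from, to)]
  y <- deg[c(to, from)]
  cor(x, y)
}

# A star (hub 1 linked to leaves 2, 3, 4) is perfectly disassortative
star <- degree_assortativity(from = c(1, 1, 1), to = c(2, 3, 4))

# A path 1-2-3-4 is moderately disassortative
path <- degree_assortativity(from = c(1, 2, 3), to = c(2, 3, 4))
```

Graphs in which every node has the same degree have zero variance, so cor() returns NA for them.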

Benchmarks

library(tidyr)
library(bench)
library(ggplot2)
library(igraph)


# Random G(n, p) graph with a random "group" attribute on each node
build_it <- function(n_nodes, prob = 0.25, dir = TRUE) {
  g <- sample_gnp(n_nodes, prob, directed = dir)
  vertex_attr(g, name = "group") <- sample(letters, n_nodes, replace = TRUE)
  g
}

# Time `bench_foo` on graphs of increasing size. The graph is built
# outside mark() so that only the measure itself is timed.
bench_it <- function(bench_foo, seq_nodes = seq(10, 2000, by = 100), ...) {
  all_res <- lapply(seq_nodes, function(x) {
    g <- build_it(x)
    res <- mark(
      bench_foo(g, node_attr_name = "group", ...),
      iterations = 20
    )
    res[["n_nodes"]] <- x
    res
  })
  do.call(rbind, all_res)
}

set.seed(831)
res <- bench_it(ei_index)

res %>% 
  unnest() %>% 
  ggplot(aes(x = n_nodes, y = time)) +
  ggbeeswarm::geom_quasirandom(aes(color = gc)) +
  coord_flip()

`R CMD check`

devtools::check(quiet = TRUE)

#> Writing NAMESPACE
#> Writing NAMESPACE

#> -- R CMD check results --------------------------------------------------- homophily 0.0.0.9000 ----
#> Duration: 34.9s
#> 
#> 0 errors v | 0 warnings v | 0 notes v

Cite

citation("homophily")

#> 
#> To cite homophily use:
#> 
#>   Knapp, B. G. (2019). homophily: Measuring Network Homophily Data. R package version
#>   0.0.0.9 Retrieved from https://knapply.github.io/homophily
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Manual{homophily-package,
#>     title = {homophily: Measuring Network Homophily},
#>     author = {Brendan Knapp},
#>     year = {2019},
#>     note = {R package version 0.0.0.9},
#>     url = {https://knapply.github.io/homophily},
#>   }