(This article was first published on schochastics, and kindly contributed to R-bloggers)

This post introduces the new R package `graphlayouts`, which has been available on CRAN for a few days. We will use
network data from the Game of Thrones TV series (it seemed timely at the time of writing)
to illustrate the core layout algorithms of the package. Most of the algorithms use
stress majorization as their basis, which I described in more detail in an older post. Here, I will
focus only on the practical aspects of the package.

```r
library(tidyverse)     # purrr::map(), dplyr verbs, stringr helpers
library(igraph)        # graph objects and network measures
library(graphlayouts)  # the layout algorithms discussed in this post
library(ggraph)        # ggplot2-style network plots
library(extrafont)     # custom fonts in the plots
```

Preparing the Data

To illustrate the functionality of `graphlayouts`, we will use data compiled from the
TV show Game of Thrones, specifically the character interaction networks. See here and here for original work with the data. The raw data can be downloaded from GitHub. We do this using the `map()` function of `purrr`.

``edges ``
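The pattern of reading one file per season with `map()` can be sketched as follows. This is only a minimal illustration: instead of downloading anything, it writes two tiny made-up edge lists to a temporary directory, and the file names and values are my own placeholders. The column names `Source`, `Target` and `Weight` are the ones used later in this post.

```r
library(purrr)

# Hypothetical stand-in data: two tiny "season" edge lists with made-up values.
toy <- list(
  data.frame(Source = c("A", "A"), Target = c("B", "C"), Weight = c(3, 1)),
  data.frame(Source = c("B", "C"), Target = c("D", "D"), Weight = c(2, 4))
)

# Write each edge list to its own CSV file (placeholder file names).
files <- file.path(tempdir(), paste0("toy-s", seq_along(toy), "-edges.csv"))
walk2(toy, files, ~ write.csv(.x, .y, row.names = FALSE))

# map() applies the reader to every file and returns a list of data frames,
# one element per season.
edges_toy <- map(files, read.csv, stringsAsFactors = FALSE)

length(edges_toy)
names(edges_toy[[1]])
```

For the real data, `files` would simply be a vector of seven URLs, since `read.csv()` (or `readr::read_csv()`) accepts URLs as well as local paths.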

Next, we transform the raw edgelists into igraph objects, again using `map`.

``got_graphs ``
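As a rough sketch of this step, the snippet below turns a list of edge lists into a list of igraph objects. The toy data frames are hypothetical and only mimic the `Source`/`Target`/`Weight` layout of the GoT files.

```r
library(purrr)
library(igraph)

# Hypothetical toy edge lists (one data frame per "season"); values are made up.
edges_toy <- list(
  data.frame(Source = c("A", "A", "B"), Target = c("B", "C", "C"), Weight = c(3, 1, 2)),
  data.frame(Source = c("A", "C"), Target = c("D", "D"), Weight = c(5, 4))
)

# graph_from_data_frame() treats the first two columns as edge endpoints and
# stores all remaining columns (here Weight) as edge attributes.
graphs_toy <- map(edges_toy, function(d) graph_from_data_frame(d, directed = FALSE))

vcount(graphs_toy[[1]])     # number of nodes in the first network
E(graphs_toy[[2]])$Weight   # edge attribute carried over from the data frame
```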

Lastly, we transform and add some node variables. First, we change the character names from
upper case to title case. Then we compute a clustering, and the total number of interactions per character.

```r
mutate_graph %
str_to_title()
clu
```
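To make the three node-level steps concrete, here is a minimal sketch on a hypothetical toy network. The clustering method (Louvain) is an assumption on my part, and the attribute names `clu` and `size` are chosen to match the plotting code further down.

```r
library(igraph)
library(stringr)

# Hypothetical toy network with upper-case names and made-up weights.
el <- data.frame(
  Source = c("NED", "NED", "ROBERT", "JON", "JON"),
  Target = c("ROBERT", "CATELYN", "CATELYN", "SAM", "NED"),
  Weight = c(4, 3, 2, 5, 1)
)
g <- graph_from_data_frame(el, directed = FALSE)

# 1. Upper case -> title case.
V(g)$name <- str_to_title(V(g)$name)

# 2. A clustering of the nodes (Louvain is an assumption, any method works).
clu <- cluster_louvain(g, weights = E(g)$Weight)
V(g)$clu <- as.character(membership(clu))

# 3. Total number of interactions per character = sum of incident edge weights.
V(g)$size <- strength(g, weights = E(g)$Weight)

data.frame(name = V(g)$name, clu = V(g)$clu, size = V(g)$size)
```

Wrapping these steps in a function and applying it with `map()` would update every season network in the list at once.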

`got_graphs` now contains all seven season networks in a list. `graphlayouts` contains
the function `qgraph()`, which can be used to get a very rough visualization of a network
without the need for lengthy `ggraph` code.

``qgraph(got_graphs[[1]])``

The function is very similar to `qplot()` from `ggplot2`, but it does not take any additional arguments yet.
Support for them is planned for a later release.

Visualizing each Season

The core layout algorithm of `graphlayouts` is implemented in the function `layout_with_stress()`, which,
by the way, is also called by `qgraph()`. The package also contains a convenience function to work smoothly with `ggraph`.

``xy ``

you can do

``ggraph(x,layout = "stress")+...``
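If you want to see what the layout function itself returns, here is a minimal sketch on a toy graph (a ring of ten nodes, which is my own stand-in and not part of the GoT data).

```r
library(igraph)
library(graphlayouts)

# Hypothetical toy graph: a ring of 10 nodes.
g <- make_ring(10)

# layout_with_stress() returns a numeric matrix with one row per node
# and two columns holding the x and y coordinates.
xy <- layout_with_stress(g)

dim(xy)
head(xy, 3)
```

The matrix can be passed to anything that accepts layout coordinates, for example `plot(g, layout = xy)` in `igraph`.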

Creating nice plots is now “just” a matter of stitching some `ggraph` code together.
For our character networks, we define a function that does that for all seasons simultaneously.
I usually would not use hard-to-read fonts for visualizations, but I thought Enchanted Land (available here) kind of fits here.

``got_palette ``

The advantages of stress-based layouts are outlined in this post. It has been my go-to
layout algorithm for years, and I wish more SNA software would implement it.
The figure below shows all seven networks in one plot.

Focus on Characters

While “stress” is the key graph layout in the package, there are other, more specialized layouts
that can be used for different purposes. `layout_with_focus()` for instance allows you to focus
the network on a specific character and order all other nodes in concentric circles (depending on distance) around it. Here is Season one with focus on Ned Stark.

```r
# Season 1; v = 1 selects the first node as the focus
ggraph(got_graphs[[1]],layout="focus",v=1)+
  geom_node_point(aes(fill=clu,size=size),shape=21,col="grey25")+
  # label only the focus character
  geom_node_text(aes(filter=(name=="Ned"),size=size,label=name),family = "Enchanted Land",repel=FALSE)+
  scale_edge_width_continuous(range=c(0.2,0.9))+
  scale_size_continuous(range=c(1,8))+
  scale_fill_manual(values=got_palette)+
  theme_graph(title_family = "Enchanted Land",
              subtitle_family = "Enchanted Land",
              title_size = 20,
              subtitle_size = 16)+
  labs(title="Game of Thrones (Season 1)",
       subtitle = "Focus on Ned Stark")+
  theme(legend.position = "none")
```

`layout_with_centrality()` is based on a similar principle. You can specify any centrality index (or any numeric vector, for that matter) and create a concentric layout in which the most central nodes are placed in the center.

```r
# Season 1; strength() is the weighted degree of each node
ggraph(got_graphs[[1]],layout="centrality",cent=strength(got_graphs[[1]]))+
  geom_node_point(aes(fill=clu,size=size),shape=21,col="grey25")+
  geom_node_text(aes(size=size,label=name),family = "Enchanted Land",repel=FALSE)+
  scale_edge_width_continuous(range=c(0.2,0.9))+
  scale_size_continuous(range=c(1,8))+
  scale_fill_manual(values=got_palette)+
  theme_graph(title_family = "Enchanted Land",
              title_size = 20)+
  labs(title="Game of Thrones (Season 1)",
       subtitle = "weighted degree layout")+
  theme(legend.position = "none")
```

To put someone other than Ned Stark in the center, here is season seven.

```r
# Season 7, same layout as above
ggraph(got_graphs[[7]],layout="centrality",cent=strength(got_graphs[[7]]))+
  geom_node_point(aes(fill=clu,size=size),shape=21,col="grey25")+
  geom_node_text(aes(size=size,label=name),family = "Enchanted Land",repel=FALSE)+
  scale_edge_width_continuous(range=c(0.2,0.9))+
  scale_size_continuous(range=c(1,8))+
  scale_fill_manual(values=got_palette)+
  theme_graph(title_family = "Enchanted Land",
              title_size = 20)+
  labs(title="Game of Thrones (Season 7)",
       subtitle = "weighted degree layout")+
  theme(legend.position = "none")
```

Combining all Seasons

The last important layout algorithm is `layout_as_backbone()`, which is tailored to work
with “hairball” networks that may contain a hidden group structure. The plot below shows its impressive performance. Of course, the network used in that example is specifically tailored to show this power.

To illustrate its performance with our example, we will put all seasons together and create one big GoT network that contains all
character interactions from Seasons 1-7.

```r
got_all %
  group_by(Source,Target) %>%
  summarise(Weight=sum(Weight)) %>%
  ungroup() %>%
  rename(weight=Weight) %>%
  mutate_if(is.character,function(x) str_replace_all(x,"_"," ") %>%
              str_to_title()) %>%
  graph_from_data_frame(directed=FALSE)

clu
```

First, let’s check what it looks like using `layout_with_stress()`.

```r
ggraph(got_all,layout="stress")+
  geom_node_point(aes(fill=clu,size=size),shape=21)+
  scale_edge_width_continuous(range=c(0.8,1.8))+
  scale_size_continuous(range=c(2,10))+
  scale_fill_manual(values=got_palette)+
  theme_graph(title_family = "Enchanted Land",
              subtitle_family = "Enchanted Land",
              title_size = 20,
              subtitle_size = 16)+
  labs(title="Game of Thrones",subtitle = "Character Network (Season 1-7)")+
  theme(legend.position = "none")
```

It is clearly hard to see anything here, since the network is too dense. Enter
`layout_as_backbone()`. The algorithm itself is rather involved, but some of the key points are:

• Find strongly embedded edges depending on graph motifs
• Compute the union of all maximum spanning trees to “hold the network together”

If you are interested in all the technical details, consult the original paper.

`layout_as_backbone()` takes a parameter `to_keep` which determines the percentage of edges
to keep for the layout calculation. In our case, we will keep the 20% most embedded edges.
(The parameter always requires some experimenting to find out what works best.) Note that this does not mean that we throw away the rest of the edges. They are just not used to calculate the layout.
But you can still do that afterwards (as we will do here), since the function returns a logical
vector which indicates whether an edge belongs to the backbone or not.
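Here is a minimal sketch of that workflow on a hypothetical toy network with four planted groups (my own stand-in, not the GoT data). Two assumptions to flag: the call passes the fraction as `keep`, which is the argument name in the package documentation I am aware of, so check `?layout_as_backbone` if your installed version differs; and the function also needs the `oaqc` package to be installed.

```r
library(igraph)
library(graphlayouts)  # layout_as_backbone() additionally requires the oaqc package

# Hypothetical toy network: 4 dense groups of 15 nodes, sparsely interconnected.
set.seed(42)
g <- simplify(sample_islands(4, 15, 0.5, 3))

# Base the layout on the 40% most embedded edges (0.4 is an arbitrary choice
# for this toy graph; the parameter needs experimenting).
bb <- layout_as_backbone(g, keep = 0.4)

# No edge is removed from g. Flag the backbone edges so they can be styled
# differently later. Indexing with bb$backbone works whether it holds edge
# indices or a logical vector.
E(g)$backbone <- FALSE
E(g)$backbone[bb$backbone] <- TRUE

dim(bb$xy)             # one row of coordinates per node
table(E(g)$backbone)   # backbone vs. remaining edges
```

The coordinates in `bb$xy` can then be handed to a plotting function, with the `backbone` edge attribute used to draw non-backbone edges more faintly.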

```r
bb =800),size=size,label=name),
                 family = "Enchanted Land",repel=TRUE)+
  scale_edge_width_continuous(range=c(0.8,1.8))+
  scale_size_continuous(range=c(2,10))+
  scale_fill_manual(values=got_palette)+
  theme_graph(title_family = "Enchanted Land",
              subtitle_family = "Enchanted Land",
              title_size = 20,
              subtitle_size = 16)+
  labs(title="Game of Thrones",subtitle = "Character Network (Season 1-7)")+
  theme(legend.position = "none")
```

There is definitely not as much structure going on as in the contrived example. The algorithm can’t
uncover a hidden structure if there is no such thing to uncover. Yet, the layout still
reveals some structure and clearly enhances readability over the stress-based layout.

This post introduced only the core layout algorithms of `graphlayouts`. Check out
the vignette of the package for a more extensive list of algorithms. Also, if you are having trouble
with the `ggraph` code, check out my RStudio Addin that
provides a tiny GUI to plot networks. Now that `graphlayouts` is out, I will continue
working on that package. 