Something that irked me when I first used network layout techniques and explored clustering methods was how much subjectivity the process involves, and in particular how thoroughly the choices made during layout are hidden in the final rendering. I had no idea how many of those choices could alter the final plot, including the very characteristics that conclusions depend on. A slight change to a single parameter could change the layout.
This is not a trivial problem. Visualizations and network layouts can facilitate scientific insight, but a poor layout can mislead and encourage incorrect conclusions.
I came across a network structure visualization technique called hive plots, and on its research website one passage echoes this sentiment:
Consider a typical hairball. Now think of how you’d describe to someone the method used to create it. Chances are, even you don’t know the full details of the layout algorithm. And even if you did, you could not necessarily relate how specific network structures would translate into output.
Even if you did describe how the hairball was created (you’d probably name the layout algorithm), it would be very likely that the description would not contain any phrases that relate to the structure of the network (which is, after all, what your audience is keen on).
On the other hand, it is easy to describe how a hive plot was created, and likewise easy for your audience to understand, because you can use terms relevant to the questions your visualization is designed to address. Instead of saying “I used a force-directed approach to place the nodes.”, which does not help your audience relate to the network’s structure, you can say “I put all the sink nodes on this axis and ordered them by absolute connectivity.”, which is immediately meaningful.
The basic idea of a hive plot is to arrange nodes on radial axes and base their location on a structural property. For example, a directed graph could have two axes, one for sources and one for sinks. The source nodes and sink nodes could be ordered by degree outward from the center, and edges drawn from one axis to the next would reflect the source-to-sink flow in the network. There are many possible structures to focus on, but the layout itself is quite constrained: radial axes with nodes and edges. The benefit is that once you have seen a few hive plots you get the hang of them and can identify patterns quickly. The best part is that the layout does not depend on unstable parameters, and you won’t have to worry about layout artifacts clouding your pattern recognition, since the layout is determined by the network structure.
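To make the idea concrete for myself, here is a minimal sketch of how node positions could be computed from structure alone. It is my own illustration, not the reference implementation: the function name `hive_layout`, the third "other" axis for nodes that are neither pure sources nor pure sinks, the 120-degree axis spacing, and the tie-breaking by node name are all assumptions I made for the example.

```python
import math
from collections import defaultdict


def hive_layout(edges, axis_angles=None):
    """Map each node of a directed edge list to (axis, x, y).

    Hypothetical helper: axis assignment and ordering rules are
    illustrative choices, not part of any hive plot library.
    """
    if axis_angles is None:
        # Assumed axis directions in radians, 120 degrees apart.
        axis_angles = {
            "source": 0.0,
            "sink": 2 * math.pi / 3,
            "other": 4 * math.pi / 3,
        }

    # Count in-degree and out-degree for every node.
    in_deg = defaultdict(int)
    out_deg = defaultdict(int)
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    nodes = set(in_deg) | set(out_deg)

    # Choose an axis for each node from its structure only.
    by_axis = defaultdict(list)
    for n in nodes:
        if in_deg[n] == 0:
            by_axis["source"].append(n)
        elif out_deg[n] == 0:
            by_axis["sink"].append(n)
        else:
            by_axis["other"].append(n)

    # Order nodes along each axis by total degree (ties broken by name),
    # then place them at radius 1, 2, 3, ... from the center.
    layout = {}
    for axis, members in by_axis.items():
        members.sort(key=lambda n: (in_deg[n] + out_deg[n], str(n)))
        theta = axis_angles[axis]
        for rank, n in enumerate(members, start=1):
            layout[n] = (axis, rank * math.cos(theta), rank * math.sin(theta))
    return layout


if __name__ == "__main__":
    example_edges = [("a", "c"), ("a", "d"), ("b", "c")]
    for node, (axis, x, y) in sorted(hive_layout(example_edges).items()):
        print(f"{node}: {axis} axis at ({x:.2f}, {y:.2f})")
```

There is no random seed, no iteration count, and no spring constant in this sketch: the same edge list always produces the same coordinates, which is exactly the property that appealed to me. A real hive plot would still need edges drawn as curves between the axes, which I have left out.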
Here is an intro and JavaScript example by Mike Bostock, and also a set of slides from a presentation by Martin Krzywinski, the researcher who developed hive plots, which gives a general introduction to the plots [pdf]. Also scroll through the hive plots website, which has fantastic images and examples. Especially engaging is seeing several parameters visualized next to one another. I look forward to exploring this technique.