Plotting a Degree Distribution on the Command Line

The following shell command plots a degree distribution on a log-log scale.  It uses only standard Unix commands (as written, it relies on GNU sed for the -r option and the \s and \S escapes):

<INPUTFILE sed -re '/^\%/d;s,^\s*\S+\s*(\S+).*$,\1,' | sort | uniq -c | sort -n | sed -re 's,^\s*(\S+).*$,\1,;s,.,#,g' | uniq -c | sed -re 's,^\s*(\S+).*,\1,;s,.,#,g'

The file “INPUTFILE” must be a text representation of a directed network with one edge per line, each containing a from-node ID followed by a to-node ID.  Lines starting with “%” are treated as comments and skipped.  The command counts how often each to-node ID occurs and plots the resulting indegree distribution on a doubly logarithmic scale.  Here it is executed for KONECT's “YouTube links” dataset:

$ < sed -re '/^\%/d;s,^\s*\S+\s*(\S+).*$,\1,' | sort | uniq -c | sort -n | sed -re 's,^\s*(\S+).*$,\1,;s,.,#,g' | uniq -c | sed -re 's,^\s*(\S+).*,\1,;s,.,#,g'

The file “” can be downloaded from here.  The degree distribution is transposed from its usual orientation: the degrees are on the Y axis and the frequencies are on the X axis.  As the output shows, there is a power law here: the lines get shorter in a linear fashion on the doubly logarithmic scale, and thus the frequency of each degree value is a negative power of the degree value.
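To see what each stage contributes, the pipeline can be laid out one stage per line.  The following is a minimal sketch that assumes GNU sed; the function names plot_indegree and toy_edges and the toy edge list are made up for illustration and do not come from KONECT.

```shell
# Sketch only: assumes GNU sed. plot_indegree and toy_edges are hypothetical names.
plot_indegree() {
    sed -re '/^%/d;s,^\s*\S+\s*(\S+).*$,\1,' |    # drop % comment lines, keep the to-node ID
    sort | uniq -c |                                # count edges per to-node: the indegree
    sort -n |                                       # order the nodes by indegree
    sed -re 's,^\s*(\S+).*$,\1,;s,.,#,g' |          # keep the indegree, one # per decimal digit
    uniq -c |                                       # count the nodes in each digit-length bin
    sed -re 's,^\s*(\S+).*,\1,;s,.,#,g'             # keep that count, one # per decimal digit
}

# Made-up toy network: nodes 1..12 each link to node 100,
# and node 100 links to nodes 1..10.
toy_edges() {
    echo '% toy edge list'
    seq 1 12 | sed 's,$, 100,'
    seq 1 10 | sed 's,^,100 ,'
}

toy_edges | plot_indegree
# prints:
# ##
# #
```

In this toy network ten nodes have a one-digit indegree, so the first bar has two characters (the count 10 has two digits).  One node has a two-digit indegree (12), so the second bar has a single character.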

This shows that:

  • You don’t need megabytes of graphics code to generate plots
  • You don’t need a log() function to plot logarithmic axes
  • It absolutely makes sense to use both sort(1) and uniq(1) multiple times in a single pipeline

The plotted degree distribution also uses logarithmic binning: all degrees with the same number of decimal digits fall into the same bin.  This makes it much more useful for visualizing the tail of the distribution than the standard way of plotting degree distributions.
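Both the logarithmic axes and the binning come from one trick: replacing every character of a number with #.  The bar length is then the number of decimal digits, that is, floor(log10(n)) + 1.  A small sketch, where the helper name to_bar is made up for illustration:

```shell
# Hypothetical helper: turn each input number into a bar with one # per decimal digit.
to_bar() {
    sed 's,.,#,g'
}

echo 1234 | to_bar          # prints ####

# All numbers with the same digit count collapse into the same bar, so uniq -c
# reports 9 values for #, 90 for ## and 21 for ###.
seq 1 120 | to_bar | uniq -c
```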

Try it out with more datasets:  List of datasets in KONECT

Can you write other shell one-liners for generating other network analysis plots?
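One possible answer, offered as a sketch: capturing the first column instead of the second gives the outdegree distribution.  The name plot_outdegree is made up for illustration, and GNU sed is assumed.

```shell
# Sketch only: assumes GNU sed; plot_outdegree is a hypothetical name.
# Same stages as the indegree pipeline, except that the first sed keeps
# the from-node ID (first column) instead of the to-node ID.
plot_outdegree() {
    sed -re '/^%/d;s,^\s*(\S+).*$,\1,' |
    sort | uniq -c | sort -n |
    sed -re 's,^\s*(\S+).*$,\1,;s,.,#,g' |
    uniq -c |
    sed -re 's,^\s*(\S+).*,\1,;s,.,#,g'
}

# Usage: plot_outdegree <INPUTFILE
```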


