A common statistic computed for networks is the diameter. The diameter equals the longest shortest path length in a network. Let me explain: Given two nodes u and v in a network, their distance is the minimal number of edge needed to reach one from the other. In other words, the distance between u and v equals the shortest path length between these two nodes. Now, find two nodes u and v such that the distance between them is maximal – this is the diameter of the network. The diameter is thus an integer that characterizes a network.
The diameter is an important statistic for a network, since it is used to characterize a network as a small world. The expression small world refers to the observation that it is possible to reach any node from any node in a small number of steps, as in the famous six degrees of separation.
The diameter is only one of several ways of measuring the “typical path length” in networks, and in fact, it may not even be the most representative one. As it is a maximum, it is not representative of the full distribution of path lengths in the network. Also, being defined as a maximum, it is sensitive to outliers, e.g., tendrils in the networks that increase the diameter but do not affect the overall structure of the network. Therefore, several alternatives measures of typical path lengths can be used:
- The diameter (as defined above) is the maximum path length
- The mean path length, averaged over all node pairs
- The median path length
- The 90-percentile effective diameter
- The 50-percentile effective diameter
The effective diameters (50- and 90-percentile) are the path lengths such that 50 or 90 percent of node pairs are a smaller or equal distance apart, interpolated between integer values. Thus, the median path length is the 50-percentile effective diameter rounded down.
The statistics are shown in the following plot, for the Slashdot Zoo network:
This shows the cumulated distribution of path lengths over all node pairs in the network. In this network, the diameter (δ) equals 12. The X-percentile effective diameters are shown as the two vertical red lines and equal 3.51 for the 50-percentile effective diameter (δ0.5), and 4.71 for the 90-percentile effective diameter (δ0.9). The median path length is thus 3 in this network. The mean path length (δm) equals 4.04 and is somewhere in between. We see that the distribution is skewed, because the mean and median path length are not equal in this network. However, when I did this analysis for more networks, I found that in most cases, both numbers coincide. The following plot shows many networks (those small enough such that the number could be calculated), plotted by the 50-percentile effective diameter (on the X axis) and the mean path length (on the Y axis). Each network is represented by a two-letter (and a for few three-letter) code. See the results for yourself:
This plot shows that both not only correlate very highly, but are almost equal for almost all networks. In other words: the 50-percentile effective diameter and the mean path length are almost equal in actual networks. What does this mean? I’m not sure, but this result is not trivial, since the distribution of path lengths is not symmetric in most networks.
For more path length distributions, see here.
For another blog post about the path length distribution, see