Good news for the network analysis community: a very large new Web hyperlink dataset was recently released. The Web Data Commons Hyperlink Graph consists of 3.56 billion pages and 129 billion hyperlinks connecting them! For comparison, the Facebook friendship graph is reported to include 1.26 billion nodes (users) and 150 billion links (friendships).
The Web Data Commons Hyperlink Graph is thus the largest real-world network dataset available outside of the big companies themselves. This means that actual “Big Data” research is no longer reserved for the big data companies: now any research group can do it too. We should therefore see studies using this dataset coming out in the next few years. I can’t wait to find out what will be done with it.
For another comparison, our own Koblenz Network Collection, which collects network datasets of many different types and sizes, maxes out at 1.9 billion edges, with the Twitter follower network. Since the Web Data Commons Hyperlink Graph is open, we will of course add it to the collection.