Largest Network Dataset Ever Released – As Big As Facebook

Good news for the network analysis community: a very large new Web hyperlink dataset was recently released. The Web Data Commons Hyperlink Graph consists of 3.56 billion pages and the 129 billion hyperlinks connecting them! For comparison, the Facebook friendship graph is reported to include 1.26 billion nodes (users) and 150 billion links (friendships).

The Web Data Commons Hyperlink Graph is thus the largest real-world network dataset available outside of the big companies themselves. This means that actual “Big Data” research is no longer reserved for the big data companies: now any research group can do it too. We should therefore see studies using this dataset coming out over the next few years. I can’t wait to find out what will be done with it.

For another comparison, our own Koblenz Network Collection, which collects network datasets of many different types and sizes, maxes out at 1.9 billion edges, with the Twitter follower network. But since the new dataset is open, we will of course now add it to the collection.
