I spent the last three years writing my PhD thesis. Since my thesis is about networks, I have used network datasets for research. Experimental results are more significant when done using multiple datasets and therefore I started collected network datasets. At first, I collected bipartite rating graphs such as MovieLens, Epinions and Netflix. Later I added social networks such as the Slashdot Zoo, Facebook and Flickr, hyperlink networks from Google, Wikipedia and TREC. Before my thesis was done, I noticed that I had collected over a hundred network datasets!
My PhD thesis is about the spectral evolution model, but such a large collection of network datasets is much too interesting to use it just for that. Therefore, I decided to perform some statistics with the collection of network datasets, trying to reproduce well-known results about networks. One such result is that the density of networks is increasing over time.
Let me explain that: A network such as Facebook consist of nodes and links. In the case of Facebook, the nodes are users and the links are friendship links. We cannot know the complete Facebook network, but a part of it is available for research. What’s great about this dataset: Not only do we know the friends of each users, we also know the date at which each friendship link was added. This allows us to look at the changes in the network over time. For instance, one important statistic in a network is the average number of links connected to each node. This number is called the network density. A well known result about networks that change over time is that their density is increasing. In other words, the average number of friends people have on Facebook gets larger over time. This is not a trivial property: When a new user account is created, the average number of friends goes down, because new users don’t have any friends.
In other words, the larger the number of nodes in a network, the larger the density. This result was shown to be valid for individual networks, for instance in Jure Leskovec‘s dissertation. Given a large collection of datasets, we may now ask the related question: Is the number of nodes in a network correlated to the density, when comparing different networks? This question can only be answered with a large collection of datasets. Using the network datasets I have available, the result looks like this:
The result is: Network size and density do not correlate! In other words, the growing density is a feature of individual networks, not of networks as a whole.
This kind of result is kind of straightforward taken individually, but is only made possible when using a large collection of datasets! Therefore, me and my colleagues at the Institute for Web Science and Technologies (WeST) at the University of Koblenz have decided to make the collection of datasets available in a form that makes it easy to do these kinds of studies. We will call the collection of datasets KONECT, short for Koblenz Network Collection. You can look at a poster of KONECT that we presented at the Web Science Conference that WeST organized in Koblenz a few weeks ago. We will also announce the KONECT web site soon—So stay tuned to this blog for more information!
NEW: All network datasets are available at KONECT – The Koblenz Network Collection
Here’s my PhD thesis:
On the Spectral Evolution of Large Networks, Jérôme Kunegis, PhD thesis, University of Koblenz–Landau, 2011.
Edit: removed preliminary information.