Last week, at the party following my PhD thesis defense, my colleagues gave me the following T-shirt as a present:
‘The trick is… Everything is a NETWORK!’
My colleagues from the WeST Institute had a brilliant idea here: Summarize my PhD in seven words! Let me explain what it means:
In my PhD thesis, I explain how to predict links in networks. For instance, the method described in my thesis can be used to implement the ‘You May Know These People’ feature on Facebook, by predicting who a Facebook user will add next. However, the method is more general. As I show in the thesis, it can be used to predict trust/distrust, to predict whether someone likes or doesn’t like a movie or any other item, to find documents related to a given document, to find which authors of scientific publications work on similar topics, and even to implement a search engine. Here is how:
In a social network, people are connected by friendship links. If we can predict which friendship links will be added next, we can implement a social recommender system.
In a trust network, people are connected by trust and distrust links. We can predict whether a person will trust another and thus implement trust prediction (a.k.a. link sign prediction).
In a rating network, users are connected to items such as movies via rating links (which always connect a user with an item). We can predict the weight of new edges, and thus implement rating prediction (a.k.a. collaborative filtering).
In a hyperlink network, we have documents such as web pages connected by hyperlinks. By predicting missing links in this network, we can find related web pages, even if they have no connection between them.
In a bibliographic network, we have scientific publications (or patents, etc.) citing other scientific publications. By predicting links in this network, we can find related literature.
In a collaborative network, we have scientists (or any other persons) connected when they collaborate, for instance on scientific papers. By predicting links in such a network we can predict or recommend future collaborations.
The bag-of-words model used in information retrieval leads to a bipartite document-word network. Predicting links in this network finds the documents related to a given word, which is exactly what a search engine does.
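To make the social network case concrete, here is a minimal sketch in Python. It uses the simple common-neighbors heuristic as a stand-in, not necessarily the method described in the thesis: two people who are not yet connected are scored by how many friends they share. The toy network, the names in it, and the functions `build_adjacency` and `predict_links` are all made up for illustration.

```python
from itertools import combinations

# Hypothetical toy friendship network; the names are invented for this example.
friendships = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "carol"),
    ("bob", "dave"),
    ("carol", "dave"),
    ("dave", "erin"),
]


def build_adjacency(edges):
    """Map each person to the set of their friends (undirected network)."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def predict_links(edges):
    """Rank unconnected pairs by their number of common neighbors.

    Returns a list of ((person, person), score) tuples, highest score first.
    Pairs without any common neighbor are left out.
    """
    adjacency = build_adjacency(edges)
    scores = {}
    for u, v in combinations(sorted(adjacency), 2):
        if v in adjacency[u]:
            continue  # already friends, nothing to predict
        common = len(adjacency[u] & adjacency[v])
        if common > 0:
            scores[(u, v)] = common
    # Sort by descending score, then alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


ranking = predict_links(friendships)
for (u, v), score in ranking:
    print(f"{u} - {v}: {score} common friend(s)")
```

In this toy network, alice and dave share two friends (bob and carol), so they come out as the most likely new link. The same function could be run unchanged on a collaboration or hyperlink network. Ratings and signed trust links carry weights, so they would need a score that takes those weights into account.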
Many other types of networks can be found in practically every area: In chemistry, molecules are modeled as networks of connected atoms. In quantum physics, networks consisting of particles and their interactions are aggregated into Feynman diagrams as a way to compute probabilities of reactions between them. In biology, the relations between genes and the interactions between proteins and other metabolites are modeled as a metabolic network in which a path of incident edges is called a biochemical pathway. Also in biology, a series of predator-prey relationships is called a food chain, and multiple food chains form a food web. In neuroscience, a population of interconnected neurons is modeled as a neural network. In epidemiology, the percolation of diseases from one organism to another is modeled as a network. In sociology, various types of relationships between persons, groups and even countries are studied using network analysis. In linguistics, the relations between words make up a lexical network. In economics, the relations between actors are modeled as networks. Transportation and other infrastructural networks are used in operations research. For instance, the world's airports and the flights connecting them form a network that combines geographic and geopolitical aspects. In technology, a sensor network consists of individual autonomous sensors connected wirelessly to other nearby sensors.
The list is so long that it almost gets boring to read! In fact, each of these examples can be used to formulate a specific link prediction problem, which consists of predicting links in the corresponding network. This is true of many problems in data mining and machine learning. I would even go so far as to say that about half of all papers presented at conferences about data mining, information retrieval, web science and related areas actually solve a link prediction problem of some form. To conclude, I can only cite my brilliant new T-shirt again:
Everything is a NETWORK!
My thesis is available here: PhD thesis of Jérôme Kunegis (5.9 MB, 159 pages)
- KONECT – The Koblenz Network Collection, a collection of 150+ network datasets from all kinds of different areas