The Pareto Principle

8.9% of bands make up 91.1% of plays on Last.fm.

9.4% of musical genres represent 90.6% of songs in Wikipedia.

9.5% of words make up 90.5% of Reuters news by word count.

14.4% of record labels represent 85.6% of all bands.

14.6% of user groups account for 85.4% of group memberships on Flickr.

16.2% of football players account for 83.8% of positions in sports teams.

16.2% of songs make up 83.8% of plays on Last.fm.

17.4% of movies receive 82.6% of ratings on MovieLens.

17.7% of profiles receive 82.3% of ratings on the Czech dating site Libimseti.cz.

18.8% of groups make up 81.2% of all group memberships on YouTube.

19.8% of all categories make up 80.2% of all category inclusions on the English Wikipedia.

20.3% of all users receive 79.7% of “friend” and “foe” links on Slashdot.

21.3% of Facebook users receive 78.7% of wall posts.

22.9% of users make up 77.1% of all “@” mentions on Twitter.

23.1% of projects receive 76.9% of project memberships on GitHub.

25.0% of Facebook users cover 75.0% of all friendship links.

26.2% of papers receive 73.8% of citations.

26.9% of persons send 73.1% of emails.

27.3% of users receive 72.7% of replies on Digg.

27.4% of actors account for 72.6% of movie credits.

29.9% of all patents receive 70.1% of all patent citations among US patents.

30.4% of all websites receive 69.6% of all hyperlinks.

Statements of this form can be read off a Lorenz curve. A Lorenz curve plots the share of one kind of “thing” covered by a given share of another. In the case of networks, the Lorenz curve shows the share of edges covered by a given share of nodes. For instance, the following graph shows how the “plays” on Last.fm are distributed over songs:

The values cited in the examples above can be read off this plot at the point where the blue Lorenz curve crosses the anti-diagonal (the line from the top left to the bottom right), marked by the big dot and the letter P. This gives us the statement from above that 8.9% of bands make up 91.1% of plays on Last.fm. When the distribution is completely uniform, the Lorenz curve follows the main diagonal; the further the Lorenz curve sags below it, the less “fair” the distribution of edges. As an alternative to the value P, twice the area between the Lorenz curve and the main diagonal gives the Gini coefficient, marked G on the plot. The Gini coefficient is 0 when the distribution is completely uniform (i.e., “fair”), and would be 1 if all edges belonged to a single node, which is impossible, since every edge attaches to two nodes.
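The reading of P and G described above can be sketched in Python. This is a minimal sketch under stated assumptions, not KONECT's own code: it takes a plain list of node degrees (for the Last.fm example, the number of plays each song or band received), sorts nodes by ascending degree, interpolates linearly between points of the Lorenz curve to locate P, and uses the trapezoid rule for the area behind G. The function names `lorenz_curve`, `pareto_value` and `gini_coefficient` are hypothetical.

```python
from typing import List, Tuple


def lorenz_curve(degrees: List[int]) -> List[Tuple[float, float]]:
    """Points (share of nodes, share of edge endpoints) of the Lorenz curve.

    Nodes are sorted by ascending degree, so the curve starts with the
    least-connected nodes and never rises above the diagonal y = x.
    """
    total = sum(degrees)
    if not degrees or min(degrees) < 0 or total <= 0:
        raise ValueError("need non-negative degrees with a positive sum")
    n = len(degrees)
    points = [(0.0, 0.0)]
    cumulative = 0
    for i, d in enumerate(sorted(degrees), start=1):
        cumulative += d
        points.append((i / n, cumulative / total))
    return points


def pareto_value(degrees: List[int]) -> float:
    """P: the top P share of nodes covers a 1 - P share of edge endpoints.

    Found where the Lorenz curve crosses the anti-diagonal y = 1 - x,
    interpolating linearly between the curve's points (an assumption).
    """
    points = lorenz_curve(degrees)
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # x + y - 1 is negative below the anti-diagonal and positive above it.
        f0, f1 = x0 + y0 - 1.0, x1 + y1 - 1.0
        if f0 <= 0.0 <= f1:
            # f1 > f0 on every segment, because x grows by 1/n and y never falls.
            t = -f0 / (f1 - f0)
            x_cross = x0 + t * (x1 - x0)
            return 1.0 - x_cross
    raise AssertionError("unreachable: the curve runs from (0, 0) to (1, 1)")


def gini_coefficient(degrees: List[int]) -> float:
    """Twice the area between the diagonal y = x and the Lorenz curve."""
    points = lorenz_curve(degrees)
    # Trapezoid rule: exact area under the piecewise-linear Lorenz curve.
    area = sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
    # The area under the diagonal is 1/2, so G = 2 * (1/2 - area).
    return 1.0 - 2.0 * area


if __name__ == "__main__":
    degrees = [1, 1, 2, 4]  # toy degree sequence, not KONECT data
    print(pareto_value(degrees), gini_coefficient(degrees))
```

For the toy degree sequence [1, 1, 2, 4], the sketch reports P = 0.375 (the top 37.5% of nodes cover 62.5% of edges) and G = 0.3125. A uniform sequence such as [3, 3, 3] gives P = 0.5 and G = 0, matching a Lorenz curve that follows the main diagonal.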

All data was generated with KONECT – The Koblenz Network Collection.

PS: The Pareto Principle, in the narrow sense, states that 20% of people own 80% of things.
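The narrow form fixes the share of people at 20% and asks what share of things they hold; unlike the value P, the two percentages then need not add up to 100%. Below is a minimal sketch, assuming a plain list of holdings per person. `top_share` is a hypothetical helper, and the linear interpolation used when 20% of the people is not a whole number is an assumption, chosen so the result agrees with reading a Lorenz curve at x = 1 − fraction.

```python
from typing import List


def top_share(values: List[float], fraction: float = 0.2) -> float:
    """Share of the total held by the top `fraction` of items (e.g. people).

    Hypothetical helper. When fraction * n is not a whole number, the next
    item contributes proportionally (linear interpolation, an assumption).
    """
    if not values or min(values) < 0 or sum(values) <= 0:
        raise ValueError("need non-negative values with a positive sum")
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    ranked = sorted(values, reverse=True)  # largest holdings first
    k = fraction * len(ranked)  # number of top items, possibly fractional
    whole = int(k)
    covered = sum(ranked[:whole])
    if whole < len(ranked):
        covered += (k - whole) * ranked[whole]  # partial share of the next item
    return covered / sum(ranked)
```

For instance, if two of ten people own 40 units each and the other eight own 2.5 units each, `top_share` with `fraction=0.2` returns 0.8, which is exactly the 80/20 split.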
