Showing posts with label Rank vs. Frequency Rule. Show all posts

Oct 2, 2015

Zipf's Law

In 1949, the American linguist George Zipf noticed something odd about how often people use words in a given language. He found that a small number of words are used all the time, while the vast majority are used rarely. When he ranked the words in order of popularity, a striking pattern emerged. The top-ranked word was used roughly twice as often as the second-ranked word, three times as often as the third-ranked word, and so on, with the same pattern continuing thousands of ranks down the list.

In the Brown Corpus of American English text, "the" is the most frequently occurring word and accounts for nearly 7% of all word occurrences (69,971 out of slightly over 1 million). The second-place word, "of", accounts for slightly over 3.5% of words (36,411 occurrences), followed by "and" (28,852). Only 135 vocabulary items are needed to account for half of all word occurrences in the corpus. The Zipf principle also holds true for other languages.
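To make the arithmetic concrete, here is a minimal Python sketch that compares the three counts quoted above with what an idealized version of the rule would predict. The function name `zipf_prediction` is made up for illustration, and the "divide the top count by the rank" formula is the simplest reading of the rule, not a fitted model.

```python
# Observed counts quoted in the text above: rank -> (word, occurrences).
observed = {1: ("the", 69971), 2: ("of", 36411), 3: ("and", 28852)}


def zipf_prediction(top_count: int, rank: int) -> float:
    """Idealized Zipf estimate (an assumption for illustration):
    the count at a given rank is the rank-1 count divided by the rank."""
    return top_count / rank


top_count = observed[1][1]
for rank, (word, count) in observed.items():
    predicted = zipf_prediction(top_count, rank)
    ratio = count / predicted  # 1.0 would be a perfect match
    print(f"rank {rank} {word!r}: observed {count}, "
          f"predicted {predicted:.0f}, observed/predicted {ratio:.2f}")
```

The second-ranked word lands within a few percent of the prediction, while the third is noticeably higher than predicted, so the rule is a rough regularity rather than an exact one.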

Zipf did not claim to have originated it. The French stenographer Jean-Baptiste Estoup and the German physicist Felix Auerbach had noticed the pattern before him. This rank vs. frequency rule has also been used to describe corporation sizes, income rankings, the number of people watching the same TV channel, the popularity of opening chess moves, etc.

Later dubbed Zipf's law, the rank vs. frequency rule also works if you apply it to the sizes of cities. The city with the largest population in a country is generally about twice as large as the next-biggest, three times as large as the third-biggest, and so on. Zipf's law for cities has held up in many countries over the past century.

It closely parallels the Pareto principle, better known as the 80/20 rule: roughly 20% of the actions account for 80% of the consequences, 20% of the customers account for 80% of the profits, etc. I presume 80% of you enjoy most of this stuff and 20% tolerate it, with hopes of enjoying some part.
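For the curious, the link between the two rules can be checked with a few lines of Python. This is only a sketch under a stated assumption: every item's weight is exactly 1 divided by its rank, which real data will only approximate. The helper name `zipf_top_share` and the choice of 1,000 items are hypothetical.

```python
def zipf_top_share(n_items: int, top_fraction: float = 0.2) -> float:
    """Share of the total held by the top `top_fraction` of items,
    assuming (for illustration) that the item at rank r has weight 1/r."""
    weights = [1.0 / rank for rank in range(1, n_items + 1)]
    top_n = max(1, round(n_items * top_fraction))  # how many items count as "top"
    return sum(weights[:top_n]) / sum(weights)


share = zipf_top_share(1000)
print(f"Top 20% of 1000 items hold {share:.0%} of the total")
```

With 1,000 items the top 20% hold a little under 80% of the total, which is close to the Pareto split. The exact share drifts with the number of items, so treat the match as suggestive rather than exact.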