On the Lavalette’s Nonlinear Zipf’s Law
Version February 2002
by Acad. Prof. Dr. Ioan-Iovitz Popescu
The ranking law, established by the French biophysicist Daniel Lavalette (1996), states that the impact factor q of a set of N scientific journals, ordered by descending ranking number n, obeys the general relationship
q(n) = c [Nn/(N-n+1)]^{-b}
with only two fitting parameters, namely the exponent b and the scaling constant c = q(1). When plotted on a double-logarithmic diagram of log(q) versus log(n), the corresponding line deviates from straightness in a smooth, characteristic fashion (Fig. 1), hence the alternative names we propose: curved Zipf line and nonlinear Zipf’s law. Indeed, while holding better promise for various applications and theoretical investigations, this law is barely more complex than the well-known rank-frequency Zipf’s law
q(n) = c n^{-b}
with the exponent b = 1 in the original expression of George Kingsley Zipf (1949). Two adjustable corrections have been subsequently introduced by Benoit Mandelbrot (1954), namely a slight correction added to the power 1 and a number d added to the rank n, the modified law becoming
q(n) ∝ c (n + d)^{-b}
with three fitting parameters, b, c, and d. Thus, whereas the role of independent variable in Zipf’s and Mandelbrot’s laws is played by the descending ranking number n, in Lavalette’s law this role is played by the ratio n/(N-n+1) between the descending and the ascending ranking numbers. Generally, q(n) can be any quantity used in ordering a set of occurrences, such as the frequency of natural or randomly generated words, the size of cities and other settlements, income size, citation frequency and impact factors, frequency of access to web sites, the size of oil and other mineral deposits, earthquake magnitude, galactic intensity, and so on.
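The three ranking laws above can be compared numerically. The following is a minimal sketch, not taken from the original page: the function names and the parameter values N = 100, c = 1, b = 1/2 are illustrative assumptions. It estimates the local slope on the log-log diagram, which stays at -b for Zipf’s law but steepens with rank for Lavalette’s law.

```python
import math

def zipf(n, c, b=1.0):
    """Zipf's law: q(n) = c * n**(-b)."""
    return c * n ** (-b)

def mandelbrot(n, c, b, d):
    """Mandelbrot's modification: q(n) = c * (n + d)**(-b)."""
    return c * (n + d) ** (-b)

def lavalette(n, c, b, N):
    """Lavalette's law: q(n) = c * [N*n / (N - n + 1)]**(-b), for 1 <= n <= N."""
    return c * (N * n / (N - n + 1)) ** (-b)

def loglog_slope(f, n):
    """Local slope of log q versus log n, estimated between ranks n and n + 1."""
    return (math.log(f(n + 1)) - math.log(f(n))) / (math.log(n + 1) - math.log(n))

# Illustrative values only (assumptions, not fitted to any data set).
N, c, b = 100, 1.0, 0.5
ranks = (1, 50, 98)
zipf_slopes = [loglog_slope(lambda n: zipf(n, c, b), n) for n in ranks]
lav_slopes = [loglog_slope(lambda n: lavalette(n, c, b, N), n) for n in ranks]

for n, zs, ls in zip(ranks, zipf_slopes, lav_slopes):
    print(f"n = {n:3d}   Zipf slope {zs:7.3f}   Lavalette slope {ls:7.3f}")
```

Setting n = 1 in `lavalette` returns c for any N and b, in agreement with c = q(1).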
Interest in the deviations, explanations, and applications of Zipf’s law has recently been revived, with partial success, by D.M.W. Powers (1998), J. Laherrere (1996, 1998), S. Redner (1998), C. Tsallis (2000) and others. The main result of the studies of citations of publications (Redner) and of citations of authors (Laherrere) was that the stretched exponential fits the data reasonably well for relatively small n-values, while the required asymptotic behavior is the inverse power law q(n) = c n^{-b}, which the exponential can by no means provide. Considerably better results have recently been obtained (Tsallis and de Albuquerque) with a single function of the power-law type.
Benford’s law gives the probability p(n) = log10(1 + 1/n) for any first (leading, initial) digit n from 1 to 9. Surprisingly, for instance, the probability that the first digit is “1” is p(1) = 0.301, and not the value 1/9 = 0.111 expected if all digits were equally likely. The next most popular digit is “2” with p(2) = 0.176, and so on down to the least probable digit “9” with p(9) = 0.046. Here is a table of the initial-digit percentages as predicted by Benford’s law and as collected from a list of the first 500 Fibonacci numbers (or, nowadays, produced with a Fibonacci calculator available on the Internet):
| First digit | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| First 500 Fibonacci numbers | 30.2 % | 17.6 % | 12.6 % | 9.4 % | 8.0 % | 6.6 % | 5.8 % | 5.4 % | 4.4 % |
| Benford’s law | 30.1 % | 17.6 % | 12.5 % | 9.7 % | 7.9 % | 6.7 % | 5.8 % | 5.1 % | 4.6 % |
The analysis has been extended to the second digit and so on, up to the general case of the nth digit following the first non-zero digit. As expected, the corresponding probability quickly approaches 1/10 as we proceed to less significant digits. Generally, the digits of all the numbers making up the Fibonacci sequence tend to conform to Benford’s law. For an introduction to this matter see Benford’s Law in Probability and Statistics at http://www.mathpages.com/home/iprobabi.htm and McMinn, Benford’s Law & the Fibonacci Numbers (2000).
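The first-digit comparison can be reproduced in a few lines. This is a hedged sketch rather than the procedure used for the attached data: it assumes the sequence starts 1, 1, 2, 3, …, and the helper names `benford_first_digit` and `fibonacci` are illustrative.

```python
import math
from collections import Counter

def benford_first_digit(d):
    """Benford probability that the leading digit equals d (d = 1..9)."""
    return math.log10(1 + 1 / d)

def fibonacci(count):
    """Return the first `count` Fibonacci numbers, assuming the start 1, 1, 2, 3, ..."""
    a, b = 1, 1
    numbers = []
    for _ in range(count):
        numbers.append(a)
        a, b = b, a + b
    return numbers

fibs = fibonacci(500)
# Count how often each digit 1..9 appears as the leading digit.
counts = Counter(int(str(f)[0]) for f in fibs)

for d in range(1, 10):
    observed = 100 * counts[d] / len(fibs)
    predicted = 100 * benford_first_digit(d)
    print(f"{d}: Fibonacci {observed:4.1f} %   Benford {predicted:4.1f} %")
```

Python integers have arbitrary precision, so the leading digit of even the largest of these numbers can be read directly from its decimal string.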
Fig. 8 and the attached Excel file of significant-digit percentages, deduced from the occurrences in the first 500 Fibonacci numbers, are intended to illustrate the extraordinary connection between Benford’s law and Fibonacci numbers, and to demonstrate the excellent agreement between predicted and actual significant-digit percentages also beyond the single first digits “1” to “9”. Some details are reproduced in the following table, from the most common first two digits “10” down to the least likely “99”.
| First two digits | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 | Sum |
|---|---|---|---|---|---|---|---|---|---|---|
| Percentage | 4 % | 2 % | 1.4 % | 1.0 % | 0.8 % | 0.8 % | 0.8 % | 0.4 % | 0.4 % | 11.6 % |
| First two digits | 19 | 29 | 39 | 49 | 59 | 69 | 79 | 89 | 99 | Sum |
| Percentage | 2.2 % | 1.4 % | 1.2 % | 0.8 % | 0.6 % | 0.6 % | 0.4 % | 0.6 % | 0.4 % | 8.2 % |
From this table one may deduce, for example, that the second digit is most likely to be “0” (11.6 %) and least likely to be “9” (8.2 %). Indeed, the percentage for the second digit being “0” equals the sum of the percentages of the first two digits being “10”, “20”, …, “90”, which adds up to 11.6 %; similar reasoning for the second digit being “9” gives 8.2 %. But the most significant new fact is that a single Lavalette function matches quite well the whole range of frequencies (probabilities, percentages) predicted by Benford’s law and obtained from the Fibonacci number statistics, as shown in Fig. 8 for the first and second digits. Consequently, and surprisingly, three apparently different branches can be interlinked: the Fibonacci sequence and golden mean, Benford’s law and leading-digit phenomena, and finally Lavalette’s nonlinear Zipf’s law.
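The summation over the first digit described above can be sketched as follows. The sketch relies on the standard generalization of Benford’s law to a leading digit string k, namely log10(1 + 1/k). Two choices are assumptions of this example rather than the procedure behind the attached data: the helper names, and expressing the observed frequencies relative to the Fibonacci numbers that have at least two digits.

```python
import math
from collections import Counter

def benford_leading(k):
    """Benford probability that a number starts with the digits of the integer k."""
    return math.log10(1 + 1 / k)

def benford_second_digit(d2):
    """Probability that the second digit is d2, summed over first digits 1..9."""
    return sum(benford_leading(10 * d1 + d2) for d1 in range(1, 10))

def fibonacci(count):
    """Return the first `count` Fibonacci numbers, assuming the start 1, 1, 2, 3, ..."""
    a, b = 1, 1
    numbers = []
    for _ in range(count):
        numbers.append(a)
        a, b = b, a + b
    return numbers

# Only numbers with at least two digits have a second digit.
two_digit = [f for f in fibonacci(500) if f >= 10]
pairs = Counter(int(str(f)[:2]) for f in two_digit)

# Second-digit frequency = sum of the frequencies of the pairs "1d", "2d", ..., "9d".
observed_second = {
    d2: sum(pairs[10 * d1 + d2] for d1 in range(1, 10)) / len(two_digit)
    for d2 in range(10)
}

for d2 in range(10):
    print(f"second digit {d2}: Fibonacci {100 * observed_second[d2]:4.1f} %   "
          f"Benford {100 * benford_second_digit(d2):4.1f} %")
```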
Fig. 1. Illustrating the curved Zipf line shape by the normalized Lavalette ranking function q/c = [Nn/(N-n+1)]^{-b} in terms of the descending ranking number n, for a (negative) slope b = 1/2 and various total numbers N of items in the considered set.
Fig. 2. A slightly curved rank-frequency Zipf plot of 917 distinct word occurrences in the text of the USA Constitution. The corresponding Excel list of words is attached.
Fig. 3. Illustrating essential shapes of competing ranking distributions.
Fig. 4. Illustrating the Lavalette ranking law and the Lavalette fitting for 4 random subsets (of journals with title initial letter A, B, C, or D) excerpted from the present collection (7557 journals) and ranked by average journal impact factors (JIF). The corresponding Excel list is attached.
Fig. 5. Illustrating the Lavalette ranking law for 26 random subsets (of journals with title initial letter belonging to various letters of the alphabet) excerpted from the present collection (7557 journals) and ranked by average journal impact factors (JIF). The Lavalette shaping is obvious and the fitting pleasure is left to the reader, together with the attached Excel list.
Fig. 6. Illustrating the Lavalette ranking law for 12 natural subsets (of journals belonging to various scientific fields) excerpted from the present collection (7557 journals) and ranked by average journal impact factors (JIF). The Lavalette shaping is obvious and the fitting pleasure is left to the reader, together with the attached Excel list.
Fig. 7a. Ranking Roumanian earthquake moment magnitudes, Mw = (2/3) log Mo - 10.7 [where Mo is the scalar moment of the best double couple in dyne-cm, according to the Hanks and Kanamori formula (1979), http://neic.usgs.gov/neis/phase_data/mag_formulas.html]. Data obtained by courtesy of the Roumanian National Seismic Network, The National Institute of Earth Physics (NIEP), Bucharest-Magurele, Roumania. The corresponding Excel list of the Roumanian earthquakes is attached.
Fig. 7b. An interchange of the X-Y axes in the preceding figure reveals the earthquake moment magnitude (Mw) as a natural rank and the (cumulative) ranking number as a frequency of earthquakes stronger than Mw. In contrast to Fig. 7a, the data exhibit in this case a Lavalette shaping.
Fig. 8. Illustrating the Lavalette single function fitting of the significant-digit percentages and, consequently, of the corresponding Benford distribution. The Excel list of the data is attached.
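A Lavalette fit of the kind shown in the figures can be sketched with an ordinary least-squares line in logarithmic coordinates, since log q = log c - b log[Nn/(N-n+1)] is linear in the Lavalette variable. The fitting procedure actually used for the figures is not stated here, so the method and the name `fit_lavalette` are assumptions of this example. As input it takes the Benford first-digit probabilities, with the digit playing the role of rank n and N = 9.

```python
import math

def lavalette(n, c, b, N):
    """Lavalette's law: q(n) = c * [N*n / (N - n + 1)]**(-b)."""
    return c * (N * n / (N - n + 1)) ** (-b)

def fit_lavalette(values):
    """Least-squares fit of log q = log c - b * log[N*n / (N - n + 1)].

    `values` must be positive and listed in descending order (rank 1 first).
    Returns the pair (c, b).
    """
    N = len(values)
    xs = [math.log(N * n / (N - n + 1)) for n in range(1, N + 1)]
    ys = [math.log(q) for q in values]
    x_mean = sum(xs) / N
    y_mean = sum(ys) / N
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx                       # slope of the straight line equals -b
    c = math.exp(y_mean - slope * x_mean)   # intercept equals log c
    return c, -slope

# Benford first-digit probabilities, ranked by digit 1..9 (already descending).
benford = [math.log10(1 + 1 / d) for d in range(1, 10)]
c_fit, b_fit = fit_lavalette(benford)

for n, p in enumerate(benford, start=1):
    print(f"digit {n}: Benford {p:.3f}   Lavalette fit {lavalette(n, c_fit, b_fit, 9):.3f}")
```

Fitting in logarithmic coordinates weights relative rather than absolute deviations; a direct nonlinear fit of q(n) would in general give slightly different parameters.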