5. Conclusions
The definition of three-dimensional ants (3D ants) given here results in ants that build three-dimensional highways. This definition shows that the rules of Langton's ant are elementary and in some cases universally applicable. An interesting result is that there is more than one style of highway. A first set of classification systems was presented. The research reported here covers only a small part of the infinite world of 3D ants. However, the search space for ants grows exponentially with the rule length, and big three-dimensional worlds are hard to simulate (with respect to memory, computing time, and costly visualization). Some additional questions come up. Are there 3D ants that build arbitrarily complex highways? Does every 3D ant build a highway? Is there an ant and a number of steps for every arbitrary shape? Which resource consumption (time or space) grows faster with increasing rule length? A simple software tool to simulate 3D ants made this work possible and will be made available by the author to those who are interested.
References
[1] C. G. Langton, "Studying Artificial Life with Cellular Automata," Physica D, 22 (1986) 120-149.
[2] L. A. Bunimovich and S. E. Troubetzkoy, "Rotators, Periodicity and Absence of Diffusion in Cyclic Cellular Automata," Journal of Statistical Physics, 74 (1994) 1-14.
[3] J. Propp, "Further Ant-ics," Mathematical Intelligencer, 16 (1994) 37-42.
[4] D. Gale, J. Propp, S. Sutherland, and S. Troubetzkoy, "Further Travels with My Ant," The Mathematical Intelligencer, 17(3) (1995) 48-56.
[5] O. Beuret and M. Tomassini, "Behavior of Multiple Generalized Langton's Ants," Proceedings of the Artificial Life V Conference, edited by C. Langton and K. Shimohara (MIT Press, 1998).
[6] L. A. Bunimovich, "Many Dimensional Lorentz Cellular Automata and Turing Machines," International Journal of Bifurcation and Chaos, 6 (1996) 1127-1135.
Complex Systems, 14 (2003) 263-268
Read Before You Cite!
M. V. Simkin and V. P. Roychowdhury
Department of Electrical Engineering, University of California, Los Angeles, CA 90095-1594
We report a method for estimating what percentage of people who cited a paper had actually read it. The method is based on a stochastic modeling of the citation process that explains empirical studies of misprint distributions in citations (which we show follows a Zipf law). Our estimate is that only about 20% of citers read the original.
Many psychological tests have a so-called "lie-scale." A small but sufficient number of questions that admit only one true answer, such as "Do you always reply to letters immediately after reading them?", are inserted among others that are central to the particular test. A wrong reply to such a question adds a point on the lie-scale, and when the lie-score is high, the overall test results are discarded as unreliable. Perhaps, for a scientist, the best candidate for such a lie-scale is the question: "Do you read all of the papers that you cite?"

Comparative studies of the popularity of scientific papers have been a subject of much recent interest [1-8], but the scope has been limited to citation distribution analysis. We have discovered a method of estimating what percentage of people who cited a paper had actually read it. Remarkably, this can be achieved without any testing of the scientists, but solely on the basis of the information available in the ISI citation database. Freud [9] discovered that the application of his technique of psychoanalysis to slips in speech and writing could reveal a lot of hidden information about human psychology. Similarly, we find that the application of statistical analysis to misprints in scientific citations can give an insight into the process of scientific writing. As in the Freudian case, the truth revealed is embarrassing. For example, an interesting statistic revealed in our study is that a lot of misprints are identical. Consider, for example, a four-digit page number with one digit misprinted. There can be $10^4$ such misprints. The probability of repeating someone else's misprint accidentally is $10^{-4}$. There should be almost no repeat misprints by coincidence. One concludes that repeat misprints are due to copying someone else's reference, without reading the paper in question.

In principle, one can argue that an author might copy a citation from an unreliable reference list, but still read the paper.
A modest reflection would convince one that this is relatively rare, and cannot apply to the majority. Surely, in the pre-Internet era it took almost equal effort to copy a reference as to type in one's own based on the original, thus providing little incentive to copy if someone had indeed read, or at the very least had procured access to, the original. Moreover, if someone accesses the original by tracing it from the reference list of a paper with a misprint, then with a high likelihood the misprint will be identified and not propagated. In the past decade, with the advent of the Internet, it has become equally convenient for would-be nonreaders to copy from unreliable sources and for would-be readers to access the original. But there is no increased incentive for those who read the original to also make verbatim copies, especially from unreliable resources.¹ In the rest of this paper, giving the benefit of doubt to potential nonreaders, we adopt a much more generous view of a "reader" of a cited paper as someone who at the very least consulted a trusted source (e.g., the original paper or heavily used and authenticated databases) in putting together the citation list.

¹ According to many researchers the Internet may end up even aggravating the copying problem: more users are copying second-hand material without verifying or referring to the original sources.

Complex Systems, 14 (2003) 269-274; © 2003 Complex Systems Publications, Inc.

As misprints in citations are not too frequent, only celebrated papers provide enough statistics to work with. Figure 1 shows a distribution of misprints in citations to one such paper [10] in the rank-frequency representation, introduced by Zipf [11]. The most popular misprint in a page number propagated 78 times. Figure 2 shows the same data, but in a number-frequency format.

Figure 1. Rank-frequency distribution of misprints referencing a paper, which had acquired 4300 citations. There are 196 misprints total, out of which 45 are distinct. The most popular misprint propagated 78 times. A good fit to Zipf's law is evident.

Figure 2. Same data as in Figure 1, but in the number-frequency representation. Misprints follow a power-law distribution with exponent close to 2.

As a preliminary attempt, one can estimate an upper bound on the ratio of the number of readers to the number of citers, $R$, as the ratio of the number of distinct misprints, $D$, to the total number of misprints, $T$. Clearly, among $T$ citers, $T - D$ copied, because they repeated someone else's misprint. For the $D$ others, with the information at hand we have no evidence that they did not read, so according to the presumed-innocent principle, we assume that they did. Then in our sample we have $D$ readers and $T$ citers, which leads to:

$$R \approx \frac{D}{T}. \tag{1}$$
Substituting $D = 45$ and $T = 196$ in equation (1) we obtain that $R \approx 0.23$. This estimate would be correct if the people who introduced original misprints had always read the original paper. However, given the low value of the upper bound on $R$, it is obvious that many original misprints were introduced while copying references. Therefore a more careful analysis is necessary. We need a model to accomplish it. Our model for misprint propagation, which was stimulated by Simon's explanation of the Zipf law [12] and the idea of link redirection by Krapivsky and Redner [4], is as follows. Each new citer finds the reference to the original in any of the papers that already cite it. With probability $R$ he reads the original. With probability $1 - R$ he copies the citation from the paper he found it in. In any case, with probability $M$ he introduces a new misprint.
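The model just described is straightforward to simulate. The following Python sketch is an illustration and not the authors' own code: the function name `simulate_citations`, the labeling of misprints by integer ids, and the assumption that the very first citation is correct are choices made here for concreteness.

```python
import random
from collections import Counter


def simulate_citations(n_citations, read_prob, misprint_prob, seed=0):
    """Simulate the misprint-propagation model (illustrative sketch).

    read_prob is R, misprint_prob is M, n_citations is N.
    Returns (T, D, counts): total misprints, distinct misprints, and a
    Counter mapping each misprint id to the number of times it appears.
    """
    rng = random.Random(seed)
    # None marks a correct citation; an int marks a particular misprint.
    citations = [None]  # assumption: the first citation is correct
    next_misprint_id = 0
    for _ in range(n_citations - 1):
        # The new citer finds the reference in a random earlier citing paper.
        found_in = rng.choice(citations)
        if rng.random() < read_prob:
            label = None        # consulted the original
        else:
            label = found_in    # copied the citation, misprint and all
        if rng.random() < misprint_prob:
            label = next_misprint_id  # introduced a brand-new misprint
            next_misprint_id += 1
        citations.append(label)
    counts = Counter(c for c in citations if c is not None)
    return sum(counts.values()), len(counts), counts


if __name__ == "__main__":
    total, distinct, counts = simulate_citations(4300, 0.2, 0.01, seed=1)
    print("T =", total, "D =", distinct, "D/T =", distinct / max(total, 1))
    print("most repeated misprints:", counts.most_common(3))
```

Sorting `counts` by value gives the rank-frequency representation used in Figure 1. Individual runs fluctuate considerably, so any comparison with the data should average over many seeds.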
The evolution of the misprint distribution (here $N_K$ denotes the number of misprints that propagated $K$ times, and $N$ is the total number of citations) is described by the following rate equations:

$$\frac{dN_1}{dN} = M - (1-R)\times(1-M)\times\frac{N_1}{N},$$

$$\frac{dN_K}{dN} = (1-R)\times(1-M)\times\frac{(K-1)\times N_{K-1} - K\times N_K}{N} \qquad (K>1). \tag{2}$$

These equations can be easily solved using methods developed in [4] to get:

$$\gamma = 1 + \frac{1}{(1-R)\times(1-M)}. \tag{3}$$

As the exponent of the number-frequency distribution $\gamma$ is related to the exponent of the rank-frequency distribution $\alpha$ by the relation $\gamma = 1 + (1/\alpha)$, equation (3) implies that:

$$\alpha = (1-R)\times(1-M). \tag{4}$$

The rate equation for the total number of misprints is:

$$\frac{dT}{dN} = M + (1-R)\times(1-M)\times\frac{T}{N}. \tag{5}$$

The stationary solution of equation (5) is:

$$T = N\times\frac{M}{R+M-MR}. \tag{6}$$

The expectation value for the number of distinct misprints is obviously

$$D = N\times M. \tag{7}$$

From equations (6) and (7) we obtain:

$$R = \frac{D}{T}\times\frac{N-T}{N-D}. \tag{8}$$

Substituting $D = 45$, $T = 196$, and $N = 4300$ in equation (8), we obtain $R \approx 0.22$, which is very close to the initial estimate obtained using equation (1). This low value of $R$ is consistent with the "Principle of Least Effort" [11].

One can ask: Why did we not choose to extract $R$ using equations (3) or (4)? This is because $\alpha$ and $\gamma$ are not very sensitive to $R$ when it is small. In contrast, $T$ scales as $1/R$.

We can slightly modify our model and assume that original misprints are only introduced when the reference is derived from the original paper, while those who copy references do not introduce new misprints (e.g., they cut-and-paste). In this case one can show that $T = N\times M$ and $D = N\times M\times R$. As a consequence, equation (1) becomes exact (in terms of expectation values, of course).

The preceding analysis assumes that the stationary state had been reached. Is this reasonable? Equation (5) can be rewritten as:

$$\frac{d\left(\frac{T}{N}\right)}{M - \left(\frac{T}{N}\right)\times(R+M-M\times R)} = d\ln N. \tag{9}$$

As long as $M$ is small it is natural to assume that the first citation was correct. Then the initial condition is $N = 1$; $T = 0$. Equation (9) can be solved to get:

$$T = N\times\frac{M}{R+M-M\times R}\times\left(1 - \frac{1}{N^{R+M-M\times R}}\right). \tag{10}$$

This should be solved numerically for $R$. For our guinea pig equation (10) gives $R \approx 0.17$. Just as a cautionary note, equation (10) can be rewritten as:

$$\frac{T}{D} = \frac{1}{x}\times\left(1 - \frac{1}{N^{x}}\right); \qquad x = R+M-M\times R. \tag{11}$$

The definition of the natural logarithm is:

$$\ln a = \lim_{x\to 0}\frac{a^{x}-1}{x}.$$

Comparing this with equation (11) we see that when $R$ is small ($M$ is obviously always small):

$$\frac{T}{D} \approx \ln N. \tag{12}$$
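The three estimates of R quoted in the text (about 0.23, 0.22, and 0.17) can be checked numerically. The sketch below is only an illustration: the function names are hypothetical, M is taken as D/N from equation (7), and plain bisection is an assumed choice of solver for equation (10), since the paper does not say which numerical method was used.

```python
import math


def readers_upper_bound(distinct, total):
    """Equation (1): R is approximately D / T."""
    return distinct / total


def readers_stationary(distinct, total, citations):
    """Equation (8): R = (D / T) * (N - T) / (N - D)."""
    return (distinct / total) * (citations - total) / (citations - distinct)


def total_misprints(read_prob, misprint_prob, citations):
    """Equation (10): expected T for given R, M, and N."""
    x = read_prob + misprint_prob - misprint_prob * read_prob
    return citations * misprint_prob / x * (1.0 - citations ** (-x))


def readers_transient(distinct, total, citations, tol=1e-10):
    """Solve equation (10) for R by bisection, with M = D / N."""
    misprint_prob = distinct / citations
    lo, hi = 0.0, 1.0  # the expected T decreases as R grows
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if total_misprints(mid, misprint_prob, citations) > total:
            lo = mid  # too many misprints predicted: R must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    D, T, N = 45, 196, 4300
    print("equation (1): ", round(readers_upper_bound(D, T), 2))
    print("equation (8): ", round(readers_stationary(D, T, N), 2))
    print("equation (10):", round(readers_transient(D, T, N), 2))
    # Limit behind equation (12): T/D approaches ln N as x goes to 0.
    print("T/D for tiny x:", total_misprints(1e-6, 1e-6, N) / (N * 1e-6))
    print("ln N:          ", math.log(N))
```

The last two printed values illustrate equation (12): for very small R and M the ratio T/D is set by ln N alone, so it grows with the number of citations regardless of how many citers actually read the paper.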
This means that a naive analysis using equations (1) or (8) can lead to an erroneous belief that more cited papers are less read. One can augment our results with a closer scrutiny of the data. In order to make sure that misprints have not been introduced by the ISI, as sometimes happens [13], we explicitly verified a dozen misprinted citations in the original articles. All of them were exactly as in the ISI database. There are also occasional repeat identical misprints in papers that share individuals in their author lists. Such events constitute a minority of repeat misprints. It is not obvious what to do with such cases when the author lists are not identical: Should the set of citations be counted as a single occurrence (under the premise that the common co-author is the only source of the misprint), or as multiple repetitions? However, even if we count all such repetitions as only a single misprint occurrence, the number of citation-copiers (i.e., $T - D$) drops from 151 to 112, bringing the upper bound for $R$ (equation (1)) from 23% up to 29%. However, a more detailed analysis via our model
[14] will bring down the estimate closer to 20%, keeping the original conclusions unaltered.

We conclude that misprints in scientific citations should not be discarded as mere happenstance but, similar to Freudian slips, analyzed.

Acknowledgments

We are grateful to J. M. Kosterlitz, A. V. Melechko, N. Sarshar, H. Muir, and many others for correspondence.

References

[1] Z. K. Silagadze, "Citations and the Zipf-Mandelbrot Law," Complex Systems, 11 (1997) 487-499; http://arxiv.org/abs/physics/9901035.
[2] S. Redner, European Physics Journal B, 4 (1998) 131-134; http://arxiv.org/abs/cond-mat/9804163.
[3] C. Tsallis and M. P. de Albuquerque, European Physics Journal B, 13 (2000) 777-780; http://arxiv.org/abs/cond-mat/9903433.
[4] P. L. Krapivsky and S. Redner, Physical Review E, 63 (2001) Art. No. 066123; http://arxiv.org/abs/cond-mat/0011094.
[5] H. Jeong, Z. Neda, and A.-L. Barabasi, http://arxiv.org/abs/cond-mat/0104131.
[6] A. Vazquez, http://arxiv.org/abs/cond-mat/0105031.
[7] H. M. Gupta, J. R. Campanha, and B. A. Ferrari, http://arxiv.org/abs/cond-mat/0112049.
[8] S. Lehmann, B. Lautrup, and A. D. Jackson, http://arxiv.org/abs/physics/0211010.
[9] S. Freud, Zur Psychopathologie des Alltagslebens (Internationaler psychoanalytischer Verlag, Leipzig, 1920).
[10] Our guinea pig is the Kosterlitz-Thouless paper (J. M. Kosterlitz and D. J. Thouless, Journal of Physics C, 6 (1973) 1181-1203). The misprint distribution for a dozen other studied papers looks very similar.
[11] G. K. Zipf, Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology (Addison-Wesley, Cambridge, MA, 1949).
[12] H. A. Simon, Models of Man (Wiley, New York, 1957).
[13] A. Smith, New Library World, 84 (1983) 198.
[14] M. V. Simkin and V. P. Roychowdhury, to be published.

Creating Large Life Forms with Interactive Life

William H. Paulsen
Department of Mathematics and Statistics, Arkansas State University, State University, AR 72467

This paper demonstrates how very complicated Life forms can easily be created using the interactive Life program introduced by James Gilbert [1]. By having control over just a single cell, called the intelligent cell, a glider gun can be created in under 250 generations. Note that the standard rules of Life apply to the intelligent cell as well as the other cells.

Ever since John Conway introduced the game of Life in 1970 [2], amateur and professional mathematicians have been obsessed with finding new complicated structures, and with finding new applications for the seemingly chaotic behavior of Life forms. Perhaps its addictive nature stems from the simple rules: each cell on a square lattice is either on (alive) or off (dead). The Moore neighborhood of a cell is the eight surrounding cells (counting diagonals) [3]. If a dead cell has exactly three neighboring cells which are alive, then the cell becomes alive in the next generation. On the other hand, a living cell must have either two or three living neighbors to remain alive; otherwise the cell will die in the next generation. Figure 1 shows four generations of the Life form known as the "glider." Because this glider moves one space diagonally every four generations, we say that it moves at 1/4 the speed of light. (In the Life model, light is said to travel at 1 square per generation.) The first open problem that Conway asked was whether any Life form could be proven to have unbounded growth. This was solved by Bill Gosper's discovery of the glider gun, a formation that spews a glider every 30 generations [4]. Other glider guns have since been discovered, but none have been as efficient as the original, now referred to as the "p30 glider gun." Using glider guns, one in fact can engineer even more complex Life forms. Any computer circuit, and even a Turing machine, can be con-
Figure 1. Four generations of a glider.
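The birth and survival rules stated in the introduction, together with the glider of Figure 1, can be sketched in a few lines of Python. This is a minimal illustration of the standard rules only, not the interactive Life program of [1]; the set-of-coordinates representation, the name `life_step`, and the particular glider orientation are assumptions made here.

```python
from collections import Counter


def life_step(alive):
    """Advance one generation; `alive` is a set of (x, y) live cells."""
    # Count, for every cell, how many live cells are in its Moore neighborhood.
    neighbor_counts = Counter(
        (x + dx, y + dy)
        for (x, y) in alive
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # Birth on exactly three neighbors; survival on two or three.
    return {
        cell
        for cell, n in neighbor_counts.items()
        if n == 3 or (n == 2 and cell in alive)
    }


# One phase of the glider (y grows downward in this layout).
glider = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}

if __name__ == "__main__":
    cells = glider
    for _ in range(4):
        cells = life_step(cells)
    # After four generations the same shape reappears one cell diagonally.
    print(sorted(cells))
```

Storing only the live cells keeps the grid unbounded, which is convenient for patterns such as glider guns whose population keeps growing.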
Complex Systems, 14 (2003) 275-283; © 2003 Complex Systems Publications, Inc.