Note regarding 2007 republication: Source and binary code referenced in note 34 are no longer at that URL. The same assets are now hosted at http://www.selfmummy.com/mss2dna
Adam Breindel
Department of Classics, Brown University
May 1998
The Application of a Discrete-Character Parsimony Phylogeny-Inference Algorithm to Classical Text Stemmata
The purpose of this paper is to present two interdisciplinary observations; a new technique for stemmatic analysis; and preliminary results from an application of this technique. The first interdisciplinary observation is that the methods and purpose of stemmatics overlap substantially with the methods and purpose of the biological subdiscipline of cladistic analysis. While this fact is rarely emphasized or exploited, it is not a new discovery, and its history will be discussed. The second interdisciplinary observation is that computer software which has been developed for biologists in order to solve problems in cladistic analysis now offers us the possibility of advances in the construction of textual stemmata, through a non-traditional use of traditional manuscript collations. The analytic technique contained herein – which does not appear ever to have been attempted heretofore – is the application of an existing cladistic analysis software package to the stemmatic analysis of a manuscript collation. The use of this technique to analyze part of the Sallustian corpus is thoroughly documented in this study. Preliminary results indicate that the technique produces a stemma nearly identical to
Breindel 2
that published by L.D. Reynolds, the editor of the Oxford text. Hence, this method appears to offer an effective new approach to evaluating the relationships among extant versions of a text.
The interplay between the disciplines of biological systematics, genetics, and textual criticism, which makes this paper possible, has a somewhat Byzantine history spanning the last thirty years. I ask the reader to consider with charity my exposition of this history. For it seems that the relative uniqueness of this paper demands an unusually large amount of background information. 1
Background
In 1968, John G. Griffith published a paper entitled “A Taxonomic Study of the Manuscript Tradition of Juvenal.” 2 In this study, Griffith applied methods of numerical taxonomy to the classification of Juvenal manuscripts. The taxonomic methods, as he explains in a similar article the following year, 3 he had in turn learned from biologist Robert Sokal’s 1966 Scientific American article on that topic. Griffith describes the biological advances which he exploits in analyzing the texts of Juvenal:
1 I have found it necessary in the course of this paper to refer to some technical aspects of systematics and genetics. I have attempted to restrict to an elementary level the familiarity required with these disciplines, in order to make this work accessible to a broad audience. Nonetheless, readers seeking an introductory exposition may find appropriate sections of the following textbooks useful: Gamblin, Linda and Gail Vines, eds. (1991) The Evolution of Life, Oxford, chapter 3. Maxson, Linda R. and Charles H. Daugherty (1992) Genetics: A Human Perspective, Dubuque, Iowa, chapters 8, 10. Minkoff, Eli C. (1983) Evolutionary Biology, Reading, Mass., chapter 22.
2 Griffith, John G. (1968) “A Taxonomic Study of the Manuscript Tradition of Juvenal,” Museum Helveticum 25:101-38.
3 Griffith, John G. (1969) “Numerical Taxonomy and Some Primary Manuscripts of the Gospels,” Journal of Theological Studies 20:389-406.
Scientists have long been aware of the limitations of the traditional methods of classifying specimens; biologists in particular have laboured under this handicap. Within the last 10-15 years considerable advances have been made, largely because techniques developed for computer use have enabled specialists in this activity, who style themselves numerical taxonomists, to sift with speed and precision large masses of unpromisingly heterogeneous material, and thereby to isolate groups or ‘taxa’ of related specimens, on the basis of which further inquiry may be conducted. ... 4
Thus, Griffith identifies a requirement which textual criticism has in common with taxonomy: in both disciplines, objects must be grouped based on small numbers of distinctions among vast amounts of similarity. The numeric taxonomy methods appear, he says, to offer new quantitative approaches applicable to both problems. He then expresses the hope that we might find associations between specimens by evaluating large amounts of data with machine assistance. In light of the existing resources, though, he remarks that “for a textual critic operating with only a few thousand lines of text it is simply not worth the trouble of programming the data for machine-processing...” 5
The limitations to Griffith’s pioneering approach were unfortunately several. His procedure was, first, extraordinarily laborious: for the fourteen Gospel manuscripts analyzed in his article of 1969, up to fifty-six manual recording acts were required for every variant among one or more of the manuscripts. Thus he was constrained to look at only small samples of the data. Moreover, even if he had had access to more data, he would likely have lacked access to the technology to evaluate it.
4 Griffith, op. cit. 1968, pp. 113-14.
5 ibid.
Griffith’s procedure (and, in all fairness, the biological methods with which he worked) had a more troublesome limitation in that they resulted only in associations of objects. Griffith could assert the distribution of manuscripts into various sub-groups with statistically-argued accuracy, but the mere grouping of the manuscripts does not seem to have accomplished much. His methods said nothing about the genealogical relationships of the manuscripts. For example, if manuscripts A, B, and C are found to be in a single taxon, we have only formalized their external similarity. As useful as such formalization might be, little is indicated about the genealogical relationships likely to inhere between the manuscripts. Thus, Griffith succeeded in bringing numerical taxonomy into the arena of textual criticism, but the biological approach upon which he depended was not ambitious enough to describe the relationships among the specimens and so his textual techniques appear to have fallen into desuetude.
In 1973, Martin West published a short work on textual criticism, Textual Criticism and Editorial Technique.6 In this work, West explains that computers might theoretically hold some promise for stemma construction, because, under the best possible circumstances, building a stemma demands only simple logic. Such a stemma would naturally be an advance over Griffith’s taxonomic manuscript associations. West is, however, skeptical about the idea and holds out some theoretical reservations:
If provided with suitable prepared transcriptions of the manuscripts, purged of coincidental errors, a computer could draw up a clumsy and unselective critical apparatus; and it could in principle – where there was no contamination! – work out an ‘unoriented’ stemma. That means ... that it could work out a scheme simply by comparing the variants, without
6 West, Martin L. (1973) Textual Criticism and Editorial Technique, Stuttgart.
regard to whether they were right or wrong; but this scheme would be capable of suspension from any point [i.e., the scheme could not distinguish the subarchetypes] ... The correct orientation could only be determined by evaluating the quality of the variants, which no machine is capable of doing.7
West’s objections will be considered in detail later, as they are important to the present investigation. But it is worth noting for now that even if West had wanted to test a computerized construction of a stemma, there would have been obstacles to his progress. First, there would not have been readily available technology for his purpose. But more importantly, outside of theoretical computer science or mathematical graph theory, there had not been practical research on automating the construction of stemmata when the data for the specimens is inconsistent or underdetermined. That is, if the variants in a set of manuscripts were completely compatible with a unique stemma, we would need only make the right inferences to generate it. In reality though, there is usually no stemma which is not inconsistent with at least one locus in the manuscripts; conversely, if a degree of latitude is allowed so as to overcome such strict inconsistencies, we find a multitude of possible stemmata. These stemmata we must distinguish on the basis of some criterion capable of evaluating the likelihood that each would give rise to the manuscripts as they exist.
Thus, a variety of difficult problems, theoretical and computational, inhere in the task of mechanically constructing a stemma – and they are not problems which classicists were likely to attack on their own. Fortuitously, however, development had simultaneously been taking place within the biological disciplines of taxonomy and
7 West, op. cit., pp. 71-2.
systematics so as to motivate biologists to attempt these same problems. For derivation of the evolutionary relationships of a group of extant specimens was a key part of the emergent study now called cladistics. Biologist Willi Hennig had begun to develop and advocate a strictly phylogenetic approach to arranging organisms.8 Hennig’s view, that the evolutionary relationships of organisms formed the best foundation for classifying and systematizing them, was and remains the object of debate.9 Parts of his theory, however, seem to have been adopted or adapted by increasing numbers of systematists throughout the 1970s. The cladistic approach seems intuitively obvious, and G.D.C. Griffiths (along with many defenders) insisted that it alone had the advantage of relying on objective fact about the organisms in question (rather than deploying the organisms into classes invented by humans). Griffiths writes, “[Hennig’s method] provides the only theoretically sound basis for achieving an objective equivalence between the taxa assigned to particular categories in a phylogenetic system.” 10
Unfortunately, what seems intuitively obvious can also be deceptively fallacious, and cladistics does have a disingenuous side. It is worth pointing out two objections to the system here, largely so that the reader may see that they do not apply to a textual application of the theory.
8 Hennig might be called the father of modern cladistics; his work was developed and debated in various publications including (1950) Grundzüge einer Theorie der Phylogenetischen Systematik, Berlin. (1966) Phylogenetic Systematics, Urbana, Illinois. (1971) “Zur Situation der biologischen Systematik,” Erlanger Forschungen, R. Siewing ed., Erlangen.
9 For views on the early intellectual positions in the debates, see Ernst Mayr (1976) Evolution and the Diversity of Life: Selected Essays, Cambridge, Mass., pp. 435-41.
10 Griffiths, G.D.C. (1972) “The Phylogenetic Classification of Diptera Cyclorrhapha with Special Reference to the Structure of the Male Postabdomen,” W. Junk, N.V., The Hague.
First, even if we are granted a thorough knowledge of the evolutionary interrelationships of the specimens in question, no method is thereby presented for determining the level of descent at which class divisions should be made. We are only shown that, having made a choice, we are bound to include and exclude certain specimens. Second, given three organisms, A, B, and C, suppose that A and B are similar in form, while C differs greatly from both A and B. Suppose further that A and C are closer evolutionarily to one another than either is to B. In this situation – which is not uncommon in nature – we would be forced under Hennig’s system either to class A, B and C all together, or else to class A and C together against B (Figure 1).
Figure 1
Neither of these options appeals to our intuition the way that the system at first did. For A and B appear to form a group as against C, and yet this is precisely the classification which we are prohibited from making.
These two objections, while having much practical import for the classifying of organisms, will clearly be irrelevant when we come to apply this method to manuscripts. First, we needn’t classify manuscripts by name (and if we do, we accept that classification as our own production); second, we have no sympathy for similarity
of appearance between manuscripts if we have hard evidence that they are unrelated in origin (since it is the origin that is the object of the textual quest).
These cladistic methods of analysis and classification, even if controversial, prompted research into the creation and evaluation of stemmata (or cladograms) from incomplete and incompatible data. The cladistic approach depends for a starting point on determining the evolutionary relationships of the specimens – and these relationships must be assembled from lists of variations among the specimens. Hence, in a sense, biologists set to work on the problems which had stood in front of Martin West. But debate about the philosophical underpinnings of the cladistic methodology did not subside.
In 1977, the methodology attracted a defender in University of Michigan classicist and zoologist H. Don Cameron, due to cladistics’ evident similarity to established techniques in traditional (i.e., not mechanical) textual criticism. 11 Cameron along with Norman I. Platnick describe the debate, and situate themselves in it, thus:
Recent years have seen an increasing awareness and use among zoological systematists of the theory and methods of phylogenetic analysis (cladistics) developed by Hennig. These methods have been well defended by [E.O.] Wiley from the point of view of Popperian “hypothetico-deductive” science. Critics, both of the methods themselves and of their application to classification, have not been silent... The purpose of this paper is to point out a fact overlooked during the controversy, namely, that methods analogous to those of Hennig are accepted as the standard tools of analysis in two other fields that resemble phylogenetic systematics in being primarily concerned with constructing and testing hypotheses about the interrelationships of taxa connected by ancestor-descendant sequences.
11 Platnick, Norman I. and H. Don Cameron (1977) “Cladistic Methods in Textual, Linguistic, and Phylogenetic Analysis,” Systematic Zoology 26:380-85.
The fields referred to are ... textual criticism ... and ... linguistic reconstruction.12
Cameron and Platnick, writing for an audience of biologists, next summarize the techniques of textual criticism put forth by Paul Maas. 13 Differences of technique between biological and textual stemmatics – which Cameron and Platnick view as subordinate to an overarching similarity – are described in moderate detail. 14 The paper is intended to provide a critique of a situation within a discipline of biology, but it serves also to indicate that these scholars can recognize and make precise the correspondence between stemma construction and cladistic analysis.
In a conference concluded in 1983, Cameron again presented his view of textual criticism. The conference had been organized to investigate the biological and cladistic metaphor in other intellectual fields. 15 Cameron treated stemmatics, but he did not discuss stemmata as a metaphor from biology, since, as he points out, the stemmatic methods as used in both fields “were developed by classical scholars systematically in the nineteenth century and ... the origins of the method can be found as early as the sixteenth century...” 16 Beyond merely recounting the techniques of Maas, Cameron explores the distinction – as far as it impacts his cladistics-stemmatics comparison – between “vertical” or uncontaminated traditions and “horizontal” transmissions, those
12 Platnick, op. cit., p. 380.
13 Maas, P. (1958) Textual Criticism, Oxford; Platnick, op. cit., pp. 381-3.
14 Platnick, op. cit., p. 384.
15 Biological Metaphor Outside Biology (1982) and Interdisciplinary Round-Table on Cladistics and Other Graph Theoretical Representations (1983) symposia at the University of Pennsylvania. Proceedings in Hoenigswald, Henry M. and Linda F. Wiener, eds. (1987) Biological Metaphor and Cladistic Classification, Philadelphia.
16 Cameron, H.D. (1987) “The Upside-Down Cladogram: Problems in Manuscript Affiliation,” in Hoenigswald, op. cit.
“full of Byzantine, and even ancient, editing and conjecture.” 17 In the latter cases, “cladistic methods give little aid.” But in the former, he concludes:
[V]ertical transmission and uncontaminated text tradition make the mechanical application of cladistic methods to reconstruct a single archetype a workable and successful method, with a claim to being scientific...18
Thus, Cameron argues that, at least in a vertical textual tradition, we ought to be able to use methods from cladistics to derive a stemma and even an archetype.
At this point, the next move for a textual critic might have appeared obvious: mate West’s insight about mechanical production of stemmata with Cameron’s insight that cladistics provides the theoretical and algorithmic underpinning for West’s operation. That is, use cladistic techniques to attack thorny problems of textual transmission. It is unclear why this approach was not exploited in the 1980s. We might, however, hypothesize a paucity of tools to support such research.
In the 1980s, three further developments came about which made the project presented herein more practicable. 19 One breakthrough was improved DNA sequencing:20 it became possible to put genetic material from various species into an automated process and receive, as output, essentially a collation showing every genetic difference between the samples. 21 More abundant data was now available with which cladistic analysis could work.
17 Cameron, op. cit., p. 238.
18 ibid.
19 It is important to note that none of these three developments sprang fully formed from the head of Zeus in the 1980s. It is convenient to describe them here, as their confluence seems to change the research environment at the time, but research on DNA sequencing, parsimony algorithms, and of course computers had a long prior history.
20 In particular the development of polymerase chain reaction (PCR) duplication of DNA segments.
21 That is, in the sequenced strands of DNA.
The second development of this time period was the availability of computers sophisticated enough to compare and evaluate the thousands or tens of thousands of possible cladograms (stemmata) which might result from comparing large numbers of species. That is, computers allowed biologists to overcome that challenge which Maas had identified for textual critics, when he observed that a large number of specimens or witnesses would produce an astronomical number of possible stemmata. 22
The last pre-requisite development was software systems to put large quantities of data (whether from DNA or elsewhere) together with the computers. Software to compute likely stemmata involves, at its core, algorithms which have been topics in computer science and mathematics for a half-century or more. Hence, strictly speaking, appropriate software had probably been “in development” in research universities and corporate labs for some time. But the early 1980s saw the release of packages designed specifically for cladistics, tailored to the needs of practicing biologists, and ready to run on existing microcomputers.
The present experimental study, described below, is an attempt to establish a stemma for the textual tradition of Sallust’s De Coniuratione Catilinae using one such software package, the freely-distributable Phylogeny Inference Package (or, as henceforth, PHYLIP).23
22 Maas, op. cit., p. 47: “If we have four witnesses, the number of possible types of stemma amounts to 250, if we have five, to approximately 4,000, and so on in quasi-geometrical progression.”
23 Felsenstein, J. (1993) PHYLIP (Phylogeny Inference Package) version 3.5c, distributed by the author, Dept. of Genetics, Univ. of Washington, Seattle. See http://evolution.genetics.washington.edu
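The growth in the number of candidate trees can be made concrete with a short calculation. The sketch below does not reproduce Maas’s own count of “types of stemma” (which allows witnesses to stand at interior positions, and so yields different figures); it uses instead the standard count of strictly bifurcating trees with labeled tips, (2n-5)!! unrooted and (2n-3)!! rooted for n specimens. The function names are illustrative only.

```python
def double_factorial(k: int) -> int:
    """Return k * (k - 2) * (k - 4) * ... down to 1 or 2 (1 when k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def count_unrooted_binary_trees(n_tips: int) -> int:
    """Number of distinct unrooted bifurcating trees on n labeled tips."""
    if n_tips < 3:
        return 1
    return double_factorial(2 * n_tips - 5)


def count_rooted_binary_trees(n_tips: int) -> int:
    """Number of distinct rooted bifurcating trees on n labeled tips."""
    if n_tips < 2:
        return 1
    return double_factorial(2 * n_tips - 3)


if __name__ == "__main__":
    for n in (4, 5, 9, 11):
        print(n, count_unrooted_binary_trees(n), count_rooted_binary_trees(n))
```

Under these assumptions, nine manuscripts already admit 135,135 unrooted and 2,027,025 rooted bifurcating arrangements, which suggests why exhaustive comparison by hand was not practical.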
Before proceeding to describe the method and outcome of the experiment, it is appropriate to consider two technical objections which textual critics have put forward concerning stemma construction. The first objection is one of M.L. West, printed above. West correctly pointed out that any stemma derived by algorithm would be an unoriented stemma (or, as the cladists say, an ‘unrooted cladogram’).24 That is, the algorithm could determine the branchings of the stemma but could not ascertain which branching belongs “at the top” (in practice, this amounts to identifying the nodes representing the subarchetypes). An unrooted cladogram (Figure 2) can represent several distinct rooted versions (Figure 3). Each rooted cladogram can, in turn, represent several distinct possible phylogenies (Figure 4).25
Figure 2. Unrooted cladogram. This cladogram shows the relationships of the specimens relative to one another, but does not indicate their relationship to ancestors from which they descend.
Figure 3. Rooted cladograms. Each of these five rooted cladograms is consistent with the unrooted cladogram above (Figure 2). By postulating the first branching in the descent, the known relationships specify the remainder of the tree. Note, however, that the lengths of branches, and the specimens which might lie on the nodes of the tree, are not indicated.
24 West, op. cit., pp. 71-2.
25 Humphries, C.J. and P.H. Williams (1994) “Cladograms and Trees in Biodiversity,” Models in Phylogeny Reconstruction, Robert W. Scotland, Darrell J. Siebert, and David M. Williams, eds., Oxford, pp. 336-7.
Figure 4. Phylogenetic Trees. All four of these phylogenetic trees are compatible with a single cladogram above (Figure 3.ii). Note that schemata involving direct descent are included.
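The relation between one unrooted cladogram and its several rooted versions can be sketched in a few lines. The example below assumes a four-specimen tree (which would be consistent with the five rootings mentioned in the caption to Figure 3, since an unrooted bifurcating tree on n tips has 2n-3 branches); the specimen labels A-D and the interior node names x and y are hypothetical and are not taken from the figures. A root is placed on each branch in turn, and the resulting rooted tree is written in a parenthesized notation.

```python
from collections import defaultdict

# Hypothetical unrooted tree: A and B join at interior node "x",
# C and D join at interior node "y", and a branch links x to y.
EDGES = [("A", "x"), ("B", "x"), ("x", "y"), ("C", "y"), ("D", "y")]


def build_adjacency(edges):
    """Map each node to the set of its neighbours."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency


def subtree(adjacency, node, parent):
    """Parenthesized string for the part of the tree on the far side of `node`."""
    children = sorted(
        subtree(adjacency, neighbour, node)
        for neighbour in adjacency[node]
        if neighbour != parent
    )
    if not children:  # a tip: just its label
        return node
    return "(" + ",".join(children) + ")"


def rootings(edges):
    """Return one rooted tree per branch, obtained by rooting on that branch."""
    adjacency = build_adjacency(edges)
    rooted = []
    for u, v in edges:
        halves = sorted([subtree(adjacency, u, v), subtree(adjacency, v, u)])
        rooted.append("(" + ",".join(halves) + ")")
    return rooted


if __name__ == "__main__":
    for tree in rootings(EDGES):
        print(tree)
```

Each of the rooted trees preserves the same neighbour relationships; only the position of the first branching differs, which is the sense in which the algorithm’s output is “capable of suspension from any point.”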
West’s objection is legitimate. It should not, though, prevent us from pursuing automated stemma construction, for several reasons. First, the unrooted cladogram is, if accurate, a great advance over no stemma and an even greater advance over an incorrect stemma. Second, it may in many cases be tolerably easy to properly root the cladogram, thus producing a traditional stemma, based on our knowledge of the dates and locales of origin for the various manuscripts. Third, computer methods are particularly useful in the frequent circumstance that the collation is not uniquely compatible with any single proposed stemma. In such cases, we shall be happy to have an analysis of the entire collation, a most-likely stemma, and a mathematical justification for excluding many other stemmata.
The second objection is one advanced by Roger David Dawe in studies of the traditions of Aeschylus and Sophocles. 26 Dawe’s contention is that there is so much horizontal transmission in the traditions for these authors, as indicated by numerous true readings appearing in dependent manuscripts though absent in other manuscripts, as to invalidate the stemmatic approach. 27 Dawe confronts the methodology of Pasquali
26 Dawe, R.D. (1964) The Collation and Investigation of Manuscripts of Aeschylus, Cambridge and (1973) Studies on the Text of Sophocles, 2 vols., Leiden.
27 Cameron, op. cit., p. 237.
– and consequently confronts my method, which derives partly through Pasquali, Maas, and West – at least in the case of individual authors such as Aeschylus. He writes:
We believe that the fact of unique preservation has been demonstrated [in the Aeschylean case]; consequently the fault must lie with the theory of descent, and we conclude that the ... stemma does not after all represent, even in the simplest form, the true character of the tradition. ... It seems clear that the picture presented by the manuscripts is one of a recension so entangled that it is utterly impossible for us to unravel the threads.28
Cameron summarizes the problems which Dawe’s assertion poses to any method such as the one employed in the present study:
Dawe denies radically that archetypes can be reconstructed, but he necessarily pays a theoretical price for his conclusion... If there are no archetypes or stemmata, and if true readings are uniquely preserved in any manuscript regardless of its stemmatic position, we are then thrown back to a procedure of evaluating readings which is unaided by considerations of outgroup comparison, reconstruction of an archetype, or to push the concept to its logical conclusion, without the consideration of manuscript authority of any kind. 29
In order that we may avoid an imbroglio in Aeschylean Textkritik, we might concede Dawe’s assertion to hold true in certain specific textual traditions. But we need not suppose that any particular number of such traditions invalidates the deductive stemmatic method in general. Hence, in the absence of any argument against stemmatic representation of the Sallustian tradition, we can proceed to analyze it via the cladistic approach.
Experimental Procedure
28 Dawe, op. cit. 1964, pp. 157-8.
29 Cameron, op. cit., pp. 237-8.
In this study, the manuscripts containing the De Coniuratione Catilinae and the De Bello Iugurthino were examined, as these two works are found together in one set of manuscripts. Absent access to a complete collation, an adapted collation was formed by the following method. Eleven manuscripts were selected from those included in L.D. Reynolds’ Oxford text of 1991 (Table 1).
Siglum   Manuscript
A        Parisinus 16025
B        Basileensis
C        Parisinus 6085
D        Parisinus 10195
F        Hauniensis Fabricianus
H        Berolinensis Phillippsianus 1902
K        Vaticanus Palatinus 887
N        Vaticanus Palatinus 889
P        Parisinus 16024
Q        Parisinus 5748
V        Vaticanus 3864 (Florilegium Vaticanum)
Table 1
Beginning at Catilina 1.1, the first 300 loci were selected which contain variants in one or more of the above eleven manuscripts. 30 The adapted collation was then formed by listing, for each locus, the groups of manuscripts which exhibited the same reading. The collation then consisted of a sequence of rows such as appear in Table 2.
Locus            Group 1    Group 2    Group 3    Group 4
[rows 1-11]
12               ABCDFNP    HK         V
13               C          N          A          BDFHKPV
[rows 14-300]
Table 2
30 To be more precise, in keeping with the biological metaphor, only the latest markings in the manuscripts were collated. Thus, as corrected markings were ignored, loci containing variants in earlier hands are not included in the 300. The selected loci do, however, include every variant in the last hand (at each locus) of the appropriate manuscript from Catilina 1.1 to 52.35.
To analyze the collation, the DNAPARS component of the PHYLIP package was employed, because it is the only component of PHYLIP which can process multi-state discrete characters (albeit by marking the states with DNA labels). 31 DNAPARS is a program which compares DNA base sequences for a set of specimens and evaluates various possible cladograms on the basis of a parsimony criterion. A parsimony criterion favors arrangements of the specimens which require the fewest character state changes in the course of the specimens’ evolution. For example, a phylogeny which requires a specimen possessing a DNA sequence of AAA to give rise to one possessing ACT and, thereafter, requires the specimen possessing ACT to give rise to one possessing the sequence AAA again would not be favored. This proposed phylogeny requires two bases to change state (AA to CT) and later to change again (back to AA), involving four base changes overall. Instead, a parsimony criterion might favor an arrangement where one specimen featuring the AAA sequence gives rise to the other with the AAA sequence, and the latter gives rise to that possessing the ACT sequence.32 This latter phylogeny requires only a single change of two bases, or two character state changes overall, and is thus more parsimonious than the former. Further assumptions involved in the parsimony method, and differing views about them, are listed (or references provided) by Felsenstein. 33
In order to evaluate the collation using DNAPARS, the collation data had to be converted from the form illustrated in Table 2 to a form wherein manuscripts grouped
31 See “Frequently Asked Questions,” Felsenstein, op. cit.
32 This phylogeny “might” be favored because one can observe other possible phylogenies with only two character state changes. Such phylogenies would be equally parsimonious with the one given, and hence would be judged equally desirable by a parsimony criterion.
33 “DNAPARS – DNA Parsimony Program” (documentation) in Felsenstein, op. cit.
by a shared reading were each assigned a particular DNA base abbreviation (A, C, G, T, or “-”, which indicates a fifth state to DNAPARS). The DNA base label assigned to a manuscript at a particular locus would correspond to the group in which that manuscript resided at that locus. Each row of the collation would yield one DNA base label for each manuscript; thus the 300 loci in the collation would produce a 300-base “DNA strand” for each of the eleven manuscripts.
The creation and data entry of these 3,300 base labels was beyond what could easily be accomplished manually. To perform the task, a custom application program was written (MSS2DNA) which allows the entry of the collation in table form, performs the translation to sequences of DNA base labels for the various manuscripts, and mounts the results on the Microsoft Windows clipboard (Figure 5). 34 From the clipboard, the DNA data for the various manuscripts was assembled with a text editor into the file format required by DNAPARS, as documented by Felsenstein.35
In order to facilitate comparison to Reynolds’ stemmatic work on the Sallust manuscripts, and because they represent only parts of the text, data for manuscripts V (a florilegium) and Q were removed from the data file, leaving the nine manuscripts for which Reynolds had published a stemma. In removing V and Q, some 27 (i.e., 9%) of the loci were rendered irrelevant, although they remain in the set. 36
34 This program, while not elegant, is publicly available (with source code) so that others may independently conduct investigations or repeat and verify the present investigation. The program, MSS2DNA, runs on 32-bit Microsoft Windows platforms (Windows 95, Windows 98, Windows NT) and may be downloaded in archived (ZIP) form at http://homer.bus.miami.edu/~adbreind/mss2dna.zip
35 “Molecular Sequence Programs” in Felsenstein, op. cit.
36 These data points represent loci at which only Q and/or V differed from the consensus of remaining manuscripts. These sites can be identified from Appendix B, in the table marked “steps in each site,” as sites where the table shows 0 steps. That is, the remaining manuscripts show consensus at the site, so no character state changes are required for any phylogenetic arrangement of the manuscripts.
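The translation from collation rows to base labels can be illustrated with a brief sketch. This is not the MSS2DNA source; it is a minimal reconstruction of the conversion described above, using the two rows shown in Table 2. Several details are assumptions made for illustration: the order in which labels are attached to groups (A, C, G, T, then “-”), the use of “?” for a manuscript that appears in no group at a locus (as Q does in both sample rows), and the function names. The output layout follows the sequential PHYLIP convention of a header line with the number of specimens and sites, followed by one line per specimen with a ten-character name.

```python
# Assumed label order: group 1 -> A, group 2 -> C, group 3 -> G,
# group 4 -> T, group 5 -> "-" (the fifth state mentioned in the text).
LABELS = "ACGT-"


def collation_to_sequences(collation, sigla, missing="?"):
    """Turn rows of reading-groups into one label string per manuscript.

    `collation` is a list of loci; each locus is a list of strings, and each
    string holds the sigla of the manuscripts sharing one reading.
    """
    sequences = {siglum: [] for siglum in sigla}
    for groups in collation:
        if len(groups) > len(LABELS):
            raise ValueError("more reading-groups than available labels")
        label_of = {}
        for label, group in zip(LABELS, groups):
            for siglum in group:
                label_of[siglum] = label
        for siglum in sigla:
            # A manuscript absent from every group gets the `missing` mark
            # (an assumption; the original program may have done otherwise).
            sequences[siglum].append(label_of.get(siglum, missing))
    return {siglum: "".join(labels) for siglum, labels in sequences.items()}


def to_phylip(sequences):
    """Format sequences as a sequential PHYLIP-style input file."""
    n_sites = len(next(iter(sequences.values())))
    lines = [f"{len(sequences):5d}{n_sites:5d}"]
    for name, sequence in sequences.items():
        lines.append(f"{name:<10.10}{sequence}")  # pad or cut name to 10 chars
    return "\n".join(lines)


if __name__ == "__main__":
    # Loci 12 and 13 as given in Table 2.
    table_2 = [["ABCDFNP", "HK", "V"], ["C", "N", "A", "BDFHKPV"]]
    print(to_phylip(collation_to_sequences(table_2, "ABCDFHKNPQV")))
```

With 300 loci in place of these two rows, each manuscript would receive a 300-character string, matching the “DNA strand” described in the text.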
The completed DNAPARS file appears in this report as “Appendix A: Infile.” The DNAPARS program was then run, using this file as its data source. 37
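The assembly of the data file, which was done by hand in a text editor, can also be sketched. The layout assumed here is the sequential (non-interleaved) one visible in Appendix A: a first line giving the number of manuscripts and the number of loci, then each ten-character name followed immediately by its sequence. The function name `format_infile` and the line width are assumptions for the illustration, and only the first ten loci of two manuscripts are used.

```python
def format_infile(strands, width=60):
    """Return the text of a sequential PHYLIP-style infile for the given strands."""
    lengths = {len(seq) for seq in strands.values()}
    if len(lengths) != 1:
        raise ValueError("all strands must cover the same number of loci")
    n_sites = lengths.pop()
    if n_sites == 0:
        raise ValueError("strands are empty")
    lines = [f"{len(strands)} {n_sites}"]
    for name, seq in strands.items():
        if len(name) > 10:
            raise ValueError(f"name longer than ten characters: {name!r}")
        # Split the strand into lines; the name shares the first line.
        chunks = [seq[i:i + width] for i in range(0, n_sites, width)]
        lines.append(name.ljust(10) + chunks[0])
        lines.extend(chunks[1:])
    return "\n".join(lines) + "\n"


# First ten loci of two manuscripts, taken from Appendix A.
text = format_infile(
    {"P.Par16024": "AAACCCCCCC", "A.Par16025": "CCACACCCCA"},
    width=6,
)
print(text)
```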
Figure 5. MSS2DNA. The columns collect the manuscripts which share a reading at each locus. The column headings indicate the DNA base labels which will be attached to the manuscript groups.
DNAPARS produced the output file which appears in this report as “Appendix B: Outfile,” and which includes the preliminary phylogenetic tree (Figure 6). DNAPARS was then run on the input data several more times in order that other possible most parsimonious trees might be discovered. No other most parsimonious trees were found.
37 The 386-Windows precompiled PHYLIP executables were used throughout. The program options selected for DNAPARS were all defaults with the following exceptions: Randomize order was selected, with a seed of 69 (=4*17+1) and 100 permutations of the input rows; terminal type was set to (none); input sequences interleaved was set to No; and all printing options for the output were selected.
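What the “Randomize order” option of note 37 amounts to can be sketched as follows. This is only an illustration of presenting the same rows in different orders: it uses Python's `random` module rather than PHYLIP's own generator, so a seed of 69 here will not reproduce the orders DNAPARS actually tried, and the function name `jumbled_orders` is hypothetical.

```python
import math
import random

# One-letter sigla standing for the nine manuscripts of Appendix A.
manuscripts = ["P", "A", "B", "C", "N", "K", "H", "D", "F"]


def jumbled_orders(names, seed, count):
    """Return `count` random input orders of the manuscripts."""
    rng = random.Random(seed)
    orders = []
    for _ in range(count):
        order = list(names)
        rng.shuffle(order)          # in-place random permutation
        orders.append(tuple(order))
    return orders


orders = jumbled_orders(manuscripts, seed=69, count=100)
total_orders = math.factorial(len(manuscripts))
print(len(orders), "orders sampled out of", total_orders)
# prints: 100 orders sampled out of 362880
```

Each run therefore samples only a small fraction of the possible input orders, which is the concern taken up in the Analysis below.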
One most parsimonious tree found:

                       +--F.Hauniens
                    +--8
                 +--7  +--D.Par10195
                 !  !
              +--6  +-----H.Beroline
              !  !
     +--------5  +--------K.VatP_887
     !        !
     !        +-----------N.VatP_889
  +--4
  !  !                 +--C.Par_6085
  !  !              +--3
--1  +--------------2  +--B.Basileen
  !                 !
  !                 +-----A.Par16025
  !
  +-----------------------P.Par16024

remember: this is an unrooted tree!
Figure 6
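The step counts which DNAPARS reports for this tree can be checked by hand or with a few lines of code. The sketch below uses Fitch's counting procedure for unordered characters as a stand-in; it is not a transcription of the DNAPARS source. The tree is that of Figure 6, rooted at node 1 purely for the purpose of counting, the one-letter sigla abbreviate the names in Appendix A, and the data are the first ten loci of each manuscript from that appendix.

```python
# Tree of Figure 6 as nested pairs; the node names follow the figure.
node8 = ("F", "D")
node7 = (node8, "H")
node6 = (node7, "K")
node5 = (node6, "N")
node3 = ("C", "B")
node2 = (node3, "A")
node4 = (node5, node2)
node1 = (node4, "P")

# First ten loci of each manuscript, from Appendix A.
FIRST_TEN = {
    "P": "AAACCCCCCC", "A": "CCACACCCCA", "B": "AACCCCCCCA",
    "C": "AAACCCCCCA", "N": "CAACCCAACC", "K": "ACCCCCCACC",
    "H": "AACCCACCAC", "D": "CACACACCAC", "F": "CACACACCCA",
}


def fitch(tree, states):
    """Return (possible states at this node, steps required below it) for one locus."""
    if isinstance(tree, str):                 # a manuscript: its state is known
        return {states[tree]}, 0
    left_set, left_steps = fitch(tree[0], states)
    right_set, right_steps = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:                                # children can agree: no change needed
        return shared, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1


def steps_per_site(tree, strands):
    """Count the changes required at every locus on the given tree."""
    n_sites = len(next(iter(strands.values())))
    return [
        fitch(tree, {m: seq[i] for m, seq in strands.items()})[1]
        for i in range(n_sites)
    ]


counts = steps_per_site(node1, FIRST_TEN)
print(counts)  # [3, 2, 2, 1, 1, 1, 1, 2, 2, 2]
```

These ten values agree with the first ten entries of the table marked “steps in each site” in Appendix B.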
In order that the output from this program might be compared to Reynolds’ published stemma for Sallust, and in recognition of Reynolds’ judgments about the quality of the textual variants, the tree was re-oriented using PHYLIP’s RETREE program. Since manuscripts F, D, H, K, and N formed a monophyletic group and because they had been collected in Reynolds’ presentation of the Sallust stemma, the node representing their common ancestor was selected for the outgroup (or subarchetype). Note that although the tree was re-oriented, no changes were made to the genealogical relationships inferred between the manuscripts by DNAPARS. 38 The transcript of the RETREE session appears in this report as “Appendix C: RETREE Session.” 39 The session also produced as output a new tree file. This tree file was used as input to PHYLIP’s DRAWGRAM program, which constructed a graphical representation of the stemma (Figure 7).
38 Re-orientation in effect asserts likely positions for the subarchetypes. As described above, West had indicated that such a step would be required, and that it should be conducted using a critic’s evaluation of the variants.
Figure 7
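That re-orientation alters only the point from which the tree is drawn, and not the inferred relationships, can be seen in a small sketch. The branches below are those of Figure 6, with node 1 suppressed so that P attaches directly to node 4 (my simplification, since node 1 serves only as the drawing root). The function names are hypothetical, and the parenthesized output is ordinary nested-group notation rather than the exact contents of the RETREE tree file.

```python
from collections import defaultdict

# Branches of the unrooted tree in Figure 6 (node 1 suppressed: P joins node 4).
EDGES = [
    ("4", "5"), ("5", "6"), ("6", "7"), ("7", "8"), ("8", "F"), ("8", "D"),
    ("7", "H"), ("6", "K"), ("5", "N"), ("4", "2"), ("2", "3"), ("3", "C"),
    ("3", "B"), ("2", "A"), ("4", "P"),
]


def neighbours(edges):
    """Build an undirected adjacency list from the branch list."""
    adj = defaultdict(list)
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    return adj


def subtree(adj, node, parent):
    """Nested-parenthesis text for everything reached from `node` away from `parent`."""
    children = [n for n in adj[node] if n != parent]
    if not children:                      # a manuscript at the tip of a branch
        return node
    return "(" + ",".join(subtree(adj, c, node) for c in children) + ")"


def root_on_branch(edges, a, b):
    """Place the root on the branch joining nodes a and b."""
    adj = neighbours(edges)
    return "(" + subtree(adj, a, b) + "," + subtree(adj, b, a) + ");"


as_printed = root_on_branch(EDGES, "4", "P")    # orientation of Figure 6
re_oriented = root_on_branch(EDGES, "5", "4")   # F, D, H, K, N set off as one group
print(as_printed)
print(re_oriented)
```

Both calls read the same list of branches; only the branch on which the root is placed differs.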
For the sake of comparison, Reynolds’ stemma is reproduced (Figure 8). 40
Figure 8
As can be observed from the computer-generated tree and Reynolds’ tree (Figures 7 and 8), they are nearly identical modulo inversion. There are, however, two differences. First, Reynolds associates N and K more closely with each other than with H, D, or F, while DNAPARS detected no such difference in proximity. Second, Reynolds associates A more closely with P than with B or C, while DNAPARS indicated no such closer affiliation. This latter distinction can in fact be attributed to differences in the text being collated, rather than to differences between the analyses of Reynolds and DNAPARS (see below).
39 The program options selected for RETREE were all defaults with the following exception: “no graphics” was selected.
40 Reynolds, L.D., ed. (1991) C. Sallusti Crispi: Catilina, Iugurtha, Historiarum Fragmenta Selecta, Appendix Sallustiana, Oxford, p. xi.
Analysis
Since several hundred rearrangements of the order of the “DNA strands” produced no further most parsimonious trees, it seems reasonable to suppose that the manuscript collation data specify a unique most parsimonious tree. 41 The existence of a unique most parsimonious tree is itself an indication that the present method may be productive, as it obviates the need for a human to insert prejudices into the analysis by selecting one cladogram from a list of many. The similarity of the results derived through Reynolds’ analysis to those derived through the parsimony analysis can, in light of the novelty of the approach, only be called stunning. This similarity is further strengthened when we account for one of the two indicated differences between the stemmata. As described above (see n. 30), in keeping with the metaphor of biological evolution, only the latest extant markings (corrections, not including deletions) on each manuscript were collated. Thus, where the first and second hands of A differed, the second hand was read for the collation instead of the first. Reynolds naturally constructs his stemma indicating the position of the original A text. But he notes that “Secunda manus (A2) librum lectionibus instruxit ex aliquo stirpis [= B, C] codice petitis.” That is, where readings exist in A2, they come from the B-C branch – which fact DNAPARS appears to have recognized, in asserting the A-A2 manuscript to descend both from an ancestor of P and also from a closer ancestor of B and C. To test this hypothesis, we would merely need to modify the collation to reflect only A-A1 readings, and then see where DNAPARS places the manuscript.
41 This supposition is based on Felsenstein’s implicit assumption that a relatively small number of rearrangements of the input data ought to yield multiple most parsimonious trees if they exist. Such an assertion seems mathematically suspect, considering the large number of possible permutations of, say, nine manuscripts (over 360,000). On this matter, however, I defer to Felsenstein’s knowledge as a specialist.
Having taken the discrepancies into account, it seems that both the human and the machine-assisted analysis derive results from the same underlying pattern among the manuscript readings. This study, then, preliminarily suggests that the parsimony analysis technique could substantively advance knowledge of textual transmission. Furthermore, the parsimony analysis can indicate the readings likely to appear in the archetype and subarchetypes, in order that they most efficiently give rise to the extant manuscripts. A detailed examination of such archetype reconstruction is beyond the scope of this study. But ambitious readers should note that Appendix B to this paper (i.e., the DNAPARS output) provides the readings likely to appear at various nodes in the cladogram for every locus studied. On Reynolds’ view of the transmission, the archetype (his ) ought to bear the readings given for node 4.
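For readers who wish to extract node readings from Appendix B, the notation of that table can be unpacked mechanically. In the sketch below the dots are filled in from the node below, as the table's own legend directs, and letters other than A, C, G, and T are expanded using the standard IUPAC nucleotide ambiguity codes; that DNAPARS intends exactly these codes is my assumption. The characters “-” and “?” are passed through unchanged, and the function names are hypothetical.

```python
# Standard IUPAC nucleotide ambiguity codes (assumed to be what the outfile uses).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT",
}


def resolve(state_below, row):
    """Fill in the dots of an outfile row from the state of the node below it."""
    return "".join(b if c == "." else c for b, c in zip(state_below, row))


def readings(state):
    """List the candidate base labels at each locus of a node state."""
    return [IUPAC.get(c, c) for c in state]


# Loci 1-10, from the first block of the table in Appendix B.
node1 = "AAACCCCCCC"
node4 = resolve(node1, "..........")    # branch from node 1 to node 4
node5 = resolve(node4, ".......M..")    # branch from node 4 to node 5
print(node4, node5, readings(node5)[7])
# prints: AAACCCCCCC AAACCCCMCC AC
```

Converting a base label back into an actual reading requires the original collation table, since the label records only which group of manuscripts shared the reading at that locus.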
Future Research
The future presents a number of immediate challenges and possibilities for the cladistic analysis of texts using parsimony techniques. The obvious methods through which the procedure may be tested include examining a variety of texts, as well as using full collations – in place of collations built from apparatus critici – so as to avoid dependence on one editor’s opinion of what may be viable manuscript readings. 42 If positive results are indicated, parsimony analysis might be deployed to assist the textual critic in determining the relationships of texts, and in reconstructing archetypes, for new publications. Perspectives may also be presented for re-evaluating existing dogma about traditions which have not been recently examined. 43 In the classroom, the use of graphical interactive parsimony programs, which allow one to manipulate stemmata on-screen and immediately to observe the consistencies or inconsistencies thus fostered, may facilitate integration of stemmatics into the standard classics curriculum. 44 Lastly, literary theorists may wish to ponder the existence of deeper metaphors connecting the enzymes and mutations of DNA replication with the corresponding verbal agents and scribal errors giving rise to many of our textual variants.
42 “Readings which must quite certainly be eliminated have no place under the text,” writes Maas (p. 23), thus giving editors license to omit even from the app. crit. those readings deemed eliminanda.
43 We may suppose that parsimony analysis will be effective in evaluating relationships between manuscripts of texts in modern, as well as ancient, languages.
44 MacClade (distributed by Sinauer Associates) is one such program. Many candidates which might be useful for heavy-duty analysis as well as pedagogy are described by Felsenstein at http://evolution.genetics.washington.edu/phylip/software.html
Appendix A: Infile

9 300
P.Par16024AAACCCCCCCAATACCCCCCAACGCCCACACCCAACCACCACCCCCACGACGGAAACCCCCCGCCCCCCACCACAC
CACACCCCCCACCCAACCCATACCACCAAACACACGCCCCCGCCACACGACGGCACACACCCCACACAACACCTCACCCACAACCCCCCCCACCACCCAACAAACCGCCCAAACCCACACCCCCACCACCCAACCCCCACCCCACACCCCCACAAACC
CCCCACCTACCCCCCCCCACCAGCCCCACCAACCAAACCAACCCCCGAACCACCCCCACCCCCCA
A.Par16025CCACACCCCACAGCCCCCCCACCGCACCCCCCACCCCCCCCCCCCCCCATCGAACCCGCCACGCCCCCCGCAACCC
CCCCCCCCCCCACACTCCCCACCACCCACACCCCCGACCCAAACCAAAAACGGCCCCCCCCCCACACCCCCCCCACCCACCCAAAACACCCACCCCCCCACACCCAACCAACAAAACCACCCCCCCCAACACCCCCCACCACCCCCCCACACCCCCCAAACA
ACCCACCCCCACCCCCCCGCCCCACCCACCACCCCACACCCCGACCCCACCCCGACACCGC
B.BasileenAACCCCCCCAAATCACCCCCAGCGCCCACACAACCCCCACACCCCCGCCCCGGAACCCCCCCCCACCACCCCCCCC
CCCCCCCCCCCACCATACCCAACGAACAAACCCCCGAACCACACACAACCCGGACACCGCCCCACACCCCCCACCACCCACCCCCA
ACCCCCACACCCCACCAGACAACCACCCCCCCCACCCCCCCCAACCCCCCCCCCCCAACCCCCCCCCCCCCCCCGCCCACCCCCCC
CACCACCCCGCACCACCCACCGCCCCACCCCCCGACCCCCCCCCCACACCGA
C.Par_6085AAACCCCCCAAAACACCCCCAGCGCACCCCCAACCCCCCCCCCCCCGCACCGGACCCCCCACACACCACTCAACCC
CACACCCCCCCACACTCCCCTACGCCCACACCCCCGCACCACACAAAACCCGGCCACCCCCCCACACCCCCCCCCACCCACCCCCA
ACCCCCCCCCCCCCCCAGCCACCCACCCCCAACACCCCCCCCACCACCCCCCACCCCAACACACCCCCCCCCCCGCCCCCCCCCCC
CACCCCCCCGCCCCACCCACCACCCCACCCCCCGACCCCACCCCGACACCGC
N.VatP_889CAACCCAACCCACACCACAAACCGCCCCACCCCCCAAAACACCCCCAAGGCAGCCCCACACAAACCCCCACCACCC
CCCCCAACCCCACCCGCCCCACAAAGAACCCACCCGCCCCCGCCCCCAGCAGGCGGCCGACACCACCCCCCAGCGCCCCCCCCCCC
AACCCCCCCCCCCCCCAGACAGCACACACCCAACACCCCCCCAACACCCCCACACCCCACCCCCCCCCCCAACCACAACAACCACC
CCCCCCCCAGCCCCCCCAACCACCCCCCCCCCCCCCCACCCCCCGCAGCCGC
K.VatP_887ACCCCCCACCCCTCCAACCAAGCGCCCCCCCCCCCAAAACACCACAAAGGAGGCCCCCCCACGACCCCCGCCGCCC
CCCCCCCCCACCCCGGACCCCACAAGACCCACCCCACACACGCCACCCGCAGAGGGCCCACACCACCCACCATCCCTCCACCCCCC
ACCACACACCCCCAACAGCCAGCCCCCCCCCAACCCCACCACAACCCACAAACCCCCCCCCCCCCCCCCCCACCACCCAACACACC
CCCACCCAAGCCCCACCACCCCCACCCCCCCCCGCCCACCACCCACCACCGC
H.BerolineAACCCACCACACTCCCCCCACGACCCCCACCCCCCCACACGCCCCAAAGCCGGACCCCACCGGCCCCACCACCCCC
CCCCCCCCCACACCCTACCCGCCTAGCCCAACCCACCCACCGCCCCCAGCCGGACTCCCCAACCCCCCGCCATCCCCCACCCCCAACCACACCCCACCACAAGACAGACCCCCCCCCCACCCCCCAAGCCCAAAAACAGACCAACCCCCACCCCCCCCCCCC
CAC-GCCCCCCCACCCCCACACCCCCACCAGCCCCACCCACCGCCCCCCCCCAACCTCCGC
D.Par10195CACACACCACCATCCACACAATCACCACCCACCCCCAAACGACAAAACGGACCCCCCCCCCAGACAAACACCCACC
CCCCACAACACCCCTCAAACGCCAAGCCCCACACAACCAACGCCCCCCGCACCGGGCCACAAACCCCCCCCAGCCCGCCCCACCCA
CCCCAACAACAACGAAAGCAAGCCCCCCCCCACCCAACCCAACCCAAAAAACCGACCAACCACCACCCCCCCCCCCCCACGCCCAC
CCCACCACCCCACACCCACCAGCAAACACCCAAGCCCCCCCACCCCCACCAC
F.HauniensCACACACCCACATCCAAACCAGCAACACCCACCCCCAACAAAACAAACGGACCCCCATCCCACACAAACACACACA
CCCCACAACCCACCTGAACCGGCAAGCCCCCCACCACAAACGCACCCCGCCAGTGTCCGCACACCCCACCCAGCCCGCCCCACCCA
CACAAACACCAAAGAAAGCAAGCCCCACCCCACGCAACACAACCCCAAAACCCCAACACCCACCACCCCCCCCCCACCACGCCCAC
CACAACACCCCACCCCCACCAGCCAAGACCAAAGCCACCCCACCCCC-ACAC
Appendix B: Outfile

DNA parsimony algorithm, version 3.572c

Name         Sequences
----         ---------

P.Par16024   AAACCCCCCC AATACCCCCC AACGCCCACA CCCAACCACC ACCCCCACGA CGGAAACCCC
A.Par16025   CC..A....A C.GC...... .C...A.C.C ..ACC..C.. C.....C.AT ..A.CC.G..
B.Basileen   ..C......A ...CA..... .G........ .AACC..CA. ......G.CC .....C....
C.Par_6085   .........A ..ACA..... .G...A.C.C .AACC..C.. C.....G.AC ....CC....
N.VatP_889   C.....AA.. C.C...A.AA .C.....CAC ...CCAA.A. .......A.G .A.CCC.A.A
K.VatP_887   .CC....A.. CC.C.AA..A .G.....C.C ...CCAA.A. ...A.A.A.G A..CCC....
H.Beroline   ..C..A..A. .C.C.....A CGAC...CAC ...CC.ACA. G....A.A.C ....CC..A.
D.Par10195   C.CA.A..A. C..C.A.A.A .T.A..AC.C A..CC.A.A. GA.AAA...G ACCCCC....
F.Hauniens   C.CA.A...A C..C.AAA.. .G.AA.AC.C A..CC.A..A .AA.AA...G ACCCCCAT..

P.Par16024   CCGCCCCCCA CCACACCACA CCCCCCACCC AACCCATACC ACCAAACACA CGCCCCCGCC
A.Par16025   A........G .A..C..C.C ......CA.A CT...CAC.A C...C..C.C ..A...AAA.
B.Basileen   ..C.A..A.C ..C.C..C.C ......CA.. .TA..CA..G .A.....C.C ..AA..ACA.
C.Par_6085   A.A.A..A.T .A..C..... ......CA.A CT...C...G C...C..C.C ...A..ACA.
N.VatP_889   .AAA...... ....C..C.C .AA...CA.. CG...CACAA .GA.CC...C ..........
K.VatP_887   A..A.....G ..G.C..C.C .....AC... GGA..CC..A .GACCCAC.C .A.A.A....
H.Beroline   .G.....A.C A.C.C..C.C .....ACA.. CTA..CGC.T .G.CC.AC.C AC..A.....
D.Par10195   .A.A.AAA.. ..CAC..C.C A.AA.AC... TCAAACGC.A .G.CCCACAC AA..AA....
F.Hauniens   .ACA.AAA.. .ACACA.C.C A.AA..CA.. TGAA.CGG.A .G.CCC.CAC .A.AAA...A

P.Par16024   ACACGACGGC ACACACCCCA CACAACAC-C TCACCCACAA CCCCCCCCAC CACCCAACAA
A.Par16025   CA.AA..... C.C.C..... ...CC.C.C. A.-..AC.C. AAA.A..... .C...C...C
B.Basileen   ...ACC...A CAC.G..... ...CC.C.A. CAC..AC.CC .AA....... AC....C..G
C.Par_6085   .A.ACC.... CAC.C..... ...CC.C.C. CAC..AC.CC .AA.....C. .C...CC..G
N.VatP_889   C.CA.CA... GGC.GA.A.C AC.CC.CAG. G.C...C.CC ..AA....C. .C...CC..G
K.VatP_887   ..C..CA.AG GGC.CA.A.C AC.C..CAT. C.T..AC.CC ..A..A.ACA .C.......G
H.Beroline   C.CA.C...A CTC.C.AA.C .C.CG.CAT. C.-..AC.CC .AA..A.AC. .CA...CA.G
D.Par10195   C.C..CACCG GGC...AAAC .C.CC.CAG. C.G...CACC .A....AACA ACAA.G.A.G
F.Hauniens   C.C..C.A.T GTC.G.A.AC .C..C.CAG. C.G...CACC .A.A.AAACA .CAAAG.A.G

P.Par16024   ACCGCCCAAA CCCACACCCC CACCACCCAA CCCCCACCCC ACACCCCCAC AAACCCCCCA
A.Par16025   C.AA..A.C. AAAC...... .C...A.ACC ....AC.A.. C.C..A.AC. CCC.AAA.A.
B.Basileen   ..AA..ACCC ...C...... .C...A..CC .....C..AA C.C.....C. CCC...G..C
C.Par_6085   C.AC..ACCC ..A....... .C.....ACC ....AC...A ....A...C. CCC...G..C
N.VatP_889   ..A..A..C. ....ACA... .C...A.ACC ...A...... ..C.....C. CC.A..A.A.
K.VatP_887   C.A....CCC ....AC...A .CA..A..C. .AAA.C.... C.C.....C. CCCA..A..C
H.Beroline   ..A.A..CCC ...C...... .CAAG..... AAA.AGA..A ..C...A.C. CCC......C
D.Par10195   CAA....CCC .....C.AA. .CAAC..A.. AAA..GA..A ..CA..A.C. CCC......C
F.Hauniens   CAA....C.C .....G.AA. ACAAC..... AA...CAA.A C.CA..A.C. CCC....A.C

P.Par16024   CCTACCCCCC CCCACCAGCC CCACCAACCA AACCAACCCC CGAACCACCC CCACCCCCCA
A.Par16025   ..C......A ...C..C... .....C.... CC...CA... ...C..CA.. ..GA.A..GC
B.Basileen   A.CC.....A ..AC..C..A .....C...G CC...C.... ...C..C... ..CA.A..G.
C.Par_6085   ..CC.....A ...C..C... .....C.... CC...C.... ...C..CA.. ..GA.A..GC
N.VatP_889   .AAC.A.... ...C...... ..C....... CC..CC.... .CCC.AC... ..G.AG..GC
K.VatP_887   AAC..A.... .A.C.A.... ......C..C C...CC.... ..CC.AC.A. .....A..GC
H.Beroline   A.-G...... .A.C..CA.A ..C...C.AG CC...C..A. ..CC..C... .A...T..GC
D.Par10195   A.GC..A... .A.CA.CC.A .AC...C.AG C.AAC....A A.CC..C..A ..C..A..AC
F.Hauniens   A.GC..A..A .AACA.CC.A ..C...C.AG CCAAG...AA A.CCA.C..A ..C..-A.AC

One most parsimonious tree found:

                       +--F.Hauniens
                    +--8
                 +--7  +--D.Par10195
                 !  !
              +--6  +-----H.Beroline
              !  !
     +--------5  +--------K.VatP_887
     !        !
     !        +-----------N.VatP_889
  +--4
  !  !                 +--C.Par_6085
  !  !              +--3
--1  +--------------2  +--B.Basileen
  !                 !
  !                 +-----A.Par16025
  !
  +-----------------------P.Par16024

remember: this is an unrooted tree!

requires a total of    500.000

steps in each site:
         0   1   2   3   4   5   6   7   8   9
     *----------------------------------------
    0!       3   2   2   1   1   1   1   2   2
   10!   2   3   2   3   2   1   2   3   1   1
   20!   2   1   4   1   2   1   2   1   2   2
   30!   2   1   1   1   1   1   2   1   2   3
   40!   1   4   1   1   2   1   1   2   2   2
   50!   4   2   2   2   2   2   1   1   3   1
   60!   1   3   3   4   2   1   1   1   2   0
   70!   5   1   3   3   1   1   1   0   2   0
   80!   2   1   1   2   1   0   2   1   3   0
   90!   2   4   4   2   1   1   1   4   4   1
  100!   3   2   2   2   1   2   2   2   2   1
  110!   1   2   2   2   3   1   2   1   2   1
  120!   1   3   2   1   3   2   2   3   2   2
  130!   4   3   4   1   0   5   2   1   2   1
  140!   1   2   1   0   2   3   0   1   1   5
  150!   0   3   1   5   0   0   3   1   1   1
  160!   2   1   2   2   2   1   2   1   1   2
  170!   2   2   1   1   1   1   4   3   1   0
  180!   2   4   1   1   2   1   1   1   2   2
  190!   2   1   1   2   3   2   3   1   1   1
  200!   1   1   1   1   1   2   3   0   4   2
  210!   2   1   1   2   2   3   4   1   2   1
  220!   2   4   0   2   1   1   1   1   1   1
  230!   0   1   1   2   2   1   1   3   1   2
  240!   2   2   2   4   4   0   2   1   0   0
  250!   2   0   1   2   1   1   1   2   2   0
  260!   2   0   1   2   0   0   1   1   0   1
  270!   3   1   3   1   1   3   2   1   0   2
  280!   1   1   1   1   1   1   2   1   2   1
  290!   1   0   1   4   1   1   4   1   0   2
  300!   2
From    To           Any Steps?  State at upper node
                                 ( . means same as in the node below it on tree)

        1                        AAACCCCCCC MATMCCCCCC AVCGCCCMCM CCCMMCCACC
1       4            maybe       .......... .......... .S.....C.C ...CC.....
4       5            yes         .......M.. C.....M..A .......... .....MA.A.
5       6            yes         ..C....... .M.C.M.... .G........ ..........
6       7            yes         .....A.CM. .......... ...V...... .....C....
7       8            yes         C..A...... .A...A.A.. ...A..A... A.........
8       F.Hauniens   yes         ........CA ......A..C ....A..... ........CA
8       D.Par10195   yes         ........A. ......C... .T........ ..........
7       H.Beroline   yes         ........A. AC...CC... C.AC....A. .......C..
6       K.VatP_887   yes         .C.....A.. .C...AA... .......... .....A....
5       N.VatP_889   yes         C.....AA.. ..CA..A.A. .C......A. .....A....
4       2            yes         .........A ...C...... .....M.... ..A....C..
2       3            yes         .......... A...A..... .G........ .A........
3       C.Par_6085   yes         .......... ..A....... .....A.... ..........
3       B.Basileen   yes         ..C....... .......... .....C.A.A ........A.
2       A.Par16025   yes         CC..A..... C.G....... .C...A.... ..........
1       P.Par16024   maybe       .......... A..A...... .A.....A.A ...AA.....

        1                        ACCCCCACGN CGGAMMCCCC CCGCCCCCCA CCACMCCMCM
1       4            maybe       .......... ....CC.... .......... ....C..C.C
4       5            yes         .......A.G ...C...... .M.A...... ..........
5       6            yes         .....A.... M......... .......... ..V.......
6       7            yes         R......... .......... .V.....A.. ..C.......
7       8            yes         .A..A..C.. ACC....... .A...AA... ...A......
8       F.Hauniens   yes         A.A....... ......AT.. ..C....... .A...A....
8       D.Par10195   yes         G..A...... .......... .......... ..........
7       H.Beroline   yes         G........C C..A....A. .G.C.....C A.........
6       K.VatP_887   yes         ...A...... A......... AC.......G ..G.......
5       N.VatP_889   yes         .......... .A.....A.A .AA....... ..........
4       2            yes         M.....V.A. .......... M........N .M........
2       3            yes         ......G..C .......... ..V.A..A.. ..........
3       C.Par_6085   yes         C......... .......... A.A......T .A.....A.A
3       B.Basileen   yes         A.......C. ....A..... C.C......C .CC.......
2       A.Par16025   yes         C.....C..T ..A....G.. A........G .A........
1       P.Par16024   maybe       .........A ....AA.... .......... ....A..A.A

        1                        CCCCCCMMCC MDCCCMWMCM ACCAMACACM CGCCCCCGCC
1       4            maybe       ......CA.. C....CA..A ....C..M.C ..........
4       5            yes         .......... .G........ .GM..C.... ..........
5       6            yes         .....A.... ..A...V... ...C..AC.. .A...M....
6       7            yes         .......... ......GC.. ..C....... M...A.....
7       8            yes         A.AA...... T..A...... ........A. .....A....
8       F.Hauniens   yes         .....C.... .......G.. ......C... C..A.....A
8       D.Par10195   yes         .......C.. .C..A..... .......... A.........
7       H.Beroline   yes         .......... .T.......T .....A.... AC...C....
6       K.VatP_887   yes         .......C.. G.....CA.. ..A....... ...A.A....
5       N.VatP_889   yes         .AA....... .......CA. ..A....A.. ..........
4       2            yes         .........M .T........ M......C.. ..M...AVA.
2       3            yes         .......... .......A.G .......... ...A...C..
3       C.Par_6085   yes         .........A ......T... C......... ..C.......
3       B.Basileen   yes         .........C A.A....... AA..A..... ..A.......
2       A.Par16025   maybe       .........A .......C.. C......... ..A....A..
1       P.Par16024   maybe       ......AC.. AA...ATA.C ....A....A ..........

        1                        MCAMGMCGGC VCMCMCCCCA CACMMCMC?C YC?CCMMCMM
1       4            maybe       .......... ..C.C..... ...CC.C... C.?...C.C.
4       5            yes         ..C..CM... GG...M.A.C MC.....A.. .........C
5       6            yes         .........G .......... ........K. ..?.......
6       7            yes         C......... .K...CA... C......... ..........
7       8            yes         ...C...V.. ....V...A. ........G. ..G..C.A..
8       F.Hauniens   yes         ......CA.T .T..G..C.. ...A...... ..........
8       D.Par10195   yes         ......ACC. .G..A..... .......... ..........
7       H.Beroline   yes         ...A..C..A CT........ ....G...T. ..-..A....
6       K.VatP_887   yes         A..C..A.A. .....A.... A...A...T. ..T..A....
5       N.VatP_889   yes         C..A..A... ....GA.... A.......G. G.C..C....
4       2            maybe       .M.AV..... C......... ........C. .....A....
2       3            yes         A...CC.... .A........ .......... .AC......C
3       C.Par_6085   maybe       .A........ .......... .......... ..........
3       B.Basileen   yes         .C.......A ....G..... ........A. ..........
2       A.Par16025   yes         CA..AA.... .......... .......... A.-......A
1       P.Par16024   maybe       A..C.A.... A.A.A..... ...AA.A.-. T.A..CA.AA

        1                        CCMCCCCCAC CMCCCMACAR MCMGCCCAMA CCCACACCCC
1       4            maybe       ..A....... .C.......G ..A.....C. ..........
4       5            yes         ........C. .......... .......... ....MC....
5       6            yes         .....A.A.M .....A.... .......C.C ..........
6       7            yes         .A........ ..A....A.. .......... ....C.....
7       8            yes         ..C...A..A ...A.G.... CA........ .......AA.
8       F.Hauniens   yes         ...A...... ....A..... ........A. .....G....
8       D.Par10195   yes         .....C.... A......... .......... ..........
7       H.Beroline   yes         .........C ......C... A...A..... ...C.A....
6       K.VatP_887   yes         .........A .......... C......... ....A....A
5       N.VatP_889   yes         ...A...... .....CC... A....A.... ....A.A...
4       2            yes         .A........ .......... ...A..A... ..MM......
2       3            yes         .......... ......C... .......C.C ..........
3       C.Par_6085   yes         ........C. .....C.... C..C...... ..AA......
3       B.Basileen   yes         .......... A....A.... A......... ..CC......
2       A.Par16025   yes         A...A..... .....C...C C......... AAAC......
1       P.Par16024   maybe       ..C....... .A...A...A A.C.....A. ..........

        1                        CMCCAMCMMM CCCCCMCCCC ACMCCCCCMC MMACCCMCCA
1       4            maybe       .C...A..C. .......... ..C.....C. CCM...A...
4       5            maybe       .......... ...M...... .......... ...M......
5       6            yes         ..A....C.A .AA..V.... .......... ..C......C
6       7            yes         ...AVC..A. A..C.SA..A ......A... ...C..C...
7       8            yes         ....C..... .......... ...A...... ..........
8       F.Hauniens   yes         A......... ..C..C.A.. C......... .......A..
8       D.Par10195   yes         .......A.. .....G.... .......... ..........
7       H.Beroline   yes         ....G..... ....AG.... .......... ..........
6       K.VatP_887   yes         .......... ...A.C.... C......... ...A......
5       N.VatP_889   yes         .......A.C ...A.A.... .......... ..AA....A.
4       2            maybe       .........C ....MC.... M......... ..C.......
2       3            yes         .......... .........A .......... ......G..C
3       C.Par_6085   yes         .....C.A.. ....A..... A.A.A..... ..........
3       B.Basileen   yes         .......C.. ....C...A. C......... ..........
2       A.Par16025   yes         .......A.. ....A..A.. C....A.A.. ....AA..A.
1       P.Par16024   maybe       .A...C.CAA .....A.... ..A.....A. AA....C...

        1                        CCYMCCCCCC CCCMCCAGCC CCACCAACCA MMCCAMCCCC
1       4            maybe       ..C....... ...C...... .......... CC...C....
4       5            yes         .M...M.... .......... ..M....... ....C.....
5       6            yes         A......... .A........ ......C..V ..........
6       7            yes         .C?V.C.... ......CV.A ..C.....AG ........M.
7       8            yes         ..GC..A... ....A..C.. .......... ..AA.A...A
8       F.Hauniens   yes         .........A ..A....... .......... ....G...A.
8       D.Par10195   yes         .......... .......... .A........ .A......C.
7       H.Beroline   yes         ..-G...... .......A.. .......... ....A...A.
6       K.VatP_887   yes         .A.A.A.... .....A.... ..A......C .A........
5       N.VatP_889   yes         .AAC.A.... .......... ..C....... ..........
4       2            yes         .........A ......C... .....C.... ..........
2       3            maybe       ...C...... .......... .......... ..........
3       C.Par_6085   no          .......... .......... .......... ..........
3       B.Basileen   yes         A......... ..A......A .........G ..........
2       A.Par16025   yes         ...A...... .......... .......... ......A...
1       P.Par16024   maybe       ..TA...... ...A...... .......... AA...A....

        1                        CGAMCCMCCC CCRCCMCCSM
1       4            maybe       ...C..C... .....A..GC
4       5            yes         ..C..M.... ..........
5       6            maybe       .......... ..A.......
6       7            maybe       .....C.... ..........
7       8            yes         A........A ..C.....A.
8       F.Hauniens   yes         ....A..... .....-A...
8       D.Par10195   no          .......... ..........
7       H.Beroline   yes         .......... .A...T....
6       K.VatP_887   yes         .....A..A. ..........
5       N.VatP_889   yes         .C...A.... ..G.AG....
4       2            yes         .......M.. ..GA......
2       3            no          .......... ..........
3       C.Par_6085   maybe       .......A.. ..........
3       B.Basileen   yes         .......C.. ..C......A
2       A.Par16025   maybe       .......A.. ..........
1       P.Par16024   maybe       ...A..A... ..A..C..CA
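The table marked “steps in each site” can be cross-checked against two figures quoted in the body of the paper: the total of 500 steps, and the 27 loci rendered irrelevant by the removal of V and Q (note 36). The sketch below simply transcribes that table; the variable names are my own.

```python
# The "steps in each site" table from this appendix; row 0 holds loci 1-9.
rows = """\
0! 3 2 2 1 1 1 1 2 2
10! 2 3 2 3 2 1 2 3 1 1
20! 2 1 4 1 2 1 2 1 2 2
30! 2 1 1 1 1 1 2 1 2 3
40! 1 4 1 1 2 1 1 2 2 2
50! 4 2 2 2 2 2 1 1 3 1
60! 1 3 3 4 2 1 1 1 2 0
70! 5 1 3 3 1 1 1 0 2 0
80! 2 1 1 2 1 0 2 1 3 0
90! 2 4 4 2 1 1 1 4 4 1
100! 3 2 2 2 1 2 2 2 2 1
110! 1 2 2 2 3 1 2 1 2 1
120! 1 3 2 1 3 2 2 3 2 2
130! 4 3 4 1 0 5 2 1 2 1
140! 1 2 1 0 2 3 0 1 1 5
150! 0 3 1 5 0 0 3 1 1 1
160! 2 1 2 2 2 1 2 1 1 2
170! 2 2 1 1 1 1 4 3 1 0
180! 2 4 1 1 2 1 1 1 2 2
190! 2 1 1 2 3 2 3 1 1 1
200! 1 1 1 1 1 2 3 0 4 2
210! 2 1 1 2 2 3 4 1 2 1
220! 2 4 0 2 1 1 1 1 1 1
230! 0 1 1 2 2 1 1 3 1 2
240! 2 2 2 4 4 0 2 1 0 0
250! 2 0 1 2 1 1 1 2 2 0
260! 2 0 1 2 0 0 1 1 0 1
270! 3 1 3 1 1 3 2 1 0 2
280! 1 1 1 1 1 1 2 1 2 1
290! 1 0 1 4 1 1 4 1 0 2
300! 2"""

steps = []
for line in rows.splitlines():
    _, values = line.split("!")          # drop the row label
    steps.extend(int(v) for v in values.split())

# Loci are numbered from 1; a locus with 0 steps shows consensus among all nine.
zero_sites = [site for site, s in enumerate(steps, start=1) if s == 0]
print(len(steps), sum(steps), len(zero_sites))  # 300 500 27
```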
Appendix C: RETREE Session

Tree Rearrangement, version 3.572c

Settings for this run:
  U   Initial tree (arbitrary, user, specify)?       User tree from tree file
  N   Use the Nexus format to write out trees?       No
  0   Graphics type (IBM PC, VT52, ANSI)?            ANSI
  W   Width of terminal screen, of plotting area?    80, 80
  L   Number of lines on screen?                     24

Are these settings correct? (type Y or the letter for one to change)
0

Tree Rearrangement, version 3.572c

Settings for this run:
  U   Initial tree (arbitrary, user, specify)?       User tree from tree file
  N   Use the Nexus format to write out trees?       No
  0   Graphics type (IBM PC, VT52, ANSI)?            (none)
  W   Width of terminal screen, of plotting area?    80, 80
  L   Number of lines on screen?                     24

Are these settings correct? (type Y or the letter for one to change)
y
Reading tree file ...
retree: can't read intree
Please enter a new filename>treefile

                       ,>>1:F.Hauniens
                    ,>15
                 ,>14  `>>2:D.Par10195
                 !  !
              ,>13  `>>>>>3:H.Beroline
              !  !
     ,>>>>>>>12  `>>>>>>>>4:K.VatP 887
     !        !
     !        `>>>>>>>>>>>5:N.VatP 889
  ,>11
  !  !                 ,>>6:C.Par 6085
  !  !              ,>17
-10  `>>>>>>>>>>>>>16  `>>7:B.Basileen
  !                 !
  !                 `>>>>>8:A.Par16025
  !
  `>>>>>>>>>>>>>>>>>>>>>>>9:P.Par16024

NEXT? (Options: R . U W O T F B N H J K L C + ? X Q) (? for Help) o
Which node should be the new outgroup? 12

                       ,>>1:F.Hauniens
                    ,>15
                 ,>14  `>>2:D.Par10195
                 !  !
              ,>13  `>>>>>3:H.Beroline
              !  !
  ,>>>>>>>>>>12  `>>>>>>>>4:K.VatP 887
  !           !
  !           `>>>>>>>>>>>5:N.VatP 889
  !
-10                    ,>>6:C.Par 6085
  !                 ,>17
  !              ,>16  `>>7:B.Basileen
  !              !  !
  `>>>>>>>>>>>>>11  `>>>>>8:A.Par16025
                 !
                 `>>>>>>>>9:P.Par16024

NEXT? (Options: R . U W O T F B N H J K L C + ? X Q) (? for Help) w
Enter R if the tree is to be rooted OR enter U if the tree is to be unrooted: r
Tree written to file