Using Fuzzy Logic to Create Links: Resolving References to Court Decisions
Jack Rugh
Julia Lennen
Abstract The abstract was not available at the time the proceedings were created. Please check an updated version [http://www.idealliance.org/papers/dx_xml03/html/abstract/05-05-03.html] of the paper abstracts at the conference proceedings web site.
Table of Contents
1. Introduction
  1.1. Tax Management: Legal Tax Publisher
  1.2. SGML/XML Approach
  1.3. The Caselinking Project
2. Requirements for Building a Valid Link
  2.1. Case Citation Markup
  2.2. Issues with Case Citations
3. Tempus: TM Primary Source Repository
4. Creating and Fulfilling Requests for Links
  4.1. Portfolio Production Workflow
  4.2. Defining the Key Metadata Values
  4.3. Matching Logic
  4.4. Creating and Maintaining Normalized Case Names
  4.5. Computing the Similarity of Two Case Names
5. Case Matching Statistics
  5.1. Overall Matching Statistics
  5.2. Matching Statistics by Portfolio Reference
  5.3. Effectiveness of Similarity Computations
  5.4. An Alternate Method of Computing the Similarity Between Two Strings
6. Dealing with Anomalies: the Editorial User Interface
1. Introduction
1.1. Tax Management: Legal Tax Publisher
The Bureau of National Affairs, Inc. (BNA) provides in-depth coverage of legal and regulatory developments for professionals in business and government through its print and electronic news, analysis, and reference products. Tax Management, Inc. (“TM”), a subsidiary of BNA, focuses on tax planning and compliance information. TM’s premier product, the BNA TAX Management Library, provides expert analysis and practice tools by the world’s leading tax authorities. Included in the Library are the Tax Management Portfolios, treatises that reflect best practices in the field of tax compliance and supply users with research, planning, and implementation tools.
Legal publications can contain two broad categories of content: primary source material (laws and regulations, government released documents, and court decisions) and secondary material (expert analysis, journal articles, and additional materials to assist the reader in the interpretation of the law). When “primary source” documents are referenced, or cited, in TM’s online products, links are provided for navigation from the analysis to the cited
primary source material. Most links are easy to build: documents released by government agencies usually have unique identifiers or section enumeration, which provides an easy naming convention.
Proceedings by deepX Ltd.
1.2. SGML/XML Approach
BNA has adopted formal publishing standards based on SGML. A fully integrated SGML publishing system has been developed to support editorial workflows and delivery to print, CD-ROM and online systems. TM maintains the portfolios in the BNA repository. The TM caselinking system uses the BNA DTD for data storage and exchange and also deploys a number of XML-based strategies for file storage, document assembly, and the editorial interface.
1.3. The Caselinking Project
In 2001, TM committed itself to increasing its collection of tax-related court decisions and creating navigational links from the portfolio analysis to cited decisions. Faced with this challenge, TM looked to its long-time technical partner, Retrieval Systems Corporation (“RSC”), to help design and build the solution. Several years prior, RSC had converted the portfolio data to SGML. RSC was also converting the cases themselves from a generically coded format into BNA-compliant SGML. RSC knows the BNA DTD and TM’s working environment. And best of all, its staff includes attorneys and developers with extensive exposure to legal publishing applications.
The TM and RSC team worked together on the “Caselinking” project. The remainder of this paper discusses the project tasks:
• Analysis of the requirements for building a valid link to a decision based on the components of a case citation.
• Development of a database repository for the decisions and associated metadata (Tempus).
• Modification of existing production processes to include requests to the database to create links and the development of matching logic to ensure database requests are properly evaluated and fulfilled.
2. Requirements for Building a Valid Link
2.1. Case Citation Markup
“The basic purpose of a legal citation is to allow the reader to locate the cited source accurately and efficiently.” (The Bluebook: A Uniform System of Citation)
Building links to court decisions can be trickier than building links to other source material. Case citations follow a standard convention, documented in the widely accepted authority, The Bluebook: A Uniform System of Citation. A court decision is usually published in one or more “reporters,” a series of bound volumes, publicly or privately published. A specific decision is referenced by the volume and abbreviated title of the reporter and the page number of the start of the decision. The court and decision year complete the citation. When an opinion is published in more than one reporter, the additional citations are called “parallel” citations.
For example, Noneman v. Comr., 40 T.C.M. 99, 110 (1980) is “Noneman versus Commissioner” (of the IRS). The decision is printed in volume 40 of Tax Court Memorandum Reporter and starts on page 99 (with the referenced information on page 110). The case was decided in 1980.
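The components of a reporter citation described above can be pulled apart mechanically. The following is a minimal sketch in Python, not the project's actual parser: the regular expression, the `Citation` type, and the `normalized_key` form (periods and spaces stripped from the reporter abbreviation) are all assumptions made for illustration.

```python
import re
from typing import NamedTuple, Optional


class Citation(NamedTuple):
    volume: int
    reporter: str            # reporter abbreviation as written, e.g. "T.C.M."
    first_page: int
    pinpoint: Optional[int]  # page of the referenced passage, if given
    court: Optional[str]     # court abbreviation inside the parentheses, if any
    year: Optional[int]


# volume, reporter abbreviation, first page, optional pinpoint page,
# optional "(court year)" group. Hypothetical pattern for illustration only.
CITATION_RE = re.compile(
    r"(?P<volume>\d+)\s+"
    r"(?P<reporter>[A-Za-z][A-Za-z0-9. ]*?)\s+"
    r"(?P<page>\d+)"
    r"(?:,\s*(?P<pin>\d+))?"
    r"(?:\s*\((?P<court>[^()]*?)\s*(?P<year>\d{4})\))?"
)


def parse_citation(text: str) -> Optional[Citation]:
    """Return the first reporter citation found in text, or None."""
    m = CITATION_RE.search(text)
    if m is None:
        return None
    court = (m["court"] or "").strip() or None
    return Citation(
        volume=int(m["volume"]),
        reporter=m["reporter"],
        first_page=int(m["page"]),
        pinpoint=int(m["pin"]) if m["pin"] else None,
        court=court,
        year=int(m["year"]) if m["year"] else None,
    )


def normalized_key(c: Citation) -> str:
    """Assumed normalized form: volume, reporter without periods or spaces, page."""
    reporter = re.sub(r"[.\s]", "", c.reporter).upper()
    return f"{c.volume} {reporter} {c.first_page}"
```

Applied to the Noneman example, `parse_citation` yields volume 40, reporter "T.C.M.", first page 99, pinpoint page 110, and year 1980. The pinpoint page is kept separate because only the volume, reporter, and first page identify the decision.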
The TM portfolio data has SGML markup for the cases. For example, the case above is marked as:
Noneman v. Comr., 40 T.C.M. 99, 110 (<decision.date year="1980">1980)
In this example, the citation in the portfolio (40 T.C.M. 99) corresponds with one of the citations in the decision data. Unfortunately, some of the decisions in TM’s collection were acquired without a complete list of citations and others, while they do have citations, do not have the same citation used in the portfolio content.
2.2. Issues with Case Citations
As the team looked into the variables that might affect the success of linking each citation to a decision, the following was noted:
• Citations are not guaranteed to be unique since more than one decision can start on a page in the bound volume. Example: 10 TCM 1006 refers to Volume 10 of Tax Court Memorandum (CCH) decisions. On page 1006, the entirety of Nicholas Di Vincenzo v. Commissioner appears. On that same page, Sol Smith v. Commissioner begins. Thus, both cases have the same official citation.
• Recently released decisions may not have an official citation, as they are not yet part of a bound volume. Example: Jimmy C. Chisum v. Renato Begh, a United States District Court case for the District of Columbia, was decided in June 2003. As of September 2003, this case still did not have an official citation. It would be referenced by the docket number, court, and date.
• The official citation in the portfolio data might be to a reporter other than the one included in the decision data. Example: Helvering v. Safe Deposit & Trust Co., 316 U.S. 56 (1942) links to a case with the citation “28 AFTR 1256” in the database.
• The “short” case name might vary between the portfolio data and the decision data due to a difference in editorial style or in the use of abbreviations. Example: Joliet & C. R. Co. v. U.S. also appears as Joliet & Chicago Railroad Co. v. U.S.
The variations in citation style both in the portfolio data and the metadata values extracted from the text of the decision meant that the program logic would need to be flexible. But first, the team needed to create the repository to store the decision text and metadata.
3. Tempus: TM Primary Source Repository
The team met to determine what application should be used for the repository. The BNA corporate publishing system was not optimal because the caselinking project needed different metadata values and the link processing could have impacted the publishing system’s performance. So the decision was made to create a TM Primary Source repository (“Tempus”) to enable the easy definition (or redefinition) of metadata values, as well as the ability to load and reload data at will. Tempus had to support several design fundamentals:
• The data and the metadata values would be stored in a relational database.
• Both the data and metadata would need to be loaded and extracted, both automatically and on demand.
• The ability to create assembly files (hierarchical build lists) for product-making based on metadata values in the database.
• Development of support tools to edit both metadata and data files.
• A UNIX-based approach to leverage existing production tools.
With these requirements in mind, Oracle 8i was selected as the platform of choice. The database is designed to store each court decision as an “XMLized” SGML file. Key metadata values are stored along with the data. Among these values are: short case name, tribunal (court name normalized to a tribunal acronym), decision year, citation(s) (normalized), status, and a number of other fields.
Figure 1. Tempus System
As new court decisions are available, automated “Capture and Convert” tools sweep these files in and convert them to SGML conforming to BNA’s DTD. To leverage XML features in Oracle and Internet Explorer, the SGML is “XMLized” when it is loaded in the database and re-SGMLized on its way out. Tools handle the extraction of metadata values from the incoming text file to populate the metadata fields in Tempus. To create product, the data is extracted along with updated assembly files that include the new decisions. In sum, Tempus is designed to:
• Store case data and metadata
• Maintain tables required for normalizing case metadata
• Maintain assembly instructions and sort rules to produce build lists for the hierarchical presentation (by court and year) of cases
• Create a repository that can be queried for link values while processing other datasets.
4. Creating and Fulfilling Requests for Links
4.1. Portfolio Production Workflow
The TM portfolios are continually updated to keep users abreast of the ever-shifting landscape of laws and regulations that influence business decisions. Using BNA’s SGML-based publishing systems, editors modify the portfolio content and process it for delivery. A closed-loop continuous publishing system allows them to check electronic output prior to approval for online distribution.
When an editor submits a portfolio for electronic review, the recent changes are extracted from the publishing system. The data is transformed to a delivery format. As part of this transformation step, links to forms, images, court decisions, and other data types are validated. When case citation tags are found in the data, a request to Tempus is formulated. If the request is successful, a link is built.
4.2. Defining the Key Metadata Values
The matching of portfolio references to court decisions makes use of information contained in the portfolio elements , , , and <decision.date> compared to equivalent information in various tables in the Tempus repository. Note that some of these values are “normalized” in various ways. Case names are normalized as described in Section 4.4, below. Stylized forms of tribunals, dates, and citations are also maintained. The following six types of information are used for matching purposes:
1. Reporter (or official) citations: Exact match on the normalized form of any of the reporter citations.
2. Tribunal (or court): Exact match on the normalized form. An example of a normalized tribunal is: US-CA01 (United States Court of Appeals for the First Circuit). Tribunals are an explicit part of a citation unless they can be accurately derived from the reporter name. Example: 40 T.C.M. 99 is a decision issued by the U.S. Tax Court. On the other hand, a citation to a decision found in “F.2d” (Federal Reporter, Second Series) must include the tribunal, as in 787 F.2d 578 (1st Cir. 1986) and 312 F.2d 574, 576 (10th Cir. 1962).
3. Decision date: Exact match on the portion of the date specified in the portfolio reference, either year or year/month/day. Decision dates are sometimes explicit in the data (as in the examples directly above). Other times, the date, or at least the decision year, can be derived from the citation, as in T.C. Memo 1996-509.
4. Both least common name words: Exact matches on the two words in the requested case name that occur least frequently in the set of all words in all available case names;
5. Either least common name words: An exact match on either of the two words in the requested case name that occur least frequently in the set of all words in all available case names; and
6. Case name: “Fuzzy logic” match of the two names to determine their degree of similarity.
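Items 4 and 5 both depend on picking the two rarest words of a requested case name. A minimal Python sketch of that selection follows. It assumes the name has already been reduced to normalized terms and that a table of global term counts is available; the function name, the handling of unknown terms, and the tie-breaking rule are assumptions, and the count for "estate" is invented for the example.

```python
from typing import Dict, List


def two_least_common(name_terms: List[str], global_counts: Dict[str, int]) -> List[str]:
    """Return up to two terms of a case name with the lowest global counts."""
    # Deduplicate while keeping the order in which the terms appear.
    unique = list(dict.fromkeys(name_terms))
    # Assumption: a term missing from the count table is treated as count 0,
    # i.e. as the rarest possible term. sorted() is stable, so ties keep
    # their order of appearance.
    return sorted(unique, key=lambda term: global_counts.get(term, 0))[:2]


# "crouch" and "commissioner" counts are the ones quoted in Section 4.4;
# the "estate" count is hypothetical.
GLOBAL_COUNTS = {"commissioner": 78659, "crouch": 12, "estate": 9000}

rarest = two_least_common(["estate", "crouch", "commissioner"], GLOBAL_COUNTS)
```

With these counts, `rarest` holds "crouch" followed by "estate", so a query on both words would require both to appear in a stored case name, while the fallback query would accept either one.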
4.3. Matching Logic
Matching is attempted on a multistep basis, with one or more steps to retrieve potential matching decisions based on the first five types of information described above. This step or steps can be followed by an evaluation of whether any of the retrieved decisions should be considered as matches based on the similarity of the requested and retrieved case names (“fuzzy logic”). Diagrams 1 through 5 depict the logic flow of the matching process. The discussion of that flow is keyed to the Matching Logic Diagram numbers and the individual Diagram Block numbers.
Matching Logic Diagram 1
Diagram 1 depicts the first step for all Caselinking requests. It is triggered by SGML markup in the portfolio data.
The request is evaluated to determine (1) whether one or more reporter citations are specified; (2) whether a tribunal is specified or can be determined from a reporter citation; (3) whether a decision date is specified or can be determined from a reporter citation; and (4) whether a case name is specified. If a case name is specified, it is normalized (see below) and the two words from the name that have the least number of occurrences in the repository of court decisions are isolated.
If one or more reporter citations are specified, whether or not the tribunal, decision date, and case name are missing or cannot be determined, processing continues at Matching Logic Diagram 2. If no reporter citations are specified but the tribunal, decision date, and case name are specified or can be determined, processing continues at Matching Logic Diagram 3.
Diagram Block 1.A: If no reporter citation is specified, and if either the tribunal, decision date, or case name is missing or cannot be determined, then no link is created.
Matching Logic Diagram 2
If, in Matching Logic step 1, reporter citations are found in the source data, then the program logic shown in Matching Logic Diagram 2 is performed.
A database query is constructed to retrieve all decisions that include any of the specified reporter citations. If the tribunal and/or decision date are available, matches on those values are also required. Note that normally the request will include a date specified only as a year, which will be used to match against the database field with the decision year. However, if a full date is specified in the request, the full date will be used in the match.
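The citation query just described can be sketched with an in-memory SQLite database standing in for the Oracle repository. The table layout, column names, and the tribunal acronyms "US-TC" and "US-SC" below are assumptions for illustration (the paper gives only US-CA01 as an example acronym), and matching on a full date is left out. Decision years that the paper does not state are stored as NULL.

```python
import sqlite3
from typing import List, Optional, Sequence, Tuple


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny stand-in repository using cases mentioned in this paper."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE decision (
            id INTEGER PRIMARY KEY,
            short_name TEXT NOT NULL,
            tribunal TEXT,
            decision_year INTEGER
        );
        CREATE TABLE citation (
            decision_id INTEGER NOT NULL REFERENCES decision(id),
            normalized TEXT NOT NULL
        );
    """)
    conn.executemany("INSERT INTO decision VALUES (?, ?, ?, ?)", [
        (1, "Noneman v. Comr.", "US-TC", 1980),
        (2, "Nicholas Di Vincenzo v. Commissioner", "US-TC", None),
        (3, "Sol Smith v. Commissioner", "US-TC", None),
        (4, "Helvering v. Safe Deposit & Trust Co.", "US-SC", 1942),
    ])
    conn.executemany("INSERT INTO citation VALUES (?, ?)", [
        (1, "40 TCM 99"), (2, "10 TCM 1006"), (3, "10 TCM 1006"), (4, "28 AFTR 1256"),
    ])
    return conn


def find_by_citation(conn: sqlite3.Connection, citations: Sequence[str],
                     tribunal: Optional[str] = None,
                     year: Optional[int] = None) -> List[Tuple[int, str]]:
    """Retrieve decisions carrying any of the citations; tribunal and year
    are added as extra conditions only when the request supplies them."""
    if not citations:
        raise ValueError("at least one reporter citation is required")
    placeholders = ",".join("?" * len(citations))
    sql = ("SELECT DISTINCT d.id, d.short_name FROM decision d "
           "JOIN citation c ON c.decision_id = d.id "
           f"WHERE c.normalized IN ({placeholders})")
    params: list = list(citations)
    if tribunal is not None:
        sql += " AND d.tribunal = ?"
        params.append(tribunal)
    if year is not None:
        sql += " AND d.decision_year = ?"
        params.append(year)
    sql += " ORDER BY d.id"
    return conn.execute(sql, params).fetchall()
```

In this toy data, a request for "10 TCM 1006" retrieves two decisions (the shared-page situation from Section 2.2), and a request for "316 US 56" retrieves nothing because only the parallel citation "28 AFTR 1256" is stored.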
If the query retrieves no decisions, but the tribunal, decision date, and case name are available, an alternate query will be attempted as noted at Matching Logic Diagram 3.
Diagram Block 2.A: If the query retrieves no decisions, and either the tribunal, decision date, or case name is unavailable, no link is created.
Diagram Block 2.B: If one decision is retrieved, and if no case name was specified, a link is made.
Diagram Block 2.C: If more than one decision is retrieved, and if no case name was specified, no link is created. This situation can legitimately occur when multiple decisions occur together on the same page of a reporter.
Diagram Block 2.D: If one decision is retrieved, and if the case name was specified, the specified name and the name in the decision are evaluated for similarity. If the similarity value is above “0.0”, a link is made (Diagram Block 2.D.1), otherwise no link is made (Diagram Block 2.D.2). This similarity test very rarely ends with no link.
Diagram Block 2.E: If more than one decision is retrieved and if the case name was specified, the similarity between each retrieved decision name and the specified name is computed, and the decisions are ranked. If the most similar decision has a rank that is more than “0.5” greater than the rank of the second most similar decision, a link is made to the first decision (Diagram Block 2.E.1). If the rank difference is “0.5” or less, then no link is made (Diagram Block 2.E.2). In such cases, it is necessary for editorial personnel to determine which, if any, decision should be the destination of the link.
Matching Logic Diagram 3
If, in Matching Logic step 1, no reporter citations are found in the source data, but the tribunal, date, and case name are specified, then the program logic shown in Matching Logic Diagram 3 is performed. Matching Logic Diagram 3 is also performed if, in step 2, a query for citation and optional tribunal and date retrieves no decision.
A database query is constructed to retrieve all decisions for the specified tribunal and date, and both of the least frequently occurring words from the case name. If no decision is retrieved, an alternate query is attempted as noted at Matching Logic Diagram 4.
If more than one decision is retrieved, the relative similarity between the names for those decisions and the requested case name is determined in processing at Matching Logic Diagram 5.
Diagram Block 3.A: If one decision is retrieved, the specified case name and the name from the decision are evaluated for similarity. If the similarity value is greater than or equal to “0.2”, a link is made (Diagram Block 3.A.1), otherwise no link is made (Diagram Block 3.A.2).
Matching Logic Diagram 4
If the query for both least common case name words in Matching Logic step 3 does not retrieve any decisions, then step 4, below, is performed.
A database query is constructed to retrieve all decisions for the specified tribunal and date that contain either of the least frequently occurring words from the case name. If no decision is retrieved, no match is made (Diagram Block 4.A). If more than one decision is retrieved, the relative similarity between the names for those decisions and the requested case name is determined in processing at Matching Logic Diagram 5.
Diagram Block 4.B: If one decision is retrieved, the specified case name and the name from the decision are evaluated for similarity. If the similarity value is greater than “0.65”, a link is made (Diagram Block 4.B.1), otherwise no link is made (Diagram Block 4.B.2).
Matching Logic Diagram 5
If the query for either or both case name “least common terms” (from Matching Logic step 3 or step 4) retrieved more than one decision, then step 5, below, is performed.
If more than one decision is retrieved, the similarity between each decision name and the specified name is computed, and the decisions are ranked. If no decision has a similarity value greater than “0.65”, no link is made (Diagram Block 5.A). If one decision has a similarity value greater than “0.65”, a link is made (Diagram Block 5.B). If multiple decisions have a similarity value of greater than “0.65”, no link is made. In such cases, it is necessary for editorial personnel to determine which, if any, decision should be the destination of the link.
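The decision rules of Diagrams 1 through 5 can be condensed into a few functions. The sketch below is an illustrative restatement of the thresholds given in the text, not the production code: the `Candidate` type and function names are hypothetical, and each candidate is assumed to arrive with its name similarity already computed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    decision_id: str
    similarity: float  # similarity of requested vs. retrieved case name, 0.0 to 1.0


def route(has_citation: bool, has_tribunal: bool, has_date: bool, has_name: bool) -> str:
    """Diagram 1: choose the first query, or give up (Block 1.A)."""
    if has_citation:
        return "diagram2"
    if has_tribunal and has_date and has_name:
        return "diagram3"
    return "no_link"


def link_from_citation_query(cands: List[Candidate], has_name: bool) -> Optional[str]:
    """Diagram 2: decide on a link from the result of the citation query."""
    ranked = sorted(cands, key=lambda c: c.similarity, reverse=True)
    if not ranked:
        return None  # caller tries Diagram 3 if possible, else no link (2.A)
    if not has_name:
        # Blocks 2.B and 2.C: without a name, only a single hit can be linked.
        return ranked[0].decision_id if len(ranked) == 1 else None
    if len(ranked) == 1:
        # Block 2.D: any similarity above 0.0 is accepted.
        return ranked[0].decision_id if ranked[0].similarity > 0.0 else None
    # Block 2.E: the best candidate must lead the runner-up by more than 0.5.
    gap = ranked[0].similarity - ranked[1].similarity
    return ranked[0].decision_id if gap > 0.5 else None


def link_from_name_query(cands: List[Candidate], both_words: bool) -> Optional[str]:
    """Diagrams 3 to 5: decide on a link from a tribunal/date/name-word query.

    both_words is True for the Diagram 3 query (both least common words)
    and False for the Diagram 4 query (either word).
    """
    if not cands:
        return None  # Diagram 3 falls through to Diagram 4; Diagram 4 stops (4.A)
    if len(cands) == 1:
        s = cands[0].similarity
        accepted = s >= 0.2 if both_words else s > 0.65  # Block 3.A vs. 4.B
        return cands[0].decision_id if accepted else None
    # Diagram 5: link only if exactly one candidate exceeds 0.65.
    strong = [c for c in cands if c.similarity > 0.65]
    return strong[0].decision_id if len(strong) == 1 else None
```

A return value of `None` corresponds to "no link is made"; in the multiple-candidate cases this is where the text calls for editorial personnel to choose the destination.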
4.4. Creating and Maintaining Normalized Case Names
One of the first issues considered in developing logic to measure similarity was to determine what portions of the case names should be used for similarity matching. An analysis of approximately 110,000 case names for decisions to be made available for linking provided two sets of information:
1. A set of common words that should not be considered for matching purposes. Case-insensitive common words included the usual suspects (e.g. “an”, “as”, “and”, “of”, etc.). However, a variety of subject-matter-specific words such as “appellee”, “plaintiff”, “respondent” were also added to the set of common words, based on the fact that their presence in short case names is problematic and because they contribute little or nothing to determining the similarity between two names.
2. Sets of words and word abbreviations that should be considered as synonymous. The sets of case-insensitive synonyms (without trailing punctuation) that were created include about 600 different terms that are often expressed in alternate forms. For instance:
• United States, U.S, US
• Oklahoma, Okla, Okl, OK
• Commissioner, Comm, Comr, Commr
• Limited, Ltd
• Association, Ass'n, Assn, Assoc
• Boulevard, Blvd
• Thomas, Thos
• William, Wm
All case names are reduced to a common form by isolating alphanumeric terms, removing embedded commas and possessive trailers (i.e. “s’” and “’s”), removing common words on a case-insensitive basis and normalizing synonyms on a case-insensitive basis. As decisions are added to the repository, a table that provides a count of globally unique normalized terms is maintained. For example, the term “commissioner” appears “78659” times and the term “crouch” appears “12” times in the 150,000 decisions in the repository. Additionally, a table that provides the count of unique terms in each specific case name is maintained.
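The normalization steps above can be sketched as follows. This is an illustration under several assumptions, not the production logic: only a handful of the roughly 600 synonym terms are listed, "v" is treated as a common word, the canonical form chosen for each synonym set is arbitrary, the trailing "s’" possessive is handled by dropping only the apostrophe, and periods are removed so that "U.S." becomes a single term.

```python
import re
from collections import Counter
from typing import List

# Subset of the common words named in the text; "v" is an added assumption.
COMMON_WORDS = {"an", "as", "and", "of", "v", "appellee", "plaintiff", "respondent"}

# Abbreviation -> canonical term, taken from the synonym sets listed above.
SYNONYMS = {
    "comm": "commissioner", "comr": "commissioner", "commr": "commissioner",
    "ltd": "limited",
    "assn": "association", "assoc": "association",
    "blvd": "boulevard", "thos": "thomas", "wm": "william",
    "okla": "oklahoma", "okl": "oklahoma", "ok": "oklahoma",
}


def normalize_case_name(name: str) -> List[str]:
    """Reduce a short case name to a list of normalized terms."""
    text = name.lower().replace("\u2019", "'")
    text = re.sub(r"'s\b", "", text)        # possessive "'s"
    text = re.sub(r"s'(?!\w)", "s", text)   # possessive "s'": drop the apostrophe only
    text = re.sub(r"[.,']", "", text)       # "u.s." -> "us", "ass'n" -> "assn"
    text = re.sub(r"\bunited states\b", "us", text)
    terms = [SYNONYMS.get(t, t) for t in re.findall(r"[a-z0-9]+", text)]
    return [t for t in terms if t not in COMMON_WORDS]


def index_case_name(name: str, global_counts: Counter) -> Counter:
    """Update the global term counts and return the per-name term counts."""
    per_name = Counter(normalize_case_name(name))
    global_counts.update(per_name)
    return per_name
```

Under these assumptions, "Noneman v. Comr." reduces to the terms "noneman" and "commissioner", which is the same form that "Noneman v. Commissioner" would take.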
4.5. Computing the Similarity of Two Case Names
The computational logic used in the production version of the system was developed in conjunction with the development of the term normalization logic. Each occurrence of each term found in both case names is given a weight that is based on its frequency of occurrence in the total collection of case names. That weight is defined as the base-10 logarithm
log10 ( 150000 / global_occurrence_count )
Thus, the terms that occur least frequently are given a much higher weight than terms that occur most frequently. For example, the weight of “crouch” is “4.09” and the weight of “commissioner” is “0.28”.
The similarity between two case names is computed by summing the weight of each occurrence of each term that occurs in both names, and then dividing that value by the sum of the weight of each occurrence of each term that appears in either of the case names. This computation always yields a number between zero and one.
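A short Python sketch of this computation follows. It reads "each occurrence of each term" as multiset arithmetic (shared occurrences in the numerator, the multiset union in the denominator), which is one plausible interpretation rather than a confirmed detail of the production system. Treating a term that is absent from the count table as having a count of 1 is also an assumption, as is the "smith" count used in the example.

```python
import math
from collections import Counter
from typing import Dict, Sequence

TOTAL_DECISIONS = 150_000  # collection size used in the weight formula


def term_weight(term: str, global_counts: Dict[str, int]) -> float:
    """log10(150000 / global_occurrence_count); unseen terms count as 1 (assumption)."""
    return math.log10(TOTAL_DECISIONS / global_counts.get(term, 1))


def similarity(a_terms: Sequence[str], b_terms: Sequence[str],
               global_counts: Dict[str, int]) -> float:
    """Weighted overlap of two normalized case names, between 0.0 and 1.0."""
    a, b = Counter(a_terms), Counter(b_terms)
    shared = a & b   # occurrences present in both names
    either = a | b   # occurrences present in at least one name
    numerator = sum(term_weight(t, global_counts) * n for t, n in shared.items())
    denominator = sum(term_weight(t, global_counts) * n for t, n in either.items())
    return numerator / denominator if denominator else 0.0


# "crouch" and "commissioner" counts come from Section 4.4; "smith" is hypothetical.
COUNTS = {"crouch": 12, "commissioner": 78659, "smith": 1500}
weak = similarity(["crouch", "commissioner"], ["smith", "commissioner"], COUNTS)
```

Here `weak` is small because the only shared term, "commissioner", carries very little weight, while the two rare surnames that differ dominate the denominator. Two names that share "crouch" would score far higher even if their remaining common terms differed.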
5. Case Matching Statistics
5.1. Overall Matching Statistics
The following statistics give the reader an idea of how many case citations are unique and how many are repeated references to the same decision.
Decisions in the case repository: 153,086
Unique decision references for which one matched decision was found: 36,936
Unique matched decisions referenced 1 time: 18,221
Unique matched decisions referenced 2 to 5 times: 15,541
Unique matched decisions referenced 6 to 10 times: 2,377
Unique matched decisions referenced 11 or more times: 797
Table 1.
5.2. Matching Statistics by Portfolio Reference
The following table provides information on the various forms of portfolio references made, the numbers of decisions retrieved, and the numbers of retrieved decisions that were considered to be matches. Decisions are retrieved without regard to the similarity between the referenced and retrieved case names. Whether or not a retrieved decision is considered a match depends on the similarity between the referenced and retrieved case names, when there is a referenced case name.
Note | Reference Type | Decisions Retrieved | Decisions Matched
(1) | Reporter citation(s), with or without tribunal and/or decision date - one decision retrieved (Diagram Block 2.B) | 1,973 | 1,973
(2) | Reporter citation(s), with or without tribunal and/or decision date - multiple decisions retrieved (Diagram Block 2.C) | 14 | 0
(3) | Reporter citation(s) and fuzzy case name, with or without tribunal and/or decision date - one decision retrieved (Diagram Block 2.D) | 78,655 | 78,545
(4) | Reporter citation(s) and fuzzy case name, with or without tribunal and/or decision date - multiple decisions retrieved (Diagram Block 2.E) | 329 | 318
(5) | Fuzzy case name, tribunal, decision date, and two least frequently occurring words - one decision retrieved (Diagram Block 3.A) | 9,989 | 9,989
(6) | Fuzzy case name, tribunal, decision date, and two least frequently occurring words - multiple decisions retrieved (Matching Logic Diagram 5) | 46 | 2
(7) | Fuzzy case name, tribunal, decision date, and either of two least frequently occurring words - one decision retrieved (Diagram Block 4.B) | 1,029 | 489
(8) | Fuzzy case name, tribunal, decision date, and either of the two least frequently occurring words - multiple decisions retrieved (Matching Logic Diagram 5) | 2,649 | 291
(9) | Totals - References in Portfolios (100,360) | 94,684 | 91,607
Table 2.
Table Notes:
1. All retrieved decisions were treated as matches, because the reference contained no case name.
2. No retrieved decisions were treated as matches, because no reference case name was available to distinguish among the retrieved decisions.
3. A retrieved decision was not treated as a match when the two case names had a very weak similarity value.
4. The reference case name allowed one of the retrieved decisions to be treated as a match.
5. Although all retrieved decisions were treated as matches, it is certainly possible that with a different set of references and decisions, some would be rejected because of weak similarity values between the two case names.
6. See note 4, above.
7. See note 3, above.
8. See note 4, above.
9. Of the total 100,360 portfolio references to decisions, 94,684 caused at least one decision to be retrieved, which resulted in 91,607 matches.
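The totals in Table 2 can be cross-checked with a few lines of arithmetic. The sketch below simply re-adds the eight rows and derives two ratios; the variable names are illustrative.

```python
# (decisions retrieved, decisions matched) for rows (1) through (8) of Table 2
TABLE_2 = {
    1: (1973, 1973),
    2: (14, 0),
    3: (78655, 78545),
    4: (329, 318),
    5: (9989, 9989),
    6: (46, 2),
    7: (1029, 489),
    8: (2649, 291),
}
TOTAL_REFERENCES = 100_360

retrieved = sum(r for r, _ in TABLE_2.values())
matched = sum(m for _, m in TABLE_2.values())

# Share of all portfolio references that ended in a link.
match_rate = matched / TOTAL_REFERENCES
# Share of retrievals that did not become matches.
rejected_share = (retrieved - matched) / retrieved
```

The row sums agree with the totals in row (9): 94,684 retrieved and 91,607 matched. That corresponds to a match rate of roughly 91 percent of all references, with a little over 3 percent of retrievals not turned into matches.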
5.3. Effectiveness of Similarity Computations
The matching differences caused by the consideration of case name similarity are fairly low as a percentage of all matches made (about 3 percent). Even so, there are several advantages to considering the similarity between names when creating portfolio-to-decision links:
1. Resolving multiple-case-on-page situations
2. Allowing matches to be made when there are no reporter citations to match
3. Detecting reporter citation errors
4. Detecting significant differences in the styles of short case names.
5.4. An Alternate Method of Computing the Similarity Between Two Strings
Since developing the matching logic described above, RSC has had several other occasions in which we needed to compute the similarity between various strings of text. In one instance, the descriptive phrases used in multiple classification indexes needed to be compared and ranked based on similarity for human review and adjustment. In another instance, portfolio references to a set of court decisions in which the court decisions contained no reporter citations needed to be compared to determine whether referenced decisions were available in the set.
To handle these requirements, RSC has developed a small toolkit that is based on the “Vector Representation and Similarity Computation” algorithm described in "Introduction to Modern Information Retrieval" by Gerard Salton and Michael J. McGill, published in 1983 by McGraw-Hill (ISBN 0-07-054484-0), pages 120-123. Using this algorithm, we have computed similarity both using words as terms and using trigrams as terms as described in the "Trigram Phrase Matching" work of The National Library of Medicine (NLM) (see http://ii.nlm.nih.gov and http://ii.nlm.nih.gov/MTI/trigram.shtml).
Although RSC has not formally compared the results of using the Salton/Trigram methodology with the algorithm used in the current production system, the sense is that the Salton/Trigram algorithm is somewhat superior. The toolkit that was developed is available for download at http://www.retrievalsystems.com. There is also a wide variety of information on text matching algorithms available in the “Digital Archive of Research Papers in Computational Linguistics” at http://acl.ldc.upenn.edu/.
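To give a feel for the vector approach, the sketch below computes a plain cosine similarity over character-trigram counts. It is a generic illustration of the vector-space idea, not the RSC toolkit and not the NLM method, which applies its own trigram weighting; the function names are hypothetical.

```python
import math
from collections import Counter


def trigrams(text: str) -> Counter:
    """Count overlapping three-character sequences in lowercased text."""
    s = " ".join(text.lower().split())  # collapse runs of whitespace
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def cosine_similarity(a_text: str, b_text: str) -> float:
    """Cosine of the angle between the two trigram count vectors."""
    a, b = trigrams(a_text), trigrams(b_text)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


short_form = "Joliet & C. R. Co. v. U.S."
long_form = "Joliet & Chicago Railroad Co. v. U.S."
unrelated = "Helvering v. Safe Deposit & Trust Co."

close = cosine_similarity(short_form, long_form)
far = cosine_similarity(short_form, unrelated)
```

Because trigrams operate below the word level, the two Joliet variants from Section 2.2 score noticeably closer to each other than to the unrelated name, without any synonym table for "C. R." and "Chicago Railroad".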
6. Dealing with Anomalies: the Editorial User Interface
In the previous section, several problems are presented that could be solved by a user with direct access to Tempus. For example, the multiple match situations could be resolved with additional metadata. Modifications to the parallel citations or short case names could also resolve some links. Toward that end, TM asked RSC to develop a user interface to Tempus. It was determined that a browser-based solution would provide users portable, easy-to-use access.
RSC developed the solution using a combination of technologies. The SGML data was converted to XML and stored in Oracle. A Java servlet system was written to provide the browser interface and to manage the editorial interaction. A back-end server was written in Omnimark to interact with the database and to leverage the functionality created for portfolio case linking. The interaction between the servlets and the Omnimark server was modeled after the SOAP protocol and implemented a working web service.
The editor can locate a case record by searching on the case name, court, date, docket number, parallel citation, Oracle ID, or vendor identification. When multiple records are retrieved, the user can select the desired record and make modifications to any of the metadata fields (see screen snap below). The user can also check out the data file and make updates or modifications to it. Records with changes are flagged for republishing.
Figure 2. Editorial User Interface
Biography
Jack Rugh
Retrieval Systems Corporation
Vienna
United States of America
Mr. Rugh, who has over 35 years of information systems development experience, has been Vice President of Operations for Retrieval Systems Corporation since 1988. Since 1972, he has worked primarily with developing text processing applications, beginning as the technical manager for development of one of the early full-text legal retrieval systems (JURIS) for the United States Department of Justice. At Retrieval Systems, Mr. Rugh concentrates on text transformation, repository, workflow, and delivery systems, primarily involving SGML or XML.
Retrieval Systems, located in Vienna, VA, is a technical resource to the publishing industry, providing extensive knowledge of markup languages (both SGML and XML) and up-to-date technologies. Focusing on providing content-centric systems engineering services to legal and accounting publishers, law firms and related organizations, RSC is known as a pioneer in state-of-the-art systems and software solutions. The company's specialists are experienced in every aspect of developing, maintaining and enhancing multi-media and full-text content management systems. Services include: development of content management and delivery systems utilizing a variety of technologies; data conversion to SGML, XML, and other processing languages; and creation of full-text storage and retrieval systems, and software, using both off-the-shelf and custom components. RSC provides ongoing consultation for all aspects of electronic publishing data management, plus training and support for all systems and software RSC develops.
Prior to joining Retrieval Systems, Mr. Rugh worked for the United States Department of Justice for 16 years, and spent several years working for IBM and as an officer in the United States Air Force.
Julia Lennen
Tax Management, Inc., a subsidiary of The Bureau of National Affairs, Inc.
Washington
United States of America
Julia Smull Lennen has been involved with the design and development of complex integrated publishing solutions since 1982. A Senior Project Manager at BNA's Tax Management since 1999, Lennen is responsible for content management systems that transform editorial data for product delivery. Tax Management is a subsidiary of The Bureau of National Affairs (BNA), a legal publisher in Washington, DC. From 1994 to 1997, Lennen worked on BNA's SGML publishing system, incorporating her knowledge of print production into the editorial interface design. From 1997 to 1999, Lennen led an SGML system development project at Aspen Publishers. Prior to joining BNA in 1994, Lennen worked for Datalogics, Xyvision, Kurzweil Computer Products, and the Quadex Corporation. She has also worked as a graphic designer. She attended the Printing Technology Masters program at the Rochester Institute of Technology and holds a BA in Fine Arts from Bard College.