20091123-description Of The Fietstas Application

Description of the FietsTas application

Author: Harro Stokman
Date: November 23, 2009

Management Summary

The author investigates for Talking Trend (TT) which software exists within the ILPS research group of the University of Amsterdam. Nine applications of interest to TT have been identified. One of these is FietsTas. This document provides a detailed technical overview of that application. The information was obtained through an interview with Valentin Jijkoun of the ILPS group.

The FietsTas application offers a document processing service on the Internet, allowing organizations to request annotations based on uploaded text files. The main purpose of the service is to generate term clouds (a graphical display of, for example, the frequency of terms in the content) and entity lists for text documents. The application consists of 13,000 lines of Python code, is documented, and is actively maintained within the ILPS group.

Based on the data described in the detailed findings below, the following answers can be given to the research questions stated by TT (see footnote 1):

1. What issues exist regarding intellectual property, which need to be solved before TT applies the software commercially?
The application is closed source. A licensing agreement needs to be signed with the UvA. The application uses other closed source modules from the ILPS group: the Compound Splitter, NEN and SSScraper. Finally, FietsTas uses NER, which in turn uses TnT. The author requested a commercial license from the owner of TnT, Mr. Thorsten Brants. On November 23, 2009, Mr. Brants replied that he does not have time to support commercial licenses.

2. What quality does the software have, and what should be done to improve the quality, such that the software can be used for commercial purposes?
The NER module needs to be replaced before the software can be applied commercially. The main attraction of FietsTas is that TT can evaluate the functional quality of the different modules from the ILPS group without having to install and combine them.

1 Contract between Stokman and Talking Trends of October 2009.

Detailed findings

This section gives the answers to 51 technical questions:

General

1 What is the name of the application?
FietsTas.

2 Briefly, what does the application do?
The FietsTas application offers a document processing service on the Internet, allowing organizations to request annotations based on uploaded text files. The main purpose of the service is to generate term clouds (a graphical display of, for example, the frequency of terms in the content) and entity lists for text documents.

3 Is the application language specific (for example, Dutch phrases such as "vlakbij" (nearby) or "achter Centraal Station" (behind Central Station))?
Yes, the language of uploaded documents should be specified. If no specification is available, the language is detected automatically. Supported languages are Dutch and English.

4 In what scientific paper is the application used?
The functionality is not described explicitly in academic papers.

5 Who is the owner?
The software is developed in the ILPS group. It is not open source, although it uses open source applications (the MySQL database and the Named Entity Recognizer).

6 Which UvA developers have experience with the application?
Valentin Jijkoun and Andrei Vishiuski.

7 When does their contract with UvA end?
None of the contracts ends within a year of the writing of this report.

8 What alternatives exist for the application (closed or open source)?
An alternative is the OpenCalais application from Reuters. Note that OpenCalais works for English texts only.

9 What is the latest available version?
Unknown; the application is under active development.

Architecture

10 What is the architecture of the application?
A high-level overview of the architecture is given in Figure 1. FietsTas is implemented as a Web service using a simple, stateless client-server protocol: an application sends requests using standard HTTP POST or GET requests, and the FietsTas service responds by sending a standardized XML response over HTTP.
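To make the stateless request/response exchange concrete, the following is a minimal sketch in Python. The endpoint URL, the parameter names (action, doc_id) and the shape of the XML response are assumptions made for illustration only; the actual FietsTas interface is not specified in this report.

```python
from urllib.parse import urlencode
from urllib.request import Request
import xml.etree.ElementTree as ET

# Hypothetical endpoint; the real FietsTas URL and parameters are not given in this report.
BASE_URL = "http://fietstas.example.org/api"


def build_request(action, params, method="POST"):
    """Build one self-contained HTTP request; no session state is carried between calls."""
    query = urlencode(dict(params, action=action))
    if method == "GET":
        # GET: parameters travel in the query string.
        return Request(f"{BASE_URL}?{query}", method="GET")
    # POST: parameters travel in the request body.
    return Request(BASE_URL, data=query.encode("utf-8"), method="POST")


def parse_response(xml_text):
    """Parse an assumed XML response layout into (status, [(term, count), ...])."""
    root = ET.fromstring(xml_text)
    status = root.get("status")
    terms = [(t.text, int(t.get("count"))) for t in root.iter("term")]
    return status, terms


# Assumed example of a term cloud response (layout invented for this sketch).
SAMPLE_RESPONSE = """<response status="ok">
  <term count="3">fiets</term>
  <term count="1">tas</term>
</response>"""
```

Because every request carries all the information the service needs, any instance of the service could answer it, which fits the parallel deployment described under Performance demands.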

Figure 1: High level architecture of FietsTas

Interaction with other systems

11 With which other systems does the application interact?
Internally, the application uses
  • SSScraper for distributed scheduling and for job processing,
  • the MySQL database,
  • the Named Entity Recognizer,
  • the Named Entity Normalizer,
  • the Compound Splitter.
In future, the application is planned to use:
  • TimexTag,
  • the Stanford tagger (which assigns to every word whether it is a noun, verb or adjective).
Furthermore, the application might use Lingpipe in future for sentiment analysis. Externally, the application can interact with third party software. This is described under the next question.

12 How does the application interact with other systems?
The internal communication is realized through software wrappers around the individual modules, which allow data to be sent and read. The external communication is realized through APIs in two ways:
  • Web API: FietsTas can be accessed via REST and SOAP web services. These web service layers serve as interfaces to the same functions of FietsTas, so they can be used interchangeably. For the REST web service, the user accesses FietsTas via HTTP POST/GET requests and receives HTTP responses. For the SOAP web service, the user communicates with FietsTas via SOAP layer messages.
  • Software API: FietsTas API libraries are available for PHP and Python. Internally, the libraries use the Web API to access FietsTas functions, so they can be employed in any application that can access FietsTas over the Internet.
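The relationship between the software API and the Web API can be sketched as a thin client class in which every method is a single Web API call. This is an illustration only: the class name FietsTasClient, its method names, the parameter names and the way the API key is passed are all hypothetical and are not taken from the actual FietsTas libraries.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


class FietsTasClient:
    """Hypothetical thin client library: each method is one call to the Web API."""

    def __init__(self, base_url, api_key, transport=None):
        self.base_url = base_url
        self.api_key = api_key
        # The transport takes (url, body_bytes) and returns the response text.
        # It can be swapped out, for example for testing without a network.
        self._transport = transport or self._http_post

    @staticmethod
    def _http_post(url, body):
        # Passing data makes urlopen issue an HTTP POST.
        with urlopen(url, data=body) as resp:
            return resp.read().decode("utf-8")

    def _call(self, action, **params):
        # Assumed convention: action and key are sent as ordinary form fields.
        payload = dict(params, action=action, key=self.api_key)
        return self._transport(self.base_url, urlencode(payload).encode("utf-8"))

    def upload_document(self, text, language=None):
        params = {"text": text}
        if language is not None:
            params["language"] = language
        return self._call("upload", **params)

    def term_cloud(self, doc_id):
        return self._call("cloud", doc_id=doc_id)
```

Since the library holds no logic beyond building requests, an equivalent PHP library would expose the same functions, which matches the statement that the interfaces can be used interchangeably.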

Hardware, operating environment

13 What operating system does the system run on?
Linux.

14 What operating system is the system developed on?
Linux.

15 Are there any platform limitations that may be reached in the foreseeable future (e.g. maximum file size, maximum number of concurrent users)?
The MySQL database has a maximum number of connections, although this can be configured. If scaling up beyond the capacity of MySQL is required, this may be achieved using other, commercial databases.

16 Is the application dependent on the operating system?
No, in principle the system should not be platform dependent. However, this has not yet been verified in practice.

17 Is the application hardware dependent?
No, in principle the system should not be hardware dependent. However, this has not yet been verified in practice.

Programming languages

18 Which languages are used in the system?
The FietsTas application is developed in the programming language Python. The internal components are treated as black boxes.

19 Which programming environments are used?
This depends on the researcher.

20 How many lines of source code are there?
The application consists of 13,000 lines of Python code and 100 lines of PHP.

Code generation

21 Are parts of the system generated?
No, source code is not generated. Also, FietsTas does not require training data.

22 How are parts of the system generated?
Not applicable (N.A.).

Data storage

23 Which type of storage is used?
A MySQL database.

24 Are vendor specific extensions used?
No.

25 How is the connection with the database and the marshalling of data organized?
The application contains a special layer containing code to connect to a database or to flat files. To access the database, standard Python libraries are used for executing SQL commands.
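A storage layer of this kind can be sketched as two interchangeable classes, one backed by SQL and one by flat files. This is not the FietsTas code: the class names, the table layout and the JSON file format are hypothetical, and sqlite3 from the Python standard library stands in for MySQL so that the sketch runs without a database server.

```python
import json
import os
import sqlite3
import tempfile


class SqlStore:
    """SQL-backed store. sqlite3 is a stand-in here; the report says MySQL is used."""

    def __init__(self, connection):
        self.conn = connection
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS documents (id TEXT PRIMARY KEY, body TEXT)"
        )

    def save(self, doc_id, body):
        # Delete-then-insert avoids vendor-specific upsert syntax.
        # Note: sqlite3 uses "?" placeholders; MySQL drivers typically use "%s".
        self.conn.execute("DELETE FROM documents WHERE id = ?", (doc_id,))
        self.conn.execute(
            "INSERT INTO documents (id, body) VALUES (?, ?)", (doc_id, body)
        )
        self.conn.commit()

    def load(self, doc_id):
        row = self.conn.execute(
            "SELECT body FROM documents WHERE id = ?", (doc_id,)
        ).fetchone()
        return row[0] if row else None


class FlatFileStore:
    """Flat-file store with the same save/load interface (one JSON file per document)."""

    def __init__(self, directory):
        self.directory = directory

    def _path(self, doc_id):
        return os.path.join(self.directory, f"{doc_id}.json")

    def save(self, doc_id, body):
        with open(self._path(doc_id), "w", encoding="utf-8") as fh:
            json.dump({"id": doc_id, "body": body}, fh)

    def load(self, doc_id):
        try:
            with open(self._path(doc_id), encoding="utf-8") as fh:
                return json.load(fh)["body"]
        except FileNotFoundError:
            return None


def roundtrip(store):
    """Save, overwrite and reload a document through either backend."""
    store.save("d1", "hallo")
    store.save("d1", "hallo wereld")
    return store.load("d1"), store.load("missing")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for store in (SqlStore(sqlite3.connect(":memory:")), FlatFileStore(tmp)):
            print(type(store).__name__, roundtrip(store))
```

The calling code only sees save and load, so the choice between a database and flat files stays confined to this layer.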

User interface

26 Which kind of user interface does the system have (text, web, windows)?
FietsTas provides a web interface for humans, accessible through a web browser. The web interface allows users to manually upload documents. Only a subset of the FietsTas functions is made available through this interface: users can upload and list documents and generate simple annotations and clouds. This interface can be used for testing purposes and for users to test the applicability of the system in their applications. A simple (password-protected) web interface is also provided for FietsTas developers. This interface allows developers to visualize internal objects of FietsTas. It can also be used for debugging.

27 Is there tool support for the user interface?
A PHP development environment.

28 How is the connection between the user interface and the rest of the application organized?
Using the programming language PHP.

Reporting

29 Are reporting facilities available in the application?
There is a monitor available, which is written in Python.

30 Which tools are used for reporting?
N.A.

Performance demands

31 Are performance demands described?
Performance requirements are currently being defined by Mr. Vishiuski. This is part of the Bridge project.

32 On what size/speed of hardware should the application run?
FietsTas is parallelizable: several instances can run on multiple machines. These servers currently use quad-core processors with 8 GB of memory. FietsTas also runs on a dedicated server with 4 processors and 20 GB of memory.

33 What is the current size of the data that can be handled by the application?
This is not known explicitly. The FietsTas application is used extensively in projects such as Duoman, TNT (tracking of events in news), and Bridge (a project together with Beeld en Geluid).

34 Is the application a multi-user system?
Yes. Users are supplied with an API key, in order to prevent access to their data by other people.

35 What is the current / maximum number of concurrent users?
Unknown.
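The per-user separation of data by API key can be illustrated with a small in-memory sketch. How FietsTas actually issues and checks keys is not described in this report; the class name KeyScopedDocuments, its methods and the key format below are assumptions.

```python
import secrets


class KeyScopedDocuments:
    """Hypothetical in-memory store in which every document belongs to one API key."""

    def __init__(self):
        self._docs_by_key = {}

    def issue_key(self):
        # Generate an unguessable key and give it an empty document space.
        key = secrets.token_hex(16)
        self._docs_by_key[key] = {}
        return key

    def _owned(self, api_key):
        # Unknown keys are rejected outright.
        try:
            return self._docs_by_key[api_key]
        except KeyError:
            raise PermissionError("unknown API key") from None

    def upload(self, api_key, doc_id, text):
        self._owned(api_key)[doc_id] = text

    def fetch(self, api_key, doc_id):
        # Lookups only ever search the caller's own documents.
        return self._owned(api_key).get(doc_id)
```

A document uploaded under one key is invisible under any other key, even if the same document identifier is used.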

Documentation

Is any of the following documentation available?

36 Architecture description
An overview of the architecture is given in the Duoman document, Data collecting and indexing infrastructure, STEVIN internal report, November 2008.

37 Functional documentation
Functional documentation is available. The document describes
  • how users can obtain keys to use the service,
  • the type of documents that can be and should be uploaded,
  • the type of requests that users can ask FietsTas to perform,
  • how a user can provide feedback (this functionality is work in progress).

38 Technical documentation
Developer documentation is available at http://zookst5.science.uva.nl:8080/FietsTas/. This link is not accessible from outside, though.

39 How up to date are the above documents?
The Duoman document is from November 2008. The last change to the user requirements is from July 2009.

40 Is there a bug reporting system?
Yes, using Trac.

41 Are tools used to support the documentation?
Yes, using Trac.

Configuration management

42 Which version control system is used?
SVN.

43 Are change requests logged?
Yes, using Trac.

Source build process

44 Is the build process automatic?
No, although a Makefile exists for database creation.

45 How typical is the build process (i.e. would new developers know how this works)?
There is a readme with step-by-step instructions. Installation by Valentin would take a few days.

Deployment process

46 Is the deployment process automatic?
No.

Testing

47 Does an automated daily test run exist?
No.

48 Are there unit tests?
Yes, 2,500 lines of test code.

49 Are there regression tests?
No.

50 Is any stress testing performed?
No.

51 Are the tests performed automatically?
No.
