Road Traffic Analysis
1. INTRODUCTION

1.1 ABOUT PROJECT: Road traffic monitoring is of great importance for urban transportation systems. Traffic control agencies and drivers could benefit from timely and accurate road traffic prediction, which makes prompt, or even advance, decisions possible for detecting and avoiding road congestion. Existing methods mainly focus on raw speed sensing data collected from cameras or road sensors, and suffer from a severe data scarcity issue because the installation and maintenance of sensors are very expensive [56]. At the same time, most existing techniques based only on past and current traffic conditions (e.g., [9], [54], [25], [38]) do not fit well when real-world factors such as traffic accidents play a part. To address the above issues, in this paper we introduce new-type traffic-related data arising from public services:

1) Social media data, which is posted on social networking websites, e.g., Twitter and Facebook. With the popularization of mobile devices, people are more likely to exchange news and trifles in their life through social media services, where messages about traffic conditions, such as "Stuck in traffic on E 32nd St. Stay away!", are posted by drivers, passengers and pedestrians, who can be viewed as sensors observing the ongoing traffic conditions near their physical locations. Meanwhile, traffic authorities register public accounts and post tweets to inform the public of the traffic status, such as "Slow traffic on I95 SB from Girard Ave to Vine St." posted by a local transportation bureau account. Such text messages describing traffic conditions, some of them tagged with location information, are accessible to the public and could be a complementary information source to raw speed sensing data.

2) Trajectory data from map services. Given an origin-destination (OD) pair on a map, such services can recommend the optimal route from the origin to the destination with the least time, and trajectories can be collected once drivers use the service to navigate. Here a trajectory is a sequence of links for a given OD pair, and a link is a road segment between neighboring intersections. Correspondingly, a trajectory travel time is an integration of link travel times, which are related to the real-time road traffic speeds. A longer trajectory travel time indicates that some involved road links may be congested with lower traffic speed. Trajectory data is useful for a wide range of transportation analyses and applications [49], [9].

[Figure 1 caption: Our goal is to predict the traffic speed of specific road links, shown with red question marks, given: 1) speed observations collected by speed sensors, shown in blue; 2) trajectories and travel times of OD pairs (speeds of traversed road links are either observed or to be predicted); 3) tweets describing traffic conditions (the location mentioned by a tweet may be a street covering multiple road links).]

Based on the above observations, where traditional traffic sensing data are limited while new-type data from social media and map services begin to spring up, our goal is

to predict the road-level traffic speed by incorporating new-type data with traditional speed sensing data. To motivate this scenario, consider the road traffic prediction example depicted in Fig. 1. The links with red question marks are not covered by traditional speed sensors, but they may be passed by trajectories attached with travel time information, or mentioned in tweets describing traffic conditions, so their speeds can be inferred by fusing multiple cross-domain data sources.
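To make the relationship between trajectory travel time and link speeds mentioned above concrete, consider the following sketch (the notation is ours, not taken from the paper): if a trajectory for an OD pair traverses links l_1, ..., l_n, where link l_i has length d_{l_i} and prevailing traffic speed v_{l_i}, then

```latex
T_{\mathrm{traj}} \;=\; \sum_{i=1}^{n} t_{l_i} \;=\; \sum_{i=1}^{n} \frac{d_{l_i}}{v_{l_i}}
```

so an observed travel time longer than expected signals a lower speed v on at least one of the traversed links.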

1.2 DATA MINING: In today's world, a large amount of data is generated and collected daily. There is a huge amount of data available in the information industry, but this data is of no use until it is converted into useful information; analyzing this huge amount of data and extracting useful information from it is both difficult and essential. Data mining is an interdisciplinary subfield of computer science. It is the computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use. Data mining is the natural evolution of information technology. Aside from the raw analysis step, it involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating. Data mining is the analysis step of the "knowledge discovery in databases" process. Extraction of information is not the only process we need to perform; data mining also involves other processes such as Data Cleaning, Data Integration, Data Transformation, Pattern Evaluation and Data Presentation. Once all these processes are over, we are able to


use this information in many applications such as fraud detection, market analysis, production control, science exploration, etc.

1.2.1 Data Preprocessing: In the real world, data tend to be incomplete, noisy and inconsistent. Such situations require data preprocessing. Various forms of data preprocessing include data cleaning, data integration, data transformation and data reduction. Typically, the process of duplicate detection is preceded by a data preparation stage, during which data entries are stored in a uniform manner in the database. Data preprocessing is a data mining technique that involves transforming raw data into an understandable format. Real-world data is often incomplete, inconsistent, and/or lacking in certain behaviors or trends, and is likely to contain many errors. Data preprocessing is a proven method of resolving such issues, and it prepares raw data for further processing. Data goes through a series of steps during preprocessing:

Data Cleaning: Data cleansing, data cleaning or data scrubbing is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database. Used mainly in databases, the term refers to identifying incomplete, incorrect, inaccurate, irrelevant, etc. parts of the data and then replacing, modifying, or deleting this dirty data or coarse data. After cleansing, a data set will be consistent with other similar data sets in the system. The inconsistencies detected or removed may have been originally caused by user entry errors, by corruption in transmission or storage, or by different data dictionary definitions of similar entities in different stores.
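As a minimal illustration of these cleaning steps in Java (our own example, not from the project code; it assumes records are simple comma-separated strings):

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DataCleaning {
    // Drop records with missing fields and remove exact duplicates.
    public static List<String> clean(List<String> records) {
        Set<String> seen = new LinkedHashSet<>();
        for (String record : records) {
            String normalized = record.trim();
            // Incomplete record: an empty line, or an empty field such as "a,,c".
            if (normalized.isEmpty() || normalized.contains(",,")) continue;
            seen.add(normalized); // the set silently discards duplicate records
        }
        return new ArrayList<>(seen);
    }
}
```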



Data Integration: Data integration is a data preprocessing technique that merges data from multiple heterogeneous data sources into a coherent data store. Data integration may involve inconsistent data and therefore needs data cleaning. Data integration primarily supports the analytical processing of large data sets by aligning, combining and presenting each data set from organizational departments and external remote sources to fulfill integrator objectives. Data integration is generally implemented in data warehouses (DW) through specialized software that hosts large data repositories from internal and external resources. Data is extracted, amalgamated and presented in a unified form. For example, a user's complete data set may include extracted and combined data from marketing, sales and operations, which is combined to form a complete report.

Data Transformation: Data transformation is the process of converting data or information from one format to another, usually from the format of a source system into the required format of a new destination system. The usual process involves converting documents, but data conversions sometimes involve the conversion of a program from one computer language to another to enable the program to run on a different platform.
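For instance, a typical format conversion is changing a date field from a source system's layout to the destination system's layout. A small illustrative sketch in Java using the standard java.time API (the formats chosen are our own example):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class DateTransform {
    private static final DateTimeFormatter SOURCE =
            DateTimeFormatter.ofPattern("MM/dd/yyyy"); // format used by the source system
    private static final DateTimeFormatter TARGET =
            DateTimeFormatter.ISO_LOCAL_DATE;          // format required downstream

    public static String transform(String sourceDate) {
        return LocalDate.parse(sourceDate, SOURCE).format(TARGET);
    }

    public static void main(String[] args) {
        System.out.println(transform("04/21/2020")); // prints 2020-04-21
    }
}
```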



Data Reduction: Data reduction is the transformation of numerical or alphabetical digital information, derived empirically or experimentally, into a corrected, ordered, and simplified form. The basic concept is the reduction of multitudinous amounts of data down to the meaningful parts. Data reduction techniques can be applied to obtain a reduced representation of the data set that is much smaller in volume, yet closely maintains the integrity of the original data. That is, mining on the reduced data set should be more efficient yet produce the same (or almost the same) analytical results.

These are common techniques used in data reduction (a small numeric sketch of the rounding and averaging steps follows this list):

• Order by some aspect of size.
• Table diagonalization, whereby rows and columns of tables are re-arranged to make patterns easier to see.
• Round drastically to one, or at most two, effective digits (effective digits are ones that vary in that part of the data).
• Use averages to provide a visual focus as well as a summary.
• Use layout and labeling to guide the eye.
• Give a brief verbal summary.
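Here is the promised sketch of the rounding and averaging techniques in Java (the example values are our own illustration):

```java
import java.math.BigDecimal;
import java.math.MathContext;

public class DataReduction {
    // Round a value to two significant (effective) digits.
    static double roundTwoDigits(double value) {
        return new BigDecimal(value).round(new MathContext(2)).doubleValue();
    }

    public static void main(String[] args) {
        double[] speeds = {23.47, 25.91, 24.08, 26.33};
        double sum = 0;
        for (double s : speeds) sum += s;
        double average = sum / speeds.length;
        System.out.println("average = " + roundTwoDigits(average)); // one summary number
        for (double s : speeds) {
            System.out.println(s + " -> " + roundTwoDigits(s));     // e.g. 23.47 -> 23.0
        }
    }
}
```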

1.2.2 Data Mining Applications: Data mining is widely used in diverse areas. There are a number of commercial data mining systems available today, and yet there are many challenges in this field. Here is the list of areas where data mining is widely used:

• Financial Data Analysis
• Fraud Detection
• Retail Industry
• Telecommunication Industry
• Biological Data Analysis
• Other Scientific Applications
• Intrusion Detection

Financial Data Analysis: The financial data in the banking and financial industry is generally reliable and of high quality, which facilitates systematic data analysis and data mining. Some of the typical cases are as follows:

• Design and construction of data warehouses for multidimensional data analysis and data mining.
• Loan payment prediction and customer credit policy analysis.
• Classification and clustering of customers for targeted marketing.
• Detection of money laundering and other financial crimes.



Fraud Detection: Data mining is also used in the fields of credit card services and telecommunications to detect fraud. In fraud telephone calls, it helps to find the destination of the call, the duration of the call, the time of the day or week, etc. It also analyzes patterns that deviate from expected norms.



Retail Industry: Data mining has great application in the retail industry because it collects large amounts of data on sales, customer purchasing history, goods transportation, consumption and services. It is natural that the quantity of data collected will continue to expand rapidly because of the increasing ease, availability and popularity of the web. Data mining in the retail industry helps in identifying customer buying patterns and trends, which leads to improved quality of customer service and good customer retention and satisfaction. Here is the list of examples of data mining in the retail industry:

• Design and construction of data warehouses based on the benefits of data mining.
• Multidimensional analysis of sales, customers, products, time and region.
• Analysis of effectiveness of sales campaigns.
• Customer retention.
• Product recommendation and cross-referencing of items.

Telecommunication Industry: Today the telecommunication industry is one of the most emergent industries, providing various services such as fax, pager, cellular phone, internet messenger, images, e-mail, web data transmission, etc. Due to the development of new computer and communication technologies, the telecommunication industry is rapidly expanding. This is why data mining has become very important in helping to understand the business. Data mining in the telecommunication industry helps in identifying telecommunication patterns, catching fraudulent activities, making better use of resources, and improving quality of service. Here is the list of examples for which data mining improves telecommunication services:

• Multidimensional analysis of telecommunication data.
• Fraudulent pattern analysis.
• Identification of unusual patterns.
• Multidimensional association and sequential pattern analysis.
• Mobile telecommunication services.
• Use of visualization tools in telecommunication data analysis.

Biological Data Analysis: In recent times, we have seen tremendous growth in fields of biology such as genomics, proteomics, functional genomics and biomedical research. Biological data mining is a very important part of bioinformatics. The following are the aspects in which data mining contributes to biological data analysis:

• Semantic integration of heterogeneous, distributed genomic and proteomic databases.
• Alignment, indexing, similarity search and comparative analysis of multiple nucleotide sequences.
• Discovery of structural patterns and analysis of genetic networks and protein pathways.
• Association and path analysis.
• Visualization tools in genetic data analysis.

Other Scientific Applications: The applications discussed above tend to handle relatively small and homogeneous data sets for which statistical techniques are appropriate. Huge amounts of data have been collected from scientific domains such as geosciences, astronomy, etc. Large data sets are also being generated by fast numerical simulations in fields such as climate and ecosystem modeling, chemical engineering, fluid dynamics, etc. The following are the applications of data mining in scientific applications:

• Data warehouses and data preprocessing.
• Graph-based mining.
• Visualization and domain-specific knowledge.

Intrusion Detection: Intrusion refers to any kind of action that threatens the integrity, confidentiality, or availability of network resources. In this world of connectivity, security has become a major issue. The increased usage of the internet and the availability of tools and tricks for intruding on and attacking networks have prompted intrusion detection to become a critical component of network administration. Here is the list of areas in which data mining technology may be applied for intrusion detection:

• Development of data mining algorithms for intrusion detection.
• Association and correlation analysis, and aggregation to help select and build discriminating attributes.
• Analysis of stream data.
• Distributed data mining.
• Visualization and query tools.

2. REQUIREMENT ELICITATION

A Requirement is a feature that the system must have or a constraint that it must satisfy to be accepted by the clients. Requirements Engineering aims at defining the requirements of the system under construction. It includes two main activities, namely Requirements Elicitation and Analysis.


Requirements Elicitation focuses on describing the purpose of the system. The client, the developers, and the users identify a problem area and define a system that addresses the problem. Such a definition is called a Requirements Specification. This specification is structured and formalized during analysis to produce an Analysis Model. Requirements Elicitation and Analysis focus only on the user's view of the system. Requirements Elicitation includes the following activities:

Identifying Actors: During this activity, developers identify the different types of users the future system will support.

Identifying Scenarios: During this activity, developers observe users and develop a set of detailed scenarios for typical functionality provided by the future system. Developers use these scenarios to communicate with the users and deepen their understanding.

Identifying Use Cases: Once developers and users agree on a set of scenarios, developers derive from the scenarios a set of use cases that completely represent the future system.

Refining Use Cases: During this activity, developers ensure that the requirements specification is complete by detailing each use case and describing the behavior of the system in the presence of errors and exceptional conditions.

Identifying Relationships Among Use Cases: During this activity, developers identify dependencies among use cases and also consolidate the use case model by factoring out common functionality.

Identifying Non-functional Requirements:

During this activity, developers, users and clients agree on aspects such as the performance of the system, documentation, resources, security and quality.

2.1 EXISTING SYSTEM:

Road traffic monitoring is of great importance for urban transportation systems. Traffic control agencies and drivers could benefit from timely and accurate road traffic prediction, which makes prompt, or even advance, decisions possible for detecting and avoiding road congestion. Existing methods mainly focus on raw speed sensing data collected from cameras or road sensors, and suffer from a severe data sparsity issue because the installation and maintenance of sensors are very expensive [1]. At the same time, most existing techniques based only on past and current traffic conditions do not fit well when real-world factors such as traffic accidents play a part.

2.2 PROPOSED SYSTEM:

To address the above issues, in this paper we introduce new-type traffic related data arising from public services: 1) Social media data, which is posted on social networking websites, e.g., Twitter and Facebook. With the popularization of mobile devices, people are more likely to exchange news and trifles in their life through social media services, where messages about traffic conditions, such as “Stuck in traffic on E 32nd St. Stay away!”, are posted by drivers, passengers and pedestrians who can be viewed as sensors observing the ongoing traffic conditions near their physical locations.

2.2.1 Project Scope: Road traffic monitoring is of great importance for urban transportation systems. Traffic control agencies and drivers could benefit from timely and accurate road traffic prediction, which makes prompt, or even advance, decisions possible for detecting and avoiding road congestion. Existing methods mainly focus on raw speed sensing data collected from cameras or road sensors, and suffer from a severe data scarcity issue because the installation and maintenance of sensors are very expensive. At the same time, most existing techniques based only on past and current traffic conditions do not fit well when real-world factors such as traffic accidents play a part. To address the above issues, in this paper we introduce new-type traffic-related data arising from public services.


2.2.2 Project Objectives: The system has a clear set of objectives to achieve. They are as follows:

• To make it easy to use.
• To make it easy to extend.
• To support a large variety of data sources, including nested data.
• To provide several basic similarity measures.
• To allow almost all algorithms to be implemented using the toolkit.

2.2.3 Project Overview: The functional overview of the system is as follows:

• This project takes the input dataset from the user.
• After taking the input, it runs both algorithms to find the duplicates in the given dataset.
• Finally, it displays the duplicates which are identified in the given dataset.


2.3 FUNCTIONAL REQUIREMENTS: The functional requirements describe the inputs and outputs of the application. The functional requirements of this project are as follows:

• Input: Data set from the user.
• Output: Speed prediction.

2.3.1 Actors: Actors represent external entities that interact with the system. An actor can be a human or an external system. The actors involved in this system, and their responsibilities, are as follows:

Admin:
• Collects the files from the users.
• Runs the algorithms on the files collected from the user.
• Finds the duplicates and returns the duplicates in the file to the user.

User:
• Uploads the file in which the duplicates are to be detected.
• Views the result.

2.3.2 Use Case: Use cases are used during requirements elicitation and analysis to represent the functionality of the system. Use cases focus on the behavior of the system from an external point of view. A use case describes a function provided by the system that yields a visible result for an actor. An actor describes any entity that interacts with the system. The identification of actors and use cases results in the definition of the boundary of the system, that is, in differentiating the tasks accomplished by the system from the tasks accomplished by its environment. The actors are outside the boundary of the system, whereas the use cases are inside the boundary of the system. Actors initiate a use case to access system functionality. The use case then initiates other use cases and gathers more information from the actors. When actors and use cases exchange information, they are said to communicate.

To describe a use case we use a template composed of six fields:

Use Case Name: The name of the use case.
Participating Actors: The actors participating in the particular use case.
Entry Condition: Condition for initiating the use case.
Flow of Events: Sequence of steps describing the function of the use case.
Exit Condition: Condition for terminating the use case.
Quality Requirements: Requirements that do not belong to the functionality of the use case.

Use case diagrams include four types of relationships: communication relationships, inclusion relationships, extension relationships and inheritance relationships.

USE CASE DIAGRAM:

Figure 2.1: Use case Diagram

USE CASE 1: Login

Use Case Name: Login
Participating Actors: Admin, User
Flow of Events: Admin or user logs into the system.
Entry Condition: Admin or user must log in with the user ID and password provided to them.
Exit Condition: Successfully logged in.

Table 2.1: Use case table for Login


USE CASE 2: Upload Dataset

Use Case Name: Upload Dataset
Participating Actors: Admin, User
Flow of Events: The user uploads the file which contains duplicates, and the admin uploads the identified duplicates.
Entry Condition: Data files are taken.
Exit Condition: Successfully uploaded.

Table 2.2: Use case table for Upload Dataset

USE CASE 3: Duplicate Detection Process

Use Case Name: Duplicate Detection
Participating Actors: Admin
Flow of Events: 1. The user and admin log into the system with their respective credentials. 2. The user uploads his file. 3. The admin verifies the details and approves the respective transaction.
Entry Condition: Uploaded datasets are taken for applying the algorithms to.
Exit Condition: Duplicates successfully identified.

Table 2.3: Use case table for Duplicate Detection

USE CASE 4: Download Dataset

Use Case Name: Download Dataset
Participating Actors: Admin, User
Flow of Events: 1. The user and admin log into the system with their respective credentials. 2. The user uploads his file. 3. The admin verifies the details and downloads the file uploaded by the user. 4. The admin finds the duplicates in the file and uploads it to the user. 5. The user downloads the file sent by the admin.
Entry Condition: The user undergoes various authentication processes.
Exit Condition: Successfully downloaded.

Table 2.4: Use case table for Download Dataset

2.3.3. Scenarios: A use case is an abstraction that describes all possible scenarios involving the described functionality. A scenario is an instance of a use case describing a concrete set of actions. Scenarios are used as examples for illustrating common cases. We describe a scenario using a template with three fields: the name of the scenario, the participating actors, and the flow of events, which describes the sequence of events step by step.

SCENARIO 1: Login

Scenario Name: Login
Participating Actors: Admin, User
Flow of Events: Admin or user logs into the system.

Table 2.5: Scenario table for Login

SCENARIO 2: Upload Dataset

Scenario Name: Upload Dataset
Participating Actors: Admin, User
Flow of Events: The user uploads the file which contains duplicates, and the admin uploads the identified duplicates.

Table 2.6: Scenario table for Upload Dataset

SCENARIO 3: Duplicate Detection Process

Scenario Name: Duplicate Detection
Participating Actors: Admin
Flow of Events: 1. The user and admin log into the system with their respective credentials. 2. The user uploads his file. 3. The admin verifies the details and approves the respective transaction.

Table 2.7: Scenario table for Duplicate Detection Process

SCENARIO 4: Download Dataset

Scenario Name: Download Dataset
Participating Actors: Admin, User
Flow of Events: 1. The user and admin log into the system with their respective credentials. 2. The user uploads his file. 3. The admin verifies the details and downloads the file uploaded by the user. 4. The admin finds the duplicates in the file and uploads it to the user. 5. The user downloads the file sent by the admin.

Table 2.8: Scenario table for Download Dataset

2.4 NON-FUNCTIONAL REQUIREMENTS:

• Usability: The user needs only minimal knowledge of duplicate detection.

• Reliability: The system is highly reliable because it inherits its qualities from the Java platform; code built using Java is more reliable.

• Supportability: The system is designed to be cross-platform. It is supported on a wide range of hardware and on any software platform that has a JVM built into it.

• Performance: The system is developed in a high-level language using advanced front-end and back-end technologies, so it responds to the end user on the client system within very little time.

• Implementation: This project is implemented using JAVA.

2.4.1. User Interface and Characteristics:

1. User interface:

Here we have chosen JAVA as our programming language for the implementation of the system. The reasons for choosing this language are:

• Java is platform-independent: One of the most significant advantages of Java is its ability to move easily from one computer system to another. The ability to run the same program on many different systems is crucial to World Wide Web software, and Java succeeds at this by being platform-independent at both the source and binary levels.



• Java is secure: Java considers security as part of its design. The Java language, compiler, interpreter, and runtime environment were each developed with security in mind.

2. Error handling: Before performing any operations on the dataset, the contents of the dataset must be checked. If any value's format or type does not match, an error message is displayed so that the user can take an appropriate decision. For example, without entering data the system does not proceed.

3. Performance consideration: The performance of the system is very high when compared to current numerical model techniques. The training time is less when compared to the current system.

4. Platform: Windows XP and above operating systems.

5. Technology to be used: JAVA is chosen as the programming language for the implementation of the system.

2.4.2. Hardware Requirements:

Processor: Intel
Hard Disk: 40 GB
RAM Capacity: 512 MB
Monitor: Standard
Keyboard: Standard
Mouse: Any standard mouse

2.4.3. Software Requirements:

Operating System: Windows XP and above
Front end: JAVA
Back end: MySQL

2.4.3.1 About Java: Java is a general-purpose computer programming language that is concurrent, class-based, object-oriented, and specifically designed to have as few implementation dependencies as possible. It is intended to let application developers "write once, run anywhere" (WORA), meaning that compiled Java code can run on all platforms that support Java without the need for recompilation. Java applications are typically compiled to byte code that can run on any Java virtual machine (JVM) regardless of computer architecture.
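To make the compile-once, run-anywhere idea concrete, here is a minimal illustrative example (the class name and file layout are our own choice):

```java
// HelloWorld.java -- compile once with `javac HelloWorld.java`;
// the resulting HelloWorld.class byte code then runs on any platform
// with a JVM, via `java HelloWorld`, without recompilation.
public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello from any JVM!");
    }
}
```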

Java is the foundation for virtually every type of networked application and is the global standard for developing and delivering embedded and mobile applications, games, web-based content, and enterprise software. With more than 9 million developers worldwide, Java enables you to efficiently develop, deploy and use exciting applications and services.

History of Java: Java was originally developed by James Gosling at Sun Microsystems (which has since been acquired by Oracle Corporation) and released in 1995 as a core component of Sun Microsystems' Java platform. The language derives much of its syntax from C and C++, but it has fewer low-level facilities than either of them. A small team of engineers, known as the Green Team, initiated the language in 1991. Java was originally called OAK, and was designed for handheld devices and set-top boxes. Oak was

unsuccessful, so in 1995 Sun changed the name to Java and modified the language to take advantage of the burgeoning World Wide Web.

Java Features:

Simple:
• Java is easy to write and more readable.
• Java has a concise, cohesive set of features that makes it easy to learn and use.
• Most of the concepts are drawn from C++, thus making Java learning simpler.

Secure:
• A Java program cannot harm another system, thus making it secure.
• Java provides a secure means of creating Internet applications.
• Java provides a secure way to access web applications.

Portable:
• Java programs can execute in any environment for which there is a Java run-time system (JVM).
• Java programs can be run on any platform (Linux, Windows, Mac).
• Java programs can be transferred over the World Wide Web (e.g., applets).

Object oriented:
• Java is an object-oriented programming language.
• Like C++, Java provides most of the object-oriented features.
• Java is a pure OOP language (while C++ is semi object-oriented).

Robust:
• Java encourages error-free programming by being strictly typed and performing run-time checks.

Multi-threaded:
• Java provides integrated support for multithreaded programming.
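As a small illustration of that built-in support (a minimal sketch; the task and thread names are our own, not from the project):

```java
// Two threads counting concurrently, using only core java.lang classes.
public class ThreadDemo {
    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            String name = Thread.currentThread().getName();
            for (int i = 1; i <= 3; i++) {
                System.out.println(name + " step " + i);
            }
        };
        Thread t1 = new Thread(task, "worker-1");
        Thread t2 = new Thread(task, "worker-2");
        t1.start();            // both workers run concurrently
        t2.start();
        t1.join();             // wait for both workers before exiting
        t2.join();
    }
}
```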

Architecture neutral:
• Java is not tied to a specific machine or operating system architecture.
• Java is machine independent, i.e., independent of hardware.

Interpreted:
• Java supports cross-platform code through the use of Java byte code.
• Byte code can be interpreted on any platform by the JVM.

High Performance:
• Byte codes are highly optimized, so the JVM can execute them much faster.

Distributed:
• Java was designed with the distributed environment in mind.
• Java programs can be transmitted and run over the internet.

Dynamic:
• Java programs carry with them substantial amounts of run-time type information that is used to verify and resolve accesses to objects at run time.

Java Principles: There were five primary goals in the creation of the Java language:

• It must be "simple, object-oriented, and familiar".
• It must be "robust and secure".
• It must be "architecture-neutral and portable".
• It must execute with "high performance".
• It must be "interpreted, threaded, and dynamic".


Overview of OOP Terminology:

• Class: A user-defined prototype for an object that defines a set of attributes that characterize any object of the class. The attributes are data members (class variables and instance variables) and methods, accessed via dot notation.
• Class variable: A variable that is shared by all instances of a class. Class variables are defined within a class but outside any of the class's methods. Class variables aren't used as frequently as instance variables are.
• Data member: A class variable or instance variable that holds data associated with a class and its objects.
• Function overloading: The assignment of more than one behavior to a particular function. The operation performed varies by the types of objects (arguments) involved.
• Instance variable: A variable that is defined within a class, of which each instance of the class holds its own copy.
• Inheritance: The transfer of the characteristics of a class to other classes that are derived from it.
• Instantiation: The creation of an instance of a class.
• Method: A special kind of function that is defined in a class definition.
• Object: A unique instance of a data structure that's defined by its class. An object comprises both data members (class variables and instance variables) and methods.
• Operator overloading: The assignment of more than one function to a particular operator.
• Instance: An individual object of a certain class. An object obj that belongs to a class Circle, for example, is an instance of the class Circle.
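A small Java sketch (our own illustrative class names, not from the project code) tying several of these terms together:

```java
// Shape is a base class; Circle derives from it (inheritance).
class Shape {
    static int count = 0;                      // class variable: shared by all instances
    String name;                               // instance variable: one copy per object

    Shape(String name) {                       // constructor, run at instantiation
        this.name = name;
        count++;
    }

    double area() { return 0; }                // method
    double area(double scale) {                // overloading: same name, new signature
        return area() * scale;
    }
}

class Circle extends Shape {                   // inheritance
    double radius;                             // instance variable

    Circle(double radius) {
        super("circle");
        this.radius = radius;
    }

    @Override
    double area() { return Math.PI * radius * radius; }
}

public class OopDemo {
    public static void main(String[] args) {
        Circle obj = new Circle(2.0);          // obj is an instance of Circle
        System.out.println(obj.name + " area = " + obj.area());
        System.out.println("shapes created: " + Shape.count);
    }
}
```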

2.4.3.2. About MySQL: MySQL is a free, open-source database engine available for all major platforms. (Technically, MySQL is a relational database management system (RDBMS)). MySQL represents an excellent introduction to modern database technology, as well as being a reliable mainstream database resource for high-volume applications.


A modern database is an efficient way to organize, and gain access to, large amounts of data. A relational database is able to create relationships between individual database elements, to organize data at a higher level than a simple table of records, avoid data redundancy and enforce relationships that define how the database functions.

A database is a separate application that stores a collection of data. Each database has one or more distinct APIs for creating, accessing, managing, searching and replicating the data it holds. Other kinds of data stores can be used, such as files on the file system or large hash tables in memory, but data fetching and writing would not be as fast and easy with those types of systems. So nowadays, we use relational database management systems (RDBMS) to store and manage huge volumes of data. This is called a relational database because all the data is stored in different tables and relations are established using primary keys or other keys known as foreign keys. A Relational Database Management System (RDBMS) is software that:

• Enables you to implement a database with tables, columns and indexes.
• Guarantees the referential integrity between rows of various tables.
• Updates the indexes automatically.
• Interprets an SQL query and combines information from various tables.
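From Java, such a database is typically accessed through the standard JDBC API. The sketch below is illustrative only: the database name, table, column names and credentials are made-up placeholders, and it assumes the MySQL Connector/J driver is on the classpath:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class JdbcDemo {
    public static void main(String[] args) throws Exception {
        // Placeholder URL and credentials -- adjust for a real installation.
        String url = "jdbc:mysql://localhost:3306/trafficdb";
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT link_id, speed FROM speed_records WHERE speed < ?")) {
            ps.setDouble(1, 20.0);
            // The SQL itself is interpreted by MySQL, as described above.
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString("link_id") + " -> "
                            + rs.getDouble("speed"));
                }
            }
        }
    }
}
```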

RDBMS Terminology: Before we proceed to explain the MySQL database system, let's revise a few definitions related to databases.

• Database: A database is a collection of tables with related data.
• Table: A table is a matrix with data. A table in a database looks like a simple spreadsheet.




• Column: One column (data element) contains data of one and the same kind, for example the column postcode.
• Row: A row (= tuple, entry or record) is a group of related data, for example the data of one subscription.
• Redundancy: Storing data twice, redundantly, to make the system faster.
• Primary Key: A primary key is unique. A key value cannot occur twice in one table. With a key, you can find at most one row.
• Foreign Key: A foreign key is the linking pin between two tables.
• Compound Key: A compound key (composite key) is a key that consists of multiple columns, because one column is not sufficiently unique.
• Index: An index in a database resembles an index at the back of a book.
• Referential Integrity: Referential integrity makes sure that a foreign key value always points to an existing row.

MySQL is a fast, easy-to-use RDBMS used by many small and big businesses. MySQL is developed, marketed, and supported by MySQL AB, a Swedish company. MySQL is becoming so popular for many good reasons:

• MySQL is released under an open-source license, so you have nothing to pay to use it.
• MySQL is a very powerful program in its own right. It handles a large subset of the functionality of the most expensive and powerful database packages.
• MySQL uses a standard form of the well-known SQL data language.
• MySQL works on many operating systems and with many languages including PHP, PERL, C, C++, JAVA, etc.
• MySQL works very quickly and works well even with large data sets.
• MySQL is very friendly to PHP, the most appreciated language for web development.




• MySQL supports large databases, up to 50 million rows or more in a table. The default file size limit for a table is 4 GB, but you can increase this (if your operating system can handle it) to a theoretical limit of 8 million terabytes (TB).
• MySQL is customizable. The open-source GPL license allows programmers to modify the MySQL software to fit their own specific environments.


3. ANALYSIS

In object-oriented analysis, developers build a model describing the application domain. The analysis model is then extended to describe how the actors and the system interact to realize the required functionality, together with the non-functional requirements, to prepare the architecture of the system developed during high-level design. Analysis focuses on producing a model of the system, called the Analysis Model, which is correct, complete, consistent, and verifiable. The analysis object model is represented by class and object diagrams.

Analysis is different from requirements elicitation in that developers focus on structuring and formalizing the requirements elicited from users. This formalization leads to new insights and the discovery of errors in the requirements. As the analysis model may not be understandable to the users and the client, developers need to update the requirements specification to reflect insights gained during analysis, and then review the changes with the client and users. In the end, the requirements, however large, should be understandable by the client and the users.

The analysis model is composed of three individual models: the Functional Model, represented by use cases and scenarios; the Analysis Object Model, represented by class and object diagrams; and the Dynamic Model, represented by statechart and sequence diagrams. In the requirements phase, we gather requirements from the users and represent them as use cases and scenarios. We then refine the functional model and derive the object and dynamic models. This leads to a more precise and complete specification as detail is added to the analysis model. We conclude by describing management activities related to analysis.

The analysis model represents the system under development from the user's point of view. The analysis object model is a part of the analysis and focuses on the individual concepts that are manipulated by the system, their properties and their relationships. The analysis object model, depicted with UML class diagrams, includes classes, attributes, and operations. It is a visual dictionary of the main concepts visible to the user.


3.1. ENTITY OBJECTS: The analysis object model consists of entity, boundary and control objects. Entity objects represent the persistent information tracked by the system. Participating objects form the basis of the analysis model. The entity objects for this system are:

1. Training data set
2. Testing data set

3.2. BOUNDARY OBJECTS: A boundary object is an object used for interaction between the user and the system; in other words, it is an interface used to communicate with the system. Boundary objects represent the system interface with the actors. In each use case, each actor interacts with at least one boundary object. The boundary object collects the information from the actor and translates it into a form that can be used by the entity objects as well as by the control objects. The boundary objects involved in the system are as follows:

• Boundary objects for the login page.
• Boundary objects for uploading the file.


I) BOUNDARY OBJECTS FOR HOME PAGE OR LOGIN PAGE

II) BOUNDARY OBJECTS FOR UPLOADING THE FILE

3.3 CONTROL OBJECTS: Control objects are responsible for coordinating entity objects and boundary objects. A control object is created at the beginning of a use case and ceases to exist at its end. Control objects usually do not have a concrete counterpart in the real world. A control object is responsible for collecting information from the boundary objects and dispatching it to the entity objects. Here the files are taken and processed to identify the duplicates.

3.4. OBJECT INTERACTION: About Sequence Diagrams: Interaction diagrams model the behavior of use cases by describing the way groups of objects interact to complete a task. The two kinds of interaction diagrams are sequence and collaboration diagrams. Sequence diagrams generally show the sequence of events that occur. Sequence diagrams demonstrate the behavior of objects in a use case by describing the objects and the messages they pass. The diagrams are read left to right and descending. Following are the sequence diagrams for the system under consideration.


SEQUENCE DIAGRAM:

Figure 3.1: Sequence Diagram

3.5 OBJECT BEHAVIOUR: Statechart diagrams are used to describe the behavior of a system. State diagrams describe all of the possible states of an object as events occur. Each diagram usually represents objects of a single class and tracks the different states of its objects through the system. Not all classes require a state diagram, and state diagrams are not useful for describing the collaboration of all objects in a use case. State diagrams have very few elements. The basic elements are rounded boxes representing the state of the object and arrows indicating the transition to the next state. The activity section of the state symbol depicts what activities the object will be doing while it is in that state. All state diagrams begin with an initial state of the object. This is the state of the object when it is created. After the initial state, the object begins changing states.


[Figure: Use case flow — Data Reading, Preprocessing, Stemming, Training the Model, Prediction]

SEQUENCE DIAGRAM:

[Figure: Sequence diagram between the System and Dataset objects, with messages 2: Preprocessing, 3: Stemming, 4: Training, 5: Prediction]

Figure 3.2: State chart Diagram 1

STATECHART DIAGRAM (Admin):


Figure 3.3: State chart Diagram 2

4. SYSTEM DESIGN

System design is the transformation of an analysis model into a system design model. In system design, developers:

- Define the design goals of the project
- Decompose the system into smaller subsystems
- Design hardware/software strategies
- Design persistent data management strategies
- Design global control flow strategies
- Design access control policies
- Design strategies for handling boundary conditions

System design is not algorithmic. It is decomposed into several activities:

- Identify design goals
- Design the initial subsystem decomposition
- Refine the subsystem decomposition to address the design goals

Design is the first step in the development phase for applying techniques and principles for the purpose of defining a device, a process or a system in sufficient detail to permit its physical realization. Once the software requirements have been realized, analyzed and specified, software development involves three technical activities: design, code generation and testing, which are required to build and verify the software.


The design activities are of main importance in this phase because, in this activity, decisions ultimately affecting the success of the software implementation and its ease of maintenance are made. These decisions have the final bearing upon the reliability and maintainability of the system. Design is the place where quality is fostered in development. Software design is a process through which requirements are translated into a representation of software.

Developers define the design goals of the project and decompose the system into smaller subsystems that can be realized by individual teams. Developers also select strategies for building the system, such as the hardware/software platform on which the system will run, the persistent data management strategy, the global control flow, the access control policy, and the handling of boundary conditions. The result of system design is a model that includes a clear description of each of these strategies, the subsystem decomposition, and a UML deployment diagram representing the hardware/software mapping of the system.

The analysis model describes the system completely from the actor's point of view and serves as the basis of communication between the client and the developers. The analysis model, however, does not contain information about the internal structure of the system, its hardware configuration or, more generally, how the system should be realized. System design is the first step in this direction. During the system design activities, developers bridge the gap between the requirements specification, produced during requirements elicitation and analysis, and the system that is delivered to the users.

4.1 DESIGN GOALS: Design goals are the qualities that the system should focus on. Many design goals can be inferred from the non-functional requirements or from the application domain.

Cost: JAVA is freeware, hence there are no high development and maintenance costs.

Response time: The system response time depends on the length of the training data set.

Portability: Java has the ability to move easily from one computer system to another. The ability to run the same program on many different systems is crucial to World Wide Web software, and Java succeeds at this by being platform-independent at both the source and binary levels.

Usability: Users capable of handling a simple GUI are able to use the system.

Reliability: The system is trained with the ID3 & C4.5 algorithms so that it can give accurate results.

4.2. SYSTEM ARCHITECTURE: As the complexity of systems increases, the specification of the system decomposition becomes critical. Moreover, subsystem decomposition is constantly revised whenever new issues are addressed: subsystems are merged into one subsystem, a complex subsystem is split into parts, and some subsystems are added to take care of new functionality. The first iterations over the subsystem decomposition can introduce drastic changes in the system design model. Popular system architectures are as follows:

- Repository
- Model/View/Controller (MVC)
- Document/View/Controller (DVC)
- Peer-to-Peer
- Client/Server
- Three-tier
- Four-tier
- Pipe and Filter

4.3. SUBSYSTEM DECOMPOSITION: A subsystem is characterized by the services it provides to other subsystems. A service is a set of related operations that share a common purpose. The set of operations of a subsystem that are available to other subsystems forms the subsystem interface. The subsystem interface includes the name of the operations, their parameters, their types and their return values. The subsystems that are factored out of the main system are as follows:

• Apply algorithm: The subsystem responsible for applying the algorithm.
• Output: The subsystem responsible for giving the output.

[Figure: Two hosts — the Apply Algorithm subsystem and the Give Output subsystem]

Figure 4.1: Subsystem decomposition

4.4. GLOBAL CONTROL FLOW: Control flow is the sequencing of actions in a system. It defines the order of execution of operations. These decisions are based on external events generated by an actor or on the passage of time. There are two possible control flow mechanisms:

(a) Procedure-Driven Control: Operations wait for input whenever they need data from an actor. In the selection of the algorithm, the processing operation waits for the decision maker to choose a data source. Whenever the decision maker gives the input, the preprocessing operation can be executed.

(b) Event-Driven Control: A main loop waits for an external event. Whenever an event is available, it is dispatched to the appropriate object based on information associated with the event.
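A minimal Java sketch of the event-driven style (the event names and the queue are our own illustration, not part of the actual system):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class EventLoopDemo {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> events = new LinkedBlockingQueue<>();
        // Some actor posts events; here we enqueue them ourselves for the demo.
        events.put("UPLOAD_FILE");
        events.put("RUN_ALGORITHM");
        events.put("QUIT");

        // The main loop waits for an event and dispatches on its type.
        while (true) {
            String event = events.take(); // blocks until an event is available
            if (event.equals("QUIT")) break;
            switch (event) {
                case "UPLOAD_FILE":   System.out.println("dispatch: upload handler");    break;
                case "RUN_ALGORITHM": System.out.println("dispatch: algorithm handler"); break;
                default:              System.out.println("unknown event: " + event);
            }
        }
    }
}
```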

4.5. BOUNDARY CONDITIONS:

During this activity, we review the design decisions made so far and identify additional conditions, i.e., how the system is started, initialized and shut down, and how to deal with major failures such as data corruption.

Configure: For each persistent object, we examine in which use cases it is created or destroyed. The download use case creates the persistent object "downloaded files".

Start-up and Shutdown: For each component, we add three use cases to start, shut down and configure the component.

Exception Handling: For each type of component failure, we decide how the system should react. In general, an exception is an event or error that occurs during the execution of the system. Exceptions are caused by different sources.

Hardware Failure: Hardware ages and fails. For example, on failure of the network link, the system can identify it by using the connect use case and inform the user.


5. OBJECT DESIGN

Object design closes the gap between the application objects and off-the-shelf components by identifying additional solution objects and refining existing objects. Object design is not sequential: although each group of activities described below addresses a specific object design issue, they usually occur concurrently. Object design includes four groups of activities:

• Reuse: Off-the-shelf components identified during system design are used to help in the realization of each subsystem. Class libraries and additional components are selected for basic data structures and services. Design patterns are selected for solving common problems and for protecting specific classes from future change.

• Interface specification: During this activity, the subsystem services identified during system design are specified in terms of class interfaces, including operations, arguments, type signatures, and exceptions.

• Object model restructuring: Restructuring activities manipulate the system model to increase code reuse.

• Object model optimization: During this activity, the object design model is transformed to address performance criteria such as response time or memory utilization.

Operation parameters and return values are typed in the same way as attributes are. The type constrains the range of values the parameter or the return value can take. The tuple made of the types of the parameters and the type of the return value is called the signature of the operation. The visibility of an attribute or an operation specifies whether other classes can use it or not. UML defines three levels of visibility.

5.1. OBJECT SPECIFICATION:

The system design model focuses on the subsystem decomposition and global system decisions such as hardware/software mapping, persistent storage and access control. We identify top-level subsystems and define them in terms of the services they provide. Specification activities during object design include:

a. Identifying missing attributes and operations
b. Specifying types, signatures and visibility
c. Specifying constraints
d. Specifying exceptions

Attributes and Operations:

a. Attributes: Attributes represent the properties of individual objects; only the attributes relevant to the system should be considered.
b. Operations: An operation is a property of the objects.

CLASS DIAGRAM:

Figure 5.1: Class Diagram

About Class Diagrams: Class diagrams model class structure and contents using design elements such as classes, packages and objects. Class diagrams describe three different perspectives when designing a system: conceptual, specification, and implementation. Classes are composed of three things: a name, attributes, and operations. Class diagrams also display relationships such as containment, inheritance, associations and others. The association relationship is the most common relationship in a class diagram. The association shows the relationship between instances of classes. The multiplicity of the association denotes the number of objects that can participate in the relationship.

5.1.1. Type, Signature and Visibility:

Type: The type of an attribute specifies the range of values the attribute can take and the operations that can be applied to the attribute.

Signature: Given an operation, the tuple made out of the types of its parameters and the type of the return value is the signature. Signatures are generally defined for operations.

Visibility: The visibility of an attribute or an operation is a mechanism for specifying whether other classes can use the attribute or the operation or not. In general, there are three levels of visibility:

Private (indicated by '-'): A private attribute or operation can be accessed only within the class in which it is defined.

Protected (indicated by '#'): A protected attribute or operation can be accessed by the class in which it is defined and by any descendant of the class.

Public (indicated by '+'): A public attribute or operation can be accessed by any class.

The type, signature and visibility of the classes in this system are shown in the class diagram above.

5.1.2 Constraints: We attach constraints to classes and operations to more precisely specify their behavior and boundary cases. The following is the constraint in this project: the input file must be in one of the .txt, .doc, .docx, .xls, .xlsx or .csv file formats.
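A small sketch of how such a constraint might be enforced in Java (the class name, the helper, and the empty-file check are our own illustration, not from the project code):

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

public class InputFileValidator {
    // Allowed formats, taken from the constraint above.
    private static final List<String> ALLOWED =
            Arrays.asList(".txt", ".doc", ".docx", ".xls", ".xlsx", ".csv");

    public static void validate(File file) {
        String name = file.getName().toLowerCase();
        boolean ok = ALLOWED.stream().anyMatch(name::endsWith);
        if (!ok) {
            throw new IllegalArgumentException(
                "Input file must be .txt, .doc, .docx, .xls, .xlsx or .csv");
        }
        if (file.length() == 0) { // precondition: the file must contain data
            throw new IllegalArgumentException("Input file contains no data");
        }
    }
}
```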

5.1.3 Exceptions: Exceptional conditions are usually associated with the violation of preconditions, that is, constraints that the caller needs to satisfy before invoking an operation. Exceptions can be found systematically by examining each parameter of the operation. When the user inputs a file which does not contain any data, an appropriate message should be displayed.

5.1.4 Associations: Associations are relationships between classes and represent groups of links. Each end of an association can be labeled by a set of integers indicating the number of links that can legitimately originate from an instance of the class connected to the association end. Associations are used to represent a wide range of connections among a set of objects.

[Figure: User 1 — 1 Input data association]

Figure 5.2: Association

5.2. ALGORITHMS: Algorithm 2:

6. CODING

The goal of the coding or programming phase is to translate the design of the system produced during the design phase into code in a given programming language, which can be executed by a computer and which performs the computation specified by the design. The coding phase affects both testing and maintenance. The goal of coding is not merely to reduce the implementation cost; it should be to reduce the cost of the later phases. In other words, the


goal is not to simplify the job of the programmer. Rather, the goal should be to simplify the job of the tester and the maintainer.

6.1. CODING APPROACH: There are two major approaches for coding any software system: the top-down approach and the bottom-up approach. The bottom-up approach best suits the development of object-oriented systems. During the system design phase, to reduce the complexity, we decompose the system into an appropriate number of subsystems, for which objects can be modeled independently. These objects exhibit the way the subsystems perform their operations. Once the objects have been modeled, they are implemented by means of coding. Since the objects are implemented independently of each other, the bottom-up approach is more suitable for coding them. In this approach, we first code the objects independently and then integrate these modules into the one system to which they belong.

This code can detect duplicates in data. First, it takes data as input; then it scans the entire file and performs the preprocessing step, through which missing values, errors, etc. are eliminated. It then generates a unique key for each tuple and sorts the data using the key. Finally, it identifies the duplicates and displays the result.
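A condensed, illustrative sketch of that flow (under our own assumptions: records are comma-separated lines, and the "unique key" is simply the normalized record itself):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class DuplicateDetector {
    // Preprocess: trim and lower-case fields so equivalent records compare equal.
    static String key(String line) {
        return line.trim().toLowerCase().replaceAll("\\s*,\\s*", ",");
    }

    public static List<String> findDuplicates(List<String> records) {
        List<String> keys = new ArrayList<>();
        for (String r : records) {
            if (!r.trim().isEmpty() && !r.contains(",,")) { // drop missing-value rows
                keys.add(key(r));
            }
        }
        Collections.sort(keys);                 // sort by the generated key
        List<String> dups = new ArrayList<>();
        for (int i = 1; i < keys.size(); i++) { // neighbors with equal keys are duplicates
            if (keys.get(i).equals(keys.get(i - 1)) && !dups.contains(keys.get(i))) {
                dups.add(keys.get(i));
            }
        }
        return dups;
    }
}
```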

6.2. INFORMATION HANDLING:

Any software system requires some amount of information during its operation. Selecting appropriate data structures helps us produce code in which the objects of the system operate on the available information with decreased complexity. In this project, if any field is left vacant, the system does not proceed to further steps and prompts a message saying that "input data must be in the specified range". The system does not have any default values.

6.3. PROGRAMMING STYLE:

Programming style deals with a set of rules that a programmer has to follow so that characteristics of the code such as traceability, understandability, modifiability and extensibility are satisfied. In the current system, we followed coding rules for naming variables and methods. As part of coding, internal documentation is also provided to help readers better understand the code.

6.4. VERIFICATION AND VALIDATION:

Verification is the process of checking whether the product is being built right; validation is the process of checking whether the right product is being built. During the development of the system, the coding of the objects has been thoroughly verified from different aspects regarding their design and the way they are integrated. The techniques followed for validation are discussed in the testing of the current system. Validations are applied to the entire system at two levels:

• Form level validation: All the inputs given to the system at various points in the forms are validated while navigating to the next form. The system raises appropriate custom and predefined exceptions to alert the user about errors that have occurred or are likely to occur.

• Field level validation: Validations at the level of individual controls are also applied wherever necessary, and the system pops up appropriate dialogs wherever necessary. In this project, validations are performed on each individual control; in the normalizing phase, if any text field is not filled or a wrong click occurs, the system generates an appropriate exception.
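For illustration, a field-level check of this kind might look like the following Swing sketch; the helper name and message text are hypothetical:

import javax.swing.JOptionPane;
import javax.swing.JTextField;

public class FieldValidation {
    // Raise an exception for an empty field and alert the user with a dialog.
    static void requireNonEmpty(JTextField field, String label) {
        if (field.getText().trim().isEmpty()) {
            JOptionPane.showMessageDialog(null, label + " must not be empty",
                    "Validation error", JOptionPane.ERROR_MESSAGE);
            throw new IllegalArgumentException(label + " is empty");
        }
    }
}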


7. TESTING

Testing is the process of finding differences between the expected behavior specified by the system models and the observed behavior of the system. Testing plays a critical role in quality assurance and in ensuring the reliability of development; errors will be reflected in the code, so the application should be thoroughly tested and validated. Unit testing finds differences between the object design model and its corresponding components. Structural testing finds differences between the system design model and a subset of integrated subsystems. Functional testing finds differences between the use case model and the system. Finally, performance testing finds differences between the non-functional requirements and the actual system performance. From a modeling point of view, testing is the attempt at falsification of the system with respect to the system models. The goal of testing is to design tests that exercise defects in the system and reveal problems.

7.1. TESTING ACTIVITIES:

Testing a large system is a complex activity and, like any complex activity, it has to be broken into smaller activities. Thus incremental testing was performed on the project, i.e., components and subsystems of the system were tested separately before integrating them to form the subsystems for system testing.

7.2. TESTING TYPES:

Unit Testing: Unit testing focuses on the building blocks of the software system, that is, the objects and subsystems. There are three motivations behind focusing on components. First, unit testing reduces the complexity of the overall test activities, allowing focus on smaller units of the system. Second, unit testing makes it easier to pinpoint and correct faults, given that few components are involved in each test. Third, unit testing allows parallelism in the testing activities, that is, each component can be tested independently of the others. The following are some unit testing techniques.

1. Equivalence testing: A black box testing technique that minimizes the number of test cases. The possible inputs are partitioned into equivalence classes, and a test case is selected for each class.

2. Boundary testing: A special case of equivalence testing that focuses on the conditions at the boundaries of the equivalence classes. Boundary testing requires that the elements be selected from the edges of the equivalence classes.

3. Path testing: A white box testing technique that identifies faults in the implementation of a component. The assumption here is that by exercising all possible paths through the code at least once, most faults will trigger failures. This requires knowledge of the source code.
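As a concrete illustration of equivalence and boundary testing, the sketch below exercises the hypothetical InputFileValidator from Section 5.1 with one representative input per equivalence class (valid format, invalid format) and one boundary case (an empty file of a valid format); it assumes JUnit 4 is available:

import java.io.File;
import java.io.FileWriter;
import org.junit.Test;

public class InputFileValidatorTest {
    // Equivalence class: a valid, non-empty .csv file passes validation.
    @Test
    public void acceptsNonEmptyCsv() throws Exception {
        File f = File.createTempFile("data", ".csv");
        f.deleteOnExit();
        try (FileWriter w = new FileWriter(f)) { w.write("a,b\n"); }
        InputFileValidator.validate(f);  // should not throw
    }

    // Equivalence class: an unsupported format is rejected.
    @Test(expected = IllegalArgumentException.class)
    public void rejectsPdf() throws Exception {
        File f = File.createTempFile("data", ".pdf");
        f.deleteOnExit();
        InputFileValidator.validate(f);
    }

    // Boundary case: an empty file of a valid format is rejected.
    @Test(expected = IllegalStateException.class)
    public void rejectsEmptyFile() throws Exception {
        File f = File.createTempFile("data", ".txt");
        f.deleteOnExit();
        InputFileValidator.validate(f);
    }
}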

Integration Testing: Integration testing detects faults that have not been detected during unit testing, by focusing on small groups of components. Two or more components are integrated and tested, and once the tests do not reveal any new faults, additional components are added to the group. This procedure allows testing of increasingly complex parts of the system while keeping the location of potential faults relatively small. I have used the top-down strategy to implement integration testing: the components of the top layer are unit tested, and then the components of the next layer down are integrated. When all components of the new layer have been tested together, the next layer is selected. This is repeated until all layers are combined and involved in the test.

Validation Testing: Once the system is completely assembled as a package and the interfacing errors have been uncovered and corrected, the final series of software tests, validation testing, begins. Validation succeeds when the system functions in a manner that can reasonably be expected by the customer. The system validation was done by a series of black-box test methods.


System Testing: System testing ensures that the complete system complies with the functional and non-functional requirements of the system. The following are some system testing activities:

1. Functional testing finds differences between the functional requirements and the system. This is a black box testing technique; test cases are derived from the use case model.

2. Performance testing finds differences between the design goals and the system; the design goals are derived from the non-functional requirements.

3. Pilot testing: the system is installed and used by a selected set of users, who exercise the system as if it had been permanently installed.

4. Acceptance testing: I have followed benchmark testing, in which the client prepares a set of test cases representing typical conditions under which the system operates. In our project, there are no existing benchmarks.

5. Installation testing: the system is installed in the target environment.

7.3. TESTING PLAN:

Testing accounts for 45 - 75% of the typical project effort. It is also one of the most commonly underestimated activities on a project. A test plan is a document that answers the basic questions about your testing effort. It needs to be initiated during the requirements gathering phase of the project and should evolve into a roadmap for the testing phase.

• Test planning enables a more reliable estimate of the testing effort up front.

• It allows the project team time to consider ways to reduce the testing effort without being under time pressure.

• The test plan helps to identify problem areas and focuses the testing team's attention on the critical paths.

• The test plan reduces the probability of implementing non-tested components.

7.4. TEST CASE REPORT:

Test Case 1 - Login
Test Case id: 01 | Test Case Name: Login | Test Case Type: Black Box Testing

Description                   | Expected Value                | Observed Value                                                       | Result
Admin selects 'login' button  | Admin gets the access         | If the login id and password are correct, the admin gets access     | Successful
Admin enters a wrong password | Admin does not get the access | Admin does not get the access and has to enter the correct password | Successful

Table 7.1: Test case 1


Test Case 2 - File Browse
Test Case id: 02 | Test Case Name: Browse the file | Test Case Type: Black Box Testing

Description                              | Expected Value                            | Observed Value                                                       | Result
User selects 'Upload' button             | User uploads the file from the folder     | If the file is of a valid file format, the file is taken            | Successful
User selects a file of an invalid format | System will not take ppt or pdf files     | Only .csv, .txt, .doc, .xlsx, .xls files can be taken from the user | Successful

Table 7.2: Test case 2

8. SCREENS

9. SOURCE CODE

Sample Code:

/*
 * To change this license header, choose License Headers in Project Properties.
 * To change this template file, choose Tools | Templates
 * and open the template in the editor.
 */
package com.util;

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import weka.classifiers.Classifier;
import weka.classifiers.lazy.IBk;
import weka.core.Instance;
import weka.core.Instances;

public class KNN {

    public static BufferedReader readDataFile(String filename) {
        BufferedReader inputReader = null;
        try {
            inputReader = new BufferedReader(new FileReader(filename));
        } catch (FileNotFoundException ex) {
            System.err.println("File not found: " + filename);
        }
        return inputReader;
    }

    public static void main(String[] args) throws Exception {
        BufferedReader datafile = readDataFile("ads.txt");
        Instances data = new Instances(datafile);
        data.setClassIndex(data.numAttributes() - 1);

        // Hold out the first and second instances for classification.
        Instance first = data.instance(0);
        Instance second = data.instance(1);
        // Delete index 0 twice: after the first deletion, the former
        // second instance has shifted down to index 0.
        data.delete(0);
        data.delete(0);

        // Train a k-nearest-neighbour classifier (Weka IBk) on the rest.
        Classifier ibk = new IBk();
        ibk.buildClassifier(data);
        double class1 = ibk.classifyInstance(first);
        double class2 = ibk.classifyInstance(second);
        System.out.println("first: " + class1 + "\nsecond: " + class2);
    }
}

/*
 * To change this license header, choose License Headers in Project Properties.
 * To change this template file, choose Tools | Templates
 * and open the template in the editor.
 */
package com.util;

import java.util.Arrays;
import java.util.Random;

public class np {

    private static Random random;
    private static long seed;

    static {
        seed = System.currentTimeMillis();
        random = new Random(seed);
    }

    /**
     * Sets the seed of the pseudo-random number generator. This method enables
     * you to produce the same sequence of "random" numbers for each execution
     * of the program. Ordinarily, you should call this method at most once per
     * program.
     *
     * @param s the seed
     */
    public static void setSeed(long s) {
        seed = s;
        random = new Random(seed);
    }

    /**
     * Returns the seed of the pseudo-random number generator.
     *
     * @return the seed
     */
    public static long getSeed() {
        return seed;
    }

    /**
     * Returns a random real number uniformly in [0, 1).
     *
     * @return a random real number uniformly in [0, 1)
     */
    public static double uniform() {
        return random.nextDouble();
    }

    /**
     * Returns a random integer uniformly in [0, n).
     *
     * @param n number of possible integers
     * @return a random integer uniformly between 0 (inclusive) and {@code n}
     *         (exclusive)
     * @throws IllegalArgumentException if {@code n <= 0}
     */
    public static int uniform(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("argument must be positive: " + n);
        }
        return random.nextInt(n);
    }

    /**
     * Returns a random long integer uniformly in [0, n).
     *
     * @param n number of possible {@code long} integers
     * @return a random long integer uniformly between 0 (inclusive) and
     *         {@code n} (exclusive)
     * @throws IllegalArgumentException if {@code n <= 0}
     */
    public static long uniform(long n) {
        if (n <= 0L) {
            throw new IllegalArgumentException("argument must be positive: " + n);
        }

        long r = random.nextLong();
        long m = n - 1;

        // power of two
        if ((n & m) == 0L) {
            return r & m;
        }

        // reject over-represented candidates
        long u = r >>> 1;
        while (u + m - (r = u % n) < 0L) {
            u = random.nextLong() >>> 1;
        }
        return r;
    }

    /**
     * Returns a random integer uniformly in [a, b).
     *
     * @param a the left endpoint
     * @param b the right endpoint
     * @return a random integer uniformly in [a, b)
     * @throws IllegalArgumentException if {@code b <= a}
     * @throws IllegalArgumentException if {@code b - a >= Integer.MAX_VALUE}
     */
    public static int uniform(int a, int b) {
        if ((b <= a) || ((long) b - a >= Integer.MAX_VALUE)) {
            throw new IllegalArgumentException("invalid range: [" + a + ", " + b + ")");
        }
        return a + uniform(b - a);
    }

    /**
     * Returns a random real number uniformly in [a, b).
     *
     * @param a the left endpoint
     * @param b the right endpoint
     * @return a random real number uniformly in [a, b)
     * @throws IllegalArgumentException unless {@code a < b}
     */
    public static double uniform(double a, double b) {
        if (!(a < b)) {
            throw new IllegalArgumentException("invalid range: [" + a + ", " + b + ")");
        }
        return a + uniform() * (b - a);
    }

    /**
     * @param m number of rows
     * @param n number of columns
     * @return random m-by-n matrix with values between 0 and 1
     */
    public static double[][] random(int m, int n) {
        double[][] a = new double[m][n];
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                a[i][j] = uniform(0.0, 1.0);
            }
        }
        return a;
    }

    /**
     * Transpose of a matrix.
     *
     * @param a matrix
     * @return b = A^T
     */
    public static double[][] T(double[][] a) {
        int m = a.length;
        int n = a[0].length;
        double[][] b = new double[n][m];
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                b[j][i] = a[i][j];
            }
        }
        return b;
    }

    /**
     * @param a matrix
     * @param b matrix
     * @return c = a + b
     */
    public static double[][] add(double[][] a, double[][] b) {
        int m = a.length;
        int n = a[0].length;
        double[][] c = new double[m][n];
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                c[i][j] = a[i][j] + b[i][j];
            }
        }
        return c;
    }

    /**
     * @param a matrix
     * @param b matrix
     * @return c = a - b
     */
    public static double[][] subtract(double[][] a, double[][] b) {
        int m = a.length;
        int n = a[0].length;
        double[][] c = new double[m][n];
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                c[i][j] = a[i][j] - b[i][j];
            }
        }
        return c;
    }

    /**
     * Element-wise subtraction.
     *
     * @param a scalar
     * @param b matrix
     * @return c = a - b
     */
    public static double[][] subtract(double a, double[][] b) {
        int m = b.length;
        int n = b[0].length;
        double[][] c = new double[m][n];
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                c[i][j] = a - b[i][j];
            }
        }
        return c;
    }

    /**
     * Matrix multiplication.
     *
     * @param a matrix
     * @param b matrix
     * @return c = a * b
     */
    public static double[][] dot(double[][] a, double[][] b) {
        int m1 = a.length;
        int n1 = a[0].length;
        int m2 = b.length;
        int n2 = b[0].length;
        if (n1 != m2) {
            throw new RuntimeException("Illegal matrix dimensions.");
        }
        double[][] c = new double[m1][n2];
        for (int i = 0; i < m1; i++) {
            for (int j = 0; j < n2; j++) {
                for (int k = 0; k < n1; k++) {
                    c[i][j] += a[i][k] * b[k][j];
                }
            }
        }
        return c;
    }

    /**
     * Element-wise multiplication.
     *
     * @param a matrix
     * @param x matrix
     * @return y = a * x
     */
    public static double[][] multiply(double[][] x, double[][] a) {
        int m = a.length;
        int n = a[0].length;
        if (x.length != m || x[0].length != n) {
            throw new RuntimeException("Illegal matrix dimensions.");
        }
        double[][] y = new double[m][n];
        for (int j = 0; j < m; j++) {
            for (int i = 0; i < n; i++) {
                y[j][i] = a[j][i] * x[j][i];
            }
        }
        return y;
    }

    /**
     * Element-wise multiplication by a scalar.
     *
     * @param a matrix
     * @param x scalar
     * @return y = a * x
     */
    public static double[][] multiply(double x, double[][] a) {
        int m = a.length;
        int n = a[0].length;
        double[][] y = new double[m][n];
        for (int j = 0; j < m; j++) {
            for (int i = 0; i < n; i++) {
                y[j][i] = a[j][i] * x;
            }
        }
        return y;
    }

    /**
     * Element-wise power.
     *
     * @param x matrix
     * @param a scalar exponent
     * @return y
     */
    public static double[][] power(double[][] x, int a) {
        int m = x.length;
        int n = x[0].length;
        double[][] y = new double[m][n];
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                y[i][j] = Math.pow(x[i][j], a);
            }
        }
        return y;
    }

    /**
     * @param a matrix
     * @return shape of matrix a
     */
    public static String shape(double[][] a) {
        int m = a.length;
        int n = a[0].length;
        return "(" + m + "," + n + ")";
    }

    /**
     * @param a matrix
     * @return element-wise sigmoid of matrix a
     */
    public static double[][] sigmoid(double[][] a) {
        int m = a.length;
        int n = a[0].length;
        double[][] z = new double[m][n];
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                z[i][j] = 1.0 / (1 + Math.exp(-a[i][j]));
            }
        }
        return z;
    }

    /**
     * Element-wise division by a scalar.
     *
     * @param x matrix
     * @param a scalar
     * @return x / a
     */
    public static double[][] divide(double[][] x, int a) {
        int m = x.length;
        int n = x[0].length;
        double[][] z = new double[m][n];
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                z[i][j] = x[i][j] / a;
            }
        }
        return z;
    }

    /**
     * Cross-entropy loss.
     *
     * @param A matrix of predictions
     * @param Y matrix of labels
     * @param batch_size scalar
     * @return loss
     */
    public static double cross_entropy(int batch_size, double[][] Y, double[][] A) {
        int m = A.length;
        int n = A[0].length;
        double[][] z = new double[m][n];
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                z[i][j] = (Y[i][j] * Math.log(A[i][j]))
                        + ((1 - Y[i][j]) * Math.log(1 - A[i][j]));
            }
        }

        double sum = 0;
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                sum += z[i][j];
            }
        }
        return -sum / batch_size;
    }

    public static double[][] softmax(double[][] z) {
        double[][] zout = new double[z.length][z[0].length];
        double sum = 0.;
        for (int i = 0; i < z.length; i++) {
            for (int j = 0; j < z[0].length; j++) {
                sum += Math.exp(z[i][j]);
            }
        }
        for (int i = 0; i < z.length; i++) {
            for (int j = 0; j < z[0].length; j++) {
                zout[i][j] = Math.exp(z[i][j]) / sum;
            }
        }
        return zout;
    }

    public static void print(String val) {
        System.out.println(val);
    }
}
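A short usage sketch of the np utilities above (our own example, not part of the project listing):

import com.util.np;

public class NpDemo {
    public static void main(String[] args) {
        np.setSeed(42L);                     // reproducible randomness
        double[][] a = np.random(2, 3);      // 2x3 matrix with entries in [0, 1)
        double[][] b = np.dot(a, np.T(a));   // 2x2 product a * a^T
        np.print(np.shape(b));               // prints "(2,2)"
        np.print(np.shape(np.sigmoid(b)));   // element-wise sigmoid keeps the shape
    }
}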







Sample HTML (data user login page):

<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="description" content="company is a free job board template">
<meta name="author" content="">
<meta name="viewport" content="width=device-width, initial-scale=1">

<title>Road Traffic Speed Prediction: A Probabilistic Model Fusing Multi-Source Data</title>

<script src="js/vendor/modernizr-2.6.2.min.js"></script>
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.10.2/jquery.min.js"></script>
<script>window.jQuery || document.write('<script src="js/vendor/jquery-1.10.2.min.js"><\/script>')</script>
<script src="js/bootstrap.min.js"></script>
<script src="js/owl.carousel.min.js"></script>
<script src="js/wow.js"></script>
<script src="js/main.js"></script>

Data User Login form fields:
User Name:
Password:

Conclusion:

This project proposes a novel probabilistic framework to predict road traffic speed with multiple cross-domain data. Existing works are mainly based on speed sensing data, which suffers from data sparsity and low coverage. In our work, we handle the challenges arising from fusing multi-source data, including location uncertainty, language ambiguity and data heterogeneity, using the Location Disaggregation Model, the Traffic Topic Model and the Traffic Speed Gaussian Process Model. Experiments on real data demonstrate the effectiveness and efficiency of our model. For future work, we plan to implement kernel-based and distributed GP, so the traffic prediction framework can be applied to a real-time, large traffic network.

References:

[1] B. Abdulhai, H. Porwal, and W. Recker. Short-term traffic flow prediction using neuro-genetic algorithms. ITS Journal - Intelligent Transportation Systems Journal, 7(1):3–41, 2002.

[2] R. Alfelor, H. S. Mahmassani, and J. Dong. Incorporating weather impacts in traffic estimation and prediction systems. Technical report, US Department of Transportation, 2009.

[3] M. T. Asif, N. Mitrovic, L. Garg, and J. Dauwels. Low-dimensional models for missing data imputation in road networks. 32(3):3527–3531, 2013.

[4] C. M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag New York, Inc., 2006.

[5] D. M. Blei and J. D. Lafferty. Correlated topic models. In NIPS, pages 147–154, 2005.

[6] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003.

[7] J. Chen, K. H. Low, Y. Yao, and P. Jaillet. Gaussian process decentralized data fusion and active sensing for spatiotemporal traffic modeling and prediction in mobility-on-demand systems. IEEE Transactions on Automation Science and Engineering, 12(3):1–21, 2015.

[8] P.-T. Chen, F. Chen, and Z. Qian. Road traffic congestion monitoring in social media with hinge-loss Markov random fields. In ICDM, pages 80–89. IEEE, 2014.

[9] S. Clark. Traffic prediction using multivariate nonparametric regression. Journal of Transportation Engineering, pages 161–168, 2003.
