WEB-BASED DATA MINING IN ACADEMIC WEBSITES

Guide: Mr. D. George Washington

Name: Prasanna Kumar Palepu    Reg No: 200536314

Abstract: The proposed system applies Web mining to discover pedagogically relevant knowledge in the data collected by Web-based educational systems. The findings can be used to make effective use of resources, to minimize web traffic, and to detect intruders. Web mining is used to analyze and reason over the mass of information held by an educational website; it can uncover potential usage patterns that help reduce risk and support the right decisions. The intended goals are:
• To mine the web log and find drawbacks in the website.
• To build an interface for analyzing the web log.

Previous Status of the Project: Worked on filtering the log file, storing the filtered data in a database, and updating that database with new web log data day by day.

Present Status of the Project:

• Designed the database structure for the log file.
• Collected the IP-to-country database.
• Collected the GMT-to-country database.
• Collected the User_Agent database.
• Created the user interface design with UML diagrams.
• Created the report formats and table structures.
• Wrote the code for parsing the log file; currently eliminating bugs in it.

Guide: Mr. D. George Washington Prasanna Kumar Palepu (200536314)

17-Oct-2008 Page 1


Architecture and Design

Introduction: The following pages describe the system design in terms of packages, classes, relationships, and behavior. Several attached worksheets address specific aspects of the overall design, such as the user interface and the database design. The most important fact about the design: it is intended to help create a rich interface that lets web administrators analyze web log data and find anomalies in websites.

UML Structural Design
The system's structural design is described in the UML model WebLogModelStructure, through the following UML structural diagrams:
  * Package WebLogModelStructure overview diagram
  * WebLogModel
      o AddLog diagram
      o ParseLog diagram
      o ExportLog diagram

UML Behavioral Design
The system's behavioral design is described in the UML model WebLogModelBehavioral, through the following UML diagrams:
  * Referrer Statistics Class Diagram
  * Access Statistics Class Diagram
  * User Agent Statistics Class Diagram
  * OuterView Of Project
  * UML Activity Diagram

UML Design Checklist
Correctness: The design is complete and correct, and modifications to it will not lead to drastic changes in the entire system.
Feasibility: According to the Gantt chart, the time spent on design is as planned, so the design is feasible.
Understandability: The design was produced with the Describe UML tool, which is user-friendly, so the diagrams are easy to understand.


Implementation phase guidance: The designed modules are easy to implement.
Modularity: There is no existing software specifically for parsing this web log data, so the parser is unique to this project. The design separates all modules distinctly.
Extensibility: New code can easily be added to the system because it is written in VB.NET, which is user-friendly.
Testability: The system can easily be tested with testing tools. Manual testing is also done for verification and validation, on each module individually and on the system as a whole.
Efficiency: The system consumes an acceptable amount of time, storage space, bandwidth, and other resources.

Architecture Overview
Software architecture style used: single web service (application server and database).
Ranked goals of this architecture:
1. Ease of integration
2. Extensibility
3. Capacity matching

Components
The components of this system are listed below by type:
  * Presentation/UI components
      o C-00: WebLogUI
  * Application logic components
      o C-10: WebLogLogic
  * Data storage components
      o C-20: WebLogStorage

Deployment
The components are deployed as follows:
  * All-in-one server
      o WebLogFront End
          + C-00: WebLogUI
          + C-10: WebLogLogic
      o Database process
          + C-20: WebLogStorage
Aspects and resources of the environment are shared as follows: everything runs on one Oracle server, so all machine resources are shared by all components.


The database is updated continually through the export function. The database process can be moved to a separate machine with a fairly simple change to a configuration file, and more front-end servers can be added; beyond that, the deployment cannot be changed. The application logic running on the application server cannot be split or load-balanced.

Integration
The components are integrated and communicate as follows: all of our code uses direct procedure calls, so components within the same process call each other directly. The database is accessed through an ODBC driver, and communication between the front-end and back-end servers also uses ODBC.

Architectural Scenarios
The following sequence diagrams give step-by-step descriptions of how components communicate during some important usage scenarios:
  * System startup
  * System shutdown
  * ParsingLog
  * ExportingLog

Architecture Checklist
Ease of integration: The architecture uses the mechanisms provided for all needed types of integration, and all new components are designed to work together. Reused components are integrated through fairly simple interfaces.

Source Code Organization and Build System
Overview: It roughly follows the standard proposed in the Visual Studio .NET documentation.

Ranked goals of this source code organization and build system:
1. Separation of files by type
2. Separation of version-controlled files from files generated by the build process
3. Compatibility with standard build processes


Key Directories and Files in Working Copies

Path                                Description
Logs/                               Web log file directory for parsing
src/                                VB.NET source files
src/Model/                          VB.NET model form source files
src/Report/                         VB.NET report source files
src/VBNET/[Nested packages]/        VB.NET source code of the classes in each package
src/VBNET/[Nested packages]/test/   VB.NET source code of the unit tests for the classes in each package
conf/                               Configuration files
data/                               Initial data to load into the database and/or file system
lib/                                Libraries reused by this project
build/                              Output of the build process
help/                               Project documents

Build Targets

Target    Description
compile   Compiles the VB.NET source code and creates an executable file.
Load      Loads the intended log file into the application.
Parse     The main target of the application: parses the log file and stores the result in a temporary space.
Export    Exports the parsed data to the database and frees the temporary space used during parsing.
Analyze   Analyzes the exported data from the database.

Build Configuration Options

Property         Description
WebLogAnalysis   The tool to be created for exporting the raw web log to the database for analysis.
1.0              Version number of this release.

User Interface Overview
The ranked goals for the user interface of this system are:
1. Understandability and learnability
2. Task support and efficiency
3. Safety
4. Consistency and familiarity

This UI design follows Microsoft UI guidelines.

Task Models
Only web administrators will use this software, to find drawbacks in the website.

Technical Constraints / Operational Contextualization
Output devices: The "WebLogAnalyzer" system has a 320x200 16-color display as a model window.
Windowing systems, UI libraries, or other UI technologies used: standard .NET with no extra libraries.

User Interface Checklist
Understandability and learnability: The labels and icons in this system are standard ones, so they cannot be misunderstood. The advanced options are clearly separated from the most commonly used options, and there are no invisible options or commands.
Safety: Export from the front end to the database is a one-way process, but it can still be rolled back through database administration.
Consistency and familiarity: The UI elements in this system work the same way as they do in the existing example systems I identified, and all elements that look the same actually function the same.

Persistence

Central Database
Database access controls: A database user account has been created that has access to the needed application database tables. The username and password for this account are stored in a configuration file read by the application server.
Is this application's central database accessible to other applications? No. The database should always be accessed through this application, and all relevant information is available through the application interfaces. The database itself does not protect against data corruption that could be caused by other applications.
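Reading the database account from the configuration file can be illustrated with a minimal C sketch, written in the language of the Sample Code section. The report does not describe the file, so everything here is an assumption: the key=value line format, the '#' comment convention, the key names "username" and "password", and the helper name read_config_value().

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: looks up KEY in a configuration file made of
 * "key=value" lines (the file format and key names are assumptions).
 * Lines starting with '#' are treated as comments.
 * Returns 1 and copies the value into OUT (truncated to OUT_LEN - 1
 * characters) when the key is found, 0 otherwise.
 */
int read_config_value(FILE* conf, const char* key, char* out, size_t out_len)
{
    char line[256];
    size_t key_len = strlen(key);

    if (out_len == 0) {
        return 0;
    }
    rewind(conf); /* always search from the start of the file */
    while (fgets(line, sizeof line, conf) != NULL) {
        line[strcspn(line, "\r\n")] = '\0'; /* drop the line ending */
        if (line[0] == '#') {
            continue;
        }
        /* The key must match exactly and be followed by '='. */
        if (strncmp(line, key, key_len) == 0 && line[key_len] == '=') {
            strncpy(out, line + key_len + 1, out_len - 1);
            out[out_len - 1] = '\0';
            return 1;
        }
    }
    return 0;
}
```

With a file containing the line `username=weblog`, calling `read_config_value(f, "username", buf, sizeof buf)` returns 1 and leaves "weblog" in `buf`.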

File Storage
The server stores nothing in files; all of its data is kept in the database. User documents are stored as files on the user's own computer hard disk.

Persistence Mechanisms Checklist
Expressiveness: The database is easy to understand.
Ease of access: The database is accessible only with a login ID and password.
Reliability: The database is highly reliable.
Capacity: The database server has more than 80 GB of free space.
Security: The database is highly secure.
Performance: The system runs faster on Intel-based systems with more than 512 MB of RAM.

Physical structure of the database: All tables described below are deployed in Oracle and are normalized. Later modifications to the database will not have much impact on the overall design of the project.

Table 1: Main_Parsed

Field Name     Data Type    Length   Description
Unique_ID      AutoNumber   50       Unique number that identifies the record.
Client_IP      VARCHAR2     50       Address of the computer making the HTTP request, as recorded by the server.
RFC_Name       VARCHAR2     20       Identifies the requestor. If this information is not recorded, a hyphen (-) fills the column in the log.
LogName        VARCHAR2     20       With local authentication and registration, the user's log name; if no value is present, a "-" is substituted.
Log_Date       TIMESTAMP             Date and time in the format DD/Mon/YYYY:HH:MM:SS +GMT.
Req_method     VARCHAR2     20       Request method: GET, PUT, POST, or HEAD.
Req_Path       VARCHAR2     256      Path and file retrieved.
Req_Protocol   VARCHAR2     20       Protocol used by the client.
Stat_Code      VARCHAR2     3        HTTP completion code. 200: OK; 3xx: some sort of redirection; 4xx: some sort of client error; 5xx: some sort of server error.
Req_Bytes      VARCHAR2     10       For GET transactions, the number of bytes transferred; for other commands, a hyphen (-) or a zero (0).
Referrer       VARCHAR2     50       Referrer URL: the page where the visitor was located when making the next request.
User_agent     VARCHAR2     200      Information about the browser, version, and operating system of the reader. The general format is:

Table 2: GMT

Field Name      Data Type   Length   Description
GMT             SMALLINT    5        Greenwich Mean Time in number format
Zone            VARCHAR2    2        Zone of the GMT
Military_Code   VARCHAR2    10       Military code for the time zone
Country         VARCHAR2    15       Country name
City            VARCHAR2    15       City name

Table 3: IP2Country

Field Name     Data Type   Length   Description
IP_From        NUMBER      12       Starting IP address (numerical representation of the IP address)
IP_To          NUMBER      12       Ending IP address (numerical representation of the IP address)
Registry       VARCHAR2    10       Registry that holds the reserved address numbers: "apnic", "arin", "lacnic", "ripencc", or "afrinic"
Country_Code   VARCHAR2    3        Code of the country
Country        VARCHAR2    20       Full name of the country

IP Example: (from Right to Left) 1.2.3.4 = 4 + (3 * 256) + (2 * 256 * 256) + (1 * 256 * 256 * 256)= 16909060
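The conversion in the example above, together with a lookup against the IP2Country table, can be sketched in C. The helpers ip_to_number() and find_country() and the struct ip_range type are hypothetical. The binary search also assumes that the rows are sorted by IP_From and do not overlap, which the report does not state.

```c
#include <stddef.h>
#include <stdio.h>

/* One row of the IP2Country table (Table 3), reduced to what the lookup needs. */
struct ip_range {
    unsigned long ip_from;     /* IP_From */
    unsigned long ip_to;       /* IP_To */
    const char* country_code;  /* Country_Code */
};

/* Converts dotted-quad text such as "1.2.3.4" into its numeric form:
   a * 256^3 + b * 256^2 + c * 256 + d.
   Returns 1 on success, 0 if the text is not a valid IPv4 address. */
int ip_to_number(const char* text, unsigned long* out)
{
    unsigned int a, b, c, d;
    char extra; /* catches trailing garbage after the fourth octet */

    if (sscanf(text, "%u.%u.%u.%u%c", &a, &b, &c, &d, &extra) != 4 ||
        a > 255 || b > 255 || c > 255 || d > 255) {
        return 0;
    }
    *out = ((unsigned long)a << 24) | ((unsigned long)b << 16) |
           ((unsigned long)c << 8) | (unsigned long)d;
    return 1;
}

/* Binary search over COUNT ranges sorted by ip_from (assumed non-overlapping).
   Returns the country code of the range containing IP, or NULL if none does. */
const char* find_country(const struct ip_range* ranges, size_t count,
                         unsigned long ip)
{
    size_t lo = 0, hi = count;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (ip < ranges[mid].ip_from) {
            hi = mid;
        } else if (ip > ranges[mid].ip_to) {
            lo = mid + 1;
        } else {
            return ranges[mid].country_code;
        }
    }
    return NULL;
}
```

For example, `ip_to_number("1.2.3.4", &n)` sets `n` to 16909060, matching the worked example. If the lookup were done inside Oracle instead, the equivalent would be a range query such as `SELECT Country FROM IP2Country WHERE :ip BETWEEN IP_From AND IP_To`.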

Table 4: User_agent

Field Name       Data Type   Length   Description
U_Agent_String   VARCHAR2    100      User agent string with all information about the client system
U_Agent_Type     VARCHAR2    2        S: spiders, R: robots, C: crawler, B: browser
Browser          VARCHAR2    10       Browser version
Platform         VARCHAR2    10       Platform of the user
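Filling U_Agent_Type requires classifying each user agent string. The report does not give the rules, so the keyword lists in this minimal C sketch are hypothetical and only illustrate the shape of such a classifier. The function names contains_ci() and classify_user_agent() are also assumptions.

```c
#include <ctype.h>
#include <string.h>

/* Case-insensitive substring test: returns 1 if NEEDLE occurs in HAYSTACK. */
static int contains_ci(const char* haystack, const char* needle)
{
    size_t n = strlen(needle);

    for (; *haystack != '\0'; ++haystack) {
        size_t i = 0;
        while (i < n && haystack[i] != '\0' &&
               tolower((unsigned char)haystack[i]) ==
                   tolower((unsigned char)needle[i])) {
            ++i;
        }
        if (i == n) {
            return 1;
        }
    }
    return 0;
}

/* Maps a user agent string to a U_Agent_Type code:
   'S' spider, 'R' robot, 'C' crawler, 'B' browser.
   The keywords tested below are illustrative assumptions, checked in order. */
char classify_user_agent(const char* ua)
{
    if (contains_ci(ua, "spider")) {
        return 'S';
    }
    if (contains_ci(ua, "crawler") || contains_ci(ua, "slurp")) {
        return 'C';
    }
    if (contains_ci(ua, "bot")) {
        return 'R';
    }
    return 'B'; /* anything unrecognized is treated as a browser */
}
```

Under these rules, the sample log agents "msnbot/1.0 ..." and "Mozilla/5.0 (compatible; Yahoo! Slurp; ...)" are classified as 'R' and 'C' respectively, and the MSIE 6.0 string is classified as 'B'.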

Table 5: Req_Resourse

Field Name   Data Type   Length   Description
Req_URL      VARCHAR2    100      Requested URL path
Req_File     VARCHAR2    50       Requested file
Req_Bytes    NUMBER      10       Requested file size in bytes

Table 6: Status_Code

Field Name    Data Type   Length   Description
Stat_Code     NUMBER      3        HTTP completion code. 200: OK; 3xx: some sort of redirection; 4xx: some sort of client error; 5xx: some sort of server error
Stat_C_Desc   VARCHAR2    25       Description of the status code
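The code ranges described for Stat_Code can be turned into category text with a small C function. status_class() is a hypothetical helper. Besides the table's own categories, it reports other 2xx codes as "Success" (following the HTTP status classes) and everything outside 200 to 599 as "Other".

```c
/* Maps an HTTP completion code to the categories listed for Stat_Code. */
const char* status_class(int code)
{
    if (code == 200) {
        return "OK";
    }
    if (code >= 200 && code <= 299) {
        return "Success";       /* other 2xx codes */
    }
    if (code >= 300 && code <= 399) {
        return "Redirection";   /* e.g. 301, 304 */
    }
    if (code >= 400 && code <= 499) {
        return "Client Error";  /* e.g. 404, 405 */
    }
    if (code >= 500 && code <= 599) {
        return "Server Error";
    }
    return "Other";
}
```

For the codes seen in the sample reports, 304 and 301 map to "Redirection", and 404 and 405 map to "Client Error".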

Table 7: Host_Summary

Field Name         Data Type   Length   Description
Client_IP          VARCHAR2    50       Address of the computer making the HTTP request, as recorded by the server
Country_Code       VARCHAR2    3        Code of the country
No_Of_Occurances   NUMBER      5        Number of times the client visited the website
No_Of_Pages        NUMBER      5        Number of times the client visited the web pages
Bandwidth          NUMBER      10       Bandwidth in bytes
Date               DATETIME             Date the client visited the website

Table 8: Referrar_Code

Field Name      Data Type   Length   Description
Ref_URL         VARCHAR2    100      Referral URL
Ref_Site        VARCHAR2    100      Referring website
Key_Word1       VARCHAR2    20       Keyword used to search the content in the website
Key_Word2       VARCHAR2    20       Keyword used to search the content in the website
Key_Word3       VARCHAR2    20       Keyword used to search the content in the website
Key_Word4       VARCHAR2    20       Keyword used to search the content in the website
Key_Word5       VARCHAR2    20       Keyword used to search the content in the website
Search_Engine   VARCHAR2    20       Name of the search engine
Dom_Name        VARCHAR2    5        Name of the domain
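Key_Word1 to Key_Word5 have to be extracted from the referrer URL. The following C sketch assumes that the search terms are in a query parameter named "q" with words joined by '+', which is how Google passes them. Other engines may use different parameter names, and percent-escapes such as %20 are not decoded here. extract_keywords() and the two constants are hypothetical names. KEYWORD_LEN is sized from the VARCHAR2(20) columns.

```c
#include <stddef.h>
#include <string.h>

#define MAX_KEYWORDS 5  /* Key_Word1 .. Key_Word5 */
#define KEYWORD_LEN 21  /* VARCHAR2(20) plus the terminating '\0' */

/* Copies up to MAX_KEYWORDS '+'-separated words from the "q" parameter of
   REFERRER into KEYWORDS, truncating each word to 20 characters.
   Returns the number of keywords stored (0 if there is no "q" parameter). */
size_t extract_keywords(const char* referrer,
                        char keywords[MAX_KEYWORDS][KEYWORD_LEN])
{
    const char* p = strchr(referrer, '?');
    size_t count = 0;

    if (p == NULL) {
        return 0;
    }
    ++p; /* start of the query string */
    /* Find "q=" at the start of the query string or right after an '&'. */
    while (p != NULL && strncmp(p, "q=", 2) != 0) {
        p = strchr(p, '&');
        if (p != NULL) {
            ++p;
        }
    }
    if (p == NULL) {
        return 0;
    }
    p += 2; /* skip "q=" */

    /* Split the value on '+' until the parameter ends. */
    while (*p != '\0' && *p != '&' && *p != '#' && count < MAX_KEYWORDS) {
        size_t len = strcspn(p, "+&#");
        if (len > 0) {
            size_t copy = len < KEYWORD_LEN - 1 ? len : KEYWORD_LEN - 1;
            memcpy(keywords[count], p, copy);
            keywords[count][copy] = '\0';
            ++count;
        }
        p += len;
        if (*p == '+') {
            ++p;
        }
    }
    return count;
}
```

For a hypothetical referrer such as `http://www.google.co.in/search?hl=en&q=anna+university`, the function stores "anna" and "university" and returns 2.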


UML Activity Diagram

Parse Log Data:
  1. Find the country by IP address
  2. Parse the time zone by splitting the date-time and the GMT offset
  3. Parse the arguments in the request field
  4. Parse the status code
  5. Parse the referrer
  6. Parse the user agent details
  7. Update the database
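The time zone step can be sketched in C. The sketch splits a log timestamp such as "09/Sep/2007:04:13:04 +0530" into its date-time part and its GMT offset, expressed in minutes, before the offset is looked up in the GMT table. The struct log_time type and parse_log_time() are hypothetical names, and the minutes representation is an assumption.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for a split log timestamp. */
struct log_time {
    int day, month, year;   /* month is 1..12 */
    int hour, minute, second;
    int gmt_offset_minutes; /* "+0530" -> 330, "-0600" -> -360 */
};

/* Splits "DD/Mon/YYYY:HH:MM:SS +HHMM" into fields. Returns 1 on success. */
int parse_log_time(const char* text, struct log_time* t)
{
    static const char months[] = "JanFebMarAprMayJunJulAugSepOctNovDec";
    char mon[4];
    char sign;
    int hh, mm;
    const char* found;

    if (sscanf(text, "%2d/%3s/%4d:%2d:%2d:%2d %c%2d%2d",
               &t->day, mon, &t->year, &t->hour, &t->minute, &t->second,
               &sign, &hh, &mm) != 9 ||
        (sign != '+' && sign != '-')) {
        return 0;
    }
    /* Month abbreviation -> 1..12; it must start at a multiple of 3. */
    found = strstr(months, mon);
    if (found == NULL || strlen(mon) != 3 || (found - months) % 3 != 0) {
        return 0;
    }
    t->month = (int)(found - months) / 3 + 1;
    t->gmt_offset_minutes = (sign == '-' ? -1 : 1) * (hh * 60 + mm);
    return 1;
}
```

Applied to the sample timestamps, "09/Sep/2007:04:13:04 +0530" gives month 9 and an offset of 330 minutes, and "10/Nov/1999:10:16:39 -0600" gives month 11 and an offset of -360 minutes.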


OuterView Of Project
(Diagram elements: WebAdmin, Access_Stats, Host_Stats, Referrer_Stats, User_Agent_Stats)

UserAgent Class Diagram

User_Agent
  Attributes
    Private U_Agent_URL As Character
    Private Type As Character
    Private Browser As Character
    Private Platform As Character
  Operations
    Public Function Class_Initialize()
    Public Function getU_Agent_URL() As Character
    Public Sub setU_Agent_URL(val As Character)
    Public Function getType() As Character
    Public Sub setType(val As Character)
    Public Function getBrowser() As Character
    Public Sub setBrowser(val As Character)
    Public Function getPlatform() As Character
    Public Sub setPlatform(val As Character)

U_A_Browser
  Attributes
    Private NoOfHits As Integer
    Private Bandwidth As Integer
    Private NoOfPages As Integer
  Operations
    Public Function Class_Initialize()
    Public Function getNoOfHits() As Integer
    Public Sub setNoOfHits(val As Integer)
    Public Function getBandwidth() As Integer
    Public Sub setBandwidth(val As Integer)
    Public Function getNoOfPages() As Integer
    Public Sub setNoOfPages(val As Integer)

U_A_OS
  Attributes and operations: identical to U_A_Browser.

Access Statistics Class Diagram

ClientRequests
  Attributes
    Private RequestedFile As Character
    Private ReqestedURL As Character
    Private RequestedBytes As Character
    Private ClientIP As Character
  Operations
    Public Function getRequestedFile() As Character
    Public Sub setRequestedFile(val As Character)
    Public Function getReqestedURL() As Character
    Public Sub setReqestedURL(val As Character)
    Public Function getRequestedBytes() As Character
    Public Sub setRequestedBytes(val As Character)
    Public Function getClientIP() As Character
    Public Sub setClientIP(val As Character)
    Public Function Class_Initialize()

By_Pages { From Access_Stats }
  Attributes
    Private NoOfVisitors As Integer
    Private Bandwidth As Integer
    Private NoOFHits As Integer
  Operations
    Public Function Class_Initialize()
    Public Function getNoOfVisitors() As Integer
    Public Sub setNoOfVisitors(val As Integer)
    Public Function getBandwidth() As Integer
    Public Sub setBandwidth(val As Integer)
    Public Function getNoOFHits() As Integer
    Public Sub setNoOFHits(val As Integer)

By_Files
  Attributes
    Private NofOfVisitors As Integer
    Private Bandwidth As Integer
    Private NoOfHits As Integer
  Operations
    Public Function Class_Initialize()
    Public Function getNofOfVisitors() As Integer
    Public Sub setNofOfVisitors(val As Integer)
    Public Function getBandwidth() As Integer
    Public Sub setBandwidth(val As Integer)
    Public Function getNoOfHits() As Integer
    Public Sub setNoOfHits(val As Integer)

By_Paths
  Attributes
    Private NoOfVisitors As Integer
    Private NoOfHits As Integer
    Private Bandwidth As Integer
  Operations
    Public Function Class_Initialize()
    Public Function getNoOfVisitors() As Integer
    Public Sub setNoOfVisitors(val As Integer)
    Public Function getBandwidth() As Integer
    Public Sub setBandwidth(val As Integer)
    Public Function getNoOfHits() As Integer
    Public Sub setNoOfHits(val As Integer)

By_ResponseCode
  Attributes
    Private NoOfVisitors As Integer
    Private Bandwidth As Integer
    Private NoOfHits As Integer
  Operations: identical to By_Paths.
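The By_Pages counters can be illustrated with a C data-structure sketch that accumulates hits and bandwidth per requested path. struct page_stats, struct page_table, record_request() and the size limits are hypothetical. Counting visitors, which needs the set of distinct client IPs per page, is left out of this sketch. A linear search is used for simplicity. A hash table would scale better for large logs.

```c
#include <string.h>

#define MAX_PAGES 256
#define PATH_LEN 256

/* Per-page counters, loosely mirroring By_Pages (NoOFHits, Bandwidth). */
struct page_stats {
    char path[PATH_LEN];
    long hits;
    long bandwidth; /* bytes */
};

struct page_table {
    struct page_stats pages[MAX_PAGES];
    size_t count;   /* number of pages in use */
};

/* Records one request for PATH that transferred BYTES bytes.
   Paths are compared on their first PATH_LEN - 1 characters.
   Returns 1 on success, 0 if the table is already full. */
int record_request(struct page_table* t, const char* path, long bytes)
{
    size_t i;

    for (i = 0; i < t->count; ++i) {
        if (strncmp(t->pages[i].path, path, PATH_LEN - 1) == 0) {
            t->pages[i].hits++;
            t->pages[i].bandwidth += bytes;
            return 1;
        }
    }
    if (t->count == MAX_PAGES) {
        return 0;
    }
    strncpy(t->pages[t->count].path, path, PATH_LEN - 1);
    t->pages[t->count].path[PATH_LEN - 1] = '\0';
    t->pages[t->count].hits = 1;
    t->pages[t->count].bandwidth = bytes;
    t->count++;
    return 1;
}
```

A zero-initialized `struct page_table` (for example, a `static` one) starts empty. Each parsed log line then adds one call to `record_request()`.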


Referrer Statistics Class Diagram

ReferrerStats
  Attributes
    Private ReferrerURL As Character
    Private RefSite As Character
    Private Keyword1 As Character
    Private Keyword2 As Character
    Private Search_Engine As Character
    Private Dom_Name As Character
  Operations
    Public Function Class_Initialize()
    Public Function getReferrerURL() As Character
    Public Sub setReferrerURL(val As Character)
    Public Function getRefSite() As Character
    Public Sub setRefSite(val As Character)
    Public Function getKeyword1() As Character
    Public Sub setKeyword1(val As Character)
    Public Function getKeyword2() As Character
    Public Sub setKeyword2(val As Character)
    Public Function getSearch_Engine() As Character
    Public Sub setSearch_Engine(val As Character)
    Public Function getDom_Name() As Character
    Public Sub setDom_Name(val As Character)

ByRef_Site
  Attributes
    Private NoOfHits As Integer
    Private Bandwidth As Integer
    Private NoOfPages As Integer
  Operations
    Public Function Class_Initialize()
    Public Function getNoOfHits() As Integer
    Public Sub setNoOfHits(val As Integer)
    Public Function getBandwidth() As Integer
    Public Sub setBandwidth(val As Integer)
    Public Function getNoOfPages() As Integer
    Public Sub setNoOfPages(val As Integer)

By_Keyword
  Attributes and operations: identical to ByRef_Site.

By_SearchEngine
  Attributes
    Private NoOfHits As Integer
    Private NoOfPages As Integer
    Private Bandwidth As Integer
  Operations: identical to ByRef_Site.


A normal web log is a raw file like the following (the numbers 1 to 4 are line numbers added for reference):

1. 65.55.208.12 - - [09/Sep/2007:04:13:04 +0530] "GET /academic/curri2002ft-welding.doc HTTP/1.0" 200 52224 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"
2. 74.6.28.105 - - [09/Sep/2007:04:13:17 +0530] "GET /academic/D508.doc HTTP/1.0" 304 - "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
3. 69.123.246.252 - - [09/Sep/2007:04:13:33 +0530] "GET /images/newlogo.jpg HTTP/1.1" 304 "http://collinfo.annauniv.edu:6060/annauniv/courseall/branchwise.asp?brname=B.E-Bio-Medical Engineering&brcode=121&degrcode=11" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
4. 69.123.246.252 - - [09/Sep/2007:04:13:33 +0530] "GET /images/annatext.gif HTTP/1.1" 304 "http://collinfo.annauniv.edu:6060/annauniv/courseall/branchwise.asp?brname=B.E-Bio-Medical Engineering&brcode=121&degrcode=11" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"

Format of the log file (Combined Log Format):
<host> <ident> <authuser> [<date>] "<method> <path> <protocol>" <status> <bytes> "<referrer>" "<user_agent>"

Fields:
  Client IP: 128.101.228.20
  Authenticated user ID: -
  Time/Date: [10/Nov/1999:10:16:39 -0600]
  Request: "GET / HTTP/1.0" (other common methods are POST and HEAD)
  Status: 200 (200: OK; 3xx: some sort of redirection; 4xx: some sort of client error; 5xx: some sort of server error)
  Bytes:
  Referrer: "-"
  Agent: "Mozilla/4.61 [en] (WinNT; I)"

Common Log Format:
  Remotehost: browser hostname or IP number
  Remote logname: remote log name of the user (almost always "-", meaning "unknown")
  Authuser: authenticated username
  Date: date and time of the request
  "request": exact request line from the client
  Status: the HTTP status code returned
  Bytes: the content length of the response
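Parsing one line of this format can be sketched in C with a single `sscanf` call that uses scansets for the bracketed date and the quoted fields. The struct log_line type and parse_log_line() are hypothetical names, and the field sizes are arbitrary assumptions. The byte count is kept as text because it can be "-". As printed above, sample lines 3 and 4 have no byte count after the status code, so this sketch rejects them. A production parser would need to be more tolerant.

```c
#include <stdio.h>

/* Hypothetical record holding the fields of one Combined Log Format line. */
struct log_line {
    char client_ip[64];
    char rfc_name[64];   /* identity field, usually "-" */
    char log_name[64];   /* authenticated user, usually "-" */
    char date_time[64];  /* e.g. "09/Sep/2007:04:13:04 +0530" */
    char method[16];
    char path[1024];
    char protocol[16];
    int status;
    char bytes[32];      /* text, because it may be "-" */
    char referrer[1024];
    char user_agent[1024];
};

/* Returns 1 if all eleven fields were read, 0 otherwise.
   Each field width in the format is the array size minus one. */
int parse_log_line(const char* line, struct log_line* e)
{
    int matched = sscanf(line,
        "%63s %63s %63s [%63[^]]] \"%15s %1023s %15[^\"]\" %d %31s "
        "\"%1023[^\"]\" \"%1023[^\"]\"",
        e->client_ip, e->rfc_name, e->log_name, e->date_time,
        e->method, e->path, e->protocol, &e->status, e->bytes,
        e->referrer, e->user_agent);
    return matched == 11;
}
```

Applied to sample line 1, the sketch yields client IP "65.55.208.12", method "GET", status 200, bytes "52224", referrer "-" and the msnbot user agent.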


Sample Reports

Access Statistics

Pages
     Page                          Hits       %   Visitors       %   Bandwidth       %
  1  /                              166   27.48        144   26.33     2.30 MB   30.75
  2  /coe/schedule.htm               61   10.10         53    9.69   615.87 KB    8.04
  3  /result/results_revs.html       32    5.30         31    5.67   117.11 KB    1.53
  4  /academic/                      25    4.14         23    4.20   137.72 KB    1.80
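The bandwidth columns in these reports show plain byte counts, KB and MB. The following minimal C sketch formats values in that style. format_bytes() is a hypothetical helper. The 1024-based units are an assumption, since the report does not say whether 1 KB means 1000 or 1024 bytes.

```c
#include <stdio.h>

/* Formats BYTES as "<n>", "<x.xx> KB" or "<x.xx> MB" into OUT.
   Assumes 1 KB = 1024 bytes and 1 MB = 1024 KB. */
void format_bytes(double bytes, char* out, size_t out_len)
{
    const double kb = 1024.0;
    const double mb = 1024.0 * 1024.0;

    if (bytes >= mb) {
        snprintf(out, out_len, "%.2f MB", bytes / mb);
    } else if (bytes >= kb) {
        snprintf(out, out_len, "%.2f KB", bytes / kb);
    } else {
        snprintf(out, out_len, "%.0f", bytes); /* small values as plain bytes */
    }
}
```

For example, 2048 bytes is formatted as "2.00 KB" and 1572864 bytes as "1.50 MB".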

Entry Points
     Entry Point                   Hits       %   Visitors       %   Bandwidth       %
  1  /                              135   57.45        135   57.45     2.22 MB   86.84
  2  /academic/                      15    6.38         15    6.38    54.74 KB    2.09
  3  /academic                        9    3.83          9    3.83     2.85 KB    0.11
  4  /academic/lakescr.txt            8    3.40          8    3.40           8    0.00

Paths
     Path                                             Visitors       %   Bandwidth       %
  1  No Referrer -> /                                       53   22.55   721.30 KB    9.42
  2  No Referrer -> / -> /coe/schedule.htm                  16    6.81   549.09 KB    7.17
  3  No Referrer -> / -> /result/results_revs.html          11    4.68   249.79 KB    3.26
  4  No Referrer -> /academic/                              10    4.26     2.88 KB    0.04

File Type
     File Type    Hits       %   Visitors       %   Pages       %   Bandwidth       %
  1  .gif         1616   40.34        173   15.27     196   31.01     3.93 MB    7.08
  2  .jpg          653   16.30        177   15.62      63    9.97     4.36 MB    7.86
  3  .html         440   10.98        221   19.51      73   11.55     3.63 MB    6.54

Response Code
     Code                         Hits       %   Visitors       %   Pages       %   Bandwidth       %
  1  200 OK                       2415   60.28        240   44.53     566   71.37    44.63 MB   80.48
  2  304 Not Modified             1057   26.39        109   20.22     140   17.65           0    0.00
  3  404 Not Found                 411   10.26        120   22.26      46    5.80   119.31 KB    0.21
  5  301 Moved Permanently          22    0.55         22    4.08       6    0.76     6.90 KB    0.01
  6  405 Method Not Allowed         15    0.37         15    2.78       2    0.25     4.78 KB    0.01

Visitor Statistics

Hosts
     Host               Country   Hits      %   Pages      %   Bandwidth       %
  1  122.164.245.135    India      128   3.20      54   1.83   499.78 KB    0.88
  2  121.246.25.137     India      121   3.02      86   2.92   352.62 KB    0.62
  3  59.92.9.1          India      119   2.97     116   3.93   514.89 KB    0.91

Visitors
     Visitor            Country   Hits      %   Pages      %   Bandwidth       %
  1  122.164.245.135    India      128   3.20      54   1.83   499.78 KB    0.88
  2  121.246.25.137     India      121   3.02      86   2.92   352.62 KB    0.62
  4  122.164.169.105    India      113   2.82      67   2.27     1.02 MB    1.84

Country
     Country          Hits       %   Visitors       %   Pages       %   Bandwidth       %
  1  India            3882   96.90        278   89.39     599   85.21    45.54 MB   82.11
  2  United States      74    1.85         22    7.07      59    8.39     8.36 MB   15.08
  3  Kuwait             25    0.62          1    0.32      23    3.27   162.64 KB    0.29

Referrers Statistics

Referrers
     Referrer                                     Hits       %   Visitors       %   Pages       %   Bandwidth       %
  1  http://www.annauniv.edu/                     1134   28.31        143   15.29      25    2.55     5.95 MB   10.73
  2  No Referrer                                   553   13.80        249   26.63     104   10.61    17.25 MB   31.10
  3  http://www.annauniv.edu/coe/schedule.htm      457   11.41         59    6.31      19    1.94     2.25 MB    4.05
  4  http://www.annauniv.edu/coe/circular.html     197    4.92         19    2.03      18    1.84     1.11 MB    2.00

Referring Sites
     Referring Site                        Hits       %   Visitors       %   Pages       %   Bandwidth       %
  1  http://www.annauniv.edu/              3311   82.65        195   38.09     548   76.97    33.38 MB   60.19
  2  No Referrer                            553   13.80        249   48.63     104   14.61    17.25 MB   31.10
  3  http://collinfo.annauniv.edu:6060/      68    1.70         19    3.71       8    1.12   196.97 KB    0.35
  4  http://www.google.co.in/                25    0.62         21    4.10      17    2.39     1.98 MB    3.57
  5  http://www.google.com/                  13    0.32          8    1.56       6    0.84   663.29 KB    1.17

Keywords
     Keyword           SE Page   Hits       %   Visitors       %   Pages       %   Bandwidth       %
  1  anna university   1           11   28.95          7   22.58       3   13.04   153.23 KB   34.60
  2  annauniversity    1            5   13.16          3    9.68       1    4.35    42.43 KB    9.58
  3  annauniv.edu      1            4   10.53          3    9.68       2    8.70    76.75 KB   17.33
  4  annauniv          1            2    5.26          2    6.45       1    4.35    42.43 KB    9.58

Search Engine
     Search Engine   SE Page   Hits       %   Visitors       %   Pages       %   Bandwidth       %
  1  Google.com      1-2         28   70.00         22   70.97      13   68.42   306.83 KB   68.96
  2  Yahoo.com       1            8   20.00          6   19.35       2   10.53    84.86 KB   19.07
  3  MSN.com         1            3    7.50          2    6.45       3   15.79    32.00 KB    7.19
  4  live.com        1            1    2.50          1    3.23       1    5.26    21.21 KB    4.77

User Agent Stats

Operating System
     Operating System   Hits       %   Visitors       %   Pages       %   Bandwidth       %
  1  Windows XP         3154   78.99        185   62.08     452   55.73    40.86 MB   83.90
  2  Windows 2000        449   11.24         32   10.74     207   25.52     2.74 MB    5.62
  3  Windows 98          277    6.94         12    4.03      83   10.23     4.35 MB    8.93
  4  Unknown              65    1.63         62   20.81      28    3.45   498.96 KB    1.00
  5  Linux                16    0.40          3    1.01      14    1.73    66.85 KB    0.13

Browser
     Browser                  Hits       %   Visitors       %   Pages       %   Bandwidth       %
  1  MS Internet Explorer 6   2707   68.85        163   68.49     453   46.89    32.22 MB   66.89
  2  Firefox                   597   15.18         30   12.61     245   25.36     4.20 MB    8.71
  3  MS Internet Explorer 7    368    9.36         19    7.98     141   14.60    10.20 MB   21.17
  4  MS Internet Explorer 5    106    2.70          5    2.10      53    5.49   253.71 KB    0.51
  5  Opera 9                    52    1.32          5    2.10      37    3.83   318.61 KB    0.65

Error Stats

Errors
     Error                                      Referrer                                       Hits       %
  1  /coe/TITLEflowers.gif                      http://www.annauniv.edu/coe/schedule.htm         97   22.77
  2  /favicon.ico                               No Referrer                                      87   20.42
  3  /coe/fd_1.jpg                              http://www.annauniv.edu/coe/top.htm              35    8.22
  4  /campustour/images/leftboxcorner_top.gif   http://www.annauniv.edu/campustour/index.htm     27    6.34
  5  /academic/                                 No Referrer                                      15    3.52

     Error                       Hits       %
  1  404 - Not Found              411   96.48
  2  405 - Method Not Allowed      15    3.52
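Each percentage column is a row's share of a total. As a check, the two error-code rows sum to 411 + 15 = 426 hits. Then 411 / 426 gives 96.48 % and 15 / 426 gives 3.52 %, as reported. A minimal C sketch of that calculation follows. percent_of() and row_shares() are hypothetical helper names. The sketch takes the total as the sum of the rows passed in, which matches the error-code table but may not match the other reports, where only the top rows are shown.

```c
#include <stddef.h>

/* Percentage share of COUNT in TOTAL; 0 when TOTAL is 0. */
double percent_of(long count, long total)
{
    return total > 0 ? 100.0 * (double)count / (double)total : 0.0;
}

/* Fills OUT[i] with each row's share of the sum of all N counts. */
void row_shares(const long counts[], size_t n, double out[])
{
    long total = 0;
    size_t i;

    for (i = 0; i < n; ++i) {
        total += counts[i];
    }
    for (i = 0; i < n; ++i) {
        out[i] = percent_of(counts[i], total);
    }
}
```

Printed with `%.2f`, the shares for the counts {411, 15} come out as 96.48 and 3.52.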

Sample Code #include #include <stdio.h> #include <stdlib.h> #include <string.h> #ifndef _DEBUG #define PRIVATE static #else #define PRIVATE #endif #define MAX_FILE_SPECS (10) #define INITIAL_BUFFER_LEN (100) PRIVATE PRIVATE PRIVATE PRIVATE PRIVATE PRIVATE PRIVATE PRIVATE PRIVATE PRIVATE PRIVATE

struct log_entry_filter log_filter; char* file_specs[MAX_FILE_SPECS]; void filter_file(FILE* log_file); void parse_command_line(int argc, char** argv); void execute_all_tests(void); char* all_tests(void); void read_file_specs_from_cl(int argc, char* argv[]); void filter_files(glob_t* glob); void free_file_specs(void); void print_version(void); void filter_file_specs(void);

int main(int argc, char** argv) { parse_command_line(argc, argv); if (file_specs[0] != NULL) { filter_file_specs(); free_file_specs(); } else { filter_file(stdin); } filter_free(&log_filter); return EXIT_SUCCESS; } PRIVATE void filter_file(FILE* log_file)

Guide: Mr. D. George Washington Prasanna Kumar Palepu (200536314)

17-Oct-2008 Page 18

WEB-BASED DATA MINING IN ACADEMIC WEBSITES

{

struct log_file_entry* entry; char* line = NULL; size_t length = INITIAL_BUFFER_LEN; line = buffer_allocate(line, INITIAL_BUFFER_LEN + 1);

}

while (getline(&line, &length, log_file) != -1) { assert(line != NULL); entry = parse_line(line); if (entry) { if (filter_entry(&log_filter, entry)) { fputs(line, stdout); } free_entry(entry); } } free(line);

PRIVATE void free_file_specs(void) { int counter = 0; while (file_specs[counter] != NULL) { free(file_specs[counter]); counter++; } } PRIVATE void filter_file_specs(void) { int counter = 0; int flags = 0; int status; glob_t glob_buf; assert(file_specs[0] != NULL); while (file_specs[counter] != NULL) { status = glob(file_specs[counter], flags, NULL, &glob_buf); switch (status) { Guide: Mr. D. George Washington Prasanna Kumar Palepu (200536314)

17-Oct-2008 Page 19

WEB-BASED DATA MINING IN ACADEMIC WEBSITES

case GLOB_NOSPACE: // Out of memory error exit_with_diagnostic("Ran out of memory whilst globbing...\n"); break; case GLOB_NOMATCH: // The pattern didn't match any files exit_with_diagnostic("No files match file spec\n"); break; default: // Everything went ok, just carry on... break; } flags |= GLOB_APPEND; counter++; } assert(glob_buf.gl_pathc > 0); filter_files(&glob_buf); globfree(&glob_buf); } PRIVATE void filter_files(glob_t* glob) { int i; FILE* log_file;

}

for (i = 0; i < glob->gl_pathc; ++i) { log_file = fopen(glob->gl_pathv[i], "r"); if (!log_file) { exit_with_diagnostic("Unable to open log file\n"); } filter_file(log_file); fclose(log_file); }

PRIVATE void usage(void)
{
    exit_with_diagnostic(
        "usage: " PACKAGE_NAME " [-hiTv] [-b browser] [-c client] [-I identity]\n"
        "       [-m method] [-p protocol] [-r referer] [-s status]\n"
        "       [-u uri] [-U user] [-z size] logfile [logfile...]\n"
        "\n"
        "  -b browser   filter on user agent (browser) string\n"
        "  -c client    filter on client address\n"
        "  -h           print this usage message\n"
        "  -i           do case-insensitive string searches\n"
        "  -I identity  filter on identity (second field of log file)\n"
        "  -m method    filter on request method (e.g. GET, POST...)\n"
        "  -p protocol  filter on HTTP protocol version field (e.g. HTTP/1.1)\n"
        "  -r referer   filter on document referer string\n"
        "  -s status    filter on request status value (e.g. 200, 404...)\n"
        "  -T           run internal test suite\n"
        "  -u uri       filter on document URI\n"
        "  -U user      filter on user name used in request, if any\n"
        "  -v           show program's version number\n"
        "  -z size      filter on document size\n"
        "\n");
}

PRIVATE void parse_command_line(int argc, char** argv)
{
    int choice;

    if (argc <= 1) {
        usage();
    }
    memset(file_specs, 0, MAX_FILE_SPECS * sizeof(char*));

    while ((choice = getopt(argc, argv, "b:c:hiTI:m:p:r:s:u:U:vz:")) != -1) {
        switch (choice) {
        case 'b': save_ua_filter(&log_filter, optarg); break;
        case 'c': save_client_filter(&log_filter, optarg); break;
        case 'h': usage(); break;
        case 'i': case_sensitive = 0; break;  // Perform case insensitive matches
        case 'I': save_identity_filter(&log_filter, optarg); break;
        case 'm': save_method_filter(&log_filter, optarg); break;
        case 'p': save_protocol_filter(&log_filter, optarg); break;
        case 'r': save_referer_filter(&log_filter, optarg); break;
        case 's': save_status_filter(&log_filter, optarg); break;
        case 'T': execute_all_tests(); break;
        case 'u': save_uri_filter(&log_filter, optarg); break;
        case 'U': save_user_id_filter(&log_filter, optarg); break;
        case 'v': print_version(); break;
        case 'z': save_size_filter(&log_filter, optarg); break;
        default:
            usage();
            exit_with_diagnostic("\nUnknown command line option");
            break;
        }
    }
    read_file_specs_from_cl(argc, argv);
}
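parse_command_line() relies on the usual getopt() contract: a letter followed by ':' in the option string takes an argument, which getopt() leaves in optarg, and once getopt() returns -1, optind indexes the first operand, which is why read_file_specs_from_cl() starts at optind. A cut-down sketch of that contract, assuming a hypothetical struct filter_opts that supports only the -i, -s and -u options:

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <unistd.h>

/* Hypothetical, cut-down option set: only -i, -s and -u from usage(). */
struct filter_opts {
    int case_sensitive;   /* cleared by -i */
    const char *status;   /* argument of -s, or NULL */
    const char *uri;      /* argument of -u, or NULL */
};

/*
 * Returns the index in argv of the first log file operand, or -1 if an
 * unknown option or a missing argument is seen. getopt() keeps its state
 * in globals (optind, optarg), so call this once per process.
 */
int parse_filter_args(int argc, char **argv, struct filter_opts *opts)
{
    int choice;

    opts->case_sensitive = 1;
    opts->status = NULL;
    opts->uri = NULL;

    /* "s:" means -s takes an argument, which getopt() stores in optarg. */
    while ((choice = getopt(argc, argv, "is:u:")) != -1) {
        switch (choice) {
        case 'i': opts->case_sensitive = 0; break;
        case 's': opts->status = optarg;    break;
        case 'u': opts->uri = optarg;       break;
        default:  return -1;  /* '?': unknown option or missing argument */
        }
    }
    return optind;  /* argv[optind..argc-1] are the file specs */
}
```

With argv = {"logfilter", "-i", "-s", "404", "access.log"}, the function returns 4, so argv[4] is the first log file.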

PRIVATE void read_file_specs_from_cl(int argc, char* argv[])
{
    int cl_counter;
    int file_spec_counter = 0;
    char* file_spec;

    assert(file_specs[0] == NULL);

    for (cl_counter = optind; cl_counter < argc; ++cl_counter) {
        // Leave the last slot NULL so the list stays terminated
        if (file_spec_counter >= MAX_FILE_SPECS - 1) {
            exit_with_diagnostic("Too many file specs\n");
        }
        file_spec = malloc(strlen(argv[cl_counter]) + 1);
        if (!file_spec) {
            exit_with_diagnostic("Failed to allocate buffer for file spec\n");
        }
        strcpy(file_spec, argv[cl_counter]);
        file_specs[file_spec_counter++] = file_spec;
    }

    // At least one log file operand is required
    if (file_spec_counter == 0) {
        usage();
    }
}
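read_file_specs_from_cl() duplicates each operand into the global file_specs array, which parse_command_line() zeroes beforehand. A stand-alone sketch of the same copy with an explicit capacity check and a guaranteed NULL terminator; copy_file_specs() and its parameters are hypothetical, and the sketch assumes the consumer of the list stops at the first NULL entry.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: duplicate argv[first..argc-1] into dst, which has
 * room for cap pointers. dst is always NULL-terminated, so at most cap - 1
 * specs are stored. Returns the number copied, or -1 if there are too many
 * specs or an allocation fails (anything already copied is then freed).
 */
int copy_file_specs(char *dst[], size_t cap, int first, int argc, char **argv)
{
    size_t n = 0;
    int i;

    if (cap == 0) {
        return -1;  /* no room even for the terminator */
    }
    for (i = first; i < argc; ++i) {
        size_t len;
        if (n + 1 >= cap) {          /* keep one slot for the NULL */
            goto fail;
        }
        len = strlen(argv[i]) + 1;   /* include the '\0' */
        dst[n] = malloc(len);
        if (dst[n] == NULL) {
            goto fail;
        }
        memcpy(dst[n], argv[i], len);
        ++n;
    }
    dst[n] = NULL;
    return (int)n;

fail:
    while (n > 0) {
        free(dst[--n]);              /* undo the partial copy */
    }
    dst[0] = NULL;
    return -1;
}
```

Freeing the partial copy on failure keeps the list in a single well-defined state (empty and terminated), which matters if the caller reports the error instead of exiting.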

PRIVATE void print_version(void)
{
    printf("%s version %s\n", PACKAGE_NAME, VERSION);
    exit(EXIT_SUCCESS);
}

PRIVATE char* all_tests(void)
{
    mu_run_test(entry_all_tests);
    mu_run_test(filter_all_tests);
    return NULL;
}

PRIVATE void execute_all_tests(void)
{
    int exit_code = EXIT_SUCCESS;
    char* result;

    result = all_tests();
    if (result != NULL) {
        printf("%s\n", result);
        exit_code = EXIT_FAILURE;
    } else {
        printf("ALL TESTS PASSED\n");
    }
    printf("Tests run: %d\n", tests_run);
    exit(exit_code);
}
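The -T option runs the built-in tests through mu_run_test() and the tests_run counter, names that match the MinUnit macros for C. Since the project's own test header is not part of this listing, the sketch below assumes the standard MinUnit definitions and exercises a hypothetical is_client_error() helper; all_tests() stops at the first failing test and returns its message, which is what execute_all_tests() prints.

```c
#include <stddef.h>

/* MinUnit-style macros (assumed; the project's own header is not shown). */
#define mu_assert(message, test) do { if (!(test)) return message; } while (0)
#define mu_run_test(test) do { char *message = test(); tests_run++; \
                               if (message) return message; } while (0)

int tests_run = 0;

/* Hypothetical function under test: is this an HTTP 4xx status code? */
static int is_client_error(int status)
{
    return status >= 400 && status <= 499;
}

static char *test_client_error(void)
{
    mu_assert("404 should be a client error", is_client_error(404));
    mu_assert("200 should not be a client error", !is_client_error(200));
    return NULL;
}

static char *test_boundaries(void)
{
    mu_assert("400 is the first 4xx code", is_client_error(400));
    mu_assert("500 is a server error", !is_client_error(500));
    return NULL;
}

/* Runs the tests in order; returns the first failure message, or NULL. */
char *all_tests(void)
{
    mu_run_test(test_client_error);
    mu_run_test(test_boundaries);
    return NULL;
}
```

Because each test returns on its first failed assertion, a failure message identifies exactly one check, and tests_run shows how far the suite got.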


