Eighth ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing
0-7695-2909-7/07 $25.00 © 2007 IEEE DOI 10.1109/SNPD.2007.117

Study and Application of Web-based Data Mining in E-Business

Yanguang Shen
College of Information and Electronic Engineering, Hebei University of Engineering, Handan, 056038, China
[email protected]

Lili Xing
College of Information and Electronic Engineering, Hebei University of Engineering, Handan, 056038, China
[email protected]

Yiting Peng
National Natural Science Foundation of China, Beijing, 100085, China

Abstract
This paper discusses applications of Web mining in e-business: the intelligent search engine, customer relationship management, personalized service and commercial credit evaluation. Web mining techniques analyze and reason over the mass of information in e-business to uncover latent patterns and predict customer behavior, which helps enterprise decision-makers adjust their marketing strategy, reduce risk, make sound decisions and gain competitive advantage.

1. Introduction
Currently, e-business, with its advantages of low cost, high efficiency and freedom from the restrictions of time and space, is gradually spreading across the globe. At the same time it faces many problems. How can an enterprise collect information about its internal and external environment comprehensively, accurately and promptly through the Internet, especially the hidden information that is key to its success, so as to adapt to market changes and become more competitive? How can it adjust its marketing strategy and optimize the structure of its website and its mode of service according to customers' access habits, so as to make the website more efficient, improve customer relationship management and realize the lifetime value of customers? How can enterprises cater to customers, make recommendations proactively and offer them personalized service? These problems are key to the success of e-business. In this new mode of business, how to organize and make full use of the abundant information on the Internet efficiently, so that data owners can discover valuable information and then make the right business decisions, has become an urgent problem for e-business operators. The rapidly developing technology of Web-based data mining offers an effective approach to the problems that e-business faces.

2. Web mining
2.1. Summarization of Web mining
Data mining is the nontrivial process of extracting previously unknown and potentially useful information and knowledge from massive, incomplete, noisy, fuzzy and random data. Web mining applies data mining to the Web: it tries to extract interesting, potentially useful and hidden information from the documents and activities on the Web, helping people derive knowledge from the WWW. It is a cross-disciplinary field that draws on databases, data mining, artificial intelligence, information retrieval, natural language understanding and so on.
2.2. Data sources of Web mining in e-business
E-business produces a large volume and variety of data that can be used for data mining analysis. The types of data that Web mining can use to produce different knowledge patterns are usually as follows [1]:
1. Server data. Customers leave log data on Web servers when they visit a site. These log data are usually stored on the server as document files, generally including server logs, error logs, cookie logs and so on.
2. Query data. Query data is a typical kind of data produced on e-business Web servers. For example, customers of an online store may search for products and advertisement information, and this query information is related to the server log through cookies or registration information.
3. On-line market data. This data mainly concerns the e-business website, customers' purchases, merchandise and so on, and is stored in traditional relational databases.
4. Web pages. Web pages include HTML or XML pages, which comprise text, pictures, audio, video and so on.
5. Hyperlinks between Web pages. Hyperlinks are an important resource that indicates the link relationships between pages.
6. Customer registration information. This is the information that customers have to enter via a Web page and submit to the server. It is usually about the demographic characteristics of users. In Web mining, customer registration information should be integrated with visiting logs to improve the accuracy of data mining and produce more knowledge about customers.
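As a minimal sketch of integrating customer registration information with visiting logs, assume that every log record carries a cookie identifier and that registration records are keyed by the same identifier. The field names (`cookie_id`, `url`, `age_group`, `region`) and the sample records are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

# Hypothetical visit-log records; a real server log would need parsing first.
visit_log = [
    {"cookie_id": "c1", "url": "/laptops"},
    {"cookie_id": "c2", "url": "/phones"},
    {"cookie_id": "c1", "url": "/laptops/x200"},
    {"cookie_id": "c3", "url": "/phones"},
]

# Hypothetical registration records keyed by the same cookie identifier.
registration = {
    "c1": {"age_group": "25-34", "region": "north"},
    "c2": {"age_group": "35-44", "region": "south"},
}


def enrich_visits(log, reg):
    """Attach the registration profile to each visit; unregistered visitors get None."""
    return [{**record, "profile": reg.get(record["cookie_id"])} for record in log]


def visits_by_attribute(enriched, attribute):
    """Count page visits per value of one demographic attribute."""
    counts = defaultdict(int)
    for record in enriched:
        if record["profile"] is not None:
            counts[record["profile"][attribute]] += 1
    return dict(counts)


enriched_visits = enrich_visits(visit_log, registration)
region_counts = visits_by_attribute(enriched_visits, "region")
```

Once visits carry demographic attributes, later mining steps can be run per customer group rather than over anonymous log lines only.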
2.3. Knowledge schemas that Web mining can acquire
Web mining techniques can extract relevant knowledge schemas from the various data sources of a website, guiding site operators to work better and provide better services to customers. Web mining on a website can usually yield the following knowledge schemas.
1. Path analysis. Path analysis determines the most frequently visited paths in a Web site. Through it we can find important pages and improve the design of Web pages and the structure of the site.
2. The discovery of association rules. In e-business, association rule discovery finds the relationships among the documents that customers visit on the website, as well as the correlations between pages and between purchases. With these correlations in mind, the site can be better organized and effective marketing strategies can be put into practice to increase cross-sales. It also greatly reduces customers' burden of filtering information.
3. The discovery of sequential patterns. Sequential pattern discovery finds, in transaction sequences ordered by time stamp, internal patterns of the form "some items follow another". It helps e-businesses predict client access patterns and offer targeted advertising services to clients. The discovered sequential patterns can also help the server choose targeted pages to meet the specific requirements of visitors.
4. Classification and prediction. Classification discovery describes the common attributes of a particular group, and this description can be used to classify new items. The purpose of classification is to map the data items in the database to one of the given classes by constructing a classification model, or classifier, that can then be used for prediction. That is, historical records are used to derive a description of the given data automatically, so that future data can be predicted and business activities suited to a particular class of customers can be carried out.
5. Cluster analysis. Cluster analysis groups customers with similar characteristics based on their Web-visiting information. Clustering customer information or data items in Web logs facilitates the development and implementation of future marketing strategies, such as automatically sending sales email to a specific customer cluster or recommending specific commodities to the customers of a certain cluster. For e-business, customer clustering provides strong support for market segmentation. By extracting features from clustered customers, an e-business website can provide its customers with personalized service.
6. Anomaly detection. Anomaly detection describes the rare and extreme cases of the analyzed object and reveals their internal causes, so as to reduce operational risk. In e-business it is applied to credit card fraud screening, unusual customer detection, network intrusion detection and so on [2].
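The association rule discovery of Section 2.3 is commonly expressed through two measures, support and confidence. The sketch below computes them for pairs of items only; it is an illustration under simplifying assumptions, not the algorithm used by the cited works. The function name `pair_rules`, the thresholds and the sample sessions are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support, min_confidence):
    """Return rules (lhs, rhs, support, confidence), read as 'lhs implies rhs'."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for items in transactions:
        unique = sorted(set(items))
        item_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of transactions that contain both items
        if support < min_support:
            continue
        # Each frequent pair yields two candidate rules, one per direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


# Hypothetical purchase sessions.
sessions = [
    ["camera", "memory_card"],
    ["camera", "memory_card", "tripod"],
    ["camera", "tripod"],
    ["memory_card"],
]
rules = pair_rules(sessions, min_support=0.5, min_confidence=0.6)
```

In this sample, every session containing a tripod also contains a camera, so the rule from tripod to camera has confidence 1.0, while the pair of memory card and tripod falls below the support threshold and produces no rule.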
3. Application of Web mining to e-business
3.1. Intelligent search engine based on Web mining
Current search engines have disadvantages such as low precision and a large amount of useless information in their results, so e-business enterprises cannot acquire enough crucial information to enhance their competitiveness. Web mining technology can be combined with the search engine to build an intelligent search engine that meets the needs of e-business enterprises. Search engines mainly adopt the following aspects of Web mining: automatic document classification, automatic abstract formation, online clustering of retrieval results, relevance ranking and the personalized search engine [3].

Sorting the retrieval results by document classification helps users locate the knowledge they want quickly. Most search engines automatically take the first several sentences of a document and build an abstract with a fixed number of words, so the abstract reflects the document's information incompletely. Automatic abstract formation can resolve this problem and help users get the retrieved information more accurately, more conveniently and faster. Clustering the retrieved document set groups relevant documents together and separates them from irrelevant ones. The processed information can be presented to users as a visual hierarchy of hyperlinks, and users can then choose the clusters they are interested in, which reduces the number of Web pages to be browsed.

A search engine combined with the personalization technology of Web mining can learn the intrinsic characteristics of data objects from numerous training samples and extract information purposefully according to them. It expands its store of keywords that users have searched for according to users' preferences, so that the retrieval results meet users' needs more closely. Alternatively, it can build a library of user interests by analyzing the information users have browsed. In short, the personalized search engine can enhance the recall and precision of search.
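To illustrate how automatic abstract formation differs from simply taking the first sentences of a document, the sketch below scores each sentence by the average document-wide frequency of its words and keeps the top-scoring sentences in their original order. This is one simple extractive heuristic chosen for illustration, not the method of [3]; the stop-word list, function names and sample text are assumptions.

```python
import re
from collections import Counter

# A tiny, assumed stop-word list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "for", "on", "it", "with"}


def split_sentences(text):
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase alphabetic words with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def extractive_abstract(text, max_sentences=2):
    """Keep the sentences whose words are most frequent in the whole document."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])  # restore document order
    return " ".join(sentences[i] for i in chosen)


document = (
    "Welcome to our site. Web mining extracts patterns from Web logs. "
    "Mining Web logs reveals customer patterns. Contact us today."
)
abstract = extractive_abstract(document)
```

For this sample the greeting in the first sentence is skipped in favor of the two sentences that carry the document's recurring terms.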
3.2. Application of Web mining to Customer Relationship Management (CRM)
The core of CRM is, on the one hand, to discover potential markets and customers by collecting effective data about customers and their activities, and on the other hand, to meet customers' needs and realize customers' lifetime value by improving customer service and analyzing customers in depth. CRM provides traditional enterprises with the management systems and technical means for their survival in the era of the network economy. It requires enterprises to move from a "product-oriented" model to a "customer-oriented" one.

3.2.1. Application of Web mining to CRM. Web mining can help enterprises identify customers' features, which enables enterprises to provide targeted services. Web mining in e-business CRM covers several aspects, such as the acquisition and retention of customers, identification of customer value, analysis of customer satisfaction and improvement of site structure. With Web mining, we can understand the dynamic behavior of visitors and optimize the operation of e-business websites.

We can put the large number of customers acquired into different categories and provide personalized services for customers from different categories, which improves customer satisfaction and thus retains existing customers. By analyzing the records of the pages a new visitor browses, we can determine which category the visitor belongs to and whether he or she is potentially profitable. We can then treat different customers accordingly, reduce the cost of sales, increase the conversion rate from visitors to buyers, and so uncover potential customers. Customers with similar browsing behaviors are grouped together and their common features are extracted; this clustering helps e-business enterprises better understand customers' interests, consumption habits and trends, predict customers' needs, recommend specific commodities to them accordingly and realize cross-selling [4]. As a result, the trading volume and the rate of successful trades increase and the efficiency of distribution improves.

In addition, the structure and content of the site are key to customers' interest. With the discovery of association rules, we can rearrange them dynamically for different customers and place together commodities whose association meets a certain degree of support and confidence, to promote sales. By means of path analysis we can identify the paths along which a category of customers frequently visits the site. These paths reflect the order in which such customers visit the pages of the site and their habits. We can hyperlink the related documents that customers have visited so that they can reach their preferred pages easily. Such a site leaves a good impression on customers, strengthens their loyalty, arouses their interest, prolongs the time they spend on the site and increases their chance of visiting again.

By Web mining, we can acquire reliable market feedback to evaluate the return on advertising investment and decide whether the online marketing approach is successful. According to the browsing patterns of visitors interested in a certain product, we can decide where to place advertisements, which makes them better targeted, raises the return on advertising investment and reduces companies' operating costs.
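The path analysis used in Section 3.2.1 to find the paths that a category of customers follows frequently can be sketched by counting contiguous page sequences across sessions. The function name `frequent_paths`, the path length and the sample sessions are hypothetical; a production system would work on sessions reconstructed from real logs.

```python
from collections import Counter


def frequent_paths(sessions, length=2, top=3):
    """Count contiguous page sequences of the given length and return the most common."""
    counts = Counter()
    for pages in sessions:
        for i in range(len(pages) - length + 1):
            counts[tuple(pages[i:i + length])] += 1
    return counts.most_common(top)


# Hypothetical sessions of one customer category.
category_sessions = [
    ["/home", "/phones", "/phones/p1", "/cart"],
    ["/home", "/phones", "/phones/p1"],
    ["/home", "/laptops", "/laptops/l1"],
    ["/search", "/phones", "/phones/p1", "/cart"],
]
top_paths = frequent_paths(category_sessions, length=2, top=1)
```

Here the step from the phone listing to product page `/phones/p1` occurs in three sessions, which suggests that this link deserves a prominent place in the site structure for this category of customers.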
3.2.2. Safeguarding the privacy of customers. Safeguarding the privacy of customers is a basic part of commercial operation that cannot be ignored. An e-business enterprise should therefore avoid mining data about an individual customer. Customers' privacy should be protected through both technology and management. On the technology side, enterprises usually adopt encrypted identifiers and minimize data mining on individual customers. On the management side, many enterprises have added the position of Chief Privacy Officer, who strikes an appropriate balance between the individual's demand for privacy and the enterprise's right to use private data in a reasonable way [5]. Each e-business enterprise is itself the party responsible for protecting its customers' privacy. In addition, industry self-regulation is an effective way to protect customers' privacy. At present, e-business websites are increasingly inclined to build their image with customers through self-regulation, so that customers can submit data without worry.
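One possible reading of the "encrypted identifier" mentioned in Section 3.2.2 is to replace each customer identifier with a keyed hash before the data reaches the mining stage. The sketch below uses HMAC-SHA256 from the Python standard library for this purpose; this is an assumption made for illustration, since the paper does not name a specific scheme. The key value, field names and sample records are hypothetical.

```python
import hashlib
import hmac

# Placeholder key; in practice it would be stored apart from the mining data.
SECRET_KEY = b"replace-with-a-key-kept-outside-the-dataset"


def pseudonymize(customer_id: str, secret_key: bytes) -> str:
    """Map a customer identifier to a stable token that hides the original value."""
    return hmac.new(secret_key, customer_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical raw records as they might come out of the log preprocessing step.
raw_records = [
    {"customer_id": "alice@example.com", "page": "/cart"},
    {"customer_id": "bob@example.com", "page": "/phones"},
    {"customer_id": "alice@example.com", "page": "/checkout"},
]

# Records handed to the mining stage carry only the token, not the identifier.
mining_records = [
    {"customer": pseudonymize(r["customer_id"], SECRET_KEY), "page": r["page"]}
    for r in raw_records
]
```

The same customer always maps to the same token, so sessions and clusters can still be formed, while the mining data set no longer contains the identifier itself.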
3.3. Application of Web mining to the personalized service recommendation system
Setting up a personalized service recommendation system in e-business helps enterprises implement better CRM. Such a system puts the "customer-oriented", "one-to-one" sales principle into practice. The personalized service recommendation system mainly applies the ideas and methods of data mining to resources such as Web server logs and Web databases [6]. It mines the regular visiting patterns of users, puts users into particular categories and recommends Web pages accordingly. By constantly tracking users' current access, the system can adjust the recommendation set in time to provide personalized access. It is composed of five modules: data collection, data preprocessing, data storage, offline mining and online recommendation. The system construction model is shown in Figure 1.
[Figure 1 diagram. Labels recoverable from the original: data collection module (Web database, usage log library, site files); data preprocess module (data cleaning, user identification, session identification, transaction identification, path complement); data storage module (user transaction library); offline mining module (data mining engine, mode discovery, mode rule set, mode analysis, useful mode set); online recommendation module (recommendation engine, recommendation rule set, user session); Web server; personalized service; improvement of the structure of site; network centre.]
Figure 1. Construction model of personalized service recommendation system based on Web mining.

The data collection module collects data such as Web databases and usage logs to prepare for later mining. The data preprocess module preprocesses the collected data; its process includes data cleaning, user identification, session identification, transaction identification and so on. The quality of data preprocessing strongly affects the efficiency and result of mining. The data storage module stores the preprocessed data in the transaction library. The mining engine of the offline mining module uses the data mining techniques in the algorithm library, such as statistical analysis, association rules, cluster analysis and sequential patterns. It discovers the browsing modes of users, then analyzes and interprets them through mode analysis. Based on the practical application, it turns the statistical results, discovered rules and modes into knowledge through observation and selection. The useful modes that have been selected can guide the practice of e-business.

The online recommendation module places a recommendation engine in front of the Web server. It generates the corresponding recommendation set by combining the user's current browsing with the recommendation set of Web pages the user has browsed. The pages of the recommendation set are then added to the newly requested pages, which are transmitted to the user's browser through the Web server, realizing real-time personalized service. At the same time, the recommendation results are transmitted to the site management centre so that it can adjust the site design, optimize its structure and enhance its efficiency.

In short, the personalized service recommendation system based on data mining works in two stages. The first stage is offline learning; the second is online use of the modes. Feature extraction for mining and recommendation and the generation of rules are processed offline, whereas online services are provided through the online recommendation engine when users access the site. The online and offline modules are interrelated: the online module offers recommendations based on the rule models provided by the offline module, and the offline module generates the corresponding rules from the data accumulated online using recommendation algorithms. The mining algorithm and recommendation strategy can be chosen according to the needs of different kinds of sites. The mining results and recommendation set are fed back to users by the recommendation engine.

After users log in to the site, their access information is recorded on the server. After preprocessing, these data are processed in the dedicated data mining module, using mode identification and mode analysis with a concrete mining algorithm and recommendation strategy. Users' access information is also transmitted to the recommendation engine, which extracts the corresponding user's mining results and recommendation set from the mining module and feeds them back to the user visually to realize personalized services.
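Two steps of the system in Section 3.3 can be sketched in a few lines: session identification in the preprocessing module and rule matching in the online recommendation engine. The 30-minute inactivity threshold is a common heuristic assumed here, since the paper does not specify one. The function names, the rule format `(visited_page, recommended_page, confidence)` and all sample data are hypothetical.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds of inactivity; an assumed threshold


def identify_sessions(requests, timeout=SESSION_TIMEOUT):
    """Group (user_id, timestamp, url) requests into per-user sessions."""
    by_user = defaultdict(list)
    for user_id, timestamp, url in sorted(requests, key=lambda r: (r[0], r[1])):
        by_user[user_id].append((timestamp, url))

    sessions = []
    for user_id, hits in by_user.items():
        current = [hits[0][1]]
        for (previous_ts, _), (timestamp, url) in zip(hits, hits[1:]):
            if timestamp - previous_ts > timeout:
                # A long gap closes the current session and starts a new one.
                sessions.append((user_id, current))
                current = []
            current.append(url)
        sessions.append((user_id, current))
    return sessions


def recommend(current_session, rules, limit=3):
    """Return unseen pages whose rule antecedent appears in the current session."""
    seen = set(current_session)
    best = {}
    for visited, suggested, confidence in rules:
        if visited in seen and suggested not in seen:
            best[suggested] = max(best.get(suggested, 0.0), confidence)
    # Highest confidence first; ties broken by page name for stable output.
    return sorted(best, key=lambda page: (-best[page], page))[:limit]


# Hypothetical request log: (user_id, timestamp in seconds, url).
requests = [
    ("u1", 0, "/home"),
    ("u1", 600, "/phones"),
    ("u1", 4000, "/home"),
    ("u2", 100, "/search"),
]
sessions = identify_sessions(requests)

# Hypothetical rule set, standing in for the output of the offline mining module.
rule_set = [
    ("/phones", "/phones/p1", 0.8),
    ("/phones", "/accessories", 0.5),
    ("/laptops", "/laptops/l1", 0.9),
]
suggestions = recommend(["/home", "/phones"], rule_set)
```

The split mirrors the two stages described above: sessions are built and rules are mined offline, while `recommend` is cheap enough to run on each request as the user browses.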
3.4. Application of Web-based data mining to commercial credit evaluation
A well-developed social credit system is an important foundation for the development of e-business. Web mining can effectively guard against investment and operational risks by computing statistics on the differences between website data and historical records and on the deviation between results and expectations, and by fully analyzing abnormal examples. In addition, data mining technology lets us track the operation of enterprises in order to evaluate enterprise assets, analyze profitability, predict development potential, form a sound security assurance system, monitor and control the whole process online, supervise free presses online, uphold enterprise credibility, and strengthen the security management of Internet transactions and online payment. With a data mining credit evaluation model, we can mine the large volume of historical transaction data to find the characteristics of customers' transactions and establish customer credit levels, so as to prevent and reduce credit risks effectively and improve the enterprise's ability to assess credit and manage risk [7].

4. Conclusions
E-business-oriented Web mining can uncover the knowledge hidden behind massive data, which can guide e-business enterprises to increase sales, improve the enterprise-customer relationship and increase the operating efficiency of the website. The development and application of e-business-oriented Web mining has good prospects and will receive more and more attention.

5. References
[1] Zhong Xie, "Recommendation System of Commercial Site Research Based on Web Data Mining", Master's Degree Thesis of Southwest Normal University, 2002(05), pp. 8-10.
[2] Fengzhao Yang, and Hui Bai, "Application of Anomaly Detection in E-business", Information Magazine, Xi'an, 2005(12), pp. 51-53.
[3] Yan Li, Xinzhong Chen, and Bingru Yang, "Research on Web Mining-based Intelligent Search Engine", Computer Engineering and Applications, Beijing, 2002(04), pp. 34-36.
[4] Xiang Su, Weiling Jiao, and Pei Wu, "Application and Study of Web Mining", Information Science, Theory & Application, Beijing, 2005, 28(06), pp. 651-655.
[5] Jing Hao, "Application of Data Mining in the CRM of E-business", Master's Degree Thesis of Wuhan University, 2005(05), pp. 47-53.
[6] Fenghui Li, "On Ec-oriented Web Data Mining", Master's Degree Thesis of Shandong Science and Technology University, 2004(06), pp. 35-79.
[7] Chuiwei Lu, "Study and Application of Data Mining Technology in E-business", Market Modernization, Beijing, 2006(04), pp. 87.