Data Mining

  • November 2019
  • Words: 1,257
  • Pages: 3
A08413 - Nguyễn Đình Tiến

Data mining is simply filtering through large amounts of raw data for useful information that gives a business a competitive edge. This information is made up of meaningful patterns and trends that are already in the data but were previously unseen. The most popular tool used in mining is artificial intelligence (AI). AI technologies try to work the way the human brain works: by making intelligent guesses, learning by example, and using deductive reasoning. Some of the more popular AI methods used in data mining include neural networks, clustering, and decision trees.

Neural networks derive rules for using the data, based on the connections they find or on a sample set of data. The software continually analyses values and compares them with other factors, repeating these comparisons until patterns emerge. These patterns are known as rules. The software then looks for other patterns based on these rules, or sends out an alarm when a trigger value is hit.

Clustering divides data into groups based on similar features or limited data ranges. Clusters are used when data isn't labelled in a way that is favourable to mining.
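Clustering can be made concrete with k-means (Lloyd's algorithm), one widely used clustering method. The text does not name a particular algorithm, so this choice is an assumption. The minimal sketch below groups unlabelled two-dimensional records by distance. The function `kmeans`, its parameters and the sample records are hypothetical and for illustration only.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Group unlabelled points into k clusters (Lloyd's algorithm).

    Returns (centroids, labels), where labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    # Start from k points chosen at random as the initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing, so the clustering has converged
        labels = new_labels
    return centroids, labels


if __name__ == "__main__":
    # Hypothetical unlabelled records, e.g. (claim amount, days to report) pairs.
    records = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (8.0, 8.2), (8.3, 7.9), (7.9, 8.1)]
    centroids, labels = kmeans(records, k=2)
    print(labels)  # records in the same group share a cluster index
```

Because k-means needs the number of clusters `k` up front and starts from randomly chosen centroids, results can depend on `seed`. Fixing the seed keeps the sketch reproducible.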
For instance, an insurance company that wants to detect fraud won't have its records labelled as fraudulent or not fraudulent. But after analysing patterns within clusters, the mining software can start to figure out the rules that point to which claims are likely to be false.

Decision trees, like clusters, separate the data into subsets and then analyse the subsets to divide them into further subsets, and so on (for a few more levels). The final subsets are then small enough that the mining process can find interesting patterns and relationships within the data.
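The recursive splitting described for decision trees can be sketched in a few lines. This is a minimal, CART-style sketch under stated assumptions: each split is chosen by Gini impurity (the text does not say how splits are picked), the rows come with known labels, and splitting stops at a fixed depth or when a subset is pure. The names `build_tree`, `best_split`, `predict`, `max_depth` and the sample rows are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a subset: 0.0 when every label is the same."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return (score, feature, threshold) for the purest split, or None."""
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue  # this threshold does not divide the data
            # Weighted impurity of the two subsets; lower means purer.
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, feature, threshold)
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Split the data into subsets, then split those subsets, up to max_depth levels."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or gini(labels) == 0.0:
        return majority  # leaf: depth limit reached or subset is pure
    split = best_split(rows, labels)
    if split is None:
        return majority  # leaf: no threshold separates these rows
    _, feature, threshold = split
    left = [i for i, row in enumerate(rows) if row[feature] <= threshold]
    right = [i for i, row in enumerate(rows) if row[feature] > threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree([rows[i] for i in left], [labels[i] for i in left], depth + 1, max_depth),
        "right": build_tree([rows[i] for i in right], [labels[i] for i in right], depth + 1, max_depth),
    }


def predict(tree, row):
    """Follow the split conditions from the root down to a leaf label."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


if __name__ == "__main__":
    # Hypothetical labelled rows: (claim amount in thousands, prior claims).
    rows = [[2.0, 0], [3.0, 1], [10.0, 0], [11.0, 4], [12.0, 5]]
    labels = ["low", "low", "low", "high", "high"]
    tree = build_tree(rows, labels)
    print(tree)
    print(predict(tree, [12.5, 3]))
```

Unlike the clustering case, a tree built this way needs labelled examples. Each leaf is one of the "final subsets" the text describes, summarised here by its most common label.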

Once the data to be mined is identified, it should be cleansed. Cleansing frees the data from duplicate information and erroneous values. Next, the data should be stored in a uniform format within relevant categories or fields. Mining tools can work with all types of data storage, from large data warehouses to smaller desktop databases to flat files. Data warehouses and data marts are storage methods that involve archiving large amounts of data in a way that makes it easy to access when necessary.

When the mining process is complete, the software generates a report. An analyst goes over the report to see whether further work needs to be done, such as refining parameters, using other data analysis tools to examine the data, or even scrapping the data if it's unusable. If no further work is required, the report proceeds to the decision makers for appropriate action.

The power of data mining is being used for many purposes, such as analysing Supreme Court decisions, discovering patterns in health care, pulling stories about competitors from newswires, resolving bottlenecks in production processes, and analysing sequences in the human genetic makeup. There really is no limit to the type of business or area of study where data mining can be beneficial.
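The preparation steps described above (cleansing the data of duplicates and errors, then storing it in a uniform format within fields) can be sketched as follows. The record layout with `customer`, `region` and `amount` fields, the validation rules and the function name `cleanse` are hypothetical assumptions chosen for illustration. Real cleansing rules depend on the data source.

```python
import math


def cleanse(records):
    """Return records in a uniform format, without erroneous rows or duplicates.

    Hypothetical schema: each record is a dict with "customer", "region" and
    "amount" keys, and "amount" must be a finite, non-negative number.
    """
    cleaned = []
    seen = set()
    for rec in records:
        # Uniform format: trimmed, consistently cased text fields.
        customer = str(rec.get("customer") or "").strip().title()
        region = str(rec.get("region") or "").strip().upper()
        try:
            amount = round(float(rec.get("amount")), 2)
        except (TypeError, ValueError):
            continue  # erroneous: amount missing or not numeric
        if not customer or not region or not math.isfinite(amount) or amount < 0:
            continue  # erroneous: empty required field or out-of-range value
        key = (customer, region, amount)
        if key in seen:
            continue  # duplicate of a record that was already kept
        seen.add(key)
        cleaned.append({"customer": customer, "region": region, "amount": amount})
    return cleaned


if __name__ == "__main__":
    raw = [
        {"customer": " alice smith ", "region": "north", "amount": "120.50"},
        {"customer": "Alice Smith", "region": "NORTH", "amount": 120.5},  # duplicate
        {"customer": "Bob Jones", "region": "south", "amount": "n/a"},  # not numeric
        {"customer": "", "region": "east", "amount": 10},  # missing customer
        {"customer": "Dan Brown", "region": " east", "amount": "75"},
    ]
    for row in cleanse(raw):
        print(row)
```

Normalising fields before checking for duplicates is the design choice that matters here: " alice smith " and "Alice Smith" only count as the same record once both are trimmed and cased the same way.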
