Knowledge Discovery Analysis

  • Uploaded by: Bridget Smith
  • June 2020
  • PDF


  • Words: 1,286
  • Pages: 7
By J. Swathi & D. Divya GMR Institute Of Technology, Rajam, Srikakulam District-532 127, Andhra Pradesh. E-mail: [email protected] [email protected]

ABSTRACT

In this world of exponential data growth, accessing the desired information, or extracting knowledge from data, is called Data Mining or KDD (Knowledge Discovery Analysis). KDD has mostly been used by artificial intelligence and machine learning researchers. This paper analyzes data from different perspectives to find relationships and patterns among dozens of fields in large relational databases, using the latest trends and methods. Data Warehousing is a repository of data gathered from multiple sources and stored under a unified schema at a single site. In this paper, we discuss Data Warehouse design using star and snowflake schemas. The Star schema is used most frequently because it has more advantages over the other schemas. Snowflake schemas normalize dimensions to eliminate redundancy. Both Data Mining and Data Warehousing are important in the present competitive market, with many applications such as Customer Retention, Marketing, Risk Assessment, Fraud Detection and others.

Introduction

In today’s fiercely competitive marketplace, companies have an insatiable need for information. Customer data, financial data and Internet click-stream data are a powerful asset, provided they can be integrated and utilized to enhance customer experiences. The ability to access meaningful data, and to move and share data throughout an organization between departments, officers and business partners in a timely, efficient manner through the use of familiar query and analytical tools, is critical.

DEF: A Database is a collection of non-redundant data which is sharable between different applications.

What is Data Mining?

Data Mining is defined as “the non-trivial extraction of implicit, previously unknown, potentially useful and understandable knowledge from data”. Data Mining is the process of finding correlations or patterns among dozens of fields in large relational databases.

Latest Trends in Technologies and Methods:

A number of Data Mining trends, in terms of technologies and methodologies, are currently being developed and researched. The trends identified include:

Distributed / Collective Data Mining: Mining information that is located in different places, in different physical locations, is generally known as distributed Data Mining. Distributed Data Mining (DDM) offers an alternative to traditional approaches to analysis by combining localized data analysis with a “global data model”.

Spatial and Geographic Data Mining: “The extraction of implicit knowledge, spatial relationships or other patterns not explicitly stored in spatial databases” is known as spatial Data Mining. Its applications are useful in remote sensing, medical, navigation, and related uses.

Time Series/Sequence Data Mining: Another important area in Data Mining centers on the mining of time series and sequence-based data. This involves the mining of a sequence of data. Sequential pattern mining focuses on the identification of recurring sequences.

Hypertext and Hypermedia Data Mining: Hypertext and Hypermedia Data Mining can be characterized as mining data which includes text, hyperlinks and text markups.

Phenomenal Data Mining: Phenomenal Data Mining focuses on the relationships between data and the phenomena which are inferred from the data.

is not went well in data ware project.
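The sequential pattern mining idea mentioned above can be illustrated with a minimal Python sketch. It is only one simple reading of the idea (counting contiguous subsequences that recur across sessions), not the method of any particular tool. The function name `frequent_sequences`, its parameters, and the click-stream sessions are hypothetical and invented for illustration.

```python
from collections import Counter


def frequent_sequences(sessions, length=2, min_support=2):
    """Return contiguous subsequences of `length` found in at least `min_support` sessions."""
    support = Counter()
    for session in sessions:
        # Collect each subsequence once per session, so the count is the
        # number of sessions that contain it (its support).
        seen = {tuple(session[i:i + length]) for i in range(len(session) - length + 1)}
        support.update(seen)
    return {seq: n for seq, n in support.items() if n >= min_support}


# Hypothetical click-stream sessions (invented data).
sessions = [
    ["home", "search", "product", "cart"],
    ["home", "search", "product"],
    ["search", "product", "cart"],
    ["home", "help"],
]

patterns = frequent_sequences(sessions, length=2, min_support=2)
```

With this data, `patterns` keeps the three page pairs that appear in two or more sessions and drops `("home", "help")`, which appears only once.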

Applications of Data Mining:

Data Mining collects, stores and organizes data for use in areas such as:

  • Data Mining and customer relationship management (CRM) software for solving business decision problems
  • Privacy of data in Insurance companies and Government agencies
  • Fraud detection in Telecommunications and stock exchanges
  • Medical diagnosis to detect abnormal patterns
  • Airline reservation to maximize seat utilization

What Is Data Warehousing?

A Data Warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It contains historical data derived from transaction data.

A Data Warehouse has the following characteristics:

  • Subject Oriented
  • Integrated
  • Nonvolatile
  • Time Variant

Subject Oriented:- The data in the warehouse is defined in business terms and is grouped under business-oriented subject headings such as customers, products, sales analysis reports and marketing campaigns, achieved through data modeling.

Integrated:- Data Warehouses must put data from disparate sources into a consistent format. They must resolve problems such as naming conflicts and inconsistencies among units of measure. When they achieve this, they are said to be integrated.

Non-volatile:- Once loaded into the Data Warehouse, the data is not updated. It acts as a stable resource for consistent reporting and comparative analysis.

Time-variant:- All data in the Data Warehouse is time-stamped at the time of entry into the warehouse, or when it is summarized within the warehouse, so that it acts as a chronological record and supports historical and trend analysis.
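The Integrated, Non-volatile and Time-variant characteristics described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not a prescribed design: the source names (`crm`, `billing`), the field names, the pound-to-kilogram conversion and the fixed load date are all hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical mapping from each source's field names to one warehouse naming scheme.
FIELD_NAMES = {
    "crm": {"cust_id": "customer_id", "weight_lb": "weight"},
    "billing": {"customerId": "customer_id", "weight_kg": "weight"},
}
# Factors that convert each source's unit of measure to kilograms.
TO_KG = {"crm": 0.45359237, "billing": 1.0}


def integrate(source, record, loaded_at):
    """Return a warehouse row with consistent names, units, and a load timestamp."""
    row = {FIELD_NAMES[source][key]: value for key, value in record.items()}
    row["weight"] = round(row["weight"] * TO_KG[source], 3)
    row["source"] = source
    row["loaded_at"] = loaded_at  # time-variant: every row is time-stamped
    return row


warehouse = []  # non-volatile: rows are only appended, never updated in place
now = datetime(2020, 6, 1, tzinfo=timezone.utc)  # hypothetical load time
warehouse.append(integrate("crm", {"cust_id": 7, "weight_lb": 10.0}, now))
warehouse.append(integrate("billing", {"customerId": 7, "weight_kg": 4.5}, now))
```

Both rows end up with the same column names and the same unit, which is the sense in which the naming conflict and the unit inconsistency have been resolved.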

Architecture of Data Warehouse:-

Three common architectures in data warehouses are:

  • Data Warehouse Architecture (Basic)
  • Data Warehouse Architecture (with a Staging Area)
  • Data Warehouse Architecture (with a Staging Area and Data Marts)

Data Warehouse Architecture (Basic): The metadata and raw data of a traditional online transaction processing (OLTP) system are present, as is an additional type of data, summary data. A summary in Oracle is called a materialized view.

Data Warehouse Architecture (with a Staging Area): Operational data needs to be cleaned and processed before it is put into the warehouse. This can be done programmatically, although most data warehouses use a staging area instead. A staging area simplifies building summaries and general warehouse management.

Data Warehouse Architecture (with a Staging Area and Data Marts): We may want to customize our warehouse's architecture for different groups within our organization. We can do this by adding data marts, which are systems designed for a particular line of business.
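The three architectures above can be pictured with a small sketch using Python's built-in `sqlite3` module. It is a toy illustration under several assumptions: all table names and rows are invented, SQLite has no materialized views so a precomputed table stands in for summary data, and a simple view stands in for a data mart, which in practice is usually a separate system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Staging area: raw operational rows land here unchanged.
    CREATE TABLE staging_sales (region TEXT, product TEXT, amount TEXT);
    -- Warehouse table: cleaned, typed rows.
    CREATE TABLE sales (region TEXT, product TEXT, amount REAL);
""")
conn.executemany(
    "INSERT INTO staging_sales VALUES (?, ?, ?)",
    [(" North ", "widget", "10.50"), ("north", "gadget", "4.50"),
     ("South", "widget", "n/a"), ("south", "widget", "20.00")],
)
# Clean while moving out of staging: normalize region names and
# skip rows whose amount does not start with a digit.
conn.execute("""
    INSERT INTO sales
    SELECT lower(trim(region)), product, CAST(amount AS REAL)
    FROM staging_sales
    WHERE amount GLOB '[0-9]*'
""")
# Summary data: a precomputed table stands in for a materialized view.
conn.execute("""
    CREATE TABLE sales_by_region AS
    SELECT region, SUM(amount) AS total FROM sales GROUP BY region
""")
# Data mart stand-in: a view scoped to one line of business (the north region).
conn.execute("CREATE VIEW north_mart AS SELECT * FROM sales WHERE region = 'north'")

summary = dict(conn.execute("SELECT region, total FROM sales_by_region"))
north_rows = conn.execute("SELECT COUNT(*) FROM north_mart").fetchone()[0]
```

Keeping the cleaning step between `staging_sales` and `sales` means the summary and the mart are built only from rows that passed the checks.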

Processes within a Data Warehouse:-

  • Extract and load the data
  • Clean and transform data into a form that can cope with large data volumes and provide good query performance
  • Backup and archive data
  • Manage queries, and direct them to the appropriate data sources

Schemas in Data Warehouse: A schema is a collection of database objects, including tables, views, indexes, and synonyms. Commonly used schemas are the Star schema and the Snowflake schema.

Star Schema: The star schema is the simplest schema. The entity-relationship diagram of this schema resembles a star. The center of the star consists of a large fact table and the points of the star are the dimension tables. A Star schema is characterized by one or more fact tables and dimension tables. The main advantages of star schemas are that they:

  • Provide a direct and intuitive mapping between the business entities being analyzed by end users and the schema design.
  • Are widely supported by a large number of business intelligence tools.

A star join is a primary key to foreign key join of the dimension tables to a fact table.

Snowflake Schema: The Snowflake schema is a more complex data warehouse model than a star schema, and is a type of star schema. The diagram of the schema resembles a snowflake. Snowflake schemas normalize dimensions to eliminate redundancy.
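The star join and the snowflake normalization described above can be tried out with Python's built-in `sqlite3` module. The sketch below is a minimal illustration, not a recommended warehouse design: the table names (`fact_sales`, `dim_product`, `dim_date`, `dim_category`) and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Snowflaked part: category is split out of the product dimension,
    -- so each category name is stored once.
    CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY,
        name TEXT,
        category_id INTEGER REFERENCES dim_category(category_id)
    );
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
    -- Fact table at the center: foreign keys to each dimension plus a measure.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id INTEGER REFERENCES dim_date(date_id),
        amount REAL
    );
    INSERT INTO dim_category VALUES (1, 'tools');
    INSERT INTO dim_product VALUES (1, 'hammer', 1), (2, 'wrench', 1);
    INSERT INTO dim_date VALUES (1, 2019), (2, 2020);
    INSERT INTO fact_sales VALUES (1, 1, 10.0), (2, 1, 5.0), (1, 2, 7.5);
""")

# Star join: the fact table joined to its dimensions on primary/foreign keys.
by_product_year = conn.execute("""
    SELECT p.name, d.year, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY p.name, d.year
    ORDER BY p.name, d.year
""").fetchall()

# Snowflake join: one extra hop from product to its normalized category.
by_category = conn.execute("""
    SELECT c.name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_category AS c ON c.category_id = p.category_id
    GROUP BY c.name
""").fetchall()
```

The trade-off shows up in the second query: the category name is stored only once, but reaching it costs an additional join compared with a pure star schema, where the category would be a column of `dim_product`.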

Conclusion:

This paper has focused on the importance, major trends and methods of Data Mining, as well as the architecture and design of a Data Warehouse using the various schemas involved in managing it effectively. It would not be overly optimistic to say that Data Mining has a bright and promising future, and that the years to come will bring many new developments, methods, and technologies. The field of Data Mining is still young enough that the possibilities are still limitless. Data Mining tools are continually evolving, building on ideas from the latest scientific research. Many of these tools incorporate the latest algorithms taken from AI, neural networks, statistics and optimization.

A Data Warehouse usually contains historical data derived from transaction data, but it can include data from other sources. The choice of schema model for a Data Warehouse should be based upon the requirements and preferences of the Data Warehouse project team. Star schemas are widely supported by a large number of business intelligence tools, whereas Snowflake schemas normalize dimensions to eliminate redundancy.
