By J. Swathi & D. Divya, GMR Institute of Technology, Rajam, Srikakulam District-532 127, Andhra Pradesh.
ABSTRACT
In this world of exponentially growing data, accessing the desired information, that is, extracting knowledge from data, is called Data Mining or KDD (Knowledge Discovery in Databases). KDD has mostly been used by artificial intelligence and machine learning researchers. This paper analyzes data from different perspectives to find relationships and patterns among dozens of fields in large relational databases using the latest trends and methods. A Data Warehouse is a repository of data gathered from multiple sources and stored under a unified schema at a single site. In this paper, we discuss Data Warehouse design using star and snowflake schemas. The star schema is used most frequently because it has several advantages over the other schemas, while snowflake schemas normalize dimensions to eliminate redundancy. Both Data Mining and Data Warehousing are important in the present competitive market, with applications such as Customer Retention, Marketing, Risk Assessment, Fraud Detection and others.
Introduction
In today’s fiercely competitive marketplace, companies have an insatiable need for information. Customer data, financial data and Internet click-stream data are a powerful asset, provided they can be integrated and utilized to enhance customer experiences. The ability to access meaningful data, and to move and share data throughout an organization between departments, officers and business partners in a timely, efficient manner through familiar query and analytical tools, is critical.
DEF: A Database is a collection of nonredundant data which is sharable between different applications.
What is Data Mining?
Data Mining is defined as “the non-trivial extraction of implicit, previously unknown, potentially useful and understandable knowledge from data”. Data Mining is the process of finding correlations or patterns among dozens of fields in large relational databases.
Latest Trends in Technologies and Methods:
Many Data Mining trends, in terms of both technologies and methodologies, are currently being developed and researched. The trends identified include:
Distributed / Collective Data Mining: Mining information that is located in different physical locations is generally known as distributed Data Mining. Distributed Data Mining (DDM) offers an alternative to traditional centralized analysis by combining localized data analysis with a “global data model”.
Spatial and Geographic Data Mining: “The extraction of implicit knowledge, spatial relationships or other patterns not explicitly stored in spatial databases” is known as spatial Data Mining. Its applications are useful in remote sensing, medical, navigation, and related uses.
Time Series/Sequence Data Mining: Another important area in Data Mining centers on the mining of time series and sequence-based data. This involves the mining of a sequence of data. Sequential pattern mining focuses on the identification of sequences.
Hypertext and Hypermedia Data Mining: Hypertext and Hypermedia Data Mining can be characterized as mining data which includes text, hyperlinks and text markups.
Phenomenal Data Mining: Phenomenal Data Mining focuses on the relationships between data and the phenomena which are inferred from the data.
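As a rough illustration of the sequential pattern mining mentioned under Time Series/Sequence Data Mining, the sketch below counts ordered pairs of events that occur in at least a minimum number of sequences. It is a deliberately simplified support count, not a full sequential pattern mining algorithm; the function name `frequent_ordered_pairs` and the click-stream sessions are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations

def frequent_ordered_pairs(sequences, min_support):
    """Return ordered pairs (a, b), with a occurring before b,
    that appear in at least min_support of the input sequences."""
    support = Counter()
    for seq in sequences:
        # Collect each ordered pair once per sequence, so support counts sequences.
        pairs_in_seq = set()
        for i, j in combinations(range(len(seq)), 2):
            pairs_in_seq.add((seq[i], seq[j]))
        support.update(pairs_in_seq)
    return {pair: n for pair, n in support.items() if n >= min_support}

# Hypothetical click-stream sessions (illustrative data only).
sessions = [
    ["home", "search", "product", "cart"],
    ["home", "product", "cart"],
    ["search", "product", "home"],
]
patterns = frequent_ordered_pairs(sessions, min_support=2)
```

With these sessions, a pair such as ("product", "cart") is kept because it occurs in order in two sessions, while ("search", "home") occurs in only one and is dropped.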
Applications of Data Mining:
Data Mining collects, stores and organizes data for use in areas such as:
• Data Mining and customer relationship management (CRM) software for solving business decision problems
• Privacy of data in Insurance companies and Government agencies
• Fraud detection in Telecommunications and stock exchanges
• Medical diagnosis to detect abnormal patterns
• Airline reservation to maximize seat utilization
What Is Data Warehousing?
A Data Warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It contains historical data derived from transaction data. The characteristics of a Data Warehouse are:
• Subject Oriented
• Integrated
• Nonvolatile
• Time Variant
Subject Oriented:- The data in the warehouse is defined in business terms and is grouped under business-oriented subject headings such as customers, products, sales analysis reports and marketing campaigns, achieved through data modeling.
Integrated:- Data Warehouses must put data from disparate sources into a consistent format. They must resolve problems such as naming conflicts and inconsistencies among units of measure. When they achieve this, they are said to be integrated.
Non-volatile:- Once loaded into the Data Warehouse, the data is not updated. It acts as a stable resource for consistent reporting and comparative analysis.
Time-variant:- All data in the Data Warehouse is time stamped, either at the time of entry into the warehouse or when it is summarized within the warehouse, to act as a chronological record and to make historical and trend analysis possible.
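The integrated, non-volatile and time-variant properties can be sketched in a few lines. The example below is only an illustration under stated assumptions: the two source systems, their field names (`cust_id`, `customerNumber`, `weight_kg`, `weight_lb`) and the helper functions are hypothetical and do not come from any particular warehouse product.

```python
from datetime import datetime, timezone

# Hypothetical source records: two systems name the same fields differently
# and report weight in different units.
SOURCE_A = [{"cust_id": 1, "weight_kg": 2.0}]
SOURCE_B = [{"customerNumber": 2, "weight_lb": 4.0}]

LB_TO_KG = 0.45359237  # definition of the international pound in kilograms

def integrate(record):
    """Map a source record to one consistent format (names and units)."""
    customer_id = record.get("cust_id", record.get("customerNumber"))
    if "weight_kg" in record:
        weight_kg = record["weight_kg"]
    else:
        weight_kg = record["weight_lb"] * LB_TO_KG
    return {"customer_id": customer_id, "weight_kg": round(weight_kg, 3)}

def load(warehouse, records, loaded_at=None):
    """Append integrated rows with a load timestamp; existing rows are never updated."""
    loaded_at = loaded_at or datetime.now(timezone.utc)
    for record in records:
        row = integrate(record)
        row["loaded_at"] = loaded_at  # time-variant: every row is time stamped
        warehouse.append(row)         # non-volatile: append only
    return warehouse

warehouse = []
load(warehouse, SOURCE_A)
load(warehouse, SOURCE_B)
```

After both loads, each row uses the same field names and kilograms as the unit, and carries the time at which it entered the warehouse.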
Architecture of Data Warehouse:-
Three common architectures in data warehouses are:
• Data Warehouse Architecture (Basic)
• Data Warehouse Architecture (with a Staging Area)
• Data Warehouse Architecture (with a Staging Area and Data Marts)
Data Warehouse Architecture (Basic): The metadata and raw data of a traditional online transaction processing (OLTP) system are present, as is an additional type of data, summary data. A summary in Oracle is called a materialized view.
Data Warehouse Architecture (with a Staging Area): Operational data usually has to be cleaned and processed before it is put into the warehouse. Most data warehouses use a staging area for this instead of doing it programmatically. A staging area also simplifies building summaries and general warehouse management.
Data Warehouse Architecture (with a Staging Area and Data Marts): We may want to customize the warehouse's architecture for different groups within our organization. We can do this by adding data marts, which are systems designed for a particular line of business.
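The flow through a staging area, summary data and data marts can be pictured with a small sketch. It assumes plain in-memory lists rather than a real database, and the names `stage`, `summarize`, `data_mart` and the sample rows are hypothetical, chosen only to mirror the three architectures described above.

```python
from collections import defaultdict

# Hypothetical raw OLTP rows; names and values are illustrative only.
raw_sales = [
    {"region": " east ", "line": "retail", "amount": "100.0"},
    {"region": "West", "line": "wholesale", "amount": "250.5"},
    {"region": "EAST", "line": "retail", "amount": "40.0"},
]

def stage(rows):
    """Staging area: clean and standardize rows before they reach the warehouse."""
    return [
        {"region": r["region"].strip().lower(),
         "line": r["line"],
         "amount": float(r["amount"])}
        for r in rows
    ]

def summarize(rows):
    """Summary data: total amount per region, computed once and reused."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["region"]] += r["amount"]
    return dict(totals)

def data_mart(rows, line_of_business):
    """Data mart: the subset of warehouse rows for one line of business."""
    return [r for r in rows if r["line"] == line_of_business]

warehouse_rows = stage(raw_sales)
summary = summarize(warehouse_rows)
retail_mart = data_mart(warehouse_rows, "retail")
```

Here `summary` plays the role of summary data (what Oracle calls a materialized view), and `retail_mart` is the slice a retail group would query.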
Processes within a Data Warehouse:-
• Extract and load the data
• Clean and transform data into a form that can cope with large data volumes and provide good query performance
• Backup and archive data
• Manage queries, and direct them to the appropriate data sources
Schemas in Data Warehouse:
A schema is a collection of database objects, including tables, views, indexes, and synonyms. Commonly used schemas are the Star schema and the Snowflake schema.
Star Schema:
The star schema is the simplest schema. The entity-relationship diagram of this schema resembles a star. The center of the star consists of a large fact table and the points of the star are the dimension tables. A Star schema is characterized by one or more fact tables and dimension tables. The main advantages of star schemas are that they:
• Provide a direct and intuitive mapping between the business entities being analyzed by end users and the schema design.
• Are widely supported by a large number of business intelligence tools.
A star join is a primary key to foreign key join of the dimension tables to a fact table.
Snowflake Schema:
The Snowflake schema is a more complex data warehouse model than a star schema, and is a type of star schema. The diagram of the schema resembles a snowflake. Snowflake schemas normalize dimensions to eliminate redundancy.
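To make the two schemas concrete, the following sketch builds a tiny star schema in an in-memory SQLite database using Python's standard `sqlite3` module, runs a star join, and then normalizes one dimension in snowflake style. The table names (`fact_sales`, `dim_product`, `dim_date`, `dim_category`, `dim_product_sf`) and the sample rows are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: the points of the star.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
    -- Fact table: the center of the star, with a foreign key to each dimension.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id INTEGER REFERENCES dim_date(date_id),
        amount REAL
    );
    INSERT INTO dim_product VALUES
        (1, 'Pen', 'Stationery'), (2, 'Ink', 'Stationery'), (3, 'Lamp', 'Lighting');
    INSERT INTO dim_date VALUES (10, 2023), (11, 2024);
    INSERT INTO fact_sales VALUES (1, 10, 5.0), (2, 10, 7.0), (3, 11, 20.0), (1, 11, 6.0);
""")

# Star join: dimension primary keys joined to the foreign keys in the fact table.
star_rows = conn.execute("""
    SELECT p.category, d.year, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY p.category, d.year
    ORDER BY p.category, d.year
""").fetchall()

# Snowflake variant: move the repeated category text into its own table,
# so the product dimension stores only a category key.
conn.executescript("""
    CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    INSERT INTO dim_category (name)
        SELECT DISTINCT category FROM dim_product ORDER BY category;
    CREATE TABLE dim_product_sf AS
        SELECT p.product_id, p.name, c.category_id
        FROM dim_product AS p JOIN dim_category AS c ON c.name = p.category;
""")
snowflake_rows = conn.execute("""
    SELECT c.name, d.year, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product_sf AS p ON p.product_id = f.product_id
    JOIN dim_category AS c ON c.category_id = p.category_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY c.name, d.year
    ORDER BY c.name, d.year
""").fetchall()
```

Both queries return the same totals per category and year. The snowflake version needs one extra join but stores each category name only once, which is the trade-off between the two designs.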
Conclusion:
This paper has focused on the importance, major trends and methods of Data Mining, as well as on the architecture and design of a Data Warehouse using the various schemas involved in managing it effectively. It would not be overly optimistic to say that Data Mining has a bright and promising future, and that the years to come will bring many new developments, methods, and technologies. The field of Data Mining is still young enough that the possibilities are still limitless. Data Mining tools are continually evolving, building on ideas from the latest scientific research. Many of these tools incorporate the latest algorithms taken from AI, Neural Networks, Statistics and Optimization. A Data Warehouse usually contains historical data derived from transaction data, but it can include data from other sources. The choice of schema model for a Data Warehouse should be based upon the requirements and preferences of the Data Warehouse project team. Star schemas are widely supported by a large number of business intelligence tools, whereas Snowflake schemas normalize dimensions to eliminate redundancy.