Lesson: 11  Topic: Neural Networks & Sequences

NEURAL NETWORKS

Computer science has taken two approaches to making a computer behave with what appears to be intelligence.
1. One is the expert system approach: figuring out the rules that guide human behavior, then representing these rules, in a totally different form, in a computer program.
2. Neural networks exemplify a second approach. They try to mimic the way human and animal brains function: first learning from experience how to deal with certain types of situations, then applying this learning to new situations of the same type.

A neural network consists of multiple, interconnected cells whose behavior is based on the neurons that control the behavior of humans and animals. Each neuron receives signals from system inputs or from other neurons. Based on the signals it receives, it generates an output, which it sends to system outputs or to other neurons.

[Figure: Conceptual structure of a neural network. Network inputs enter the input layer, which feeds a hidden layer, which feeds the output layer that produces the network outputs.]

• The cells and their interconnections are usually simulated by suitable data structures and variables whose values change over time.
• At each step, each neuron sends a data value to the other neurons to which it is connected, as defined by the tables that control the simulation.
• In the next time step, each neuron collects its inputs, figures out what its output should be, and sends that output on to the next level.
• The process continues until the output emerges.
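
To make this step-by-step simulation concrete, here is a minimal sketch in Python. The layered weight tables, the sigmoid activation, and all numeric values are illustrative assumptions standing in for the connection tables the lesson mentions; they are not part of the original material.

    import math

    def step(inputs, weight_rows):
        # One time step: each neuron in the next layer collects its weighted
        # inputs and sends its output (squashed here by a sigmoid) onward.
        return [1.0 / (1.0 + math.exp(-sum(w * x for w, x in zip(row, inputs))))
                for row in weight_rows]

    # Illustrative connection weights: 2 inputs -> 3 hidden neurons -> 1 output.
    hidden_weights = [[0.5, -0.2], [0.1, 0.9], [-0.7, 0.4]]
    output_weights = [[0.3, -0.6, 0.8]]

    network_input = [1.0, 0.5]
    hidden = step(network_input, hidden_weights)  # input layer -> hidden layer
    output = step(hidden, output_weights)         # hidden layer -> output layer
    print(output)                                 # the network output emerges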

How to connect the neurons?
• The challenge of developing a neural network to identify patterns is figuring out how to connect its neurons.
• The basic approach involves making a random set of connections, trying out the resulting network, and seeing if it produces useful outputs.
• If it does, the connections in it are given high scores.
• If it doesn't, they are given low scores.
• The process is repeated with many other sets of connections.
• The high-scoring connections are reused in new network designs while the low-scoring ones are discarded.
• This procedure is known as a genetic algorithm because it resembles the way biological genetics works: the scoring of results corresponds to survival of the fittest, while the mixing of high-scoring connections corresponds to the mixing of parental genetic material.
• Genetic algorithms also incorporate the capability for statistical mutations, preventing them from getting permanently bogged down with one set of "chromosomes" and never improving beyond what that set makes possible.

Drawback:
• A practical drawback of neural networks is that they can't tell us how they reach a conclusion.

Nearest Neighbor Approaches
This approach involves finding old cases that are close to each new one and assuming its outcome will match the majority of those neighbors. This can be done either on the fly, by looking for neighbors each time a new case comes along, or in advance, by predetermining the regions within which old cases tend to have one outcome or another. This is called the k-nearest neighbors method (k-NN), where k is the number of neighbors to look at for each point. The approach is also referred to as memory-based reasoning, because it is based on remembering where other points fall in the database. The number of neighbors to consider, k, is a parameter in this type of analysis. Using too few neighbors can create small regions where a random cluster of anomalous results distorts the outcome.
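
As a concrete sketch of the k-NN idea, here is a minimal Python example. The feature vectors, the outcome labels, and the function names are made up for illustration and do not come from the lesson.

    import math
    from collections import Counter

    def euclidean(a, b):
        # Straight-line distance between two equal-length feature vectors.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def knn_classify(old_cases, new_case, k=3):
        # Predict the outcome of new_case as the majority outcome among
        # its k nearest old cases (memory-based reasoning).
        neighbors = sorted(old_cases, key=lambda case: euclidean(case[0], new_case))
        votes = Counter(outcome for _, outcome in neighbors[:k])
        return votes.most_common(1)[0][0]

    # Toy database of old cases: (features, outcome) pairs.
    cases = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
             ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B")]
    print(knn_classify(cases, (1.1, 0.9), k=3))  # majority of 3 neighbors: "A"

Note how the choice of k matters: setting it too small makes the prediction hostage to a random cluster of anomalous neighbors, which is exactly the distortion described above.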

Putting the results to use
When there are more than two factors, it is usually possible to divide the set of subjects into more than two categories. In data mining, high lift is good: it means that the data mining process has identified factors that affect the outcome. The higher the lift, the higher the business value of the model. A model with no lift at all, that is, one with no ability to predict which members of the overall population are more or less likely to behave in a desirable or undesirable fashion, is of no business value at all. Conversely, a model that could distinguish between the two population subsets with total accuracy would be valuable indeed.

SIMILARITY SEARCH OVER SEQUENCES
A lot of information stored in databases consists of sequences.

Query model: The user specifies a query sequence and wants to retrieve all data sequences that are similar to the query sequence.

A data sequence is X = <x1, ..., xk>. A subsequence Z = <z1, ..., zj> is obtained from another sequence X = <x1, ..., xk> by deleting numbers from the front and back of the sequence X; that is, z1 = xi, z2 = xi+1, ..., zj = xi+j-1 for some i in {1, ..., k-j+1}.

Given two sequences X = <x1, ..., xk> and Y = <y1, ..., yk>, we use the Euclidean norm as the distance between them:

    ||X - Y|| = sqrt( (x1 - y1)^2 + (x2 - y2)^2 + ... + (xk - yk)^2 )

Similarity queries over sequences can be classified into two types.
• Complete sequence matching: The query sequence and the sequences in the database have the same length. Given a user-specified threshold parameter ε, the goal is to retrieve all sequences in the database that are within ε-distance of the query sequence.
• Subsequence matching: The query sequence is shorter than the sequences in the database. In this case, we want to find all subsequences of sequences in the database such that the subsequence is within distance ε of the query sequence.

An algorithm to find similar sequences
Given a collection of data sequences, a query sequence, and a distance threshold ε, how can we efficiently find all sequences within ε-distance of the query sequence?
• One possibility is to scan the database, retrieve each data sequence, and compute its distance to the query sequence. While this algorithm has the merit of being simple, it always retrieves every data sequence.
• Because we consider the complete sequence matching problem, all data sequences and the query sequence have the same length.
• Each data sequence and the query sequence can therefore be represented as a point in a k-dimensional space.
• If we insert all data sequences into a multidimensional index, we can retrieve data sequences that exactly match the query sequence by querying the index.
• But since we want to retrieve not only data sequences that match the query exactly but also all sequences within ε-distance of the query sequence, we do not use a point query as defined by the query sequence.
• Instead, we query the index with a hyper-rectangle that has side length 2ε and the query sequence as center, and we retrieve all sequences that fall within this hyper-rectangle.
• We then discard sequences that are actually further than ε away from the query sequence. Using the index allows us to greatly reduce the number of sequences we consider and decreases the time needed to evaluate the similarity query significantly.
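
The following Python sketch contrasts the two strategies for complete sequence matching: the simple scan that computes every distance, and a range-query filter that mimics what a multidimensional index would return for the hyper-rectangle of side length 2ε. The data, the function names, and the use of a plain list in place of a real index (such as an R-tree) are illustrative assumptions.

    import math

    def distance(x, y):
        # Euclidean norm ||X - Y|| between two equal-length sequences.
        return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

    def similar_by_scan(database, query, eps):
        # Simple algorithm: retrieve every data sequence and keep those
        # within eps-distance of the query.
        return [seq for seq in database if distance(seq, query) <= eps]

    def similar_by_rectangle(database, query, eps):
        # Step 1: keep only sequences inside the hyper-rectangle of side
        # length 2*eps centered at the query (what an index range query
        # would return), avoiding distance computations for the rest.
        candidates = [seq for seq in database
                      if all(abs(xi - qi) <= eps for xi, qi in zip(seq, query))]
        # Step 2: discard candidates actually further than eps away.
        return [seq for seq in candidates if distance(seq, query) <= eps]

    # Toy database: complete sequence matching, so every sequence has length 3.
    db = [(1.0, 2.0, 3.0), (1.1, 2.1, 2.9), (5.0, 5.0, 5.0)]
    q = (1.0, 2.0, 3.0)
    print(similar_by_scan(db, q, eps=0.5))       # both nearby sequences
    print(similar_by_rectangle(db, q, eps=0.5))  # same answer, fewer distance checks

The rectangle test can only produce false positives, never false negatives, since every point within ε-distance of the query necessarily lies inside the hyper-rectangle; that is why the final discarding step is both necessary and sufficient.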
