A Paper Presentation On ARTIFICIAL NEURAL NETWORKS
Presented By
M. KEERTHI (06C51A0446), III/IV B.Tech (E.C.E)
MD. NAZNEEN (06C51A0450), III/IV B.Tech (E.C.E)
e-mail: srikanthece424@gmail.com
Mobile no: 9848986162
CONTENTS:
• Abstract
• More utilities
• Delta rule
• Algorithm
• Motivation for neural networks
• Types of applications
• Where are neural networks being used
• The perceptron
• Assumptions
• The perceptron algorithm
• Comments on perceptron
ABSTRACT: Artificial neural networks are a method of computation and information processing that takes advantage of today's technology. Mimicking the processes found in biological neurons, artificial neural networks are used to predict and learn from a given set of data. Neural networks are more robust at data analysis than statistical methods because of their ability to handle small variations of parameters and noise. The basic element of a neural network is the perceptron. First proposed by Frank Rosenblatt in 1958 at Cornell University, the perceptron has five basic elements: an n-vector input, weights, a summing function, a threshold device, and an output. Outputs are in the form of -1 and/or +1. The threshold has a setting which governs the output based on the summation of the input vectors. If the summation falls below the threshold setting, a -1 is the output. If the summation exceeds the threshold setting, +1 is the output.
A more technical investigation of a single-neuron perceptron shows that it can have an input vector X of N dimensions. These inputs go through a vector W of weights of N dimensions. Processed by the summation node, "a" is generated, where "a" is the dot product of vectors X and W plus a bias. "a" is then processed through an activation function which compares the value of "a" to a predefined threshold. If "a" is below the threshold, the perceptron will not fire. If it is above the threshold, the perceptron will fire one pulse whose amplitude is predefined.
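The single-neuron behaviour just described can be written down in a few lines. The following is a minimal sketch, assuming plain Python; the function name perceptron_forward and the sample numbers are illustrative and not taken from the paper.

```python
# Minimal sketch of the single-neuron perceptron described above.
# perceptron_forward and the example numbers are illustrative only.

def perceptron_forward(x, w, bias, threshold=0.0):
    """Compute a = X.W + bias and fire (+1) or not (-1) against a threshold."""
    a = sum(xi * wi for xi, wi in zip(x, w)) + bias  # dot product plus bias
    return 1 if a > threshold else -1                # threshold/step activation

# Example: a 3-dimensional input vector
print(perceptron_forward([0.5, -1.0, 2.0], [0.4, 0.1, 0.3], bias=-0.2))  # -> +1
```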
MORE UTILITIES:
A single-layer neural network is capable of solving roughly 80% of all problems, with the only limitation being that it may take a long time to arrive at a solution. Adding more neurons will increase the speed. How many neurons to use is governed by how the data is clustered and classified. The following example illustrates this point. Let X symbolize undesired values and O symbolize desired values. If the data is distributed in a way such that it can be linearly separated, then a single perceptron can be used. For data which cannot be linearly separated, multiple separation layers are used, which translates into using more perceptrons. Perceptrons can be added in series or in parallel. In either case, a perceptron learns by adjusting the weights in such a way as to minimize the error between the output generated and the correct answer. This is called training a neural network and is best summarized by the Perceptron Learning Theorem, which states that if a solution is possible, it will be found eventually. The small example below illustrates the separability point.
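To make the point concrete, here is an illustrative sketch (not from the paper): the AND truth table is linearly separable, so a single perceptron with hand-picked weights classifies it, whereas XOR is not and would require more layers. The weights and bias below are hypothetical values chosen by hand.

```python
# Illustrative only: AND is linearly separable, XOR is not.

def step(a):
    return 1 if a > 0 else -1

def single_perceptron(x1, x2, w=(1.0, 1.0), bias=-1.5):
    return step(w[0] * x1 + w[1] * x2 + bias)

# AND truth table (inputs 0/1, targets -1/+1): all four patterns come out correct
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2), single_perceptron(x1, x2))

# XOR would need +1 only for (0,1) and (1,0); no single choice of (w, bias)
# can draw such a line, which is why more layers / perceptrons are needed.
```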
Delta Rule:
• For classification problems, use the step function only to determine the class, not to update the weights; this is how the classes are determined.
• To train the network, we adjust the weights in the network so as to decrease the cost (this is where we require differentiability). This is called gradient descent.
Algorithm:
• Initialize the weights with some small random value.
• Until E is within the desired tolerance, update the weights according to w(new) = w(old) - µ ∇E(w(old)), where E is the cost evaluated at w(old) and µ is the learning rate; for the squared-error cost E = ½ Σ (t - y)² the gradient is ∇E = -Σ (t - y) x. A small code sketch of this loop follows below.
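The sketch below assumes the usual squared-error cost E = ½ Σ (t − y)² with a linear output y = w·x for the update; the function name delta_rule_train and the tiny data set are illustrative, not from the paper.

```python
# Sketch of the delta-rule / gradient-descent loop described above.
# Assumes squared-error cost and a linear output for the weight update.

def delta_rule_train(patterns, n_inputs, mu=0.1, tolerance=1e-3, max_epochs=1000):
    w = [0.01] * n_inputs                                 # small initial weights
    for _ in range(max_epochs):
        error = 0.0
        for x, t in patterns:                             # training pairs (x, t)
            y = sum(wi * xi for wi, xi in zip(w, x))      # linear output y = w.x
            error += 0.5 * (t - y) ** 2
            # gradient of E w.r.t. w is -(t - y) * x, so descend by mu * (t - y) * x
            w = [wi + mu * (t - y) * xi for wi, xi in zip(w, x)]
        if error < tolerance:                             # stop when E is within tolerance
            break
    return w

# Last input fixed at 1 plays the role of the bias term.
data = [([0.0, 0.0, 1.0], -1), ([1.0, 1.0, 1.0], 1)]
print(delta_rule_train(data, n_inputs=3))
```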
If there are more than 2 classes, we could still use the same network, but instead of having a binary target we can let the target take on discrete values. For example, if there are 5 classes, we could have t = 1, 2, 3, 4, 5 or t = -2, -1, 0, 1, 2. It turns out, however, that the network has a much easier time if we have one output per class. We can think of each output node as trying to solve a binary problem (deciding whether or not the pattern is in the given class), as sketched below.
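A small illustrative sketch of the one-output-per-class encoding; the helper name class_to_targets is made up for this example and is not from the paper.

```python
# Each of the 5 outputs is trained as its own binary problem:
# target +1 for the true class and -1 otherwise (illustrative encoding).

def class_to_targets(label, n_classes=5):
    return [1 if c == label else -1 for c in range(1, n_classes + 1)]

print(class_to_targets(3))   # -> [-1, -1, 1, -1, -1]
```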
• Neural networks were started about 50 years ago. Their early abilities were exaggerated, casting doubt on the field as a whole. There is recent renewed interest in the field, however, because of new techniques and a better theoretical understanding of their capabilities.
Motivation for neural networks: Scientists are challenged to use machines more effectively for tasks currently solved by humans.
• Symbolic rules don't reflect processes actually used by humans.
• Traditional computing excels in many areas, but not in others.
Types of Applications:
• Machine learning: having a computer program itself from a set of examples, so you don't have to program it yourself. This is a strong focus of this paper: neural networks that learn from a set of examples.
• Optimization: given a set of constraints and a cost function, how do you find an optimal solution? E.g., the traveling salesman problem.
• Classification: grouping patterns into classes, e.g., handwritten characters into letters.
• Associative memory: recalling a memory based on a partial match.
• Regression: function mapping.
• Neurobiology: modeling how the brain works, both at the neuron level and at higher levels (vision, hearing, etc.); this overlaps with cognitive science.
Where are neural networks being used?
• Signal processing: suppressing line noise, adaptive echo canceling, blind source separation.
• Process automation: Siemens successfully uses neural networks in basic industries; in rolling mill control, for example, more than 100 neural networks do their job 24 hours a day.
• Robotics: navigation, vision recognition.
• Pattern recognition: e.g., recognizing handwritten characters; the current version of Apple's Newton uses a neural net.
• Medicine: e.g., storing medical records based on case information.
• Vision: face recognition, edge detection, visual search engines.
• Financial applications: time series analysis, stock market prediction.
• Data compression: speech signals, images, e.g., faces.
• Game playing: backgammon, chess, Go, ...
The Perceptron: The perceptron learning rule is a method for finding the weights in a network.
• We consider the problem of supervised learning for classification, although other types of problems can also be solved.
• A nice feature of the perceptron learning rule is that if there exists a set of weights that solves the problem, then the perceptron will find these weights. This is true for either binary or bipolar representations.
Assumptions:
• We have a single-layer network whose output is, as before, Output = f(net) = f(W·X), where f is a binary step function whose values are ±1.
• We assume that the bias is treated as just an extra input whose value is always 1.
• p = number of training examples (x, t), where t = +1 or -1.
The Perceptron Algorithm: Initialize the weights (either to zero or to a small random value) and pick a learning rate µ (a number between 0 and 1). Until the stopping condition is satisfied (e.g., the weights don't change), for each training pattern (x, t):
• Compute the output activation y = f(w·x).
• If y = t, don't change the weights.
• If y != t, update the weights: w(new) = w(old) + 2µtx, or equivalently w(new) = w(old) + µ(t - y)x. (A code sketch of this procedure appears below.)
Consider what happens when the training pattern p1 or p2 is chosen. Before updating the weight W, we note that both p1 and p2 are incorrectly classified (the red dashed line is the decision boundary). Suppose we choose p1 to update the weights. p1 has target value t = 1, so the weight is moved a small amount in the direction of p1. Suppose instead we choose p2 to update the weights. p2 has target value t = -1, so the weight is moved a small amount in the direction of -p2. In either case, the new boundary (blue dashed line) is better than before.
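Below is a minimal sketch of the perceptron algorithm as stated above, using the w(new) = w(old) + µ(t − y)x form of the update; the helper names and the tiny training set (the AND problem) are illustrative and not part of the paper.

```python
# Sketch of the perceptron learning rule: update weights only on mistakes,
# stop when an entire pass makes no changes.

def step(a):
    return 1 if a > 0 else -1

def perceptron_train(patterns, n_inputs, mu=0.5, max_epochs=100):
    w = [0.0] * n_inputs
    for _ in range(max_epochs):
        changed = False
        for x, t in patterns:                               # t is +1 or -1
            y = step(sum(wi * xi for wi, xi in zip(w, x)))  # output activation y = f(w.x)
            if y != t:                                      # update only on mistakes
                w = [wi + mu * (t - y) * xi for wi, xi in zip(w, x)]
                changed = True
        if not changed:                                     # stopping condition:
            break                                           # weights did not change
    return w

# Last component fixed at 1 plays the role of the bias input (AND problem).
data = [([0.0, 0.0, 1.0], -1), ([0.0, 1.0, 1.0], -1),
        ([1.0, 0.0, 1.0], -1), ([1.0, 1.0, 1.0], 1)]
print(perceptron_train(data, n_inputs=3))
```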
Comments on Perceptron:
• The choice of learning rate µ does not matter, because it just changes the scaling of w.
• The decision surface (for 2 inputs and one bias) has the equation x2 = -(w1/w2) x1 - (w3/w2), where we have defined w3 to be the bias: W = (w1, w2, b) = (w1, w2, w3).
• From this we see that the equation remains the same if W is scaled by a constant.
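A quick illustrative check of that last comment (the numbers are arbitrary, not from the paper): the decision line x2 = -(w1/w2)x1 - (w3/w2) is unchanged when W is multiplied by a constant.

```python
# The boundary depends only on ratios of weights, so scaling W cancels out.

def boundary_x2(w, x1):
    w1, w2, w3 = w
    return -(w1 / w2) * x1 - (w3 / w2)

w = (2.0, 1.0, -0.5)
print(boundary_x2(w, x1=1.0))                          # -1.5
print(boundary_x2(tuple(10 * wi for wi in w), x1=1.0)) # -1.5, same line after scaling
```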
The perceptron is guaranteed to converge in a finite number of steps if the problem is separable. It may be unstable if the problem is not separable. Outline of the proof: find a lower bound L(k) for |w|² as a function of the iteration k. Then find an upper bound U(k) for |w|². Then show that the lower bound grows at a faster rate than the upper bound. Since the lower bound can't be larger than the upper bound, there must be a finite k such that the weight is no longer updated. However, this can only happen if all patterns are correctly classified.
Two-Layer Net: The above is not the most general region. Here, we have assumed the top layer is an AND function. Problem: in the general 2- and 3-layer cases, there is no simple way to determine the weights.
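The two-layer idea can be sketched as follows: two first-layer perceptrons each carve out a half-plane, and the top-layer AND keeps only points lying in both, producing a region no single perceptron can represent. The weights below are hand-picked illustrations, not values from the paper.

```python
# Two first-layer perceptrons, combined by an AND at the top layer.

def step(a):
    return 1 if a > 0 else -1

def half_plane(x, w, bias):
    return step(sum(wi * xi for wi, xi in zip(w, x)) + bias)

def two_layer_and(x):
    h1 = half_plane(x, (1.0, 0.0), -0.5)        # fires when x1 > 0.5
    h2 = half_plane(x, (-1.0, 0.0), 1.5)        # fires when x1 < 1.5
    return 1 if (h1 == 1 and h2 == 1) else -1   # top layer: AND of both half-planes

print(two_layer_and((1.0, 0.0)))   # inside the strip 0.5 < x1 < 1.5 -> +1
print(two_layer_and((2.0, 0.0)))   # outside the strip -> -1
```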
CONCLUSION: Artificial neural networks, built from simple perceptron elements, learn from examples by adjusting their weights, and the perceptron learning rule is guaranteed to find a separating set of weights whenever one exists. For problems that are not linearly separable, networks with two or more layers are needed, at the cost that there is no longer a simple way to determine the weights.