GuruPrevails Machine Learning Course Basic Math Examples given in Octave code Victor K Miclovich vicmiclovich@{ugandasoft, gmail}.com December 5, 2009
2
Preface I would like to welcome you to this tutorial courtesy of GuruPrevails. We want to make Machine learning a really wonderful experience and I hope these tutorials shall give you great satisfaction. Good luck and have fun! Victor K Miclovich December, 2009 Uganda
Contents Preface
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1 Introduction 1.1 Starting Octave . . . . . . . . . . . . . 1.1.1 Using Octave like a calculator . 1.1.2 Inbuilt functions . . . . . . . . 1.1.3 Some constants to take note of. 2 Linear Algebra basics 2.1 Matrix(-ces) . . . . . . . . . . . . . 2.1.1 Representing a matrix . . . 2.1.2 Matrix operations... at least 2.2 Assignment . . . . . . . . . . . . . 2.3 Vector Spaces . . . . . . . . . . . . 2.3.1 What is a vector? . . . . . 2.3.2 What is a vector space? . . 2.3.3 The real definition! . . . . .
2
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
5 5 5 6 6
. . . . . . . . . . . . . . . . some of them . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
7 7 7 8 10 10 10 10 10
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
13 13 13 13 13 13 14 14 14 15
. . . .
. . . .
. . . .
3 Probability Theory 3.1 Simple terminology . . . . . . . . . . . . . . 3.1.1 Events . . . . . . . . . . . . . . . . . 3.1.2 Probability space or Sample space . 3.1.3 Probability . . . . . . . . . . . . . . 3.1.4 Review of basic Probability concepts 3.1.5 Law of Total Probability . . . . . . . 3.1.6 Baye’s Theorem . . . . . . . . . . . 3.2 Where Probability occurs in Octave . . . . 3.3 Functions that are probabilistic in nature .
. . . .
. . . . . . . . .
. . . .
. . . . . . . . .
. . . . . . . . .
4 What next? 17 4.1 Alternatives that we are going to use . . . . . . . . . . . . . . 17 4.1.1 What makes Python different from Octave/MatLab? . 17
3
4
CONTENTS
Chapter 1
Introduction In this section I briefly demonstrate the power of Octave in solving mathematical problems. Octave is supported by many operating systems: commonly in Microsoft Windows, Linux and Mac OS... In this tutorial, I will target an audience with a strong UNIX1 . So get ready...
1.1
Starting Octave
1. Start your terminal2 2. Type Octave3 3. Start hacking; you should see a prompt something like this octave:1>
1.1.1
Using Octave like a calculator
Octave can be used like a calculator; this just means you could type mathematical expressions and have a result come back to you straight away. Addition octave:1> 3 + 4 =>7 This is a simple example... and almost any kind of operand4 will work. 1
Don’t worry if you don’t have strong UNIX background... similar things can get done in Windows 2 Windows users should look for Octave under the Programs menu of their machine; install Octave if you don’t have it. 3 Windows users will probably click under the programs menu... and a terminal should pop out. 4 In the above example, the operands are 3 and 4
5
6
CHAPTER 1. INTRODUCTION
Subtraction octave:1> 5 - 2 =>3 Multiplication octave:1> 5 * 2 =>10
1.1.2
Inbuilt functions
There are many inbuilt functions in Octave. Examples include sin, cos, tan, sqrt, etc. octave:1>sqrt(4) =>2 octave:2>sin(pi / 2) =>1 octave:3>sin(90) =>0.89405
1.1.3
Some constants to take note of.
octave:1>pi =>3.1416 octave:2>e 2.7183 There are many other constants in science and math that even I may not know about because that’s not my specialty... physicists for example will have their known set of constants, chemists have theirs... Mathematicians like I have our own special set of constants that we use... The next sections of this tutorial are going to be on more specific topics. Enjoy reading :)
5
trigonometric functions usually take radians an input... e.g. 90◦ is
π 2
Chapter 2
Linear Algebra basics 2.1 2.1.1
Matrix(-ces) Representing a matrix
A matrix is just an array of numbers. These numbers could be real numbers, integers or some other system of numbers or numerics. Mathematical representation of an m × n matrix
Am,n
a1,1 a2,1 = . ..
a1,2 a2,2 .. .
··· ··· .. .
am,1 am,2 · · · In Octave I show several types of matrices Row matrix octave:1>A = [3,4,5] => 3 4 5 Column matrix octave:2>A = [3; 4; 5] => 3 4 5 A 2 × 3 matrix octave:3>A = [1, 2, 3; 4, 5, 6;] => 7
a1,n a2,n .. . am,n
8
CHAPTER 2. LINEAR ALGEBRA BASICS
1 2 3 4 5 6 or octave:3>A = [ [1, 2, 3]; [4, 5, 6]] => 1 2 3 4 5 6 A 3 × 3 matrix octave:4>A = [ [1, 2, 3]; [4, 5, 6]; [7, 8, 9] ] => 1 2 3 4 5 6 7 8 9
2.1.2
Matrix operations... at least some of them
I am going to use the 3 × 3 matrix A used in the previous sub-section. Adding two matrices octave:1># given two matrices A and some B octave:2>A = [ [1,2,3]; [4, 5, 6]; [7, 8, 9] ]; octave:3># a semi-colon at the end of the statement above prevents Octave from print A octave:4>B = [ [10, 11, 12]; [13,14,15]; [16,17,18] ]; octave:5>A + B => 11 13 15 17 19 21 23 25 27 Please, don’t get confused by my use of #; this is put inside your program to help you read and understand the logic of your program. It is good practice to keep commenting your programs... any programming language’s decoder will understand that it does not have to execute any instructions in the line that has the ’#’. One can use a ’%’ symbol to comment. I’ve just decided to use ’#’ symbol at the start of that line. Other programming languages have other ways in which comments are placed in their code. Scalar products
2.1. MATRIX(-CES)
9
octave:1> newA = 2 * A; octave:2># remember to always give A some values. You can test other values... octave:3># A is a variable; a place in computer memory where we stor the values octave:4># in this case those values are entries/elements of an array octave:5> newA => 2 4 6 8 10 12 14 16 18 Take note of my use of newA; this is called variable... and the line newA = 2 * A computes a value and puts it under that name... so I can use newA anywhere in my program. Please read the Octave manual to get other concepts in programming using the Octave language. Don’t forget to PRACTICE Finding a transpose of matrix/vector A definition of transpose is in this footnote1 octave:1> A = [3,2,3; 1,3,6]; octave:2> A transpose = A’; octave:3> A transpose => 3 1 2 3 3 6 Determinants Determinants are usually calculated for square matrices n × n; you might turn up with some nasty errors if you work with singular or m × n matrices where m 6= n octave:1>A = [2,3,4];[3,15,20];[7,8,1]; octave:1>det(A) => -203.00
1
Mathematical definition of a Transpose of a matrix: Given an m × n matrix, it’s transpose is just that matrix with rows interchanged for columns... an n × m; [A]Tij = [A]ji
10
CHAPTER 2. LINEAR ALGEBRA BASICS
2.2
Assignment
You are going to test the following functions; study the math and octave code too! 1. Given ~v = (3, 2, 3), ~u = (4, 6, 7), Find ~v .~u v = [3 2 3]; u=[4 6 7]; dot(v,u) 2. Using the above vectors, get the cross product, ~v ×~u [Hint: cross(x,y) gives ~x × ~y ] 3. Think of other vectors... try out 2 dimensional, 3-D, 4-D and ndimensional vectors. Also, take note of my exclusion of a comma i.e. v = [3 2 3] is the same as v = [3, 2, 3]
2.3 2.3.1
Vector Spaces What is a vector?
A vector is a line segment whose length is its magnitude and whose orientation in space is its direction.
2.3.2
What is a vector space?
A vector space is a mathematical structure formed by a collection of vectors.
2.3.3
The real definition!
Let V be a set on which addition and scalar multiplications are defined2 . If that is the case, then for all objects3 ~u, ~v , and w ~ in V and all scalers c and k then V is called a vector spaces and the objects in V are called vectors Below are some axioms we look at: 1. ~u + ~v is in V 4 2. c~u is in V 5 3. ~u + ~v = ~v + ~u 4. u + v = v + u 2 if ~ u and ~v are objects in V and c is a scalar, then we can in some way, define ~v + ~ u and c~ u or c~v . 3 we use objects here because vector has 2 things its describing: orientation and length/size... which makes the term sound more ”important”. 4 closed under addition 5 closed under scalar multiplication
2.3. VECTOR SPACES
11
5. ~u + (~v + w) ~ = (~u + ~v ) + w ~ 6. zero vector is a special vector in V space. 7. For every ~u in V there’s another object in V , denoted −~u and called the negative of ~u such that ~u + (−~u) = 0 or ~u − ~u = 0 8. c(~u + ~v ) = c~u + c~v 9. (c + k)~u = c~u + k~v 10. c(k~u) = (ck)~u 11. 1 × ~u = ~u From your knowledge of the previous sections, implement these axioms for the following vectors... if this isn’t enough you can set your questions :) (a) ~v = (2, 7) octave:1> v = [-13; -25] Use the constants c = 3, 5, 6, k = 3, 5, 6 And test using the Octave interpreter6
6
another name for the terminal that you see on running Octave
12
CHAPTER 2. LINEAR ALGEBRA BASICS
Chapter 3
Probability Theory 3.1 3.1.1
Simple terminology Events
An event is an outcome or occurrence that has a probability assigned to it.
3.1.2
Probability space or Sample space
These are all the possible outcomes of an experiment or series of events. We can represent this mathematically using an S
3.1.3
Probability
To find the probability of an event A P (A) =
n(A) n(S)
At times we may need to find the complement of some probability A0 , that event that does not occur. P (A0 ) = 1 − P (A)
3.1.4
Review of basic Probability concepts
To find the probability of getting event A or B, P (A ∪ B) = P (A) + P (B) − P (A ∩ B) For mutually exclusive events P (A ∩ B) is zero! So that means you just add up the probabilities of events A and B occurring! Conditional probabilities are measure the probability of one event occurring relative to another occurring. P (A ∩ B) P (A|B) = P (B) 13
14
CHAPTER 3. PROBABILITY THEORY
This can easily be written as: P (A ∩ B) = P (A|B) × P (B) For some extra credit and work... please re-familiarize yourselves with the probability tree... I won’t do that here... Probability trees make calculations of conditional probabilities easy... conditional probabilities occur a lot with decision making algorithms, recursive depth search and many other algorithms you are gonna encounter in your Machine learning careerer.
3.1.5
Law of Total Probability
If you have two events A and B, then P (B) = P (B ∩ A) + P (B ∩ A0 ) = P (A)P (B|A) + P (A0 )P (B|A0 ) This has an interesting attribute1 .
3.1.6
Baye’s Theorem
If you have n mutually exclusive and exhaustive events A1 , A2 , through to An , and B is another event, we have then: P (A|B) =
P (A)P (B|A) P (A)P (B|A) + P (A0 )P (B|A0 )
The Baye’s Theorem is used in computing as a way of filtering emails and detecting spam... most training sets use Baye’s theorem; and this teaches a ”program”/machine to easily detect commonly used words in Junk mail or SPAM; and so we can easily classify and cluster emails as correctly addressed and due to you and the ones that head off to your SPAM box.
3.2
Where Probability occurs in Octave
Probability in computers is in many causes referred to as ”psuedo-random”... calculations that are made by a bunch of transistors and crystals2 , and other complicated electronics stuff that is set to compute something that is seemingly ”random” and that is what we use to mimic an un-planned-for event – an accident of chance! So... these are just a few cases where probability is used... 1 2
The equation above is the denominator of Baye’s Theorem timers
3.3. FUNCTIONS THAT ARE PROBABILISTIC IN NATURE
3.3
15
Functions that are probabilistic in nature
octave:1>rand() The rand() function returns a matrix with random elements uniformly distributed on the interval (0, 1).
16
CHAPTER 3. PROBABILITY THEORY
Chapter 4
What next? During the next few weeks we shall be looking at probability distributions and have them implement in Octave... this is a rather intensive part of this course... that of being able to convert normal mathematics into understanding! I shall however post as many resources as I can... please try to look through both the code and references that I leave. The next tutorial shall be about generation of vectors! Watch this space!
4.1
Alternatives that we are going to use
I have written this section to encourage you all to get in times with Python. Python some fabulous support for many algorithms and scientific computing.
4.1.1
What makes Python different from Octave/MatLab?
• It is an object oriented interpreted programming language • It has extensive support • It is relatively fast; but if more speed is need extra programming can be done in C and/or C++ • There’s a lot more fun stuff... like libraries that let you play with the web; you can implement features like web scrappers, SPAM filters, signal analysis, training algorithms, etc. During the course, we shall look at python and more math in detail.
17