Genetic Programming: Principles and Applications Behnam Dezfouli Spring 2007
Introduction
ﻫﺎي ﺑﺮﻧﺎﻣﻪ ﺎ ژﻧﺘﻴﮓ ااﺳﺖ ﻛﻛﻪ ﺑﺮ روي ﺟﻤﻌﻴﺘﻲ ااز ﺎ اﻟﮕﻮرﻳﺘﻢ ﮓ ﻫﻤﺎن اﻟﮕ ﻞ ﺎ در ااﺻﻞ ﻛﺎﻣﭙﻴﻮﺗﺮي اﻋﻤﺎل ﺷﺪه اﺳﺖ ازاي ﻧﻈﺮ راا ﺑﻪﻪ ازا ﻫﺎي ﻣﻮرد ﻧﻈ ﺧﺮوﺟﻲ ﺎ ﺑﺘﻮاﻧﺪ ﺧ ﺖ ﻛﻪ ﺘ اﻧﺪ ﺑﺮﻧﺎﻣﻪﻪ ااﺳﺖ ﻚ ﻧﺎ اﻳﺠﺎد ﻳﻚ ﺪف ا ﺎ ﻫﺪف ورودﻳﻬﺎي ﺧﺎص ﺗﻮﻟﻴﺪ ﻛﻨﺪ ﺑﺮﻧﺎﻣﻪ ﻫﺎي ﻣﻤﻜﻦ ﺑﺮﻧﺎﻣﻪ ددر ﻓﻀﺎي ﺑ ﻧﺎﻣﻪ ﺗﺮﻳﻦ ﺑ ﻧﺎﻣﻪ ﻛﺎرا ﺗ ﻳﻦ ﺟﺴﺘﺠﻮي ﻛﺎ ا ﺟ ﺘﺠﻮي ﻓﻀﺎي ﺟﺴﺘﺠﻮ ﻫﻤﺎن symbolic expressionﻫﺎي LISPاﺳﺖ
S S-expressions: i T Terminals i l and dF Functions ti
Introduction 9ﻣﺸﺨﺼﻪ ي اﺻﻠﻲ :Genetic programming ﺳﻠﺴﻠﻪ ﻣﺮاﺗﺒﻲ ) (Hierarchicalﺑﻮدن ﺑﺮﻧﺎﻣﻪ ﻫﺎ
Representation as key issue ﻧﺎﻣﻨﺎﺳﺐ ﻫﺎي ﻧﺎ ﺎ ﺑﺮﻧﺎﻣﻪ ﺎ اﻳﺠﺎد ﻧﺎ ﻓﻀﺎي ﺟﺴﺘﺠﻮ و ا ﺎ اﻧﺪازه و ﺷﻜﻞ :ﻣﺤﺪودﻳﺖ ﻓﻀﺎ ﻣﺤﺪود ﻛﻛﺮدن اﻧ ا آزادي و دﻳﻨﺎﻣﻴﻚ ﺑﻮدن ﺳﺎﺧﺘﺎرﻫﺎ :ﺗﻮﻟﻴﺪ ﺟﻮاب ﺧﻮب ﺑﺮاي ﻣﺴﺌﻠﻪ
ﻣﻌﺎﻳﺐ ﻧﻤﺎﻳﺶ رﺷﺘﻪ ايString-base representation : ذات ﺑﺮﻧﺎﻣﻪ ﻫﺎ و روال ﻫﺎ ﺳﻠﺴﻠﻪ ﻣﺮاﺗﺒﻲ اﺳﺖ اب ﻧﻧﻤﻲ ﺎﻧﺪ رﺳﺎﻧﺪ ﺟﻮاب ﻣﺎ راا ﺑﻪﻪ ﺟ رﺷﺘﻪ اي ﺎ ﻧﻤﺎﻳﺶ ﺷﺘﻪ ﺘﻔﺎده از ﻧ ﺎ ﺶ ااﺳﺘﻔﺎده ﻧﻤﺎﻳﺶ رﺷﺘﻪ اي ﻗﺎدر ﺑﻪ ﻧﻤﺎﻳﺶ ﺳﺎﺧﺘﺎر ﻫﺎي ﺗﻜﺮار و ﺑﺎزﮔﺸﺘﻲ ﻧﻤﻲ ﺑﺎﺷﺪ ﻗﺎﺑﻠﻴﺖ ﺗﻐﻴﻴﺮ دﻳﻨﺎﻣﻴﻚ ﻧﺪارد .ﺗﻌﻴﻴﻦ اﻧﺪازه اوﻟﻴﻪ = ﻣﺤﺪودﻳﺖ زﻳﺎد در ﻣﻴﺰان ﭘﻴﭽﻴﺪﮔﻲ ﻣﺤﺎﺳﺒﺎﺗﻲ
Introduction ﻣﻘﺎﻳﺴﻪ ﺑﺎ اﻟﮕﻮرﻳﺘﻤﻬﺎي ژﻧﺘﻴﻚ: ﺳﺎﺧﺘﺎرﻫﺎي ﭘﻴﭽﻴﺪه ارا ﺑﻬﻴﻨﻪ ﻛﻛﻨﺪ ﻗﺎﻗﺎدر ااﺳﺖ ﺎﺧ ﺎ ﺎ ﺑﺮﻧﺎﻣﻪ ﺳﺎزي ژﻧﺘﻴﻜﻲ ﺑﺮاي ﻣﺤﺪوده وﺳﻴﻌﺘﺮي از ﻣﺴﺎﺋﻞ اﻟﮕﻮرﻳﺘﻢ ﺑﻪ ﻋﻨﻮان ﻳﻚ ﺳﻴﺴﺘﻢ: درﺟﻤﻌﻴﺖ ﻓﻌﻠﻲ ﻲ ﻣﻮﺟﻮد ر اﻓﺮاد ﻮ ﻮ ﻢ= ﺮ ووﺿﻌﻴﺖ ﺳﻴﺴﺘﻢ ورودي ﺳﻴﺴﺘﻢ = ﺗﻨﺎﺳﺐ اﻓﺮاد
ﺳﺎﺧﺘﺎرﻫﺎي ﺑﻜﺎر رﻓﺘﻪ در اﻟﮕﻮرﻳﺘﻢ Hierarchically structured computer programs whose size, shape, and complexity can dynamically change during the process.
.ﻛﺎر ﻧﻤﻲ آآﻳﻨﺪ ژﻧﺘﻴﻚ ﺑﻪ ﻛﺎ ﺳﺎزي ﻚ ﺑﺮﻧﺎﻣﻪ ﺎ ﺧﻄﻲ در ﺎ ﻳﻚ ﺑﻌﺪي ﺧﻄ ﺳﺎﺧﺘﺎرﻫﺎي ﻚ ﺎﺧ ﺎ ﺎ
F = {f1, {f1 f2, f2 ... , fn} T = {a1, a2, ... , am}.
اﻟﮕﻮرﻳﺘﻢ ﻓﺘﻪ ددر اﻟﮕﻮ ﻳﺘ ﺑﻜﺎر رﻓﺘﻪ ﻫﺎي ﺑﻜﺎ ﺳﺎﺧﺘﺎرﻫﺎي ﺳﺎﺧﺘﺎ
The functions may be: standard arithmetic operations (such as addition, subtraction, multiplication, and division), standard mathematical functions (such as SIN EXP etc.), SIN,EXP, etc ) Boolean operations, operations domain-specific domain specific functions, logical operators such as If-Then-Else, and iterative operators such as Do-Until, etc
The terminals may be: variable atomic arguments, such as the state variables of a system; constant atomic arguments, t such h as 0 and d 1; 1 and, d iin some cases, may be other atomic entities such as functions with no arguments.
اﻟﮕﻮرﻳﺘﻢ ﻓﺘﻪ ددر اﻟﮕﻮ ﻳﺘ ﺑﻜﺎر رﻓﺘﻪ ﻫﺎي ﺑﻜﺎ ﺳﺎﺧﺘﺎرﻫﺎي ﺳﺎﺧﺘﺎ
.1 .2 .3 3 .4 .5 .6
ﻫﺮ زﺑﺎﻧﻲ ﺑﺮاي ﺑﺮﻧﺎﻣﻪ ﻧﻮﻳﺴﻲ ژﻧﺘﻴﻚ ﻣﻲ ﺗﻮاﻧﺪ ﺑﻜﺎر ﺑﺮود. ﻣﺎ از LISPاﺳﺘﻔﺎده ﻣﻲ ﻛﻨﻴﻢ ،زﻳﺮا: ﺑﺮﻧﺎﻣﻪ ﻫﺎ و داده ﻫﺎ داراي ﻳﻚ ﻓﺮم ﻣﻲ ﺑﺎﺷﻨﺪ .دﺳﺘﻜﺎري GPﻣﺎﻧﻨﺪ داده ﻫﺎ. دﺳﺘﺮﺳﻲ آﺳﺎن ﺑﻪ درﺧﺖ parseﻓﺮاﻫﻢ اﺳﺖ S-expression .ﻣﻌﺎدل درﺧﺖ ﭘﺎرس اﺳﺖ. دارﻧﺪ. ﻣﺘﻐﻴﺮ دا ﻧﺪ ﺳﺎﺧﺘﺎرﻫﺎﻳﻲ ﻛﻪ اﻧﺪازه و ﺷﻜﻞ ﻣﺘﻐ دﺳﺘﻜﺎري آﺳﺎن ﺳﺎﺧﺘﺎ ﻫﺎﻳ دﺳﺘﻜﺎ ي دﺳﺘﻜﺎري آﺳﺎن ﺳﺎﺧﺘﺎرﻫﺎي ﺳﻠﺴﻠﻪ ﻣﺮاﺗﺒﻲ. ﻗﺎﺑﻠﻴﺖ ﺑﺎزدﺧﻮﻟﻲ ﺑﻮدن ).(Reentrant وﺟﻮد ﻣﺤﻴﻂ ﻧﺮم اﻓﺰاري ﻗﻮي.
ﺳﺎﺧﺘﺎرﻫﺎي ﺑﻜﺎر رﻓﺘﻪLISP S-expression :
ﺳﺎﺧﺘﺎرﻫﺎي ﺑﻜﺎر رﻓﺘﻪ در اﻟﮕﻮرﻳﺘﻢ ﻣﺜﺎل :1 ))OR (AND (NOT D0) (NOT D1)) (AND D0 D1 }F = {AND, OR, NOT }T = {D0, D1
ﻣﺤﻴﻂ :ﭼﻬﺎر ﺗﺮﻛﻴﺐ ﻣﻤﻜﻦ از ﻣﻘﺎدﻳﺮ D0 , D1
ﺳﺎﺧﺘﺎرﻫﺎي ﺑﻜﺎر رﻓﺘﻪ در اﻟﮕﻮرﻳﺘﻢ ﻣﺠﻤﻮﻋﻪ ﺗﻮاﺑﻊ و،ﺑﺮاي ﺣﺼﻮل ﻳﻚ ﻧﻤﺎﻳﺶ درﺳﺖ ﻛﻪ ﻗﺎدر ﺑﻪ ﺣﻞ ﻣﺴﺌﻠﻪ ﺑﺎﺷﺪ :ﺗﺮﻣﻴﻨﺎﻟﻬﺎ ﺑﺎﻳﺪ دو ﻣﺸﺨﺼﻪ داﺷﺘﻪ ﺑﺎﺷﻨﺪ
Closure property of the function set and terminal set: each function of the function set must be able to process all possible argument values generated by other functions or the terminal set set.
Sufficiency property of the function and terminal set: the problem bl must b be solvable l bl using i the h ((proposed) d) selection l i off functions and terminals.
Search Space ﺳﺎﺧﺘﺎرﻫﺎي اوﻟﻴﻪ
اﺳﺘﻔﺎده از ﻣﺠﻤﻮﻋﻪ ﺗﺮﻛﻴﺐ ﺗﺮﻣﻴﻨﺎﻟﻬﺎ و ﺗﻮاﺑﻊ و اﻧﺘﺨﺎب اﺗﻔﺎﻗﻲ آﻧﻬﺎ و ﺳﺎﺧﺖ ﺳﺎﺧﺘﺎرﻫﺎي اوﻟﻴﻪ. ﺗﻮزﻳﻊ اﺣﺘﻤﺎل و ﺗﻌﺪاد آآرﮔﻮﻣﺎﻧﻬﺎي ﺗﻮاﺑﻊ ،ﻋﻤﻖ درﺧﺖ را ﺗﻌﻴﻴﻦ ﻣﻲ ﻛﻨﺪ.
ﺗﻨﺎﺳﺐ ﺗﺎﺑﻊ ﺗﻨﺎ ﺗﺎ
ﻫﺮ ﻓﺮد ﻳﻚ ﻣﻘﺪار fitnessﺑﺎ ﺗﻮﺟﻪ ﺑﻪ ﻣﺤﻴﻂ دارد ﺗﻮﺟﻪ ﺑﻪ ﻣﺤﻴﻂ ﻣﺮﺑﻮط ﺑﻪ ﻣﺜﺎل 1 ﺑﺮاي اﻛﺜﺮ ﻣﻮارد raw fitnessﻫﻤﺎن ﺗﻔﺎوت ﻣﻴﺎن S-expressionو ﻧﻘﻄﻪ واﻗﻌﻲ اﺳﺖ
Search Space :در ﺻﻮرﺗﻲ ﻛﻪ ﻋﺒﺎرت از ﻧﻮع ﻋﺪد ﺻﺤﻴﺢ ﻳﺎ ﺣﻘﻴﻘﻲ ﺑﺎﺷﺪ The raw fitness r(i,t) of an individual LISP S-expression i in the population p p of size M at any yg generational time step p t is:
S(i,j) is the value returned by S-expression i for environmental case j (of N environmental cases) and where C(j) is the correct value for environmental case jj. Less is better ﻣﺠﻤﻮع ﻓﻮاﺻﻞ = ﺗﻌﺪاد اﺷﺘﺒﺎﻫﺎت، داﺷﺘﻪ ﺑﺎﺷﻴﻢboolean b l در ﺻﻮرﺗﻲ ﻛﻪ ﻳﻚ ﻋﺒﺎرت ﺑﺎ ﻣﻘﺎدﻳﺮ
Search Space Adjusted fitness measure
Between 0 and 1 Closer to 1 is better
Normalized fitness
Between 0 and 1 Closer to 1 is better sum of the normalized fitness values is 1 =p proportional p to fitness = fitness p proportionate p
Operations in GP
Primary operations:
Reproduction
Crossover (recombination)
FITNESS PROPORTIONATE REPRODUCTION OPERATION Asexual operation - ﻳﻚ ﻋﻤﻞ ﻏﻴﺮ ﺟﻨﺴﻲ- ﻓﻘﻂ ﺑﺮ روي ﻳﻚ واﻟﺪ ﻋﻤﻞ ﻣﻲ ﻛﻨﺪ ﺗﻮﻟﻴﺪ ﻳﻚ ﻓﺮزﻧﺪ – اﻧﺘﻘﺎل ﺧﻮدش ﺑﻪ ﻧﺴﻞ ﺑﻌﺪي CROSSOVER (RECOMBINATION) OPERATION ﺗﻮﻟﻴﺪ ﺣﺪاﻗﻞ ﻳﻚ ﻓﺮزﻧﺪ Sexual operation - ﻳﻚ ﻋﻤﻞ ﺟﻨﺴﻲ
Operations in GP :2 ل ﻣﺜﺎل
CROSSOVER
Operations
Control Parameters
1. Population size: A larger population allows for a greater exploration of the problem space at each generation and increases the chance of evolving a solution. In general, the more complex a problem the greater the population size needed.
2. Maximum number of generations: The evolutionary process needs to be given time; the greater the maximum number of generations the greater the chance of evolving a solution. further evolution of a population does not guarantee a solution will be found found—it it may be better to start again with a different initial population.
3. Probability of crossover: [Koza, 3 [Koza 1992] does not change this value from 0.90—90% of the population undergoes crossover.
4. Probability of reproduction: Koza Koza’s s work this value stays constant, at 0.10—10% of the population undergoes reproduction.
Five Major Preparatory Steps for GP
Flowchart for Genetic Programming g g
Experimental Results GP ﺑﺎ اﺳﺘﻔﺎده ازmachine learning ﺑﺮرﺳﻲ ﺑﺮﺧﻲ آزﻣﺎﻳﺸﺎت در زﻣﻴﻨﻪ
The algorithm Th l ith is i probabilistic: b bili ti we almost l t never gett the precisely same result twice.
machine learning: The ability of a machine to improve its performance based on previous results.
MACHINE LEARNING OF A FUNCTION
ﺗﻮﺳﻌﻪ ﻳﻚ ﺗﺮﻛﻴﺐ از ﺗﻮاﺑﻊ ،ﻛﻪ ﺑﺘﻮاﻧﺪ ﻣﻘﺪار درﺳﺖ ﻳﻚ ﺗﺎﺑﻊ را ﭘﺲ از ﻣﺸﺎﻫﺪه ﺗﻨﻬﺎ ﻳﻚ ﺗﻌﺪاد ﻧﺴﺒﺘﺎً ﻛﻮﭼﻚ از ﻧﻤﻮﻧﻪ ﻫﺎي ﺧﺎص ﻣﻘﺪار آن ﺗﺎﺑﻊ ﻛﻪ واﺑﺴﺘﻪ ﺑﻪ ﺗﺮﻛﻴﺐ ﺑﺮﮔﺮداﻧﺪ. آرﮔﻮﻣﺎﻧﻬﺎ ﺑﺎﺷﺪ ﺑ ﮔ داﻧﺪ ﺧﺎﺻﻲ از آ ﮔﻮﻣﺎﻧﻬﺎ ﺧﺎﺻ
ﻣﺜﺎﻟﻲ ﺑﺮاي اﻳﻦ ﻣﻮرد: BOOLEAN 11-MULTIPLEXER FUNCTION ﭘﺮدازﻳﻢ. ﻣﻮرد ﻣﻣﻲ ﭘ دازﻳﻢ ﺑﺮرﺳﻲ اﻳﻦ ﻣﻮ د ددر اداﻣﻪ ﺑﻪ ﺑ ﺳ
BOOLEAN 11 11--MULTIPLEXER FUNCTION input to the Boolean multiplexer function consists of k "address" bits ai and 2^k "data" bits di.
)for the 11-multiplexer (where k = 3
ﻧﻤﻮﻧﻪ ﻳﻦ ﻮ ﻲ اﻳﻦ ﻃﺮﻓﻲ ﻲ ﺑﺑﺎﺷﺪ و ازز ﺮ ﺠﺶ ﻣﻲ ﺑﻞ ﺳﻨﺠﺶ ﺠﻮ ﻗﺎﺑﻞ ﺟﺴﺘﺠﻮ يﺟ ل :ﻓﻀﺎي ﻳﻦ ﻣﺜﺎل ﻣﺰﻳﺖ ﻛﺎرر ﺑﺮ روي اﻳﻦ ﺰﻳ در ﺷﺒﻜﻪ ﻫﺎي ﻋﺼﺒﻲ ﺑﻪ ﻋﻨﻮان ﻳﻚ ﺗﺎﺑﻊ ﺗﺴﺖ در ﻧﻈﺮ ﮔﺮﻓﺘﻪ ﺷﺪه اﺳﺖ.
BOOLEAN 11 11--MULTIPLEXER FUNCTION
k+2
.ﻣﺮﺣﻠﻪ اول اﻧﺘﺨﺎب ﻣﺠﻤﻮﻋﻪ ﺗﻮاﺑﻊ و ﺗﺮﻣﻴﻨﺎﻟﻬﺎ اﺳﺖ :ﻣﺠﻤﻮﻋﻪ ﺗﻮاﺑﻊ ﻣﻮرد ﻧﻴﺎز ﺑﺮاي اﻧﻮاع ﻋﺒﺎرات ﺑﻮﻟﻲ
F = {AND, OR, NOT, IF}. T = {{A0,, A1,, A2,, D0,, D1,, ... , D7}. }
k
Number of arguments
K=3
2
2
k + 2k
Number of possible Boolean Functions
2
211
=2
2048
BOOLEAN 11 11--MULTIPLEXER FUNCTION . ﺗﺮﻛﻴﺐ آرﮔﻮﻣﺎﻧﻬﺎ را ﺑﻪ ﻋﻨﻮان ﻣﺤﻴﻂ اﺳﺘﻔﺎده ﻣﻲ ﻛﻨﻴﻢ2048 ﺑﺮاي اﻳﻦ ﻣﺴﺌﻠﻪ ﻛﻞ .ﮔﻴﺮﻳﻢ ﻧﻈﺮ ﻣﻲ ﮔ ﻳ ددر ﻧﻈ4000 ﺑﺮاﺑﺮ ﺑﺎ ﻞ راا ﺑ اﺑ ﺖ ﻫﻫﺮ ﻧﻧﺴﻞ ﺟﻤﻌﻴﺖ ﺟ ﻌ
Raw fitness of a LISP S-expression p is the sum of the distances (taken over all the environmental cases) between the point returned by the S-expression for a given set of arguments g g and the correct p point.
The raw fitness of an S-expression can range over 2049 different values between 0 and 2048. 2048
The number of hits is simply p y 2048 minus the raw fitness (mismatches).
BOOLEAN 11 11--MULTIPLEXER FUNCTION
The initial random population includes a variety of highly unfit individuals Some individual S individuals. S-expressions expressions involve logical contradictions such as (AND A0 (NOT A0)). Others involve inefficiencies such as (OR D7 D7). Some are passive and merely pass an input through as the output, such as (NOT (NOT A1)). Some of the initial random individuals base their decision on precisely the wrong arguments (i.e. data bits), such as (IF D0 A0 A2).
Most of the initial random individuals are partially "blind" in that they do not incorporate all 11 arguments that are known to be necessary for a solution to the problem.
Some S-expressions are just nonsense, such as (IF (IF (IF D2 D2 D2) D2) D2) D2).
BOOLEAN 11 11--MULTIPLEXER FUNCTION ﺑﺎ وﺟﻮد ﺗﻨﺎﺳﺐ ﺑﺴﻴﺎر ﭘﺎﻳﻴﻦ اﻓﺮاد ﺟﻤﻌﻴﺖ اوﻟﻴﻪ ،در ﻣﻴﺎن 4000ﻧﻔﺮ 25 ،ﻧﻔﺮ ﺑﺎ ﻣﻴﺰان hitﺑﺮاﺑﺮ 1280وﺟﻮد دارﻧﺪ. ﻣﻴﺎﻧﮕﻴﻦ raw fitnessﻧﺴﻞ اول = 985.4 ﺳﺎزﻳﻢ. اﺳﺘﻔﺎده ااز crossoverو reproductionﺟﻤﻌﻴﺖ ﺑﻌﺪي ارا ﻣﻲ ﺎ ﺎﺑﺎ ا ﻔﺎ Raw fitnessدر ﻧﺴﻞ دوم = 891 اﺳﺖ.. ﻣﺸﺎﻫﺪه ﻣﻲ ﺷﻮد ﻛﻪ ﺑﻬﺒﻮد ﺗﻨﺎﺳﺐ از ﻧﺴﻞ دوم ﺷﺮوع ﺷﺪه اﺳﺖ در 9ﻧﺴﻞ ﺑﻌﺪي دارﻳﻢ: 845و 823و 763و 731و 651و 558و 459و 382 ﺟﻮاب ﻣﺴﺌﻠﻪ در ﻧﺴﻞ 9ام ﻓﺮدي ﺑﺎ ﺗﻨﺎﺳﺐ 2048ﺑﺮﺧﻮرد ﺣﺎﺻﻞ ﻣﻲ ﺷﻮد
BOOLEAN 11 11--MULTIPLEXER FUNCTION : ام ﺑﻪ دﺳﺖ ﻣﻲ آﻳﺪ9 آﻧﭽﻪ در ﻧﺴﻞ
BOOLEAN 11 11--MULTIPLEXER FUNCTION ﺣﺪ اﻛﺜﺮ ﺗﻌﺪاد اﺟﺪاد ﻓﺮدي ﻛﻪ در ﻧﺴﻞ hام ﻣﻲ ﺑﺎﺷﺪ:
h +1
2
ﺟﻮاب ﺣﺎﺻﻞ ﺷﺪه در ﻧﺴﻞ 9ام ،ﺣﺎﺻﻞ ﻋﻤﻞ crossoverﺑﺮ روي 2ﻓﺮد ﻣﻲ ﺑﺎﺷﺪ. ﭘﺪر از ﻧﺴﻞ 8ام و در ﻣﻴﺎن 4000ﻋﻀﻮ داراي رﺗﺒﻪ 58از ﻟﺤﺎظ ﺗﻨﺎﺳﺐ ﻣﻴﺒﺎﺷﺪ و ﻧﺮخ ﺑﺮﺧﻮرد آن 1792ﻣﻲ ﺑﺎﺷﺪ. ﻣﺎدر از ﻧﺴﻞ 8ام و داراي رﺗﺒﻪ 1از ﻟﺤﺎظ ﺗﻨﺎﺳﺐ در ﻣﻴﺎن 4000ﻋﻀﻮ ﻣﻲ ﺑﺎﺷﺪ .ﻧﺮخ ﺑﺮﺧﻮرد آن 1920اﺳﺖ.
ﻣﺎدر ﺎد
ﭘﺪر ﭘﺪ
BOOLEAN 11 11--MULTIPLEXER FUNCTION
در اﻳﻦ ﺣﺎﻟﺖ ﺗﻌﺪاد ﻋﺒﺎرات ﺑﺮرﺳﻲ ﺷﺪه در ﻣﻴﺎن 10ﻧﺴﻞ ﻋﺒﺎرت اﺳﺖ از:
40000 * 2048 = 81920000 = 81.92E+06 اﻳﻦ ﻣﻘﺪار ﻧﺴﺒﺖ ﺑﻪ ﻛﻞ ﺗﻌﺪاد ﺗﻮاﺑﻊ 10 616ﺑﺴﻴﺎر ﻛﻤﺘﺮ اﺳﺖ.
Simple Symbolic Regression
ﻫﺪف :ﭘﻴﺪا ﻛﺮدن ﻳﻚ ﻋﺒﺎرت رﻳﺎﺿﻲ ﺑﺮاي ) sin(xاﺳﺖ.
0 ≤ x ≤ 180
ﻣﻲ ﺧﻮاﻫﻴﻢ ﺗﻮﺳﻂ symbolic regressionﻳﻚ ﺗﻘﺮﻳﺐ ﺑﺮاي ﺗﺎﺑﻊ روﺑﺮو ﺑﺪﺳﺖ آورﻳﻢ:
xﺑﻴﻦ 0ﺗﺎ 9رادﻳﺎن اﺳﺖ و ﻟﺬا ﻋﺒﺎرت داﺧﻞ ﭘﺮاﺗﺰ ﺑﻴﻦ 0ﺗﺎ 172درﺟﻪ ﺗﻐﻴﻴﺮ ﺧﻮاﻫﺪ ﻛﺮد. ﺗﺎﺑﻊ ﺗﻨﺎﺳﺐ را ﺑﻪ ﺻﻮرت روﺑﺮو در ﻧﻈﺮ ﻣﻲ ﮔﻴﺮﻳﻢ ﻛﻪ ﺗﻔﺎﺿﻞ ﻣﻘﺪار ﺗﻘﺮﻳﺒﻲ ﺑﺎ ﻣﻘﺪار ﭘﻴﺶ ﻓﺮض ﻣﻲ ﺑﺎﺷﺪ.
ﺣﺪاﻛﺜﺮ ﻣﻴﺰان درون ﻗﺪر ﻣﻄﻠﻖ 20ﺧﻮاﻫﺪ ﺑﻮد و ﻟﺬا ﻣﻘﺪار fﺑﻴﻦ 0ﺗﺎ 200ﻣﺘﻐﻴﺮ ﺧﻮاﻫﺪ ﺑﻮد. ﻫﺮﭼﻪ ﺑﻪ 200ﻧﺰدﻳﻜﺘﺮ ﺑﺎﺷﺪ ﻳﻌﻨﻲ ﺗﻨﺎﺳﺐ ﺑﺎﻻﺗﺮ اﺳﺖ. ادازه ﺟﻤﻌﻴﺖ = 100و ﺗﻌﺪاد ﻧﺴﻠﻬﺎ = 30
Simple Symbolic Regression
ﺑﻬﺘﺮﻳﻦ S-expressionﻫﺎي زﻳﺮ در ﻃﻮل 30ﻧﺴﻞ ﺗﻮﻟﻴﺪ ﺷﺪه اﻧﺪ:
% the th protected t t d division di i i operator t
ﻣﺮوري ﺑﺮ ﻧﺘﺎﻳﺞ ﺑﻪ دﺳﺖ آﻣﺪه
ﻣﻴﺰان ﭘﻴﭽﻴﺪﮔﻲ ﺑﺮ اﺳﺎس ﺗﻌﺪاد ﺗﺮﻣﻴﻨﺎﻟﻬﺎ و ﺗﻮاﺑﻊ اﺳﺘﻔﺎده ﺷﺪه در ﻫﺮ ﻋﺒﺎرت اﺳﺖ.
Simple Symbolic Regression
ﺑﺎ ﺳﺎده ﺗﺮ ﻛﺮدن ﻧﺘﺎﻳﺞ ﺑﻪ دﺳﺖ آﻣﺪه:
Simple Symbolic Regression
ﻫﻤﺎﻧﻄﻮري ﻛﻪ از E1ﺗﺎ E4دﻳﺪه ﻣﻲ ﺷﻮد (*x7) , (*xx) ،ﺑﻠﻮﻛﻬﺎي ﺳﺎزﻧﺪه ﺑﺮاي ﺗﻨﺎﺳﺐ ﺑﺎﻻ ﻣﻲ ﺑﺎﻻﺗﺮ ﺑﺴﺎزﻧﺪ. درﺧﺘﻬﺎﻳﻲ ﺑﺎ ﺗﻨﺎﺳﺐ ﺑﺎﻻﺗ ﺑﺎﺷﻨﺪ و اﻳﻦ ﺟﻤﻼت ددر ﻃﻮل ﻧﺴﻠﻬﺎي ﻣﺨﺘﻠﻒ ﻧﮕﻪ داﺷﺘﻪ ﺷﺪه اﻧﺪ ﺗﺎ د ﺧﺘﻬﺎﻳ در ﻧﺴﻞ 11ام ) (E3اﻳﻦ زﻳﺮ ﻋﺒﺎرت ﺑﺎ ﺟﻤﻠﻪ )) (*x(%94ﺗﺮﻛﻴﺐ ﺷﺪه ﺗﺎ E4را ﭘﺪﻳﺪ آورد.
(sufficiencyرا ﺗﻮﺳﻂ ﭘﻴﺎده ﺳﺎزي ( ffi i )property اﻫﻤﻴﺖ اﻧﺘﺨﺎب ﻣﺠﻤﻮﻋﻪ درﺳﺖ ﺗﻮاﺑﻊ ) t ﻛﺮدن ﺗﺎﺑﻊ ) Sin(xﺑﺎ ﺳﻪ ﻣﺠﻤﻮﻋﻪ ﺗﻮاﺑﻊ ﻧﺸﺎن داده اﻳﻢ. اﺟﺮا را ﺗﺎ زﻣﺎﻧﻲ اداﻣﻪ داده اﻳﻢ ﻛﻪ ﻣﻴﺰان ﺣﺪاﻗﻠﻲ ﺑﺮاي ﺗﻨﺎﺳﺐ ﺣﺎﺻﻞ ﺷﻮد.
Applying genetic programming technique in classification trees
ﺑﺮاي ﺣﻞ ﻣﺴﺎﺋﻞ ﻃﺒﻘﻪ ﺑﻨﺪي ﻛﺮدن ﺑﻜﺎر ﻣﻲ روﻧﺪ. ﻣﺎ از GPﺑﺮاي ﺟﺴﺘﺠﻮي ﻳﻚ درﺧﺖ ﻃﺒﻘﻪ ﺑﻨﺪي اﺳﺘﻔﺎده ﻣﻲ ﻛﻨﻴﻢ. ﻧﻤﻮد و ددر ﺗﺒﺪﻳﻞ ﻧ د ﻗﻮاﻧﻴﻦ ﺗ ﺪ ﻞ ﻣﺠﻤﻮﻋﻪ ااي از ﻗ اﻧ ﻪ ﺗﻮان ﺑﻪﻪ ﺷﺪه ارا ﻣﻲ ﺗ ا ﺣﺎﺻﻞ ﺷﺪ ﺧﺖ ﺎ ﻞ ددرﺧﺖ knowledge baseﻗﺮار داد. دو ﻋﻤﻠﮕﺮ ژﻧﺘﻴﻜﻲ ﺟﺪﻳﺪ را ﻣﻌﺮﻓﻲ ﺧﻮاﻫﻴﻢ ﻛﺮد. ﻧﺘﺎﻳﺞ را ﺑﺮاي ﻳﻚ ﻣﺴﺌﻠﻪ ﻛﺎرت اﻋﺘﺒﺎري ﻧﺸﺎن ﺧﻮاﻫﻴﻢ داد. Classification for: Knowledge discovery
Business decision-making
Applying genetic programming technique in classification trees
Two main tasks finding classifiers: by adopting appropriate learning techniques on training data sets. classifying l if i unknown k d data t knowledge ﻧﻘﺶ ﻣﻬﻤﻲ در،ﺳﺎﺧﺖ ﻳﻚ ﻣﺘﺪ ﻛﺎرا ﺑﺮاي ﺗﻮﻟﻴﺪ ﺧﻮدﻛﺎر ﻗﻮاﻧﻴﻦ ﻃﺒﻘﻪ ﺑﻨﺪي . داردmanagementt
:در اﻳﻦ روش ﻣﺎ از ﻋﻤﻠﮕﺮﻫﺎي زﻳﺮ اﺳﺘﻔﺎده ﻣﻲ ﻛﻨﻴﻢ
Crossover Mutation Elimination Merge
Applying genetic programming technique in classification trees :روش ﺑﻜﺎر رﻓﺘﻪ را در ﺷﻜﻞ روﺑﺮو ﻧﺸﺎن داده اﻳﻢ
Applying genetic programming technique in classification trees
The proposed algorithm:
Input: a set of instances to be classified.
Output: an appropriate classification tree.
Applying genetic programming technique in classification trees
Step 1: Create an initial population of n randomly generated classification trees.
Step 2: Evaluate the fitness value of each classification tree by an evaluation function and the set of instances.
Step 3: Select suitable classification trees according to the evaluation results to perform the following genetic operations.
Step 3.1: Perform crossover operations on parent trees to generate offspring trees.
S Step 3 2 Perform 3.2: P f mutation i operations i on parent trees to generate offspring ff i trees.
Step 3.3: Perform elimination operations on offspring trees to remove redundancy.
St 3.4: Step 3 4 Perform P f merge operations ti on offspring ff i trees t to t remove subsumption. b ti
Step 5: If the termination criterion is not satisfied, then go to Step 3; otherwise, do the next step.
Step 6: Select the best classification tree with the highest fitness value from the population as the final classification tree.
Applying genetic programming technique in classification trees
ﭘﺲ از ﻣﺮﺣﻠﻪ ﺷﺸﻢ ،درﺧﺖ ﻃﺒﻘﻪ ﺑﻨﺪي ﻧﻬﺎﻳﻲ را ﺑﻪ ﻣﺠﻤﻮﻋﻪ ﻗﻮاﻧﻴﻦ ﺗﺒﺪﻳﻞ ﻣﻲ ﻛﻨﻴﻢ و در knowledge baseﻗﺮار ﻣﻲ دﻫﻴﻢ. ﻋﻤﻠﮕﺮﻫﺎي mergeو eliminationﺑﺮاي ﻛﺎﻫﺶ ﭘﻴﭽﻴﺪﮔﻲ درﺧﺖ ﺑﻜﺎر ﻣﻲ روﻧﺪ .وﻟﻲ زﻣﺎن اﺟﺮا اﻓﺰاﻳﺶ ﻣﻲ ﺎﻳﺎﺑﺪ. ا ا
ﺟﺰﺋﻴﺎت اﻟﮕﻮرﻳﺘﻢ ﺗﻮاﺑﻊ اﺳﺘﻔﺎده ﺷﺪه:
Applying genetic programming technique in classification trees : ﻣﺜﺎل .ﻛﺎرﺑﺮدي ﻛﻪ ﻣﺜﺎل ﻣﻲ زﻧﻴﻢ در ﻣﻮرد درﺧﻮاﺳﺖ ﺑﺮاي ﻛﺎرت اﻋﺘﺒﺎري اﺳﺖ
Two classes {disapproving, approving} represented as {R0, R1}, are to be distinguished by the three features {Job type, Education, Salary}.
the feature of Job type has four possible values {laborer, low-level manager, middle-level manager, high-level manager} denoted {J0, J1, J2, J3} the feature of Education has four p possible values {{High g school,, College, g , Master, Ph.D} denoted {E0, E1, E2, E3} the feature of Salary has four possible values {< 200 thousand (NT dollars), 200−490 thousand, 500−800 thousand, >800 thousand} denoted {S0, S1, S2 S3}. S2, S3}
Applying genetic programming technique in classification trees : داراي دو ﻗﺎﻧﻮن زﻳﺮ ﺑﺎﺷﺪRSi ﻓﺮض ﻛﻨﻴﺪ ﻛﻪ ﻣﺠﻤﻮﻋﻪ ﻗﻮاﻧﻴﻦ،ﺑﺮاي ﻧﺸﺎن دادن ﻧﻤﺎﻳﺶ درﺧﺘﻲ
Rule1: If (Job type =middle-level middle level manager) and (Education = Master) and (Salary = 500 − 800 thousand) then Class is approving. Rule2: If (Job type = laborer) and (Education = College) and (Salary = <200 thousand) then Class is disapproving. Rule1: If J2 and E2 and S2 then R1; Rule2: If J0 and E1 and S0 then R0.
Applying genetic programming technique in classification trees
ﺗﺎﺑﻊ ﺗﻨﺎﺳﺐ:
ﻧﻴﺎز ﺑﻪ ﻳﻚ ﺗﺎﺑﻊ ﺗﻨﺎﺳﺐ ﻣﻨﺎﺳﺐ ﺑﺮاي ﻛﺎرﻛﺮد ﺻﺤﻴﺢ اﻟﮕﻮرﻳﺘﻢ دارﻳﻢ. ﺗﺄﻳﻴﺪﺪ ﺻﻼﺣ ﺖ ﺻﻼﺣﻴﺖ اي ﺗﺄﻳ آﻣﻮزﺷﻲ راا ﺑﺑﺮاي ﻢ و ﻣﺠﻤﻮﻋﻪ اي از ﻧﻤﻮﻧﻪ ﻫﺎي آﻣﻮزﺷ ﻛﻨﻴﻢ ﻳﻒ ﻣﻣﻲ ﻛﻨ ﺗﻌﺮﻳﻒ
ﻳﻚ ﺗﺎﺑﻊ ﺗﻨﺎﺳﺐ راا ﺗﻌ ﻛﺮدن درﺧﺘﻬﺎي ﻃﺒﻘﻪ ﺑﻨﺪي ﺑﻜﺎر ﻣﻲ ﺑﺮﻳﻢ.
ﻧﻤﻮﻧﻪ ﻫﺎي آﻣﻮزﺷﻲ ﻣﻮاردي ﻫﺴﺘﻨﺪ ﻛﻪ در ﻛﺎرﺑﺮد ﻧﻤﻮﻧﻪ ﺑﻜﺎر ﻣﻴﺮوﻧﺪ و ﻣﻲ ﺗﻮان آﻧﻬﺎ را از ﻣﺤﻴﻂ ﻧﻤﻮد. وري ﻮ ﻊ آوري ﺟﻤﻊ ﻲﺟ وواﻗﻌﻲ
Applying genetic programming technique in classification trees
ﺑﺎ اﺳﺘﻔﺎده از دو ﻓﺮﻣﻮل ﻗﺒﻠﻲ ،ﺗﺎﺑﻊ ﺗﻨﺎﺳﺐ را ﺑﻪ ﺻﻮرت زﻳﺮ ﺗﻌﺮﻳﻒ ﻣﻲ ﻛﻨﻴﻢ.
ﻋﻤﻠﮕﺮﻫﺎي ژﻧﺘﻴﻜﻲ اﺳﺘﻔﺎده ﺷﺪه ﻋﺒﺎرﺗﻨﺪ از: crossover , mutation دو ﻋﻤﻠﮕﺮ اﺳﺎﺳﻲt ti : دو ﻋﻤﻠﮕﺮ ﺟﺪﻳﺪmerge , elimination :
Applying genetic programming technique in classification trees
Crossover
(1) Select a crossover point in one of the parents at random.
(2) If the chosen point occurs at a rule boundary (“IF” node), the crossover point in the other parent tree must also be at a rule boundary Otherwise, boundary. Otherwise the point may be within a rule rule. The crossover point for the other parent must also be within a rule.
(3) Exchange the two chosen parent subtrees,which consist of the entire set of nodes below the crossover point, to generate two new offspring trees.
Applying genetic programming technique in classification trees
Crossover
Applying genetic programming technique in classification trees
Mutation
The mutation operation is desired to help the evolving process escape from local optimum.
Applying genetic programming technique in classification trees
Elimination .ﺑﺮاي ﺣﺬف اﻓﺰوﻧﮕﻲ در زﻣﺎﻧﻲ ﻛﻪ دو ﻳﺎ ﭼﻨﺪ ﻗﺎﻧﻮن ﻣﺸﺎﺑﻪ در ﻳﻚ درﺧﺖ وﺟﻮد داﺷﺘﻪ ﺑﺎﺷﺪ
Let the classification tree to be processed be CTr with k rules. The steps for the elimination operation are shown as follows.
(1) Set i = 1 and j = i + 1, where i and j are used to represent the numbers of the current two selected rules for checking.
(2) Compare Ruleri with Ruler j for their feature values and classes.
(3) If the two rules have the same values, they are redundant. Remove the deeper redundant rule from the classification tree CTr .
(4) Set j = j + 1. (5) If j > k, then do the next step; otherwise, go to step (2).
(6) Set j = i + 2 and i = i + 1.
(7) If i = k, then exit the searching process; otherwise, go to step (2).
Applying genetic programming technique in classification trees
Elimination
Applying genetic programming technique in classification trees
Merge .ﺑﺮاي ﻛﺎﻫﺶ ﭘﻴﭽﻴﺪﮔﻲ درﺧﺖ اﺳﺘﻔﺎده ﻣﻲ ﺷﻮد
Let the classification tree to be processed be CTs with q rules. The steps for the merge operation are shown as follows.
(1) Set i = 1 and j = i + 1, where i and j are used to represent the numbers of the current two selected rules for checking.
(2) C Compare Rule R l sii with ith Rule R l sjj for f th their i feature f t values l and d classes. l
(3) If the two rules have the same classes and the feature values of a rule is a subset of those in the other one, remove the rule with more features from the classification tree CTs .
(4) Set j = j + 1.
(5) If j > q, then do the next step; otherwise, go to step (2).
(6) Set j = i + 2 and i = i + 1. 1
(7) If i = q, then exit the searching process; otherwise, go to step (2).
Applying genetic programming technique in classification trees
Merge
Applying genetic programming technique in classification trees
ﻧﺘﺎﻳﺞ ﻋﻤﻠﻲ
در اﻳﻦ ﺑﺨﺶ از ﻣﺠﻤﻮﻋﻪ اي از داده ﻫﺎ ﺑﺮاي ﻛﺎرﺑﺮدي ﻛﻪ در آن ﻛﺎرت اﻋﺘﺒﺎري درﺧﻮاﺳﺖ ﺷﺪه ،اﺳﺘﻔﺎده ﺷﺪه اﺳﺖ. آﻣﺪه. ﺖآ ﺪ ﺑﻪ ددﺳﺖ ﺗﺎﻳﻮان ﻪ ﺑﺎﻧﻜﻲ ددر ﺗﺎ ا ﻧﻤﻮﻧﻪ از ﺎﻧﻜ ﻣﻮرد ﻧ ﻧﻪ 687د اﻳﻦ ﻣﻮارد ﺷﺎﻣﻞ feature 14و دو ﻛﻼس approve , disapproveﻣﻲ ﺑﺎﺷﺪ. ﻫﺪف ﭘﻴﺪا ﻛﺮدن ﻳﻚ درﺧﺖ ﻃﺒﻘﻪ ﻣﻨﺎﺳﺐ ﺑﻮده ﻛﻪ ﺑﺘﻮاﻧﺪ ﺑﻪ ﻣﺠﻤﻮﻋﻪ اي از ﻗﻮاﻧﻴﻦ ﺗﺒﺪﻳﻞ ﺑﺸﻮد ﻛﻪ ﺗﻮﺳﻂ آﻧﻬﺎ ﺑﺘﻮاﻧﻴﻢ ﻛﻨﻴﻢ. ﻦ ﻛﻨ ﻢ ﻣﻌﻴﻦ ده ﻣﻌ ي ﻛﻛﺮده اﻋﺘﺒﺎري ت اﻋﺘ ﺎ ﻛﺎرت ﺧﻮاﺳﺖ ﻛﺎ دي ﻛﻪ ددرﺧﻮاﺳﺖ اي ﻓﻓﺮدي ﻳﻜﻲ از دو ﻛﻼس راا ﺑﺑﺮاي ﻳﻜ اﻳﻦ 687ﻣﻮرد را اﺑﺘﺪا ﺑﻪ دو دﺳﺘﻪ ﺗﻘﺴﻴﻢ ﻛﺮده اﻳﻢ ،دﺳﺘﻪ اول ﺑﺮاي ﭘﺮورش درﺧﺘﻬﺎ و دوﻣﻲ ﺑﺮاي ﺗﺴﺖ ﻛﺮدن. از دﺳﺘﻪ اول ﺑﺮاي ارزﻳﺎﺑﻲ ﻣﻘﺪار ﺗﻨﺎﺳﺐ درﺧﺘﻬﺎ اﺳﺘﻔﺎده ﺷﺪه اﺳﺖ. ﻧﻬﺎﻳﻲ اﺳﺘﻔﺎده ﺷﺪه. ﻧﺘﻴﺠﻪ ﻧﻬﺎﻳ ﻛﺮدن ﻧﺘ ﺠﻪ ﺗﺴﺖ ﻛ دن ﺑﺮاي ﺗ ﺖ از دﺳﺘﻪ دوم ﺑ اي %70ﻣﻮارد ﺑﺮاي ارزﻳﺎﺑﻲ ﺗﻨﺎﺳﺐ و %30از ﻣﻮارد ﺑﺮاي ﺗﺴﺖ اﺳﺘﻔﺎده ﺷﺪه اﺳﺖ. ﭘﺲ از 500ﻧﺴﻞ ،روش ﭘﻴﺸﻨﻬﺎد ﺷﺪه دﻗﺘﻲ ﺑﺮاﺑﺮ ﺑﺎ ﺟﺪول زﻳﺮ را ﺑﻪ دﺳﺖ آورده اﺳﺖ.
Applying genetic programming technique in classification trees
ﻣﺸﺎﻫﺪه ﻣﻲ ﺷﻮد ﻛﻪ روش اراﺋﻪ ﺷﺪه ،ﻛﺎراﻳﻲ ﺑﺎﻻﺗﺮ از روش kozaو روش quinlanدارد. ﭘﺲ ﺗﻮاﻧﺴﺘﻴﻢ ﻣﺘﺪ ﻛﺎراﻣﺪي ﺑﺮاي ﺗﻮﻟﻴﺪ ﻳﻚ درﺧﺖ ﻃﺒﻘﻪ ﺑﻨﺪي ﻣﻨﺎﺳﺐ ﭘﺒﺪا ﻛﻨﻴﻢ. اﺳﺘﻔﺎده از ﻋﻤﻠﮕﺮﻫﺎي merge , eliminationﻣﻮﺟﺐ ﺷﺪ ﻛﻪ رواﺑﻂ ﺑﻴﻦ ﻗﻮاﻧﻴﻦ را ﻧﻴﺰ در ﻧﻈﺮ ﺑﮕﻴﺮﻳﻢ .اﻣﺎ اﻳﻦ ﺑﻪ ﻗﻴﻤﺖ زﻣﺎن ﭘﺮدازش ﺑﻴﺸﺘﺮ اﺳﺖ. در آﻳﻨﺪه ﻣﻲ ﺗﻮان ﺑﺮ روي ﺗﺎﺑﻊ ﺗﻨﺎﺳﺐ ﺑﻬﺘﺮي ﻛﺎر ﻛﺮد ﺗﺎ دﻗﺖ ﺑﺎﻻﺗﺮي را ﺑﻪ دﺳﺖ آورﻳﻢ.
R f References
[1] John R. Koza, “GENETIC PROGRAMMING: A PARADIGM FOR GENETICALLY BREEDING POPULATIONS OF COMPUTER PROGRAMS TO SOLVE PROBLEMS”, Computer Science Department, Stanford University, Margaret Jacks Hall Stanford, CA 94305
[2] S. Settea,, L. Boullart, “Genetic programming: principles and applications” ,Elsevier, Engineering Applications of Artificial Intelligence 14 (2001) 727–736
[3] Chan-Sheng Ch Sh K Kuo, T Tzung-Pei, P i Ch Chuen-Lung L Chen Ch Hong, H “Applying “A l genetic programming technique in classification trees” Springer-Verlag 2007
[[4]] Matthew Walker,, “Introduction to Genetic Programming”, g g , October 7,, 2001
[5] “Executional Steps of Genetic Programming”, http://www.geneticprogramming.com/gpanimatedtutorial.html