Engineering complex systems: Ant Colony Optimization (and, in general, nature-inspired metaheuristics) to model and solve complex dynamic problems
Luca Maria Gambardella
IDSIA, Istituto Dalle Molle di studi sull'Intelligenza Artificiale, Galleria 2, 6928 Manno, Lugano, Switzerland
[email protected]
Many problems of practical importance can be modeled as combinatorial optimisation problems. Examples are: vehicle routing for the distribution of goods and services, scheduling of jobs in a production line, generating timetables or scheduling meetings, multiple fault location, routing in dynamic networks and many more. Combinatorial optimisation problems are formulated by defining a model of the problem and an objective function, based on some cost measure, to be optimised. Finding an optimal solution means selecting the best alternative among a finite or countably infinite number of possible solutions. Since for many problems of practical interest the number of alternatives grows exponentially with the instance size, exact methods quickly become impractical, and to solve large problem instances one often has to resort to approximate methods that return near-optimal solutions in relatively short time. Algorithms of this type are loosely called metaheuristics. Metaheuristics are often inspired by natural processes; examples are genetic algorithms and simulated annealing. One of the most successful recent metaheuristics is Ant Colony Optimization (ACO) ([2], [3], [8]). The natural metaphor on which ACO algorithms are based is the behaviour of real ant colonies. Real ants are capable of finding the shortest path from a food source to their nest without using visual cues, by exploiting a chemical substance called pheromone: while walking, ants deposit pheromone on the ground and probabilistically follow pheromone previously deposited by other ants. This behaviour has inspired ACO, in which a set of artificial ants cooperates to solve a combinatorial problem by exchanging information indirectly via artificial pheromone. The artificial pheromone is accumulated at run-time through a learning mechanism that rewards good problem solutions.
ACO is based on a set of parallel artificial ants that iteratively build and update problem solutions. Each solution is defined as a sequence of states and is constructed by a probabilistic search. Artificial ants start from random states; then each ant probabilistically chooses the next state to visit, using a function based mainly on the pheromone intensity. At the end of each iteration, the pheromone on the best solution is increased according to a learning rule. The rationale is that in this way a structure of "preferred sequences" emerges in the pheromone trails, and future ants use this information to generate new and better solutions. Many ant-based algorithms have been proposed to solve different types of combinatorial optimisation problems, such as the symmetric and asymmetric travelling salesman problems (TSP/ATSP, [7], [10], [11]), the sequential ordering problem (SOP, [14]), the quadratic assignment problem (QAP, [12]), the vehicle routing problem with time windows (VRPTW, [13]) and information routing in dynamic internet and telephone networks (AntNet, [5]). In these settings ACO algorithms show robustness, scalability and self-organization capabilities, since they are able to adapt dynamically to topological changes in the problem structure and to react rapidly to failure conditions. More recent applications of Ant Colony Optimization include routing in mobile ad hoc networks (AntHocNet, [6]). Our interest, and our possible contribution to the meeting, is to investigate how ACO-based techniques (and metaheuristics in general) can be used to solve complex problems in which dynamic and stochastic conditions are the key issue. We are also interested in methodological
and theoretical aspects of swarm intelligence applied to dynamic problems. ACO algorithms show similarities with several optimization, learning and simulation approaches, such as heuristic graph search, Monte Carlo simulation, neural networks, and evolutionary computation. These similarities are briefly discussed in the following.
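Before turning to these comparisons, the following is a minimal, self-contained Python sketch of the generic ACO scheme described above, applied to a symmetric TSP. It is an illustration only: the function and parameter names, the random-proportional transition rule and the simple iteration-best pheromone update are simplifying assumptions made here, not the exact AS/ACS rules of the papers cited above.

```python
import random

def aco_tsp(dist, n_ants=10, n_iters=100, alpha=1.0, beta=2.0, rho=0.1, q=1.0):
    """Illustrative ACO loop for a symmetric TSP given a distance matrix."""
    n = len(dist)
    tau = [[1.0] * n for _ in range(n)]  # pheromone trails
    eta = [[0.0 if i == j else 1.0 / dist[i][j] for j in range(n)] for i in range(n)]
    best_tour, best_len = None, float("inf")

    def tour_length(tour):
        return sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))

    def construct():
        # Each ant builds a tour with a random-proportional transition rule.
        start = random.randrange(n)
        tour, unvisited = [start], set(range(n)) - {start}
        while unvisited:
            i = tour[-1]
            weights = [(j, (tau[i][j] ** alpha) * (eta[i][j] ** beta)) for j in unvisited]
            total = sum(w for _, w in weights)
            r, acc = random.uniform(0, total), 0.0
            for j, w in weights:
                acc += w
                if acc >= r:
                    tour.append(j)
                    unvisited.remove(j)
                    break
        return tour

    for _ in range(n_iters):
        tours = [construct() for _ in range(n_ants)]
        it_best = min(tours, key=tour_length)
        it_len = tour_length(it_best)
        if it_len < best_len:
            best_tour, best_len = it_best, it_len
        # Evaporation, then reinforcement of the iteration-best tour.
        for i in range(n):
            for j in range(n):
                tau[i][j] *= (1.0 - rho)
        for k in range(n):
            i, j = it_best[k], it_best[(k + 1) % n]
            tau[i][j] += q / it_len
            tau[j][i] = tau[i][j]
    return best_tour, best_len
```

Calling aco_tsp on a small distance matrix, e.g. aco_tsp([[0, 2, 9], [2, 0, 6], [9, 6, 0]]), returns a tour (as a list of city indices) and its length; the pheromone matrix tau is the "memory of past performance" referred to in the comparisons below.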
Heuristic graph search. In ACO algorithms each ant performs a heuristic graph search in the space of the components of a solution: ants take biased probabilistic decisions to choose the next component to move to, where the bias is given by a heuristic evaluation function that favors components perceived as more promising. It is interesting to note that this is different from what happens, for example, in stochastic hill-climbers or in simulated annealing [15], where (i) an acceptance criterion is defined and only those randomly generated moves which satisfy the criterion are executed, and (ii) the search is usually performed in the space of complete solutions.
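For contrast, here is a minimal sketch of the acceptance-criterion mechanism used by simulated annealing [15]; the function names, the geometric cooling schedule and the parameter values are illustrative assumptions. Note that the search operates on complete solutions that are perturbed and then accepted or rejected, whereas an ACO ant incrementally assembles a solution from components.

```python
import math
import random

def simulated_annealing(solution, neighbor, cost, t0=1.0, cooling=0.995, n_iters=10000):
    """Illustrative acceptance-criterion search in the space of complete solutions."""
    current, current_cost = solution, cost(solution)
    t = t0
    for _ in range(n_iters):
        candidate = neighbor(current)  # randomly perturb the whole solution
        delta = cost(candidate) - current_cost
        # Metropolis acceptance criterion: always accept improvements,
        # accept worsening moves with probability exp(-delta / t).
        if delta <= 0 or random.random() < math.exp(-delta / t):
            current, current_cost = candidate, current_cost + delta
        t *= cooling  # geometric cooling schedule
    return current, current_cost
```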
Neural networks. Ant colonies, being composed of numerous concurrently and locally interacting units, can be seen as "connectionist" systems [9], the most famous examples of which are neural networks. From a structural point of view, the parallel between the ACO metaheuristic and a generic neural network is obtained by putting each state i visited by the ants in correspondence with a neuron i, and the problem-specific neighborhood structure of state i in correspondence with the set of synaptic-like links exiting neuron i. The ants themselves can be seen as input signals concurrently propagating through the neural network and modifying the strength of the synaptic-like inter-neuron connections. Signals (ants) are locally propagated by means of a stochastic transfer function, and the more a synapse is used, the more the connection between its two end-neurons is reinforced. The ACO synaptic learning rule can be interpreted as an a posteriori rule: signals related to good examples, that is, ants which discovered a good quality solution, reinforce the synaptic connections they traverse more than signals related to poor examples. It is interesting to note that this ACO-neural network algorithm does not correspond to any existing neural network model. The ACO-neural network is also reminiscent of networks solving reinforcement learning problems [18]. In reinforcement learning the only feedback available to the learner is a numeric signal (the reinforcement) which scores the result of actions. This is also the case in the ACO meta-heuristic: the signals (ants) fed into the network can be seen as input examples with an associated approximate score measure. The strength of pheromone updates and the level of stochasticity in signal propagation play the role of a learning rate, controlling the balance between exploration and exploitation.
Monte Carlo simulation. ACO algorithms can be interpreted as parallel replicated Monte Carlo systems. Monte Carlo systems [17] are general stochastic simulation systems, that is, techniques performing repeated sampling experiments on a model of the system under consideration, making use of a stochastic component in the state sampling and/or transition rules. The results of the experiments are used to update statistical knowledge about the problem, as well as the estimates of the variables the researcher is interested in. In turn, this knowledge can also be used iteratively to reduce the variance in the estimation of the desired variables, directing the simulation process towards the most interesting regions of the state space. Analogously, in ACO algorithms the ants sample the problem's solution space by repeatedly applying a stochastic decision policy until a feasible solution of the considered problem is built. The sampling is realized concurrently by a collection of differently instantiated replicas of the same ant type. Each ant "experiment" adaptively modifies the local statistical knowledge of the problem structure (i.e., the pheromone trails). The recursive transmission of this knowledge by means of stigmergy reduces the variance of the whole search process: the most interesting transitions explored so far probabilistically bias future search, preventing ants from wasting resources in unpromising regions of the search space.
Finally, it is worth mentioning the work of Chen [4], who proposed a neural network approach to the TSP that bears important similarities to the ACO approach. As in ACO algorithms, Chen's method builds a tour incrementally, according to synaptic strengths, and it also makes use of candidate lists and 2-opt local optimization. The strengths of the synapses of the current tour and of all previous tours are updated according to a Boltzmann-like rule, with a learning rate playing the role of an evaporation coefficient. Although there are some differences, the common features are, in this case, striking.
Evolutionary computation. There are some general similarities between the ACO meta-heuristic and evolutionary computation (EC). Both approaches use a population of individuals that represent problem solutions, and in both approaches the knowledge about the problem collected by the population is used to stochastically generate a new population of individuals. A main difference is that in EC algorithms all the knowledge about the problem is contained in the current population, while in ACO a memory of past performance is maintained in the form of pheromone trails. An EC algorithm which is very similar to ACO algorithms in general, and to AS in particular, is Baluja and Caruana's Population-Based Incremental Learning (PBIL) [1]. PBIL maintains a vector of real numbers, the generating vector, which plays a role similar to that of the population in genetic algorithms. Starting from this vector, a population of binary strings is randomly generated: each string in the population has its i-th bit set to 1 with a probability which is a function of the i-th value in the generating vector. Once a population of solutions is created, the generated solutions are evaluated, and this evaluation is used to increase (or decrease) the probabilities of the individual components of the generating vector, so that good (bad) solutions will be produced with higher (lower) probability in future generations (a minimal sketch of this update is given below). It is clear that in ACO algorithms the pheromone trail values play a role similar to Baluja and Caruana's generating vector, and pheromone updating has the same goal as updating the probabilities in the generating vector. A main difference between ACO algorithms and PBIL is that in PBIL all the components of the probability vector are evaluated independently, so the approach works well only when the solution is separable into its components. The (1, λ)-evolution strategy is another EC algorithm related to ACO algorithms, and in particular to Ant Colony System (ACS). In the (1, λ)-evolution strategy the following steps are repeated iteratively: (i) a population of λ solutions (ants) is generated; (ii) the best individual of the population is saved for the next generation, while all the other solutions are discarded; (iii) starting from the best individual, λ − 1 new solutions are stochastically generated by mutation; and (iv) the process is iterated, going back to step (ii). The similarity with ACS is striking.
Stochastic learning automata. This is one of the oldest approaches to machine learning (see [16] for a review). An automaton is defined by a set of possible actions and a vector of associated probabilities, a continuous set of inputs, and a learning algorithm to learn input-output associations. Automata are connected in a feedback configuration with the environment, and a set of penalty signals from the environment to the actions is defined. The similarity between stochastic learning automata and ACO approaches can be made clear as follows. The set of pheromone trails available on each arc/link is seen as a set of concurrent stochastic learning automata. Ants play the role of the environment signals, while the pheromone update rule is the automaton learning rule. The main difference lies in the fact that in ACO the "environment signals" (i.e., the ants) are stochastically biased, by means of their probabilistic transition rule, to direct the learning process towards the most interesting regions of the search space. That is, the whole environment plays a key, active role in learning good state-action pairs.
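Returning to the comparison with evolutionary computation, the following is a minimal Python sketch of the PBIL update described above. The parameter names and values and the simple best-of-generation update are illustrative assumptions made here, not Baluja and Caruana's exact settings; the point is that each component of the generating vector is nudged independently towards the corresponding bit of the best sampled string, which is precisely the component-wise independence that distinguishes PBIL from ACO's trail-based memory.

```python
import random

def pbil(fitness, n_bits, pop_size=50, n_iters=200, lr=0.1):
    """Illustrative PBIL loop: a generating vector of bit probabilities is
    updated towards the best sampled binary string at each generation."""
    prob = [0.5] * n_bits  # the generating vector
    best, best_fit = None, float("-inf")
    for _ in range(n_iters):
        # Sample a population of binary strings from the generating vector.
        pop = [[1 if random.random() < p else 0 for p in prob] for _ in range(pop_size)]
        gen_best = max(pop, key=fitness)
        if fitness(gen_best) > best_fit:
            best, best_fit = gen_best, fitness(gen_best)
        # Each probability is moved independently towards the best string's bit.
        prob = [(1 - lr) * p + lr * bit for p, bit in zip(prob, gen_best)]
    return best, best_fit

# Example: maximise the number of ones ("one-max"), a separable problem PBIL handles well.
solution, value = pbil(fitness=sum, n_bits=20)
```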
References
1. Baluja S. and Caruana R. Removing the genetics from the standard genetic algorithm. In A. Prieditis and S. Russell, editors, Proceedings of the Twelfth International Conference on Machine Learning, ML-95, pages 38–46. Palo Alto, CA: Morgan Kaufmann, 1995.
2. Bonabeau E., Dorigo M. and Theraulaz G. Nature, 406(6791):39–42, 2000.
3. Bonabeau E. and Theraulaz G. Swarm Smarts. Scientific American, March 2000, pages 55–61, 2000.
4. Chen K. A simple learning algorithm for the traveling salesman problem. Physical Review E, 55, 1997.
5. Di Caro G. and Dorigo M. AntNet: distributed stigmergetic control for communications networks. Journal of Artificial Intelligence Research (JAIR), 9:317–365, 1998.
6. Di Caro G., Ducatelle F. and Gambardella L.M. AntHocNet: an ant-based hybrid routing algorithm for mobile ad hoc networks. Technical Report IDSIA 42, 2004.
7. Dorigo M. and Gambardella L.M. Ant Colony System: a cooperative learning approach to the traveling salesman problem. IEEE Transactions on Evolutionary Computation, 1(1):53–66, 1997.
8. Dorigo M., Di Caro G. and Gambardella L.M. Ant algorithms for discrete optimization. Artificial Life, 5(2):137–172, 1999.
9. Feldman J.A. and Ballard D.H. Connectionist models and their properties. Cognitive Science, 6:205–254, 1982.
10. Gambardella L.M. and Dorigo M. Ant-Q: a reinforcement learning approach to the traveling salesman problem. In A. Prieditis and S. Russell, editors, Proceedings of the Twelfth International Conference on Machine Learning, ML-95, pages 252–260. Morgan Kaufmann, 1995.
11. Gambardella L.M. and Dorigo M. Solving symmetric and asymmetric TSPs by ant colonies. Proceedings of the IEEE Conference on Evolutionary Computation, ICEC96, Nagoya, Japan, May 20-22, 1996, pages 622–627.
12. Gambardella L.M., Taillard E. and Dorigo M. Ant colonies for the quadratic assignment problem. Journal of the Operational Research Society, 50:167–176, 1999.
13. Gambardella L.M., Taillard E. and Agazzi G. MACS-VRPTW: a multiple ant colony system for vehicle routing problems with time windows. In D. Corne, M. Dorigo and F. Glover, editors, New Ideas in Optimization, pages 63–76. McGraw-Hill, London, UK, 1999.
14. Gambardella L.M. and Dorigo M. An ant colony system hybridized with a new local search for the sequential ordering problem. INFORMS Journal on Computing, 12(3):237–255, 2000.
15. Kirkpatrick S., Gelatt C.D. and Vecchi M.P. Optimization by simulated annealing. Science, 220(4598):671–680, 1983.
16. Narendra K. and Thathachar M. Learning Automata: An Introduction. Prentice-Hall, 1989.
17. Rubinstein R.Y. Simulation and the Monte Carlo Method. John Wiley & Sons, 1981.
18. Sutton R.S. and Barto A.G. Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 1998.