Adaptive Game-Theoretic Multi-Robot Coordination Based on Resource Spending Velocity

Dan ERUSALIMCHIK, Gal A. KAMINKA, and Sarit KRAUS
Department of Computer Science, Bar-Ilan University, Israel

Selecting a Coordination Method (Algorithm)

Coordination improves a team's performance and increases its robustness. However, many different coordination methods exist, and no single method is always best: the effectiveness of a coordination method depends on the context, the density of agents, and the group size. We therefore need a technique for selecting methods dynamically, switching methods as needed [Rybski et al. 1998, Rosenfeld et al. 2008].

Game Theory Analysis of LCT Tasks

There are common themes that run through all the tasks in which EI has been successful. We refer to these tasks as LCT tasks (Loose-coordination, Cooperative, Timed tasks):
➢ Loose coordination between the robots (i.e., only an occasional need for spatial coordination)
➢ A cooperative task (the robots seek to maximize group utility)
➢ The task is bounded in time.

LCT Task as Extensive-Form Game

[Figure: the LCT task as an extensive-form game. At each conflict time C_1, C_2, ..., Robot 1 and then Robot 2 select a coordination method; the interval between consecutive conflicts, I_1 = C_2 − C_1, divides into an active coordination part and a passive task-execution part, and the game ends at the task's time bound T.]

Coordination Algorithm Selection as Reinforcement Learning in One-Stage MDPs

➢ Each agent has a set of actions (coordination methods).
➢ Each agent selects its actions independently of the others.
➢ Participants receive the greatest reward for joint actions.
➢ There is no observation of the other agents' actions and no communication between agents.
➢ All agents are anonymous.
➢ Rewards are not known a priori; they have to be learned.
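To make this framing concrete, here is a minimal sketch (ours, not the poster's) of what the one-stage decision looks like from a single robot's point of view: at every conflict it independently picks one coordination method from a fixed set, and only afterwards observes the reward for that choice; there is no state, no model of the other robots, and no communication. The function names (`select_method`, `run_coordination`, `observe_reward`) are hypothetical placeholders.

```python
# One robot's view of the repeated one-stage decision (hypothetical interface).
METHODS = ["noise", "aggression", "repel"]   # coordination methods used in the experiments

def handle_conflict(select_method, run_coordination, observe_reward):
    """Called whenever this robot detects a conflict with another robot."""
    method = select_method(METHODS)      # chosen independently; others are not observed
    run_coordination(method)             # resolve the conflict with that method
    reward = observe_reward(method)      # rewards are not known a priori; only learned
    return method, reward
```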

From Team Utility to Individual Reward

Each robot alternates between two states: Task-Execution and Coordination.

[Figure: conflicts time-line of robot i, from Start to Finish. The j-th conflict begins at time C_{i,j}; the robot spends active coordination time I^a_{i,j} resolving it, and then passive task-execution time I^p_{i,j} until the next conflict begins at time C_{i,j+1}.]

Team Level:

We can reduce the game tree into a matrix game, where K_{i,j} is the number of conflicts occurring within the total time T when the first robot selects method i and the second robot selects method j, and U_{i,j} is the team utility gained from this choice:

U_{i,j} = [gain(I_i) + gain(I_j)] − [C_i + C_j]

i.e., the task gains accrued in the execution intervals under methods i and j, minus the coordination costs of those methods. The team-level matrix game then has the payoff K_{i,j} U_{i,j} in the cell for methods (i, j):

             method 1          method 2
method 1     K_{1,1} U_{1,1}   K_{1,2} U_{1,2}
method 2     K_{2,1} U_{2,1}   K_{2,2} U_{2,2}
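To make the reduction concrete, here is a minimal sketch that assembles the team-level payoff matrix from per-method quantities. The per-method gains, costs, and conflict counts are hypothetical placeholder numbers, and the function names (`team_utility`, `team_matrix`) are ours, not the authors'.

```python
# Hypothetical per-method statistics for a two-robot, two-method LCT task.
# gain[m]: task gain accrued between conflicts when a robot uses method m
# cost[m]: resources method m spends on resolving one conflict
gain = {"noise": 4.0, "repel": 3.0}      # illustrative numbers only
cost = {"noise": 1.5, "repel": 0.5}
# K[(i, j)]: number of conflicts within time T when robot 1 uses i and robot 2 uses j
K = {("noise", "noise"): 6, ("noise", "repel"): 4,
     ("repel", "noise"): 4, ("repel", "repel"): 3}

def team_utility(i: str, j: str) -> float:
    """U_{i,j} = [gain(I_i) + gain(I_j)] - [C_i + C_j]."""
    return (gain[i] + gain[j]) - (cost[i] + cost[j])

def team_matrix() -> dict:
    """Cell (i, j) of the reduced matrix game holds K_{i,j} * U_{i,j}."""
    return {(i, j): K[(i, j)] * team_utility(i, j) for (i, j) in K}

if __name__ == "__main__":
    for (i, j), payoff in team_matrix().items():
        print(f"robot1={i:5s} robot2={j:5s} K*U = {payoff:6.1f}")
```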

Individual Level:

Each robot k receives an individual payoff for its own selection α_k:

u_k(α_k) = gain(I_k(α_k)) − C_k(α_k)

In the individual-level bimatrix, the cell for methods (α, β) holds the pair (K_{α,β} u_1(α), K_{α,β} u_2(β)).

Calculatable, Individual Level:

Each robot's payoff is replaced by a quantity it can actually calculate from its own observations, its Effectiveness Index: the cell for methods (α, β) holds (EI_{1,j}(α), EI_{2,j}(β)).

The EI (Effectiveness Index) Reward Function

EI estimates the resource spending velocity of a coordination method from three monitored quantities:
1. The frequency of coordination: monitor the time from conflict to conflict, and select the method that leads to the longest period without conflicts.
2. The time spent coordinating: monitor the time needed to resolve a conflict, and select the quickest method.
3. The cost of coordination: monitor the resources spent on coordination, and select the cheapest method.

The EI of method α of robot i at conflict j is

EI_{i,j}(α) = cost_i(I^a_{i,j}) / (I^a_{i,j} + I^p_{i,j})

i.e., the resources robot i spends on coordinating during the active interval, divided by the total length of the interval between conflicts. The learning reward is −EI: a lower resource spending velocity is better.
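As a concrete illustration, the sketch below computes EI for one inter-conflict interval from quantities a robot can monitor on its own: when the conflict started, when coordination ended, when the next conflict began, and how many resource units coordination consumed. The function and variable names are ours (hypothetical), not from the poster; only the formula follows the definition above.

```python
def effectiveness_index(conflict_start: float,
                        coordination_end: float,
                        next_conflict_start: float,
                        coordination_cost: float) -> float:
    """EI for one interval: resources spent on coordination divided by the
    total time between the start of this conflict and the start of the next."""
    active = coordination_end - conflict_start          # I^a_{i,j}
    passive = next_conflict_start - coordination_end    # I^p_{i,j}
    return coordination_cost / (active + passive)

# Example with made-up numbers: a conflict at t=10 s resolved by t=14 s,
# costing 2.0 resource units, with the next conflict starting at t=30 s.
ei = effectiveness_index(10.0, 14.0, 30.0, 2.0)   # 2.0 / 20.0 = 0.1
reward = -ei                                      # the learning reward is -EI
```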

But RL does not converge!  Or does it?

Simple reinforcement learning in anonymous matching games causes the population to converge to a Nash equilibrium, even a mixed one [Hopkins, 1999].

Conditions for convergence (satisfied in EI-based adaptation):
➢ The game matrix must be symmetric.
➢ The agents use fictitious play or stimulus-response learning; this essentially corresponds to our stateless Q-learning.
➢ The agents select their initial methods at random.

Stateless Q-Learning Algorithm

Q_t(a) = Q_{t−1}(a) + ρ (R_t(a) − Q_{t−1}(a))

where ρ is the learning rate and the reward R_t(a) for the coordination method a just used is its negative Effectiveness Index, −EI.

Experiments were run in three domains:
➢ TeamBots: simulated robots, foraging [picture]
➢ AIBO robots: real robots, foraging [picture]
➢ VR-Forces: task-oriented decision making [picture]

[Plot: pucks collected per minute (median) vs. group size (5–40), foraging with unlimited resources, 20-minute trials; compares EI, ACIM, and RND (dynamic method selection) with the static methods noise, aggression, and repel. Higher is better.]

[Plot: pucks retrieved by the group vs. group size (10–40) under unexpected extra resource spending (20-minute trials, a budget of 500 resource units, with aggression spending an extra 0.5 units per step); compares EI with ACIM.]

[Plot: learned EI by robot (plotted as −EI) for the Noise and Repel coordination methods, for Robots A, B, and C.]

[Plot: learning of the best task solution: number of path selections of Path 1 (short) vs. Path 2 (long) by Agents A–D.]


See our paper in the Adaptive & Learning Agents Workshop at AAMAS'09, and Dan Erusalimchik's M.Sc. thesis.
