INTERACTIVE RANKING OF SKYLINES USING MACHINE LEARNING TECHNIQUES

Weiwei Cheng, Eyke Hüllermeier, Bernhard Seeger, Ilya Vladimirskiy
Department of Mathematics and Computer Science, Marburg University, Germany

Ranking the Skyline

The skyline operator maps a finite set O of objects, each characterized in terms of a fixed number of features (criteria), to the subset of Pareto-optimal elements:

P(O) =df { o ∈ O | { o′ ∈ O : o ≺ o′ } = ∅ }

Important problem: P(O) may become huge, especially in high dimensions!

Algorithm Design

Base Learner: Noise-tolerant perceptron with margin.

Training Data: A set of revealed (pairwise) preferences a ≺ b, turned into positive and negative examples for classification.

Utility: Linear model U(a) = ⟨w, a⟩ = w1 a1 + … + wd ad (monotonicity holds if w ≥ 0) and a kernelized version.

Monotonicity: a ≥ b ⇒ U(a) ≥ U(b) must be guaranteed for all a, b ∈ O.
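The skyline operator admits a direct implementation as a dominance test over all pairs; a minimal sketch in Python (the helper names and the maximization convention are illustrative assumptions, not from the poster):

```python
from typing import List, Sequence

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a Pareto-dominates b: a is at least as good in every
    criterion and strictly better in at least one (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def skyline(objects: List[Sequence[float]]) -> List[Sequence[float]]:
    """P(O): the objects not dominated by any other object in O."""
    return [o for o in objects if not any(dominates(p, o) for p in objects)]

# Toy example: hotels scored by (comfort, negated price), higher is better.
hotels = [(3, -50), (5, -80), (4, -40), (2, -90)]
print(skyline(hotels))  # → [(5, -80), (4, -40)]
```

The double loop is quadratic in |O|; it only illustrates the definition, not an efficient skyline algorithm.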

Bayes Point Machine: Approximation of the Bayes point by the center of mass of version space.
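Training on pairwise preferences and averaging over version space can be sketched as follows: a preference a ≺ b becomes the classification example b − a with label +1 (plus its mirrored negative), and the Bayes point is crudely approximated by averaging perceptrons trained on shuffled copies of the data. All function names are hypothetical, monotonicity (w ≥ 0) is not enforced here, and this is a simplification of the poster's method:

```python
import random
import numpy as np

def perceptron(examples, labels, epochs=50, margin=0.1):
    """Plain perceptron with a fixed margin -- a stand-in for the
    poster's noise-tolerant base learner."""
    w = np.zeros(len(examples[0]))
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            if y * np.dot(w, x) <= margin:  # violated or inside the margin
                w += y * x
    return w

def bayes_point(prefs, n_members=10, seed=0):
    """Average the weights of perceptrons trained on shuffled copies of
    the preference set -- a crude center-of-mass approximation."""
    rng = random.Random(seed)
    data = [(np.array(b) - np.array(a), +1) for a, b in prefs]  # a ≺ b
    data += [(-x, -y) for x, y in data]                         # mirrored negatives
    ws = []
    for _ in range(n_members):
        rng.shuffle(data)
        xs, ys = zip(*data)
        ws.append(perceptron(xs, ys))
    return np.mean(ws, axis=0)

# Preferences consistent with the latent utility U(a) = 2*a1 + a2:
prefs = [((1, 0), (2, 0)), ((0, 1), (1, 0)), ((1, 1), (2, 1))]
w = bayes_point(prefs)
```

Since each converged member separates all examples with positive margin, their average does too, so the learned w ranks every revealed pair correctly.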


Ranking the skyline via a (latent) utility function:

– A utility function U(·) assigns a real utility degree to each object a = (a1, …, ad) ∈ O; U(a) < U(b) means that the user strictly prefers b to a.
– Utility degrees induce a total order; thus, a ranking can be presented instead of an unsorted answer set.
– User feedback is used to improve ranking quality.

Active Learning Strategy:

1. Constitute a committee of learners.
2. Find two maximally conflicting learners.
3. For each learner, generate a corresponding ranking. Return the first discordant pair as a query. Add the answer to the preference set.
4. Retrain the committee on the enlarged preference set and go to step 2.
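The query-selection step of such a committee loop can be sketched by comparing the rankings induced by two linear learners and returning their first discordant pair; the function names and the pairwise disagreement test are illustrative assumptions:

```python
from itertools import combinations
import numpy as np

def ranking(w, objects):
    """Order objects by decreasing utility U(o) = <w, o>."""
    return sorted(objects, key=lambda o: -np.dot(w, o))

def first_discordant_pair(w1, w2, objects):
    """First pair (a, b) that the two learners order differently,
    returned as the next preference query for the user."""
    for a, b in combinations(ranking(w1, objects), 2):
        if np.dot(w2, a) - np.dot(w2, b) < 0:  # w2 ranks b above a, w1 the opposite
            return a, b
    return None  # rankings agree: no informative query left

objects = [(1, 0), (0, 1), (2, 2)]
w1, w2 = np.array([2.0, 1.0]), np.array([1.0, 2.0])
print(first_discordant_pair(w1, w2, objects))  # → ((1, 0), (0, 1))
```

The user's answer to the returned query is then added to the preference set before retraining, as in step 4 above.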

Experimental Results

Monotone vs. non-monotone learning: [Figure: Kendall tau coefficient vs. number of preferences, monotone vs. non-monotone learner, with a t-statistic per point.]

Ensemble (Bayes point machine) vs. single learner: [Figure: Kendall tau coefficient vs. number of preferences, BPM vs. non-BPM, with a t-statistic per point.]

Active vs. non-active learning: [Figure: Kendall tau coefficient vs. number of preferences, active vs. non-active learner, with a t-statistic per point.]
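The evaluation metric in these plots, the Kendall tau coefficient, measures rank correlation as (concordant pairs − discordant pairs) divided by the total number of pairs; a minimal sketch for tie-free rankings (not the poster's implementation):

```python
from itertools import combinations

def kendall_tau(rank_a, rank_b):
    """Kendall tau between two rankings given as lists of the same items:
    +1 for identical order, -1 for fully reversed order."""
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}
    concordant = discordant = 0
    for x, y in combinations(rank_a, 2):
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0:
            concordant += 1
        else:
            discordant += 1
    n = len(rank_a)
    return (concordant - discordant) / (n * (n - 1) / 2)

print(kendall_tau(["a", "b", "c"], ["a", "c", "b"]))  # 2 concordant, 1 discordant → 1/3
```

A value near 1 in the plots above thus means the learned ranking closely matches the user's true preference order.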

Poster presented at KDML 2007, Workshop Knowledge Discovery, Data Mining, and Machine Learning, Halle/Saale, September 2007.
