INTERACTIVE RANKING OF SKYLINES USING MACHINE LEARNING TECHNIQUES
Weiwei Cheng, Eyke Hüllermeier, Bernhard Seeger, Ilya Vladimirskiy
Department of Mathematics and Computer Science, Marburg University, Germany

Ranking the Skyline
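The skyline operator studied in this poster returns the Pareto-optimal subset of an object set. A minimal sketch in Python, assuming the usual convention that larger values are better in every criterion (the function names are illustrative, not from the poster):

```python
def dominates(a, b):
    """True if a Pareto-dominates b: a is at least as good as b in every
    criterion and strictly better in at least one (larger = better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def skyline(objects):
    """P(O): the objects not dominated by any other object in O."""
    return [o for o in objects if not any(dominates(p, o) for p in objects)]

points = [(1, 5), (3, 3), (2, 4), (5, 1), (2, 2)]
print(skyline(points))  # → [(1, 5), (3, 3), (2, 4), (5, 1)]
```

Here (2, 2) is dropped because (3, 3) dominates it; the naive double loop is quadratic, which is fine for illustration but not for large object sets.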
The skyline operator maps a finite set O of objects, each characterized in terms of a fixed number of features (criteria), to the subset of Pareto-optimal elements:

    P(O) = { o ∈ O | { o′ ∈ O | o ≺ o′ } = ∅ }

Important problem: P(O) may become huge, especially in high dimensions!

Utility: Linear model U(a) = ⟨w, a⟩ = w1·a1 + … + wd·ad (monotonicity holds if w ≥ 0) and a kernelized version.
Monotonicity: a ≥ b ⇒ U(a) ≥ U(b) must be guaranteed for all a, b ∈ O.

Algorithm Design

Base learner: noise-tolerant perceptron with margin.
Training data: a set of revealed (pairwise) preferences a ≺ b, turned into positive and negative examples for classification.
Bayes point machine: approximation of the Bayes point by the center of mass of the version space.

Experimental Results

Workflow: [workflow diagram not recoverable from the text extraction]
Ranking quality is measured by the Kendall tau coefficient.

Monotone vs. non-monotone learning:
[Figure: two panels, Kendall tau coefficient vs. no. of preferences & t-statistic, comparing the monotone and non-monotone variants; x = 5, 15, 25, 35, 45 with t = 8.17, 7.59, 8.27, 5.09, 4.43 in one panel and t = 6.72, 8.31, 4.03, 4.30, 1.85 in the other.]
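The poster's base learner fits a linear utility from pairwise preferences. A preference a ≺ b yields the constraint ⟨w, b − a⟩ > 0, so b − a can serve as a positive training example. The sketch below is a plain margin perceptron with an optional projection onto w ≥ 0 for monotonicity; it is a simplified stand-in, not the exact noise-tolerant variant of the poster, and all names and hyperparameters are illustrative:

```python
import numpy as np

def train_utility(prefs, d, epochs=100, margin=0.1, lr=0.1, monotone=True):
    """Perceptron-style learner on pairwise preferences.
    prefs: list of (a, b) pairs with a ≺ b (b preferred), each a d-vector.
    Each preference induces the constraint <w, b - a> > margin; we update w
    whenever the margin is violated.  With monotone=True, w is projected
    onto w >= 0 after each update, so U(a) = <w, a> is monotone."""
    w = np.zeros(d)
    for _ in range(epochs):
        for a, b in prefs:
            z = np.asarray(b, dtype=float) - np.asarray(a, dtype=float)
            if w @ z <= margin:              # margin violated -> update
                w += lr * z
                if monotone:
                    w = np.maximum(w, 0.0)   # enforce monotonicity
    return w

def rank(objects, w):
    """Rank objects by decreasing utility U(a) = <w, a>."""
    return sorted(objects, key=lambda a: -(w @ np.asarray(a, dtype=float)))
```

For example, the preferences (0,1) ≺ (1,0) and (1,0) ≺ (1,1) yield a nonnegative w that ranks (1,1) first and (0,1) last.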
Ranking the skyline via a (latent) utility function:
– A utility function U(·) assigns a real utility degree to each object a = (a1, …, ad) ∈ O; U(a) < U(b) means that the user strictly prefers b to a.
– Utility degrees induce a total order; thus, a ranking can be presented instead of an unsorted answer set.
– User feedback is used to improve ranking quality.

Active Learning Strategy:
1. Constitute a committee of learners.
2. Find two maximally conflicting learners.
3. For each learner, generate a corresponding ranking. Return the first discordant pair as a query. Add the answer to the preference set.
4. Retrain the committee on the enlarged preference set and go to step 2.

Ensemble (Bayes point machine) vs. single learner:
[Figure: two panels, Kendall tau coefficient (about 0.65–0.75) vs. no. of permutations & t-statistic, comparing BPM and non-BPM; x = 5, 10, …, 50 with t = 2.93, 3.84, 4.60, 4.33, 3.83, 5.32, 4.54, 4.59, 5.97, 5.21 in one panel and t = 2.27, 4.22, 5.55, 3.37, 3.85, 5.03, 2.79, 3.98, 4.92, 3.17 in the other.]

Active vs. non-active learning:
[Figure: two panels, Kendall tau coefficient (about 0.6–1.0) vs. no. of preferences & t-statistic, comparing active and non-active learning; x = 10, 15, 20 with t = 2.58, 2.93 at 15 and 20 in one panel and t = 4.78, 6.89 in the other.]

Poster presented at KDML 2007, Workshop Knowledge Discovery, Data Mining, and Machine Learning, Halle/Saale, September 2007.
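The committee-based query selection (steps 2–3 of the active learning strategy) and the Kendall tau coefficient used in the experiments can be sketched together. Measuring conflict between two committee members by the Kendall tau of their rankings is one plausible instantiation of "maximally conflicting"; the poster does not fix this detail, and the helper names are illustrative:

```python
from itertools import combinations

def kendall_tau(r1, r2):
    """Kendall tau rank correlation between two rankings of the same items:
    1 - 2 * (# discordant pairs) / (# pairs)."""
    pos = {item: i for i, item in enumerate(r2)}
    disc = sum(1 for i, j in combinations(range(len(r1)), 2)
               if pos[r1[i]] > pos[r1[j]])
    total = len(r1) * (len(r1) - 1) // 2
    return 1 - 2 * disc / total

def first_discordant_pair(r1, r2):
    """First pair of objects on which two rankings disagree -- the query
    posed to the user in step 3 of the active learning strategy."""
    pos = {item: i for i, item in enumerate(r2)}
    for i, j in combinations(range(len(r1)), 2):
        if pos[r1[i]] > pos[r1[j]]:
            return r1[i], r1[j]
    return None  # rankings agree completely

def select_query(rankings):
    """Steps 2-3: among the committee's rankings, pick the two that conflict
    most (lowest Kendall tau) and return their first discordant pair."""
    ra, rb = min(combinations(rankings, 2),
                 key=lambda pair: kendall_tau(pair[0], pair[1]))
    return first_discordant_pair(ra, rb)
```

The user's answer to the returned query is then added to the preference set and the committee is retrained (step 4).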