DECISION TREE AND INSTANCE-BASED LEARNING FOR LABEL RANKING
Weiwei Cheng, Jens Hühn and Eyke Hüllermeier
Knowledge Engineering & Bioinformatics Laboratory, Department of Mathematics and Computer Science, Marburg University, Germany
Label Ranking
Example: label rankings of customers over four car brands

customer 1: MINI ≻ Toyota ≻ BMW ≻ Volvo
customer 2: BMW ≻ MINI ≻ Toyota
customer 3: Volvo ≻ BMW ≻ Toyota ≻ MINI
customer 4: Toyota ≻ BMW

[Figure: a label ranking tree that predicts the ranking of a new customer from attributes such as age (IF age ≤ 35), sex (IF sex = f), income, and place of residence (IF lives in Montreal)]
Given:
– a set of training instances { xk | k = 1, …, m } ⊆ X
– a set of labels L = { λ1, λ2, …, λn }
– for each training instance xk: a set of pairwise preferences of the form $\lambda_i \succ_{x_k} \lambda_j$

Find:
– a ranking function (an X → Ω mapping) that maps each x ∈ X to a ranking ≻x of L (a permutation πx) and generalizes well in terms of a loss function on rankings (see the sketch below)
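For concreteness, a minimal Python sketch of this setup (illustrative, not from the poster): rankings as tuples of label indices, pairwise preferences as index pairs, and Kendall's tau, the accuracy measure used in the experiments below, as the measure on rankings.

# Minimal sketch (illustrative, not from the poster): rankings are
# tuples of label indices, most preferred first.
from itertools import combinations

labels = ["MINI", "Toyota", "BMW", "Volvo"]  # L = {lambda_1, ..., lambda_n}

# Each training instance: feature vector x_k plus observed pairwise
# preferences lambda_i > lambda_j, encoded as index pairs (i, j).
train = [
    {"x": [25, 0, 1], "prefs": [(0, 1), (1, 2), (2, 3)]},  # MINI > Toyota > BMW > Volvo
    {"x": [47, 1, 0], "prefs": [(2, 0), (0, 1)]},          # BMW > MINI > Toyota
]

def kendall_tau(pi, sigma):
    """Kendall's tau between two complete rankings: +1 if identical,
    -1 if reversed; counts concordant minus discordant label pairs."""
    n = len(pi)
    pos_pi = {lab: r for r, lab in enumerate(pi)}
    pos_sigma = {lab: r for r, lab in enumerate(sigma)}
    score = sum(
        1 if (pos_pi[a] - pos_pi[b]) * (pos_sigma[a] - pos_sigma[b]) > 0 else -1
        for a, b in combinations(pi, 2)
    )
    return score / (n * (n - 1) / 2)

print(kendall_tau((0, 1, 2, 3), (0, 2, 1, 3)))  # 0.666...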
Existing Methods
– Ranking by pairwise comparison (Fürnkranz and Hüllermeier, ECML 2003)
– Constraint classification (Har-Peled, Roth and Zimak, NIPS 2003)
– Log-linear models for label ranking (Dekel, Manning and Singer, NIPS 2003)

These methods
• essentially reduce label ranking to classification (see the sketch below),
• are efficient but may come with a loss of information,
• may have an improper bias and lack flexibility,
• or may produce models that are not easily interpretable.
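A sketch of the first reduction, ranking by pairwise comparison: one binary classifier per label pair, with predictions aggregated by voting. All names are illustrative, and the base learner (scikit-learn's LogisticRegression) is an arbitrary stand-in, not the authors' choice.

# Sketch of ranking by pairwise comparison (assumed names; the base
# learner is an arbitrary choice). One binary problem per label
# pair (i, j): target 1 iff label i precedes label j.
from itertools import combinations
import numpy as np
from sklearn.linear_model import LogisticRegression

def rpc_fit(X, rankings, n_labels):
    """rankings: complete rankings as lists of label indices; each
    pairwise problem needs both orders to occur in the data."""
    models = {}
    for i, j in combinations(range(n_labels), 2):
        y = [1 if r.index(i) < r.index(j) else 0 for r in rankings]
        models[(i, j)] = LogisticRegression().fit(X, y)
    return models

def rpc_predict(models, x, n_labels):
    """Rank labels by their number of pairwise wins (Borda-like
    voting); ties are broken arbitrarily by argsort."""
    votes = np.zeros(n_labels)
    for (i, j), clf in models.items():
        if clf.predict([x])[0] == 1:
            votes[i] += 1
        else:
            votes[j] += 1
    return list(np.argsort(-votes))  # most preferred label first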
Local Learning Approach
– The target function is estimated (on demand) in a local way.
– The core part is to estimate a locally constant model (see the sketch below).
– Probabilistic models for rankings are used, considering nearby preferences as a representative sample.
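A minimal sketch of the instance-based variant of this idea, under stated assumptions: Euclidean distances, complete training rankings, and simple Borda aggregation as the locally constant model. The poster's IBLR method instead fits the Mallows model introduced below to the neighborhood.

# Sketch of instance-based label ranking (assumptions: Euclidean
# distances, complete training rankings, Borda aggregation as the
# locally constant model; the poster fits a Mallows model instead).
import numpy as np

def iblr_predict(X_train, rankings, x_query, k=5):
    X_train = np.asarray(X_train, dtype=float)
    dists = np.linalg.norm(X_train - np.asarray(x_query, dtype=float), axis=1)
    neighbors = np.argsort(dists)[:k]      # the k nearest training instances
    n_labels = len(rankings[0])
    borda = np.zeros(n_labels)
    for idx in neighbors:                  # a label earns more points
        for pos, label in enumerate(rankings[idx]):  # the higher it is ranked
            borda[label] += n_labels - pos
    return list(np.argsort(-borda))        # most preferred label first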
Mallows Model

\[
P(\sigma \mid \theta, \pi) = \frac{\exp(-\theta\, d(\pi, \sigma))}{\phi(\theta, \pi)}
\]

with center π ∈ Ω, spread θ > 0, and distance d on Ω.
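Assuming d is the Kendall distance (number of discordant label pairs), the normalization constant has a closed form and the model can be evaluated directly; a sketch:

# Sketch, assuming d is the Kendall distance (number of discordant
# label pairs); then phi has the closed form used in the MLE below.
from itertools import combinations
from math import exp

def kendall_distance(pi, sigma):
    """Number of label pairs that pi and sigma order differently."""
    pos = {lab: r for r, lab in enumerate(sigma)}
    return sum(1 for a, b in combinations(pi, 2) if pos[a] > pos[b])

def mallows_prob(sigma, pi, theta):
    """P(sigma | theta, pi) with
    phi(theta) = prod_j (1 - e^{-j*theta}) / (1 - e^{-theta})."""
    n = len(pi)
    phi = 1.0
    for j in range(1, n + 1):
        phi *= (1.0 - exp(-j * theta)) / (1.0 - exp(-theta))
    return exp(-theta * kendall_distance(pi, sigma)) / phi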
Maximum likelihood estimation (MLE) based on observed (incomplete) rankings σ = {σ1, …, σk}:

\[
P(\boldsymbol{\sigma} \mid \theta, \pi)
= \prod_{i=1}^{k} P(E(\sigma_i) \mid \theta, \pi)
= \prod_{i=1}^{k} \sum_{\sigma \in E(\sigma_i)} P(\sigma \mid \theta, \pi)
= \frac{\prod_{i=1}^{k} \sum_{\sigma \in E(\sigma_i)} \exp\left(-\theta\, d(\sigma, \pi)\right)}
       {\left( \prod_{j=1}^{n} \frac{1 - \exp(-j\theta)}{1 - \exp(-\theta)} \right)^{k}},
\]

where E(σi) denotes the set of consistent extensions of σi. For example (n = 3):

Observation σ: a ≻ b
Extensions E(σ): a ≻ b ≻ c,  a ≻ c ≻ b,  c ≻ a ≻ b

The MLE $(\hat{\pi}, \hat{\theta}) = \arg\max_{\pi,\theta} P(\boldsymbol{\sigma} \mid \theta, \pi)$ is approximated with an EM-like estimation procedure.
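For small n, the consistent extensions and the marginal likelihood can be computed by brute-force enumeration, as in this sketch (reusing mallows_prob from above); the EM-like procedure is needed precisely because this enumeration explodes combinatorially.

# Brute-force sketch (reuses mallows_prob from above): enumerate the
# consistent extensions of an incomplete ranking and sum their
# probabilities; feasible only for small n.
from itertools import permutations

def extensions(partial, n_labels):
    """All complete rankings that preserve the pairwise order of the
    partial ranking, e.g. a>b on 3 labels -> {abc, acb, cab}."""
    order = {lab: r for r, lab in enumerate(partial)}
    result = []
    for perm in permutations(range(n_labels)):
        seen = [lab for lab in perm if lab in order]
        if all(order[seen[i]] < order[seen[i + 1]] for i in range(len(seen) - 1)):
            result.append(perm)
    return result

def marginal_prob(partial, pi, theta, n_labels):
    """P(E(sigma) | theta, pi): the sum over all consistent extensions."""
    return sum(mallows_prob(s, pi, theta) for s in extensions(partial, n_labels))

print(extensions((0, 1), 3))  # [(0, 1, 2), (0, 2, 1), (2, 0, 1)]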
Label Ranking Tree

Major modifications compared with regression trees:
1. Split criterion: seek a split of T (a set of rankings) into T+ and T− that maximizes
\[
\frac{|T^{+}| \cdot \theta^{+} + |T^{-}| \cdot \theta^{-}}{|T|},
\]
where θ+ and θ− are the spreads of the Mallows models fitted to T+ and T− (see the sketch below).
2. Stopping criterion: the tree is pure OR the number of labels in a node is too small.
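An illustrative sketch of how this criterion could be evaluated, under stated assumptions: the centers π+ and π− are given, and the spreads are fit by a crude grid search reusing mallows_prob from above, not by the authors' estimation procedure. A larger spread θ means the rankings in a node agree more, i.e., the node is purer.

# Illustrative sketch of the split criterion (reuses mallows_prob
# from above; assumes the centers pi_plus and pi_minus are given).
from math import log

def fit_theta(rankings, pi, grid=None):
    """Crude ML estimate of the spread theta for a fixed center pi;
    a larger theta means the rankings agree more (a purer node)."""
    grid = grid if grid is not None else [0.1 * t for t in range(1, 100)]
    return max(grid, key=lambda th: sum(log(mallows_prob(s, pi, th))
                                        for s in rankings))

def split_score(T_plus, T_minus, pi_plus, pi_minus):
    """(|T+| * theta+ + |T-| * theta-) / |T| for a candidate split."""
    theta_plus = fit_theta(T_plus, pi_plus)
    theta_minus = fit_theta(T_minus, pi_minus)
    total = len(T_plus) + len(T_minus)
    return (len(T_plus) * theta_plus + len(T_minus) * theta_minus) / total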
Main Conclusions from Experiments

[Figure: ranking performance (Kendall's tau, 0.55 to 0.8) as a function of the probability of a missing label (0 to 0.7), for Label Ranking Tree (LRT), Instance-Based Label Ranking (IBLR), and Constraint Classification (CC)]

– Local learning is more flexible and can exploit more preference information.
– Given enough data, IBLR is significantly better than LRT and CC in terms of predictive accuracy (Kendall's tau).
– The size of LRT is smaller than expected.
ACKNOWLEDGEMENTS: Weiwei Cheng and Jens Hühn were financially supported by the ICML 2009 Student Scholarship Program.

Poster presented at ICML 2009, 26th International Conference on Machine Learning, Montreal, Canada, June 2009