TY  - CONF
ID  - jf:ECML-PKDD-11
T1  - Preference-Based Policy Iteration: Leveraging Preference Learning for Reinforcement Learning
A1  - Cheng, Weiwei
A1  - Fürnkranz, Johannes
A1  - Hüllermeier, Eyke
A1  - Park, Sang-Hyeun
ED  - Gunopulos, Dimitrios
ED  - Hofmann, Thomas
ED  - Malerba, Donato
ED  - Vazirgiannis, Michalis
T2  - Proceedings of the 22nd European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2011, Athens, Greece), Part I
Y1  - 2011
SP  - 312
EP  - 327
PB  - Springer
UR  - /publications/papers/ECML-PKDD-11.pdf
N2  - This paper makes a first step toward the integration of two subfields of
machine learning, namely preference learning and reinforcement learning
(RL). An important motivation for a "preference-based" approach to
reinforcement learning is a possible extension of the type of feedback
an agent may learn from. In particular, while conventional RL methods
are essentially confined to dealing with numerical rewards, there are many
applications in which this type of information is not naturally
available, and in which only qualitative reward signals are provided
instead. Therefore, building on novel methods for preference learning,
our general goal is to equip the RL agent with qualitative policy
models, such as ranking functions that allow for sorting its available
actions from most to least promising, as well as algorithms for learning
such models from qualitative feedback. Concretely, in this paper, we
build on an existing method for approximate policy iteration based on
roll-outs. While that approach uses classification methods for
generalization and policy learning, we instead employ a specific type of
preference learning method called label ranking.
Advantages of our preference-based policy iteration method are
illustrated by means of two case studies.
ER  -