Themen

Die folgenden nach Themen geordneten Fachartikel stehen zur Verfügung. Den meisten ist eine kurze Zusammenfassung des Abstracts zugeordnet, die teilweise weitere Informationen zu dem Thema beinhaltet. Optimalerweise sollte man sich vor der Vorbesprechung einen oder mehrere Artikel heraussuchen oder eigene Vorschläge überlegt im Voraus bereits zugesandt haben. In der Vorbesprechung werden wir die Themen möglichst nach bestehendem Interesse verteilen. In einzelnen Fällen können auch mehrere Artikel vorgestellt werden.

Hinweis: Viele Artikel (insbesondere die des Springer-Verlages) können nur innerhalb des TUD-internen Netzes heruntergeladen werden.

Performance Measures und ROC (6)

Harald Steck. Hinge Rank Loss and the Area Under the ROC Curve. ECML, Lecture Notes in Computer Science, Vol. 4701, pp. 347-358, Springer, 2007.
Hinge rank loss is defined and it is shown that it provides an asymptotically-tight lower-bound on the AUC. The relation to training Support Vector machines is given, too (i.e., that the AUC is maximized by standard SVM training).
Tadeusz Pietraszek. On the use of ROC analysis for the optimization of abstaining classifiers. Machine Learning, 68(2), pp. 137-169, 2007.
A classifier that holds the classification back on certain examples is built with the use of ROC analysis. Such a classifier often performs better than a normal classifier (whereas they aren't easy to compare). The abstaining classifiers are evaluated by optimizing a cost-based model, a bounded-abstention model (where the boundary is the maximum abstention rate of the classifier) and a bounded-improvement model (where the boundary is based on the maximum miscallification cost of the classifier).
Tom Fawcett and Alexandru Niculescu-Mizil. PAV and the ROC convex hull. Machine Learning, 68(1), pp. 97-106, 2007.
The PAV (Pool Adjacent Violators) algorithm is used for calibrating rankings of a classifier by mapping the instance scores with a nonparametric isotonic regression (this method relies on the assumption that the calibrated scores are monotonically increasing (isotonic) with respect to the classifer score). The transformation gives the lowest Brier Score (and cross-entropy). Note that classifiers often outputs uncalibrated scores (for example Naive Bayes). The paper shows that this method is equivalent to the ROC convex hull method. The aim of the authors is to point out the connections between the two very different methods to allow new extensions to both of them.
Chris Drummond and Robert C. Holte. Cost curves: An improved method for visualizing classifier performance. Machine Learning, 65(1), pp. 95-130, 2006.
ROC curves are compared to the novel cost-curves for comparing prediction performance. In comparison with ROC curves, cost-curves do take into account costs produced by misclassifying an example. Cost-sensitive classifying is a common task in real world problems
Jesse Davis and Mark Goadrich. The Relationship Between Precision-Recall and ROC Curves. Proc. ICML'06, pp. 233-240, 2006.
This paper examines the connection between the ROC space and the PR space. It shows that a curve dominates in ROC space iff it dominates in PR space. A notion of an achievable PR curve is given which is quite similiar to the ROC convex hull (they also give an algorithm for computing this curve). Additionally, some differences between the two spaces are highlighted. Note that PR curves are a better algorithm evaluation if the examples are unequally distributed.
Shaomin Wu, Peter Flach, and Cesar Ferri. An Improved Model Selection Heuristic for AUC. ECML, Lecture Notes in Computer Science, Vol. 4701, pp. 478-489, Springer, 2007.
AUC has been widely used to measure ranking performance for binary classification. AUC only employs the classifier’s scores for ranking ignoring other valuable information conveyed by them. This paper introduces a new model selection method - scored AUC (sAUC) which is the area under the sROC curve - measuring how quickly AUC deteriorates if positive scores are decreased.

Perceptrons, linear Classifiers and incremental learning (5)

Koby Crammer, Ofer Dekel, Joseph Keshet, Shai Shalev-Shwartz, and Yoram Singer. Online Passive-Aggressive Algorithms. Journal of Machine Learning Research Vol. 7, pp. 551-585, 2006.
Family of "passive-aggressive" variants of the perceptron algorithm for binary and multiclass classification and regression, that do not only update the hypothesis when a prediction error occurs, but also when the decision was close to wrong, in order to find a better (more general) separating boundary
Roni Khardon and Gabriel Wachman. Noise Tolerant Variants of the Perceptron Algorithm, Journal of Machine Learning Research, Vol. 8, pp. 227-248, 2007.
(Performance) analysis of variants of the perceptron algorithm especially regarding noise-robustness
Nicolò Cesa-Bianchi, Claudio Gentile, and Luca Zaniboni. Incremental Algorithms for Hierarchical Classification. Journal of Machine Learning Research, Vol. 7, pp. 31-54, 2006.
Simple linear, incremental algorithm applied to datasets with a hierarchical structure of classes. Comparison of performance to a simple perceptron variant on the huge Reuters dataset
Petroula Tsampouka, John Shawe-Taylor. Approximate Maximum Margin Algorithms with Rules Controlled by the Number of Mistakes. Proc. ICML'07, pp. 903-910, 2007
Family of perceptron-like algorithms, analysed especially regarding properties of the learning rate and the margin of the decision boundaries
Seung-Jean Kim, Alessandro Magnani, Sikandar Samar, Stephen P. Boyd and Johan Lim. Pareto optimal linear classification. Proc. ICML'06, pp. 473-480, ACM, 2006.
Tries to find a linear classifier that minimizes the probability of misclassification by finding the Pareto optimal linear decision boundary.

Multiclass (8)

Ranking:

Shai Shalev-Shwartz and Yoram Singer. Efficient Learning of Label Ranking by Soft Projections onto Polyhedra. Journal of Machine Learning Research, Vol. 7, pp. 1567-1599, 2006.
Zhe Cao, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai and Hang Li. Learning to Rank: From Pairwise Approach to Listwise Approach. Technical Report MSR-TR-2007-40, Microsoft Research Asia, 2007.
Object Ranking Problems were usually tackled with "pairwise" approaches, where two objects were used as instances in learning. In this way, Ranking problems were reduced to binary Classification Problems with paired object as instances. The authors propose a listwise approach and compare it to several pairwise methods.
Corinna Cortes, Mehryar Mohri, and Ashish Rastogi. Magnitude-Preserving Ranking Algorithms. Proc. ICML'07, pp. 169-176, 2007.
Method for learning not only to rank correctly, but also to maintain the differences in the ratings (example application: movie ratings)

Binarization:

Robert Tibshirani and Trevor Hastie. Margin Trees for High-dimensional Classification. Journal of Machine Learning Research, Vol. 8, pp. 637-652, Microtome Publishing, March 2007.
Binarization method that decomposes a multiclass problem into several binary problems in such a way, that the "easy" (binary) problems are learned first. A binary decision tree is created beginning with the classifier of the easiest problem at the root, with the other binary classifiers at the inner nodes, and the predicted classes at the leafs.
Iain Melvin, Eugene Ie, Jason Weston, William Stafford Noble, and Christina Leslie. Multi-class Protein Classification Using Adaptive Codes. Journal of Machine Learning Research, Vol. 8, pp. 1557-1581, Microtome Publishing, July 2007.
Adaptive decoding matrix: a protein multiclass classification problem with a high number of classes is decomposed into several binary problems. Instead of the common way of using a fixed decoding matrix, i.e. a fixed way of combining and weighting the binary predictions for reaching a final decision, a ranking perceptron is trained for this task.

Multilabel:

Adriano Veloso, Wagner Meira Jr., Marcos Goncalves, and Mohammed Zaki. Multi-label Lazy Associative Classification, PKDD 2007, LNAI 4702, pp. 605-612
In multi-label learning tasks instances can be associated with multiple labels simultaneously. This paper proposes a multi-label lazy associative classifier, which progressively exploits dependencies among labels. As the classification model is induced based on instances, this approach provides a better coverage of small disjuncts. Neglecting these small disjuncts may degrade classification accuracy.

Additional papers:

Ling Li. Multiclass boosting with repartitioning. Proc. ICML'06, pp. 569-576, ACM, 2006.
Adaptive coding matrix for binary ensembles.
Yonatan Amit, Michael Fink, Nathan Srebro, and Shimon Ullma. Uncovering Shared Structures in Multiclass Classification. Proc. ICML'07, pp. 17-24, 2007. Factorized Linear Model with Frobenius Regularization

Feature Extraction (4)

Marc Boullé. Compression-Based Averaging of Selective Naive Bayes Classifiers. Journal of Machine Learning Research, Vol. 8, pp. 1659-1685, Microtome Publishing, July 2007.
Feature Selection for Naive Bayes
Rajat Raina, Alexis Battle, Honglak Lee, Benjamin Packer, and Andrew Y. Ng.. Self-taught Learning: Transfer Learning from Unlabeled Data. Proc. ICML'07, pp. 759-766, 2007.
Semisuperwised feature extraction
Juan Dai, Shuicheng Yan, Xiaoou Tang, and James T. Kwok. Locally adaptive classification piloted by uncertainty. Proc. ICML'06, pp. 225-232, ACM, 2006.
Ensembles of local classifiers, where 'local' refers to areas of the input space.
Amir Globerson and Sam T. Roweis. Nightmare at test time: robust learning by feature deletion. Proc. ICML'06, pp. 353-360, ACM, 2006.
Training of linear models such that some of the weights can be dropped.

Additional papers (3)

Janez Demsar. Statistical Comparisons of Classifiers over Multiple Data Sets. Journal of Machine Learning Research, Vol. 7, pp. 1-30, 2006.
Theoretical and empirical Study about statistic validation methods in Machine Learning, introduce CD-Diagrams, a visual representation of Classifier Comparisons.
Li Cheng and S. V. N. Vishwanathan. Learning to Compress Images and Videos. Proc. ICML'07, pp. 161-168, 2007.
Use of machine learning techniques to lossly compress video and images. Pixel colors are not computed, but learned. Achieves comparable results to common lossy algorithms
Nam Nguyen and Yunsong Guo. Comparisons of Sequence Labeling Algorithms and Extensions. Proc. ICML'07, pp. 681-688, 2007.
Survey of existing sequence labeling algorithms. These are algorithms that receive instances in a fixed order and thus try to use the class information of previous examples to correctly predict the next ones. Applications are e.g. segmentation and information extraction. In addition an ensemble technique is introduced that uses the previously presented methods.