Seminar aus Data Mining und Maschinellem Lernen
Extreme Classification
The seminar is available in TUCaN right here.
When and where?
The kick-off meeting was on Tuesday, the 12th of April at 17:10 in A213. The regular meetings will take place on Wednesdays at 17:10 in Room S1 03/12 (main building).
Organisation
The topics for the talks will be assigned in the kick-off meeting. For further questions, feel free to send an email to ml-sem@ke.tu-darmstadt.de. No prior registration is needed; however, please still send us an email so that we can estimate the number of participants in advance.
Content
In the course of this seminar, we will try to get an overview of the current state of research in a particular domain. This year's topic is Extreme Classification, i.e. methods and approaches for handling multi-class and multi-label classification problems with thousands or even millions of categories. We will concentrate on recent papers published in workshops, journals, and conferences. Good entry points for references are the eXtreme Classification workshops (2013, 2015, 2015) and the Extreme Classification repository.
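As a rough illustration of the problem setting (this sketch is not part of the assigned material): a common baseline in the papers below combines feature hashing (cf. Weinberger et al., ICML 2009) with one-against-all / binary-relevance training, i.e. one binary classifier per label. The Python sketch shows this on a tiny invented tagging dataset; the documents, tags, and the n_features value are purely illustrative, and scikit-learn is assumed to be installed.

```python
# Minimal sketch of a multi-label baseline. The toy documents and tags
# are invented for illustration; real extreme-classification datasets
# have thousands to millions of labels.
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

docs = ["how to train a neural network",
        "scalable learning with millions of labels",
        "automatically suggesting tags for web pages"]
tags = [{"ml", "neural-nets"}, {"ml", "scalability"}, {"tagging"}]

# Feature hashing maps raw tokens into a fixed-size sparse feature space,
# so memory use does not grow with the vocabulary size.
vec = HashingVectorizer(n_features=2**18)
X = vec.transform(docs)

# Binary relevance / one-against-all: one independent binary classifier
# per label. Training cost grows linearly with the number of labels,
# which is exactly what the tree- and embedding-based papers in the
# schedule try to avoid.
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(tags)
clf = OneVsRestClassifier(LogisticRegression()).fit(X, Y)

pred = clf.predict(vec.transform(["learning to tag web pages"]))
print(mlb.inverse_transform(pred))  # predicted tag sets (may be empty on toy data)
```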
The students are expected to give a 30-minute talk on the material they are assigned, followed by 15 minutes of questions. The content of the talk should go beyond the scope of the paper and demonstrate that a thorough understanding of the material was achieved. See also our general advice on giving talks.
Schedule
The presentation slides are only accessible from within the university network (you may use a VPN to access them).
- 4.5.
- Hong Linh T.
- I. Katakis, G. Tsoumakas, and I. Vlahavas, Multilabel text classification for automated tag suggestion, in ECML/PKDD Discovery Challenge, 2008.
- A. Montejo Ráez, L. A. Ureña López, and R. Steinberger, Adaptive Selection of Base Classifiers in One-Against-All Learning for Large Multi-labeled Collections, in Advances in Natural Language Processing, 4th International Conference, 2004.
- Anna Marie F.
- G. Tsoumakas, I. Katakis, and I. Vlahavas, Effective and efficient multilabel classification in domains with large number of labels, in ECML/PKDD 2008 Workshop on Mining Multidimensional Data, 2008.
- S. Bengio, J. Weston, and D. Grangier, Label embedding trees for large multi-class tasks, in NIPS, 2010.
- 11.5.
- Jan K.
- Q. Shi, J. Petterson, G. Dror, J. Langford, A. Smola, A. Strehl, and S. V. N. Vishwanathan, Hash Kernels, in AISTATS, 2009.
- Florian B.
- K. Weinberger, A. Dasgupta, J. Attenberg, J. Langford, and A. Smola, Feature Hashing for Large Scale Multitask Learning, in ICML, 2009.
- 18.5.
- Zahra F.
- S. Ji, L. Tang, S. Yu, and J. Ye, Extracting Shared Subspaces for Multi-label Classification, in KDD, 2008.
- Jens B.
- F. Tai, and H. Lin, Multi-label Classification with Principal Label Space Transformation, in Neural Computation, 2012.
- 25.5.
- Jonathan G.
- C. Vens, J. Struyf, L. Schietgat, S. Džeroski, and H. Blockeel, Decision trees for hierarchical multi-label classification, in Machine Learning, 2008.
- Christian K.
- R. Agrawal, A. Gupta, Y. Prabhu, and M. Varma, Multi-Label Learning with Millions of Labels: Recommending Advertiser Bid Phrases for Web Pages, in WWW, 2013.
- 1.6.
- Zhizhen W.
- W. Bi, and J. Kwok, Multi-Label Classification on Tree- and DAG-Structured Hierarchies, in ICML, 2011.
- Yeimy V.
- Y. Prabhu, and M. Varma, FastXML: A Fast, Accurate and Stable Tree-classifier for eXtreme Multi-label Learning, in KDD, 2014.
- 8.6.
- Albert S.
- Y. Chen, and H. Lin, Feature-aware Label Space Dimension Reduction for Multi-label Classification, in NIPS, 2012.
- Thomas A.
- H. Yu, P. Jain, P. Kar, and I. Dhillon, Large-scale Multi-label Learning with Missing Labels, in ICML, 2014.
- 15.6.
- Kim B.
- J. Weston, S. Bengio, and N. Usunier, WSABIE: Scaling Up To Large Vocabulary Image Annotation, in IJCAI, 2011.
- Camila G.
- J. Nam, E. Loza Mencía, and J. Fürnkranz, All-in Text: Learning Document, Label, and Word Representations Jointly, in AAAI, 2016.
- 22.6.
- Simon Peter B.
- P. Mineiro, and N. Karampatziakis, Fast Label Embeddings via Randomized Linear Algebra, Preprint, 2015.
- Daniel S.
- N. Karampatziakis, and P. Mineiro, Scalable Multilabel Prediction via Randomized Methods, Preprint, 2015.
- 29.6.
- Marten P.
- K. Bhatia, H. Jain, P. Kar, M. Varma, and P. Jain, Sparse Local Embeddings for Extreme Multi-label Classification, in NIPS, 2015.
- Lukas M.
- K. Balasubramanian, and G. Lebanon, The Landmark Selection Method for Multiple Output Prediction, in ICML, 2012.
- Nagihan K.
- W. Bi, and J. Kwok, Efficient Multi-label Classification with Many Labels, in ICML, 2013.
- 6.7.
- Simon-Konstantin T.
- T. N. Rubin, A. Chambers, P. Smyth, and M. Steyvers, Statistical topic models for multi-label document classification, in Machine Learning, 2011.
- Paola R.
- P. Rai, C. Hu, R. Henao, and L. Carin, Large-Scale Bayesian Multi-Label Learning via Topic-Based Label Embeddings, in NIPS, 2015.
Talks
The talks are expected to be accompanied by slides. If you do not have your own laptop, please send us the slides in advance so that we can prepare and test them. Both the talk and the slides may be in either English or German, but we strongly encourage students to present in English.
Grading
The slides, the presentation, and the question-and-answer part of the talk will all influence the overall grade. Furthermore, students are expected to participate in the discussions. A written report on the material is not required.
Most importantly, your independent treatment of the material will influence the grade. To achieve a grade in the 1.x range, the talk needs to go beyond a mere recitation of the given material and include your own ideas, your own experience, or even demos. An exact recitation of the papers will lead to a grade in the 2.x range. A weak presentation and a lack of engagement in the discussions may lead to a grade in the 3.x range, or worse.
Please also read our guidelines for giving a talk very carefully.