Seminar aus Maschinellem Lernen und Data Mining
The seminar is available in TUCaN under module number 20-00-0102.
On January 29th, we will start at 17:30h!
When and Where?
Until further notice, the seminar will take place on the dates shown below, from 17:10h to 18:40h, in room E202.
Content
In the course of this seminar, we will try to get an overview of the current state of research on learning with weak supervision. While in conventional, supervised learning one is given a clear indication of the desired target signal, there are many settings in which this training information is not available and must be replaced or enriched with signals from alternative sources. More precisely, when learning a function f(x) from data, one usually receives samples (x_i, y_i = f(x_i)). However, the literature describes many other ways of receiving information about f(x):
- semi-supervised clustering: The learner is not given any labels, but it is given information about the distribution of the labels over the examples, typically in the form of pairwise constraints f(x_i) = f(x_j) (must-link) or f(x_i) ≠ f(x_j) (cannot-link). Obviously, in this case (as in clustering) we cannot aim to recover f(.), but only to find a function that partitions the data in the same way as f(.) does.
- semi-supervised learning: Like supervised learning, but the learner is given only a few labeled samples (x_i, y_i) and a large number of unlabeled samples x_j. It can then, e.g., use self-training to bootstrap a learner from the few labeled examples (see the first sketch after this list).
- active learning: The learner is given no (or only very few) samples (x_i, y_i), but is allowed to ask for f(x_j) for a limited number of data points x_j, which it can choose or generate on its own.
- reinforcement learning (or a special case thereof): The learner is not given any information about f(x), but it can get numeric feedback for samples (x_j, y'_j) that indicates how good its guess y'_j was (without knowing the scale, and without ever knowing what is right or wrong).
- preference learning / label ranking (or a special case thereof): The learner is not given pairs (x_i, y_i), but pairs (x_i, y_i1 > y_i2), which indicate which of two labels is better for a given input x_i (without knowing the correct label y_i).
- superset learning / learning from partial labels: The learner is given candidate sets of labels (x_i, {y_i1, y_i2, ..., y_im_i}), with the semantics that f(x_i) = y_ik for some k (see the second sketch after this list).
- distant supervision: The learner is given pairs (x_i, y_i), but the y_i are generated by an automated process, a heuristic, or similar. This can, e.g., mean that training information is only available for certain parts of the input space, where the data can easily be labeled by automatic means, and the learner has to bootstrap from this to the entire space.
- incidental supervision: The learner is not given any labels, but it is given access to one or more functions g_l(.) that are assumed to correlate well with f(.), i.e., one can assume that something like f(x_1) = f(x_2) ⇔ g_l(x_1) = g_l(x_2) holds for some, for all, or for some other aggregation of the signals g_l(.).
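To make the semi-supervised learning setting above concrete, here is a minimal self-training sketch in Python (the first sketch referenced in the list). Everything in it is made up for illustration: the synthetic data, the confidence threshold, and the choice of logistic regression as the base learner. It shows the general bootstrapping idea, not any particular method from the papers below.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_labeled = rng.normal(size=(20, 2))           # the few labeled samples (x_i, y_i)
y_labeled = (X_labeled[:, 0] > 0).astype(int)  # hypothetical ground truth
X_unlabeled = rng.normal(size=(500, 2))        # the many unlabeled samples x_j

clf = LogisticRegression()
for _ in range(5):  # a few self-training rounds
    clf.fit(X_labeled, y_labeled)
    proba = clf.predict_proba(X_unlabeled)
    confident = proba.max(axis=1) > 0.95       # keep only confident pseudo-labels
    if not confident.any():
        break
    # move the confidently pseudo-labeled points into the labeled set
    X_labeled = np.vstack([X_labeled, X_unlabeled[confident]])
    y_labeled = np.concatenate([y_labeled, proba[confident].argmax(axis=1)])
    X_unlabeled = X_unlabeled[~confident]
    if len(X_unlabeled) == 0:
        break
```

The confidence threshold is the crucial design choice here: set too low, the labeled set gets flooded with noisy pseudo-labels; set too high, self-training never gets off the ground.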
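In the same spirit, the second sketch illustrates superset learning / learning from partial labels: the learner alternates between fitting a model to its current label guesses and re-disambiguating each candidate set to the label the model currently finds most probable. Again, the data and the candidate-set corruption are invented for illustration, and this simple alternating scheme is only one possible disambiguation strategy; see the Superset Learning topic below for principled treatments.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y_true = (X[:, 0] + X[:, 1] > 0).astype(int)
# candidate sets {y_i1, ..., y_im_i}: the true label, plus a distractor
# label for roughly 30% of the instances (hypothetical corruption process)
candidates = [{y} | ({1 - y} if rng.random() < 0.3 else set()) for y in y_true]

# start from a random guess out of each candidate set
y_guess = np.array([rng.choice(sorted(c)) for c in candidates])
clf = LogisticRegression()
for _ in range(10):
    clf.fit(X, y_guess)
    proba = clf.predict_proba(X)  # columns align with clf.classes_ = [0, 1]
    # re-disambiguate: the most probable label within each candidate set
    y_new = np.array([max(c, key=lambda lab: proba[i, lab])
                      for i, c in enumerate(candidates)])
    if (y_new == y_guess).all():
        break  # disambiguation has converged
    y_guess = y_new
```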
Organization
The language used in the seminar will be English.
The topics for the talks will be assigned in the kick-off meeting. Do not miss the kick-off meeting if you want to participate in the seminar.
It is not necessary to have prior knowledge, but a background in data mining and machine learning will be helpful. Participation is limited to 20 students. If more students want to participate, those with prior knowledge in data mining and knowledge discovery will be preferred. The selection will be made at the kick-off meeting. If there are more qualified candidates than topics, we will select at random.
The students are expected to give a 20-minute talk on the material they are assigned, followed by feedback, questions, and discussion. Although each topic is typically associated with a single paper, the point of the talk is not to reproduce the entire contents of the paper exactly, but to communicate the key ideas of the methods it introduces. Thus, the content of the talk should exceed the scope of the paper and demonstrate that a thorough understanding of the material was achieved. See also our general advice on giving talks.
For further questions, feel free to send an email to ml-sem@ke.tu-darmstadt.de. No prior registration is needed; however, please still send us an email so that we can estimate the number of participants beforehand and have your email address for possible announcements. Also make sure that you are registered in TUCaN.
Talks
The talks are expected to be accompanied by slides. The students will have to send the slides to ml-sem@ke.tu-darmstadt.de one week in advance of the talk. We will use this opportunity to provide early feedback on common problems such as too many slides, too much text on the slides, small font sizes, etc. The talk and the slides should be in English.
There will be two talks in each meeting. As mentioned above, each topic is associated with one paper, but the talk should not reproduce the content of the paper exactly; rather, it should communicate the key ideas of the introduced method.
All papers should be freely available on the internet or in the ULB. Note that some paper sources such as SpringerLink often only work on campus networks (sometimes not even via VPN). If you cannot find a paper, contact us.
Grading
The slides, the presentation, and the question-and-answer section of the talk will influence the overall grade. Furthermore, students are expected to actively participate in the discussions, and this will also be part of the final grade.
To achieve a grade in the 1.x range, the talk needs to go beyond a mere recitation of the given material and include your own ideas, your own experiences, or even demos. An exact recitation of the papers will lead to a grade in the 2.x range. A weak presentation and a lack of engagement in the discussions may lead to a grade in the 3.x range, or worse. Please also read our guidelines for giving a talk very carefully.
In addition to the grading, we will also give public feedback on the talks immediately after the talks, and we are considering a best presentation award at the end of the seminar.
Topics
Here is a list of topics; each topic consists of two or more seminar talks (indicated by the bullet list). For each seminar talk, we give 1-3 papers as a starting point. However, note that you are not supposed to reproduce the papers in every detail. For most talks, you should explain the method that is introduced in the paper(s) and show where and how it can be used. Often you will find much better examples or use cases in later publications on these methods. See also our guidelines for giving a talk.
Learning from Crowdsourcing (6.11.2018)
- Michael G.
Panagiotis G. Ipeirotis, Foster J. Provost, Victor S. Sheng, Jing Wang: Repeated labeling using multiple noisy labelers. Data Min. Knowl. Discov. 28(2): 402-441 (2014). (Slides)
- Theo K.
Christopher H. Lin, Mausam, Daniel S. Weld: To Re(label), or Not To Re(label). HCOMP 2014. (Slides)
Semi-Supervised Clustering (13./20.11.2018)
- Jonas K.
Wagstaff, K., Cardie, C., Rogers, S., & Schroedl, S. (2001). Constrained k-means clustering with background knowledge. In Proceedings of the Eighteenth International Conference on Machine Learning (pp. 577–584). San Francisco: Morgan Kaufmann.
- Jonas B.
Sugato Basu, Arindam Banerjee, Raymond J. Mooney: Semi-supervised Clustering by Seeding. ICML 2002: 27-34. (Slides)
- Christian S.
Bar-Hillel, A., Hertz, T., Shental, N., & Weinshall, D. (2005). Learning a Mahalanobis metric from equivalence constraints. Journal of Machine Learning Research, 6, 937–965. (Slides)
- Benjamin B.
Brian Kulis, Sugato Basu, Inderjit S. Dhillon, Raymond J. Mooney: Semi-supervised graph clustering: a kernel approach. Machine Learning 74(1): 1-22 (2009). (Slides)
Learning from Partial Labels (27.11. / 4.12.2018)
- Florian H.
Jin, R., Ghahramani, Z.: Learning with multiple labels. In: 16th Annual Conference on Neural Information Processing Systems, Vancouver, Canada (2002). (Slides)
- Dahit G.
Grandvalet, Y.: Logistic regression for partial labels. In: IPMU 2002, Int. Conf. Information Processing and Management of Uncertainty in Knowledge-Based Systems, pp. 1935–1941, Annecy, France (2002). (Slides)
- Alina B.
Nguyen, N., Caruana, R.: Classification with partial labels. In: Proc. KDD 2008, 14th Int. Conf. on Knowledge Discovery and Data Mining, Las Vegas, USA (2008). https://dl.acm.org/citation.cfm?id=1401958
- Anup P.
Cour, T., Sapp, B., Taskar, B.: Learning from partial labels. Journal of Machine Learning Research 12, 1501–1536 (2011) http://www.jmlr.org/papers/volume12/cour11a/cour11a.pdf (Slides)
Superset Learning (11.12.2018)
- Lalith S.
Li-Ping Liu, Thomas G. Dietterich: Learnability of the Superset Label Learning Problem. ICML 2014: 1629-1637. http://proceedings.mlr.press/v32/liug14.html (Slides)
- Tien N.
Eyke Hüllermeier, Weiwei Cheng: Superset Learning Based on Generalized Loss Minimization. ECML/PKDD (2) 2015: 260-275. https://link.springer.com/chapter/10.1007%2F978-3-319-23525-7_16 (Slides)
Weak Supervision (15.1.2019)
- Johannes S.
Alexander J. Ratner, Christopher M. De Sa, Sen Wu, Daniel Selsam, Christopher Ré: Data Programming: Creating Large Training Sets, Quickly. NIPS 2016. https://papers.nips.cc/paper/6523-data-programming-creating-large-training-sets-quickly
- Artem V.
Ruth Urner, Shai Ben-David, Ohad Shamir: Learning from Weak Teachers. AISTATS 2012: 1252-1260.
Weak Supervision in Information Retrieval (15.1.2019)
- Jan K.
Mostafa Dehghani, Hamed Zamani, Aliaksei Severyn, Jaap Kamps, W. Bruce Croft: Neural Ranking Models with Weak Supervision. SIGIR 2017: 65-74. https://dl.acm.org/citation.cfm?doid=3077136.3080832
- Yannik K.
Sujoy Paul, Sourya Roy, Amit K. Roy-Chowdhury: W-TALC: Weakly-Supervised Temporal Activity Localization and Classification. ECCV (4) 2018: 588-607, https://doi.org/10.1007/978-3-030-01225-0_35
Learning from Positive and Unlabeled Data (22.1.2019)
- Jannik S. (moved to 12.2.)
Marthinus Christoffel du Plessis, Gang Niu, Masashi Sugiyama: Analysis of Learning from Positive and Unlabeled Data. NIPS 2014: 703-711. http://papers.nips.cc/paper/5509-analysis-of-learning-from-positive-and-unlabeled-data
- Tobias T.
Atsushi Kanehira, Tatsuya Harada: Multi-label Ranking from Positive and Unlabeled Data. CVPR 2016: 5138-5146. https://doi.org/10.1109/CVPR.2016.555 (Slides)
Distant Supervision (29.1.2019, 17:30h)
- Chantale A.
Mike Mintz, Steven Bills, Rion Snow, Daniel Jurafsky: Distant supervision for relation extraction without labeled data. ACL/IJCNLP 2009: 1003-1011. http://www.aclweb.org/anthology/P09-1113
- Pascal H.
Liang Zhao, Junxiang Wang, Xiaojie Guo: Distant-Supervision of Heterogeneous Multitask Learning for Social Event Forecasting With Multilingual Indicators. AAAI 2018. https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16556/16762.
Incidental Supervision (12.2.2019)
- Sarah L.
Dan Roth: Incidental Supervision: Moving beyond Supervised Learning. AAAI 2017: 4885-4890. https://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14950 (Slides)
- NN
Ivan Titov, Alexandre Klementiev: Crosslingual Induction of Semantic Roles. ACL (1) 2012: 647-656 http://www.aclweb.org/anthology/P12-1068