Seminar aus Maschinellem Lernen und Data Mining
The seminar is available in TUCaN under module number 20-00-0102.
When and Where?
The kick-off meeting is on Tuesday, April 17, at 17:10h in A213. Please note that both the room and the date of the kick-off meeting differ from those announced in TUCaN.
The dates of the subsequent meetings are given below. Unless mentioned otherwise, they will take place on Tuesdays at 17:10h in A213.
Content
In the course of this seminar we will try to get an overview of the current state of research in a particular domain. This year's topic is Stream Mining and Concept Drift, i.e., methods that learn from an incoming stream of data, with a particular focus on the problem that the concept to be learned may change over time. We will cover both important traditional work and recent papers published in workshops, journals, and conferences.
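To make the setting concrete, here is a minimal, self-contained Python sketch of the stream-mining loop (our own illustration, not taken from any of the papers below): each incoming example is first used for testing and then for training (prequential or "test-then-train" evaluation), and a simple sliding window lets the learner forget outdated examples once the concept has changed. The data generator and the WindowedMajority learner are invented purely for illustration.

    import random

    def stream(n=2000, drift_at=1000):
        """Yield (x, y) pairs; the labeling concept flips at drift_at."""
        for t in range(n):
            x = random.random()
            y = int(x > 0.5) if t < drift_at else int(x <= 0.5)  # concept drift
            yield x, y

    class WindowedMajority:
        """Toy learner: majority vote over the most recent examples
        on the same side of a fixed threshold on x."""
        def __init__(self, window=100):
            self.window = window
            self.memory = []  # recent (x, y) pairs

        def predict(self, x):
            votes = [y for xi, y in self.memory if (xi > 0.5) == (x > 0.5)]
            return max(set(votes), key=votes.count) if votes else 0

        def learn(self, x, y):
            self.memory.append((x, y))
            if len(self.memory) > self.window:
                self.memory.pop(0)  # forgetting: drop the oldest example

    correct, seen = 0, 0
    model = WindowedMajority(window=100)
    for x, y in stream():
        correct += int(model.predict(x) == y)  # test first ...
        seen += 1
        model.learn(x, y)                      # ... then train
    print(f"prequential accuracy: {correct / seen:.3f}")

The fixed-size window is only the simplest possible forgetting mechanism; the papers below replace it with adaptive windows, drift detectors, incremental trees and rules, or ensembles.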
Organization
The language used in the seminar will be English.
The topics for the talks will be assigned in the kick-off meeting. Do not miss the kick-off meeting if you want to participate in the seminar.
Prior knowledge is not required, but a background in data mining and machine learning will be helpful. Participation is limited to 20 students. If more students want to participate, students with prior knowledge in data mining and knowledge discovery will be preferred. The selection will be made at the kick-off meeting. If there are more qualified candidates than topics, we will select randomly.
The students are expected to give a 20-minute talk on the material they are assigned, followed by feedback, questions, and discussion. Although each topic is typically associated with a single paper, the point of the talk is not to reproduce the entire content of the paper exactly, but to communicate the key ideas of the methods introduced in it. Thus, the content of the talk should go beyond the scope of the paper and demonstrate that a thorough understanding of the material was achieved. See also our general advice on giving talks.
For further questions, feel free to send an email to ml-sem@ke.tu-darmstadt.de. No prior registration is needed; however, please send us an email anyway, so that we can estimate the number of participants beforehand and have your e-mail address for possible announcements. Also make sure that you are registered in TUCaN.
Talks
The talks are expected to be accompanied by slides. The students will have to send the slides to ml-sem@ke.tu-darmstadt.de one week before the talk. We will use this opportunity to provide early feedback on common problems such as too many slides, too much text on the slides, small font sizes, etc. Both the talk and the slides should be in English.
There will be two talks in each meeting. As mentioned above, each topic is associated with one paper, but the talk should not simply reproduce the content of the paper; instead, it should communicate the key ideas of the introduced method.
All papers should be freely available on the internet or in the ULB. Note that some paper sources such as SpringerLink often only work on campus networks (sometimes not even via VPN). If you cannot find a paper, contact us.
Grading
The slides, the presentation, and the question-and-answer part of the talk will influence the overall grade. Furthermore, students are expected to participate actively in the discussions, and this will also be part of the final grade.
We may also require a short written report.
To achieve a grade in the 1.x range, the talk needs to go beyond a mere recitation of the given material and include your own ideas, your own experience, or even demos. An exact recitation of the papers will lead to a grade in the 2.x range. A weak presentation and a lack of engagement in the discussions may lead to a grade in the 3.x range, or worse. Please also read our guidelines for giving a talk very carefully.
In addition to the grading, we will also give public feedback on the talks immediately after the talks, and we are considering a best presentation award at the end of the seminar.
Topics
Here is a list of topics; each topic consists of two seminar talks (indicated by the bullet list). For each seminar talk, we give 1-3 papers as a starting point. However, note that you are not supposed to reproduce the papers in every detail. For most talks, you should explain the method that is introduced in the paper(s) and show where and how it can be used. Often you will find much better examples or use cases in later publications on these methods. See also our guidelines for giving a talk.
Overview (8. 5. 2018)
- Jan R.
Gama, J., Zliobaite, I., Bifet, A., Pechenizkiy, M., & Bouchachia, A. (2014). A Survey on Concept Drift Adaptation. ACM Computing Surveys, 46(4), Article 44. Slides.
- Marcel J.
Webb, G. I., Hyde, R., Cao, H., Nguyen, H. L., & Petitjean, F. (2016). Characterizing concept drift. Data Mining and Knowledge Discovery, 30(4), 964–994. Slides.
Concept Drift (8. 5. 2018)
- Mark R.
Schlimmer, J. C., & Granger, R. H. (1986). Incremental learning from noisy data. Machine Learning, 1(3), 317–354. Slides.
- Johannes C.
Widmer, G., & Kubat, M. (1996). Learning in the Presence of Concept Drift and Hidden Contexts. Machine Learning, 23(1), 69–101. Slides.
Decision Trees (15. 5. 2018)
- Maximilian W.
Domingos, P., & Hulten, G. (2000). Mining high-speed data streams. In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 71–80. Slides.
- Christoph S.
Gama, J., Fernandes, R., & Rocha, R. (2006). Decision trees for mining data streams. Intelligent Data Analysis, 10(1), 23–45. Slides.
Rule Learning (22. 5. 2018)
- Simon H.
Petr Kosina, João Gama: Very fast decision rules for classification in data streams. Data Min. Knowl. Discov. 29(1): 168-202 (2015). Slides.
- Jan E.
João Duarte, João Gama, Albert Bifet: Adaptive Model Rules From High-Speed Data Streams. TKDD 10(3): 30:1-30:22 (2016). Slides.
Ensemble Methods - Bagging and Boosting (29. 5. 2018)
- Stefan W.
Oza, N. C. (2005). Online bagging and boosting. In Proceedings of the 2005 IEEE International Conference on Systems, Man and Cybernetics, Vol. 3, 2340–2345. Slides.
- Daniel W.
Scholz, M., & Klinkenberg, R. (2007). Boosting classifiers for drifting concepts. Intelligent Data Analysis, 11(1), 3–28. Slides.
Ensemble Methods - Other (5. 6. 2018)
- Clemens B.
Wang, H., Fan, W., Yu, P. S., & Han, J. (2003). Mining concept-drifting data streams using ensemble classifiers. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 226–235. Slides.
- Alina D.
Jan N. van Rijn, Geoffrey Holmes, Bernhard Pfahringer, Joaquin Vanschoren: The online performance estimation framework: heterogeneous ensemble learning for data streams. Machine Learning 107(1): 149-176 (2018). Slides.
Clustering (12. 6. 2018)
- Alessia D.
Martin Ester, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu: A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. KDD 1996: 226-231. Slides.
- Julian B.
Charu C. Aggarwal, Jiawei Han, Jianyong Wang, Philip S. Yu: On High Dimensional Projected Clustering of Data Streams. Data Min. Knowl. Discov. 10(3): 251-273 (2005). Slides.
Statistical Learning (19. 6. 2018)
- Pascal K.
R. Klinkenberg and Th. Joachims. 2000. Detecting Concept Drift with Support Vector Machines. In Proc. of the 17th Int. Conf. on Machine Learning (ICML). Morgan Kaufmann, 487–494. Slides. Demo.
- Marcel H.
Bouchachia, A. (2011). Incremental learning with multi-level adaptation. Neurocomputing, 74(11), 1785–1799. Slides.
Active Learning and Forgetting (26. 6. 2018)
- Alexander Z.
Jaber, G., Cornuéjols, A., & Tarroux, P. (n.d.). Online Learning: Searching for the best Forgetting Strategy under Concept Drift. Slides.
- Thomas H.
Indre Zliobaite, Albert Bifet, Bernhard Pfahringer, Geoffrey Holmes: Active Learning With Drifting Streaming Data. IEEE Trans. Neural Netw. Learning Syst. 25(1): 27-39 (2014). Slides.
Context Tracking (3. 7. 2018)
- Markus B.
Widmer, G. (1997). Tracking Context Changes through Meta-Learning. Machine Learning, 27(3), 259–286. Slides.
- Stefan F.
Katakis, I., Tsoumakas, G., & Vlahavas, I. (2010). Tracking recurring contexts using ensemble classifiers: an application to email filtering. Knowledge and Information Systems, 22, 371–391. Slides.
Nonstationary Environments (10. 7. 2018)
- Lukas F.
Elwell, R., & Polikar, R. (2011). Incremental Learning of Concept Drift in Nonstationary Environments. IEEE Transactions on Neural Networks, 22(10), 1517–1531. Slides.
Feature Drift (19. 7. 2018, 16:30h, E202)
- Borhan S.
Jean Paul Barddal, Heitor Murilo Gomes, Fabrício Enembreck, Bernhard Pfahringer: A survey on feature drift adaptation: Definition, benchmark, challenges and future directions. Journal of Systems and Software 127: 278-294 (2017). Slides.