Resources
This site contains datasets, applications, tools and other resources publicly provided by the KE Group.
The Knowledge Engineering Group dissolved at the end of 2021 due to the departure of Prof. Fürnkranz to Linz. Thanks to all former members of the group, to all the students that were part of our group, to all participants in our courses and lectures, and to all supporters.
Since January 2022, this site has only been a mirror of the Knowledge Engineering site.
The following list gives a short description of the available resources:
Datasets
- EUR-Lex text collection
- The EUR-Lex text collection provides a large multilabel classification benchmark with up to 4000 different classes.
- Datasets for Graded Multilabel Classification
- The well-known BeLaE dataset and two new datasets from medical text classification and movie ratings.
- Incident-Related Twitter Datasets
- These datasets comprise labeled tweets from 10 major cities in the English-speaking world. The tweets were selected and labeled for the domain of incident detection.
- Medical Concept Embeddings
- Concept vector representations learned from a large labeled background corpus. These were used to compute the semantic similarity between terms from the medical domain.
- DIP-SumEval: A Data Set of Human Summary Evaluations
- A dataset containing over 400 automatically generated summaries for 49 topics of a data set for multi-document summarization, 1274 judgements on 11 text and summary quality criteria on a Likert scale (1 to 5) by 26 trained annotators, and 43218 pairwise judgements on 6 criteria by 64 crowd workers.
Ontologies
- UI² Ontology
- The UI² Ontology is a formal ontology for describing user interfaces, their components, and the possible interactions with them.
Software
- Computer Poker Bots and the TUD poker framework
- A small repository of (old) computer poker bots and our framework for developing and comparing bots and for playing against them via a GUI.
- Attachment Checker
- A Thunderbird plugin that learns to warn you when you forget to attach a file to your message.
- Classification GUI
- A graphical user interface that lets users intuitively assign concepts from an ontology to a set of documents in order to quickly and easily build a (multilabel) classification dataset.
- Peewit
- A light-weight meta-framework for machine learning experiments.
- FeGeLOD
- A tool for generating machine-learning features from Linked Open Data.
- Explain-a-LOD
- A tool for generating possible explanations for statistics based on Linked Open Data.
- SeCo
- A framework for Separate-and-Conquer Rule Learning.
- Perceptrovement
- A highly modular framework for the efficient Perceptron algorithm, containing a large collection of effective extensions.
- MoB4LOD
- A framework for creating customized browser applications for Linked Open Data.
- JFreeWebSearch
- A free (i.e., no registration or API key required) Java library for performing web searches.
- Ontology Matching Tools
- The KE group has developed a variety of ontology matching tools.
- Graded Multilabel Classification, Code and Data
- The code and data used for our paper on pairwise graded multilabel classification. In this setting, a label is not only present or absent but can have one of several grades, e.g., stars.
- P³oodle
- A browser extension/add-on for personalized, privacy-protected web search.
- AiTextML
- Learns continuous vector representations jointly for words, documents, and labels, using corpora of labelled documents as well as descriptions of the labels. This also enables zero-shot learning, i.e., predicting labels for which no documents were observed during training.
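To make the graded multilabel setting more concrete, here is a minimal sketch. It assumes a hypothetical ordinal scale of 0 to 4 stars and made-up movie genre labels, neither of which is taken from the datasets above. It shows one common generic reduction, not necessarily the pairwise method of the paper: each graded label is split into binary "grade >= k" targets, and a grade is recovered by counting how many thresholds are passed. The function names are illustrative only.

```python
from typing import Dict, List

# Hypothetical ordinal grade scale (e.g. 0 to 4 stars); an assumption for illustration.
GRADES: List[int] = [0, 1, 2, 3, 4]

# A graded multilabel example maps each label to a grade instead of a present/absent flag.
Example = Dict[str, int]


def to_threshold_targets(example: Example, grades: List[int] = GRADES) -> Dict[str, List[bool]]:
    """Split each graded label into binary 'grade >= k' targets, one per k in grades[1:]."""
    return {
        label: [grade >= k for k in grades[1:]]
        for label, grade in example.items()
    }


def from_threshold_targets(targets: Dict[str, List[bool]], grades: List[int] = GRADES) -> Example:
    """Recover a grade per label by counting how many thresholds are passed."""
    result: Example = {}
    for label, passed in targets.items():
        # Counting (rather than stopping at the first False) also tolerates
        # non-monotone predictions from independently trained binary models.
        result[label] = grades[sum(passed)]
    return result


if __name__ == "__main__":
    # Hypothetical movie with graded genre labels.
    movie = {"comedy": 4, "drama": 1, "horror": 0}
    targets = to_threshold_targets(movie)
    print(targets["drama"])                          # [True, False, False, False]
    print(from_threshold_targets(targets) == movie)  # True
```

In practice, each of the binary "grade >= k" targets would be learned by a separate binary classifier; the sketch shows only the encoding and decoding steps.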
Computing
- Students Pool
- Students who are active in our group can use our infrastructure, including our pool of six Linux-based computers in room D205.
- Get to know our research computing cluster.