6th and 8th position at the NIPS 2013 Multi-label Bird Species Classification Challenge
The NIPS4B bird challenge asked participants to identify which of 87 sound classes of birds and their ecosystem are present in 1,000 continuous recordings made in the wild. The training set consisted of 687 labelled recordings. Listen to the first and the second of these recordings for examples: one contains Sylvia cantillans singing and Sylvia melanocephala calling, the other Sylvia cantillans singing and Petronia petronia calling.
Jinseok Nam together with Dong-Hyun Lee (Team ABIA) and Eneldo Loza Mencía (Team ELM@KE) achieved the 8th and 6th best performances, respectively, computed on 67% of the test set. During the competition the organizers continuously published the standings on the remaining 33% of the test set, on which the two teams finished 5th and 4th, respectively. Unfortunately, the final ranking was computed on the other 67% of the test set, and when it was revealed we learned that we had been overtaken by a very small margin on this decisive portion. There were 32 competitors in total.
The top three competitors used advanced and sophisticated audio-processing techniques, which is reflected in their margin over the remaining competitors. We belong to the next block of competitors, who (presumably) had no particular experience with audio processing. Nevertheless, by adapting deep learning techniques and tuning (multi-label) machine learning approaches, we managed to finish at the top of this group.
We used the preprocessed data provided by the organizers, which essentially transforms the audio samples into a frequency-dependent representation (MFCCs). On this data, we applied techniques known from deep learning and visual computing (patch-based sampling, feature learning, denoising autoencoders) to obtain a vector of 2,000 features for each audio recording. These vectors served as training input for a single-hidden-layer neural network (ABIA) and for a combination of a multi-label pairwise ensemble of SVMs and 50,000 random decision trees (ELM@KE).
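To make the feature-learning step more concrete, here is a minimal sketch of patch-based sampling and a denoising autoencoder on MFCC data. It is not the teams' actual code: the patch length, number of MFCC coefficients, hidden size, noise level, learning rate and the max-pooling over patch codes are illustrative assumptions; only the MFCC input and the 2,000-dimensional recording features come from the description above.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_patches(mfcc, patch_len=10, n_patches=200):
    """Draw random fixed-length windows from an MFCC matrix of shape (frames, coeffs)."""
    n_frames, _ = mfcc.shape
    starts = rng.integers(0, n_frames - patch_len, size=n_patches)
    return np.stack([mfcc[s:s + patch_len].ravel() for s in starts])

class DenoisingAutoencoder:
    """Single-hidden-layer denoising autoencoder with tied weights."""
    def __init__(self, n_in, n_hidden=2000, noise=0.2, lr=0.01):
        self.W = rng.normal(0.0, 0.01, (n_in, n_hidden))
        self.b_h = np.zeros(n_hidden)
        self.b_o = np.zeros(n_in)
        self.noise, self.lr = noise, lr

    def encode(self, x):
        return sigmoid(x @ self.W + self.b_h)

    def train_step(self, x):
        # Corrupt the input by randomly zeroing entries, then reconstruct the clean input.
        x_noisy = x * (rng.random(x.shape) > self.noise)
        h = self.encode(x_noisy)
        x_hat = sigmoid(h @ self.W.T + self.b_o)
        # Backpropagate the squared reconstruction error through both (tied) layers.
        d_out = (x_hat - x) * x_hat * (1.0 - x_hat)
        d_hid = (d_out @ self.W) * h * (1.0 - h)
        self.W -= self.lr * (x_noisy.T @ d_hid + d_out.T @ h)
        self.b_o -= self.lr * d_out.sum(axis=0)
        self.b_h -= self.lr * d_hid.sum(axis=0)
        return float(np.mean((x_hat - x) ** 2))

def recording_features(dae, mfcc):
    """Encode all patches of a recording and max-pool them into one fixed-length vector."""
    codes = dae.encode(sample_patches(mfcc))
    return codes.max(axis=0)  # shape: (n_hidden,), i.e. 2000 features per recording

# Toy usage with random data (values in [0, 1]) standing in for the NIPS4B MFCCs.
patch_len, n_coeffs = 10, 17
recordings = [rng.random((int(rng.integers(100, 300)), n_coeffs)) for _ in range(20)]
dae = DenoisingAutoencoder(n_in=patch_len * n_coeffs, n_hidden=2000)
for epoch in range(3):
    for mfcc in recordings:
        dae.train_step(sample_patches(mfcc, patch_len))
X = np.stack([recording_features(dae, m) for m in recordings])  # shape (20, 2000)
```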
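On top of such feature vectors, an ABIA-style classifier is a single-hidden-layer network with one sigmoid output per sound class. The following sketch illustrates the idea under illustrative assumptions (hidden size, learning rate, threshold and the random stand-in data are not from the source; only the 2,000 input features, 87 classes and 687 training recordings are).

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class MultiLabelNet:
    """Single-hidden-layer network with one sigmoid output per sound class."""
    def __init__(self, n_in=2000, n_hidden=512, n_labels=87, lr=0.01):
        self.W1 = rng.normal(0.0, 0.01, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.01, (n_hidden, n_labels))
        self.b2 = np.zeros(n_labels)
        self.lr = lr

    def forward(self, X):
        h = np.tanh(X @ self.W1 + self.b1)
        return h, sigmoid(h @ self.W2 + self.b2)

    def train_step(self, X, Y):
        # Y is a binary indicator matrix: Y[i, c] == 1 iff class c occurs in recording i.
        h, p = self.forward(X)
        # Gradient of the summed binary cross-entropy w.r.t. the output pre-activation.
        d_out = p - Y
        d_hid = (d_out @ self.W2.T) * (1.0 - h ** 2)
        self.W2 -= self.lr * h.T @ d_out
        self.b2 -= self.lr * d_out.sum(axis=0)
        self.W1 -= self.lr * X.T @ d_hid
        self.b1 -= self.lr * d_hid.sum(axis=0)

    def predict(self, X, threshold=0.5):
        return (self.forward(X)[1] >= threshold).astype(int)

# Toy usage: random stand-ins for 687 recordings with 2000 features and 87 labels.
X = rng.random((687, 2000))
Y = (rng.random((687, 87)) < 0.05).astype(float)
net = MultiLabelNet()
for _ in range(10):
    net.train_step(X, Y)
print(net.predict(X[:3]).shape)  # (3, 87): one 0/1 prediction per class per recording
```

The sigmoid outputs make the multi-label setting explicit: each class is scored independently and thresholded, so several species can be predicted for the same recording.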