In this week's interview we talk to Werner Bailer, researcher at Joanneum Research Forschungsgesellschaft (JRS).
Hi Werner! First, could you please introduce yourself and the work you do at JRS?
I’m a researcher in the Connected Computing Group at JRS, working on analysis and processing of image and video content. This includes extracting information and enriching metadata, for example using machine learning methods, as well as representing and modelling this metadata. I’m also active in standardisation efforts related to these topics, for example in MPEG, on compact video descriptors and the representation of neural networks, and in EBU/AMWA FIMS, an initiative defining service interfaces for media processing (e.g. quality analysis and automated metadata extraction).
You have been working in the field of audiovisual research for many years. What are, to you, the most exciting areas of research at the moment, and what challenges remain to be solved there?
In recent years, we have seen a lot of progress based on deep learning methods, enabled by large annotated data sets and the computing power of graphics processors (GPUs). While this is great, and enables solutions for many analysis problems with accuracy that is sufficient for production use, many of these services rely on general-purpose data for training. Adapting them to specific problems, in particular when only a small set of labelled data is available, is still challenging. Also, many of these tools, in particular those provided as cloud-based services, are more or less black boxes. Their output may change due to retraining, and it is not transparent to the user how certain metadata has been extracted. Making the results of these services traceable and explainable to non-expert users is an important challenge ahead.
What sets apart the audiovisual analysis tools that MARCONI will provide?
The use cases of MARCONI are quite broad, and we think of the system more as a toolkit that is used in different ways in radio stations of different types and sizes. This requires services that are easy to adapt and that can be plugged into different workflows. In that sense, MARCONI services will need to work with the “long tail” of digital content, i.e. very specific content relevant to a relatively small group. For example, if we do face recognition, the usual 10 or 50 thousand celebrities supported by off-the-shelf services are probably mostly irrelevant; instead, we need to recognise the rock band from down the street and the local football team. Making these adjustments easy to set up and control is a challenge ahead of us.
Read more about Werner on LinkedIn.