UPC TelecomBCN organises two open lecture sessions to conclude the two deep learning winter schools held at our institution: Introduction to Deep Learning and Deep Learning for Speech and Language. Four renowned local researchers with high international impact will present their work based on deep learning techniques. The two lecture sessions will be preceded and followed by student project presentations, also open to the public.
10:00 BSc Student Projects from Introduction to Deep Learning
11:30 Joost van de Weijer (Computer Vision Center) – “Domain-adaptive deep network compression”
12:15 Joan Serrà (Telefónica Research) – “Unintuitive Properties of Deep Neural Networks”
13:00 Lunch Break
14:15 Carlos Segura (Telefónica Research) – “Deep Learning for conversational agents”
15:00 Jordi Pons (Universitat Pompeu Fabra) – “Deep learning architectures for music audio classification: a personal (re)view”
16:00 MSc Student Projects from Deep Learning for Speech and Language
The organisers may limit the access to the room to ticket holders only.
Deep neural networks trained on large datasets can easily be transferred to new domains with far fewer labeled examples through a process called fine-tuning. This has the advantage that the representations learned on the large source domain can be exploited on smaller target domains. However, a network designed to be optimal for the source task is often prohibitively large for the target task. This is especially problematic when the network is to be deployed in applications with strict memory and energy constraints. In this talk we address the compression of networks after domain transfer. We focus on a special class of compression algorithms based on low-rank matrix decomposition. Results show that significant compression rates can be achieved with only a small drop in overall accuracy.
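As a minimal illustration of the idea (not the method from the talk), a dense layer's weight matrix can be factorised by truncated SVD: the single matrix multiply is replaced by two smaller ones, trading a little accuracy for far fewer parameters. All shapes and the rank below are hypothetical:

```python
import numpy as np

# Hypothetical dense layer: compress its weight matrix W (out_dim x in_dim)
# by truncated SVD, keeping only the top-r singular values.
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 1024))   # original layer weights
r = 64                                 # target rank (the compression knob)

U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :r] * s[:r]    # shape (512, r)
B = Vt[:r, :]           # shape (r, 1024)

# Parameter counts: one big multiply vs. two small ones
orig_params = W.size                    # 512 * 1024 = 524288
compressed_params = A.size + B.size     # 512*64 + 64*1024 = 98304
print(f"compression ratio: {orig_params / compressed_params:.1f}x")

# The factorised layer A @ (B @ x) approximates the original mapping W @ x
x = rng.standard_normal(1024)
rel_err = np.linalg.norm(W @ x - A @ (B @ x)) / np.linalg.norm(W @ x)
print(f"relative error at rank {r}: {rel_err:.3f}")
```

In practice the rank r is chosen per layer to balance compression against the accuracy drop, and the factorised network is typically fine-tuned afterwards.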
Deep learning is an undeniably hot topic, not only within academia and industry, but also among the general public and the media. The reasons for this popularity are manifold: unprecedented availability of data and computing power, some innovative methodologies, minor but significant technical tricks, etc. Interestingly, however, the success and practice of deep learning do not seem to be fully matched by its theoretical, more formal understanding. As a result, the state of the art in deep learning presents a number of unintuitive properties and situations. In this talk, I will highlight some of these unintuitive properties, reviewing relevant recent work and exposing the need to overcome them, by either formal or more empirical means.
Joan Serrà is a research scientist with Telefónica R&D in Barcelona. He works on machine learning and artificial intelligence, typically dealing with sequential and/or sparse data. He did his MSc (2007) and PhD (2011) in computer science at the Music Technology Group of Universitat Pompeu Fabra, Barcelona. During that time, he was also an adjunct professor with the Dept. of Information and Communication Technologies of the same university (2006-2011). He did a postdoc in artificial intelligence at IIIA-CSIC, the Artificial Intelligence Research Institute of the Spanish National Research Council in Bellaterra, Barcelona (2011-2015). He has had research stays at the Max Planck Institute for the Physics of Complex Systems in Dresden, Germany (2010), the Max Planck Institute for Computer Science in Saarbrücken, Germany (2011), and visited Goldsmiths, University of London, United Kingdom (2012). Joan has been involved in more than 10 research projects, funded by Spanish and European institutions, and co-authored over 90 publications, many of them highly-cited and in top-tier journals and conferences, in diverse scientific areas. He also regularly acts as peer reviewer for some of those and other publications.
Conversational agents, also known as chatbots or dialog systems, have gained a lot of interest recently. Traditional conversational agent systems usually require significant development effort and are based on complex modules for understanding user inputs, making decisions and generating meaningful answers. Thanks to advances in deep learning in closely related areas such as neural machine translation, recent research has been devoted to data-driven neural conversational models that can learn from human conversations. However, the development of robust and reliable chatbots based purely on deep learning remains an open area of research, and it is still common to build agents with hybrid approaches that combine hand-crafted rules with state-of-the-art techniques. This talk aims to provide an overview of the development of chatbots, together with a brief review of the state of the art and current challenges.
Carlos Segura, Ph.D., is an Associate Researcher at Telefonica Research in Barcelona, Spain. From 2011 to 2015 he worked at the company Herta Security as Director of Innovation under the Torres Quevedo program, where his main duties were researching and developing algorithms for speaker and face recognition. He has participated in three national and three EU research projects, and has published many scientific papers in peer-reviewed international journals and conferences. His research interests include deep learning, machine learning, speech processing, computer vision and, more recently, natural language processing and dialog systems.
A brief review of the state of the art in music informatics research and deep learning reveals that such models have achieved competitive results on several music-related tasks. In this talk I will provide insights into which deep learning architectures are (in our experience) performing best for audio classification. To this end, I will first review the available front-ends (the part of the model that interacts with the input signal in order to map it into a latent space) and back-ends (the part that predicts the output given the representation obtained by the front-end). Finally, to ground the discussion of these front-ends and back-ends, I will present some cases we encountered while researching which deep learning architectures work best for music audio tagging.
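The front-end/back-end split described above can be sketched in a toy example (all shapes, parameters and the tag count below are hypothetical, not from the talk): the front-end maps each frame of the input representation into a latent space, and the back-end summarises the latent sequence over time and predicts tag probabilities.

```python
import numpy as np

rng = np.random.default_rng(1)

def front_end(spectrogram, filters):
    """Toy front-end: project each time frame through learned filters (ReLU)."""
    # spectrogram: (time, mel_bins); filters: (mel_bins, latent_dim)
    return np.maximum(spectrogram @ filters, 0.0)

def back_end(latent, weights):
    """Toy back-end: temporal mean-pooling followed by a linear tag classifier."""
    pooled = latent.mean(axis=0)             # summarise the latent space over time
    logits = pooled @ weights                # (num_tags,)
    return 1.0 / (1.0 + np.exp(-logits))     # independent per-tag probabilities

spec = rng.standard_normal((128, 96))        # 128 frames x 96 mel bins (hypothetical)
W_front = rng.standard_normal((96, 32)) * 0.1   # hypothetical front-end parameters
W_back = rng.standard_normal((32, 50)) * 0.1    # hypothetical classifier, 50 tags
tags = back_end(front_end(spec, W_front), W_back)
print(tags.shape)   # one probability per tag
```

Real architectures replace these toy projections with convolutional or recurrent layers, but the division of labour between signal-facing front-end and prediction-facing back-end is the same.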
Jordi Pons is a telecommunications engineer specialised in audiovisual systems (Universitat Politècnica de Catalunya, Barcelona) and holds an MSc in sound and music computing (Music Technology Group, Universitat Pompeu Fabra, Barcelona). Jordi did his first internship at IRCAM (Paris), where he wrote his undergraduate thesis on source separation for drums transcription under the supervision of Axel Roebel. Later, he did an internship at the German Hearing Center (Hannover), where he developed his Master's thesis on improving music perception for cochlear implant users through source separation, with Waldo Nogueira. He is currently pursuing a PhD on music technology, large-scale audio collections, and deep learning at the Music Technology Group (Universitat Pompeu Fabra, Barcelona) under the supervision of Xavier Serra. Related to his thesis, Jordi also did a summer internship at Pandora, where he studied how different deep auto-tagging models perform at scale.