WiMIR Workshop 2019 Project Guides

This is a list of Project Guides and their areas of interest for the 2019 WiMIR Workshop, which will take place on Sunday, 3rd November 2019 as a satellite event of ISMIR 2019. These folks will lead the prototyping and early research investigations at the workshop.

This year’s Workshop is organized by Blair Kaneshiro (Stanford University), Katherine M. Kinnaird (Smith College), Thor Kell (Spotify), and Jordan B. L. Smith (Queen Mary University of London) and is made possible by generous sponsorship from Pandora and Spotify.

Planning to attend the WiMIR Workshop?
Read about the Project Guides and their work in detail below, and sign up to attend at https://forms.gle/mCEod8AvtqnBcMJz7

Amélie Anglade and Ryan Groves: Auto-BeatSaber: Generating New Content for VR Music Games

In this workshop we will dive into a specific problem at the intersection of music, gaming, and dance: the generation of a BeatSaber song level. BeatSaber is one of the most popular VR titles; its core gameplay consists of rhythmically slicing incoming boxes with lightsabers to the beat of the music. The game has sparked a huge community of modders who create their own choreographies to existing songs, as well as MIR-based tools (such as a MIDI converter or a BPM estimator) designed specifically to support level creation. The question for this workshop is: how could we use machine learning to generate these choreographies automatically? Participants will have the opportunity to learn from our experience as Data Science & MIR consultants, as we will share our own structured process for problem-solving in the music tech industry.

Dr. Amélie Anglade is a Music Information Retrieval and Data Science consultant. She completed her PhD at Queen Mary University of London before moving to industry, initially taking on positions in R&D labs such as Sony CSL, Philips Research and CNRS, and then working as an MIR expert for music tech startups such as SoundCloud and frestyl. For the past 5 years she has further developed her expertise in music identification and discovery, assisting startups and larger companies in the AI and music or multimedia space as an independent consultant by researching, prototyping, and scaling up machine learning solutions for them. Additionally, Amélie contributes to the European Commission as an independent technical expert in charge of reviewing proposals and ongoing EU projects. In her spare time she attends music hackathons (15+ so far), and she teaches and mentors women in the field of data science through multiple organizations.

Ryan Groves is an award-winning music researcher and veteran developer of intelligent music systems. He received his Master’s in Music Technology from McGill University. In 2016, his work on computational music theory received the Best Paper award at ISMIR. He also has extensive experience in industry, building musical products that leverage machine learning. As the former Director of R&D for Zya, he developed Ditty, a musical messenger app that automatically sings your texts. Ditty won Best Music App of 2015 at the Appy Awards. More recently, he co-founded Melodrive, where he and his team built the first artificially intelligent composer that could compose music in real time and react to interactive scenarios such as games and VR experiences. He now works as a consultant and startup advisor in Berlin, with a focus on expanding the use cases of music and audio through the application of AI.

Ashley Burgoyne: Cognitive MIR with the Eurovision Song Contest

When Duncan Laurence triumphed at the 2019 Eurovision Song Contest in Tel Aviv, it was the Netherlands’ first victory in the contest since 1975, and perfect timing for ISMIR 2019 in Delft! The Song Contest is one of the most-watched and most-discussed broadcasts in Europe, and data about it offer an excellent opportunity to link the patterns we can find in audio using MIR tools to real-world human behaviour. This workshop will show you how, and teach you techniques you can use wherever you want MIR to help you understand not just music but also people.

We will consider a number of questions. Every year, the bookmakers try to predict the contest winner: can MIR do better? The same songwriters often write songs for multiple countries each year: is there nonetheless a typical sound for each country’s entry? People assume that voting is politically rather than musically based, but recent research has called those assumptions into question: what does the music tell us? And can we link Eurovision tracks to what fans say on Twitter, or to what listeners report hearing in direct experiments?

John Ashley Burgoyne is the Lecturer in Computational Musicology at the University of Amsterdam and part of the Music Cognition Group at the Institute for Logic, Language, and Computation. Dr Burgoyne teaches in both musicology and artificial intelligence and is especially interested in musicometrics: developing behavioural and audio models that are conceptually sound, reliable, and musicologically interpretable as music enters the digital humanities era. He was the leader of the Hooked on Music project, an online citizen science experiment to explore long-term musical memory that attracted more than 170,000 participants across more than 200 countries.

Estefanía Cano and Jakob Abeßer: Learning about Music with MIR

What does John Coltrane have in common with Cannonball Adderley? What makes micro-timing in Brazilian samba unique? What are the tuning characteristics of the harpsichord? Which cues do musicians use to control ensemble intonation? The MIR community has been working for decades on developing reliable methods for research tasks such as beat tracking, melody estimation, chord detection, and music tagging, among many others. While most of these methods are not yet perfect, they can certainly be useful tools when attempting to answer questions such as the ones above. This holds true especially if computational analysis tools are combined with the experience of musicians, the insights of human listeners, and music knowledge.

This workshop will focus on exploring ways in which we can gain new knowledge about music by combining available MIR techniques and human musical expertise. Instead of focusing on improving MIR methods or proposing new ways to solve MIR tasks, we want to use this workshop as a platform to brainstorm new questions about the various aspects of music. We want to revisit old questions and propose new alternatives to address them. We want to look back at previous projects and studies, and use the lessons we learned to improve the way we address questions today.

Estefanía Cano is a research scientist at the Semantic Music Technologies group at Fraunhofer IDMT in Germany. Estefanía received her B.Sc. degree in electronic engineering from the Universidad Pontificia Bolivariana, Medellín, Colombia, in 2005, her B.A. degree in music (saxophone performance) from the Universidad de Antioquia, Medellín, Colombia, in 2007, her M.Sc. degree in music engineering from the University of Miami, Florida, in 2009, and her Ph.D. degree in media technology from the Ilmenau University of Technology, Germany, in 2014. In 2009, she joined the Semantic Music Technologies group at the Fraunhofer Institute for Digital Media Technology (IDMT) as a research scientist. In 2018, she joined the Social and Cognitive Computing Department at the Agency for Science, Technology and Research (A*STAR) in Singapore. Her research interests include sound source separation, music education, and computational musicology.

Jakob Abeßer studied computer engineering (Dipl.-Ing., 2008) and media technology (Dr.-Ing., 2014) at the Ilmenau University of Technology. Since 2008, he has been working in the field of semantic music processing at the Fraunhofer Institute for Digital Media Technology (IDMT) in Ilmenau. In 2005 and 2010 he spent research stays abroad at the Université Paul Verlaine in Metz, France, and at the Finnish Centre of Excellence in Interdisciplinary Music Research at the University of Jyväskylä, Finland. Between 2012 and 2017, he also worked as a doctoral researcher in the Jazzomat Research Project at the Franz Liszt School of Music in Weimar, developing methods for the computer-aided analysis of jazz improvisations. Since 2018 he has been working as co-investigator of the research project “Informed Sound Activity Detection in Music Recordings” (ISAD) at Fraunhofer IDMT in collaboration with Prof. Dr. Meinard Müller from the International Audio Laboratories in Erlangen, Germany. His current research interests include music information retrieval, machine listening, music education, machine learning and deep learning.

Matthew Davies and Sebastian Böck: Building and Evaluating a Musical Audio Beat Tracking System

The task of musical audio beat tracking can be considered one of the foundational problems in the music information retrieval community. In this workshop we seek to take a tour of the entire beat tracking pipeline by addressing the following steps: i) how to manually annotate ground truth; ii) how to construct a lightweight beat tracking model using deep neural networks; iii) how to select appropriate musical material for training and testing; and iv) how to conduct evaluation in a musically meaningful way. In each of these areas, we seek to provide practical hands-on experience and to share acquired tacit knowledge about what works and also what doesn’t. Throughout the workshop we will promote active participation and discussion with the aim of driving new research in beat tracking and fostering new collaborations.
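To make step iv) a little more concrete, here is a minimal sketch of one common beat tracking score, the F-measure: an estimated beat counts as correct if it falls within a tolerance window around an annotated beat, with each annotation matched at most once. The ±70 ms window is an assumption (a commonly used default), and the greedy matching is a simplification of the one-to-one matching performed by evaluation toolkits such as mir_eval. The function name and beat times are hypothetical.

```python
def beat_f_measure(reference, estimated, tolerance=0.07):
    """F-measure between annotated and estimated beat times (in seconds).

    Each reference beat is matched to the closest not-yet-matched estimate
    lying within +/- `tolerance` seconds (greedy one-to-one matching).
    """
    ref = sorted(reference)
    est = sorted(estimated)
    if not ref or not est:
        return 0.0

    used = [False] * len(est)
    hits = 0
    for r in ref:
        best_k, best_dist = None, tolerance
        for k, e in enumerate(est):
            dist = abs(e - r)
            if not used[k] and dist <= best_dist:
                best_k, best_dist = k, dist
        if best_k is not None:
            used[best_k] = True
            hits += 1

    if hits == 0:
        return 0.0
    precision = hits / len(est)
    recall = hits / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations: a steady 120 BPM pulse (one beat every 0.5 s).
annotated = [0.5, 1.0, 1.5, 2.0]
print(beat_f_measure(annotated, [0.52, 1.03, 1.6, 2.0]))  # one beat 100 ms late
print(beat_f_measure(annotated, [0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0]))  # double tempo
```

Note that the double-tempo estimate is penalized through its precision even though it hits every annotated beat; whether that penalty is musically appropriate is exactly the kind of question the evaluation step raises.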

Matthew Davies is a music information retrieval researcher with a background in digital signal processing. His main research interests include the analysis of rhythm in musical audio signals, evaluation methodology, creative music applications, and reproducible research. Since 2014, Matthew has coordinated the Sound and Music Computing Group in the Centre for Telecommunications and Multimedia at INESC TEC. From 2014 to 2018, he was an Associate Editor for the IEEE/ACM Transactions on Audio, Speech and Language Processing, and he coordinated the 4th Annual IEEE Signal Processing Cup. He was a keynote speaker at the 16th Rhythm Production and Perception Workshop, and General Chair of the 13th International Symposium on Computer Music Multidisciplinary Research.

Sebastian Böck received his diploma degree in electrical engineering from the Technical University of Munich in 2010 and his PhD in computer science from the Johannes Kepler University Linz in 2016. Within the MIR community he is probably best known for his machine learning-based algorithms, which pushed the performance of automatic beat tracking and other tasks to levels formerly achievable only by humans. Currently he is continuing his research at the Austrian Research Institute for Artificial Intelligence (OFAI) and the Technical University of Vienna.

Georgi Dzhambazov: Verse and Chorus Detection of Acoustic Cover Versions

Many MIR tasks require, as a prerequisite, an annotation of a song’s structural segments. While cover versions usually retain most musical aspects of the original song, they may have a completely different structure (with sections added or missing). In particular, covers with acoustic instrumental accompaniment are characterized by a predominant vocal line, and the accompaniment is occasionally missing or improvised. Therefore, structure detection algorithms based solely on harmonic features are unlikely to be a sufficient solution.

In this hands-on workshop, we will explore the problem of automatically segmenting and labeling the verse and chorus sections of a given vocal cover version with acoustic accompaniment. Information about the original song (lyrics, chords, guitar tabs, etc.) can be found online. Our goal is to come up with ideas and prototypes that approach the problem by combining existing methods (e.g., vocal activity detection, chord recognition, lyrics alignment) in new ways, rather than designing something completely new. An industry database of vocal cover songs with acoustic accompaniment will be provided.

Georgi is an audio engineer at Smule. He holds a PhD in Music Information Retrieval from the Music Technology Group in Barcelona, where he worked on automatic lyrics alignment under the supervision of Xavier Serra. He also has experience in applied research on speech recognition and natural language processing. In 2017 he founded VoiceMagix, a company providing solutions for the automatic analysis of the singing voice.

For several years he has been a WiMIR mentor and MIREX task captain. His research interests are algorithms for the singing voice and speech, and machine learning in general. He is currently most interested in initiatives aimed at bridging the gap between MIR research and the music industry.

Brian McFee: Coping with Bias in Audio Embeddings

An appealing general approach to modeling problems across many domains is to first transform raw input data through an embedding function that has been trained on a large (but potentially unrelated) collection of data. This results in a vector representation of each object, which can then be used as input to a simple classifier (e.g., a linear model) to solve some downstream task using a limited amount of data. This approach has been successfully demonstrated in image and video analysis and in natural language processing, and it is becoming increasingly popular in audio and musical content analysis. However, general-purpose embedding models are known to encode and propagate implicit biases, which can have detrimental and disparate population-dependent effects.

In this project, we will conduct a preliminary study of embedding bias in MIR data. Using pre-trained audio embeddings and well-known MIR datasets, we will first attempt to quantify the extent to which embedding-based classification exhibits biased results across data sets and/or genres. We will then attempt to de-bias the embedding by adapting recently proposed methods from the natural language processing literature.
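The two steps above (quantify, then de-bias) can be sketched with NumPy under several assumptions: embeddings are an array with one row per track, each track carries a binary group label (for example, two genres or two source datasets), and bias is measured as the gap in a classifier’s accuracy between the groups. The de-biasing step is a simplified version of the “hard de-biasing” projection proposed for word embeddings by Bolukbasi et al. (2016), here using the difference of group centroids as the bias direction. It is not necessarily the method the project will adapt, and all function names and data are hypothetical.

```python
import numpy as np


def accuracy_gap(y_true, y_pred, groups):
    """Absolute difference in classification accuracy between groups 0 and 1."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    acc = [np.mean(y_true[groups == g] == y_pred[groups == g]) for g in (0, 1)]
    return float(abs(acc[0] - acc[1]))


def bias_direction(embeddings, groups):
    """Unit vector pointing from the group-0 centroid to the group-1 centroid."""
    d = embeddings[groups == 1].mean(axis=0) - embeddings[groups == 0].mean(axis=0)
    return d / np.linalg.norm(d)


def project_out(embeddings, direction):
    """Remove each embedding's component along the unit vector `direction`."""
    return embeddings - np.outer(embeddings @ direction, direction)


# Hypothetical data: 200 random 16-dimensional "embeddings" in which the
# second group is shifted along the first axis, mimicking a dataset- or
# genre-dependent offset.
rng = np.random.default_rng(0)
groups = np.repeat([0, 1], 100)
emb = rng.normal(size=(200, 16))
emb[groups == 1, 0] += 3.0

u = bias_direction(emb, groups)
debiased = project_out(emb, u)
```

A natural experiment is to train the same linear classifier on `emb` and on `debiased` and compare `accuracy_gap` for each. Projecting out a single direction only removes this centroid-level, linear offset; bias encoded nonlinearly or in other directions remains.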

Brian McFee is Assistant Professor of Music Technology and Data Science at New York University. He received the B.S. degree (2003) in Computer Science from the University of California, Santa Cruz, and M.S. (2008) and Ph.D. (2012) degrees in Computer Science and Engineering from the University of California, San Diego. His work lies at the intersection of machine learning and audio analysis. He is an active open source software developer, and the principal maintainer of the librosa package for audio analysis.

Peter Sobot: Software Engineering for Machine Learners (and Drummers) – Building Robust Applications with Audio Data

Building machine learning systems is hard, but building systems that can scale can be even harder. In this workshop, we’ll discuss software engineering techniques to use when building machine learning systems, including methods to make your code easier to write, test, debug, and maintain. We’ll also build an audio sample classifier with these techniques using basic machine learning concepts, and discuss methods for deploying this system at scale. Finally, we’ll take the system to an extreme and use cloud computing to build a system that learns in response to user input.

Peter Sobot is a Staff Engineer at Spotify, where he works on recommendation products at massive scale, including the systems that power Discover Weekly. His open-source software contributions range from low-level data tools to legendary internet-scale hacks like The Wub Machine. He has spoken at !!Con, Google Cloud Next, and Google Summit, and makes electronic music in his spare time.

Bob Sturm: Computer-Guided Analysis of Computer-Generated Music Corpora

Various iterations of the folkrnn system (folkrnn.org) have generated over 100,000 transcriptions of “machine folk” (e.g., https://highnoongmt.wordpress.com/2018/01/05/volumes-1-20-of-folk-rnn-v1-transcriptions/), but manually looking through them takes a lot of time. In this project we will think about and implement some methods that can help one grasp the characteristics of such collections and find interesting bits. For instance, we can look for instances of plagiarism, find anomalous material, and judge similarity in terms of pitches, meter, melodic contour, etc.
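As one example of judging similarity in terms of pitch (and of flagging possible plagiarism), a melody can be compared with another by the overlap of their pitch-interval n-grams, which makes the comparison transposition-invariant. This sketch assumes the tunes have already been converted from ABC notation into lists of MIDI pitch numbers (the parsing step is not shown); the choice of trigrams and Jaccard similarity are illustrative assumptions, not a method taken from the project, and the melodies are hypothetical.

```python
def interval_ngrams(pitches, n=3):
    """Set of n-grams of successive pitch intervals (in semitones)."""
    intervals = [b - a for a, b in zip(pitches, pitches[1:])]
    return {tuple(intervals[i:i + n]) for i in range(len(intervals) - n + 1)}


def ngram_similarity(tune_a, tune_b, n=3):
    """Jaccard similarity (0.0 to 1.0) of two tunes' interval n-gram sets."""
    a, b = interval_ngrams(tune_a, n), interval_ngrams(tune_b, n)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical melodies as MIDI pitch numbers.
original = [62, 64, 66, 67, 69, 67, 66, 64]
transposed = [p + 5 for p in original]  # the same tune, a fourth higher
other = [62, 62, 69, 69, 71, 71, 69, 67]

print(ngram_similarity(original, transposed))
print(ngram_similarity(original, other))
```

Because only intervals are compared, the transposed copy scores as identical to the original. Generated tunes scoring close to 1.0 against a corpus of known tunes would be candidates for closer listening, not proof of plagiarism.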

Bob L. Sturm is currently an Associate Professor in the Speech, Music and Hearing Division of the School of Electronic Engineering and Computer Science at the Royal Institute of Technology KTH, Sweden. Before that he was a Lecturer in Digital Media at the Centre for Digital Music, School of Electronic Engineering and Computer Science, Queen Mary University of London. His research interests include digital signal processing for sound and music signals, machine listening, evaluation, and algorithmic composition. He is also a musician.

Chris Tralie: To What Extent Do Cyclic Inconsistencies Exist in Musical Preferences?

Music recommendation algorithms seek to rank a set of candidate songs in order of some estimated user preference. It may be challenging to ascertain preferences from surveys, however, since Miller’s empirical “rule of 7” could be interpreted to suggest that humans lack the working memory to meaningfully rank much more than 7 items at a time. To learn a longer list of preferences, then, one could present only a pair of alternatives at a time and aggregate these pairwise preferences into a global ranking. However, real pairwise rankings can lead to cyclic inconsistencies; that is, people often express that A > B and B > C, but also that C > A. This is known as the “Condorcet Paradox.” Fortunately, there exists a topological pairwise rank aggregation technique, known as “HodgeRank” [1], which can aggregate these rankings into the “most consistent” global order, while simultaneously quantifying the degree to which local (A > B > C > A) and global (A > B > C > … > A) inconsistencies exist.

In this workshop, we will first discuss these concepts in more detail. Then each of us will listen to pairs of 15-second clips from a diverse corpus of music [2] and record our preferences, and we will apply HodgeRank to see how consistent we all are. We will also use metrics between rankings to show which people in our group have similar preferences. Zooming out, we will discuss some social psychology literature that relates musical preferences to personality traits [2], as well as the ethical pitfalls that can emerge when collecting data and interpreting results in such studies.
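The core least-squares step of HodgeRank can be sketched in a few lines of NumPy. Assuming each comparison is recorded as (i, j, y), meaning item i was preferred to item j with strength y, the global scores s minimize the sum over comparisons of (s_i - s_j - y)^2, and the normalized residual measures how much of the data no global order can explain. This sketch deliberately stops there: it does not split the residual into its local (triangular) and global (harmonic) parts, which the full Hodge decomposition described in [1] does. The function name and the votes are hypothetical.

```python
import numpy as np


def hodgerank_scores(n_items, comparisons):
    """Least-squares global scores from pairwise comparisons.

    comparisons: iterable of (i, j, y), meaning item i was preferred to
    item j with strength y. Returns (scores, inconsistency), where
    inconsistency is the fraction of the comparison data's squared norm
    left unexplained by the best-fitting global scores.
    """
    comparisons = list(comparisons)
    A = np.zeros((len(comparisons), n_items))
    b = np.zeros(len(comparisons))
    for row, (i, j, y) in enumerate(comparisons):
        A[row, i], A[row, j] = 1.0, -1.0  # this row models s_i - s_j
        b[row] = y
    # lstsq returns the minimum-norm solution; on a connected comparison
    # graph this fixes the free additive constant so the scores sum to 0.
    scores, *_ = np.linalg.lstsq(A, b, rcond=None)
    residual = b - A @ scores
    return scores, float(residual @ residual / (b @ b))


# A Condorcet cycle (A > B, B > C, C > A) versus a consistent set of votes.
cyclic = [(0, 1, 1.0), (1, 2, 1.0), (2, 0, 1.0)]
consistent = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 2.0)]
print(hodgerank_scores(3, cyclic))
print(hodgerank_scores(3, consistent))
```

For the cyclic votes, every item receives (numerically) the same score of zero and the inconsistency is 1, since no global order explains any of the data; for the consistent votes, the scores recover the order A > B > C with essentially zero inconsistency.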

[1] http://www.ams.org/publicoutreach/feature-column/fc-2012-12

[2] Rentfrow, Peter J., et al. “The song remains the same: A replication and extension of the MUSIC model.” Music Perception: An Interdisciplinary Journal 30.2 (2012): 161-185.

Christopher J. Tralie is a data science researcher working in applied geometry/topology and geometric signal processing. His work spans shape-based music structure analysis and cover song identification, video analysis, multimodal time series analysis, and geometry-aided data visualization. He received a B.S.E. from Princeton University in 2011, a master’s from Duke University in 2013, and a Ph.D. from Duke University in 2017, all in Electrical Engineering. His Ph.D. was primarily supported by an NSF Graduate Fellowship, and his dissertation is entitled “Geometric Multimedia Time Series.” He then did a postdoc at Duke University in Mathematics and a postdoc at Johns Hopkins University in Complex Systems. He was awarded a Bass Instructional Teaching fellowship at Duke University, and he maintains an active interest in pedagogy and outreach, including longitudinal mentoring of underprivileged youths in STEAM education. He is currently a tenure-track assistant professor at Ursinus College in the Department of Mathematics and Computer Science. For more info, please visit http://www.ctralie.com.

TJ Tsai: Generating Music by Superimposing and Adapting Existing Audio Tracks

There has been a lot of work in training models to generate novel music from scratch. In this workshop, we will explore the possibility of generating music by taking a source audio track and enhancing it by superimposing other segments of existing audio material. The specific task we will work on is to take a classical piano recording and to overlay techno beats/music in an aesthetically pleasing manner. We will brainstorm different ways to accomplish this task, develop some prototypes, and hopefully generate some new music by the end of the workshop!

Prof. TJ Tsai completed bachelor’s and master’s degrees in electrical engineering at Stanford University. During college, he studied classical piano with George Barth and participated in the Stanford Jazz Orchestra and the chamber music program. After graduating, he worked at SoundHound for a few years, and then went to UC Berkeley for his Ph.D. Since 2016 he has been a faculty member in the engineering department at Harvey Mudd College, a STEM-focused liberal arts college in Claremont, CA.

Gissel Velarde and Andre Holzapfel: Music Research for Good

In recent years, important advances in artificial intelligence (AI) have led to various initiatives considering super-intelligence, its advantages, and its dangers. The initiatives fostering beneficial AI include (i) conferences such as the AI for Good Global Summit (running since 2017) and the Beneficial Artificial General Intelligence conference (held in 2015, 2017 and 2019); (ii) the establishment of organizations and projects like OpenAI, Partnership on AI, Google’s AI for Social Good, or AI for Humanity from the Université de Montréal; and (iii) governments’ AI strategies. At last year’s ISMIR conference, our community dedicated a session to discussing ethics in MIR, and some topics remain to be addressed. During this workshop, we will review the state of AI for good. We will revisit the tentative ethical guidelines for MIR developers proposed by Holzapfel et al. (2018) and their alignment with the guidelines from the IEEE Global Initiative on Ethics of Autonomous and Intelligent Systems (Chatila & Havens, 2019) and the Ethics Guidelines for Trustworthy AI from the High-Level Expert Group on Artificial Intelligence [HEGAI] (2019). We will use tools such as SWOT (strengths, weaknesses, opportunities, and threats) analysis, the business model canvas, the SCAMPER technique for creative thinking, and Gantt charts. After a situational analysis, we will define goals, scope, stakeholders, risks, benefits, impact, and an action plan. Finally, we will draft ethics guidelines for music research to be proposed to our community for consideration.

Chatila, R., & Havens, J. C. (2019). The IEEE Global Initiative on Ethics of Autonomous and Intelligent Systems. In Robotics and Well-Being (pp. 11-16). Springer, Cham.

High-Level Expert Group on Artificial Intelligence (2019). Ethics guidelines for trustworthy AI. Retrieved from https://ec.europa.eu/digital-single-market/en/news/ethics-guidelines-trustworthy-ai

Holzapfel, A., Sturm, B. L., & Coeckelbergh, M. (2018). Ethical Dimensions of Music Information Retrieval Technology. Transactions of the International Society for Music Information Retrieval, 1(1), 44–55. DOI: http://doi.org/10.5334/tismir.13

Gissel Velarde is a computer scientist, engineer, pianist and composer. She holds a PhD from Aalborg University for her doctoral thesis “Convolutional methods for music analysis”, supervised by David Meredith and Tillman Weyde. She participated as a research member in the European project “Learning to Create” (Lrn2Cre8), a collaborative project within the Future and Emerging Technologies (FET) programme of the Seventh Framework Programme for Research of the European Commission. She was a machine learning lead at Moodagent and worked as a consultant for Sony Computer Science Laboratories. She is a DAAD alumna and a mentor in the WiMIR program.

Andre Holzapfel received M.Sc. and Ph.D. degrees in computer science from the University of Crete, Greece, and a second Ph.D. degree in music from the Centre of Advanced Music Studies (MIAM) in Istanbul, Turkey. He worked as a postdoctoral researcher at several leading institutes in computer engineering, with a focus on rhythm analysis in music information retrieval. His field work in ethnomusicology was mainly conducted in Greece, with Cretan dance being the subject of his second dissertation. In 2016, he became Assistant Professor in Media Technology at the KTH Royal Institute of Technology in Stockholm, Sweden. Since then, his research subjects have included the computational analysis of human rhythmic behavior by means of sensor technology and the investigation of ethical aspects of computational approaches to music.

Eva Zangerle: Multi-Dimensional User Models for MIR

In music information retrieval scenarios (particularly when it comes to personalization), users and their preferences are often modeled solely by their direct interactions with the system (e.g., songs listened to). However, a user’s perception and liking of recommended or retrieved tracks depend on a number of dimensions, which may include the user’s (situational) context (e.g., time, location, or activity), the user’s intent, and the content descriptors and characteristics of the tracks the user has listened to. User models should also be able to capture how preferences change over time (short-term vs. long-term). Yet comprehensive models that capture these multiple dimensions have rarely been devised.

In the scope of this workshop, we aim to look into how users and their preferences (long- and short-term) can be modeled, the implications of such comprehensive user models on the underlying MIR algorithms and also, how we can evaluate the contribution and impact of such user models.

Eva Zangerle is a postdoctoral researcher in the Databases and Information Systems research group (Department of Computer Science) at the University of Innsbruck. She earned her master’s degree in Computer Science at the University of Innsbruck and subsequently completed her Ph.D. there in the field of recommender systems for collaborative social media platforms. Her main research interests are within the fields of social media analysis, recommender systems, and information retrieval. In recent years, she has combined these three fields of research and investigated context-aware music recommender systems based on data retrieved from social media platforms, aiming to exploit new sources of information for recommender systems. She was awarded a Postdoctoral Fellowship for Overseas Researchers from the Japan Society for the Promotion of Science, allowing her to make a short-term research stay at Ritsumeikan University in Kyoto.