Researcher: Yuta Kusaka
| Division | Music Informatics Group, Advanced Technology Research Department, Research and Development Division |
|---|---|
| Details of Work | Research and Development of Music Information Processing Technology |
| Field of Study | Control Engineering |
| Year Joined Yamaha | 2021 |
My Current Job
I am a member of a team developing advanced technologies and am involved in research in the field of music information retrieval (MIR). My past projects include music source separation, a technique for extracting audio stems from a mixture of various instruments and vocals, and musical accompaniment recommendation systems for playing electronic musical instruments. Currently, I focus on automatic music transcription for real-world applications.
Automatic Music Transcription (AMT) is the task of estimating the musical notation of an instrumental performance from its audio. AMT converts instrumental audio into MIDI, a digital format that a computer can interpret. MIDI is a general-purpose data format that serves as input to various AI technologies developed by Yamaha.
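As a rough illustration of what "audio to MIDI" means at the data level, the sketch below turns per-frame pitch detections into note events with a MIDI note number, an onset, and an offset. It is a generic, simplified example and not Yamaha's transcription method: the `Note` class, the `frames_to_notes` helper, and the frame-wise input format are all assumptions made for illustration. Only the frequency-to-note-number formula (A4 = 440 Hz = note 69) is the standard MIDI convention.

```python
import math
from dataclasses import dataclass


@dataclass
class Note:
    pitch: int     # MIDI note number (60 = middle C, 69 = A4)
    onset: float   # start time in seconds
    offset: float  # end time in seconds


def freq_to_midi(freq_hz: float) -> int:
    """Map a frequency to the nearest MIDI note number (A4 = 440 Hz = 69)."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))


def frames_to_notes(frame_pitches, hop_seconds):
    """Convert frame-wise detections into note events.

    frame_pitches: one set of active MIDI pitches per analysis frame
                   (hypothetical output of a transcription model).
    hop_seconds:   time between consecutive frames.
    """
    notes = []
    active = {}  # pitch -> index of the frame where the note started
    # A trailing empty frame closes any note still sounding at the end.
    for i, pitches in enumerate(list(frame_pitches) + [set()]):
        for pitch in list(active):
            if pitch not in pitches:
                start = active.pop(pitch)
                notes.append(Note(pitch, start * hop_seconds, i * hop_seconds))
        for pitch in pitches:
            active.setdefault(pitch, i)
    return sorted(notes, key=lambda n: (n.onset, n.pitch))


if __name__ == "__main__":
    frames = [{60}, {60, 64}, {64}, set()]
    for note in frames_to_notes(frames, hop_seconds=0.5):
        print(note)
```

A real system would also estimate note velocity and handle repeated notes on the same pitch, which this sketch deliberately leaves out.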
AMT enables automatic accompaniment for instruments that lack a mechanism for exporting a performance as MIDI, such as acoustic pianos and guitars. This allows a solo musician to play as part of an ensemble, or to record their performances in MIDI for practice purposes. I believe that AMT has significant potential to expand the scope of application of Yamaha’s music AI.
Bringing Music AI to Users
The difficult but interesting part of my research is deploying machine learning models in real-world environments. Many deep learning-based AMT models are trained solely on instrumental sounds recorded under controlled conditions. For a piano, for example, performances are recorded with little background noise, using the same piano, room, and microphone. However, a model trained only on data recorded under such specific conditions often performs poorly in a real-world environment, producing missed notes or false-positive notes. This problem is known in machine learning as domain shift, and it is caused by the difference in data distribution between the training data and the data encountered in deployment. I am currently researching methods to mitigate domain shift in AMT. I feel a sense of accomplishment when a model implemented as an application works well in a real-world environment.
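One widely used, generic way to reduce this kind of mismatch is to augment clean training recordings with background noise at a chosen signal-to-noise ratio (SNR). The sketch below shows only that idea. It is not a description of the methods being researched here, and the function names `rms` and `mix_at_snr` are hypothetical helpers introduced for this example.

```python
import math
import random


def rms(signal):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in signal) / len(signal))


def mix_at_snr(clean, noise, snr_db):
    """Add noise to a clean signal so that the result has the given SNR in dB.

    The noise is rescaled so that rms(clean) / rms(scaled noise)
    equals 10 ** (snr_db / 20).
    """
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    noise_rms = rms(noise)
    if noise_rms == 0:
        return list(clean)  # silent noise: nothing to add
    target_noise_rms = rms(clean) / (10 ** (snr_db / 20))
    gain = target_noise_rms / noise_rms
    return [c + gain * n for c, n in zip(clean, noise)]


if __name__ == "__main__":
    rng = random.Random(0)
    sample_rate = 16000
    # One second of a 440 Hz tone standing in for a clean studio recording.
    clean = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]
    # Gaussian noise standing in for room or microphone noise.
    noise = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]
    noisy = mix_at_snr(clean, noise, snr_db=10.0)
    print(f"clean RMS: {rms(clean):.3f}, noisy RMS: {rms(noisy):.3f}")
```

During training, the SNR would typically be drawn at random for each example so that the model sees a range of noise levels rather than a single one.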
One of the key differences between research in industry and academia is a focus on real-world applications. While it is important to aim for cutting-edge performance and tackle challenging tasks, I also consider how my research can add value to the experiences of users who enjoy music.
Participation in International Conferences and Presentation of Research
Yamaha has been a sponsor of the International Society for Music Information Retrieval (ISMIR), a leading academic conference in the MIR field, for several years. We have participated in ISMIR for recruitment purposes and to network with researchers. At ISMIR 2023 held in Italy, Yamaha and Queen Mary University of London presented a poster on our joint research, demonstrating an implementation of piano transcription models as a use case for the published dataset.
At the European Signal Processing Conference (EUSIPCO) held in France in August 2024, I presented our research on real-time piano transcription. I was nervous because it was my first oral presentation at an international conference. However, thanks to my supervisor’s support in writing the paper and preparing materials, as well as an in-house English conversation course, I was able to deliver a successful presentation. After the session, several people commented that my presentation was very interesting, and I had in-depth discussions about my research with the attendees. I also had the chance to enjoy the local cuisine and explore the area around the venue, which increased my motivation for future research.
Working Environment
At Yamaha’s Research & Development Division, we are encouraged to work on our own initiative. While there are top-down research themes, we can also pursue work based on our own interests or questions. The team members are not only well-versed in their research fields but also have extensive experience playing musical instruments. This unique combination enables us to deepen our research from multiple perspectives, including theoretical ones such as performance science and the practical viewpoint of a performer. Moreover, we are increasingly active in presenting and publishing our research at international conferences and in journals. This open environment is quite different from the “closed” image of corporate research that I had as a student.
Living in Hamamatsu
Hamamatsu, where Yamaha’s headquarters is located, is known as the “City of Music,” but it is also famous as the “Home of Motorcycles.” I obtained a motorcycle license during my first year with the company and enjoy touring on weekends. Hamamatsu and its surrounding areas are rich in nature, with popular touring spots like Okumikawa and the Oi River. A little further out, you can easily reach Fuji and Izu to the east, Gifu and Nagano to the north, and the Kii Peninsula to the west. Although I spend much of my working time in front of a computer, I feel refreshed when I hop on my motorcycle to enjoy delicious food and spectacular scenery.