AI Music Ensemble Technology

AI generates performances tailored to people

This is an AI technology that allows a machine musician to play together with human musicians in a music ensemble. It analyzes the human performance in real time to generate a machine performance that is both synchronized with and matched to the musical expression of the human player.

The AI compares the sound of the human performance with the score data of the piece being played, and determines where in the score the performer currently is, at what tempo they are playing, and with what kind of musical expression. Based on this analysis, it predicts the performance a little further ahead and generates a part that matches the human's timing.
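The follow-then-predict loop described above can be sketched as a simple score follower. This is a minimal illustration, not the actual product algorithm: it assumes chroma-like feature vectors for both score and live audio, matches each incoming frame against a small window ahead of the current position, and extrapolates the tempo to schedule the machine part slightly in advance. All class and variable names here are illustrative.

```python
import numpy as np

class ScoreFollower:
    """Toy score follower: track position and tempo, predict ahead."""

    def __init__(self, score_features):
        self.score = np.asarray(score_features)  # (n_score_frames, d)
        self.position = 0   # current score-frame estimate
        self.tempo = 1.0    # score frames advanced per audio frame

    def step(self, audio_frame, search=8):
        # Search a small window ahead of the current position and pick
        # the score frame most similar to the incoming audio frame.
        lo = self.position
        hi = min(len(self.score), lo + search)
        dists = np.linalg.norm(self.score[lo:hi] - audio_frame, axis=1)
        best = lo + int(np.argmin(dists))
        # Smooth the tempo estimate from the observed advance.
        self.tempo = 0.9 * self.tempo + 0.1 * max(best - self.position, 0)
        self.position = best
        return self.position

    def predict(self, lookahead_frames):
        # Predict where the performer will be a little further ahead,
        # so the machine's response can be generated in time.
        return self.position + self.tempo * lookahead_frames
```

A real system would use a probabilistic model over the whole score rather than a greedy window search, but the structure (track, estimate tempo, predict ahead) is the same.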

The AI can analyze a variety of input sources, including a solo piano, ensembles of acoustic instruments such as violins and flutes, multiple players, and full orchestras. The results of the analysis can be used to provide piano accompaniment, or to control devices such as lighting and video during a performance.

How it Works

The AI analyzes human performance from three perspectives.

1. Multimodal Analysis of Sound and Movement

The AI uses microphones to listen to the sound being played and extracts information about the performance from it. It can also capture the performer's anticipatory movements with a camera to better predict the timing of upcoming notes.
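One simple way to combine the two modalities is confidence-weighted fusion of timing estimates. The sketch below is a toy illustration under that assumption: it presumes we already have an audio-based estimate of the next beat time and a camera-based estimate derived from an anticipatory gesture, each with a confidence value. The function name and the weighting scheme are assumptions, not the published method.

```python
def fuse_beat_estimates(audio_time, audio_conf, visual_time, visual_conf):
    """Combine two next-beat-time estimates by confidence weighting.

    Visual cues (e.g. a breath or arm lift) often precede the sound,
    so the visual estimate can carry the weight before the first note,
    while audio dominates once the sound is flowing.
    """
    total = audio_conf + visual_conf
    if total == 0:
        raise ValueError("at least one estimate must have confidence > 0")
    return (audio_conf * audio_time + visual_conf * visual_time) / total
```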

2. Inference of Timing Fluctuations and Expression that is Robust against Mistakes

AI can infer the tempo and expression of a human performer by comparing the music score with the human performance.

Human performance includes both intentional and unintentional fluctuations, such as tempo changes and errors, so the comparison must account for how humans deviate from the score when they play. The AI therefore learns typical mistakes and deviations by learning how the music score maps to actual human performances. This enables the system to respond appropriately when a performer stumbles or makes a mistake.
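Mistake-tolerant matching of this kind can be illustrated with edit-distance alignment between the played notes and the score, with explicit costs for wrong notes (substitutions), skipped score notes (deletions), and extra played notes (insertions). The costs below are illustrative constants; a trained system would learn them from real performance data.

```python
def align_cost(score_notes, played_notes, wrong=1.0, skip=1.0, extra=1.0):
    """Edit-distance cost between a score and a played note sequence."""
    n, m = len(score_notes), len(played_notes)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + skip    # all score notes skipped
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + extra   # all played notes are extra
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0.0 if score_notes[i - 1] == played_notes[j - 1] else wrong
            d[i][j] = min(d[i - 1][j - 1] + sub,   # match or wrong note
                          d[i - 1][j] + skip,      # skipped score note
                          d[i][j - 1] + extra)     # extra played note
    return d[n][m]
```

Because every kind of error has a finite cost, the follower never loses the performer entirely: a wrong or skipped note raises the local cost but the alignment continues from the next correct note.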

AI can also recognize expression based on articulation and dynamics, such as “lively” or “quiet,” by learning the various performance expressions of a piece of music in advance. Thus, it can generate accompaniment according to the overall expression of the human performance.
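As a toy illustration of mapping performance statistics to expression labels, the sketch below uses a nearest-centroid rule over two made-up features (mean loudness and note density). The features, centroids, and labels are assumptions for illustration only; the actual system learns expressions from performance data.

```python
# Illustrative centroids: (mean loudness 0..1, notes per second).
EXPRESSION_CENTROIDS = {
    "lively": (0.8, 6.0),   # loud and dense
    "quiet":  (0.3, 2.0),   # soft and sparse
}

def classify_expression(mean_loudness, notes_per_second):
    """Return the expression label whose centroid is nearest."""
    def dist(c):
        # Scale note density so both features contribute comparably.
        return (c[0] - mean_loudness) ** 2 + ((c[1] - notes_per_second) / 10) ** 2
    return min(EXPRESSION_CENTROIDS, key=lambda k: dist(EXPRESSION_CENTROIDS[k]))
```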

In addition, if the AI is trained specifically for a particular individual’s performance, it will be able to learn that person’s unique tempo fluctuations and tendency to make mistakes.


3. Timing Adjustment

Coordination is at the heart of a music ensemble. When listening to a human ensemble, it may at first sound as if every player contributes equally to controlling the tempo. In reality, some musical parts exert more control than others, and the degree of control shifts with the musical context of each part. This subtle delegation of timing control is essential for ensemble coordination that fits the music.

The AI learns this kind of timing adjustment by analyzing how humans perform with each other, together with the corresponding score data. By learning the tendencies of the human performers, such as when they yield to other players and when they take the lead, it can adjust its timing musically according to the context of the score.
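The delegation of timing control described above can be sketched as a coupling rule: the machine nudges its predicted next-onset time toward the human's, with a weight that depends on who leads at this point in the score. The function and the weight values below are illustrative placeholders for quantities a system would learn from rehearsal data.

```python
def adjust_onset(machine_time, human_time, lead_weight):
    """Blend the machine's planned onset time toward the human's.

    lead_weight near 1.0: the human leads here, so follow closely.
    lead_weight near 0.0: the machine part leads, so hold its timing.
    """
    return machine_time + lead_weight * (human_time - machine_time)

# Example: human carries the melody -> follow strongly;
# machine-led passage -> only partially concede.
follower_time = adjust_onset(2.00, 2.10, lead_weight=0.9)  # ~2.09 s
leader_time = adjust_onset(2.00, 2.10, lead_weight=0.2)    # ~2.02 s
```

Varying `lead_weight` over the course of the score is what lets the system concede in some passages and take the lead in others.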

Application Examples

Project Using this Technology: Otomai-no-Shirabe – Transcending Time and Space

On Thursday, May 19, 2016, we used "AI Music Ensemble Technology" to reproduce the performance of legendary pianist Sviatoslav Richter at the concert "Otomai-no-Shirabe: Transcending Time and Space," held at the Sogakudo of Tokyo University of the Arts.

This made it possible for the Berliner Philharmoniker's Scharoun Ensemble to perform with Richter on a stage that transcends time and space.

Project Using this Technology: Project Sekai Piano

The "AI Music Ensemble Technology" was used for the "Project Sekai Piano" installation, which was on exhibit from Friday, March 26 to Sunday, June 27, 2021. The installation allows virtual singers Hatsune Miku and Hoshino Ichika (CV: Noguchi Ruriko) to sing along with your performance of "Senbonzakura" (lyrics and music by Kurousa P) and "Aokukakero!" (lyrics and music by Marasy).


International Conference

Akira Maezawa. "Using AI to inspire musicians." AIxMusic Industry Application Oriented Research Session, Ars Electronica Festival, 2019.
Akira Maezawa, Kazuhiko Yamamoto, Takuya Fujishima. "Rendering Music Performance With Interpretation Variations Using Conditional Variational RNN." Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), pp. 855-861, 2019.
Akira Maezawa. "Deep Linear Autoregressive Model for Interpretable Prediction of Expressive Tempo." Proceedings of the Sound and Music Computing Conference (SMC), pp. 364-371, 2019.
Bochen Li, Akira Maezawa, Zhiyao Duan. "Skeleton Plays Piano: Online Generation of Pianist Body Movements from MIDI Performance." Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), 2018.
Akira Maezawa, Kazuhiko Yamamoto. "MuEns: A Multimodal Human-Machine Music Ensemble for Live Concert Performance." Proceedings of the 2017 ACM CHI Conference on Human Factors in Computing Systems (CHI '17; 25% acceptance rate), pp. 4290-4301, 2017.
Akira Maezawa, Kazuhiko Yamamoto. "Automatic music accompaniment based on audio-visual score following." ISMIR 2016 Late-Breaking, 2016.
Akira Maezawa, Katsutoshi Itoyama, Kazuyoshi Yoshii, Hiroshi G. Okuno. "Unified inter- and intra-recording duration model for multiple music audio alignment." Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2015.
