Researcher: Hideki Kenmochi

Division: Planning Group, R&D Planning Department
Details of Work: Planning and management of internal technical exchange events, coordination of technical development within the audio business, promotion of collaboration with external parties, etc.
Field of Study: Electrical and Electronic Engineering (Bachelor's and Master's degrees)
Year Joined Yamaha: 1993

Major Career Highlights

I joined Yamaha in 1993 and initially worked in research and development related to acoustics. In 1996, I was seconded to L&H Japan (a joint venture between L&H of Belgium and Yamaha) to work on the development of text-to-speech synthesis technology, and returned to Yamaha in 1999. I then worked on the development of music and audio signal processing technology, and developed VOCALOID. I am honored to be known as the “father of VOCALOID.” Currently based in the Tokyo metropolitan area, I am responsible for internal communications, external collaboration, and management.

Known as the Father of VOCALOID
In 1996, when I was seconded to L&H Japan
At L&H in Belgium

Note: What is VOCALOID?

VOCALOID is a technology that allows you to easily synthesize singing voices by inputting melodies and lyrics. By combining “the ability to freely create human-like singing voices,” “characters,” and “creative culture,” we have democratized the act of making music itself and brought about a new revolution in music culture.

VOCALOID products from various companies
VOCALOID operation screen

Development of VOCALOID

In 2000, it was possible to reproduce the sound of musical instruments to a certain extent using digital devices, but it was not possible to reproduce singing with lyrics at a sufficient level of quality. My boss at the time, Shigeki Fujii, predicted that there would definitely be a demand for singing synthesis in the future, so he launched a singing synthesis development project. I had been involved in the development of text-to-speech synthesis technology and had knowledge and insight into phonetics, so I joined the development team. Development was conducted jointly by Pompeu Fabra University in Barcelona, Spain, and four engineers from Yamaha, including myself, with the aim of reproducing the human voice with high musical quality suitable for practical use.

At the time of development in 2000
Development team at the time

At the beginning of development, the singing voices were still mechanical, and there were issues such as lyrics being difficult to understand and consonants being difficult to reproduce. We continued development using a synthesis method that focused on the transitions from one phoneme to the next, connecting these transition segments with the segments where a sound is sustained. After five months, we were finally able to reproduce singing voices that included consonants. I was very happy that we were able to reproduce smooth consonants despite the extreme difficulty.

From there, we continued to improve the core components of the singing synthesis system and develop the input interface, and after two years, we created a prototype. We considered commercializing this prototype within Yamaha, but unfortunately we had to give up on the idea. However, believing that there must be a demand for singing voice generation somewhere, I began searching for external companies on my own to make the product a reality.

Under such circumstances, we were fortunate to be introduced to Crypton Future Media, with whom we had connections through the ringtone business at the time. When we demonstrated the prototype, they were thankfully very interested in it. I sensed a passionate spirit within Mr. Ito of Crypton, and I was so delighted by this encounter that I wrote about it in my business trip report after returning to the office.

VOCALOID was released in collaboration with Crypton. Furthermore, Zero-G from the UK, to whom Crypton introduced us, also showed great interest in our company. In 2003, we announced to the public that we were developing a singing synthesis technology called VOCALOID. VOCALOID was announced in February at the Hamamatsu Chamber of Commerce and Industry in Japan, and in March at Musikmesse in Frankfurt, Germany. We then decided to license the technology and software to Crypton and Zero-G.

Musikmesse in Frankfurt, March 2003
Top row, from left: LEON, LOLA, MIRIAM; bottom row, from left: MEIKO, KAITO

Announcement of VOCALOID

A year later, in 2004, Zero-G released the products “LEON,” “LOLA,” and “MIRIAM,” and Crypton released “MEIKO” in Japan. However, sales were not very good. Looking back now, the synthesized singing voice still sounded a bit mechanical, and we received harsh reviews from several magazines. I think that the value of a computer singing had not yet been sufficiently conveyed.

Just when I was regretting that we hadn’t improved the quality more, the development team was reduced from four to two people.

At that time, I received advice from Koji Niimi, who was our general manager at the time. He said, “Set a goal, even if small, and keep going for a long time.” I took this as a message that it is important to persevere in new fields. (Incidentally, Mr. Niimi was decades ahead of me at my university and graduate school, and also at my high school (Shimizu Higashi High School) and junior high school (Shimizu Dai-nana Junior High School), making him a special person who has followed the same path as me.) New instruments and music do not just appear overnight. I reminded myself that I had people who believed in me, and I set a new goal of having VOCALOID used in 50% of commercial music. With that in mind, I threw myself back into research and development.

I have always believed that it is important to pursue a human-like voice in order to create vocals for commercial music, and from the beginning of development, I thought that “the blending of breath and voice is what makes a voice human.” In the first version, we removed the breath components from the sampled vocals and added them back later, but it still sounded a bit off. Therefore, we continued development in collaboration with Pompeu Fabra University, with which we had been conducting joint research, to see if it was possible to synthesize the voice while retaining the breath components.

End of August 2007 at Interspeech 2007 (Antwerpen)
From left to right: Masafumi Yoshida (currently in charge of VOCALOID), Kenmochi, Fujii, Niimi

VOCALOID Becomes a Hit

Around that time, Wataru Sasaki joined Crypton and was placed in charge of VOCALOID. Mr. Ito and Mr. Sasaki saw posts on Nico Nico Douga showing the VOCALOID MEIKO singing hit songs, and sensed that there might be a demand for this kind of thing. In the culture of self-published magazines, manga, and novels, where users create and promote their own works, Mr. Ito and Mr. Sasaki thought that there was a need for a character that would make people excited to see what kind of songs it could sing.

They created “Hatsune Miku, a girl from the future with a crystal clear voice.” At first, I had no idea what they were talking about, but after many conversations, I finally understood. The concept resonated with me, so I decided to take a chance on Mr. Ito and Mr. Sasaki, and I have been supporting them with voice recording and the development of the voice bank.

On August 31, 2007, Hatsune Miku was finally released. I was in Antwerpen, Belgium, participating in a research presentation at a conference called Interspeech. Hatsune Miku sold well from the first day, and I remember receiving a call from Mr. Sasaki during the reception at the academic conference. In addition, shortly after its release, I witnessed a flood of videos using VOCALOID being posted on video sharing sites. Seeing the technology I developed being accepted and used around the world, and witnessing it being used to create numerous songs, made me truly feel that continuing this work had been worthwhile.

The first sound from the future, “Hatsune Miku,” was an unprecedented hit and established the VOCALOID culture. If video streaming sites and the culture of posting videos had become popular one year later, I don’t think this huge hit would have happened. Furthermore, Yamaha was unable to release VOCALOID on its own, so if we had not met Crypton, the technology might have disappeared without ever being seen by anyone. I believe that the greatest achievement was creating a new future and culture together with the people at Crypton, who had a different way of thinking from Yamaha.

Hatsune Miku (C) CFM
I still go out drinking with Mr. Sasaki from Crypton.

What I Want to Tell Young People

People in my position are often asked to give messages to younger generations. I play in an amateur orchestra in my spare time, so I believe that I know the joy of creating live music and the fun of performing. I now believe that this experience led to the development of VOCALOID in a profound way. I think that my experience of listening to as many kinds of music as possible and feeling their beauty has been put to good use. If I have any advice for young people interested in a career in music, it would be to experience as many different types of music as possible and to appreciate the beauty of music.

In order to reproduce human-like singing voices with VOCALOID, I paid particular attention to the parts where the sound changes. I had a vague feeling that the important parts of songs with lyrics were the parts where the sounds changed, and that those parts were what made them human. I think that not only the constant elements but also the temporal changes are the essence of music, and that this feeling has contributed to the development and success of VOCALOID. When development first began, the idea of synthesizing singing voices was considered science fiction. However, I wanted to make it possible for anyone to create “songs,” which convey emotion through two channels, namely “melody” and “words.”

At Yamaha Sound Crossing Shibuya
Playing in a local amateur orchestra (Shimizu Philharmonic Orchestra)

Agriculture and Development

For the past few years, I have been helping my father with his farm work on weekends. Through this experience, I began to think about the similarities between agriculture and development. If you plant seeds, wait for them to sprout, and nurture them properly, as long as there are no problems such as typhoons, I think there is a certain probability that your efforts will be rewarded and you will see results. However, if you neglect to water them every day or fail to take good care of them, they will not grow properly. Even if you do take good care of them, there is no guarantee that your crop will be 100% successful. I think development is similar.

We continued to study VOCALOID seriously, but that alone was not enough to make it a hit, and I don’t think it would have made it into the mainstream. I think it was very important to be able to go outside the company, meet the people at Crypton, and create something new together. You should also venture outside your comfort zone and listen carefully to various kinds of music. I think this is important.