Spatial Audio Technology

Spatial Audio: From Recording to Reproduction

Yamaha has been involved in the research, design, and implementation of spatial audio for many years. In recent years, the demand for spatial audio has been increasing due to the spread of video content and games that allow users to experience fully immersive spatial sound.

Yamaha has developed comprehensive spatial audio technologies for the recording, editing, and playback of sound; we call them ViReal ^*1. Technologies for recording and reproducing spatial audio fall into three types: channel-based systems, which record and play back audio using a predetermined spatial arrangement of channels; object-based systems, which attach information such as the position and speed of each sound source to the audio; and scene-based systems, which represent the entire sound field as a unified entity. With ViReal, we apply all of these methods to develop recording and playback technologies for a variety of purposes and environments.

^*1 ViReal® is a registered trademark of Yamaha Corporation.

We will introduce ViReal Mic, a device for capturing spatial audio; ViReal for Headphones, a spatial audio processing technology for headphones; ViReal for Speakers, a spatial audio processing technology for loudspeakers; and ViReal Dome, a large-scale playback system.

Audio Capture : Spherical Microphone Array [ ViReal Mic ]

The ViReal Mic is a device for capturing high-order Ambisonics. It can record audio arriving from all directions in space.

High-order Ambisonics is a scene-based method of sound field recording and reproduction that uses the spherical harmonic expansion ^*2 to analyze and reproduce the sound field. First-order spherical harmonic expansions are called Ambisonics, while second-order and higher are called high-order Ambisonics (HOA).

^*2 Spherical Harmonic Expansion: This method describes the shape of an object or a phenomenon on the surface of an object in terms of a combination of simple shapes in spherical coordinates.
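
As a rough illustration of the expansion, the sketch below encodes a single sound arriving from a given direction into first-order Ambisonics. It assumes the ACN channel order (W, Y, Z, X) with SN3D normalization, which is one common convention and not necessarily the one used by ViReal; the function name encode_first_order is hypothetical.

```python
import math

def encode_first_order(sample, azimuth_deg, elevation_deg):
    """Encode one mono sample into first-order Ambisonics (ACN order, SN3D).

    Assumed convention: azimuth is measured counterclockwise from the front,
    elevation upward from the horizontal plane.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample                                 # order 0: omnidirectional
    y = sample * math.sin(az) * math.cos(el)   # order 1: left-right
    z = sample * math.sin(el)                  # order 1: up-down
    x = sample * math.cos(az) * math.cos(el)   # order 1: front-back
    return [w, y, z, x]

# A source straight ahead excites only the W and X components.
front = encode_first_order(1.0, 0.0, 0.0)
```

Higher orders add further components in the same way, each weighted by a higher-degree spherical harmonic evaluated at the source direction.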

ViReal Mic 1

To perform HOA capture, multiple microphone units must be placed uniformly on a spherical surface. Mathematically, perfectly uniform arrangements are limited to those based on the Platonic solids, of which the icosahedron is the largest. If more microphones are desired, a geodesic arrangement based on a subdivided icosahedron is commonly used.

ViReal Mic 1 uses a Fibonacci spiral arrangement of 64 microphones. This arrangement has been found to have fewer errors in the spherical harmonic domain than a geodesic arrangement, allowing for more accurate acquisition of directional components.
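
The following sketch shows one common way to generate a Fibonacci spiral on a sphere. The exact layout of ViReal Mic 1 is not given here, so the construction, the function name fibonacci_sphere, and the unit radius are all assumptions made for illustration.

```python
import math

def fibonacci_sphere(n, radius=1.0):
    """Return n points spread along a Fibonacci spiral on a sphere."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))  # about 2.4 rad
    points = []
    for i in range(n):
        z = 1.0 - (2.0 * i + 1.0) / n    # evenly spaced heights in (-1, 1)
        ring = math.sqrt(1.0 - z * z)    # radius of the circle at height z
        theta = golden_angle * i         # rotate by the golden angle each step
        points.append((radius * ring * math.cos(theta),
                       radius * ring * math.sin(theta),
                       radius * z))
    return points

# 64 candidate microphone positions on a unit sphere.
mic_positions = fibonacci_sphere(64)
```

Because the heights are evenly spaced and each point is rotated by the golden angle, the points do not line up in rings or columns, which is why this kind of layout covers the sphere evenly for any number of points.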

The 64 channels of captured audio can be transmitted simultaneously over a single LAN cable using Dante ^*3, so each setup needs only one microphone unit and one cable. Compared to conventional audio capture equipment, this makes for a compact setup that takes little time to install while still providing extremely accurate recordings.

^*3 Dante: Audio network technology developed by Audinate that allows the transmission of multiple channels between audio equipment using only one LAN cable.

ViReal Mic 2

For better performance and ease of use, ViReal Mic 2 has been upgraded to a new, fully digital architecture, replacing the analog microphone unit and A/D converter setup of the ViReal Mic 1.

The microphone arrangement uses a spherical t-design, which achieves zero error in the spherical harmonic domain, an improvement over ViReal Mic 1. Testing confirmed that performance when capturing fifth-order Ambisonics is the same with either 60 or 64 microphones, so we adopted 60 microphones.
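
For context, a spherical harmonic expansion up to order N has (N + 1)^2 components, and an array needs at least that many microphones to resolve them. The short sketch below works through that count; the function name hoa_channel_count is hypothetical, and the count is only a necessary condition, not a guarantee of capture quality.

```python
def hoa_channel_count(order):
    """Number of spherical harmonic components up to and including `order`."""
    return (order + 1) ** 2

# Fifth-order capture needs 36 components, so both microphone counts
# mentioned above leave headroom.
needed = hoa_channel_count(5)
for mics in (60, 64):
    print(mics, "microphones,", needed, "components, enough:", mics >= needed)
```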

The full digital architecture allows separation of the sphere from the Dante interface section, resulting in a very clean design. In addition, since ViReal Mic 2 can be manufactured with less variation in performance than ViReal Mic 1, multiple ViReal Mic units can be used in a variety of spatial audio measurements. We intend to utilize these features to expand our spatial audio measurement solutions and to achieve audio recordings that are richer in spatial information.

ViReal Mic 2, a new model with improved performance

Use in experiments to evaluate the sensitivity of musical instrument sounds

Playback : Spatial Audio Playback Technology for Headphones [ ViReal for Headphones ]

Binaural technology is commonly used to play back spatial audio over headphones. Binaural sound can be recorded with two microphones placed inside the ears of a dummy head. Alternatively, it can be synthesized using the head-related transfer function (HRTF), which represents the characteristics of the acoustic path from the sound source to each of the listener’s ears.

The HRTF can be measured by placing a small microphone in each ear of an individual, or it can be calculated from the shape of the person’s head. This acoustic transfer function is then applied to the headphone signal played back to the listener, resulting in a very realistic spatial audio effect. In practice, since it is difficult to measure the HRTF for each individual, standard HRTFs obtained from dummy heads are used in most cases. However, because HRTFs have different properties depending on the shape of a person’s ears and head, some dummy head HRTFs cannot achieve a rich spatial audio effect.
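
Applying an HRTF can be sketched in the time domain as filtering the source signal with a pair of head-related impulse responses (HRIRs), one per ear. The example below is a minimal illustration only: the function names are hypothetical, and the two-sample impulse responses are toy placeholders, not measured data.

```python
def convolve(signal, impulse_response):
    """Direct-form FIR convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

def binauralize(mono, hrir_left, hrir_right):
    """Filter a mono source with the left/right HRIR pair for one direction."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)

# Toy impulse responses: a source on the left reaches the right ear
# one sample later and quieter than the left ear.
hrir_left = [1.0, 0.0]
hrir_right = [0.0, 0.6]
left, right = binauralize([1.0, 0.5], hrir_left, hrir_right)
```

Real HRIRs are hundreds of samples long, so practical systems typically use FFT-based convolution instead of the direct loop shown here.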

Therefore, we have developed our own HRTF database by collecting, analyzing, and synthesizing a large number of human ear and head shapes. ViReal for Headphones applies HRTFs from this database, so that each user can experience a realistic spatial effect without the need to measure their own HRTF. A high resolution of more than 2,000 directions allows the accurate positioning and smooth movement of sound sources. With this technology alone, users can easily enjoy spatial audio with any pair of headphones or earphones.
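
To show why directional resolution matters, the sketch below picks the stored direction closest to a requested source direction. It assumes a database indexed by (azimuth, elevation) pairs in degrees; that layout, the function names, and the four-entry table are hypothetical and say nothing about how the ViReal database is organized.

```python
import math

def unit_vector(azimuth_deg, elevation_deg):
    """Convert a direction in degrees to a unit vector."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az),
            math.cos(el) * math.sin(az),
            math.sin(el))

def nearest_direction(target, directions):
    """Return the index of the stored direction with the smallest angle to target."""
    t = unit_vector(*target)

    def cosine(direction):
        v = unit_vector(*direction)
        return sum(a * b for a, b in zip(t, v))  # larger cosine = smaller angle

    return max(range(len(directions)), key=lambda i: cosine(directions[i]))

# Hypothetical four-direction table; a real database would hold thousands.
table = [(0, 0), (90, 0), (180, 0), (0, 90)]
index = nearest_direction((80, 10), table)
```

With only a few stored directions, a moving source would jump audibly from one entry to the next; a denser grid keeps the angular error small, and interpolating between neighboring entries can smooth the movement further.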

Synthesizing optimal HRTF data based on ear and head shape data of many people

Accurate reproduction of sound source localization and sound source movement using synthesized HRTF data

Playback : Spatial Audio Playback Technology for Speakers [ ViReal for Speakers ]

The best method for presenting spatial audio depends on the purpose and requirements of each application. A channel-based, object-based, or scene-based method must be selected according to the environment, the audience, and the sound content. Similarly, the choice of equipment and of recording and playback methods also varies.

ViReal for Speakers implements a variety of playback processing technologies to meet different requirements. For example, three-dimensional panning (adjustment of sound volume and timing) is used to reproduce a sound image at its desired position, and HOA processing is used to present sounds arriving from all directions.
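
The volume part of panning can be illustrated with the simplest case, a constant-power pan between two loudspeakers. This is a simplified sketch and not the ViReal for Speakers algorithm: three-dimensional panning distributes a source across groups of loudspeakers and may also adjust timing, which is not shown here. The function name constant_power_pan is hypothetical.

```python
import math

def constant_power_pan(position):
    """Gains for two loudspeakers; position runs from 0.0 (first) to 1.0 (second)."""
    angle = position * math.pi / 2.0
    return math.cos(angle), math.sin(angle)

# Halfway between the loudspeakers, both receive the same gain.
g1, g2 = constant_power_pan(0.5)
```

Because the squared gains always sum to one, the perceived loudness stays roughly constant as the sound image moves between the two loudspeakers.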

Super Surround Theater, a facility for experiencing spatial audio in the corporate museum

Spatial audio laboratory for multi-channel audio playback

Playback : 122-Channel Spherical Loudspeaker Array [ ViReal Dome ]

Yamaha has an experimental facility that features a large spherical loudspeaker array with 122 independent channels. This system is used to validate a variety of technologies that are under development.

The loudspeakers are placed on 122 vertices generated by subdividing an icosahedron, 91 of which are suspended in a frame and 31 of which are on the floor. Although not structurally a perfect sphere, it can be treated as a virtual sphere by calibrating the output of each speaker (adjusting their volume and timing relative to each other). All speakers are connected to a Dante network, which can send 122 channels of audio signals simultaneously over a single LAN cable. This allows the system to faithfully reproduce anything from individual sound sources to rich sound fields. It can be used with a variety of playback methods such as HOA, and helps us validate new audio capture and playback techniques.
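
The calibration idea can be sketched as follows: given each loudspeaker's distance from the listening position, delay and attenuate the nearer ones so that all appear to sit at the distance of the farthest. This is a minimal sketch under stated assumptions (free-field 1/r level decay, a fixed speed of sound, made-up distances) and not the procedure used for ViReal Dome; the function name align_to_sphere is hypothetical.

```python
SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature

def align_to_sphere(distances_m):
    """Return a (delay_seconds, gain) pair per loudspeaker so that all
    appear to sit at the distance of the farthest one."""
    farthest = max(distances_m)
    corrections = []
    for d in distances_m:
        delay = (farthest - d) / SPEED_OF_SOUND  # nearer speakers wait longer
        gain = d / farthest                      # nearer speakers are turned down
        corrections.append((delay, gain))
    return corrections

# Hypothetical distances in meters for three loudspeakers.
corrections = align_to_sphere([2.0, 2.5, 1.8])
```

In a real room, reflections and differences between loudspeakers mean that measured responses, rather than geometry alone, would normally drive the calibration.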

A system of this magnitude is unparalleled anywhere in the world. Our great strength is that we can utilize this system to develop new technologies and further improve the accuracy of existing ones.

Providing Technology for Our Products and Products from Other Companies

In addition to products from Yamaha, ViReal technology is also used in products from other companies.

・AFC Image (Active Field Control Image)
This is a sound image control system from Yamaha. It uses the ViReal for Speakers object-based processing technology and ViReal for Headphones technology.

・Super Surround Theater
This 108.6-channel theater, installed in the “Innovation Road” corporate museum on the premises of Yamaha’s headquarters, offers a variety of spatial audio content produced by the ViReal series. For example, content from a concert by the Yamaha Symphonic Band was captured with ViReal Mic and multiple microphones, and used in combination with ViReal for Speakers’ HOA processing and three-dimensional panning.

・Capcom “Resident Evil” and “Monster Hunter”
ViReal for Headphones technology was used in “Resident Evil 7: Biohazard Gold Edition” and “Monster Hunter: World”, released in 2018 by Capcom Co., Ltd.

・Sound xR Core
This is Yamaha’s virtual spatial audio solution for headphones and earphones. It uses ViReal for Headphones technology.