VOCALOID is vocal-synthesis software that enables songwriters to
generate authentic-sounding singing on their PCs by simply typing in
the lyrics and music notes of their compositions. The software
synthesizes the sound from “vocal libraries” of recordings of actual
singers, retaining the vocal qualities of the original singing voices
to reproduce realistic vocals. The software also features simple
commands that enable users to add expressive effects – such as vibrato
and pitch bends – to their synthesized vocals. Further releases of
vocal libraries will broaden the range of voices and singing styles
that VOCALOID can generate.
VOCALOID can generate singing in Japanese and English. It runs on
The VOCALOID software is sold bundled with VOCALOID
libraries from soundware companies under licence from Yamaha. Yamaha
is not currently planning to sell the actual VOCALOID software engine
as a dedicated product.
An authentic voice can be generated by simply inputting
words and notes.
Singing is synthesized simply by typing words and
music notes on your PC. The sung vocal can be output as a WAV file, so it can
be imported into other sequencers and played alongside the accompaniment.
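
As a rough illustration of the WAV export step, the sketch below writes a short test tone to a WAV container using Python's standard `wave` module. It is not VOCALOID's exporter: the function name `write_tone`, the sine tone standing in for a synthesized vocal, and the chosen sample rate are all assumptions made for the example.

```python
import io
import math
import struct
import wave


def write_tone(target, freq_hz=440.0, seconds=0.5, rate=44100):
    """Write a mono 16-bit sine tone to `target` (a path or file-like object).

    The tone is only a placeholder for a synthesized vocal signal.
    """
    n_frames = int(seconds * rate)
    samples = (
        int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * i / rate))
        for i in range(n_frames)
    )
    # Pack each sample as a little-endian signed 16-bit integer.
    data = b"".join(struct.pack("<h", s) for s in samples)
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)    # mono
        wav.setsampwidth(2)    # 2 bytes = 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(data)
    return n_frames


# Write into an in-memory buffer; a file path would work the same way.
buffer = io.BytesIO()
frames_written = write_tone(buffer)
```

A file produced this way is an ordinary PCM WAV file, which is the kind of file most sequencers can import as an audio track.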
Various styles of singing can be synthesized by adding
further vocal libraries.
By switching between “Vocaloid Singer Libraries” databases,
you can synthesize various types of male and female vocals. A number of
soundware developers worldwide are releasing “vocal libraries” (some
already have), and the VOCALOID software engine is bundled into their
Expressive effects can easily be added to the synthesized
vocals in a simple operation.
Effects such as vibrato, inflection and tremolo can be added to the
synthesized vocals with simple GUI commands, making it possible to create
fully expressive songs.
Both Japanese and English vocals can be synthesized.
English- and Japanese-language “vocal libraries” are
VOCALOID uses Frequency-Domain Singing Articulation Splicing
and Shaping, a vocal (singing-voice) synthesis system developed by Yamaha
after lengthy research and development in frequency-domain signal processing.
With this system, the “singing articulations” (collections of voice
snippets, such as syllables and snippets of vocal expression variations, like
vibrato) needed to reproduce vocals, are collected from custom-produced
recordings of professional singers and put into a database after conversion
into the frequency domain. To synthesize vocal parts, the system retrieves data
consisting of voice snippets, applies pitch conversion, then splices and shapes
them to form the words of a song as typed by the user. As this processing is
done at the frequency-domain level, pitch can be easily changed according to
the specified melody, and the voice snippets can be spliced in a way that
reproduces smooth-flowing words. For example, the “sai” of “saita” is
produced from two snippets, “sa” and “ai”. Because the
timbres of the vowels “a” and “ai” usually differ from
each other, simply splicing these sounds together would not sound
right to the listener. To solve this problem, the splice is smoothed
within the frequency domain, resulting in a smoother
Fig. 1. Processing within the Frequency Domain
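
The idea of smoothing a splice in the frequency domain can be illustrated with a very small sketch. The code below is not Yamaha's algorithm, which is not described here in detail; it simply assumes that the timbre at the end of one snippet and the start of the next can each be represented as a list of per-bin magnitudes, and it bridges them by linear interpolation. The function name and the toy numbers are hypothetical.

```python
def crossfade_envelopes(left, right, steps):
    """Blend two spectral envelopes over `steps` intermediate frames.

    `left` and `right` are equal-length lists of per-bin magnitudes,
    hypothetical stand-ins for the timbre at the end of one snippet
    and at the start of the next.
    """
    if len(left) != len(right):
        raise ValueError("envelopes must have the same number of bins")
    frames = []
    for i in range(1, steps + 1):
        w = i / (steps + 1)  # weight moves gradually from `left` to `right`
        frames.append([(1 - w) * a + w * b for a, b in zip(left, right)])
    return frames


# Toy envelopes for the end of "sa" and the start of "ai" (made-up numbers).
end_of_sa = [1.0, 0.5, 0.0]
start_of_ai = [0.0, 0.5, 1.0]
bridge = crossfade_envelopes(end_of_sa, start_of_ai, steps=3)
```

Each bridging frame sits between the two timbres, so the listener hears a gradual change rather than an abrupt jump at the join.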
In addition, conversion within the frequency domain makes it
easy to control pitch and timbre in order to get expressive effects, such as
vibrato. VOCALOID reproduces the actual variations of pitch and timbre over
time (accurately emulating the way they occurred in the real singer’s
original vibrato) by storing the time variation of pitch and timbre
from the real singer’s voice in a database and applying it at the point of
Fig. 2. Process at Vibrato
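
A minimal sketch of applying a stored pitch variation to a note is shown below. It assumes the stored vibrato is a list of pitch deviations in cents, one per frame; the actual database format is not given in the text, and the function name and curve values are invented for illustration. Timbre variation is left out.

```python
def apply_pitch_curve(base_hz, deviation_cents):
    """Return per-frame frequencies for a note at `base_hz`.

    `deviation_cents` is a stored pitch-deviation curve in cents
    (100 cents = one semitone, 1200 cents = one octave).
    """
    return [base_hz * 2 ** (c / 1200) for c in deviation_cents]


# Hypothetical stored vibrato: the pitch swings 50 cents above and below the note.
stored_vibrato = [0, 50, 0, -50, 0]
pitches = apply_pitch_curve(440.0, stored_vibrato)
```

Because the curve is expressed relative to the note, the same stored vibrato can be applied to any pitch in the melody.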
VOCALOID consists of a score editor, which handles the notes,
lyrics, and expression processing; the Vocal Sound Generator (the engine
that synthesizes the vocals); and vocal libraries (each comprising a
pronunciation database and a timbre database) for each virtual singer. The “vocal
libraries” are released by soundware developers who have entered into a
license agreement with Yamaha, and more libraries are on the way.
Fig. 3. VOCALOID System Configuration
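
The three-part configuration can be pictured with a small data model. This is only a sketch under stated assumptions: the class names, field names and snippet identifiers are hypothetical, the libraries' real format is not documented here, and the "generator" step is reduced to a lookup of which snippet each note would need.

```python
from dataclasses import dataclass, field


@dataclass
class Note:
    """One entry from the score editor."""
    pitch: int    # MIDI note number
    lyric: str    # syllable typed for the note
    beats: float  # note length in beats


@dataclass
class VocalLibrary:
    """One virtual singer: a pronunciation database and a timbre database."""
    singer: str
    pronunciation: dict = field(default_factory=dict)  # lyric -> snippet id
    timbre: dict = field(default_factory=dict)         # snippet id -> timbre data


def plan_synthesis(score, library):
    """List the (pitch, snippet id) pairs the generator would need.

    A snippet id of None means the library has no entry for that lyric.
    """
    return [(note.pitch, library.pronunciation.get(note.lyric)) for note in score]


library = VocalLibrary(
    singer="demo",
    pronunciation={"sa": "snip_sa", "i": "snip_i", "ta": "snip_ta"},
)
score = [Note(60, "sa", 1.0), Note(62, "i", 1.0), Note(64, "ta", 2.0)]
plan = plan_synthesis(score, library)
```

Keeping the library separate from the score is what lets the same song be sung by a different virtual singer simply by swapping the library.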
(C) 2003-2005 by YAMAHA Corporation. All rights reserved. All brand
names and product names are trademarks or registered trademarks of
their respective companies. ‘Kando’ is a Japanese word that signifies
an inspired state of mind.
Yamaha announced the development of Vocaloid in 2003, and the first
application product was launched in January 2004. It was not released as
a Yamaha product; instead, third-party licensees developed packages of
Vocaloid Singer Libraries that included Yamaha’s Vocaloid
software. Leon, Lola, and Miriam were released by Zero-G
Limited (UK), while Meiko and Kaito were released by Crypton
Future Media (Japan).
In January 2007, Yamaha announced a new version of the software
engine, Vocaloid2, with various major improvements in usability and
synthesis quality. Zero-G and others have announced plans to release
products powered by the new software engine in 2007. PowerFX has
released Sweet Ann, the first English package powered by Vocaloid2.
Prima was released in the UK. Crypton followed by
announcing a series of character Vocaloid2 packages, the
first being Hatsune Miku. The second package, Kagamine Rin/Len, was
released on December 27, 2007, while a third is expected sometime in 2008.
Character Vocal Series
The Character Vocal Series is a computer music program that synthesizes singing in Japanese. Developed by Crypton
Future Media, it utilizes Yamaha’s Vocaloid2 technology with specially
recorded vocals of voice actors. To create a song, the user must input
the melody and lyrics. A piano-roll-style interface is used to input the
melody, and lyrics can be entered on each note. The software can
change the stress of the pronunciations, add effects such as vibrato,
or change the dynamics and tone of the voice.
The series is intended for professional musicians as well as casual
computer music users. The programmed vocals are designed to sound like
an idol singer from the future. According to Crypton, professional
singers refused to provide singing data for fear that the
software might clone their singing voices, so Crypton changed
its focus from imitating particular singers to creating distinctive
vocals. This change of focus led to sampling the voices of voice actors.
Each vocal is given an anime-type character with specifications on
age, height, weight, and musical forte (as in the type of music, range
and tempo). The characters of the first two installments of the series
were created by illustrator Kei.
Any rights or obligations arising from the vocals created by the software belong to the software user. Just like any music synthesizer,
the software is treated as a musical instrument and the vocals as sound. Under the terms of the license, the Character Vocal Series software can be used to create vocals for commercial or non-commercial use, as long as the vocals do not offend public policy. In other words, the license with Crypton binds the user not to synthesize derogatory or disturbing lyrics. On the other hand, copyrights to the mascot image and name belong to Crypton. Under the terms of the license, a
user cannot commercially distribute a vocal as a song sung by the character, nor use the mascot image on commercial products, without Crypton’s consent.
Hatsune Miku (初音ミク, Hatsune Miku) is the first installment in the Vocaloid
Character Vocal Series, released on August 31, 2007. The name of the title and
of its character was chosen by combining Hatsu (初, First), Ne (音, Sound), and Miku (未来, Future). The data for the
voice was created by sampling the voice of Saki Fujita, a
Japanese voice actress. Unlike general-purpose speech synthesizers, the
software is tuned to create J-pop songs commonly heard in anime, but it is
possible to create songs from other genres.
Nico Nico Douga played a fundamental role in the recognition and popularity of the software. Soon after its release, users of Nico Nico Douga started posting videos with songs created by the software. According
to Crypton, a popular video in which a comically altered version of the software mascot holds a leek and sings Ievan Polkka showed the many possibilities of applying the software in multimedia content creation. As the recognition and popularity of the software grew, Nico Nico Douga became a place for collaborative content creation. Popular original songs written by one user would inspire
illustrations, 2D and 3D animation, and remixes by other users. Other creators would show their unfinished work and ask for ideas.
On October 18, 2007, an Internet BBS website reported that Hatsune
Miku was suspected of being a victim of censorship by Google and Yahoo!, since
images of Miku did not show up in image searches. Google and Yahoo!
denied any censorship on their part, blaming the missing images on a bug that
affected not only “Hatsune Miku” but other search keywords as well.
Both companies expressed a willingness to fix the problem as soon as possible. Images of Miku were relisted on Yahoo! on October 19.
A Hatsune Miku manga called Maker Hikōshiki Hatsune Mix
began serialization in the Japanese manga magazine Comic Rush on November 26, 2007,
published by Jive. The manga is drawn by Kei, the original character designer
for Hatsune Miku. A second manga called Hachune Miku no Nichijō Roipara! drawn
by Ontama began serialization in the manga magazine Comp Ace on December 26, 2007,
published by Kadokawa Shoten.
Released on December 27, 2007, Kagamine Rin/Len (鏡音リン・レン, Kagamine Rin/Len) is
the second installment of the Vocaloid Character Vocal Series. According to
Vocaloid’s official blog, the package includes two voice banks, one of a girl
(Rin) and one of a boy (Len), both provided by the seiyū Asami Shimoda. Despite
the double voice banks, the package still sells at the same price as Hatsune