

Text-To-Speech Foreign Language
Lima, Peru
AT&T has released a demo of its Natural Voices text-to-speech synthesizer, which can read text aloud using the pronunciation rules of several languages and dialects. AT&T is a large telephone company, and I imagine it is using this technology to keep developing the dialog prompts in its automated phone systems (though the applications reach as far as in-dash car navigation and screen readers for the visually impaired).
When I announced my son's birth, I mentioned the various pronunciations of his name, and the one my girlfriend and I have settled on. This demo does a fun job of taking names and pronouncing them in different languages:
Natural Voices TTS speaking 'Aidric Heimburger' in U.S. English, UK English, Spanish, German, and French.
Since I'm sporting a wildly Germanic family name, and 'Aidric' most likely comes from the same region, it's little surprise that his name sounds so good in its original tongue (although we choose to pronounce it as you would in English).
I do find it interesting that German made the list. I've read that it's a popular second language (I took it in high school myself), though most people believe it is useful only for business (and even there its value is limited, since most German speakers have studied English for many, many years).
The United Nations has six official languages: Arabic, Chinese, English, French, Russian, and Spanish. A world map of where these languages are spoken looks something like this:
If you're looking to learn a foreign language or two, I'd shoot for one of those.
More about Text-To-Speech:
Text-to-speech (TTS) is often described as having two conceptual stages. In the first stage, the system decides how the text should be spoken: how each word should be pronounced, what length and pitch each phoneme should have, and so on. In the second stage, it does its best to create audio that matches the specifications produced by stage one.
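To make the two stages concrete, here is a toy sketch in Python. It is not how Natural Voices works; the two-word lexicon, the fixed vowel and consonant durations, and the falling pitch are all invented for illustration, and the "audio" in stage two is just a sine tone per phoneme rather than real speech.

```python
import math
from dataclasses import dataclass

# Hypothetical toy lexicon: word -> phoneme symbols (ARPAbet-style labels).
LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}
VOWELS = {"AH", "OW", "ER"}


@dataclass
class PhonemeSpec:
    symbol: str
    duration_s: float  # how long the phoneme should last, in seconds
    pitch_hz: float    # target pitch (fundamental frequency), in Hz


def stage_one(text: str) -> list[PhonemeSpec]:
    """Decide *how* the text should be spoken: phonemes, lengths, pitches."""
    specs = []
    for i, word in enumerate(text.lower().split()):
        # Assumption: pitch drops by 10 Hz per word across the sentence.
        pitch = 120.0 - 10.0 * i
        for sym in LEXICON[word]:
            # Assumption: vowels are held twice as long as consonants.
            duration = 0.12 if sym in VOWELS else 0.06
            specs.append(PhonemeSpec(sym, duration, pitch))
    return specs


def stage_two(specs: list[PhonemeSpec], sample_rate: int = 8000) -> list[float]:
    """Produce audio matching the specs (here, a crude sine tone per phoneme)."""
    samples = []
    for spec in specs:
        n = round(spec.duration_s * sample_rate)
        for k in range(n):
            samples.append(math.sin(2 * math.pi * spec.pitch_hz * k / sample_rate))
    return samples


specs = stage_one("Hello world")
audio = stage_two(specs)
print(len(specs), len(audio))  # 8 5280
```

The point of the split is that stage one only produces a specification; stage two could be swapped for a completely different audio generator without touching the pronunciation logic.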
TTS software has little or no understanding of the text it reads. It uses rules, lists, dictionaries, and similar resources to make very sophisticated guesses about how a piece of text should be read. While general performance can be quite good, some decisions are intrinsically hard to make without some level of understanding. For example, the word "bass" is pronounced differently in "bass drum" (rhymes with "face") and "bass boat" (rhymes with "mass"). Intonation often depends on the writer's intention, which frequently cannot be inferred from short texts, even by human readers. As a result, TTS systems occasionally make mistakes and can be fooled by carefully constructed texts.
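Here is a minimal sketch of the kind of rule-based guess described above, assuming a hypothetical pair of context-word lists and a two-word window on either side. Real systems use far larger lists and statistics, but the weakness is the same: the rule only sees nearby words, not meaning.

```python
# Hypothetical context cues; a real system would use much larger resources.
FISH_CUES = {"boat", "fishing", "lure", "lake", "caught"}
MUSIC_CUES = {"drum", "guitar", "player", "clef", "line"}


def pronounce_bass(words: list[str], index: int) -> str:
    """Guess the pronunciation of 'bass' at words[index] from nearby words."""
    # Look at up to two words on either side of 'bass'.
    window = {w.lower() for w in words[max(0, index - 2): index + 3]}
    if window & FISH_CUES:
        return "B AE S"  # rhymes with 'mass' (the fish)
    if window & MUSIC_CUES:
        return "B EY S"  # rhymes with 'face' (low-pitched)
    return "B EY S"      # fallback guess when no cue is found


print(pronounce_bass("the bass drum".split(), 1))   # B EY S
print(pronounce_bass("a bass boat".split(), 1))     # B AE S
# A carefully constructed sentence fools it: 'caught' is within the window,
# so the first 'bass' (the player) gets the fish pronunciation.
print(pronounce_bass("the bass player caught a bass".split(), 1))  # B AE S
```

The last call shows exactly the failure mode the paragraph warns about: a human reader knows a "bass player" is a musician, but the rule just counts cue words.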
The type of TTS we do is called a "concatenative" system, meaning that we record a human speaker to make a voice database. We re-use small chunks of the recordings to create new sentences containing words that were never recorded. Further, we do "unit selection" synthesis. This means that we use large voice databases and do clever searches on the fly to find chunks in the voice database that best match the requested sentences.
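One common way to frame that search is as a shortest-path problem: each chunk has a *target cost* (how far it is from what stage one asked for) and each pair of neighbouring chunks has a *join cost* (how audible the splice is). The sketch below finds the cheapest sequence with a Viterbi-style dynamic program. It is a generic illustration, not AT&T's implementation; the `Unit` fields, the pitch-only target cost, and the splice penalty of 25 are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    phone: str    # phoneme label of this recorded chunk
    pitch: float  # average pitch of the chunk, in Hz
    rec_id: int   # which recording the chunk came from
    pos: int      # position of the chunk within that recording


def target_cost(unit: Unit, target_pitch: float) -> float:
    # How far the chunk is from the pitch that stage one requested.
    return abs(unit.pitch - target_pitch)


def join_cost(a: Unit, b: Unit) -> float:
    # Chunks that were adjacent in the original recording join seamlessly.
    if a.rec_id == b.rec_id and b.pos == a.pos + 1:
        return 0.0
    return 25.0 + abs(a.pitch - b.pitch)  # assumed penalty for a splice


def select_units(targets: list[tuple[str, float]], database: list[Unit]):
    """Viterbi search for the cheapest unit sequence matching `targets`.

    Assumes every target phone has at least one candidate in the database.
    """
    candidates = [[u for u in database if u.phone == ph] for ph, _ in targets]
    # best[i][u] = (cost of cheapest path ending in unit u at step i, previous unit)
    best = [{u: (target_cost(u, targets[0][1]), None) for u in candidates[0]}]
    for i in range(1, len(targets)):
        layer = {}
        for u in candidates[i]:
            prev, cost = min(
                ((p, c + join_cost(p, u)) for p, (c, _) in best[-1].items()),
                key=lambda pc: pc[1],
            )
            layer[u] = (cost + target_cost(u, targets[i][1]), prev)
        best.append(layer)
    # Trace back from the cheapest final unit.
    unit = min(best[-1], key=lambda u: best[-1][u][0])
    total = best[-1][unit][0]
    path = [unit]
    for i in range(len(targets) - 1, 0, -1):
        unit = best[i][unit][1]
        path.append(unit)
    return list(reversed(path)), total


DB = [
    Unit("K", 100.0, 1, 0), Unit("AE", 140.0, 1, 1), Unit("T", 100.0, 1, 2),
    Unit("AE", 118.0, 2, 5),
    Unit("T", 121.0, 3, 0),
]
targets = [("K", 100.0), ("AE", 120.0), ("T", 120.0)]
path, total = select_units(targets, DB)
print([(u.rec_id, u.pos) for u in path], total)  # [(1, 0), (1, 1), (1, 2)] 40.0
```

In this toy database the search keeps the three chunks from recording 1 even though two of them are further from the requested pitch, because splicing in the closer-pitched chunks would cost more at the joins. That trade-off between "sounds like what was asked for" and "joins smoothly" is what makes large voice databases pay off.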
Comments:
Jennifer
January 17th, 2008
We want to hear more about your experiences as a new father! Thoughts…feelings…diapers…