Networking takes place in our lives every day, with verbal communication (audio) as the key means of interaction. Web-based "social networking", as we know it, relies on written communication (data), but think how much easier it would be if we could incorporate voice into it: not as a store-and-forward file, but as an interactive and searchable medium.
Building a speech recognition system today is an endeavour that few organizations can afford, because it requires massive up-front development costs for each language. With the exception of certain tonal languages (e.g., Mandarin and Cantonese), developing a new language involves training a language-agnostic ASR (automatic speech recognition) engine with appropriate speech data. This data is collected to model the phonetic sounds of the target language and the acoustic environment of the target applications. Starting from scratch, a new ASR language needs recordings from around two thousand different speakers, and as a rule this data should cover a wide range of accents and environmental conditions. Text-to-speech (TTS) products require dedicated development for each language and each TTS voice offered; beyond modeling each new language, they require acoustic inventories (collections of recorded speech) as a prerequisite. Even so, current technology is still very fragile and is easily broken by small changes in speaker characteristics, channel characteristics or discourse domain.
Google recently published a technical paper on building large language models for machine translation, prior to releasing its voice software for the iPhone and, more recently, the BlackBerry. Archived search queries were used to construct a statistical model of the way words are frequently strung together, which was combined with a sound-analysis (acoustic) model and a mechanism for linking the basic components of language to actual words. The researchers noted that they had trained the system on two trillion "tokens", or words.
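A "statistical model of the way words are frequently strung together" is what speech and translation researchers call an n-gram language model. The sketch below illustrates the idea under loudly stated assumptions: it uses a bigram (two-word) model with add-one smoothing, and a tiny hypothetical list of queries (`QUERIES`) standing in for archived search logs. The function names `train_bigram` and `log_prob` are illustrative only; this is not Google's actual method or code.

```python
from collections import Counter
import math

# Hypothetical stand-in for archived search queries (illustration only).
QUERIES = [
    "pizza near me",
    "pizza delivery toronto",
    "weather in toronto",
    "weather near me",
    "coffee near me",
]

def train_bigram(queries):
    """Count unigrams and bigrams, padding each query with <s> and </s> markers."""
    unigrams, bigrams = Counter(), Counter()
    for q in queries:
        tokens = ["<s>"] + q.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def log_prob(sentence, unigrams, bigrams):
    """Log-probability of a word sequence under an add-one (Laplace) smoothed bigram model."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        # Unseen pairs get count 0 + 1, so they are unlikely but never impossible.
        total += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
    return total

unigrams, bigrams = train_bigram(QUERIES)
# A word order seen in the "search logs" scores higher than a scrambled one.
print(log_prob("pizza near me", unigrams, bigrams) > log_prob("me near pizza", unigrams, bigrams))  # True
```

In a recognizer, a score like this helps choose between acoustically similar candidate transcriptions. Add-one smoothing is the simplest way to keep unseen word pairs from receiving zero probability; systems trained on trillions of tokens use more sophisticated smoothing or back-off schemes.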
So what's next? First, many people will have to overcome their distrust of communicating with a machine. But eventually, mobile phones will be viewed as "personal assistants", and a speech recognition interface will be standard on any home device, used alone or in conjunction with other input modes such as graphics and text. Speech transcription, where the spoken word is converted to text (the Holy Grail we have aspired to from the beginning), will also come within reach. These advances will happen faster in Asia than elsewhere: Japanese, Korean, Mandarin and Cantonese speech applications have been in use for a number of years, having originally been developed to compensate for the difficulty of typing these languages on PCs and, later, on mobile phones.
Several challenging problems remain, and according to some leading experts there are enough for a lifetime of research. A voice application that replaces a touch-tone menu achieves 99.5% accuracy today, but a universal language approach has a success rate of only 65% or less. The gap between human and automatic speech recognition is still very large because, at present, speech recognition must be customized for specific domains to maximize performance. It must also adapt to particular channels, such as broadband, mobile phones and VoIP. New languages are not easily added, and it is unlikely that minor languages will ever be automated using current techniques. Every new dimension (language, domain or channel) requires speech data collection, transcription, and the art and science of running the data through complex algorithms, a process that is extremely time-consuming, tedious and expensive.
The ability to recognize virtually any phrase from any speaker has been the supreme goal of AI researchers looking for ways to make interaction between humans and machines more natural. For every language, "product quality" is inextricably tied to the development effort required, which is often driven by the potential business opportunity.
This brings us back to the mobile phone...
Stephane Attal is CEO of AskKinjo, where we use a voice interface to provide location-based services. AskKinjo services are available in the Greater Toronto Area (GTA). Please visit us at www.AskKinjo.com