If you are like me, your experience with most of the voice recognition systems offered by the airlines devolves quickly into a frustrated plea for human help. Still, while voice interaction is not yet at the point HAL 9000 promised by 1992, there's little doubt that it has come a long way in the last 10-15 years.
Today, Yahoo announced the broad availability of voice-enabled search on the oneSearch mobile platform and a concurrent investment in vlingo, the partner providing the voice technology:
With the voice-enabled version of Yahoo! oneSearch, consumers can search for anything, including flight numbers, locations, Web site names, local restaurants, and more, by simply speaking...Whereas most mobile voice recognition systems are specific to vertical categories such as local listings, Yahoo! oneSearch with Voice lets consumers perform "wide open" searches - returning relevant results for practically every kind of query.
It continues a buzz of activity and investment in voice-recognition and text-to-speech that was capped by SpinVox's recent $100m round and $500m valuation, but also includes some other notable events:
- Nuance announced voicemail-to-email transcription on April 1
- YouMail added voicemail-to-email transcription on April 1
- Jott added the ability to reply to email and SMS by voice on March 29 (Computerworld)
- Microsoft showcased audible text messaging in Sync
There's a palpable mix of excitement and skepticism about the buzz. Wildfire, the virtual receptionist that seduced the Valley in 1995, found a respectable exit to Orange / France Telecom in 2000, but the service had only 10,000 users when it was discontinued in 2005. TellMe did reach $100 million in annual revenue powering corporate voice interaction systems for companies like FedEx, and an $800+ million exit to Microsoft in March 2007, but the technology never seemed to deliver even a fraction of what HAL promised. And Nuance has built a $4 billion market cap, but it has rolled up almost every company in the market to get there.
Today, with a new influx of investment and interest, I'm bullish on the applications that deliver a specific value in a narrow context, and bearish on the broader applications that seek to be your voice window to the world.
Despite some persistent limitations and frustrations, several of the narrow and focused applications look like they will clear the bar for acceptability and be broadly adopted:
- SpinVox, Simulscribe and Nuance voicemail-to-email/SMS transcription
- GOOG-411
- Garmin voice directions
But the broader applications (like oneSearch by voice, vlingo, VoiceOnTheGo) will need to bite off small pieces of the puzzle (oneSearch and vlingo do this by limiting voice to the input and presenting results visually), or I expect that they'll struggle.
As usual, it's all about the user experience, and one of the biggest challenges is that the visual UIs that are de facto standards are improving so rapidly, especially on a 3G iPhone. We could use GOOG-411 while driving, but the predictive power of Google SMS makes it my preferred choice. We could use oneSearch by voice for local search, but iGoogle maps for the BlackBerry delivers more information and more value.
Bottom line: Despite large investments in voice recognition and text-to-speech applications in the next 18 months, and many new product launches by start-ups and established players, only a small set of narrowly focused services that clear a high user-experience bar and deliver targeted value will thrive.


The applications of voice noted in your previous post are more of a luxury than a need. With the advance of technology in the voice arena, many companies want to come up with cool ideas like getting directions by speaking, email transcription, etc. I consider these luxury applications. One can do without them.
Consider instead an application that calls you to read out a status and take your next set of commands. System monitoring software could send out alerts by voice and accept spoken commands in response. Such applications are essential, not a luxury.
Currently voice is not being used as a feature. By that I mean one can add voice as a feature to an enterprise app, so that it becomes an extra channel of communication. This channel lets applications place a call to someone, speak out the status, and even accept commands that can then drive further logic. This kind of feature is crucial for the mobile workforce. Instead of sending email (which is still one-way communication), one can use voice (bi-directional communication) and get the work done.
Posted by: Kris Subramanian | December 16, 2008 at 07:04 AM