This story was written by Keith Dawson for UBM DeusM’s community Web site Develop in the Cloud, sponsored by AT&T. It is archived here for informational purposes only because the Develop in the Cloud site is no more. This material is Copyright 2012 by UBM DeusM.

Speech Recognition on Mobile

It's available to developers now, but will need middleware to spread widely.

Developers' use of speech recognition on mobile devices is poised to take off. SDKs are here and middleware is coming.

Android has offered speech recognition (SR) in its SDK for some time, but SR got a big boost in visibility when Siri debuted on Apple's iPhone 4S in the fall of 2011. Siri relies on cloud resources to do the heavy lifting of SR, so "she" doesn't operate without a network connection -- the faster the better.

Siri is tightly integrated with iOS and with various Apple-supplied apps, such as telephony, address book, calendar, and maps. At the time the iPhone 4S was introduced, developers wondered when an SDK would give them access to the functionality Siri offers. They are still waiting. In the meantime, Siri has become available on the iPad, a successor phone (the iPhone 5) has been released, and iOS has been updated (to version 6).

Meanwhile, Android was not standing still. A friend of mine bought her first Android phone in November 2011, a month after the iPhone 4S and Siri came out, and she reports that she immediately began testing the first of what eventually became "dozens and dozens" of apps purporting to offer Siri-like functionality. (Here's one of them.) They all used the SR functions in the Android SDK and, like Siri on iOS, relied on the cloud (Google's in this case) to do the hard work of figuring out what human speech means.
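
For a sense of how little code the Android side requires, here is a minimal sketch that hands recognition off to the platform's built-in RecognizerIntent. (The activity class name, request code, and prompt string are my own inventions; this is an illustration, not code from any of those apps.) On most devices the captured audio is shipped to Google's servers for transcription, which is why these apps, like Siri, need a network connection.

    import android.app.Activity;
    import android.content.Intent;
    import android.speech.RecognizerIntent;
    import java.util.ArrayList;

    public class SpeechDemoActivity extends Activity {
        private static final int SPEECH_REQUEST = 42;  // arbitrary request code

        // Call this from, say, a button's click handler.
        private void promptForSpeech() {
            Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
            intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                    RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
            intent.putExtra(RecognizerIntent.EXTRA_PROMPT, "Say something");
            // The platform supplies the recognition UI and, on most devices,
            // sends the audio to Google's cloud for transcription.
            startActivityForResult(intent, SPEECH_REQUEST);
        }

        @Override
        protected void onActivityResult(int requestCode, int resultCode, Intent data) {
            super.onActivityResult(requestCode, resultCode, data);
            if (requestCode == SPEECH_REQUEST && resultCode == RESULT_OK) {
                // Candidate transcriptions come back ranked, best match first.
                ArrayList<String> matches =
                        data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);
                if (matches != null && !matches.isEmpty()) {
                    String heard = matches.get(0);
                    // ...hand "heard" to the rest of the app.
                }
            }
        }
    }

The recognizer returns a ranked list of candidate transcriptions; what an app does with the best match is where all the Siri-imitating cleverness lives.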

SR SDKs come to iOS
When developer-accessible SR SDKs appeared on iOS, it was not Apple that provided them. Nuance, the powerhouse behind Dragon NaturallySpeaking and other SR products, announced the availability of Nina last August. Nina is a speech-recognition SDK that, like Apple's and Google's, relies on a cloud connection. Nuance will only sell this product to "large enterprise organizations implementing mobile customer-service apps," and says explicitly that Nina is "not available for general consumer app development."

If you are working on lowly consumer-focused apps, Nuance sends you to the Dragon Mobile SDK instead. There is a free tier, as well as a couple of enterprise ones, but I could not determine from my reading whether you can actually deploy an app using Dragon speech recognition without paying Nuance a little something. Nor could I find enough detail on the SDK to learn whether it, too, is cloud-dependent.

Going wide
Over at Business2community.com, John Moore has a roundup of recent developments in mobile speech. Moore quotes a Nuance spokesman as saying that more than 13,000 developers have used the Nina SDK (that's a lot of mobile customer-service apps).

Moore's article goes on to speculate about what it will take for mobile SR to become more widespread. He quotes Chris Silva, mobile analyst at Altimeter Group, voicing the opinion that libraries and middleware will have to emerge to help developers by encapsulating speech recognition's complexity. Silva draws an analogy to the introduction of mobile push-notification technology, which only took off after Urban Airship developed a simplifying API for push on iOS, Android, and BlackBerry. "It took Airship to provide a set of libraries and tools to harness the new interaction methods and the same thing will happen with voice," said Silva, as quoted by Moore.
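
To make Silva's point concrete, the middleware he's describing might look something like the sketch below: one facade an app codes against, with implementations wrapping Google's, Nuance's, or some future Apple recognizer underneath. This is purely hypothetical; the interface and names are mine, not anything Urban Airship or Nuance actually ships.

    // Hypothetical middleware facade for cloud speech recognition.
    public interface SpeechRecognizer {
        interface Listener {
            void onResult(String bestTranscription);  // top-ranked candidate
            void onError(Exception cause);
        }

        // Begin streaming microphone audio to whichever cloud
        // back end this implementation wraps.
        void startListening(Listener listener);

        // Stop capturing and discard any partial results.
        void stopListening();
    }

An app written against an interface like this wouldn't know or care whose cloud answers, which is exactly the kind of simplification that let push notifications spread.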

Have you experimented with mobile speech recognition? Please share your experiences below.