Voice drives new software paradigm

Bob O'Donnell


A great deal has been written recently on the growing importance of voice-driven computing devices, such as Amazon’s Echo, Google Home and others like it. At the same time, there’s been a long-held belief by many in the industry that software innovations will be the key drivers in moving the tech industry forward (“software is eating the world,” as VC Marc Andreessen famously put it more than five years ago).

The combination of these two—software for voice-based computing—would, therefore, seem to be at the very apex of tech industry developments. Indeed, there are many companies now doing cutting-edge work to create new types of software for these very different kinds of computing devices.

The problem is, expectations for this kind of software seem to be quickly surpassing reality. Just this week, in fact, there were several intriguing stories related to a new study which found that usage and retention rates were very low for add-on Alexa “skills” and similar voice-based apps for the Google Assistant platform running inside Google Home.

Essentially, the takeaway from the study was that outside of the core functionality of what was included in the device, very few new add-on apps showed much potential. The implication, of course, is that maybe voice-based computing isn’t such a great opportunity after all.

While it’s easy to see how people could come to that conclusion, I believe it’s based on an incorrect way of looking at the results and thinking about the potential for these devices. The real problem is that people are trying to apply the business model and perspective of writing apps for mobile phones to these new kinds of devices. In this new world of voice-driven computing, that model will not work.

Of course, it’s common for people to apply old rules to new situations; that’s the easy way to do it. Remember, there was a time in the early days of smartphones when people didn’t really grasp the idea of mobile apps, because they were used to the large, monolithic applications that were found on PCs. Running big applications on tiny screens with what, at the time, were very underpowered mobile CPUs, didn’t make much sense.

In a conceptually similar way, we need to realize that smart speakers and other voice-driven computing devices are not just smartphones without a screen—they are very different animals with very different types of software requirements. Not all of these requirements are entirely clear yet—that’s the fun of trying to figure out what a new type of computing paradigm brings with it—but it shouldn’t be surprising to anyone that people aren’t going to proactively seek out software add-ons that don’t offer incredibly obvious value.

Plus, without the benefit of a screen, people can’t be expected to remember a wide range of keywords to “trigger” these applications. Common sense suggests that the total possible number of “skills” that can be added to a device is going to be extremely limited. Finally, and probably most importantly, the whole idea of adding applications to a voice-based personal assistant is a difficult thing for many people to grasp. After all, the whole concept of an intelligent assistant is that you should be able to converse with it and it should understand what you request. The concept of “filling in holes” in its understanding (or even personality!) is going to be a tough one to overcome. People want a voice-based interaction to be natural and to work. Period. The company that can best succeed on that front will have a clear advantage.
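To make that concrete, here is a rough sketch (not drawn from anything in this column) of what an add-on “skill” amounts to under the hood: a small handler that only answers the few intents registered under its invocation name, expressed in plain Python against the general Alexa-style request/response JSON envelope rather than any particular SDK. The skill and intent names are made up for illustration.

# Hypothetical "coffee shop" skill handler; skill and intent names are invented.
# Anything outside this narrow vocabulary falls back to the platform's default reply,
# which is why every extra skill asks the user to memorize another trigger phrase.
def handle_request(event):
    request = event.get("request", {})

    if request.get("type") == "LaunchRequest":
        # Only reached if the user remembers to say "open <invocation name>".
        speech = "Welcome. Ask me for today's special."
    elif (request.get("type") == "IntentRequest"
          and request.get("intent", {}).get("name") == "GetSpecialIntent"):
        speech = "Today's special is a flat white."
    else:
        speech = "Sorry, I can't help with that."

    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "shouldEndSession": True,
        },
    }

The point of the sketch is simply that each skill carves out its own tiny vocabulary; nothing in it makes the assistant as a whole any smarter, which is exactly the “filling in holes” problem described above.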

Despite these concerns, that doesn’t mean the opportunity for voice-based computing devices will be small, but it probably does mean there won’t be a very large “skills” economy. Most of the capability is going to have to be delivered by the core device provider and most of the opportunity for revenue-generating services will likely come from the same company. In other words, owning the platform is going to be much more important for these devices than it was for smartphones, and companies need to think (and plan) accordingly.


That doesn’t mean there isn’t any opportunity for add-ons, however. Key services like music streaming, on-demand economy requests, and voice-based usage or configuration of smart home hardware add-ons, for example, all seem like clearly valuable and viable capabilities that customers will be willing to add to their devices. In each of those cases, though, it’s important to realize that the software isn’t likely to represent a revenue opportunity of its own; it’s simply a means of accessing an existing service or piece of hardware.

New types of computing models take years to really enter the mainstream, and we’re arguably still in the early innings when it comes to voice-driven interfaces. But, it’s important to realize that existing business models and existing means for understanding the role that technologies play don’t always transfer to new environments, and new rules for voice-based computing still need to be developed.

Image credit: Cnet


 
"Voice drives new software paradigm".
Hearing voices also drove Son of Sam, or so I've heard... Oh no! Now I'm hearing them too. ;)
 
I'm the world's most incompetent smartphone typist - and often enough Android's fill-in-the-blanks tricks actually make me worse - I could fill books with all the *****ic things I've managed to write by accident.

Also, I've had good results with voice recognition, so long as the phone doesn't slot me into the wrong language (I switch between German and English). But still: Every time I update my device, I try it out for a while, and then, gradually, stop. In part, I stop because it's difficult to fix the messes when it does make mistakes, but most of all because there is something profoundly weird about dictating stuff to your phone. In a public context, I just get more sidelong-eyeball time than I want to generate ...
 
The big problem with most speech-driven technology is that it isn't direct speech recognition (DSR). That is, the spoken input is sent over a wireless network to a remote server, which does the actual speech-to-text conversion and returns the transcribed result to the originating device.
If the network is efficient, no problem. However, substandard speech transmission to the server is often a huge problem (garbage in, garbage out). With less than stellar equipment and/or conditions (cheap microphones, low-fidelity data links, noisy environments, etc.), poor results are inevitable. Dictating into cell phones is usually a total crap-shoot.
With decent-quality mikes starting at $150 (Sennheiser noise-cancelling headset), the equation starts to make some sense, but this level of hardware isn't available on most smartphones. And while most current smartphones have hardware capable of running decent direct speech recognition (DSR) software onboard, most folks are content to thumb-tap a text.
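To put that cloud-versus-on-device split in concrete terms, here is a minimal sketch using the Python SpeechRecognition package (the audio file name is just a placeholder, and the local path assumes pocketsphinx is installed). The Google recognizer ships the clip over the network to a remote service, so link and audio quality matter; the Sphinx recognizer transcribes entirely on the device.

# Sketch only: compares a network-based recognizer with an on-device one.
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.AudioFile("dictation_sample.wav") as source:  # placeholder file name
    audio = recognizer.record(source)  # read the whole clip into memory

# Network path: low-fidelity audio or a flaky link means garbage in, garbage out.
try:
    print("Cloud result:", recognizer.recognize_google(audio))
except sr.RequestError:
    print("Network or API problem reaching the remote service.")
except sr.UnknownValueError:
    print("The remote service couldn't make sense of the recording.")

# Local path: no round trip, so results depend only on the mic and the local model.
try:
    print("On-device result:", recognizer.recognize_sphinx(audio))
except sr.UnknownValueError:
    print("The local engine couldn't make sense of the recording.")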

Edited: 5.25.19 to change "discrete" DSR to "direct" (DSR) as per current IT industry standards and correct 1 misspelling.
Cheers!
Wizwill
 