Opinion: The evolution of smart speakers

Bob O'Donnell


For a relatively nascent product category, smart speakers like Amazon Echo and Google Home are already seeing a huge influx of attention from both consumers and potential competitors eager to enter the market. Apple has announced the HomePod and numerous other vendors have either unveiled or are heavily rumored to be working on versions of their own.

Harman Kardon (in conjunction with Microsoft), GE Lighting and Lenovo have announced products in the US, while Alibaba, Xiaomi and JD.com, among others, have said they will be bringing products out in China. In addition, Facebook is rumored to be building a screen-equipped smart speaker called Gizmo.

One obvious question after hearing about all the new entrants is, how can they all survive? The short answer, of course, is they won’t. Nevertheless, expect to see a lot of jockeying, marketing and positioning over the next year or two because it’s still very early days in the world of AI-powered and personal assistant-driven smart speakers.

Yes, Amazon has built an impressive and commanding presence with the Echo line, but there are many limitations to Echos and all current smart speakers that frustrate existing users. Thankfully, technology improvements are coming that will enable competitors to differentiate themselves from others in ways which reduce the frustration and increase the satisfaction that consumers have with smart speakers.


Part of the work involves the overall architecture of the devices and how they interact with cloud-based services. For example, one of the critical capabilities that many users want is the ability to accurately recognize the different individuals who speak to the device, so that responses can be customized for different members of a household. To achieve this as quickly and accurately as possible, it doesn’t make sense to send the audio signal to the cloud and then wait for the response. Even with superfast network connections, the inevitable delays make interactions with the device feel somewhat awkward.

The same problem exists when you try to move beyond the simple single-query requests that most people are making to their smart speakers today (“Alexa, play music by horn bands” or “Alexa, what is the capital of Iceland?”). In order to have naturally flowing, multi-question or multi-statement conversations, the delays (or latency) have to be dramatically reduced.

The obvious answer to the problem is to do more of the recognition and response work locally on the device and not rely on a cloud-based network connection to do so. In fact, this is a great example of the larger trend of edge computing, where we are seeing devices or applications that used to rely solely on big data centers in the cloud start to do more of the computational work on their own.
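To make the latency argument concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `Route` class, the `conversation_latency_ms` helper, and especially the millisecond figures, which are invented and are not measurements of any real product or network. It simply adds up the waiting time per spoken turn for a cloud round trip versus on-device processing.

```python
from dataclasses import dataclass


@dataclass
class Route:
    """Hypothetical description of where speech recognition runs."""
    name: str
    network_ms: float  # round trip to the recognizer (0 for on-device)
    compute_ms: float  # time spent recognizing one utterance

    def latency_ms(self) -> float:
        # Delay the user waits through for a single spoken turn.
        return self.network_ms + self.compute_ms


def conversation_latency_ms(route: Route, turns: int) -> float:
    """Total waiting time accumulated over a multi-turn exchange."""
    return route.latency_ms() * turns


# Illustrative, made-up numbers; not measurements of any device.
cloud = Route("cloud", network_ms=150.0, compute_ms=50.0)
local = Route("on-device", network_ms=0.0, compute_ms=120.0)

for route in (cloud, local):
    print(route.name, conversation_latency_ms(route, turns=5))
```

Under these assumed figures, the on-device route comes out ahead even though its recognizer is slower, because it pays no network round trip on any turn. With different numbers the comparison could go the other way.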

That’s part of the reason you’re starting to see companies like Qualcomm and Intel, among others, develop chips that are designed to enable more powerful local computing work on devices like smart speakers. The ability to learn and then recognize different individuals, for example, is something that the DSP (digital signal processor) component of new chips from these vendors can do.

Another technological challenge facing current-generation products is recognition accuracy. Everyone who has used a smart speaker or a digital assistant on another device has had the experience of not being understood. Sometimes that’s due to how the question or command is phrased, but it’s often due to background noises, accents, intonation or other factors that essentially end up providing an imperfect audio signal to the cloud-based recognition engine. Again, more local audio signal processing can often improve the audio signal to be sent, thereby enhancing overall recognition.

Going further, most of the AI-based learning algorithms used to recognize and accurately respond to speech will likely need to be run in very large, compute-intensive cloud data centers. However, the idea of being able to start doing pattern recognition of common phrases (a form of inferencing, the second key aspect of machine learning and AI) locally with the right kind of computing engines and hardware architectures is becoming increasingly possible. It may be a long time before all of that work can be done within smart speakers and other edge devices, but even doing some speech recognition on the device should enable higher accuracy and longer conversations. In short, a much better user experience.

As new entrants try to differentiate their products in an increasingly crowded space, the ability to offer some key tech-based improvements is going to be essential. Clearly there’s a great deal of momentum behind the smart speaker phenomenon, but it’s going to take these kinds of performance improvements to move them beyond idle curiosities and into truly useful, everyday tools.

Bob O’Donnell is the founder and chief analyst of TECHnalysis Research, LLC, a technology consulting and market research firm. You can follow him on Twitter. This article was originally published on Tech.pinions.


 
What I think is, companies should concentrate on making good speakers, without the gimmicks. You know, like the cast frame 10 pound alnico magnet equipped technological masterpieces JBL produced, once upon a time.

Or at least, companies should start putting their name on the cabinets in Chinese characters, so you'd know what kind of crap you're getting.

But, I'd settle for, (and own), speakers I can hear, but which can't hear me. Or, to put a colloquial twist on that, "there's an elephant in the room, and her name is Alexa".

Incidentally, James Patterson just put out a new novel entitled "The Store". It centers around a mega eTailer that follows you around, knows where you are, who you're talking to, what it's going to try and make you buy next, and so forth. I think he should have titled it "Alexa the Amazon" (*), but he most likely would have gotten sued for that.

Ah, the irony of being sued for telling the truth, seems to be an integral part of modern life...:eek:


The only thing I'm getting from this discussion is that a complete surrender of personal privacy is tantamount to being revered as a 'progressive'. And... that trying to maintain that privacy is tantamount to inviting yourself to be branded a 'Luddite'.


(*) Speaking of which, I wonder if Brazil now has to pay royalties to Jeff Bezos, to be allowed to keep the name "Amazon", for their big, big river.:D
 
Well, I was impressed with mine from Amazon until recently, when it arbitrarily started announcing the Amazon "great deals I just had to have". It was a decent backup alarm clock and I enjoyed hearing the present temperature, but if it keeps yacking away they are going to have to change their "specials" to induce the workers at the landfill, cause it ain't staying here! I had an ex-wife that did that and she's looooong gone now!
 
A diplomatic, yet long winded way of saying "they're garbage".
 
Aw shucks, that's so nice of you to say that. "A diplomat", is that when you're as long winded as your own flatulence? :D
Anyway I agree with you but I don't like them as much as you do... They're garbage. PERIOD. I don't see any purpose, let alone value in any of them unless getting rid of the last vestiges of whatever little privacy you still enjoy doesn't bother you. I see them as nothing more than spying intruders thinly disguised as something trying to be useful.
 
Well, it's the boss's thread, and I was trying to keep my response a bit muted...

In any event none of those plastic junks has earned the right to be called, "a loudspeaker".

That's a loudspeaker:
[Image]

JBL D-130 extended range 15" speaker, SPL = 105 dB at 1 watt, 1 meter

All that, and it's stone deaf, can't hear a word you say..

(Get too close for too long, it'll make you deaf too :eek: ).
 
Now you're talkin'... err... blasting I mean. We grew up on a diet of these things with no ill effects (hopefully), but these days there's SPL (sound pressure levels), frequency response, dB, response bandwidth, simulated 7.1 etc. and all sorts of other intelligent 'sounding' but uninteresting medical/technical gobbledygook they hit us with just for marketing purposes. Pure, powerful, unadulterated sound is all we really want after all, and the speakers don't have to have an Einstein IQ either.
 
[Image]

Above: The Grateful Dead's stage sound system, peppered with D-130s

Below: Screw it, read what you will into it :):eek::
[Image]
 