In context: Getting machines to understand natural language interactions is a lot harder than it first appeared. Many of us learned this to some degree in the early days of voice assistants when what seemed like very reasonable information requests often ended up being answered with frustratingly nonsensical responses. It turns out human beings are much better at understanding the subtle nuances (or very obvious differences) between what someone meant versus what they actually said.
Ever since Amazon introduced Alexa via its Echo smart speakers, I've longed for the day when I could just talk to devices and have them do what I wanted them to. Unfortunately, we're not there just yet, but we're getting somewhat closer.
One of the apparent issues in understanding natural language is that the structure and syntax of spoken language, which we all grasp intuitively, often need to be broken down into many different sub-components before they can be "understood" by machines.
That means the evolution of machine intelligence has been slower than many hoped because of the need to figure out the incremental steps necessary to really make sense of a given request. Even today, some of the most sophisticated natural language AI models run into walls when asked to do the sort of simple reasoning and independent thinking that a young child can manage.
On top of this, when it comes to smart home-focused devices, which is where voice assistant-powered machines continue to make their mark, there has been a frustrating wealth of incompatible standards that has made it challenging in practice to get devices to work together.
Thankfully, the new Matter standard---which Amazon, Apple, Google and many others are planning to support---goes a long way towards solving this challenge. As a result, the very real problem of getting multiple devices from different vendors or even different smart home ecosystems to seamlessly work together may soon be little more than a distant memory.
With all this context in mind, the many different developer-focused announcements that Amazon made at Alexa Live 2022 make even more sense. The company debuted the Connect Kit SDK for Matter, which extends a range of Amazon connection services to any Matter-capable device that supports the SDK. This means that companies building smart home devices can leverage the work Amazon has done for critical features like cloud connectivity, OTA software updates, activity logging, metrics and more. The goal is to establish a baseline of functionality that will encourage users to purchase and install multiple Matter-capable smart home products.
Of course, once devices are connected, they still need to communicate with each other in intelligent ways to provide more functionality. To address this, Amazon also unveiled the Alexa Ambient Home Dev Kit, which combines services and software APIs that allow multiple devices to work together easily and silently in the background.
Amazon and others call this "ambient computing", because it's meant to provide a mesh of essentially invisible computing services. The first version of this dev kit includes Home State APIs to do things like simultaneously put all your smart home devices into a given mode (such as Sleep, Dinner Time, Home, etc.). Safety and Security APIs automatically send alarms from connected sensors, such as smoke alarms, to other connected devices and applications to ensure the alarms are noticed or heard. The API for Credentials makes user setup across multiple devices easier by sharing Thread network credentials (a key part of the Matter standard), so that users don't have to enter them more than once.
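To make the idea of state changes and alarms fanning out across devices a bit more concrete, here is a minimal Python sketch. It does not use Amazon's actual APIs: the `Device` and `HomeHub` classes, their methods, and the device names are all hypothetical and exist only to illustrate the pattern described above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Device:
    """A hypothetical smart home device (not an Alexa or Matter type)."""
    name: str
    mode: str = "Home"
    alerts: List[str] = field(default_factory=list)

    def set_mode(self, mode: str) -> None:
        self.mode = mode

    def notify(self, message: str) -> None:
        self.alerts.append(message)


class HomeHub:
    """Toy hub that fans out state changes and alarms to registered devices."""

    def __init__(self) -> None:
        self.devices: List[Device] = []

    def register(self, device: Device) -> None:
        self.devices.append(device)

    def set_home_state(self, mode: str) -> None:
        # One request puts every registered device into the same mode.
        for device in self.devices:
            device.set_mode(mode)

    def raise_alarm(self, source: str, message: str) -> None:
        # Forward the alarm to every device except the one that raised it.
        for device in self.devices:
            if device.name != source:
                device.notify(f"{source}: {message}")


hub = HomeHub()
for name in ("smoke_alarm", "bedroom_speaker", "hall_light"):
    hub.register(Device(name))

hub.set_home_state("Sleep")
hub.raise_alarm("smoke_alarm", "smoke detected")
```

In this sketch, a single call changes the mode on every device, and an alarm raised by one sensor is relayed to all the others, which is the general behavior the announcement describes. How the real dev kit exposes it may look quite different.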
Speaking of easier setup, Amazon also announced plans to let its "Frustration-Free Setup" features be used by non-Amazon devices purchased in other retail stores. The company plans to leverage the Matter standard to enable this, emphasizing once again how important Matter is going to be for future devices.
For those working with voice interfaces, Amazon is working to enable some of the first real capabilities for an industry development called the Voice Interoperability Initiative, or VII.
First announced in 2019, VII is designed to let multiple voice assistants work together in a seamless way to provide more complex interactions. Amazon said it is working with Skullcandy and Native Voice to allow use of Alexa along with the "Hey Skullcandy" assistants and commands at the same time. For example, you can use "Hey Skullcandy" to enable voice-based control of headphone settings and media playback, but also ask Alexa for the latest news headlines and have them play back over the Skullcandy headphones.
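As a rough illustration of how two assistants might share one device, the Python sketch below routes a transcribed utterance to a handler based on its wake word. This is an assumption-laden toy, not how VII, Alexa, or the Skullcandy assistant is actually implemented: the handler functions, the `WAKE_WORDS` table, and the `route` function are hypothetical names invented for this example.

```python
from typing import Callable, Dict, Optional

Handler = Callable[[str], str]


def headphone_assistant(command: str) -> str:
    # Hypothetical stand-in for on-device headphone and media controls.
    return f"[headphones] {command}"


def alexa_assistant(command: str) -> str:
    # Hypothetical stand-in for a cloud assistant request.
    return f"[alexa] {command}"


# Each wake word maps to the assistant that should handle the request.
WAKE_WORDS: Dict[str, Handler] = {
    "hey skullcandy": headphone_assistant,
    "alexa": alexa_assistant,
}


def route(utterance: str) -> Optional[str]:
    """Send the utterance to the assistant whose wake word it starts with."""
    text = utterance.strip().lower()
    for wake_word, handler in WAKE_WORDS.items():
        if text.startswith(wake_word):
            command = text[len(wake_word):].lstrip(" ,")
            return handler(command)
    return None  # No known wake word, so no assistant responds.
```

Calling `route("Hey Skullcandy, volume up")` hands the command to the headphone handler, while `route("Alexa, play the news headlines")` goes to the Alexa handler, and an utterance with no wake word is ignored. A real system would detect wake words in audio rather than in text.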
Version 3.0 of the Alexa Voice Service (AVS) SDK also debuted, combining Alexa capabilities with the previously separate Alexa Smart Screen SDK for generating smart screen-based responses. Using it, companies could potentially pair a voice-based interface with visual confirmations on screen or create multi-modal interfaces that leverage both at the same time.
Finally, Amazon also unveiled a host of new Skills, Skill Development, Skill Promotion, and Skill education tools designed to help developers who want to create Skill "apps" for the Alexa ecosystem across a wide range of different devices, including TVs, PCs, tablets, smart displays, cars, and more. All told, it looks to be a comprehensive range of capabilities that should make a tangible difference for those who want to leverage the installed base of roughly 300 million Alexa-capable devices.
Unfortunately, browsing through multi-level screen-based menus, pushing numerous combinations of buttons, and trying to figure out the mindset of the engineers who designed the user interfaces is still the reality of many gadgets today. I, for one, look forward to the ability to do something like plug a new device in, tell it to connect to my other devices, have it speak to me through some connected speaker to tell me that it did so (or if it didn't, what needs to be done to fix that), answer questions about what it can and can't do and how I can control it, and finally, keep me up-to-date verbally about any problems that may arise or new capabilities it acquires.
As these new tools and capabilities start to get deployed, the potential for significantly easier, voice-based control of a multitude of digital devices is getting tantalizingly closer.
Bob O'Donnell is the founder and chief analyst of TECHnalysis Research, LLC, a technology consulting firm that provides strategic consulting and market research services to the technology industry and professional financial community. You can follow him on Twitter @bobodtech.