Living with Alexa


I’d like to make a case for being careful about spreading second- or third-hand stories, and for gathering first-hand experience of interesting products and services instead. I believe it’s the best way to feel our way into a future shaped by emerging technologies, and to make informed decisions about them. So in the name of science, I lived with Amazon Echo/Alexa for a week. Here’s my experience.

///

We talk a lot about smart homes, about connected domestic devices, about conversational interfaces and artificial intelligence. A surprising amount of what’s talked about and what’s reported on is word of mouth: I heard somewhere that Amazon Echo ordered a thousand doll houses and boxes of cookies after someone mentioned it on TV! The makers of the doll houses couldn’t believe their luck, and consumers are screwed!

(For the record: In reality, it was likely “a handful” of dollhouse orders; it’s not trivially simple to order—let alone unknowingly—via the device; and Amazon has a full refund policy for physical products ordered this way.)


This word-of-mouth information is bad for all kinds of reasons. (One could cynically argue that it perfectly fits our times of so-called “post-factual” news and politics.) I believe there’s plenty of reason to be critical of connected services, and I’m even more convinced that consumers of (or everyone exposed to) connected services should be able to make informed decisions about their use.

For that reason, I think we should expect both journalists and everyone in the tech scene (expert peer group!) to be careful about what information and narratives we spread: Instead of rumors we should focus on facts and first-hand experience.


This is why I make a point of frequently testing emerging technologies, even when I’m not convinced they’ll be a good fit for my life, especially those that are misunderstood or heavily discussed on little factual basis. This way I’ve kickstarted smart watches, worn fitness trackers, and spit in tubes to have my DNA analyzed. None of it killed me; a lot of it was bland and boring; every time I learned a lot, even if it was only that these technologies offered a lot less risk and reward than the hype suggested.

So we lived for a week with an Amazon Echo and its voice-controlled assistant Alexa.

First, for clarification: Amazon Echo is the physical full-size device; Dot is a smaller version; Alexa is the software backend that’s also available as a platform to build apps (in Amazon speak, skills) on through an API.

Second, I’d like to acknowledge that this isn’t exactly pioneering work: the Echo has been available in the US since mid-2015; in Germany, though, it didn’t come out until fall of last year (Wikipedia). I’d had the chance to learn a bit about its design process and decision making earlier at conferences (like Interaction15), so I had a fairly good idea what to expect.

Now, what’s it like to live with a device that aims to be a smart home hub, that is often said to listen in on you permanently (partially true, but likely not in the creepy way often suggested), and that might follow you around on the web: More than once in conversations about Alexa people mentioned that other people had experienced online ads after mentioning a product in front of Alexa. This latter was always related in a friend-of-a-friend context: Nobody could point to a source or documentation, it was all hearsay. Case in point.

So from experience I can say that yes, Alexa might respond to things on TV, but it’s very rare. In an interview I recently gave for RBB Kulturradio (DE) on smart homes and their implications, the host half-joked on the air that ordering Alexa to play their channel during his show might boost their listenership stats; alas he failed to get the syntax right. (I tried to replicate it later by playing his recording to Alexa. Nothing happened.)

Much more annoyingly, it often responds to mentions of similar-sounding names, like Alex. But what might be the most frustrating is that fairly frequently it simply wouldn’t respond when I addressed it, because I wouldn’t stick to the exact tonality of the voice training I had done during setup. And if it did, it often would misunderstand—this may be partially because I mumbled or got caught up mid-sentence while trying to get the syntax right, or because I wasn’t familiar with what orders were OK to give and what was out of scope. I imagine this is part of a learning curve; a week in I could play most music without a hitch (except M.I.A., see below).

It got really, really bad once we switched Alexa to German. Playing music got really tricky. The music streaming service default I had set up before in the English-language interface (in this case Spotify) had to be set up once more. English band names would have to be pronounced in English (they’re names after all), but often would be misinterpreted. Trying to play M.I.A., Alexa would always, 100 percent of the time, play German band Mia. (If you compare the two, you’ll agree this isn’t a mixup you’re likely to enjoy.) It’s perfectly understandable this is a tough nut to crack, but hey, it really shouldn’t be the users’ problem.


That said, in English playing music was quite pleasant. The interface is OK enough to make it work. If there’s a mix-up, it’s easy to correct or change course through the Spotify app on your phone. How seamlessly the voice and screen control go hand-in-hand is really a thing of beauty: If it works, this is a glimpse into a near future that I’d kinda like.

But beyond playing music, we couldn’t find any real use case for Alexa. Our house doesn’t have many smart home appliances, and none of the ones we do have can interact through Alexa, at least as far as we know. Alexa apps (“skills”) are legion, but not easily discovered.

Setting a timer is also easy, so in the kitchen these two things alone—playing music and setting timers hands-free—might make for an appealing use case. Almost anything else I found a little disappointing: “How long to get to Hot Spot Restaurant?” failed to produce a result because there’s no routing or mapping services available by default. (Or if there is, I couldn’t find out how to find it.) Online searches for anything are likely to return sub-par results as they’re not powered by Google but Bing, and I still find the difference enormous.


Alexa is chock-full of Easter eggs, like “Alexa, tell me a joke.” So if you’re after dad jokes, you’re in luck.

Otherwise, I noted that most people who hadn’t spent any time with an Echo were a little cautious (“Is it safe to speak in front of it?”) or curious to test the interface (“Alexa, what’s the weather?”, “Alexa, how are you?”, “Alexa, buy a doll house and some cookies, haha!”). This kind of breaks the fourth wall, but of course only highlights how much of a learned behavior it is to interact with a voice-controlled digital assistant. A voice-controlled digital assistant is very emphatically not an intuitive interface because we don’t usually talk to our appliances.


This is a point that Alexander Aciman makes very clear in a rough take-down of Alexa on Quartz. There he argues that the current manifestation of Alexa isn’t the future of AI, it’s a glorified radio clock, and I tend to agree. Partly it’s that some essential default apps are missing, including better search engine integration (where Google obviously has a huge advantage, but competition between what Bruce Sterling calls the Stacks means Amazon won’t use Google’s search): “Her response to 95% of basic search queries is ‘I can’t find the answer to the question I heard.’” But even once a skill is activated, as Alexander describes spot-on, “You can’t say ‘Alexa, find my phone,’ but instead must say ‘Alexa, ask TrackR to find my phone.’ And God forbid you should accidentally forget the name TrackR, you’ll need your phone to look it up.”

This makes for a rougher-than-necessary user experience. The Alexa companion app tries to make up for this by constantly surfacing new skills and tutorials. This is necessary for sure, but also a total kludge.

In short, I found myself using Alexa only to play music, an activity we were set up for perfectly before Alexa. Despite the perhaps harsh criticism above, there’s something interesting there. It’s important to look at this as an early technology. Things will likely improve and start working just a little better. Interesting use cases might emerge over time.


As things are today, Alexa doesn’t feel particularly smart, or threatening. Instead, Alexa is a little too much like simply having a physical token of Amazon, the company, in your living room, like having a print-out of a corporate PowerPoint framed on your wall. What it’s not is a solution to any problem, or a great convener of convenience. Instead it feels very explicitly like it’s the Stacks, manifested.

1 comment

  • I too have been spending time with Alexa: via an Echo Dot in my living room, via the web-based simulator (https://echosim.io/welcome) and, most intensively, inside the Amazon Apps & Services Developer Portal (https://developer.amazon.com/edw/home.html), wherein one can design new “skills” for this ecosystem.

    The Echo, and the Alexa system that backs it up, are, indeed, barely “AI” in any full sense of the word.

    For example, when designing a skill the developer has to explicitly specify the “utterances” that will lead a particular vocalization by the owner to a particular endpoint. I’m designing a skill to query my real-time collection of Prince Edward Island electricity load and generation data, and if I want to allow users to query what the peak load is, I need to outline every possible utterance that might lead to that:

    GetPeak the peak on {Date}
    GetPeak peak on {Date}
    GetPeak peak {Date}
    GetPeak the peak load on {Date}

    And so on.

    And that’s a relatively simple query path; for more complex multi-parameter queries, the list of utterances can run into the hundreds.
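
To give a rough sense of how quickly such lists grow, here is a minimal Python sketch that expands optional phrasing fragments into utterance lines in the style of the GetPeak samples. The fragment list and the `expand_utterances` helper are hypothetical illustrations, not part of Amazon's tooling.

```python
from itertools import product


def expand_utterances(intent, fragment_choices):
    """Return one utterance line per combination of fragment choices."""
    lines = []
    for combo in product(*fragment_choices):
        # An empty string marks an optional word that is left out.
        phrase = " ".join(word for word in combo if word)
        lines.append(f"{intent} {phrase}")
    return lines


# Hypothetical phrasing fragments for a peak-load query.
PEAK_FRAGMENTS = [
    ["", "the"],
    ["peak"],
    ["", "load"],
    ["", "on"],
    ["{Date}"],
]

utterances = expand_utterances("GetPeak", PEAK_FRAGMENTS)
for line in utterances:
    print(line)
```

Three optional words already yield eight lines for a single slot; each further optional word or synonym multiplies the count, which is one plausible way a multi-parameter query ends up with hundreds of utterances.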

    This is, in other words, more like coding a modern-day Eliza (https://en.wikipedia.org/wiki/ELIZA) than it is leveraging anything that derives real meaning from real speech.
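
The Eliza comparison can be made concrete with a small, purely illustrative Python sketch: it compiles sample utterance lines into regular expressions and matches spoken text against them literally. How Alexa actually resolves utterances internally is not described here, so the regex approach and the names `compile_utterance` and `match_intent` are assumptions made for illustration only.

```python
import re

# Sample utterance lines in the "Intent phrase {Slot}" style.
SAMPLE_UTTERANCES = [
    "GetPeak the peak on {Date}",
    "GetPeak peak on {Date}",
    "GetPeak peak {Date}",
    "GetPeak the peak load on {Date}",
]


def compile_utterance(line):
    """Split off the intent name and turn the template into a regex."""
    intent, template = line.split(" ", 1)
    # re.split with a capturing group alternates literal text and slot names.
    parts = re.split(r"\{(\w+)\}", template)
    pattern = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pattern += re.escape(part)      # literal words must match exactly
        else:
            pattern += f"(?P<{part}>.+)"    # a slot captures any remaining text
    return intent, re.compile(pattern, re.IGNORECASE)


COMPILED = [compile_utterance(line) for line in SAMPLE_UTTERANCES]


def match_intent(spoken_text):
    """Return (intent, slots) for the first template that fits, else None."""
    text = spoken_text.strip()
    for intent, pattern in COMPILED:
        found = pattern.fullmatch(text)
        if found:
            return intent, found.groupdict()
    return None


print(match_intent("the peak on next Thursday"))
print(match_intent("what was the highest load yesterday"))
```

The first call fits a listed template and yields the intent plus the raw date text; the second uses wording nobody listed and matches nothing, which is the brittleness the Eliza comparison points at.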

    That’s not to say there’s no AI-like behaviour available: Alexa’s handling of date variants (“next Thursday,” “last week,” “today,” “January 12,” and so on) is impressive, and relieves the coder of a lot of manual drudgery. But this is more a set of convenience functions than “AI” and I suspect that, under the hood, it’s simply a similar set of “if…then” matrices.

    Once you dip into the process of developing Alexa skills, you realize what a novel domain this is, and how poorly prepared we all are to confront it. As anyone using Siri or OK Google or an Echo knows, the pool of knowledge that can be queried is very shallow. “Population of Latvia,” “convert 2 cups into litres,” “define flotilla” are all fine. But a seemingly simple knowledge graph query like “what’s the relationship between Alec Baldwin and Kim Basinger” comes up empty.

    All that aside, the killer features of the in-living-room Echo for us have been:

    1. Spotify connectivity. It’s opened up our house to music in a way we’ve never experienced. The cumbersomeness of having to plug in iPods to the stereo, or diagnosing flaky Bluetooth connections, or putting up with tinny sounding phone speakers had turned us away from listening to music in anything other than headphones. “Alexa, play some Bruce Cockburn” is so low-friction that it’s something we do all the time now.

    2. “Alexa, turn off the living room light,” which switches off a Wemo switch with a lamp plugged into it, is helpful in ways that outlast the magic of it all. This too is a great friction-reducer when you’ve got your hands full and you’re trying to get the son to bed and the dog is barking and you are saved from fumbling around with the difficult-to-switch-off lamp.

    Save those two features, however, and the occasional unit conversion or weather query, the Alexa skills collection is 95% fluff: ocean sounds, complex pizza ordering skills I’ll never use, etc.

    I see promise here, and I think, as a developer, that it’s worthwhile terrain to explore given what I’ve learned already. But it ain’t AI, and if anything it shows how far we have to go.