Amazon is racing to transplant Alexa’s “brain” with generative AI

Amazon is preparing to relaunch its voice-powered digital assistant Alexa as an artificial intelligence “agent” that can complete practical tasks, as the tech group races to solve challenges that have dogged the AI overhaul of the system.
The $2.4 trillion company has spent the past two years looking to redesign Alexa, its conversational system built into 500 million consumer devices worldwide, so that the software "brain" is transplanted with generative AI.
Rohit Prasad, who leads the artificial general intelligence (AGI) team at Amazon, told the Financial Times that the voice assistant still needed to overcome several technical hurdles before rollout.
These include solving the problem of "hallucinations", or fabricated responses; response speed, or "latency"; and reliability. "Hallucinations should be close to zero," Prasad said. "It's still an open issue in the industry, but we're working very hard."
The vision of Amazon leaders is to transform Alexa, which is currently used for a narrow set of simple tasks such as playing music and setting alarms, into an “agent” product that acts as a personalized concierge. This could include anything from suggesting restaurants to configuring the lights in the room based on a person’s sleep cycles.
The Alexa redesign has been in train since the launch of Microsoft-backed OpenAI's ChatGPT in late 2022. While Microsoft, Google, Meta and others have rapidly integrated generative AI into their computing platforms and strengthened their software services, critics have questioned whether Amazon can resolve its technical and organizational struggles in time to compete with its rivals.
According to several staffers who have worked on Amazon's voice assistant teams in recent years, the effort has been beset by complications and follows years of AI research and development.
Several senior workers said the long wait for a rollout was largely due to the unexpected difficulty of adapting and combining the simpler, predefined algorithms Alexa was built on with more powerful but unpredictable large language models.
In response, Amazon said it was "working hard to enable even more proactive and capable assistance" from its voice assistant. It added that a technical implementation of this scale, in a live service and a suite of devices used by customers around the world, was unprecedented, and not as simple as superimposing an LLM on the Alexa service.
Prasad, Alexa's former chief architect, said last month's release of the company's Amazon Nova models, developed by its AGI team, was partly motivated by specific needs for speed, cost and reliability, to help AI applications like Alexa "get to the last mile, which is really hard".
To operate as an agent, Alexa’s “brain” must be able to call hundreds of third-party software and services, Prasad said.
"Sometimes we underestimate how many services are integrated into Alexa, and it's a massive number. These applications receive billions of requests per week, so when you're trying to make reliable actions happen at speed . . . you have to be able to do it in a very cost-effective way," he added.
The complexity comes from Alexa users expecting fast responses and extremely high levels of accuracy. Such qualities stand in contrast to the inherent probabilistic nature of today’s generative AI, statistical software that predicts words based on speech and language patterns.
Some senior staff also pointed to struggles to preserve the assistant's original attributes, including its consistency and functionality, while imbuing it with new generative qualities such as creativity and fluid dialogue.
Because of the more personalized and chatty nature of LLMs, the company also plans to hire experts to train the AI’s personality, voice and diction so that it remains familiar to Alexa users, according to one of the people familiar with the matter.
A former senior member of the Alexa team said that while LLMs were very sophisticated, they came with risks, such as producing answers that are “completely made up sometimes”.
"At the scale that Amazon operates, that could happen a large number of times per day," they said, damaging its brand and reputation.
In June, Mihail Eric, a former machine learning scientist at Alexa and a founding member of its "conversation modeling team", said publicly that Amazon had "dropped the ball" on becoming "the unequivocal market leader in conversational AI" with Alexa.
Eric said that despite having strong scientific talent and “enormous” financial resources, the company had been “riddled with technicalities and bureaucracy”, suggesting that “the data were poorly annotated” and “the documentation was either non-existent or stale”.
According to two former employees working on Alexa-related AI, the historical technology underpinning the voice assistant had been inflexible and difficult to change quickly, weighed down by a clumsy and disorganized code base and an engineering team "spread too thin".
The original Alexa software, built on technology acquired from British start-up Evi in 2012, was a question-answering machine that worked by searching a defined universe of facts to find the right answer, such as the day's weather or a specific song in a user's music library.
The new Alexa uses a range of different AI models to recognize and translate voice requests and generate responses, as well as to identify policy violations such as inappropriate answers and hallucinations. Building software to translate between the legacy systems and the new AI models has been a major hurdle in the Alexa-LLM integration.
The models include Amazon's own software, including the latest Nova models, as well as Claude, the AI model from start-up Anthropic, in which Amazon has invested $8 billion over the course of the past 18 months.
"[The] most challenging thing about AI agents is making sure they are safe, reliable and predictable," Anthropic chief executive Dario Amodei told the FT last year.
AI software as an agent must reach the point “where . . . people can really trust the system,” he added. “Once we get to that point, then we will release these systems.”
A current employee said more steps were still needed, such as overlaying child safety filters and testing custom integrations with Alexa like smart lights and the Ring doorbell.
“Reliability is the problem – getting the job done close to 100 percent of the time,” added the employee. “That’s why you see us . . . or Apple or Google ship slowly and gradually.”
Numerous third parties developing "skills", or features, for Alexa said they were unsure when the new generative AI-enabled device would launch and how to create new features for it.
“We look forward to details and understanding,” said Thomas Lindgren, co-founder of Swedish content developer Wanderword. “When we started working with them they were much more open . . . then over time, they changed.”
Another developer said that after an initial period of "pressure" put on developers by Amazon to start preparing for the next generation of Alexa, things had gone quiet.
An enduring challenge for Amazon's Alexa team, which was hit with major layoffs in 2023, is how to make money. Figuring out how to make the assistants "cheap enough to run at scale" will be a major task, said Jared Roesch, co-founder of the generative AI group OctoAI.
Options being discussed include creating a new Alexa subscription service, or taking a cut of sales of goods and services, a former Alexa employee said.
Prasad said Amazon's goal was to create a variety of AI models that could serve as "building blocks" for a range of applications beyond Alexa.
"What we're [doing] is customer-based and practical AI, we're not doing science for science's sake," Prasad said. "We do this . . . to deliver customer value and impact, which in this era of generative AI becomes more important than ever because customers want to see a return on investment."
2025-01-14 05:00:00