We’ve all been stuck waiting on hold or for that call-back. Voice workflows may have been streamlined and digitised, but until now they have retained a critical bottleneck: manual human intervention. This is expensive, slow, and makes for a poor user experience.
Voice AI has finally reached the level of sophistication needed to change this, and so to unlock a huge total addressable market (TAM): the labour required in much of customer support, sales, recruiting, and appointment scheduling.
Solutions exist across Voice-to-text, which boosts productivity through meeting and healthcare appointment notes; Text-to-voice, which enables conversations, especially in simple Q&A or receptionist call solutions; and Voice-to-voice, which is becoming dominant as more sophisticated voice AI agents hold full conversations.
We see an emerging market across the relevant user profiles:
Models: Innovations at the model layer have the broadest impact. Advances here enable the whole stack, improving the underlying quality of both transcription and the voice experience. One of the recent European breakout successes is ElevenLabs, which established itself as a leader after achieving unicorn status in 18 months.
Middleware: The middleware layer enables businesses, from enterprises to SMBs, to create custom voice AI solutions with the confidence that they have the latest technology from the model layer, integrated, updated, and working smoothly. In this layer, business models are largely usage-based (for example, dollars per minute).
Applications: Finally, the application layer provides businesses with tailored voice solutions for specific vertical or functional use cases such as sales, recruiting, or health appointment transcription. This layer has the largest number of players, reflecting the sheer diversity of use cases and the value of having an out-of-the-box solution. Over time, these businesses may adopt more outcome-based business models (for example, charging per qualified lead), reflecting their ambition to replace employees with fully agentic technologies.
Today, the quality of the voice experience and of transcription (for example, latency, emotionality, and context relevance) is an important factor in building customer trust and proving reliability. However, it is likely to become commoditised table stakes.
Instead, we believe winners at the application and middleware layers will excel on product UI, seamlessly integrated workflows, and post-call workflow automation. Advantage at the application layer can come from delivering vertical-specific workflows, and at the horizontal middleware layer from seamless and intuitive voice-bot builder experiences.
Commercial focus is also key in this competitive market. The market is large, but a land grab is already underway, and there is only a narrow window in which to establish a position before the customer base saturates. An enduring business will require strong customer retention, which is currently a common challenge among businesses in voice AI. Optimising for leading indicators of end-customer satisfaction, such as resolution rates or call terminations, can help businesses get ahead of the problem.