The Great Voice Consolidation

Speech is our fastest natural means of communicating. Until we’re all chipped and can talk in wordless brainwaves like Professor X, voice is the human ceiling for external command and expression. With a new market emerging around advances in neural network recognition of vocal inputs, efficiency may well be the core advantage that drives user adoption and, with it, the overall maturation of the voice sector.

It’s hard to imagine that the voice ecosystem will look anything like the mobile-app layer, or for that matter the early world of e-commerce marketplaces. Voice as an interface and an experience makes invisible design a reality. There’s still a long way to go to saturate – and potentially replace – software design on the client side. However, all the early Skills (I hate this term fwiw) for the various voice-powered hardware platforms are 100% invisible once enabled. There’s nothing to look at, click, drag, swipe, post, comment on, delete, log into, or choose to download.

You might think, yeah, obviously, there’s no screen. But shifting away from a base interaction layer that’s visual means the characteristics and design of voice infrastructure will not mirror or even grow out of what’s existed before. There might not be an ‘app store for voice’, for example. Voice-powered products and services will exist, and succeed or fail, via as yet unknown but fundamentally different-looking distribution and discovery mechanisms.

Without visual cues, users will rely on words that they already know today to get what they want from Voice. I think of two general buckets of known ‘wake-up words’ that will drive a voice user’s product and services bias:

Established Internet Brands Today: Sort of like the Sticky Note, we recall certain brands as representative of an entire sector of product or service: Uber and Seamless have held this metonymy distinction. Because voice activation may require users to know both what they want and who provides it, today’s leading brands in any given vertical will have a huge advantage. For better or worse, they are already in our heads as the accompanying solution to what we want, so we’re much more likely to use their names to direct the voice interface.

Real World Category Definitions: An even simpler mental process for new voice users will be calling out the general category of product or service: Play music, Order lunch, Read news, Schedule doctor’s appointment. As on the Alexa platform today, users may be able to set certain preferences via an app or website. But many users may be satisfied if the original request is fulfilled, without caring which internet product or service carried it out. Take streaming media, for example: if I say Play Isaiah Rashad or Stream Boogie Nights, I really don’t care which service either comes from as long as it plays. Even with very specific asks, the value of platform differentiation beyond library volume starts to erode with voice. This reveals another strategic value in the move by many media streaming companies to secure exclusive rights to content or create their own.

These two categories of voice-user activation further underscore the future power of dominant voice platforms to develop into independent operating systems. These can then prioritize their own products and services (Amazon already does) or drive the evolution of competitive search monetization and SEO rankings for voice. There are many more open questions to consider about Voice design, user interaction, and market dynamics, such as: what does a Voice wave without any visuals mean for branding and advertising, which today are driven by measuring clicks, eyeballs, and logo placements? What seems clear is that voice-powered interfaces have the potential to consolidate our product and service selections in the name of efficiency. If wide-open user choice that incites intense competition to acquire those users is a core dynamic of dominant tech ecosystems up to this point, Voice may break from that model.

What competition and choice end up meaning on a Voice-dominant internet, of course, remains to be seen…

Tim Devane

Tim is the New York-based Principal at NextView Ventures. Tim wakes up every day hoping to meet, work with and write about seed stage startups and the entrepreneurs behind them.

Tim began his career at Betaworks, working in a variety of roles for Betaworks’ seed fund and studio. From Betaworks, Tim joined Bitly as one of the first employees and became the Director of Business Development and Sales. After Bitly, Tim became COO of Epic Magazine and an EIR at Red Sea Ventures. A graduate of Wesleyan University, where he majored in English – Creative Writing with a certificate in International Relations, Tim helped launch Digital Wes, an alumni-student organization that helps undergrads find jobs at startups. Tim’s first foray into entrepreneurship was starting an environmental non-profit in college called Birthright Earth.

Tim was born in London, grew up in Washington D.C. and now lives in Brooklyn with his dog Calypso.

  • Trevor Sumner

    Super interesting points on how Voice gives incumbents an advantage and how owning the Voice interface excludes competition and disrupts traditional markets. I wonder how much of voice will be multi-modal, for the very reason you mention: voice is our fastest input, but much of our consumption will continue to be non-audio. Additionally, we are at such a nascent stage, where so many of the apps are media or brand related because they are constrained by simplicity and focused on content (vs function). As the market matures and NLP and intent recognition enable much more complex interaction models, I think we will see a shift toward an app (function) mindset, which opens the market back up, if the ecosystem allows it and we have the types of prototyping tools to really design to user behavior. Lots to think about. Thank you.