The Great Voice Consolidation

Speech is our fastest natural means of communicating. Until we’re all chipped and can talk in wordless brainwaves like Professor X, voice is the human ceiling for external command and expression. With a new market emerging around advances in neural network recognition of vocal inputs, efficiency may well be the core advantage that drives user adoption and so overall maturation of the voice sector.

It’s hard to imagine that the voice ecosystem will look anything like the mobile-app layer, or for that matter the early world of e-commerce marketplaces. Voice as an interface and an experience makes invisible design a reality. There’s still a long way to go to saturate – and potentially replace – software design on the client side. However, all the early Skills (I hate this term fwiw) for the various voice-powered hardware platforms are 100% invisible once enabled. There’s nothing to look at, click, drag, swipe, post, comment on, delete, log into, or choose to download.

You might think, yeah, obviously, there’s no screen. But shifting away from a base interaction layer that’s visual means the characteristics and design of voice infrastructure will not mirror or even grow out of what’s existed before. There might not be an ‘app store for voice’, for example. Voice-powered products and services will exist and succeed or fail via as yet unknown but fundamentally different-looking distribution and discovery mechanisms.

Without visual cues, users will rely on words that they already know today to get what they want from Voice. I think of two general buckets of known ‘wake-up words’ that will drive a voice user’s product and service bias:

Established Internet Brands Today: Sort of like the Sticky Note, we recall certain brands as representative of an entire sector of product or service: Uber and Seamless have held this metonymy distinction. Because voice activation may require users to know both what they want and who provides it, today’s leading brands in any given vertical will have a huge advantage. For better or worse, they are already in our heads as the accompanying solution to what we want, so we’re much more likely to use their names to direct the voice interface.

Real World Category Definitions: An even simpler mental process for new voice users will be calling out the general category of product or service: Play music, Order lunch, Read news, Schedule doctor’s appointment. Like the Alexa platform today, users may be able to set certain preferences via an app or website. But many users may be satisfied if the original request is fulfilled, without caring which internet product or service carried it out. Take streaming media, for example: if I say Play Isaiah Rashad or Stream Boogie Nights, I really don’t care what service either comes from as long as it plays. Even with very specific asks, the value of platform differentiation beyond library volume starts to erode with voice. This reveals another strategic value in the move by many media streaming companies to secure exclusive rights to content or create their own.

These two categories of voice-user activation further underscore the future power of dominant voice platforms to develop into independent operating systems. These can then prioritize their own products and services (Amazon already does) or drive the evolution of competitive search monetization and SEO rankings for voice. There are many more open questions to consider about Voice design, user interaction, and market dynamics. For example: what does a Voice wave without any visuals mean for branding and advertising, which today are driven by measuring clicks, eyeballs, and logo placements? What seems clear is that voice-powered interfaces have the potential to consolidate our product and service selections in the name of efficiency. If wide-open user choice, and the intense competition to acquire users that it incites, has been a core dynamic of dominant tech ecosystems up to this point, Voice may break from that model.

What competition and choice end up meaning on a Voice-dominant internet, of course, remains to be seen…