Part 5: Conversational AI & Voice Assistants — The Next Consumer Interface
Tetsu Yamaguchi · Jun 2 · 2 min read · Updated: Jun 3
Voice is reclaiming centre‑stage in 2025 as the primary UI for phones, cars, and wearables. What changed? Ultra‑low latency pipelines, on‑device “mini‑LLMs,” and memory‑augmented agents are converging to make talking to machines faster than tapping screens.
Startup watch‑list
- Cartesia.ai — Focused on “ultra‑realistic voice with the world’s lowest latency.” Uses SSMs with custom Flash3 kernels for streaming, and claims 99.9 % uptime plus SOC 2 / HIPAA compliance. Raised a $64 M Series A led by Kleiner Perkins (Jan 2025). (smallest.ai, aimresearch.co)
- Gambit Co. — Builds persona‑driven companions such as AskEllyn for breast‑cancer support. Relies on domain‑LoRA libraries atop open‑source LLMs plus secure RAG over medical/legal docs. Ranked #7 in FoundersBeta’s “Top 100 Companies to Watch 2025.” (gambitco.io)
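The domain‑LoRA‑plus‑RAG pattern is easy to sketch. Below is a minimal, hypothetical Python example, not Gambit's actual stack: a vertical adapter is attached to an open‑source base model with PEFT, and retrieved documents are prepended to the prompt. The model ID, adapter ID, and toy keyword retriever are all placeholders.

```python
# Minimal sketch of the "domain LoRA + RAG" pattern; model and adapter IDs are
# placeholders, and the keyword retriever stands in for a real secure RAG index.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_ID = "meta-llama/Llama-3.1-8B-Instruct"        # any open-source chat model
ADAPTER_ID = "example-org/healthcare-empathy-lora"  # hypothetical vertical LoRA

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base = AutoModelForCausalLM.from_pretrained(BASE_ID, device_map="auto")
model = PeftModel.from_pretrained(base, ADAPTER_ID)  # attach the persona adapter


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Toy keyword-overlap retriever; a real deployment would use a vetted, access-controlled index."""
    words = set(query.lower().split())
    return sorted(docs, key=lambda d: -len(words & set(d.lower().split())))[:k]


def answer(query: str, docs: list[str]) -> str:
    context = "\n".join(retrieve(query, docs))
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nEmpathetic answer:"
    inputs = tokenizer(prompt, return_tensors="pt").to(next(model.parameters()).device)
    out = model.generate(**inputs, max_new_tokens=200)
    return tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
```

Swapping the adapter ID is what lets one base model serve several audited verticals.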
Key take‑aways
- One network, not three. Pipeline fusion (audio → text → audio) is cutting latency by >50 % (see the latency‑budget sketch below).
- Edge first. SSM + INT4 + sparsity make phone‑scale inference viable; Apple, Google, and Samsung all ship local LLMs in 2025 handsets (footprint estimate below).
- Memory is a feature. Assistants that remember you score 18–25 % higher in user satisfaction.^[Cartesia internal test, April 2025] A minimal version of the pattern is sketched below.
- Personas go vertical. Startups like Gambit win contracts by packaging specialised LoRAs (health‑care empathy, legal tone) that enterprises can audit.
- Guardrails matter. As voice enters regulated spaces (finance, health), in‑loop filters and audit logs become table stakes (toy example below).
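To put the fusion claim in context, here is a back‑of‑the‑envelope latency budget. All numbers are illustrative assumptions, not measurements from any company above: a cascaded pipeline pays for ASR finalisation, LLM time‑to‑first‑token, and TTS start‑up in series, while a fused speech‑to‑speech network streams end to end.

```python
# Illustrative latency budget (milliseconds); all figures are assumptions, not benchmarks.
cascade = {
    "asr_finalize": 300,      # wait for the recogniser to commit the transcript
    "llm_first_token": 350,   # time to first token of the text response
    "tts_first_audio": 250,   # time until the synthesiser emits audio
}
fused_first_audio = 380       # single speech-to-speech network, streaming end to end

cascade_total = sum(cascade.values())
print(f"cascade: {cascade_total} ms to first audio")
print(f"fused:   {fused_first_audio} ms to first audio")
print(f"saving:  {100 * (1 - fused_first_audio / cascade_total):.0f}%")
# With these assumed numbers the fused path is ~58% faster, consistent with the >50% claim.
```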
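The "edge first" point is mostly a memory argument. The estimate below takes a hypothetical 3B‑parameter "mini‑LLM" and compares its weight footprint at FP16 versus INT4 with 50 % of weights pruned; the parameter count and sparsity level are assumptions chosen only to show the order of magnitude.

```python
# Rough weight-memory estimate for an on-device model; figures are illustrative assumptions.
params = 3e9                            # hypothetical 3B-parameter "mini-LLM"
fp16_bytes = params * 2                 # 16-bit weights
int4_bytes = params * 0.5               # 4-bit weights
int4_sparse_bytes = int4_bytes * 0.5    # assume 50% of weights pruned away

gib = 1024 ** 3
print(f"FP16:          {fp16_bytes / gib:.1f} GiB")
print(f"INT4:          {int4_bytes / gib:.1f} GiB")
print(f"INT4 + sparse: {int4_sparse_bytes / gib:.2f} GiB")
# Roughly 5.6 GiB shrinks to about 0.7 GiB, which is what makes phone-scale inference plausible.
```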
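What "remembering you" means in practice is often just a small per‑user fact store injected into the prompt on every turn. The sketch below is a deliberately minimal version of that pattern; the file name and sample facts are made up, and the satisfaction figures cited above come from the referenced internal test, not from this code.

```python
# Minimal memory-augmented assistant: persist user facts and prepend them each turn.
import json
from pathlib import Path

MEMORY_FILE = Path("user_memory.json")  # hypothetical on-device store


def load_memory() -> dict:
    return json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else {}


def remember(key: str, value: str) -> None:
    memory = load_memory()
    memory[key] = value
    MEMORY_FILE.write_text(json.dumps(memory, indent=2))


def build_prompt(user_turn: str) -> str:
    facts = "\n".join(f"- {k}: {v}" for k, v in load_memory().items())
    return (
        "You are a voice assistant. Known facts about this user:\n"
        f"{facts or '- (none yet)'}\n\n"
        f"User: {user_turn}\nAssistant:"
    )


remember("preferred_name", "Aiko")
remember("commute", "cycles to work, avoids driving directions")
print(build_prompt("What's the fastest way to the office today?"))
```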
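And "in‑loop filters and audit logs" can be as simple as a redaction pass on every transcript plus an append‑only log of what the model saw and said. The toy example below illustrates the shape of it; real regulated deployments would use vetted PII detectors and tamper‑evident storage, and the patterns and file paths here are assumptions.

```python
# Toy in-loop guardrail: redact obvious PII before the model sees it, and append
# every exchange to an audit log. Patterns and file paths are illustrative only.
import json
import re
import time

AUDIT_LOG = "audit.jsonl"
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}


def redact(text: str) -> str:
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label} redacted]", text)
    return text


def audited_turn(user_text: str, respond) -> str:
    safe_input = redact(user_text)
    reply = redact(respond(safe_input))          # filter both directions
    with open(AUDIT_LOG, "a") as f:
        f.write(json.dumps({"ts": time.time(), "in": safe_input, "out": reply}) + "\n")
    return reply


print(audited_turn("My card is 4111 1111 1111 1111, can you pay my bill?",
                   lambda text: f"(model reply to: {text})"))
```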



