Conversational AI is entering a new phase where text, voice, and video interaction converge into a unified, human-like experience. The latest example is Kaltura’s $27M acquisition of eSelf, the AI startup founded by the creator of Snap’s AI. The move is more than a strategic expansion: it signals a shift in how businesses will deploy next-generation conversational interfaces.
According to Stratistics MRC, the Global Conversational AI Market is valued at $11.6 billion in 2024 and is projected to reach $42.2 billion by 2030, growing at a CAGR of 23.9%.
Kaltura’s expansion directly aligns with this acceleration, particularly in multimodal AI.
Why Kaltura’s Acquisition Matters to the Conversational AI Landscape
Kaltura, known for its enterprise video platform, is evolving beyond hosting and streaming. The acquisition of eSelf, a startup specializing in “human-like AI video creators,” marks a shift toward true AI-driven conversational experiences.
Key implications:
- Video becomes a primary conversational modality, not a secondary channel.
- AI presenters and digital humans will interact with users in real time.
- Enterprise communication moves toward automation, using AI to replicate human-like dialogue, support, and instruction.
The acquisition is a strong example of the broader shift in modality (text → voice → video).
The Rise of Human-Like Conversational Video
Traditional chatbots rely on text or voice.
But generative AI has made it possible to create conversational agents that can:
- Maintain eye contact
- Respond with natural gestures
- Deliver personalized video answers
- Adapt tone, expressions, and pacing
- Integrate with LLM-powered reasoning
This “human-like conversational video” targets the biggest weaknesses of legacy chatbots: low engagement and limited trust.
Compared with text-only AI interactions, it can improve information retention, build emotional connection, and raise engagement.
How Kaltura’s Move Reflects Broader Market Momentum
The Conversational AI Market is growing rapidly due to automation, personalization, and multimodal interaction.
Three core forces driving growth:
A. Multimodal AI Adoption
Companies are shifting from simple chatbots to video-driven, avatar-driven AI.
B. Enterprise Use Cases Expanding
- Customer support
- Sales enablement
- Training and onboarding
- Healthcare triage
- Education & interactive learning
C. Digital Humans & AI Presenters
The market is moving toward AI that can replicate human presence, not just text conversations.
Kaltura’s acquisition of eSelf illustrates how video AI is becoming a strategic requirement for enterprises integrating multimodal conversational systems.
Market Outlook: Multimodal Conversational AI Heads Toward $42.2B
According to Stratistics MRC, the market trajectory is clear:
| Metric | Value |
|---|---|
| 2024 Market Size | $11.6 Billion |
| 2030 Forecast | $42.2 Billion |
| CAGR (2024–2030) | 23.9% |
Video-based conversational AI — especially systems driven by LLM reasoning — is expected to be one of the fastest-growing segments.
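As a quick sanity check on the figures in the table above, the sketch below recomputes the compound annual growth rate implied by the 2024 and 2030 values and compares it with the stated 23.9%. The function names (`implied_cagr`, `project`) are illustrative and do not come from Stratistics MRC, and the calculation assumes six compounding years (2024 to 2030), matching the range given for the CAGR.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate that takes start_value to end_value."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value: float, rate: float, years: int) -> float:
    """Value reached by compounding start_value at `rate` for `years` years."""
    return start_value * (1 + rate) ** years


# Figures quoted in the article (USD billions).
market_2024 = 11.6
forecast_2030 = 42.2
stated_cagr = 0.239

# Assumption: six compounding periods between 2024 and 2030.
years = 2030 - 2024

implied = implied_cagr(market_2024, forecast_2030, years)
projected = project(market_2024, stated_cagr, years)

print(f"Implied CAGR from 2024 and 2030 values: {implied:.1%}")
print(f"2030 value from compounding at 23.9%: ${projected:.1f}B")
```

Under that assumption the implied rate comes out at roughly 24.0%, and compounding $11.6 billion at 23.9% gives about $42.0 billion, so the three quoted figures agree to within rounding.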