Kaltura’s AI Video Move Signals Momentum in Conversational AI

Conversational AI is entering a new phase in which text, voice, and video interaction converge into a unified, human-like experience. The latest example is Kaltura’s $27M acquisition of eSelf, an AI startup founded by the creator of Snap’s AI. More than a strategic expansion, the move signals a shift in how businesses will deploy next-generation conversational interfaces.

According to Stratistics MRC, the global conversational AI market is valued at $11.6 billion in 2024 and is projected to reach $42.2 billion by 2030, a CAGR of 23.9%.
Kaltura’s expansion aligns with this growth, particularly in multimodal AI.

Why Kaltura’s Acquisition Matters to the Conversational AI Landscape

Kaltura, known for its enterprise video platform, is evolving beyond hosting and streaming. The acquisition of eSelf, a startup specializing in “human-like AI video creators,” marks a shift toward true AI-driven conversational experiences.

Key implications:

Video becomes a primary conversational modality, not a secondary channel.
AI presenters and digital humans will interact with users in real time.
Enterprise communication moves toward automation, using AI to replicate human-like dialogue, support, and instruction.

The acquisition is a strong example of the broader shift in conversational modality from text to voice to video.

The Rise of Human-Like Conversational Video

Traditional chatbots rely on text or voice. Generative AI has made it possible to create conversational agents that can:

Maintain eye contact
Respond with natural gestures
Deliver personalized video answers
Adapt tone, expressions, and pacing
Integrate with LLM-powered reasoning

This “human-like conversational video” targets the biggest weakness of legacy chatbots: low engagement and limited trust.

Compared with text-only AI interactions, human-like conversational video is positioned to improve information retention, build emotional connection, and raise engagement metrics.

How Kaltura’s Move Reflects Broader Market Momentum

The conversational AI market is growing rapidly, driven by demand for automation, personalization, and multimodal interaction.

Three core forces driving growth:

A. Multimodal AI Adoption

Companies are shifting from simple chatbots to video-driven, avatar-driven AI.

B. Enterprise Use Cases Expanding

Customer support
Sales enablement
Training and onboarding
Healthcare triage
Education & interactive learning

C. Digital Humans & AI Presenters

The market is moving toward AI that can replicate human presence, not just text conversations.

Kaltura’s acquisition of eSelf illustrates how video AI is becoming a strategic requirement for enterprises integrating multimodal conversational systems.

Market Outlook: Multimodal Conversational AI Heads Toward $42.2B

According to Stratistics MRC, the market trajectory is clear:

Metric	Value
2024 Market Size	$11.6 Billion
2030 Forecast	$42.2 Billion
CAGR (2024–2030)	23.9%
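
As a rough sanity check on these figures, the growth rate can be recomputed from the two endpoints. The short Python sketch below uses the standard CAGR formula and assumes six compounding years (2024 to 2030); the function names are illustrative and are not taken from the Stratistics MRC report.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over `years` periods."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value: float, rate: float, years: int) -> float:
    """Value after compounding `start_value` at `rate` for `years` periods."""
    return start_value * (1 + rate) ** years


# Figures quoted in the article, in USD billions.
SIZE_2024 = 11.6
SIZE_2030 = 42.2
YEARS = 6  # assumption: 2024 is the base year, 2030 the final year

implied_rate = cagr(SIZE_2024, SIZE_2030, YEARS)
projected_2030 = project(SIZE_2024, 0.239, YEARS)

print(f"Implied CAGR: {implied_rate:.1%}")
print(f"2030 size at 23.9% CAGR: ${projected_2030:.1f}B")
```

Under these assumptions the implied rate comes out at roughly 24%, and compounding $11.6 billion at 23.9% lands near $42.0 billion, so the quoted figures agree to within rounding.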

Video-based conversational AI, especially systems driven by LLM reasoning, is expected to be one of the fastest-growing segments.
