Indian AI start-up Sarvam has released an updated version of its text-to-speech AI model.
Indian AI start-up Sarvam has rolled out a new version of its text-to-speech AI model, enhancing natural speech generation across various Indian regions, scripts, and accents.
The updated model, named Bulbul V3, boasts over 35 high-quality voices from professional voice artists and supports more than 11 Indian languages, as stated by Sarvam in a blog post on Thursday, February 5.
The company aims to expand support to all 22 scheduled Indian languages soon.
Bulbul V3, built on a large language model (LLM), uses prosodic elements like pauses, emphasis, pacing, and tone modulation to produce a more natural sound as it converts text into AI-generated speech.
Users can create and play audio in real time when using the low-latency streaming output mode.
“This is essential for conversational applications, live interactions, and any scenario where quick responses enhance user engagements”, Sarvam said.
“Indian speech is inherently complex. People often switch languages mid-sentence. Accents differ by region. Names, abbreviations, and emotions are just as important as the words themselves. To function effectively in India, voice technology must manage all these nuances seamlessly”, the start-up added.
Additionally, users can clone and customize AI-generated voices using the AI model.
The consent-based voice cloning feature is designed for high-volume enterprise applications and has built-in safeguards.
Sarvam is also among 12 start-ups and organizations chosen by the Indian government to develop sovereign LLMs under the Rs 10,300-crore India AI Mission.
These AI models are expected to be unveiled at the India-AI Impact Summit 2026, which is set to take place in New Delhi from February 16 to February 20, 2026.
If you’re keen to try out the new model, you can find Bulbul V3 on the Sarvam Dashboard.
The company is also giving developers unlimited API access to this new AI voice-generation model until February 28, 2026.
As part of its testing, Bulbul V3 was evaluated by an independent third party in a blind A/B human listening study across 11 languages.
In this test, the same input text was used to compare audio samples produced by Bulbul V3 with those produced by speech models from competitors.
The company also stated that its new AI model surpassed all others in 8 kHz (telephony) evaluations.
Bulbul V3 showed “the lowest rates of word skips and mispronunciations, while keeping performance on extra-content errors comparable”.
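For readers unfamiliar with the method, a blind A/B listening comparison of the kind described above can be sketched roughly as follows. This is only an illustrative outline, not Sarvam's or the evaluator's actual protocol: the function names, the "system_a"/"system_b" labels, and the X/Y slot scheme are all assumptions made for the example.

```python
import random
from collections import Counter


def make_blind_trial(text_id, sample_a, sample_b, rng):
    """Assign two systems' audio for the same text to anonymous slots X and Y."""
    pair = [("system_a", sample_a), ("system_b", sample_b)]
    rng.shuffle(pair)  # random order, so listeners cannot infer the source
    return {
        "text_id": text_id,
        # What the listener hears: only the anonymous slots.
        "slots": {"X": pair[0][1], "Y": pair[1][1]},
        # Hidden answer key, kept by the evaluator.
        "key": {"X": pair[0][0], "Y": pair[1][0]},
    }


def tally_preferences(trials, votes):
    """Map each vote ('X', 'Y', or anything else for a tie) back to a system."""
    counts = Counter()
    for trial, vote in zip(trials, votes):
        counts[trial["key"].get(vote, "tie")] += 1
    return counts


def win_rate(counts, system):
    """Share of decided (non-tie) trials in which `system` was preferred."""
    decided = sum(n for name, n in counts.items() if name != "tie")
    return counts[system] / decided if decided else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical file names; both clips in a trial render the same input text.
    trials = [
        make_blind_trial(i, f"a_{i}.wav", f"b_{i}.wav", rng) for i in range(4)
    ]
    votes = ["X", "Y", "tie", "X"]  # made-up listener responses
    counts = tally_preferences(trials, votes)
    print(dict(counts), win_rate(counts, "system_a"))
```

The key point is that the mapping from slot to system is hidden from listeners and only resolved when the votes are tallied, so preferences reflect the audio rather than the brand.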








