What if Spanish actors could speak Tamil with natural ease? Or at least appear to speak Tamil naturally?
This is possible with the intervention of generative AI, say the founders of Bengaluru-based deep tech startup NeuralGarage.
Today, content owners and distributors dub content in several languages to reach a wider audience. However, the dubbed content doesn’t offer a cohesive viewing experience as the lip and jaw movements of the actors do not match the words coming out of their mouths.
Take the example of a Spanish show dubbed in Tamil. The audio is in Tamil, but the actors on the screen still look like they are speaking Spanish. More often than not, viewers find this mismatch irksome and may eventually lose interest in the show.
This is the problem that NeuralGarage aims to address with its flagship product VisualDub.
VisualDub grew out of a personal experience.
Anjan Banerjee, one of the founders of NeuralGarage, is an avid fan of Korean shows and movies. While watching the Korean movie Train to Busan, dubbed in English, he experienced a disconnect as the dubbed audio did not synchronise with the facial movements of the actors.
This bothered him, as it prevented him from fully immersing himself in the stories and appreciating their visual aspects.
“I have a great fondness for Korean films, and this was clearly at its peak during the lockdown. But I had problems with the visual dissonance due to the lack of audio-video synchronisation. This sparked the idea of whether this could be solved at a time when I was fully immersed in my work on generative networks,” says Banerjee, Chief Product Officer at NeuralGarage.
He wondered if advancements in artificial intelligence could help address the issue and decided to embark upon research to explore the possibilities of this technology.
Driven by his curiosity and desire to bridge the gap between audio and video, Banerjee began studying the potential of AI along with his batchmates from IIT Kanpur, Subhabrata Debnath and Subhashish Saha.
As the trio began building the VisualDub technology to address audio-visual dissonance, they also reached out to Mandar Natekar, a media and entertainment veteran, for advice and mentorship, which paved the way for the birth of NeuralGarage.
Birth of NeuralGarage
In 2015, Debnath, Banerjee, and Saha founded Visage Map, a facial recognition startup, which was later acquired by FaceFirst, a US-based facial tech company. They quit FaceFirst in 2021 and started working on the VisualDub technology.
The same year, they founded the deep tech startup NeuralGarage with Natekar, who has 20 years of experience working with companies such as Viacom18, Times Television Network, Turner International, Reliance Entertainment, and Sony.
NeuralGarage Founders
“Our vision at NeuralGarage is to make communication seamless across all barriers of language visually through the power of AI,” says Natekar, Co-founder and CEO, NeuralGarage.
The name ‘NeuralGarage’ represents neural networks—the heart of AI—and pays tribute to tech companies Apple, Microsoft, Google, and Meta, which were, as the legend goes, started from a garage.
NeuralGarage’s flagship product, VisualDub, reduces audio-visual disparity in dubbed content by syncing the lip and jaw movements of actors with the audio.
Eliminating audio-visual discord
VisualDub runs on proprietary algorithms that map phonemes, the smallest units of speech sound, to visemes, the corresponding lip shapes. These are unique mappings that are universally true for every language in the world.
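To make the phoneme-to-viseme idea concrete, here is a minimal Python sketch of a lookup from phoneme symbols to lip-shape labels. It is not VisualDub's proprietary method: the table, the viseme names, and the function `phonemes_to_visemes` are all hypothetical and chosen only for illustration. The sketch also assumes that several phonemes can share one lip shape.

```python
# Toy phoneme-to-viseme lookup. The groupings and labels below are
# illustrative assumptions, not VisualDub's proprietary mapping.
PHONEME_TO_VISEME = {
    "p": "lips_closed", "b": "lips_closed", "m": "lips_closed",
    "f": "lip_to_teeth", "v": "lip_to_teeth",
    "aa": "open_jaw", "ae": "open_jaw",
    "uw": "rounded_lips", "ow": "rounded_lips",
    "iy": "spread_lips", "eh": "spread_lips",
}


def phonemes_to_visemes(phonemes, default="neutral"):
    """Map phoneme symbols to viseme labels, merging consecutive repeats."""
    visemes = []
    for phoneme in phonemes:
        viseme = PHONEME_TO_VISEME.get(phoneme.lower(), default)
        # Skip a label that repeats the previous one: the lips do not move.
        if not visemes or visemes[-1] != viseme:
            visemes.append(viseme)
    return visemes


if __name__ == "__main__":
    # Roughly the sounds in "mama": closed lips alternate with an open jaw.
    print(phonemes_to_visemes(["m", "aa", "m", "aa"]))
```

A production system would work from timed audio features and generate video frames, but the lookup shows why the same lip shapes can, in principle, be reused across languages.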
Visual dissonance happens when the audio cues and visual cues are not in sync. VisualDub’s proprietary generative AI tech removes the discord that’s apparent in dubbed content by syncing the jaw and lip movements of actors with the words being spoken.
Generative AI transforms facial parts using audio activations, blending them with the rest of the scene. The lip movements are tweaked to match syllables, and the jaw and chin movements and smile lines are harmonised with this to make the dubbed content visually realistic and natural.
This technology is person- and language-agnostic.
NeuralGarage’s Proprietary VisualDub
“Removing visual dissonance makes the dubbed content look authentic and local, which helps viewers and consumers connect more with the content,” says Natekar.
The synchronisation solution doesn’t interfere with the actual dubbing process. A technology layer is added on top of the dubbed content, he adds.
NeuralGarage offers this technology through API integration, SaaS, and desktop software. The beta version of the software was released two months ago.
The startup uses Amazon Web Services for client delivery and to ensure security and privacy. Additionally, it leverages complex AI and computer vision algorithms to improve content consumption, delivery, and creation.
The technology has been tested in more than 30 languages across the world, including many Indian languages and international languages such as Italian, German, Spanish, Japanese, Korean, and Mandarin.
Business and growth
Recently, Amazon’s ad campaign with actor Manoj Bajpayee was shot in Hindi and dubbed in Tamil, Telugu, Kannada, Malayalam, Bengali, Gujarati, and Marathi.
VisualDub was then used to lip-sync the creative in the dubbed languages to give the feeling that the creative was actually shot in each of these languages, thus creating an authentic connection with the consumer, explains Natekar.
NeuralGarage generates business from verticals such as advertising, influencer marketing, content creation, OTT, and films. Film and edtech content account for more than 90% of its revenues. All these projects are in progress.
The startup has 10 clients, including Hippo Video.
NeuralGarage, which is backed by an institutional VC fund and prolific angel investors including Amit Patni, has raised $1.45 million to date in its seed round.
There are reportedly over 60 startups in the generative AI landscape in India, offering solutions and services to customers across various industry verticals. They include TrueFoundry, which helps build ChatGPT with proprietary data; Cube, which enables AI-based customer support on social media; and Gan.ai, a generative AI video platform. Over $590 million in funding has flowed into this space, with 2022 seeing the highest funding.
“Video, speech, text, and images are the focus of most gen AI startups today. Media, entertainment, and marketing services have massive demand for this technology,” points out Natekar.
NeuralGarage’s objective this year is to drive commercial testing and adoption of VisualDub across verticals and keep validating different use cases. The startup is targeting a revenue of $1 million and 50 clients by the end of this year.
Edited by Swetha Kannan