Results for ""
Another Diwali is around the corner and another Cadbury ad has stolen the limelight, this time with an AI twist.
If you have been living under a rock for the past week or so, let me bring you up to speed. Cadbury has launched an ad campaign featuring the Bollywood Badshah Shah Rukh Khan. But the plot twist is that it is not "just a Cadbury ad", or, in Gen Z lingo, #NotJustACadburyAd. Instead, this one lets local store owners create an ad for their shops featuring Shah Rukh Khan, for free. Local shops can now use the actor's face and voice to promote their business without spending a penny.
The magic comes courtesy of AI and ML: these technologies were used to recreate Khan's face and voice so that it looks and sounds as if the actor himself is saying the local store or brand's name.
Pretty awesome, right?
To launch this ad campaign, Cadbury collaborated with an Indian AI startup called Rephrase AI, a synthetic media production platform that principally uses AI for video personalisation.
"Rephrase AI is proud to partner with Mondelez (the parent company of Cadbury) to help enable personalised ads for all local stores in the country. This shows the power of the Generative AI technology we've been pioneering for the last three years,” said Ashray Malhotra, CEO and Co-Founder of Rephrase AI to INDIAai.
“This is the start of the synthetic media revolution where we help businesses communicate with their customers on a one-on-one basis with their brand ambassadors, help CXOs talk to their employees personally at scale, and help run hyper-targeted ad content,” he added.
While the ad campaign has left most of us in awe, numerous concerns have been raised about its ethics and its close resemblance to another type of synthetic media, deepfakes, which have gained significant notoriety in the last couple of years.
Rephrase AI points out that its work is not deepfakery, although the two share the same creation techniques. According to Rephrase AI, while "deepfake" technically means any synthetic video created with deep neural networks, the term is colloquially used for transferring the movements and expressions of one person onto another; a real-world equivalent would be mimicking someone else's voice in speech.
According to Rephrase AI, the videos for this particular campaign were created from scratch using facial re-enactment tools that predict lip movements, facial expressions, head movements, eye blinks, and everything else that goes into making the AI-driven presenter's face look photorealistic.
To understand the technical nuances behind this controversial ad campaign, we reached out to Atharva Peshkar and Atharva Khedkar, founders of the Nagpur-based deepfake-detection startup Detectd. While describing the process of synthetic media creation, the duo also explained how this particular ad campaign might have been developed.
According to them, deep learning algorithms can be used to manipulate or even generate media. In general, synthetic media, or deepfakes, are created using a special class of deep neural networks called Generative Adversarial Networks (GANs).
A GAN is made up of two networks: a generator G, which turns random noise into samples, and a discriminator D, which judges whether a given sample is real or generated.
The two engage in a two-player adversarial (zero-sum) game: one's win is the other's loss. The ideal end state is a Nash equilibrium, in which the generator's samples are so realistic that the discriminator can do no better than chance, classifying them correctly only 50% of the time.
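For the mathematically inclined, this game is usually written as the minimax objective from Goodfellow et al.'s original 2014 GAN paper, where a single value function V is maximised by the discriminator D and minimised by the generator G:

\min_G \max_D \, V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]

Here x is a real sample, z is random noise, and D(G(z)) is the discriminator's verdict on a generated sample; at equilibrium, D outputs 1/2 everywhere.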
During training, the generator attempts to fool the discriminator by producing data that looks realistic and resembles the training set, while the discriminator tries to avoid being fooled by learning a sharp classification boundary between the fake data (produced by the generator) and the real data.
Thus the generator learns to make realistic images by generating them from random noise: noise is sampled from a uniform or normal distribution and fed into the generator, which turns it into an image. The discriminator learns to distinguish fake images from real photos by being fed both the generator's output and actual images from the training dataset.
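To make this concrete, here is a minimal sketch of that adversarial training loop in PyTorch. The tiny fully connected networks, data dimensions, and hyperparameters are illustrative placeholders, not anything Rephrase AI or Detectd have described; only the structure of the two alternating updates is the standard GAN recipe.

import torch
import torch.nn as nn

latent_dim, data_dim = 16, 64  # assumed sizes, purely for illustration

# G: maps random noise z to a fake sample
G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, data_dim))
# D: maps a sample x to the probability that it is real
D = nn.Sequential(nn.Linear(data_dim, 128), nn.ReLU(), nn.Linear(128, 1), nn.Sigmoid())

opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_step(real_batch: torch.Tensor) -> None:
    n = real_batch.size(0)
    real_labels, fake_labels = torch.ones(n, 1), torch.zeros(n, 1)

    # Discriminator update: learn to separate real data from G's fakes
    fake_batch = G(torch.randn(n, latent_dim)).detach()  # detach: only D learns here
    loss_D = bce(D(real_batch), real_labels) + bce(D(fake_batch), fake_labels)
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Generator update: try to make D label fresh fakes as real
    loss_G = bce(D(G(torch.randn(n, latent_dim))), real_labels)
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()

# Stand-in "real" data; an actual system would feed real images or audio frames
for _ in range(100):
    train_step(torch.randn(32, data_dim))

At the idealised equilibrium described above, D's output would hover around 0.5 for both the real and the generated batches.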
Coming to facial manipulation, it is most commonly done with synthetic videos and is usually classified into a few broad categories: entire face synthesis, face swapping, attribute manipulation, and facial re-enactment, which covers lip-syncing. An example of this last category is the "Wav2Lip: Accurately Lip-sync Videos to Any Speech" research from IIIT Hyderabad, and it is here that the synthetic Cadbury video featuring Shah Rukh Khan comes into play.
Retailers could upload their details (such as the business name) to a platform. A deep-learning speech-synthesis model, like WaveNet (a vocoder developed by DeepMind in 2016) or Tacotron (a text-to-speech system created by Google in 2017), might then have been used to generate the appropriate audio clip for each retailer in SRK's voice. This audio was ultimately lip-synced with the video to make the final clip for the ad.
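Putting those two pieces together, a hypothetical personalisation pipeline might look like the Python sketch below. The store name, file paths, script line, and the synthesize_speech() placeholder are all assumptions for illustration; the lip-sync step mirrors the command-line interface of the public Rudrabha/Wav2Lip repository. None of this is Rephrase AI's actual code.

import subprocess

def synthesize_speech(text: str, out_wav: str) -> None:
    """Placeholder: render `text` in the cloned voice and write it to out_wav.
    In practice this would be a Tacotron/WaveNet-style TTS or voice-cloning model."""
    raise NotImplementedError("plug in a TTS / voice-cloning model here")

def personalise_ad(store_name: str, base_video: str, out_video: str) -> None:
    # 1) Generate the store-specific audio line in the actor's cloned voice
    line = f"This Diwali, kuch meetha ho jaaye from {store_name}!"  # illustrative script only
    audio_path = "store_line.wav"
    synthesize_speech(line, audio_path)

    # 2) Lip-sync the generated audio onto the base footage with Wav2Lip
    subprocess.run([
        "python", "inference.py",  # Wav2Lip's inference script
        "--checkpoint_path", "checkpoints/wav2lip_gan.pth",
        "--face", base_video,
        "--audio", audio_path,
        "--outfile", out_video,
    ], check=True)

# Example (hypothetical store and files):
# personalise_ad("Sharma General Store", "srk_base_clip.mp4", "sharma_ad.mp4")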
However, creating such an ad campaign certainly involves numerous challenges.
“This campaign was the most stringent test of our AI yet, in multiple ways. The tech tends to be very sensitive to slight changes in position and lighting. Because we had to create avatars of Shah Rukh in 5 different positions, each of them in the middle of an extended act, such slight changes between the training data and the actual ad were inevitable. So we had to make the tech much more robust, which ultimately involved us writing whole new AI models from scratch," says Rephrase AI co-founder Nisheeth Lahoti.
“Also, store names are a mix of English and Hindi words, which posed a bit of a problem because until recently, our multilingual models were notably worse than our monolingual (particularly English-language) ones. We've closed this gap substantially in the last few months, and particularly rapidly in preparation for this campaign.”
"Finally, the state of voice cloning worldwide for such a multilingual setup leaves a lot to be desired. So, we'd started an in-house project to clone voices. But we expected it to take a few months to get to production-level quality, if it ever got there at all. Even in our wildest dreams we couldn't have imagined that we'd end up deploying and using it less than a month after starting the project," he added.
It is undeniable that the ethical and legal concerns around the use of synthetic media, deepfakes, and digital avatars must face public discourse, as their potential for misuse is widely documented. Nevertheless, one has to accept the reality that we have entered a new era of AI-driven advertisements.