The emergence of deepfakes has seriously threatened the authenticity of online content. Enterprises, particularly those operating call centres, increasingly need deepfake detection to protect themselves against deepfake-enabled fraud. Yet such detection is not yet widespread, which makes this an ideal time to get ahead of a looming problem. Research into deepfake detection should therefore be prioritized.

Below, we provide a compilation of eight publicly available datasets for deepfake detection. 

FaceForensics++

FaceForensics++ is a forensics dataset consisting of 1,000 original video sequences that have been manipulated with four automated face manipulation methods: Deepfakes, Face2Face, FaceSwap, and NeuralTextures. The data was sourced from 977 YouTube videos, each containing a trackable, mostly frontal face without occlusions, which enables the automated tampering methods to generate realistic forgeries.

Flickr-Faces-HQ (FFHQ)

NVIDIA researchers introduced Flickr-Faces-HQ (FFHQ), a dataset of human faces. FFHQ contains 70,000 high-quality face photographs at 1024×1024 resolution. It was created as a benchmark for generative adversarial networks (GANs), but the images themselves are real photographs, not GAN outputs. They were crawled from Flickr and include faces with accessories such as eyeglasses, sunglasses, and hats. According to the authors, the images were automatically aligned and cropped, and the collection was pruned with automatic filters and manual review to remove noise such as statues, paintings, and photos of photos.

Celeb-DF

The Celeb-DF dataset is a large-scale, challenging dataset designed for deepfake forensics. It comprises 590 real videos sourced from YouTube, featuring subjects of various ages, ethnicities, and genders, along with 5,639 corresponding DeepFake videos.
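Celeb-DF is heavily imbalanced, with roughly 9.6 DeepFake videos for every real one, so a binary detector trained on it naively sees a skewed label distribution. One common remedy is to weight the training loss by inverse class frequency. The snippet below is a minimal sketch, assuming "balanced" weighting of the form n_total / (n_classes × n_c), the same formula scikit-learn uses for `class_weight="balanced"`; the helper name `balanced_class_weights` is hypothetical and not part of any dataset's tooling.

```python
def balanced_class_weights(counts: dict[str, int]) -> dict[str, float]:
    """Hypothetical helper: inverse-frequency weight n_total / (n_classes * n_c) per class."""
    n_total = sum(counts.values())
    n_classes = len(counts)
    return {label: n_total / (n_classes * n) for label, n in counts.items()}


# Video-level counts reported for Celeb-DF: 590 real, 5,639 DeepFake.
celeb_df_counts = {"real": 590, "fake": 5639}
weights = balanced_class_weights(celeb_df_counts)

for label, w in weights.items():
    # Prints "real: 5.279" then "fake: 0.552".
    print(f"{label}: {w:.3f}")
```

With these weights, both classes contribute the same total weight (6,229 / 2 = 3,114.5) to the loss, so the rare real videos are not drowned out by the fakes. Frame- or sequence-level training would need the counts recomputed at that granularity.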

100K-Faces

100K-Faces is a well-known, freely available dataset of 100,000 unique human faces generated with StyleGAN. The StyleGAN model was trained on a large dataset of over 29,000 photos of 69 different models, resulting in generated images with a flat background.

FaceForensics

FaceForensics is a video dataset containing more than 500,000 frames with faces from 1,004 videos, which can be used to study image or video forgeries. All videos were downloaded from YouTube and cut into short, continuous clips containing mostly frontal faces.

Diverse Fake Face Dataset (DFFD)

Researchers at Michigan State University developed the Diverse Fake Face Dataset (DFFD). It contains 100,000 fake images generated with ProGAN and 200,000 generated with StyleGAN, both state-of-the-art GAN models. Around 47.7 per cent of the images show male faces and 52.3 per cent female faces, and most subjects are aged between 21 and 50.

WildDeepfake

WildDeepfake is a dataset composed entirely of deepfake videos collected from the internet. It consists of 7,314 face sequences extracted from these videos and supports research on detecting deepfakes in real-world conditions. Because it is relatively small, WildDeepfake is best used alongside existing datasets to improve the performance of deepfake detectors applied to real-world scenarios.

VGGFace2

Researchers from the University of Oxford released VGGFace2, a large-scale face dataset. It contains over three million face images of more than nine thousand distinct subjects, an average of over 300 images per subject. The images were downloaded through Google image search and vary widely in ethnicity, gender, age, profession (e.g., politicians, athletes, and actors), pose, and illumination.
