Facebook recently released an Image Similarity data set. It also announced an associated competition with a $200,000 prize pool, which will be hosted by DrivenData. The competition commenced on June 19, 2021, and will run until late October 2021. Other partners supporting the challenge are Pinterest, BBC, Getty Images, iStock, and Shutterstock.

The newly released data set has almost 1 million reference images and 50,000 query images, a few of which are manipulated versions of reference images. It is the largest known data set of its kind for image similarity research, and it includes both human and automated edits that represent on-platform behaviour.

Facebook hopes that this data set and the challenge will encourage the exploration of new machine-learning-based systems that can predict the similarity between two pieces of visual content. Such progress would help the industry detect manipulated images at scale.

Social media networks are increasingly using content tracing and image similarity detection to thwart or slow the spread of harmful, malicious content that could have a negative social impact. These networks combine manual content moderation with automated matching tools.

Image similarity detection involves identifying the origin of a doctored image within a large group of unrelated images. It applies to multiple domains, such as scams, misinformation and copyright infringement.

With its vast collection of images, Facebook's Image Similarity data set is intended to set a benchmark for image similarity detection. It consists of selected images with broad licenses from the YFCC100M data set, along with still images from the Deepfake Detection Challenge data set and the Casual Conversations data set.

Facebook also applied the CIAGAN deepfake technique to alter faces and make it harder for AI to identify source images. In addition, it applied a wide range of automated transformations to a subset of the 50,000 query images using the recently open-sourced AugLy library.
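
To make the idea of an automated transformation concrete, here is a minimal sketch in plain Python. It does not use AugLy's API. The helper names (`flip_horizontal`, `adjust_brightness`, `make_query`), the tiny grayscale image and the brightness factor are all assumptions chosen for illustration, not the transformations Facebook actually applied.

```python
# Illustrative only: a grayscale image stored as a list of rows of 0-255 ints.
# These helpers are hypothetical and are NOT part of the AugLy library.

def flip_horizontal(image):
    """Mirror each row left to right."""
    return [list(reversed(row)) for row in image]


def adjust_brightness(image, factor):
    """Scale every pixel by `factor`, clamping the result to 0-255."""
    return [[max(0, min(255, round(p * factor))) for p in row] for row in image]


def make_query(image):
    """Chain two edits to turn a reference image into a manipulated query."""
    return adjust_brightness(flip_horizontal(image), 1.5)


reference = [
    [10, 20, 30],
    [40, 50, 200],
]
query = make_query(reference)
print(query)
```

Chaining several such edits is what makes a query image progressively harder to trace back to its reference.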

Further, with the help of third-party annotators, Facebook transformed a small subset of images so that the selection better represents how a human user would edit images. The Image Similarity Challenge now lets participants put their image-matching techniques to the test on this data set. The challenge has been accepted for the NeurIPS 2021 competition track.

Contestants from individual, academic and industrial backgrounds are tasked with finding the source reference image for the query images in the data set. Baseline methods from the instance-matching literature are included. The researchers worked with several image-matching experts from the Czech Technical University in Prague to choose the right evaluation metrics. Access to the data set is restricted to participants who agree to the terms of use governing how they use, store and handle the data. After the competition, the data set and ground truth will be made available on a public website.
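
The matching task described above can be sketched as a nearest-neighbour search over image descriptors. The example below is a simplified illustration in plain Python, not one of the challenge baselines. The three-dimensional descriptor vectors, the `find_source` helper and the 0.9 threshold are assumptions; a real system would compute descriptors with a trained model and search roughly a million references with an approximate index.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_source(query_vec, references, threshold=0.9):
    """Return (reference_id, score) for the best match.

    The id is None when the best score falls below `threshold`, which
    models a query that has no source among the reference images.
    """
    best_id, best_score = None, -1.0
    for ref_id, ref_vec in references.items():
        score = cosine_similarity(query_vec, ref_vec)
        if score > best_score:
            best_id, best_score = ref_id, score
    if best_score < threshold:
        return None, best_score
    return best_id, best_score


# Hypothetical descriptors, made up for this example.
references = {"ref_a": [1.0, 0.0, 0.0], "ref_b": [0.0, 1.0, 0.0]}
edited_copy = [0.9, 0.1, 0.0]   # close to ref_a
unrelated = [0.0, 0.0, 1.0]     # matches nothing

print(find_source(edited_copy, references))
print(find_source(unrelated, references))
```

The threshold matters because only some query images are derived from a reference image, so a matcher also has to decide when to report no match at all.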

Facebook AI is confident that the Image Similarity Challenge will speed up progress across the industry in dealing with harmful or malicious content, and that it will help advance the similarity detection domain by providing a data set made explicitly to aid researchers in tackling this problem. The data set also provides a benchmark for work in image similarity detection.
