This page lists some of the most important recent AI research papers, with links to each full study, as part of a hand-picked roundup of recent AI and data science developments.

ClipCap: CLIP Prefix for Image Captioning

Image captioning is one of the most critical tasks in vision-language understanding: given an input image, the model predicts a textual caption that describes it. In this paper, the researchers use CLIP encoding as a prefix to the caption by employing a simple mapping network, and then fine-tune a language model to generate the caption text. The recently proposed CLIP model contains rich semantic features trained with textual context, which makes it well suited to vision-language perception. Their key idea is that, together with a pre-trained language model (GPT-2), they obtain a broad understanding of both visual and textual data. As a result, their method requires only a short amount of training to produce a competent captioning model, and it efficiently generates meaningful captions for large and varied datasets without additional annotations or pre-training.

Surprisingly, their method works well even when only the mapping network is trained, while both CLIP and the language model remain frozen. This allows a lighter architecture with fewer trainable parameters. Through quantitative evaluation, the researchers show that their model achieves results comparable to state-of-the-art methods on the challenging Conceptual Captions and nocaps datasets, while being simpler, faster, and lighter.
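
To make the pipeline concrete, here is a minimal sketch of the prefix-mapping idea in PyTorch. The prefix length, MLP shape, and embedding dimensions below are illustrative assumptions rather than the paper's exact configuration, and the CLIP embedding is stubbed with a random tensor so the sketch is self-contained.

```python
# Minimal sketch of the ClipCap idea (illustrative, not the authors' code).
import torch
import torch.nn as nn
from transformers import GPT2LMHeadModel, GPT2Tokenizer

class MappingNetwork(nn.Module):
    """Maps a CLIP image embedding to a sequence of GPT-2 prefix embeddings."""
    def __init__(self, clip_dim=512, gpt_dim=768, prefix_len=10):  # assumed sizes
        super().__init__()
        self.prefix_len, self.gpt_dim = prefix_len, gpt_dim
        self.mlp = nn.Sequential(
            nn.Linear(clip_dim, gpt_dim * prefix_len),
            nn.Tanh(),
            nn.Linear(gpt_dim * prefix_len, gpt_dim * prefix_len),
        )

    def forward(self, clip_embedding):             # (batch, clip_dim)
        prefix = self.mlp(clip_embedding)          # (batch, prefix_len * gpt_dim)
        return prefix.view(-1, self.prefix_len, self.gpt_dim)

gpt2 = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
mapper = MappingNetwork()

# In the real model this comes from a frozen CLIP image encoder;
# here it is a random stand-in so the sketch runs on its own.
clip_embedding = torch.randn(1, 512)
prefix_embeds = mapper(clip_embedding)             # (1, 10, 768)

# The prefix embeddings are fed to GPT-2 in place of token embeddings,
# and the caption would then be decoded autoregressively from the logits.
with torch.no_grad():
    logits = gpt2(inputs_embeds=prefix_embeds).logits
first_token = logits[:, -1, :].argmax(dim=-1)
print(tokenizer.decode(first_token))
```

In the lighter variant described above, only the mapping network's parameters would receive gradients during training; CLIP and GPT-2 stay frozen.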

Paper: ClipCap: CLIP Prefix for Image Captioning

The code and a Colab demo are publicly available.

An empirical analysis of compute-optimal large language model training

Researchers investigate the optimal model size and dataset size for training a transformer language model under a given compute budget. The recent focus on scaling up language models has left current large language models significantly undertrained. By training over 400 language models, ranging from 70 million to over 16 billion parameters, on 5 to 500 billion tokens, they found that for compute-optimal training, the model size and the training dataset size should be scaled equally: every time the model size doubles, the number of training tokens should also double.
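
As a rough illustration of what this rule implies, the sketch below uses the common approximation that training cost is about 6 FLOPs per parameter per token, together with an assumed ratio of roughly 20 tokens per parameter at the optimum (implied by Chinchilla's 70 billion parameters and about 1.4 trillion training tokens). Both numbers are simplifications for illustration, not the paper's full fitted scaling laws.

```python
# Hypothetical helper illustrating compute-optimal scaling (not the paper's code).
# Assumes training FLOPs C ~ 6 * N * D, with N parameters and D tokens,
# and that D/N stays at a fixed ratio at the compute optimum.

def compute_optimal(flops_budget: float, tokens_per_param: float = 20.0):
    """Return (params, tokens) that jointly scale with the compute budget."""
    # C = 6 * N * D and D = ratio * N  =>  N = sqrt(C / (6 * ratio))
    params = (flops_budget / (6.0 * tokens_per_param)) ** 0.5
    tokens = tokens_per_param * params
    return params, tokens

for budget in (1e21, 4e21, 16e21):
    n, d = compute_optimal(budget)
    print(f"C = {budget:.0e} FLOPs -> N ~ {n:.2e} params, D ~ {d:.2e} tokens")
# Each 4x increase in compute doubles both the optimal model size and the
# optimal token count, i.e. model and data scale together.
```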

The researchers tested this prediction by training a compute-optimal model called Chinchilla, which has 70 billion parameters and is trained on four times as much data as Gopher, using the same compute budget. As a result, Chinchilla consistently outperforms Gopher, GPT-3, Jurassic-1, and Megatron-Turing NLG on a wide range of downstream evaluation tasks. As a highlight, Chinchilla reaches an average accuracy of 67.5 per cent on the MMLU benchmark, more than 7 per cent better than Gopher's.

Paper: An empirical analysis of compute-optimal large language model training

Restoring and attributing ancient texts using deep neural networks

Ancient history relies on disciplines such as epigraphy, the study of inscriptions, for evidence of how people in the past thought, spoke, lived, and wrote. But over the centuries, many inscriptions have been damaged to the point of illegibility, transported far from their original location, and their date of writing remains uncertain.

Here, the researchers present Ithaca, a deep neural network for restoring the text of ancient Greek inscriptions and determining where and when they were made. Ithaca is designed to assist and extend the work of historians, with an architecture built around collaboration, decision support, and interpretability. On its own, Ithaca restores damaged texts with 62 per cent accuracy, but when historians use Ithaca, their accuracy rises from 25 per cent to 72 per cent, showing that this research tool works best in combination with human expertise. In addition, Ithaca can attribute inscriptions to their original location with 71 per cent accuracy and date them to within 30 years of their ground-truth date ranges. Historians have already used the tool to re-date key texts from Classical Athens, contributing to debates in ancient history.

Furthermore, this research shows how models like Ithaca can unlock the potential of collaboration between AI and historians, which could transform how we study and write about one of the most important periods in human history.

Paper: Restoring and attributing ancient texts using deep neural networks
