These are some of the most exciting AI research papers published in the last year, combining advances in artificial intelligence (AI) with data science. The list is ordered chronologically, and each entry links to a longer article.

Retrospective Loss: Looking Back to Improve Training of Deep Neural Networks

Deep neural networks (DNNs) have facilitated advancements in various fields. To better utilise the prior knowledge available in past model states during training, the authors propose a new retrospective loss in this study. Minimised together with the task-specific loss, the retrospective loss pulls the current parameter state towards the optimal state while pushing it away from the parameter state at an earlier point in training. To verify that the proposed loss leads to improved performance across input domains, tasks, and architectures, the researchers analyse the approach and conduct comprehensive experiments in several domains, including images, speech, text, and graphs.
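
The core idea fits in a few lines. Below is a minimal PyTorch sketch of such a retrospective term, under the assumption that it is computed on model outputs; the coefficient kappa, the choice of L2 norm, and the function name are illustrative defaults rather than the authors' exact formulation.

    import torch

    def retrospective_loss(curr_out, past_out, target, kappa=4.0):
        # Hedged sketch: reward closeness to the ground truth while
        # penalising closeness to a frozen snapshot of the model taken
        # at an earlier training step. Added on top of the task loss.
        to_target = torch.norm(curr_out - target, p=2)
        from_past = torch.norm(curr_out - past_out, p=2)
        return kappa * to_target - from_past

In a training loop, one would periodically refresh the snapshot that produces past_out and add this term to the task-specific loss, typically after a short warm-up phase so the snapshot differs meaningfully from the current model.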

Improving Dialog Evaluation with a Multi-reference Adversarial Dataset and Large Scale Pretraining

Model-based dialogue evaluation metrics, such as ADEM, RUBER, and the more recent BERT-based metrics, are gaining attention. These models aim to assign high scores to relevant responses and low scores to irrelevant ones. Ideally, such models would be trained with multiple relevant and multiple irrelevant responses per context. In practice, however, owing to the lack of publicly available data of this kind, current models are typically trained using a single relevant response and numerous randomly picked responses from unrelated contexts (random negatives).

The researchers present the DailyDialog++ dataset to facilitate improved training and rigorous evaluation of model-based metrics. Using this dataset, they first demonstrate that n-gram-based and embedding-based metrics do not distinguish relevant responses from random negatives, even when several correct references are available. Model-based metrics fare better on random negatives but degrade significantly on adversarial examples. To determine whether large-scale pretraining is beneficial, the study proposes a new BERT-based evaluation metric called DEB, pre-trained on 727M Reddit conversations and then fine-tuned on DailyDialog++. Compared to other models, DEB performs substantially better on random negatives (88.27% accuracy) and correlates better with human judgements. However, its performance again drops significantly when tested on adversarial responses, demonstrating that even a massive pre-trained evaluation model cannot withstand the adversarial examples in their dataset.
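
DEB scores a context-response pair using BERT's next-sentence-prediction head. As a rough illustration of that scoring mechanism (not the authors' released model), the sketch below uses a generic Hugging Face BERT checkpoint; the checkpoint name, the example strings, and the softmax-probability score are assumptions made for demonstration.

    import torch
    from transformers import BertTokenizer, BertForNextSentencePrediction

    # A generic checkpoint stands in for the pre-trained, fine-tuned metric.
    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")
    model.eval()

    context = "Did you finish the report?"
    response = "Yes, I emailed it to you this morning."

    inputs = tokenizer(context, response, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # index 0: response follows context

    score = torch.softmax(logits, dim=-1)[0, 0].item()
    print(f"relevance score: {score:.3f}")

A metric trained this way can then be compared against human judgements, which is how DEB's correlation numbers are reported.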

GraphGen: A Scalable Approach to Domain-agnostic Labeled Graph Generation

There is a wealth of research on generative graph models in the data mining literature. Older methods generate structures that comply with a pre-decided distribution, whereas newer methods have shifted to learning this distribution directly from the data. Learning-based approaches have improved quality, but several difficulties remain:

  • First, the computational complexity of learning graph distributions makes many methods impractical for large graph databases.
  • Second, while many methods can learn the structure, they often ignore node and edge labels, which encode crucial semantic information and influence the structure itself.
  • Third, current methods frequently rely on domain-specific rules that are not applicable outside that domain.
  • Fourth, evaluation of existing techniques is often inadequate, relying on unreliable evaluation criteria or restricted to synthetic or tiny datasets.

To address these shortcomings, the authors of this study develop a domain-agnostic method they name GraphGen. GraphGen converts graphs into sequences using minimum DFS codes, canonical labels that capture the graph structure together with the label information. A novel LSTM architecture is then used to learn the complex joint distribution of structural and semantic labels. Extensive experiments on million-sized, real-world graph datasets reveal that GraphGen is, on average, four times faster than state-of-the-art techniques while producing graphs of superior quality across a comprehensive set of eleven different metrics.
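
To make the serialisation step concrete, here is a small Python sketch of the DFS-code idea: each edge becomes a tuple (t_u, t_v, L_u, L_e, L_v) of DFS discovery times plus node and edge labels. GraphGen uses the minimum DFS code, the lexicographically smallest such sequence over all traversals; for brevity this sketch emits the code for one fixed traversal order only, and the toy graph and its labels are invented for illustration.

    def dfs_code(adj, node_labels, edge_labels, start):
        # Serialise a labelled graph as edge tuples along one DFS traversal.
        times = {start: 0}   # node -> DFS discovery time
        seen = set()         # edges already emitted
        code = []

        def visit(u):
            for v in sorted(adj[u]):
                edge = frozenset((u, v))
                if edge in seen:
                    continue
                seen.add(edge)
                if v not in times:     # tree edge: discover v, then recurse
                    times[v] = len(times)
                    code.append((times[u], times[v], node_labels[u],
                                 edge_labels[edge], node_labels[v]))
                    visit(v)
                else:                  # back edge to an already-seen node
                    code.append((times[u], times[v], node_labels[u],
                                 edge_labels[edge], node_labels[v]))

        visit(start)
        return code

    # Toy labelled triangle (think of a tiny molecule):
    adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
    node_labels = {0: "C", 1: "C", 2: "O"}
    edge_labels = {frozenset((0, 1)): "single",
                   frozenset((1, 2)): "single",
                   frozenset((0, 2)): "double"}
    print(dfs_code(adj, node_labels, edge_labels, start=0))
    # [(0, 1, 'C', 'single', 'C'), (1, 2, 'C', 'single', 'O'), (2, 0, 'O', 'double', 'C')]

A sequence model such as GraphGen's LSTM can then be trained to generate these tuples one at a time, jointly predicting structure and labels.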

