These are the year's most intriguing AI research publications, covering advances in artificial intelligence (AI) and data science. The list is organized chronologically, and each entry links to a longer article.
SyMCoM - Syntactic Measure of Code Mixing: A Study of English-Hindi Code-Mixing
Code mixing is the linguistic phenomenon in which speakers of more than one language switch between those languages within an utterance. Recent work on code-mixing in computational settings has trained NLP models on code-mixed text from social media. In addition, measures based on language ID (LID) tags, such as the Code-Mixing Index (CMI), have been proposed to capture the different kinds of code mixing within and across corpora.
In this work, the researchers examine a set of English-Hindi code-mixed datasets from a syntactic point of view and propose SyMCoM, an indicator of syntactic variety in code-mixed text with intuitive theoretical bounds. They train a state-of-the-art English-Hindi PoS tagger, reaching 93.4% accuracy, so that PoS tags can be computed reliably over a corpus. They then demonstrate the usefulness of SyMCoM by applying it to different syntactic categories across several datasets and by using the measure to compare those datasets.
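To make the idea concrete, here is a minimal sketch of a SyMCoM-style score for a single syntactic category, assuming the score is the normalized difference between English and Hindi token counts for that PoS tag; the function name, tag set, and input format are illustrative, not the authors' code.

```python
from collections import Counter

def symcom_category(pos_lang_pairs, category):
    """Normalized language imbalance for one PoS category.

    pos_lang_pairs: list of (pos_tag, lang_id) tuples for a sentence,
    e.g. [("NOUN", "en"), ("VERB", "hi"), ...].
    Returns a value in [-1, 1]: +1 if all tokens of this category are
    English, -1 if all are Hindi, 0 for an even split.
    """
    counts = Counter(lang for pos, lang in pos_lang_pairs if pos == category)
    en, hi = counts.get("en", 0), counts.get("hi", 0)
    if en + hi == 0:
        return 0.0
    return (en - hi) / (en + hi)

# Example: a code-mixed sentence whose nouns lean English while verbs are Hindi.
tokens = [("NOUN", "en"), ("NOUN", "en"), ("NOUN", "hi"),
          ("VERB", "hi"), ("ADP", "hi")]
print(symcom_category(tokens, "NOUN"))  # 0.33 -> nouns lean English
print(symcom_category(tokens, "VERB"))  # -1.0 -> verbs are entirely Hindi
```

The intuitive bounds follow directly from the ratio: the score stays in [-1, 1] regardless of sentence length, which is what makes per-category scores comparable across datasets.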
The Inefficiency of Language Models in Scholarly Retrieval: An Experimental Walk-through
Language models are becoming more common in AI-powered scientific information retrieval (IR) systems. This research compares how well popular scientific language models handle (a) short query texts and (b) textual neighbours. Even under the most relaxed conditions, the experiments show that the models fail to retrieve relevant documents for short query texts.
The researchers also use textual neighbours, generated by making slight changes to the original text, to show that not all changes produce close neighbours in the embedding space. A detailed categorization yields many orthographically and semantically linked, somewhat related, and wholly unconnected neighbours. Overall, the surface form of the text has a greater influence on retrieval performance than its semantics.
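As a rough illustration of this kind of experiment, the sketch below embeds a toy corpus with an off-the-shelf sentence encoder and ranks it against a short, keyword-like query and an orthographically perturbed "neighbour" of a title. The model name and corpus are placeholders, not the systems or data evaluated in the paper.

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # assumed encoder; any dense text LM would do

# A toy corpus of paper titles standing in for a scholarly collection.
corpus = [
    "Attention Is All You Need",
    "BERT: Pre-training of Deep Bidirectional Transformers",
    "A Survey of Dialogue State Tracking Methods",
    "Graph Neural Networks for Molecular Property Prediction",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model, not the ones benchmarked
doc_emb = model.encode(corpus, normalize_embeddings=True)

def retrieve(query, k=2):
    """Rank documents by cosine similarity to a (possibly very short) query."""
    q = model.encode([query], normalize_embeddings=True)
    scores = (doc_emb @ q.T).ravel()      # normalized vectors, so dot product = cosine
    order = np.argsort(-scores)[:k]
    return [(corpus[i], float(scores[i])) for i in order]

# A short keyword query of the kind the study probes,
# and a lightly misspelled "textual neighbour" of a title.
print(retrieve("dialogue state tracking"))
print(retrieve("Atention Is All You Ned"))  # surface-form overlap still drives the ranking
```

Running variations of the second query (misspellings, word drops, synonym swaps) is a quick way to see how strongly retrieval follows surface form rather than meaning, which is the paper's central observation.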
Dialogue State Tracking (DST) is judged mainly by Joint Goal Accuracy (JGA), the fraction of turns in which the predicted dialogue state exactly matches the ground truth. In DST, the dialogue (or belief) state at a given turn typically accumulates all of the user's intentions expressed up to that point. Because the belief state is cumulative, it is hard to make a correct prediction after a wrong one, so while JGA is a useful metric, it can be overly harsh and understate a DST model's real capability. Moreover, because of annotation inconsistencies, improving JGA can hurt turn-level (non-cumulative) belief state prediction. Using JGA as the only criterion for model selection may therefore not be the best choice in all cases.
In this paper, the researchers discuss the strengths and weaknesses of the evaluation metrics currently used for DST and then propose a new metric, Flexible Goal Accuracy (FGA), to address these problems. FGA is a generalization of JGA. Unlike JGA, it penalizes wrong predictions flexibly: a prediction that is locally correct, meaning the mistake originated in an earlier turn, receives a reduced penalty. In this way FGA can assess both cumulative and turn-level prediction performance and gives a better picture of progress than the existing metrics. The researchers also show that FGA is a better discriminator of how well a DST model works.
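To ground the distinction, here is a toy sketch contrasting JGA with a much-simplified FGA-like score. The partial-credit rule and decay factor below are illustrative assumptions, not the exact formula proposed in the paper: the only property carried over is that a wrong turn whose error is inherited from an earlier turn is penalized less than a fresh mistake.

```python
def joint_goal_accuracy(gold_states, pred_states):
    """JGA: fraction of turns whose predicted belief state matches the gold state exactly."""
    correct = sum(g == p for g, p in zip(gold_states, pred_states))
    return correct / len(gold_states)

def flexible_goal_accuracy(gold_states, pred_states, discount=0.5):
    """A hypothetical simplification of the FGA idea (not the paper's formula):
    a wrong turn earns partial credit, growing with how long ago the error first
    appeared, whenever the turn is 'locally correct' -- every mismatched slot was
    already mismatched at the previous turn, so the mistake is inherited."""
    total = 0.0
    error_age = 0  # turns since the current error chain started
    for t, (gold, pred) in enumerate(zip(gold_states, pred_states)):
        if gold == pred:
            total += 1.0
            error_age = 0
            continue
        wrong_slots = {s for s in set(gold) | set(pred) if gold.get(s) != pred.get(s)}
        inherited = t > 0 and all(
            gold_states[t - 1].get(s) != pred_states[t - 1].get(s) for s in wrong_slots
        )
        if inherited:
            error_age += 1
            total += 1.0 - discount ** error_age  # time-decayed penalty for an inherited error
        else:
            error_age = 1  # fresh mistake at this turn: full penalty (no credit)
    return total / len(gold_states)

# Toy dialogue: the model slips at the second turn and carries that error forward.
gold = [{"area": "north"},
        {"area": "north", "food": "thai"},
        {"area": "north", "food": "thai", "price": "cheap"}]
pred = [{"area": "north"},
        {"area": "north", "food": "indian"},
        {"area": "north", "food": "indian", "price": "cheap"}]
print(joint_goal_accuracy(gold, pred))     # 0.33: two of three turns are wrong
print(flexible_goal_accuracy(gold, pred))  # higher: the last turn's error is inherited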