For software developers, finding and fixing defects in code is a time-consuming and challenging task. Can deep learning solve this issue and speed up software development?

In the paper "Self-Supervised Bug Detection and Repair", the researchers present BugLab, a deep learning approach that learns to find and fix bugs without labelled data by playing a "hide and seek" game.

Objective

The goal of this work is to develop artificial intelligence (AI) that can automatically detect and repair bugs like the two pictured above, which look trivial but are often hard to find. Automating this task would let engineers focus on more important (and more interesting) parts of software development. However, finding bugs, even seemingly minor ones, is difficult because most code comes with no formal specification of its intended behaviour. Teaching machines to recognise bugs automatically is made harder still by a lack of training data: while large volumes of source code are freely available on sites such as GitHub, only a few small datasets of explicitly annotated bugs exist.

How Does BugLab Work?

To address this issue, the researchers propose BugLab, which uses two competing models that learn by playing a "hide and seek" game inspired by generative adversarial networks (GANs). Given existing code that is presumed to be correct, a bug selector model decides whether to introduce a bug, where to introduce it, and what form it should take (e.g., replacing a specific "+" with a "-"). The code is then rewritten according to the selector's choice. Next, the bug detector tries to determine whether the code contains a bug and, if so, to locate and fix it.
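To make the selector's "rewrite" step concrete, here is a minimal Python sketch using the standard `ast` module (Python 3.9+ for `ast.unparse`). It assumes a single hypothetical rewrite family, swapping `+` and `-`, and a toy selector that chooses uniformly between "no bug" and every rewritable location. BugLab's real selector is a learned neural model with a larger set of rewrites; the function names here are illustrative only.

```python
import ast
import random
from typing import Optional

# Hypothetical rewrite family: swap "+" and "-". This is only one kind of
# bug-introducing rewrite; a real selector would support many more.
SWAPS = {ast.Add: ast.Sub, ast.Sub: ast.Add}


def candidate_locations(tree: ast.AST) -> list[ast.BinOp]:
    """Return every binary operation whose operator this rewrite can swap."""
    return [node for node in ast.walk(tree)
            if isinstance(node, ast.BinOp) and type(node.op) in SWAPS]


def apply_rewrite(source: str, location: int) -> str:
    """Swap the operator at the given candidate index and return new source."""
    tree = ast.parse(source)
    target = candidate_locations(tree)[location]
    target.op = SWAPS[type(target.op)]()  # e.g. Add() becomes Sub()
    return ast.unparse(tree)


def select_bug(source: str, rng: random.Random) -> tuple[str, Optional[int]]:
    """Toy selector: pick "no bug" (None) or one rewrite location uniformly.

    A learned selector would score each option with a neural model instead.
    Returns the (possibly rewritten) code and the chosen location.
    """
    n = len(candidate_locations(ast.parse(source)))
    choice = rng.choice([None] + list(range(n)))
    if choice is None:
        return source, None
    return apply_rewrite(source, choice), choice
```

For example, `apply_rewrite("def f(a, b):\n    return a + b", 0)` yields code whose body is `return a - b`; the location index is exactly the label the detector must later recover.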

The two models are trained jointly on millions of code snippets without labelled data, i.e., through self-supervised learning. The bug selector tries to "hide" interesting bugs in each snippet, while the detector tries to beat the selector by finding and fixing them. Through this process, the detector becomes better at finding and fixing bugs, while the selector learns to generate increasingly difficult training examples.
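The detector's side of the game can be expressed as a training loss. The framework-free Python sketch below assumes that the detector outputs a raw score for every candidate location plus a special `NO_BUG` option and, for each location, scores over possible repairs. The loss is the cross-entropy of the true location plus, when a bug was actually inserted, the cross-entropy of the correct repair. `NO_BUG` and `detector_loss` are hypothetical names, and in BugLab the scores would come from neural networks rather than being passed in by hand.

```python
import math
from typing import Optional

NO_BUG = -1  # hypothetical sentinel location meaning "this snippet has no bug"


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def detector_loss(loc_scores: dict[int, float],
                  repair_scores: dict[int, list[float]],
                  true_loc: int,
                  true_repair: Optional[int]) -> float:
    """Localisation + repair cross-entropy for one (possibly rewritten) snippet.

    loc_scores maps each candidate location (and NO_BUG) to a raw score;
    repair_scores maps each real location to scores over its possible repairs.
    true_loc / true_repair are the labels produced by the bug selector.
    """
    keys = list(loc_scores)
    loc_probs = dict(zip(keys, softmax([loc_scores[k] for k in keys])))
    loss = -math.log(loc_probs[true_loc])
    if true_loc != NO_BUG:
        # Only a buggy snippet has a repair to predict.
        loss -= math.log(softmax(repair_scores[true_loc])[true_repair])
    return loss
```

With uniform scores over `NO_BUG` and one location with two equally likely repairs, the loss for a bug at location 0 with repair 1 is `2 * log 2` (about 1.386), while a confident, correct detector drives the loss toward zero.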

This approach is conceptually similar to a GAN. However, the bug selector does not generate new code; it rewrites existing code that is assumed to be correct. Moreover, code rewrites are by definition discrete, so gradients cannot be passed from the detector back to the selector. Finally, unlike with GANs, the main goal is a good bug detector (analogous to a GAN's discriminator) rather than a good bug selector (analogous to a GAN's generator).
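Because the rewrite is a discrete choice, the selector needs a training signal that does not flow through the detector. One standard option, assumed here for illustration (the paper's exact selector update may differ), is a REINFORCE-style policy-gradient step in which the selector's reward is the detector's loss on the sampled rewrite. The Python sketch below uses a plain list of logits and a hypothetical `detector_loss_fn` callback; the detector is queried only for a scalar.

```python
import math
import random
from typing import Callable


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(logits: list[float],
                   detector_loss_fn: Callable[[int], float],
                   rng: random.Random,
                   lr: float = 0.1,
                   baseline: float = 0.0) -> tuple[list[float], int, float]:
    """One REINFORCE update of a toy selector over discrete rewrite options.

    The selector samples a rewrite index, is rewarded with the detector's loss
    on that rewrite (it wants the detector to fail), and moves its logits by
    lr * (reward - baseline) * d/dlogits log pi(choice). No gradient flows
    through the detector or through the rewrite itself.
    """
    probs = softmax(logits)
    choice = rng.choices(range(len(logits)), weights=probs)[0]
    reward = detector_loss_fn(choice)
    advantage = reward - baseline
    # d log softmax(logits)[choice] / d logit_j = 1[j == choice] - probs[j]
    new_logits = [x + lr * advantage * ((1.0 if j == choice else 0.0) - p)
                  for j, (x, p) in enumerate(zip(logits, probs))]
    return new_logits, choice, reward
```

With a toy detector that almost always catches rewrite 0 (loss 0.1) but misses rewrite 1 (loss 2.0), a few thousand steps shift the selector's probability mass toward rewrite 1, i.e. toward the bug the detector finds harder. A non-zero `baseline` (for example, a running mean of rewards) is a common way to reduce the variance of this estimator.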

Conclusions

Developing deep learning models that can detect and repair bugs is an important challenge in AI research. A full solution requires human-level understanding of program code and of contextual signals such as variable names and comments. The BugLab work shows that by training two models jointly to play a hide-and-seek game, computers can be taught to be effective bug detectors. However, much more work is needed before such bug detectors are reliable enough for practical use.


Source

https://www.microsoft.com/en-us/research/publication/self-supervised-bug-detection-and-repair-2/
