Here are some of the most important recent AI research articles: a hand-picked selection of the latest advances in AI and data science, each with a link to a more in-depth report.

ADOP: Approximate Differentiable One-Pixel Point Rendering

In this paper, the researchers present ADOP, a new point-based, differentiable neural rendering pipeline. Like other neural renderers, their system takes calibrated camera images and a proxy for the scene's geometry, here a point cloud. The point cloud is augmented with learned feature vectors such as colours, and a deep neural network fills in the gaps and shades each output pixel. The rasterizer renders points as one-pixel splats, which makes it very fast and allows gradients to be computed quickly for all relevant input parameters. In the spirit of inverse rendering, the researchers use their renderer to refine its own inputs, reducing errors and improving the quality of the output.
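To make the two stages concrete, here is a minimal sketch of the idea in PyTorch: learned per-point features splatted as one-pixel points, followed by a small CNN that fills the holes and shades each pixel. This is an illustrative approximation, not the authors' implementation; all class names, shapes, and hyperparameters are assumptions.

```python
import torch
import torch.nn as nn

class OnePixelSplatter(nn.Module):
    """Writes learned per-point features into a single pixel each (one-pixel splats)."""
    def __init__(self, num_points, feat_dim=8):
        super().__init__()
        # Learned feature vectors attached to the point cloud (colour-like descriptors).
        self.features = nn.Parameter(torch.randn(num_points, feat_dim) * 0.1)

    def forward(self, points_2d, image_size):
        h, w = image_size
        feat_dim = self.features.shape[1]
        # Snap projected point coordinates to integer pixel locations.
        px = points_2d[:, 0].clamp(0, w - 1).long()
        py = points_2d[:, 1].clamp(0, h - 1).long()
        flat_idx = py * w + px
        # Scatter features into a flat (H*W, C) canvas; differentiable w.r.t. the features.
        canvas = torch.zeros(h * w, feat_dim)
        canvas = canvas.index_put((flat_idx,), self.features)
        return canvas.t().reshape(feat_dim, h, w)

class ShadingNet(nn.Module):
    """Small CNN that fills the gaps between splats and predicts an RGB value per pixel."""
    def __init__(self, feat_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(feat_dim, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, feature_image):
        return self.net(feature_image.unsqueeze(0)).squeeze(0)

# Toy forward pass: 1,000 already-projected points rendered into a 64x64 image.
splatter = OnePixelSplatter(num_points=1000)
shader = ShadingNet()
points_2d = torch.rand(1000, 2) * 64
rgb = shader(splatter(points_2d, (64, 64)))   # shape: 3 x 64 x 64
```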

In particular, the researchers can optimize structural parameters such as the camera pose, lens distortions, point positions and features, and a neural environment map. They can also optimize photometric parameters such as the camera response function, vignetting, and per-image exposure and white balance. Because the pipeline includes photometric parameters like exposure and the camera response function, their system can handle images with varying exposure and white balance and produce high-dynamic-range output.
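A hedged sketch of how such joint optimization can be set up, assuming a PyTorch-style training loop: every structural and photometric quantity is registered as a learnable parameter, so gradients from the rendering loss reach the camera pose, the point positions, and the per-image exposure alike. The parameter names and shapes below are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

class SceneParameters(nn.Module):
    """All quantities the renderer can refine, exposed as learnable parameters."""
    def __init__(self, num_points, num_images):
        super().__init__()
        # Structural parameters (illustrative shapes).
        self.point_positions = nn.Parameter(torch.randn(num_points, 3))
        self.camera_poses = nn.Parameter(torch.zeros(num_images, 6))   # axis-angle rotation + translation
        # Photometric parameters.
        self.log_exposure = nn.Parameter(torch.zeros(num_images))      # per-image exposure
        self.white_balance = nn.Parameter(torch.ones(num_images, 3))   # per-image RGB gains

    def apply_photometrics(self, linear_rgb, image_idx):
        # Scale a rendered linear-RGB image by exposure and white balance; a full
        # system would also apply a learned camera response function and vignetting.
        exposure = self.log_exposure[image_idx].exp()
        return linear_rgb * exposure * self.white_balance[image_idx].view(3, 1, 1)

params = SceneParameters(num_points=1000, num_images=10)
optimizer = torch.optim.Adam(params.parameters(), lr=1e-3)
# In training, a differentiable renderer would turn `params` into an image, the
# photometric model would be applied, and a pixel-wise loss against the captured
# photo would update structural and photometric parameters jointly.
```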

The researchers show that this input optimization pays off: they obtain high-quality renders even when the input is difficult, for example when the camera calibration is off, the proxy geometry is wrong, or the exposure varies between images. As a result, the reconstruction only needs a simpler, and therefore faster, deep neural network. Combined with the fast point rasterization, ADOP can render models with more than 100 million points in real time.

Paper: ADOP: Approximate Differentiable One-Pixel Point Rendering

Click here for the code.

The Cocktail Fork Problem: Three-Stem Audio Separation for Real-World Soundtracks

The goal of the cocktail party problem is to isolate any source of interest in a complex acoustic scene, and it has long been a source of inspiration for research into audio source separation. Recent work has focused chiefly on separating speech from noise, speech from speech, musical instruments from each other, or sound events from each other. This paper, in contrast, considers separating an audio mixture (such as a movie soundtrack) into speech, music, and sound effects (which include ambient noise and natural sound events).

This paper calls this task the "cocktail fork problem" and introduces the Divide and Remaster (DnR) dataset to encourage research on the topic. DnR is built from three well-known audio datasets (LibriSpeech, FMA, and FSD50K), with care taken to ensure that the source overlap and relative loudness match professionally produced content, and it is released at CD quality. The researchers benchmark standard source separation algorithms on DnR and then introduce a new multi-resolution model to better handle the different characteristics of the three source types. Compared to the mixture, their best model achieves SI-SDR improvements of 11.0 dB for music, 11.2 dB for speech, and 10.8 dB for sound effects.
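Those numbers are SI-SDR improvements, i.e. the scale-invariant signal-to-distortion ratio of the separated stem minus that of the unprocessed mixture, both measured against the ground-truth stem. For reference, here is a minimal NumPy implementation of the standard SI-SDR definition; the paper's exact evaluation code may differ.

```python
import numpy as np

def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB between an estimated and a reference signal (1-D arrays)."""
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Project the estimate onto the reference to remove any scale difference.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    noise = estimate - target
    return 10 * np.log10((np.dot(target, target) + eps) / (np.dot(noise, noise) + eps))

# SI-SDR improvement for a stem = si_sdr(separated_stem, true_stem) - si_sdr(mixture, true_stem)
```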

Paper: The Cocktail Fork Problem: Three-Stem Audio Separation for Real-World Soundtracks

Click here for the code.

(Style)CLIPDraw: Coupling Content and Style in Text-to-Drawing Synthesis

This paper describes CLIPDraw, an algorithm that synthesizes novel drawings from natural-language input. CLIPDraw requires no training; instead, a pre-trained CLIP language-image encoder is used as a metric for how well a drawing matches its description. Importantly, CLIPDraw operates on vector strokes rather than pixel images, which biases the results toward simple shapes that people can recognize. The researchers compare CLIPDraw to other synthesis-through-optimization methods and highlight some interesting behaviours: it can resolve ambiguous text in multiple ways, reliably produce drawings in different styles, and scale from simple to complex visual representations as the number of strokes increases.
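A condensed sketch of that optimization loop, assuming the openai/CLIP package: stroke parameters are updated to maximize the CLIP similarity between the rendered drawing and the text prompt. The `render_strokes` function below is a dummy stand-in for the differentiable vector-graphics rasterizer CLIPDraw actually uses (diffvg), included only so the skeleton runs end to end.

```python
import torch
import clip   # pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

prompt = "a watercolor painting of a fox"
with torch.no_grad():
    text_features = model.encode_text(clip.tokenize([prompt]).to(device))
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)

# Stroke parameters (control points, widths, colours) are the only variables
# being optimised; no network weights are trained.
stroke_params = torch.randn(64, 10, requires_grad=True, device=device)
optimizer = torch.optim.Adam([stroke_params], lr=0.1)

def render_strokes(params):
    # Dummy differentiable "renderer": upsamples the parameters into a 3x224x224
    # image so the loop runs; CLIPDraw rasterises real Bezier strokes with diffvg.
    img = params.view(1, 1, 64, 10).repeat(1, 3, 1, 1)
    return torch.nn.functional.interpolate(img, size=(224, 224), mode="bilinear")

for step in range(250):
    image = render_strokes(stroke_params)
    image_features = model.encode_image(image)
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    loss = -(image_features @ text_features.t()).mean()   # maximise CLIP similarity
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```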

With the release of technologies like the CLIP image-text encoder, machine learning has made it much easier to create images that fit a given text description. However, current methods give artists little control over the image's style. The researchers therefore propose StyleCLIPDraw, which adds a style loss to the CLIPDraw text-to-drawing synthesis model so that drawings can be controlled by both text and a style reference. When decoupled style transfer is applied to an already generated image, it only changes the texture; their coupled approach, in contrast, captures a style in both texture and shape. This suggests that the style of a drawing is coupled to the drawing process itself.
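One way to realise the coupled approach, shown as a sketch: add a style term to the CLIP loss inside the same optimization loop, rather than restyling the finished drawing afterwards. The Gram-matrix VGG loss below is a generic choice used here for illustration; the paper's exact style loss may differ.

```python
import torch
import torchvision.models as models

# Frozen VGG16 feature extractor for the style term.
vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def gram(features):
    # Gram matrix of a (B, C, H, W) feature map: channel-to-channel correlations.
    b, c, h, w = features.shape
    f = features.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def style_loss(drawing, style_image, layers=(3, 8, 15, 22)):
    """Sum of Gram-matrix differences between drawing and style-image features."""
    loss, x, y = 0.0, drawing, style_image
    for i, layer in enumerate(vgg):
        x, y = layer(x), layer(y)
        if i in layers:
            loss = loss + torch.nn.functional.mse_loss(gram(x), gram(y))
    return loss

# Inside the CLIPDraw loop above (coupled optimisation):
#   total_loss = clip_loss + style_weight * style_loss(image, style_image)
```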

Paper (CLIPDraw): CLIPDraw: exploring text-to-drawing synthesis through language-image encoders

Paper (StyleCLIPDraw): StyleCLIPDraw: Coupling Content and Style in Text-to-Drawing Synthesis

CLIPDraw Colab demo

StyleCLIPDraw Colab demo


