This hand-picked list covers the most significant recent AI research articles, the latest developments in AI and data science, with links to more in-depth analysis of each.

EditGAN: High-Precision Semantic Image Editing

Generative adversarial networks, or GANs, are now being used to edit images. But most GAN-based image editing methods need large datasets with semantic segmentation annotations for training, only offer high-level control, or merely interpolate between different images. Here, the researchers propose EditGAN, a new method for high-quality, high-precision semantic image editing that lets users edit images by changing their highly detailed part segmentation masks, like drawing a new mask for a car's headlight. EditGAN builds on a GAN framework that jointly models images and their semantic segmentations. It only needs a few labelled examples, which makes it a scalable editing tool.

In particular, the researchers embed an image into the GAN's latent space and optimize the latent code based on the segmentation edit, which changes the image accordingly. To do this, they look for "editing vectors" in latent space that realize the edits. The framework can learn any number of editing vectors, which can then be applied directly to other images at interactive rates. Experiments show that EditGAN can change images with a level of detail and freedom that has never been seen before while preserving overall image quality. Different edits can also be easily combined, and EditGAN can make plausible edits beyond those in its training data. Furthermore, the researchers show that EditGAN works on many different kinds of images and outperforms several other editing methods on standard image-editing benchmarks.
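The editing-vector idea above can be illustrated with a minimal sketch. The generator, latent dimensions, and the fixed editing direction here are all toy stand-ins, not EditGAN's actual networks or learned vectors; the point is only that once an editing vector is known, applying it is a single cheap addition in latent space.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM = 8  # illustrative; real GAN latent spaces are much larger

def generator(w):
    """Toy stand-in for the GAN generator: a fixed linear map from
    latent code to image features (a real generator is a deep network)."""
    W = np.arange(LATENT_DIM * 4, dtype=float).reshape(4, LATENT_DIM)
    return W @ w

# In EditGAN, an editing vector is found by optimizing the latent code
# against a user's segmentation edit; here it is just a fixed direction.
edit_vector = np.zeros(LATENT_DIM)
edit_vector[1] = 1.0

w = rng.normal(size=LATENT_DIM)       # latent code of an embedded image
w_edited = w + 0.5 * edit_vector      # applying the edit is one addition

# The edit changes the generated output without re-running any optimization,
# which is why learned editing vectors can be reused at interactive rates.
delta = generator(w_edited) - generator(w)
```

Because the vector is reusable, the costly optimization happens once per edit type, and every subsequent image pays only the cost of a latent-space addition plus one generator pass.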

Paper: EditGAN: High-Precision Semantic Image Editing

Click here for the code 

SwinIR: Image Restoration Using Swin Transformer

Image restoration is a long-standing low-level vision problem. Its goal is to make low-quality images look better (e.g., downscaled, noisy, and compressed images). Even though the most advanced image restoration methods are based on convolutional neural networks, there have been few attempts to use Transformers, which perform well on high-level vision tasks.

This paper proposes a robust image restoration model called SwinIR, based on the Swin Transformer. SwinIR consists of three parts: shallow feature extraction, deep feature extraction, and high-quality image reconstruction. In particular, the deep feature extraction module is composed of several residual Swin Transformer blocks (RSTB), each of which has several Swin Transformer layers and a residual connection. The researchers run tests on three critical tasks:

  • Image super-resolution (including classical, lightweight, and real-world image super-resolution)
  • Image denoising (including grayscale and colour image denoising)
  • JPEG compression artifact reduction
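The three-part architecture described above can be sketched as a simple pipeline. The `conv` and `swin_layer` functions here are trivial numerical stand-ins for real convolution and window-attention layers (assumptions for illustration, not SwinIR's actual operators); what the sketch shows is the block structure: Swin layers wrapped in a residual connection inside each RSTB, and a global residual from the shallow features to the deep ones.

```python
import numpy as np

def conv(x):
    # Stand-in for a convolutional layer: a simple neighbor average.
    return 0.5 * (x + np.roll(x, 1, axis=-1))

def swin_layer(x):
    # Stand-in for one Swin Transformer layer (window attention + MLP).
    return x + 0.1 * np.tanh(x)

def rstb(x, num_layers=2):
    """Residual Swin Transformer block: several Swin layers,
    then a residual connection around the whole block."""
    out = x
    for _ in range(num_layers):
        out = swin_layer(out)
    return x + conv(out)

def swinir(lq_image, num_blocks=3):
    shallow = conv(lq_image)        # 1) shallow feature extraction
    deep = shallow
    for _ in range(num_blocks):     # 2) deep feature extraction (RSTBs)
        deep = rstb(deep)
    deep = shallow + conv(deep)     # global residual connection
    return conv(deep)               # 3) high-quality image reconstruction

x = np.linspace(0.0, 1.0, 16)       # a toy 1-D "low-quality image"
restored = swinir(x)
```

The residual connections at both the block level and the global level are what let the network focus its capacity on the missing high-frequency detail rather than re-learning the low-quality input.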

Paper: SwinIR: Image Restoration Using Swin Transformer

Click here for the code

CityNeRF: Building NeRF at City Scale

Neural radiance fields (NeRF) have done a great job modelling 3D objects and controlled scenes, usually at a single scale. In this work, the researchers focus on multi-scale cases where significant changes appear between images captured at very different scales. This situation happens a lot in real-world 3D environments, like city scenes, with views ranging from satellite level, which shows the big picture of a city, to ground level, which shows the intricate details of a building. It also occurs in landscapes and detailed Minecraft 3D models. Because there are so many different ways to look at these scenes, they exhibit different levels of detail at different scales.

This wide range of scales makes things very hard for a neural radiance field and makes poor results more likely. To solve these problems, the researchers develop BungeeNeRF, a progressive neural radiance field that can render different levels of detail at drastically different scales. At first, a shallow base block is used to fit the distant views. As training continues, new blocks are appended to fit the growing information in closer views, and the strategy gradually turns on high-frequency channels in NeRF's positional encoding inputs to reveal more intricate details. Furthermore, they show that the model supports high-quality rendering at different levels of detail.
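The progressive activation of high-frequency positional-encoding channels can be sketched as follows. This is a simplified illustration of the general idea, not BungeeNeRF's exact masking schedule: the standard NeRF sin/cos encoding is computed for all frequency bands, but bands above the current training stage are zeroed out, so early stages see only low-frequency structure (distant views) and later stages gain the high-frequency detail needed for close-ups.

```python
import numpy as np

def positional_encoding(x, num_freqs, active_freqs):
    """NeRF-style positional encoding where only the first `active_freqs`
    frequency bands are enabled; higher bands are masked to zero until
    later training stages unlock them (a sketch of progressive training)."""
    feats = []
    for k in range(num_freqs):
        mask = 1.0 if k < active_freqs else 0.0
        feats.append(mask * np.sin((2.0 ** k) * x))
        feats.append(mask * np.cos((2.0 ** k) * x))
    return np.concatenate(feats)

x = np.array([0.3])  # a toy 1-D coordinate
early = positional_encoding(x, num_freqs=6, active_freqs=2)  # distant views
late = positional_encoding(x, num_freqs=6, active_freqs=6)   # close-up detail
```

Because the low-frequency bands are identical across stages, unlocking higher bands refines detail without disturbing the coarse geometry already learned from distant views.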

Paper: CityNeRF: Building NeRF at City Scale

Click here for the code

Sources of Article

Image source: Pardeep Bhakar on Unsplash
