Deep neural networks (DNNs) have demonstrated considerable potential as models of human visual perception, predicting both neural response patterns and aspects of visual task performance. However, there are still significant discrepancies in how computer vision DNNs process information compared with humans. These discrepancies show up clearly in psychophysical experiments and in susceptibility to adversarial examples.
One distinction between DNNs and humans that has sparked recent interest is the presence of peripheral vision in humans. Peripheral vision refers to the way human vision represents the world with decreasing fidelity as eccentricity, or distance from the point of fixation, increases. Peripheral vision accounts for more than 99% of the human visual field. While peripheral vision is thought to be a strategy for coping with capacity limits imposed by the size of the optic nerve and visual cortex, it has also been shown to be an important predictor of human performance on a variety of visual tasks.
Peripheral vision allows people to see shapes that are not directly in their line of sight, albeit with reduced detail. This ability broadens the field of view and can be useful in many scenarios, such as noticing a car approaching from the side. Unlike humans, AI lacks peripheral vision. Equipping computer vision models with this capability may allow them to detect incoming risks more effectively, or to anticipate whether a human driver will notice an approaching hazard.
MIT researchers took a step in this direction by creating an image dataset that can be used to simulate peripheral vision in machine learning models. They found that training models with this dataset improved their ability to recognize objects in the visual periphery, though the models still performed worse than humans. Their findings also showed that, unlike for humans, the size of objects and the amount of visual clutter in a scene had no significant impact on the models' performance.
Extend your arm in front of you and raise your thumb; the small area around your thumbnail is what you see with your fovea, a small pit in the center of the retina that provides the sharpest vision. Everything else you see falls in your peripheral vision. Your visual system represents a scene with progressively less detail and reliability with increasing distance from that sharp point of focus.
Many existing AI models of peripheral vision represent this deteriorating detail by blurring images toward their edges, but the information loss that occurs in the optic nerve and visual cortex is considerably more complex. To achieve a more accurate result, the MIT researchers began with a technique used to model peripheral vision in humans. This method, known as the texture tiling model, transforms images to simulate a human's loss of visual information.
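To make the contrast concrete, here is a minimal sketch of the simple "blur with eccentricity" baseline the article says many models use: each pixel is mixed between the sharp image and a blurred copy, with the blur weight growing with distance from the fixation point. This is an illustrative toy, not the texture tiling model or the researchers' code; all function names are hypothetical.

```python
import numpy as np

def box_blur(img, radius=2):
    # Simple mean filter via edge padding and a windowed sum
    # (O(h * w * k^2); fine for a small demo).
    k = 2 * radius + 1
    padded = np.pad(img, radius, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def foveated_blur(img, fix_y, fix_x):
    # Crude foveation baseline: interpolate between the sharp image and a
    # blurred copy, with the blur weight growing linearly with eccentricity
    # (distance from the fixation point). Weight is 0 at fixation, 1 at the
    # farthest point in the image.
    h, w = img.shape
    blurred = box_blur(img)
    ys, xs = np.mgrid[0:h, 0:w]
    ecc = np.hypot(ys - fix_y, xs - fix_x)
    weight = ecc / ecc.max()
    return (1 - weight) * img + weight * blurred
```

Note that this baseline needs a fixation point up front, which is one of the limitations the researchers' modification of the texture tiling model addresses.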
They adjusted this model so that it could transform images in a similar but more flexible way, one that does not require knowing in advance where the person or AI will look. The researchers used this modified technique to generate a massive collection of transformed images that appear more textured in certain regions, mimicking the loss of detail that occurs as a human looks further into the periphery. They then used the dataset to train multiple computer vision models and compared their performance to that of humans on an object-detection task.
Humans and models were shown pairs of transformed images that were identical except that one featured a target object in the periphery. Each participant was then asked to select the image containing the target object. The researchers found that training models from scratch on their dataset produced the biggest performance gains, improving the models' ability to detect and recognize objects. Fine-tuning a pretrained model on their dataset, that is, adjusting it to perform the new task, yielded smaller gains.
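The experiment described above is a two-alternative forced choice (2AFC): on each trial the observer picks which of two images contains the target. A model can be scored on the same task by choosing whichever image receives the higher target-presence score. The sketch below shows that scoring rule on hypothetical per-trial detector scores; it is an illustration of the task format, not the researchers' evaluation code.

```python
import numpy as np

def two_afc_accuracy(scores_target, scores_distractor):
    # 2AFC decision rule: per trial, pick the image with the higher
    # target-presence score. A correct trial is one where the
    # target-present image wins; exact ties count as chance (0.5).
    scores_target = np.asarray(scores_target, dtype=float)
    scores_distractor = np.asarray(scores_distractor, dtype=float)
    wins = (scores_target > scores_distractor).astype(float)
    wins[scores_target == scores_distractor] = 0.5
    return wins.mean()

# Hypothetical scores: the model is confident on trial 1, tied on
# trial 2, and fooled on trial 3, giving (1 + 0.5 + 0) / 3 = 0.5.
acc = two_afc_accuracy([0.9, 0.8, 0.2], [0.1, 0.8, 0.7])
```

Chance performance on a 2AFC task is 0.5, which makes it a convenient common scale for comparing humans and models trial for trial.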
However, the models never matched human performance, and they were particularly poor at recognizing objects in the far periphery. Their performance also did not follow the same trends as humans'.
The researchers intend to keep investigating these disparities, with the goal of developing a model that can predict human performance in the visual periphery. Such a model could, for example, enable AI systems that alert drivers to hazards they may not have noticed. They also hope the publicly available dataset will inspire other researchers to pursue further computer vision research.
Image source: Unsplash