Have you ever encountered the term computer vision (CV)? It refers to a computer's ability to make inferences about a scene from visual input.
Now, imagine a transparent glass bowl filled with water on the dining table and a fork leaning against it. As humans, we can easily infer all of this from a single glance.
Now think of a machine. It might perceive the water in the bowl as a floating plate, or the fork as penetrating the bowl, when what it is actually seeing is refraction. A wrong inference in this particular instance may not be costly. But what if the machine makes a wrong inference in an autonomous vehicle? That could be disastrous and lead to fatal accidents.
Hence, the field calls for a cross-checking mechanism.
To that end, researchers from MIT introduced a new system, 3D Scene Perception via Probabilistic Programming (3DP3). The researchers built the framework using probabilistic programming, an AI approach that enables the system to cross-check detected objects against input data.
It also lets the system check whether the images recorded by a camera are a likely match to any candidate scene. Probabilistic inference allows the system to infer whether mismatches are likely due to noise or to errors in the scene interpretation that need to be corrected by further processing.
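The cross-check described above can be illustrated with a small sketch. This is not 3DP3's actual model. It assumes each candidate scene can be rendered into a short list of depth values and that sensor noise is independent and Gaussian with a known standard deviation; the function names, the scene names, and the numbers are all hypothetical.

```python
import math


def gaussian_log_likelihood(observed, rendered, sigma):
    """Log-probability of the observed depths given a rendered candidate,
    assuming independent Gaussian noise with standard deviation sigma."""
    total = 0.0
    for o, r in zip(observed, rendered):
        z = (o - r) / sigma
        total += -0.5 * z * z - math.log(sigma * math.sqrt(2 * math.pi))
    return total


def best_candidate(observed, candidates, sigma):
    """Score every candidate scene and return the most likely one."""
    scores = {
        name: gaussian_log_likelihood(observed, rendered, sigma)
        for name, rendered in candidates.items()
    }
    return max(scores, key=scores.get), scores


def mismatch_is_plausibly_noise(observed, rendered, sigma, k=3.0):
    """True if every residual lies within k standard deviations,
    i.e. the mismatch could be explained by sensor noise alone."""
    return all(abs(o - r) <= k * sigma for o, r in zip(observed, rendered))


# Hypothetical depth readings (metres) and two hypothetical candidate scenes.
observed = [1.00, 1.02, 0.98, 0.70, 0.71]
candidates = {
    "bowl_only": [1.00, 1.00, 1.00, 1.00, 1.00],
    "bowl_and_fork": [1.00, 1.00, 1.00, 0.70, 0.70],
}
sigma = 0.02  # assumed sensor noise

best, scores = best_candidate(observed, candidates, sigma)
noise_ok = {
    name: mismatch_is_plausibly_noise(observed, rendered, sigma)
    for name, rendered in candidates.items()
}
```

With these made-up numbers, the candidate that includes the fork scores higher and its residuals stay within the noise band, while the bowl-only candidate leaves a mismatch too large to attribute to noise, which signals an interpretation error rather than a noisy sensor.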
You might have seen an image flagged as objectionable on a social media platform (image classification), faulty machinery identified on a production line (object detection), or an autonomous vehicle tracking a pedestrian's movement and applying the brakes (object tracking). All of these are examples of CV. To be precise, it is a field of artificial intelligence that allows computers and systems to extract useful information from digital photos, videos, and other visual inputs, as well as to take actions or make recommendations based on that data.
“If artificial intelligence allows computers to think, computer vision allows them to see, watch, and interpret”
As humans, we can identify and classify objects in an image, determine how far away an object is, and track its motion in a scene. Similarly, CV trains machines to perform all these functions with the help of cameras, algorithms and data. As humans, we can also infer what is logical or realistic and are capable of pointing out errors in an image. Machines may fail at this. Why? Largely because they lack common sense.
The newly developed system equips machines to detect when something in an image is wrong. This common-sense safeguard enables the system to detect and fix many of the errors that plague the deep-learning algorithms used in computer vision. Probabilistic programming also allows the system to infer probable contact relationships between objects in a scene and to use common-sense reasoning to predict more precise positions for those objects.
“If you don’t know about the contact relationships, then you could say that an object is floating above the table — that would be a valid explanation. As humans, it is obvious to us that this is physically unrealistic and the object resting on top of the table is a more likely pose of the object. Because our reasoning system is aware of this sort of knowledge, it can infer more accurate poses. That is a key insight of this work,” says lead author Nishad Gothoskar, an electrical engineering and computer science (EECS) PhD student with the Probabilistic Computing Project.
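The reasoning in the quote can be sketched as a small Bayesian update. This is an illustration under stated assumptions, not the paper's model: it considers only two hypothetical pose hypotheses, assigns them made-up prior probabilities that favour contact, and assumes the measured gap between object and table has Gaussian noise. All names and numbers are hypothetical.

```python
import math


def normal_pdf(x, mean, sigma):
    """Density of a normal distribution at x."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def posterior_over_poses(measured_gap, hypotheses, sigma):
    """Bayes' rule over a discrete set of pose hypotheses.

    hypotheses maps a name to (prior probability, predicted gap in metres).
    """
    unnormalized = {
        name: prior * normal_pdf(measured_gap, predicted_gap, sigma)
        for name, (prior, predicted_gap) in hypotheses.items()
    }
    total = sum(unnormalized.values())
    return {name: weight / total for name, weight in unnormalized.items()}


# Hypothetical setup: a contact prior says objects usually rest on surfaces.
hypotheses = {
    "resting_on_table": (0.9, 0.00),  # (prior, predicted gap)
    "floating_above": (0.1, 0.01),
}
measured_gap = 0.006  # noisy measurement, slightly closer to "floating"
sigma = 0.01          # assumed measurement noise

posterior = posterior_over_poses(measured_gap, hypotheses, sigma)
most_likely = max(posterior, key=posterior.get)
```

Here the noisy measurement alone fits the floating pose slightly better, yet the contact prior outweighs that small difference and the posterior favours the object resting on the table. Without the prior, both explanations would look about equally valid.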
In addition to boosting the safety of self-driving cars, this research could improve the effectiveness of computer perception systems that must comprehend complex arrangements of objects, such as a robot cleaning a cluttered kitchen.