Artificial Intelligence (AI) technology was created in the shadow of human intelligence. Computer scientists have designed complex models and systems to replicate vision, language, reasoning, motor skills, and more. These advances allow AI to solve certain problems effectively in controlled environments. However, when broader, more general awareness is required, these technologies fall short of the intelligence seen in humans and animals.

Researchers at the UK-based, Alphabet-owned AI lab DeepMind have submitted a new paper titled “Reward is Enough” to the peer-reviewed Artificial Intelligence Journal. In the paper, which is still in the pre-proof stage, the researchers draw parallels between the evolution of natural intelligence and achievements in AI. They propose that, instead of developing ever newer technologies, reward maximization and trial-and-error experience are enough to develop artificial general intelligence. If the hypothesis holds, such agents could eventually match or outperform humans at a wide range of cognitive tasks.

Currently, most scientists are focusing on narrow AI, i.e., designing systems and models to accomplish specific tasks. Many believe that combining these specialised technologies will produce a higher-intelligence system: for example, separate AI modules for computer vision, natural language processing, speech processing, and so on could be assembled into a single system capable of solving a multitude of problems.

However, the DeepMind researchers propose a different, simpler approach. “[We] consider an alternative hypothesis: that the generic objective of maximising reward is enough to drive behaviour that exhibits most if not all abilities that are studied in natural and artificial intelligence,” the researchers write. In effect, they propose mimicking nature’s approach: letting technology, like evolution, take its own course.

“The natural world faced by animals and humans, and presumably also the environments faced in the future by artificial agents, are inherently so complex that they require sophisticated abilities in order to succeed (for example, to survive) within those environments,” the researchers write. “Thus, success, as measured by maximising reward, demands a variety of abilities associated with intelligence. In such environments, any behaviour that maximises reward must necessarily exhibit those abilities. In this sense, the generic objective of reward maximization contains within it many or possibly even all the goals of intelligence.”

For example, a squirrel acts to minimize hunger. Its abilities help it locate and collect food, but merely searching for food would leave the squirrel to starve whenever food becomes scarce. To truly minimize hunger, the squirrel therefore needs the skills and memory to cache nuts and retrieve them in winter. Zooming out further, hunger minimization can be seen as a subgoal of “staying alive,” which also requires skills such as detecting and hiding from predators, protecting oneself from environmental threats, and seeking better habitats as the seasons change.

“When abilities associated with intelligence arise as solutions to a singular goal of reward maximisation, this may in fact provide a deeper understanding since it explains why such an ability arises,” the researchers write. “In contrast, when each ability is understood as the solution to its own specialised goal, the why question is side-stepped in order to focus upon what that ability does.”

Finally, the researchers argue that the “most general and scalable” way to maximize reward is through agents that learn by interacting with their environment.

In the paper, the AI researchers provide some high-level examples of how “intelligence and associated abilities will implicitly arise in the service of maximising one of many possible reward signals, corresponding to the many pragmatic goals towards which natural or artificial intelligence may be directed.”

Rewards and environments shape and promote the knowledge that comes naturally to animals. For example, species such as deer and monkeys have, from birth, innate knowledge of fleeing from threats such as cheetahs and lions. They are likewise rewarded for acquiring specific knowledge, such as which foods to eat and where to find shelter.

Here, they draw an analogy between natural intelligence and AGI: “An animal’s stream of experience is sufficiently rich and varied that it may demand a flexible ability to achieve a vast variety of subgoals (such as foraging, fighting, or fleeing), in order to succeed in maximising its overall reward (such as hunger or reproduction). Similarly, if an artificial agent’s stream of experience is sufficiently rich, then many goals (such as battery life or survival) may implicitly require the ability to achieve an equally wide variety of subgoals, and the maximisation of reward should therefore be enough to yield an artificial general intelligence.”

In the paper, the researchers suggest that reinforcement learning is the principal framework for replicating reward maximization as seen in nature, and that it can eventually lead to artificial general intelligence.

“If an agent can continually adjust its behaviour so as to improve its cumulative reward, then any abilities that are repeatedly demanded by its environment must ultimately be produced in the agent’s behaviour,” the researchers write, adding that, in the course of maximizing for its reward, a good reinforcement learning agent could eventually learn perception, language, social intelligence and so forth.
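The core loop described here, an agent repeatedly adjusting its behaviour to improve cumulative reward, can be sketched with tabular Q-learning, a textbook reinforcement learning algorithm. The toy corridor environment, reward values, and hyperparameters below are illustrative assumptions for the sketch, not details taken from the DeepMind paper.

```python
import random

# Toy corridor of 5 cells: the agent starts at cell 0, reaching cell 4
# yields reward +1, and every other step costs -0.01. All names and
# parameters here are hypothetical, chosen only to illustrate the idea.
N_STATES = 5
ACTIONS = [-1, +1]                 # move left or right
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1  # learning rate, discount, exploration

def step(state, action):
    """Environment dynamics: clamp to the corridor, reward +1 at the goal."""
    nxt = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if nxt == N_STATES - 1 else -0.01
    done = nxt == N_STATES - 1
    return nxt, reward, done

def train(episodes=500, seed=0):
    random.seed(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit current estimates, sometimes explore.
            if random.random() < EPS:
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: q[(s, act)])
            nxt, r, done = step(s, a)
            # Q-learning update: nudge the estimate toward reward plus
            # the discounted value of the best next action.
            target = r + GAMMA * max(q[(nxt, b)] for b in ACTIONS)
            q[(s, a)] += ALPHA * (target - q[(s, a)])
            s = nxt
    return q

q = train()
# After training, the greedy policy moves right in every interior state,
# i.e. the "ability" to reach the goal emerged purely from reward maximization.
policy = {s: max(ACTIONS, key=lambda act: q[(s, act)]) for s in range(N_STATES - 1)}
print(policy)
```

The agent is never told where the goal is; the behaviour of walking toward it is demanded by the environment and emerges from the reward signal alone, which is the pattern the paper generalises to richer environments and abilities.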

However, the researchers acknowledge fundamental challenges: reinforcement learning requires humongous amounts of data, and AI researchers are still struggling to build reinforcement learning systems that can generalize what they learn across several domains. As a result, slight changes to the environment often require full retraining of the model.

While several scientists appreciate the researchers' proposal and its potential to advance AI, others argue that reward maximization and reinforcement learning alone may not be enough, and that newer paths need to be charted to truly achieve AGI.

