Machine learning allows computers to learn behaviour from data rather than from explicit programming, training models on historical examples so they can make predictions about new ones. This section will examine some fascinating machine learning algorithms: prefrontal cortex basal ganglia working memory, Proximal Policy Optimization, and Q-learning.

Prefrontal cortex basal ganglia working memory

The prefrontal cortex basal ganglia working memory (PBWM) algorithm models working memory in the prefrontal cortex and the basal ganglia. Its functionality is comparable to long short-term memory (LSTM), but it is more biologically grounded and therefore easier to interpret scientifically. It employs the primary value learned value (PVLV) model to train the prefrontal-cortex working-memory updating system, and as of 2019 it is a component of the Leabra framework and the Emergent simulation software.

The prefrontal cortex has long been thought to subserve both working memory (actively holding information in mind) and "executive" function (deciding how that information is maintained, updated, and used in processing). However, even though there are many computational models of working memory, it is still unclear how executive function actually works.

PBWM is a computational model of how the prefrontal cortex controls itself and other parts of the brain in a strategic way suited to the task at hand. Subcortical structures in the midbrain, basal ganglia, and amygdala work together to form an actor/critic architecture: the critic system trains the actor to learn which prefrontal representations are essential for the task, which gives the actor a dynamic gating mechanism for controlling when working memory is updated. The learning mechanism solves the temporal and structural credit assignment problems simultaneously. A toy version of this gating idea is sketched below.
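
The following is a minimal, illustrative sketch of PBWM-style dynamic gating, not the published model: a critic tracks a reward prediction error, and an actor learns per-slot gating probabilities that decide when working-memory slots are updated. The toy task, slot count, and learning rate are all assumptions made purely for illustration.

```python
# Toy PBWM-style gating sketch (illustrative only, not the published model).
# A critic estimates expected reward; its prediction error trains an actor
# whose gates decide when each working-memory slot is overwritten.
import numpy as np

rng = np.random.default_rng(0)
n_slots = 2
gate_weights = np.zeros(n_slots)   # actor: logits for "update slot i"
value = 0.0                        # critic: running estimate of reward
alpha = 0.1                        # learning rate (assumed)

memory = np.zeros(n_slots)
for step in range(1000):
    stimulus = rng.integers(0, 2)                 # toy input: 0 or 1
    gate_prob = 1 / (1 + np.exp(-gate_weights))   # sigmoid gating policy
    gates = rng.random(n_slots) < gate_prob
    memory[gates] = stimulus                      # gated update of WM slots

    # Toy reward: slot 0 should track the stimulus; gating slot 1 is wasteful.
    reward = float(memory[0] == stimulus) - 0.5 * float(gates[1])

    rpe = reward - value                          # reward prediction error
    value += alpha * rpe                          # critic update
    # Actor update: reinforce gating choices in proportion to the RPE,
    # loosely analogous to dopamine-modulated learning in the basal ganglia.
    gate_weights += alpha * rpe * (gates.astype(float) - gate_prob)

print("learned gate probabilities:", 1 / (1 + np.exp(-gate_weights)))
```

After training, the gate for slot 0 (whose contents the reward depends on) opens reliably, while the penalized gate for slot 1 stays mostly closed, which is the essence of learning a task-appropriate updating strategy.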

Proximal Policy Optimization

Proximal Policy Optimization (PPO) is a family of model-free reinforcement learning algorithms. PPO algorithms are policy gradient methods: they search the policy space directly instead of assigning values to state-action pairs. They share some of the advantages of trust region policy optimization (TRPO) but are simpler to implement, more general, and have lower sample complexity.
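
The key mechanism behind these advantages is PPO's clipped surrogate objective, which keeps each policy update close to the data-collecting policy without TRPO's expensive constrained optimization. Below is a minimal NumPy sketch of that objective; the input arrays are placeholders standing in for quantities you would compute from rollouts.

```python
# Minimal sketch of PPO's clipped surrogate objective (Schulman et al., 2017).
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Negative clipped surrogate objective.

    logp_new:   log pi_theta(a|s) under the current policy
    logp_old:   log pi_theta_old(a|s) recorded when the data was collected
    advantages: advantage estimates (e.g. from GAE)
    """
    ratio = np.exp(logp_new - logp_old)              # importance ratio
    clipped = np.clip(ratio, 1 - clip_eps, 1 + clip_eps)
    # Take the pessimistic (elementwise minimum) objective, negated as a loss.
    return -np.mean(np.minimum(ratio * advantages, clipped * advantages))

# Toy usage with made-up rollout numbers:
loss = ppo_clip_loss(
    logp_new=np.array([-0.9, -1.2, -0.3]),
    logp_old=np.array([-1.0, -1.0, -0.5]),
    advantages=np.array([1.0, -0.5, 2.0]),
)
print(f"clipped surrogate loss: {loss:.4f}")
```

The clipping removes the incentive to move the probability ratio outside [1 - eps, 1 + eps], so a simple first-order optimizer can be used where TRPO needs a trust-region constraint.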

PPO grew out of recent developments in reinforcement learning as an improvement over TRPO. It was proposed by OpenAI in 2017, and when OpenAI put the technique into practice it performed remarkably well.

In the language of reinforcement learning, a policy is a mapping from states to actions (or to probability distributions over actions). When we talk about evaluating an agent's policy, we mean measuring how effectively the agent operates while following it. This is where policy gradient techniques are crucial. While "learning," the agent does not yet know which actions produce the best outcomes in which states, so it computes policy gradients instead: much as in neural network training, the gradient of the log-probability of each action taken in a given state, weighted by how well things turned out, indicates how to change the policy parameters so that better actions become more likely.
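
As a concrete illustration of this idea, here is a tiny REINFORCE-style policy-gradient sketch on a two-armed bandit. The bandit, its payout means, and the learning rate are invented for the example; the point is the update rule, which nudges the log-probability of each sampled action in proportion to the reward it earned.

```python
# Tiny REINFORCE-style policy-gradient sketch on a two-armed bandit.
import numpy as np

rng = np.random.default_rng(1)
logits = np.zeros(2)               # policy parameters for 2 actions
true_means = np.array([0.2, 0.8])  # assumed payouts: arm 1 pays more
alpha = 0.05                       # learning rate (assumed)

for episode in range(2000):
    probs = np.exp(logits) / np.exp(logits).sum()  # softmax policy
    action = rng.choice(2, p=probs)
    reward = rng.normal(true_means[action], 0.1)
    # Gradient of log softmax prob w.r.t. logits: one-hot(action) - probs
    grad_logp = -probs
    grad_logp[action] += 1.0
    logits += alpha * reward * grad_logp           # ascend expected reward

print("policy probabilities:", np.exp(logits) / np.exp(logits).sum())
```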

Q-learning

Q-learning is a model-free algorithm for learning the value of an action in a particular state. It needs no model of the environment (which is why it is called "model-free"), and it can handle problems with stochastic transitions and rewards without any modification.

Q-learning finds an optimal policy for any finite Markov decision process (FMDP) by maximizing the expected value of the total reward over all successive steps, starting from the current state. Given unlimited exploration time and a partly random exploration policy, Q-learning converges to the optimal action-selection policy for any FMDP. "Q" is the name of the function the algorithm learns: the expected reward for taking a given action in a given state. The core update rule is shown below.
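
Here is a minimal sketch of tabular Q-learning with epsilon-greedy exploration. The five-state chain environment, hyperparameters, and episode count are assumptions chosen only to make the update rule concrete.

```python
# Minimal tabular Q-learning sketch on an assumed 5-state chain environment:
# states 0..4, actions 0=left / 1=right, reward 1 for reaching the goal (4).
import numpy as np

rng = np.random.default_rng(2)
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.1   # hyperparameters (assumed)

def step(state, action):
    """Deterministic chain dynamics; episode ends at the goal state."""
    next_state = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward, next_state == n_states - 1

for episode in range(500):
    state, done = 0, False
    while not done:
        # Epsilon-greedy action selection, breaking ties at random.
        if rng.random() < epsilon:
            action = int(rng.integers(n_actions))
        else:
            best = np.flatnonzero(Q[state] == Q[state].max())
            action = int(rng.choice(best))
        next_state, reward, done = step(state, action)
        # Core update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
        state = next_state

print(np.round(Q, 2))   # "right" actions should dominate along the chain
```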

More broadly, reinforcement learning is a learning process in which an agent learns how best to act in a particular environment by interacting with it. During learning, the agent's actions yield rewards and penalties, and over time it learns how to maximize the rewards it collects. Q-learning is a simple form of reinforcement learning that uses Q-values, also called action values, to iteratively improve the behaviour of the learning agent; once a Q-table is learned, acting well amounts to extracting the greedy policy from it, as sketched below.
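
For instance, given a learned Q-table like the one produced above, the greedy policy simply picks the highest-valued action in each state. The numbers below are hypothetical, roughly what the chain task with gamma = 0.9 would produce:

```python
# Extracting the greedy policy from a (hypothetical) learned Q-table.
import numpy as np

Q = np.array([[0.66, 0.73],    # made-up values for the 5-state chain above
              [0.73, 0.81],
              [0.81, 0.90],
              [0.90, 1.00],
              [0.00, 0.00]])   # terminal state: never updated
greedy_policy = np.argmax(Q, axis=1)
print(greedy_policy)           # -> [1 1 1 1 0]: move right toward the goal
```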
