Marcus Hutter first proposed AIXI in 2000 and developed the idea further in his 2005 book Universal Artificial Intelligence.
Hutter notes that the name "AIXI" can be read in several ways: for example, as AI "crossed" (X) with induction (I), or as AI based on Solomonoff's distribution ξ ("xi"). Other interpretations exist.
AIXI is a reinforcement-learning agent that maximizes the expected total reward it receives from its environment. Intuitively, it considers every computable hypothesis (environment) simultaneously. At each time step, it examines every possible program and asks how much reward that program would generate given the next action taken.
The promised rewards are then weighted by the agent's subjective belief that the program is the actual environment. This belief is derived from the program's length: longer programs are less likely, in keeping with Occam's razor. AIXI then takes the action with the highest expected total reward in the weighted sum over all these programs.
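The weighting scheme above can be sketched in a few lines. This is a toy illustration only: a small finite hypothesis class stands in for "all computable programs" (the real AIXI sums over every program and is incomputable), and the `aixi_like_action` helper and its reward functions are hypothetical names invented for this example.

```python
# Toy sketch of AIXI-style action selection (illustrative only).
# A tiny finite hypothesis class stands in for "all computable programs".

def aixi_like_action(actions, hypotheses):
    """Pick the action with the highest length-weighted expected reward.

    hypotheses: list of (program_length_in_bits, reward_fn) pairs, where
    reward_fn(action) is the reward that hypothesis predicts for the action.
    """
    def expected_reward(action):
        # Occam prior: a program of length L gets weight 2**-L,
        # so longer (more complex) hypotheses count for less.
        return sum(2.0 ** -length * reward_fn(action)
                   for length, reward_fn in hypotheses)
    return max(actions, key=expected_reward)

# Two hypothetical environments: a short program that rewards "left"
# and a longer one that rewards "right".
hypotheses = [
    (3, lambda a: 1.0 if a == "left" else 0.0),
    (10, lambda a: 1.0 if a == "right" else 0.0),
]
best_action = aixi_like_action(["left", "right"], hypotheses)
```

Because the "left"-rewarding hypothesis is shorter, its 2^-3 weight dominates the 2^-10 weight of the longer one, so the agent picks "left".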
Unknown Environments
The hard part of reinforcement learning is that the agent must learn the environment while collecting rewards, because in most cases the true environment is unknown. This creates the trade-off between exploration and exploitation. Exploration means trying new things and venturing into uncharted territory in search of better reward sources.
Exploration carries an opportunity cost, however, since the agent could have spent that time collecting rewards from known parts of the environment. Explore too little and you may miss critical details of the environment and end up with fewer rewards; explore too much and you spend too little time exploiting and likewise end up with fewer rewards. What constitutes the ideal ratio of exploration to exploitation is not known in general.
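The trade-off can be made concrete with a minimal epsilon-greedy bandit, a standard heuristic (not AIXI's method) in which a parameter epsilon sets the fraction of time spent exploring. The arm means and Gaussian reward noise below are assumptions chosen for illustration.

```python
import random

# Minimal epsilon-greedy bandit illustrating the exploration/exploitation
# trade-off: with probability epsilon explore a random arm, otherwise
# exploit the arm with the best reward estimate so far.

def epsilon_greedy(arm_means, epsilon=0.1, steps=10000, seed=0):
    rng = random.Random(seed)
    counts = [0] * len(arm_means)
    estimates = [0.0] * len(arm_means)
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:            # explore: try a random arm
            arm = rng.randrange(len(arm_means))
        else:                                 # exploit: best estimate so far
            arm = max(range(len(arm_means)), key=lambda i: estimates[i])
        reward = rng.gauss(arm_means[arm], 1.0)  # noisy reward (assumed Gaussian)
        counts[arm] += 1
        # Incremental running average of the observed rewards for this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return total / steps

# With one clearly best arm, a little exploration suffices to find it.
avg = epsilon_greedy([0.1, 0.9, 0.5])
```

Setting epsilon to 0 risks getting stuck on a mediocre arm; setting it near 1 wastes almost every step exploring, which is the dilemma described above.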
Optimality
AIXI's performance is gauged by the expected total reward it receives. Hutter proves several senses in which AIXI is optimal: for example, it is Pareto optimal (no other agent performs at least as well in every environment and strictly better in at least one), and it is self-optimizing in any environment class that admits a self-optimizing policy.
AIXI does have limitations, however. It maximizes reward defined over its perceptions rather than over external states of the world. Furthermore, it assumes it interacts with its environment only through action and perception channels, which prevents it from considering the possibility of being damaged or modified; in other words, it does not model the environment as containing the agent itself. It also assumes the environment is computable.
Conclusion
AIXI is incomputable, just like Solomonoff induction. However, computable approximations of it exist. One such approximation, AIXItl, performs at least as well as the provably best time- and space-limited agent. MC-AIXI(FAC-CTW), which stands for Monte Carlo AIXI with Factored Action-Conditional Context Tree Weighting, is another approximation, restricted to a limited environment class, that has had some success playing simple games such as partially observable Pac-Man.
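The planning idea behind Monte Carlo approximations can be sketched as simple rollout-based action evaluation. This is a deliberately simplified stand-in: the actual MC-AIXI(FAC-CTW) agent uses UCT-style tree search over a learned context-tree-weighting model, and the `env_step` simulator interface below is a hypothetical placeholder for that model.

```python
import random

# Toy sketch of Monte-Carlo action evaluation: estimate each action's
# value by averaging the returns of random playouts, then act greedily.

def rollout_value(env_step, state, action, depth=5, rollouts=200, seed=0):
    """Average discounted-free return of `action` over random playouts.

    env_step(state, action, rng) -> (next_state, reward) is an assumed
    simulator interface standing in for the agent's learned model.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(rollouts):
        s, r = env_step(state, action, rng)   # take the candidate action
        ret = r
        for _ in range(depth - 1):
            s, r = env_step(s, rng.choice([0, 1]), rng)  # random playout
            ret += r
        total += ret
    return total / rollouts

# Hypothetical two-state environment: action 1 moves to the rewarding state.
def env_step(state, action, rng):
    next_state = 1 if action == 1 else 0
    reward = 1.0 if next_state == 1 else 0.0
    return next_state, reward

best = max([0, 1], key=lambda a: rollout_value(env_step, 0, a))
```

Averaging over many sampled futures is what lets such approximations trade exactness for tractability, which is the whole point of approximating the incomputable AIXI.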