Microsoft Research has released Magentic-One, a generalist multi-agent AI system that can solve open-ended tasks across various domains. Magentic-One employs a multi-agent architecture where a lead agent, the Orchestrator, directs four other agents to solve tasks. The Orchestrator plans, tracks progress, and re-plans to recover from errors while directing specialized agents to perform tasks like operating a web browser, navigating local files, or writing and executing Python code.
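
The plan, track and re-plan cycle can be illustrated with a toy loop in Python. This is not Magentic-One's code or the AutoGen API. `ToyOrchestrator`, the agent and planner signatures, the `is_done` check, and the stall and re-plan limits are all hypothetical. The "agents" here are plain functions standing in for LLM-backed ones.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical agent signature: instruction -> (result text, whether progress was made).
Agent = Callable[[str], Tuple[str, bool]]
# Hypothetical planner: (task, history so far) -> list of (agent name, instruction) steps.
Planner = Callable[[str, List[str]], List[Tuple[str, str]]]


@dataclass
class ToyOrchestrator:
    agents: Dict[str, Agent]
    make_plan: Planner
    is_done: Callable[[List[str]], bool]  # hypothetical task-completion check
    max_stalls: int = 2   # assumed: consecutive no-progress steps before re-planning
    max_replans: int = 3  # assumed: how many fresh plans to try after the first
    log: List[str] = field(default_factory=list)
    replans: int = 0

    def run(self, task: str) -> bool:
        """Plan, dispatch steps to agents, track progress, and re-plan on stalls."""
        for attempt in range(self.max_replans + 1):
            if attempt > 0:
                self.replans += 1
            plan = self.make_plan(task, self.log)  # planner sees the history so far
            stalls = 0
            for agent_name, instruction in plan:
                result, progressed = self.agents[agent_name](instruction)
                self.log.append(f"{agent_name}: {result}")
                stalls = 0 if progressed else stalls + 1
                if stalls >= self.max_stalls:
                    break  # stuck: abandon this plan and build a new one
            if self.is_done(self.log):
                return True
        return False


# Usage: the first plan stalls on a failing Coder, so the orchestrator re-plans.
def coder(instruction: str) -> Tuple[str, bool]:
    return "error", False


def web_surfer(instruction: str) -> Tuple[str, bool]:
    return "found page", True


def planner(task: str, history: List[str]) -> List[Tuple[str, str]]:
    if not history:
        return [("Coder", "write a scraper"), ("Coder", "fix the scraper"),
                ("WebSurfer", "search")]
    return [("WebSurfer", "search the web")]


orch = ToyOrchestrator(
    agents={"Coder": coder, "WebSurfer": web_surfer},
    make_plan=planner,
    is_done=lambda log: any("found" in line for line in log),
)
ok = orch.run("find the answer")
print(ok, orch.replans)  # prints: True 1
```

In this sketch, progress is a boolean that each agent reports about itself, which is a simplifying assumption made to keep the loop short.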

According to Microsoft, Magentic-One achieves performance statistically competitive with the state of the art on multiple challenging agentic benchmarks, without requiring modifications to its core capabilities or architecture. Microsoft has made Magentic-One open source for researchers and developers. While Magentic-One shows strong generalist capabilities, it is still far from human-level performance and can make mistakes. Moreover, as agentic systems grow more powerful, their risks, such as taking undesirable actions or enabling malicious use cases, can also increase.

How the model works

Magentic-One's agents provide the Orchestrator with the tools and capabilities needed to solve a wide variety of open-ended problems, as well as the ability to adapt to and act autonomously in dynamic, ever-changing web and file-system environments. The system consists of the following agents:

  • Orchestrator: This is the lead agent, responsible for task decomposition and planning, directing the other agents in executing subtasks, tracking overall progress, and taking corrective action as needed.
  • WebSurfer: This is an LLM-based agent proficient in commanding and managing the state of a Chromium-based web browser. With each incoming request, the WebSurfer performs an action in the browser and then reports on the new state of the web page. Its action space includes navigation (e.g., visiting a URL or performing a web search), web page actions (e.g., clicking and typing), and reading actions (e.g., summarizing or answering questions). The WebSurfer relies on the browser's accessibility tree and on set-of-marks prompting to perform its actions.
  • FileSurfer: This is an LLM-based agent that commands a markdown-based file preview application to read local files of most types. The FileSurfer can also perform everyday navigation tasks, such as listing the contents of directories and navigating a folder structure.
  • Coder: This is an LLM-based agent specialized through its system prompt for writing code, analyzing information collected from the other agents, or creating new artefacts.
  • ComputerTerminal: Finally, ComputerTerminal provides the team access to a console shell where the Coder's programs can be executed, and new programming libraries can be installed.
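
The split between the Coder, which writes programs, and the ComputerTerminal, which runs them, can be sketched as follows. This is a hypothetical stand-in, not Magentic-One's implementation. It assumes the Coder replies with Python inside fenced blocks, and `extract_code_blocks` and `run_in_terminal` are illustrative names. Each snippet runs with the current interpreter in a throwaway directory with a timeout, which is not a security sandbox.

```python
import re
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import List, Tuple

# Built from single backticks so this example does not clash with the fence around it.
FENCE_MARK = "`" * 3
# Assumed message format: code arrives in python-tagged fenced blocks.
FENCE = re.compile(re.escape(FENCE_MARK) + r"python\n(.*?)" + re.escape(FENCE_MARK), re.DOTALL)


def extract_code_blocks(message: str) -> List[str]:
    """Return the bodies of all python-tagged fenced blocks in a Coder reply."""
    return FENCE.findall(message)


def run_in_terminal(code: str, timeout: float = 10.0) -> Tuple[int, str]:
    """Write code to a temp file, run it, and return (exit code, stdout + stderr).

    Raises subprocess.TimeoutExpired if the script exceeds `timeout` seconds.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "snippet.py"
        script.write_text(code)
        proc = subprocess.run(
            [sys.executable, str(script)],
            cwd=workdir,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        return proc.returncode, proc.stdout + proc.stderr


# Usage: pull the code out of a (fake) Coder reply and execute it.
reply = (
    "Here is the script:\n"
    f"{FENCE_MARK}python\nprint(sum(range(5)))\n{FENCE_MARK}\n"
)
for block in extract_code_blocks(reply):
    exit_code, output = run_in_terminal(block)
    print(exit_code, output.strip())  # prints: 0 10
```

Returning the exit code together with the combined output gives the calling agent enough information to notice a failure and ask the Coder for a fix.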

Microsoft's evaluation shows Magentic-One to be a strong generalist agentic system for completing complex tasks.

Risks and Mitigation

Magentic-One interacts with a digital world designed for and inhabited by humans. It can take actions that change the state of the world and lead to consequences that may be irreversible. This carries inherent and undeniable risks, and Microsoft reports observing examples of emerging risks during its testing.

Microsoft recommends running Magentic-One with strongly aligned models, applying pre- and post-generation filtering, and closely monitoring logs during and after execution.
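
As a rough illustration of pre- and post-generation filtering combined with log monitoring, the sketch below wraps a text-generating step with a check on its input and its output and writes every exchange to an audit log. The keyword deny-list, `guarded`, and `BLOCKED_PATTERNS` are hypothetical. Substring matching is a toy substitute for real content filters and is easy to evade.

```python
import logging
from typing import Callable, Iterable

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
audit = logging.getLogger("agent-audit")

# Hypothetical deny-list; real deployments would use proper content classifiers.
BLOCKED_PATTERNS = ("rm -rf", "DROP TABLE", "curl | sh")


def is_allowed(text: str, patterns: Iterable[str] = BLOCKED_PATTERNS) -> bool:
    """Case-insensitive check that none of the blocked patterns appear in text."""
    lowered = text.lower()
    return not any(p.lower() in lowered for p in patterns)


def guarded(generate: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a generation step with pre- and post-generation checks plus audit logging."""
    def wrapper(prompt: str) -> str:
        if not is_allowed(prompt):  # pre-generation filter
            audit.warning("blocked prompt: %r", prompt)
            return "[blocked before generation]"
        output = generate(prompt)
        audit.info("prompt=%r output=%r", prompt, output)  # log for later review
        if not is_allowed(output):  # post-generation filter
            audit.warning("blocked output: %r", output)
            return "[blocked after generation]"
        return output
    return wrapper


# Usage with a fake "model" that proposes shell commands.
guarded_step = guarded(lambda p: "rm -rf /tmp/cache" if "clean" in p else "ls -la")
print(guarded_step("list files"))            # prints: ls -la
print(guarded_step("clean up"))              # prints: [blocked after generation]
print(guarded_step("run DROP TABLE users"))  # prints: [blocked before generation]
```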
