Meta has released Llama 3.1 405B, which is now publicly available. According to the company, the release of the 405B model is poised to supercharge innovation, with unprecedented opportunities for growth and exploration. Meta believes the latest generation of Llama will ignite new applications and modelling paradigms, including synthetic data generation to enable the improvement and training of smaller models, and model distillation at a scale never before achieved in open source.

The company has also introduced upgraded 8B and 70B models. These are multilingual, support a significantly longer context length of 128K tokens, and offer state-of-the-art tool use and stronger reasoning capabilities. This enables the latest models to support advanced use cases such as long-form text summarization, multilingual conversational agents, and coding assistants. Meta has also changed the license, allowing developers to use the outputs from Llama models, including the 405B, to improve other models. The models are available to the community for download on llama.meta.com and Hugging Face, and for immediate development on Meta's broad ecosystem of partner platforms.
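For developers who want to start experimenting right away, the models on Hugging Face can be loaded with standard open-source tooling. The following is a minimal sketch using the transformers library; the model identifier, precision, and generation settings are illustrative assumptions rather than details from the announcement, and access to the gated repository must first be requested on Hugging Face.

```python
# Minimal sketch: loading and prompting a Llama 3.1 model from Hugging Face.
# Assumes the transformers, accelerate, and torch packages are installed and that
# access to the gated "meta-llama/Llama-3.1-8B-Instruct" repository has been granted.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # assumed model identifier

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision so the 8B model fits in GPU memory
    device_map="auto",           # place layers on available devices automatically
)

# Build a chat-formatted prompt and generate a reply.
messages = [{"role": "user", "content": "Summarise the Llama 3.1 release in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=200)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

The same pattern applies to the larger 70B and 405B checkpoints, although those require substantially more memory or a multi-GPU setup.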

Evaluating the model

Meta evaluated performance on over 150 benchmark datasets that span a wide range of languages. In addition, they performed extensive human evaluations comparing Llama 3.1 with competing models in real-world scenarios. Their experimental evaluation suggests the flagship model is competitive with leading foundation models, including GPT-4, GPT-4o, and Claude 3.5 Sonnet, across a variety of tasks. Additionally, their smaller models are competitive with closed and open models that have a similar number of parameters.

The company's major challenge was training the model on over 15 trillion tokens. To maximize training stability, the company opted for a standard decoder-only transformer model architecture with minor adaptations rather than a mixture-of-experts model. They adopted an iterative post-training procedure, where each round uses supervised fine-tuning and direct preference optimization. This enabled them to create the highest-quality synthetic data for each round and improve each capability's performance.
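Direct preference optimization trains the model directly on pairs of preferred and rejected responses instead of first fitting a separate reward model. The following is an illustrative sketch of the DPO loss in plain PyTorch, not Meta's actual training code; the function name, argument layout, and the beta value are assumptions made for the example.

```python
# Illustrative sketch of the direct preference optimization (DPO) loss.
# Not Meta's training code: it only shows the core idea of pushing the policy to
# favour the "chosen" response over the "rejected" one, relative to a frozen
# reference model, with the strength of the preference controlled by beta.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Each argument holds the summed log-probability of a response under either
    the policy being trained or the frozen reference model, one value per example."""
    # How much more the policy favours each response than the reference model does.
    chosen_logratios = policy_chosen_logps - ref_chosen_logps
    rejected_logratios = policy_rejected_logps - ref_rejected_logps
    # Maximise the margin between the chosen and rejected log-ratios.
    return -F.logsigmoid(beta * (chosen_logratios - rejected_logratios)).mean()
```

According to the company, pairing this kind of objective with supervised fine-tuning on high-quality synthetic data in each round is what improved each capability's performance.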

Compared with previous versions of Llama, the team improved both the quantity and quality of the data used for pre- and post-training. These improvements include more careful pre-processing and curation pipelines for pre-training data, and more rigorous quality assurance and filtering approaches for post-training data.

Innovation through openness

Unlike closed models, Llama's model weights are available to download. Developers can fully customize the models for their needs and applications, train them on new datasets, and conduct additional fine-tuning. They can also run the models in any environment, including on premises, in the cloud, or even locally on a laptop, all without sharing data with Meta. This enables the broader developer community and the world to fully realize the power of generative AI.
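As one concrete illustration of that additional fine-tuning, open weights can be adapted with parameter-efficient methods such as LoRA. The sketch below uses the peft library; the model identifier, target modules, and hyperparameters are illustrative assumptions, not a recipe from Meta.

```python
# Minimal sketch: attaching LoRA adapters to a downloaded Llama model for
# additional fine-tuning. Assumes transformers and peft are installed; the model
# identifier and hyperparameters are illustrative, not Meta's recipe.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

lora_config = LoraConfig(
    r=16,                                 # rank of the low-rank adapter matrices
    lora_alpha=32,                        # scaling factor for the adapter updates
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only a small fraction of the weights are trainable
```

Because only the adapter weights are trained, this kind of customization is far cheaper than full fine-tuning and, like inference, keeps all data in the developer's own environment.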


Disclaimer: INDIAai has not tested the platform. This story is entirely based on the information shared by the company. 
