Microsoft Unveils Trio of Cutting-Edge AI Models in Phi Series

In a bold move to solidify its position as a leader in artificial intelligence, Microsoft has announced the release of three advanced models in its Phi series of language and multimodal AI. The release signals that the Redmond-based tech giant is not content to rest on the success of its partnership with OpenAI, and is aggressively pushing the boundaries of AI innovation on its own.

The new models, collectively known as Phi-3.5, are the Phi-3.5-mini-instruct, Phi-3.5-MoE-instruct, and Phi-3.5-vision-instruct. Each is designed to cater to specific needs within the AI landscape, from basic reasoning tasks to complex multimodal applications. The models are available for developers to download, use, and customize on Hugging Face under the MIT License, which permits commercial use and modification without restriction.

The Phi-3.5 Mini Instruct model, with 3.82 billion parameters, is engineered for instruction following and supports a 128k token context length. This lightweight model is ideal for scenarios that demand strong reasoning in memory- or compute-constrained environments, such as code generation, mathematical problem-solving, and logic-based reasoning. Despite its compact size, the model delivers competitive performance on multilingual and multi-turn conversational tasks, a marked improvement over its predecessors.

Notably, the Mini Instruct model boasts near-state-of-the-art performance on various benchmarks. It outperforms other similarly-sized models, such as Llama-3.1-8B-instruct and Mistral-7B-instruct, on the RepoQA benchmark, which measures "long context code understanding."

The Phi-3.5 MoE (Mixture of Experts) model is a pioneering effort by Microsoft in this class, combining multiple specialized expert sub-networks into a single model. The architecture contains 42 billion parameters in total, but, according to the documentation, only 6.6 billion of them are active for any given token, providing scalable AI performance for demanding applications. Like its siblings, it supports a 128k token context length.

Designed to excel in various reasoning tasks, the Phi-3.5 MoE offers strong performance in code, math, and multilingual language understanding. It often outperforms larger models in specific benchmarks, including RepoQA. Impressively, it also surpasses GPT-4o mini on the 5-shot MMLU (Massive Multitask Language Understanding) across subjects such as STEM, the humanities, and the social sciences.
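For context, "5-shot" evaluation means the model is shown five worked question-and-answer examples before the question it must answer. A minimal sketch of how such a prompt could be assembled (the helper function and demonstration pairs here are hypothetical illustrations, not the actual MMLU harness):

```python
def build_few_shot_prompt(examples, question, k=5):
    """Assemble a k-shot prompt: k solved examples, then the new question."""
    shots = examples[:k]
    parts = [f"Q: {q}\nA: {a}" for q, a in shots]
    parts.append(f"Q: {question}\nA:")  # model completes the final answer
    return "\n\n".join(parts)

# Hypothetical demonstration pairs (real MMLU items are multiple-choice).
demos = [
    ("What is 2 + 2?", "4"),
    ("What gas do plants absorb?", "Carbon dioxide"),
    ("Who wrote Hamlet?", "Shakespeare"),
    ("What is H2O?", "Water"),
    ("How many legs does a spider have?", "8"),
]

prompt = build_few_shot_prompt(demos, "What planet is known as the Red Planet?")
print(prompt.count("Q:"))  # prints 6: five shots plus the query
```

The five in-context examples let the benchmark measure pattern-following ability without any task-specific fine-tuning.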

The MoE model’s unique architecture allows it to maintain efficiency while handling complex AI tasks across multiple languages, making it a versatile tool for developers.
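The efficiency claim rests on sparse routing: a small gating network selects a few experts per token, so most of the model's parameters sit idle on any given forward pass. A toy illustration of top-k routing in plain Python (this is not Microsoft's implementation; the expert count and k value are made up for illustration):

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of gate scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(gate_scores, k=2):
    """Pick the k highest-scoring experts and renormalize their weights,
    mirroring how a mixture-of-experts layer activates only a subset of
    its experts for each token."""
    probs = softmax(gate_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    z = sum(probs[i] for i in top)
    return {i: probs[i] / z for i in top}

# 16 hypothetical experts; only 2 run for this token.
weights = route_top_k([0.1, 2.3, -0.5, 1.7] + [0.0] * 12, k=2)
print(sorted(weights))                           # [1, 3] — the two active experts
print(abs(sum(weights.values()) - 1.0) < 1e-9)   # True — weights renormalized
```

Because only the selected experts execute, compute per token scales with the active-parameter count (6.6 billion here) rather than the full 42 billion.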

Rounding out the trio is the Phi-3.5 Vision Instruct model, which integrates both text and image processing capabilities. This multimodal model is particularly suited for tasks such as general image understanding, optical character recognition, chart and table comprehension, and video summarization. Like the other models in the Phi-3.5 series, Vision Instruct supports a 128k token context length, enabling it to manage complex, multi-frame visual tasks.

Microsoft highlights that this model was trained with a combination of synthetic and filtered publicly available datasets, focusing on high-quality, reasoning-dense data. This ensures that the Vision Instruct model can handle a wide range of visual and textual data with remarkable accuracy.

The training process for these models was no small feat. The Phi-3.5 Mini Instruct model was trained on 3.4 trillion tokens using 512 H100-80G GPUs over 10 days. The Vision Instruct model, on the other hand, was trained on 500 billion tokens using 256 A100-80G GPUs over 6 days. The Phi-3.5 MoE model, featuring a mixture-of-experts architecture, was trained on 4.9 trillion tokens with 512 H100-80G GPUs over 23 days.
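Those figures imply rather different throughput per run. A quick back-of-the-envelope comparison, using only simple arithmetic on the numbers stated above (not official throughput data):

```python
# Rough tokens-per-GPU-day for each training run, from the stated figures:
# (total tokens, GPU count, days of training)
runs = {
    "Phi-3.5-mini-instruct":   (3.4e12, 512, 10),
    "Phi-3.5-vision-instruct": (5.0e11, 256, 6),
    "Phi-3.5-MoE-instruct":    (4.9e12, 512, 23),
}

for name, (tokens, gpus, days) in runs.items():
    per_gpu_day = tokens / (gpus * days)
    print(f"{name}: ~{per_gpu_day / 1e6:.0f}M tokens per GPU-day")
```

On these numbers the mini model processed roughly 664M tokens per GPU-day, versus about 416M for the MoE run and 326M for the vision run, a reminder that multimodal and mixture-of-experts training carry extra per-token cost.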

All three Phi-3.5 models are available under the MIT license, reflecting Microsoft’s commitment to supporting the open-source community. This license allows developers to freely use, modify, merge, publish, distribute, sublicense, or sell copies of the software. The license also includes a disclaimer that the software is provided “as is,” without warranties of any kind. Microsoft and other copyright holders are not liable for any claims, damages, or other liabilities that may arise from the software’s use.

Microsoft’s release of the Phi-3.5 series represents a significant step forward in the development of multilingual and multimodal AI. By offering these models under an open-source license, Microsoft empowers developers to integrate cutting-edge AI capabilities into their applications, fostering innovation across both commercial and research domains.

The tech community has responded positively, with many praising Microsoft for its transparency and commitment to advancing AI technology. This move not only enhances Microsoft's reputation as a leader in AI but also sets a high bar for other tech giants to follow.

Microsoft’s Phi-3.5 series is a testament to the company’s relentless pursuit of AI excellence. With these new models, Microsoft is not just keeping pace with the competition but is setting new standards in the industry. Whether you’re a developer looking to leverage advanced AI capabilities or an organization aiming to integrate sophisticated AI solutions, the Phi-3.5 series offers a robust and versatile toolkit to meet your needs.