Access and Feeds

Artificial Intelligence: Composition of Experts (CoE) – A Breakthrough in Large Language Models?

By Dick Weisinger

In the dynamic realm of artificial intelligence, a new methodology called Composition of Experts (CoE) has emerged that has the potential to reshape the landscape of Large Language Models (LLMs). CoE diverges from the conventional monolithic model, offering a modular and cost-efficient alternative.

CoE operates by assembling existing expert models into a cohesive whole. It achieves this through two critical steps: identifying the experts and constructing a router. Each expert excels in specific tasks, while the router dynamically selects the most suitable expert for a given query. It is like an orchestra where each instrument contributes a unique sound to create a harmonious composition. CoE assembles a similar ensemble of models, resulting in a powerful yet adaptable LLM.
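
To make the two steps concrete, here is a minimal Python sketch of a CoE-style dispatcher. The expert names, the keyword-based scoring rule, and the generate() stubs are hypothetical placeholders for illustration; a real CoE router, such as the one in SambaNova's work, is a learned model rather than keyword matching.

```python
# Minimal sketch of a Composition-of-Experts dispatcher (illustrative only).
# Expert names, keyword scoring, and generate() stubs are hypothetical stand-ins.

from typing import Callable, List


class Expert:
    """Wraps one pre-trained specialist model behind a simple generate() call."""

    def __init__(self, name: str, keywords: List[str], generate: Callable[[str], str]):
        self.name = name
        self.keywords = keywords  # crude proxy for the expert's specialty
        self.generate = generate  # stand-in for a real LLM inference call


def route(prompt: str, experts: List[Expert]) -> Expert:
    """Pick the expert whose specialty best matches the prompt.

    A production router would score the prompt with a trained classifier or
    embedding model; keyword overlap is used here only to keep the sketch runnable.
    """
    scores = {e.name: sum(kw in prompt.lower() for kw in e.keywords) for e in experts}
    return max(experts, key=lambda e: scores[e.name])


def answer(prompt: str, experts: List[Expert]) -> str:
    """CoE inference: one cheap routing step, then one call to a single expert."""
    expert = route(prompt, experts)
    return f"[{expert.name}] {expert.generate(prompt)}"


if __name__ == "__main__":
    experts = [
        Expert("math-expert", ["integral", "solve", "equation"], lambda p: "math answer"),
        Expert("code-expert", ["python", "function", "bug"], lambda p: "code answer"),
        Expert("commonsense-expert", ["why", "everyday"], lambda p: "commonsense answer"),
    ]
    print(answer("Solve the equation x^2 = 9", experts))
```

The key property the sketch captures is that only one expert runs per query, which is why inference stays cheap even as more specialists are added.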

The primary developer of CoE research is SambaNova, a company known for its hardware innovations. Their recent breakthrough, Samba-CoE-v0.1, demonstrates the potential of this approach. By ensembling five expert models, covering areas from mathematics to common-sense reasoning, Samba-CoE-v0.1 outperforms several larger models on standard benchmarks. Notably, it surpasses Mixtral 8x7B, Gemma-7B, Llama2-70B, Qwen-72B, and Falcon-180B across various tasks. Moreover, it achieves this feat at an inference cost equivalent to just two calls to 7-billion-parameter LLMs.

Beyond performance gains, CoE offers agility. Its modular design allows organizations to fine-tune specific components without retraining the entire model (see the sketch after the list below). The following areas are expected to improve as the technology matures:

  1. Scalability: Scaling CoE to even larger models could unlock unprecedented capabilities.
  2. Robust Routing: Enhancing the router’s ability to handle diverse prompts and multi-turn conversations.
  3. Broader Adoption: CoE’s adoption beyond SambaNova, as other players explore this methodology.
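
As a rough illustration of the modularity point above, the following Python sketch shows one expert being replaced in a registry while the router and the remaining experts stay untouched. The registry layout and the update_expert() helper are hypothetical and are not an actual SambaNova API.

```python
# Illustrative sketch of CoE modularity: swapping or updating one expert
# leaves the other experts (and the router) untouched. Hypothetical layout.

from typing import Callable, Dict

# Registry mapping a specialty to an inference callable (stand-ins for real models).
experts: Dict[str, Callable[[str], str]] = {
    "math": lambda p: "answer from math expert v1",
    "code": lambda p: "answer from code expert v1",
}


def update_expert(registry: Dict[str, Callable[[str], str]],
                  name: str,
                  new_model: Callable[[str], str]) -> None:
    """Swap in a newly fine-tuned expert; no other component is retrained."""
    registry[name] = new_model


# Fine-tune (here, simply replace) only the math expert.
update_expert(experts, "math", lambda p: "answer from math expert v2 (fine-tuned)")

print(experts["math"]("2 + 2"))        # served by the updated expert
print(experts["code"]("sort a list"))  # unchanged
```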

While SambaNova leads the CoE charge, other companies are likely to follow suit; for now, the field remains open for innovation, fostering healthy competition.

Widespread adoption of CoE-based models will take time. As hardware advances and research matures, we can expect CoE to become more prevalent within the next few years. Could hybrid models emerge, blending CoE with other techniques? Perhaps CoE will inspire novel architectures that combine expert models and end-to-end training.

Training trillion-parameter models can cost over $100 million. CoE disrupts this by achieving comparable performance at approximately 1/10th the cost. Organizations can now explore cutting-edge AI without breaking the bank.
