Artificial Intelligence: The Race to Build Mammoth Natural Language Models

By Dick Weisinger

OpenAI wowed the AI community with its release of GPT-3, a language model that used 175 billion parameters to tune responses for writing short essays, computer code, and dialogue. 175 billion is a lot, and at the time it seemed like a feat that would be difficult for other researchers to match. But scratch that idea: other researchers have since one-upped and leapfrogged GPT-3's accomplishments.

In December 2021, DeepMind introduced a language model called Gopher that uses 280 billion parameters. That same month, Google introduced an AI language model built on 1.6 trillion parameters, dwarfing the model used by GPT-3. The bigger models are proving that, in general, bigger is better in terms of the range and capabilities of the model. But costly hardware, long training times, and enormous volumes of data are also required to achieve these kinds of results. Few entities, mostly large tech companies, are able to participate.
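To give a sense of why so few organizations can participate, here is a rough back-of-envelope sketch of the memory needed just to store the weights of models at these scales (assuming 16-bit precision, i.e. 2 bytes per parameter; actual training requires several times more for optimizer state and activations):

```python
def weight_memory_gb(n_params: int, bytes_per_param: int = 2) -> float:
    """Approximate gigabytes needed to hold a model's weights alone."""
    return n_params * bytes_per_param / 1e9

# Parameter counts from the article; memory figures are illustrative estimates.
for name, n_params in [("GPT-3", 175_000_000_000),
                       ("Gopher", 280_000_000_000),
                       ("Google (1.6T)", 1_600_000_000_000)]:
    print(f"{name}: ~{weight_memory_gb(n_params):,.0f} GB of weights")
```

Even the smallest of these, at roughly 350 GB of weights, is far beyond a single commodity GPU, which is why training runs span large clusters of interconnected accelerators.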

Paresh Kharya, product manager at NVIDIA, and Ali Alvi, program manager at Microsoft, wrote on the NVIDIA blog that "we live in a time where AI advancements are far outpacing Moore's law. We continue to see more computation power being made available with newer generations of GPUs, interconnected at lightning speeds. At the same time, we continue to see hyperscaling of AI models leading to better performance, with seemingly no end in sight."

Julien Simon, chief evangelist at HuggingFace, wrote that “large language model size has been increasing 10x every year for the last few years. This is starting to look like another Moore’s Law.” But Simon questioned the utility of such enormous models, saying that “instead of chasing trillion-parameter models (place your bets), wouldn’t we all be better off if we built practical and efficient solutions that all developers can use to solve real-world problems?”
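The "10x every year" growth curve Simon describes compounds quickly. A minimal sketch of the projection, taking GPT-3's 175 billion parameters as an illustrative starting point:

```python
def projected_params(start_params: float, years: int, growth: float = 10.0) -> float:
    """Project model size assuming a constant 10x-per-year growth rate."""
    return start_params * growth ** years

# Starting from GPT-3's 175 billion parameters (illustrative, not a forecast):
# year 0 -> 1.75e11, year 1 -> 1.75e12, year 2 -> 1.75e13
for year in range(3):
    print(f"year {year}: {projected_params(175e9, year):.2e} parameters")
```

At that rate, trillion-parameter models arrive within a year of GPT-3, which is roughly what happened, lending some weight to Simon's "another Moore's Law" comparison.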

Ziv Gidron points out that "although over one trillion parameters can give the impression that this language model was trained on all the data available online, it cannot automatically update itself – meaning Google's new model is only as good as the data it was fed (plentiful as it may be). One variable, one new product, an update of service, or a change in content can topple the entire house of cards and mislead or misguide a user seeking to accomplish a task. Furthermore, these models are not accommodative to minute iterations and, judging by GPT-3's pricing model, any such adjustments will come attached with considerable costs."

