Artificial Intelligence: The Race to Build Mammoth Natural Language Models

By Dick Weisinger

OpenAI wowed the AI community with its release of GPT-3, a language model that used 175 billion parameters to tune responses for writing short essays, computer code, and dialogue. 175 billion is a lot, and at the time it seemed like a feat that would be difficult for other researchers to match. But scratch that idea: other researchers have since one-upped and leapfrogged GPT-3's accomplishments.

In December 2021, DeepMind introduced a language model called Gopher that uses 280 billion parameters. That same month, Google introduced an AI language model built on 1.6 trillion parameters, dwarfing the model used by GPT-3. The bigger models are proving that, in general, bigger is better in terms of the range and capabilities of the model. But costly hardware, long training times, and enormous volumes of data are also required to achieve these kinds of results. Few entities, mostly large tech companies, are able to participate.
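To give a sense of why so few organizations can participate, here is a rough back-of-envelope sketch of the memory needed just to store the weights of models at these scales (assuming 16-bit precision, i.e. 2 bytes per parameter; actual training requires several times more for optimizer state and activations):

```python
def weight_memory_gb(n_params: int, bytes_per_param: int = 2) -> float:
    """Approximate gigabytes needed to hold a model's weights alone."""
    return n_params * bytes_per_param / 1e9

# Parameter counts from the article; memory figures are illustrative estimates.
for name, n_params in [("GPT-3", 175_000_000_000),
                       ("Gopher", 280_000_000_000),
                       ("Google (1.6T)", 1_600_000_000_000)]:
    print(f"{name}: ~{weight_memory_gb(n_params):,.0f} GB of weights")
```

Even the smallest of these, at roughly 350 GB of weights, is far beyond a single commodity GPU, which is why training runs span large clusters of interconnected accelerators.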

Paresh Kharya, product manager at NVIDIA, and Ali Alvi, program manager at Microsoft, wrote on the NVIDIA blog that "we live in a time where AI advancements are far outpacing Moore's law. We continue to see more computation power being made available with newer generations of GPUs, interconnected at lightning speeds. At the same time, we continue to see hyperscaling of AI models leading to better performance, with seemingly no end in sight."

Julien Simon, chief evangelist at HuggingFace, wrote that “large language model size has been increasing 10x every year for the last few years. This is starting to look like another Moore’s Law.” But Simon questioned the utility of such enormous models, saying that “instead of chasing trillion-parameter models (place your bets), wouldn’t we all be better off if we built practical and efficient solutions that all developers can use to solve real-world problems?”
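The "10x every year" growth curve Simon describes compounds quickly. A minimal sketch of the projection, taking GPT-3's 175 billion parameters as an illustrative starting point:

```python
def projected_params(start_params: float, years: int, growth: float = 10.0) -> float:
    """Project model size assuming a constant 10x-per-year growth rate."""
    return start_params * growth ** years

# Starting from GPT-3's 175 billion parameters (illustrative, not a forecast):
# year 0 -> 1.75e11, year 1 -> 1.75e12, year 2 -> 1.75e13
for year in range(3):
    print(f"year {year}: {projected_params(175e9, year):.2e} parameters")
```

At that rate, trillion-parameter models arrive within a year of GPT-3, which is roughly what happened, lending some weight to Simon's "another Moore's Law" comparison.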

Ziv Gidron points out that "although over one trillion parameters can give the impression that this language model was trained on all the data available online, it cannot automatically update itself – meaning Google's new model is only as good as the data it was fed (plentiful as it may be). One variable, one new product, an update of service, or a change in content can topple the entire house of cards and mislead or misguide a user seeking to accomplish a task. Furthermore, these models are not accommodative to minute iterations and, judging by GPT-3's pricing model, any such adjustments will come attached with considerable costs."

