Microsoft and Nvidia created the world's largest, most powerful language model to date,...


Posts: 1,246   +24
Staff member
In context: The costs associated with AI model training have dropped more than 100 times between 2017 and 2019, yet they remain prohibitive for most startups to this day. This naturally favors large companies like Nvidia and Microsoft, who are using incredible amounts of engineering talent and money to create ever-larger and more capable AI models for use in natural language processing, enhancing search engine results, improving self-driving technology, and more. Scaling them up is the easy part -- quantifying and removing bias is a problem that has yet to be solved.

Nvidia and Microsoft on Monday revealed they have been working together on something called the “Megatron-Turing Natural Language Generation model.” The two companies claim they’ve created the world’s largest and most capable “monolithic transformer language model trained to date.”

To get an idea of how big this is, the famous GPT-3 that’s been making the news rounds for the past few years currently has 175 billion parameters. By comparison, the new MT-NLG model spans 105 layers and has no less than 530 billion parameters.

MT-NLG is the successor to the Turing NLG 17B and Megatron-LM models and was able to demonstrate “unmatched accuracy” in a variety of natural language tasks such as reading comprehension, common sense reasoning, completion prediction, word sense disambiguation, and natural language inferences.

Nvidia and Microsoft have been training this gargantuan AI model on a supercomputer called Selene. This is a system comprised of 560 Nvidia DGX A100 servers, each holding eight A100 GPUs equipped with 80 gigabytes of VRAM connected via NVLink and NVSwitch interfaces. Microsoft notes this configuration is similar to the reference architecture used in its Azure NDv4 cloud supercomputers.

Interestingly, Selene is also powered by AMD EPYC 7742 processors. According to the folks over at The Next Platform, Selene cost an estimated $85 million to build — $75 million if we assume typical volume discounts for data center equipment.

Microsoft says MT-NLG was trained on 15 datasets containing over 339 billion tokens. The datasets were taken from English-language web sources, such as academic journals, online communities like Wikipedia and Stack Exchange, code repositories like GitHub, news websites, and more. The largest dataset is called The Pile and weighs in at 835 gigabytes.

Dataset Dataset Source Tokens (billions) Weight (percentage) Epochs
Books3 Pile dataset 25.7 14.3 1.5
OpenWebText2 Pile dataset 14.8 19.3 3.6
Stack Exchange Pile dataset 11.6 5.7 1.4
PubMed Abstracts Pile dataset 4.4 2.9 1.8
Wikipedia Pile dataset 4.2 4.8 3.2
Gutenberg (PG-19) Pile dataset 2.7 0.9 0.9
BookCorpus2 Pile dataset 1.5 1.0 1.8
NIH ExPorter Pile dataset 0.3 0.2 1.8
Pile-CC Pile dataset 49.8 9.4 0.5
ArXiv Pile dataset 20.8 1.4 0.2
GitHub Pile dataset 24.3 1.6 0.2
CC-2020-50 Common Crawl (CC) snapshot 68.7 13.0 0.5
CC-2021-04 Common Crawl (CC) snapshot 82.6 15.7 0.5
RealNews RealNews 21.9 9.0 1.1
CC-Stories Common Crawl (CC) stories 5.3 0.9 0.5

Overall, the project revealed that larger AI models need less training to operate sufficiently well. However, the reoccurring problem that remains unsolved is that of bias. It turns out that even when using as much and diverse data from the real world as possible, giant language models pick up bias, stereotypes, and all manner of toxicity during the training process.

Curation can help to some extent, but it's been known for years that AI models tend to amplify the biases in the data that's being fed into them. That's because the data sets have been collected from a variety of online sources where physical, gender, race, and religious prejudices are quickly becoming a common occurrence. The biggest challenge in solving this is quantifying the bias, which is no small task and still very much a work in progress no matter how many resources are thrown at it.

Some of you may recall a previous Microsoft experiment where it unleashed a Twitter chatbot dubbed Tay. It only took a few hours for Tay to pick up all the worst traits that humans could possibly teach it, and the Redmond company had to take it down less than 24 hours after launch.

Nvidia and Microsoft both said they’re committed to addressing this issue and will do their best to support research in this direction. At the same time, they warn that organizations who want to use MT-NLG in production must make sure the proper measures are put in place to mitigate and minimize potential harm to users. Microsoft noted that any use of AI should follow the reliability, security, privacy, transparency, and accountability principles outlined in its "Responsible AI" guide.

Permalink to story.


Squid Surprise

Posts: 5,335   +4,984
Soon it will be emailing various NFL execs racist and homophobic comments... and getting fired from the Raiders...

Do you think Gruden could claim that it was this AI that sent his emails?


Posts: 8,408   +7,843
Maybe it isn't bias, maybe AI is telling us that political correctness is wrong
I think its picking up how humans habitually treat other humans. IMO, it does not say much for the current state of human affairs.
Some of you may recall a previous Microsoft experiment where it unleashed a Twitter chatbot dubbed Tay. It only took a few hours for Tay to pick up all the worst traits that humans could possibly teach it, and the Redmond company had to take it down less than 24 hours after launch.
Given the source of Tay's information, I cannot say that I am surprised.


Posts: 230   +127
They named Selene the hardware and Megatron the language model... :)
I think in that big models the queries (when they involve a lot of nodes) take hours to answered so I suppose that’s the reason why there isn’t any interactive demo page. They have to find a way to reduce that time because in practice you can’t have a conversation if you ask something today and you have to wait to receive the answer the next day.


Posts: 446   +768
Reminds me of a great 1970s cartoon:

Two scientists are in the "Artificial Intelligence Lab". One says "We're making progress... at least now when one computer makes a mistake, it blames the other computer".