Laptops keep getting thinner and lighter, but this new business device from Fujitsu takes the crown for the lightest laptop in the world. The Lifebook WU-X/G2, currently only available in Japan, weighs just 634 grams. That’s less than 1.4 pounds.
As TechRadar points out, that’s less than the 12.9-inch iPad Pro, which weighs 682 grams.
Despite weighing less than the iPad Pro, this laptop has a larger screen plus a built-in keyboard and trackpad, with dimensions of 12.1 by 7.8 by 0.6 inches. Its display is a 13.3-inch full HD screen with an accompanying HD webcam that also includes a privacy shutter.
Overall, the WU-X/G2 has fairly basic hardware, including a 12th-generation Intel Core processor (Core i5-1235U or Core i7-1255U), 8GB, 16GB, or 32GB of LPDDR4x memory, and 256GB to 2TB of internal storage. Japanese customers can also configure the WU-X/G2 with an external optical drive. For connectivity, the device includes an Intel AX201 chipset, which provides Bluetooth 5.1 and Wi-Fi 6.
The laptop is light on battery capacity at 25 watt-hours, which Fujitsu says is good for about 11 hours of battery life. Fujitsu may have sacrificed some battery capacity to bring the weight of the WU-X/G2 down.
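A quick unit check makes the weight comparison concrete. This minimal Python sketch converts the two quoted weights using the exact definition of the avoirdupois pound (453.59237 grams). The function name is illustrative only.

```python
GRAMS_PER_POUND = 453.59237  # exact definition of the avoirdupois pound

def grams_to_pounds(grams: float) -> float:
    """Convert a mass in grams to pounds."""
    return grams / GRAMS_PER_POUND

lifebook_lb = grams_to_pounds(634)  # Lifebook WU-X/G2
ipad_pro_lb = grams_to_pounds(682)  # 12.9-inch iPad Pro
print(f"Lifebook: {lifebook_lb:.3f} lb, iPad Pro: {ipad_pro_lb:.3f} lb")
```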
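The rated figures imply a very low average power draw. The sketch below divides battery capacity by runtime. It assumes a constant draw and ignores conversion losses and battery ageing. The 5-watt "heavier workload" is a purely hypothetical figure used for comparison.

```python
BATTERY_WH = 25.0   # capacity quoted by Fujitsu
RATED_HOURS = 11.0  # Fujitsu's battery-life estimate

def average_draw_watts(capacity_wh: float, hours: float) -> float:
    """Average power draw implied by emptying capacity_wh over `hours` hours."""
    return capacity_wh / hours

def runtime_hours(capacity_wh: float, draw_watts: float) -> float:
    """Runtime at a constant draw (ignores losses and battery ageing)."""
    return capacity_wh / draw_watts

implied = average_draw_watts(BATTERY_WH, RATED_HOURS)
print(f"Implied average draw: {implied:.2f} W")
# Hypothetical heavier workload averaging 5 W:
print(f"Runtime at 5 W: {runtime_hours(BATTERY_WH, 5.0):.1f} h")
```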
Its port setup includes a card reader, three USB ports, a full-size HDMI port, and a Gigabit Ethernet port, in addition to a fingerprint reader in its power button.
The Fujitsu Lifebook WU-X/G2 is a pretty traditional laptop by all accounts. What gives it a unique edge is an outer shell made mostly of carbon fiber.
Thin and light laptops often come with impressive price tags to match, and the Fujitsu Lifebook WU-X/G2 is no exception. Its highest-end model, featuring an Intel Core i7 chip, 32GB of RAM, a 2TB SSD, and Windows 11 Pro, sells for 317,222 yen (about $2,400). This is likely due to the product’s business branding. However, Fujitsu is including a three-year warranty and a three-year subscription to McAfee LiveSafe security software with the purchase of a WU-X/G2 model.
Many of the lightest laptops available in North America might not be less than 1.4 pounds, but they are less than two pounds and have much more affordable price tags.
The Acer Swift 7 weighs 1.96 pounds and sells for $2,100 on Newegg. The Asus ExpertBook B9 weighs 1.92 pounds and sells for $1,400 on Amazon. The Lenovo ThinkPad X1 Nano is 1.99 pounds and sells for $1,500 on Amazon.
A possible security attack has just been revealed by researchers, and while difficult to carry out, it could potentially endanger some of the most sensitive data in the world.
Dubbed “SATAn,” the hack turns a typical SATA cable into a radio transmitter. This permits the transfer of data even from devices that would otherwise not allow it at all.
As data protection measures grow more advanced and cyberattacks become more frequent, researchers and malicious attackers alike reach new heights of creativity in finding flaws in software and hardware. Dr. Mordechai Guri from the Ben-Gurion University of the Negev in Israel just published new findings that, once again, show that even air-gapped systems aren’t completely secure.
An air-gapped system or network is completely isolated from any and all connections to the rest of the world. This means no networks, no internet connections, no Bluetooth — zero connectivity. The systems are purposely built without any hardware that can communicate wirelessly, all in an effort to keep them secure from various cyberattacks. All of these security measures are in place for one reason: To protect the most vulnerable and sensitive data in the world.
Hacking into these air-gapped systems is exceedingly difficult and often requires direct access in order to plant malware. Removable media, such as USB stealers, can also be used. Dr. Guri has now found yet another way to breach the security of an air-gapped system. SATAn relies on the SATA connection, used in countless devices all over the globe, to steal data from the targeted system.
Through this technique, Dr. Guri was able to turn a SATA cable into a radio transmitter and send data to a personal laptop located less than 1 meter away. This can be done without making any physical modifications to the cable itself or the rest of the targeted hardware. Feel free to dive into the paper penned by Dr. Guri (first spotted by Tom’s Hardware) if you want to learn the ins and outs of this tech.
In short, SATAn extracts data from seemingly ultra-secure systems by manipulating the electromagnetic interference generated by the SATA bus so that it carries data. The researcher used the SATA cable as a makeshift wireless antenna operating in the 6GHz frequency band. In a demonstration video, Dr. Guri steals a message from the target computer and then displays it on his laptop.
“The receiver monitors the 6GHz spectrum for a potential transmission, demodulates the data, decodes it, and sends it to the attacker,” said the researcher in his paper.
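The receiver pipeline in that quote can be illustrated with a toy simulation. This Python sketch assumes simple on-off keying: the transmitter either generates bus activity (a 1) or stays quiet (a 0) for a fixed interval, and the receiver averages the measured signal level over each interval and compares it to a threshold. It works on simulated amplitude samples with Gaussian noise, not real 6GHz emissions. The modulation scheme, function names, and parameter values are hypothetical, not taken from Dr. Guri’s paper.

```python
import random

SAMPLES_PER_BIT = 50  # hypothetical: receiver samples per bit interval

def text_to_bits(text: str) -> list[int]:
    """Encode text as UTF-8 and expand each byte into 8 bits, most significant first."""
    return [(byte >> (7 - k)) & 1 for byte in text.encode("utf-8") for k in range(8)]

def bits_to_text(bits: list[int]) -> str:
    """Pack groups of 8 bits back into bytes and decode them as UTF-8."""
    data = bytes(
        int("".join(str(b) for b in bits[i:i + 8]), 2)
        for i in range(0, len(bits) - 7, 8)
    )
    return data.decode("utf-8", errors="replace")

def modulate(bits: list[int], on_level: float = 1.0) -> list[float]:
    """On-off keying: emit energy for a 1 bit and silence for a 0 bit."""
    signal: list[float] = []
    for bit in bits:
        signal.extend([on_level if bit else 0.0] * SAMPLES_PER_BIT)
    return signal

def add_noise(signal: list[float], sigma: float, seed: int = 0) -> list[float]:
    """Add Gaussian noise to stand in for a noisy radio channel."""
    rng = random.Random(seed)
    return [s + rng.gauss(0.0, sigma) for s in signal]

def demodulate(signal: list[float], threshold: float = 0.5) -> list[int]:
    """Average each bit interval and compare the mean level to a threshold."""
    bits = []
    for start in range(0, len(signal) - SAMPLES_PER_BIT + 1, SAMPLES_PER_BIT):
        window = signal[start:start + SAMPLES_PER_BIT]
        bits.append(1 if sum(window) / len(window) > threshold else 0)
    return bits

received = demodulate(add_noise(modulate(text_to_bits("SECRET")), sigma=0.3))
print(bits_to_text(received))
```

Averaging many samples per bit lets the receiver tolerate heavy noise at the cost of a lower bit rate, a trade-off that generally constrains covert channels like this one.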
The attack can only be carried out if malicious software has been installed on the target device beforehand. This, of course, takes the danger level down a notch, but not by much, since infected USB devices can be used to install it. Without that, the attacker would need physical access to the system to implant the malware before attempting to steal data through SATAn.
Rounding out the paper, Dr. Guri details some ways this type of attack can be mitigated, such as internal policies that strengthen defenses and prevent the initial penetration of the air-gapped system. Banning radio receivers inside facilities where such top-secret data is stored seems like a sensible move. Adding electromagnetic shielding to the machine’s case, or even just to the SATA cable itself, is also recommended.
This attack is certainly scary, but regular folks most likely don’t need to worry. Given its complexity, it’s only worth attempting in a high-stakes game where national secrets are the target. For facilities running air-gapped systems, on the other hand, alarm bells should be ringing: it’s time to tighten up security.
Nvidia has announced a slew of AI-focused enterprise products at its annual GTC conference. They include details of its new silicon architecture, Hopper; the first datacenter GPU built using that architecture, the H100; a new Grace CPU “superchip”; and vague plans to build what the company claims will be the world’s fastest AI supercomputer, named Eos.
Nvidia has benefited hugely from the AI boom of the last decade, with its GPUs proving a perfect match for popular, data-intensive deep learning methods. As the AI sector’s demand for compute grows, Nvidia says it wants to provide more firepower.
In particular, the company stressed the popularity of a type of machine learning system known as a Transformer. This method has been incredibly fruitful, powering everything from language models like OpenAI’s GPT-3 to medical systems like DeepMind’s AlphaFold. Such models have increased exponentially in size over the space of a few years. When OpenAI launched GPT-2 in 2019, for example, it contained 1.5 billion parameters (or connections). When Google trained a similar model just two years later, it used 1.6 trillion parameters.
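To put "exponentially" into numbers, this short Python sketch computes the overall growth between the two models and the constant yearly factor that would compound to it. The constant-rate assumption is a simplification; two data points cannot show the actual shape of the curve.

```python
GPT2_PARAMS = 1.5e9    # GPT-2, 2019
LATER_PARAMS = 1.6e12  # Google's model, two years later
YEARS = 2

total_growth = LATER_PARAMS / GPT2_PARAMS
# Constant yearly factor that, compounded over YEARS, gives total_growth.
annual_growth = total_growth ** (1 / YEARS)
print(f"Total growth: {total_growth:,.0f}x, about {annual_growth:.1f}x per year")
```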
“Training these giant models still takes months,” said Nvidia senior director of product management Paresh Kharya in a press briefing. “So you fire a job and wait for one and half months to see what happens. A key challenge to reducing this time to train is that performance gains start to decline as you increase the number of GPUs in a data center.”
Nvidia says its new Hopper architecture will help ameliorate these difficulties. Named after pioneering computer scientist and US Navy Rear Admiral Grace Hopper, the architecture is designed to speed up the training of Transformer models on H100 GPUs by six times compared to previous-generation chips, while the new fourth-generation Nvidia NVLink can connect up to 256 H100 GPUs at nine times higher bandwidth than the previous generation.
The H100 GPU itself contains 80 billion transistors and is the first GPU to support PCIe Gen5 and utilize HBM3, enabling memory bandwidth of 3TB/s. Nvidia says an H100 GPU is three times faster than its previous-generation A100 at FP16, FP32, and FP64 compute, and six times faster at 8-bit floating point math.
“For the training of giant Transformer models, H100 will offer up to nine times higher performance, training in days what used to take weeks,” said Kharya.
The company also announced a new data center CPU, the Grace CPU Superchip, which consists of two CPUs connected directly via a new low-latency NVLink-C2C. The chip is designed to “serve giant-scale HPC and AI applications” alongside the new Hopper-based GPUs, and can be used for CPU-only systems or GPU-accelerated servers. It has 144 Arm cores and 1TB/s of memory bandwidth.
In addition to hardware and infrastructure news, Nvidia also announced updates to its various enterprise AI software services, including Maxine (an SDK to deliver audio and video enhancements, intended to power things like virtual avatars) and Riva (an SDK used for both speech recognition and text-to-speech).
The company also teased that it was building a new AI supercomputer, which it claims will be the world’s fastest when deployed. The supercomputer, named Eos, will be built using the Hopper architecture and contain some 4,600 H100 GPUs to offer 18.4 exaflops of “AI performance.” The system will be used for Nvidia’s internal research only, and the company said it would be online in a few months’ time.
Over the past few years, a number of companies with strong interest in AI have built or announced their own in-house “AI supercomputers” for internal research, including Microsoft, Tesla, and Meta. These systems are not directly comparable with regular supercomputers as they run at a lower level of accuracy, which has allowed a number of firms to quickly leapfrog one another by announcing the world’s fastest.
However, during his keynote address, Nvidia CEO Jensen Huang did say that Eos, when running traditional supercomputer tasks, would rack up 275 petaFLOPS of compute, 1.4 times faster than “the fastest science computer in the US” (Summit). “We expect Eos to be the fastest AI computer in the world,” said Huang. “Eos will be the blueprint for the most advanced AI infrastructure for our OEMs and cloud partners.”
Meta is testing an artificial intelligence system that lets people build parts of virtual worlds by describing them, and CEO Mark Zuckerberg showed off a prototype at a live event today. The proof of concept, called Builder Bot, could eventually draw more people into Meta’s Horizon “metaverse” virtual reality experiences. It could also advance the creative AI tech that powers machine-generated art.
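Dividing the headline Eos figures by the GPU count shows how much the choice of precision matters. This back-of-the-envelope Python sketch uses the rounded "some 4,600" GPU count, so the per-GPU results are approximate. It assumes the "AI performance" number refers to low-precision math and the 275 petaFLOPS figure to traditional double-precision work. The Summit number is simply what the claimed 1.4x lead implies, not an independent measurement.

```python
EOS_GPUS = 4_600            # "some 4,600 H100 GPUs" (rounded)
AI_FLOPS = 18.4e18          # 18.4 exaflops of "AI performance"
TRADITIONAL_FLOPS = 275e15  # 275 petaFLOPS on traditional supercomputer tasks
SPEEDUP_VS_SUMMIT = 1.4     # claimed lead over Summit

per_gpu_ai_pflops = AI_FLOPS / EOS_GPUS / 1e15
per_gpu_traditional_tflops = TRADITIONAL_FLOPS / EOS_GPUS / 1e12
precision_gap = AI_FLOPS / TRADITIONAL_FLOPS
implied_summit_pflops = TRADITIONAL_FLOPS / SPEEDUP_VS_SUMMIT / 1e15

print(f"AI performance per GPU:          {per_gpu_ai_pflops:.1f} PFLOPS")
print(f"Traditional performance per GPU: {per_gpu_traditional_tflops:.1f} TFLOPS")
print(f"Ratio of the two headline numbers: {precision_gap:.0f}x")
print(f"Summit performance implied by the 1.4x claim: {implied_summit_pflops:.0f} PFLOPS")
```

The roughly 67x gap between the two headline numbers for the same machine illustrates why "AI supercomputer" rankings are not directly comparable with traditional ones.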
In a prerecorded demo video, Zuckerberg walked viewers through the process of making a virtual space with Builder Bot, starting with commands like “let’s go to the beach,” which prompts the bot to create a cartoonish 3D landscape of sand and water around him. (Zuckerberg describes this as “all AI-generated.”) Later commands range from broad demands like creating an island to extremely specific requests like adding altocumulus clouds and — in a joke poking fun at himself — a model of a hydrofoil. They also include playing sound effects like “tropical music,” which Zuckerberg suggests is coming from a boombox that Builder Bot created, although it could also have been general background audio. The video doesn’t specify whether Builder Bot draws on a limited library of human-created models or if the AI plays a role in generating the designs.
Several AI projects have demonstrated image generation based on text descriptions, including OpenAI’s DALL-E, Nvidia’s GauGAN2, and VQGAN+CLIP, as well as more accessible applications like Dream by Wombo. But these well-known projects involve creating 2D images (sometimes very surreal ones) without interactive components, although some researchers are working on 3D object generation.
As described by Meta and shown in the demo, Builder Bot appears to be using voice input to add 3D objects that users can walk around, and Meta is aiming for more ambitious interactions. “You’ll be able to create nuanced worlds to explore and share experiences with others with just your voice,” Zuckerberg promised during the event keynote. Meta made several other AI announcements during the event, including plans for a universal language translator, a new version of a conversational AI system, and an initiative to build new translation models for languages without large written data sets.
Zuckerberg acknowledged that sophisticated interactivity, including the kinds of usable virtual objects many VR users take for granted, poses major challenges. AI generation can pose unique moderation problems if users ask for offensive content or the AI’s training reproduces human biases and stereotypes about the world. And we don’t know the limits of the current system. So for now, you shouldn’t expect to see Builder Bot pop up in Meta’s social VR platform — but you can get a taste of Meta’s plans for its AI future.
Update 12:50PM ET: Added details about later event announcements from Meta.
Social media conglomerate Meta is the latest tech company to build an “AI supercomputer” — a high-speed computer designed specifically to train machine learning systems. The company says its new AI Research SuperCluster, or RSC, is already among the fastest machines of its type and, when complete in mid-2022, will be the world’s fastest.
“Meta has developed what we believe is the world’s fastest AI supercomputer,” said Meta CEO Mark Zuckerberg in a statement. “We’re calling it RSC for AI Research SuperCluster and it’ll be complete later this year.”
The news demonstrates the absolute centrality of AI research to companies like Meta. Rivals like Microsoft and Nvidia have already announced their own “AI supercomputers,” which are slightly different from what we think of as regular supercomputers. RSC will be used to train a range of systems across Meta’s businesses: from content moderation algorithms used to detect hate speech on Facebook and Instagram to augmented reality features that will one day be available in the company’s future AR hardware. And, yes, Meta says RSC will be used to design experiences for the metaverse — the company’s insistent branding for an interconnected series of virtual spaces, from offices to online arenas.
“RSC will help Meta’s AI researchers build new and better AI models that can learn from trillions of examples; work across hundreds of different languages; seamlessly analyze text, images, and video together; develop new augmented reality tools; and much more,” write Meta engineers Kevin Lee and Shubho Sengupta in a blog post outlining the news.
“We hope RSC will help us build entirely new AI systems that can, for example, power real-time voice translations to large groups of people, each speaking a different language, so they can seamlessly collaborate on a research project or play an AR game together.”
Work on RSC began a year and a half ago, with Meta’s engineers designing the machine’s various systems (cooling, power, networking, and cabling) entirely from scratch. Phase one of RSC is already up and running and consists of 760 Nvidia DGX A100 systems containing 6,080 connected GPUs (a type of processor that’s particularly good at tackling machine learning problems). Meta says it’s already providing up to 20 times improved performance on its standard machine vision research tasks.
Before the end of 2022, though, phase two of RSC will be complete. At that point, it’ll contain some 16,000 total GPUs and will be able to train AI systems “with more than a trillion parameters on data sets as large as an exabyte.” (This raw number of GPUs only provides a narrow metric for a system’s overall performance, but, for comparison’s sake, the AI supercomputer Microsoft built with research lab OpenAI contains 10,000 GPUs.)
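The phase-one figures are internally consistent, because each Nvidia DGX A100 system holds eight A100 GPUs. The sketch below checks that and extrapolates to phase two. It hypothetically assumes that phase two keeps using eight-GPU systems, which Meta has not confirmed. As the text notes, raw GPU counts are only a narrow proxy for performance.

```python
GPUS_PER_DGX_A100 = 8  # each Nvidia DGX A100 system contains eight A100 GPUs

phase_one_systems = 760
phase_one_gpus = phase_one_systems * GPUS_PER_DGX_A100  # should match Meta's 6,080

phase_two_gpus = 16_000  # "some 16,000" (rounded)
# Hypothetical: assumes phase two is also built from eight-GPU systems.
phase_two_systems = phase_two_gpus / GPUS_PER_DGX_A100

openai_machine_gpus = 10_000  # Microsoft's supercomputer built with OpenAI
ratio_vs_openai_machine = phase_two_gpus / openai_machine_gpus

print(f"Phase one GPUs: {phase_one_gpus:,}")
print(f"Phase two systems (if 8 GPUs each): {phase_two_systems:,.0f}")
print(f"Phase two vs. Microsoft/OpenAI GPU count: {ratio_vs_openai_machine:.1f}x")
```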
These numbers are all very impressive, but they do invite the question: what is an AI supercomputer anyway? And how does it compare to what we usually think of as supercomputers — vast machines deployed by universities and governments to crunch numbers in complex domains like space, nuclear physics, and climate change?
The two types of systems, known as high-performance computers or HPCs, are certainly more similar than they are different. Both are closer to datacenters than individual computers in size and appearance and rely on large numbers of interconnected processors to exchange data at blisteringly fast speeds. But there are key differences between the two, as HPC analyst Bob Sorensen of Hyperion Research explains to The Verge. “AI-based HPCs live in a somewhat different world than their traditional HPC counterparts,” says Sorensen, and the big distinction is all about accuracy.
The brief explanation is that machine learning requires less accuracy than the tasks put to traditional supercomputers, and so “AI supercomputers” (a bit of recent branding) can carry out more calculations per second than their regular brethren using the same hardware. That means when Meta says it’s built the “world’s fastest AI supercomputer,” it’s not necessarily a direct comparison to the supercomputers you often see in the news (rankings of which are compiled by the independent Top500.org and published twice a year).
To explain this a little more, you need to know that both supercomputers and AI supercomputers make calculations using what is known as floating-point arithmetic, a mathematical shorthand that’s extremely useful for making calculations using very large and very small numbers (the “floating point” in question is the decimal point, which “floats” between significant figures). The degree of accuracy deployed in floating-point calculations can be adjusted based on different formats, and the speed of most supercomputers is measured in 64-bit (double-precision) floating-point operations per second, or FLOPS. However, because AI calculations require less accuracy, AI supercomputers are often measured in 32-bit or even 16-bit FLOPS. That’s why comparing the two types of systems is not necessarily apples to apples, though this caveat doesn’t diminish the incredible power and capacity of AI supercomputers.
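The precision difference is easy to see in code. This Python sketch uses the standard library’s `struct` module, whose `d`, `f`, and `e` format codes pack a value as an IEEE 754 64-bit, 32-bit, or 16-bit float. It round-trips the value 1/3 through each format and measures how much accuracy is lost. It illustrates number formats only and does not measure FLOPS.

```python
import struct

FORMATS = {
    "64-bit (double)": "<d",
    "32-bit (single)": "<f",
    "16-bit (half)": "<e",
}

def round_trip(value: float, fmt: str) -> float:
    """Store `value` in the given IEEE 754 binary format, then read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]

x = 1 / 3
for name, fmt in FORMATS.items():
    stored = round_trip(x, fmt)
    print(f"{name:16} {stored:.17f}  abs. error = {abs(stored - x):.1e}")
```

Narrower formats give up precision and range. In exchange, the numbers are cheaper to store, move, and compute with, which is the trade-off AI hardware exploits.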
Sorensen offers one extra word of caution, too. As is often the case with the “speeds and feeds” approach to assessing hardware, vaunted top speeds are not always representative. “HPC vendors typically quote performance numbers that indicate the absolute fastest their machine can run. We call that the theoretical peak performance,” says Sorensen. “However, the real measure of a good system design is one that can run fast on the jobs they are designed to do. Indeed, it is not uncommon for some HPCs to achieve less than 25 percent of their so-called peak performance when running real-world applications.”
In other words: the true utility of supercomputers is to be found in the work they do, not their theoretical peak performance. For Meta, that work means building moderation systems at a time when trust in the company is at an all-time low and means creating a new computing platform — whether based on augmented reality glasses or the metaverse — that it can dominate in the face of rivals like Google, Microsoft, and Apple. An AI supercomputer offers the company raw power, but Meta still needs to find the winning strategy on its own.
Nvidia has announced a new platform for creating virtual agents named Omniverse Avatar. The platform combines a number of discrete technologies — including speech recognition, synthetic speech, facial tracking, and 3D avatar animation — which Nvidia says can be used to power a range of virtual agents.
In a presentation at the company’s annual GTC conference, Nvidia CEO Jensen Huang showed off a few demos using Omniverse Avatar tech. In one, a cute animated character in a digital kiosk talks a couple through the menu at a fast food restaurant, answering questions like which items are vegetarian. The character uses facial-tracking technology to maintain eye-contact with the customers and respond to their facial expressions. “This will be useful for smart retail, drive-throughs, and customer service,” said Huang of the tech.
In another demo, an animated toy version of Huang answered questions about topics including climate change and protein production, and in a third, someone used a realistic animated avatar of themselves as a stand-in during a conference call. The caller was wearing casual clothes in a busy cafe, but their virtual avatar was dressed smartly and spoke without any background noise impinging. This last example builds on Nvidia’s Project Maxine work, which aims to improve common problems with video conferencing (like low quality streams and maintaining eye contact) with the help of machine learning fixes.
(The toy version of Huang appears in the keynote video starting at 28 minutes; skip forward to 1 hour, 22 minutes to see the kiosk demo.)
The Omniverse Avatar announcement is part of Nvidia’s inescapable “omniverse” vision — a grandiose bit of branding for a nebulous collection of technologies. Like the “metaverse,” the “omniverse” is basically about shared virtual worlds that allow for remote collaboration. But compared to the vision put forward by Facebook-owner Meta, Nvidia is less concerned with transporting your office meetings into virtual reality and more about replicating industrial environments with virtual counterparts and — in the case of its avatar work — creating avatars that interact with people in the physical world.
As ever with these presentations, Nvidia’s demos looked fairly slick, but it’s not clear how useful this technology will be in the real world. With the kiosk character, for example, it’s not clear if customers will actually prefer this sort of interactive experience to simply selecting the items they want from a menu. Huang noted in the presentation that the avatar has a two-second response time — slower than a human, and bound to cause frustrations if customers are in a rush. Similarly, although the company’s Project Maxine tech looks flash, we’ve yet to see it make a significant impact in the real world.
Epic has made acquisitions and otherwise signalled plans for a Fortnite metaverse, but its latest move is one of the most obvious yet. The developer has introduced Fortnite Party Worlds, or maps that are solely intended as social spaces to meet friends and play minigames. Unlike Hubs, these environments don’t link to other islands; think of them as final destinations.
The company has collaborated with creators fivewalnut and TreyJTH to offer a pair of example Party Worlds (a theme park and a lounge). However, the company is encouraging anyone to create and submit their own so long as they focus on the same goal of peaceful socialization.
This doesn’t strictly represent a metaverse when Party Worlds live in isolation. At the same time, it shows how far Fortnite has shifted away from its original focus on battle royale and co-op gaming: there are now islands devoted solely to making friends, not to mention other non-combat experiences like virtual museums and trial courses. We wouldn’t expect brawls to disappear any time soon, but they’re quickly becoming just one part of a much larger experience.
Earlier this month, Amazon launched a public test realm for New World. Alongside that PTR launch, Amazon began testing a massive patch for New World that, among other things, added a new weapon, new enemies, and new quests to the game. A week later, that patch is ready for prime time, and Amazon is launching it on live servers today.
What’s changing with New World’s big November update
It’s hard to overstate just how big this update is in terms of content. The lengthy patch notes tell the tale better than we can, and there are more changes, fixes, and tweaks than we can hope to cover here in a single article. However, there are some highlights to this update, and perhaps the biggest one is the introduction of the Void Gauntlet.
Introduced last week when the PTR launched, the Void Gauntlet is a new weapon that blends offense and support. Since it scales using both Intelligence and Focus, it could be a good choice for Life Staff users, who previously had no other Focus-scaling weapon option. With that in mind, don’t be surprised to see the Void Gauntlet become a popular weapon right off the bat.
As outlined last week, this update also introduces new enemies called the Varangian Knights, though it seems they’ll mostly be encountered by lower-level players. Amazon is also trying to get players to flag for PvP more often when leaving cities: flagged players receive a whopping 10% luck bonus and a 30% gathering luck bonus as they run around the world.
The update also launches new PvP missions and links all of the trading posts in the world, which could have a rather dramatic effect on the economy. Gone are the days when you’d need to run between cities in search of a good deal, as you can now purchase any listed item from any trading post in the world. We’ll see how this changes prices, but we expect them to fluctuate somewhat in the wake of this massive change.
Tweaks for every weapon in the game
Not only does this patch bring a lot of additions, but it also contains many weapon tweaks. In fact, every weapon in the game has been changed in some way by this update, with many weapons having their less popular abilities buffed and popular ones nerfed. For instance, the Ice Gauntlet’s Ice Storm has been nerfed, with the time between damage ticks increased. Several bugs associated with the ability have been quashed as well.
That’s really just scratching the surface of the changes, though. Each weapon has seen a wide variety of changes, so check out the patch notes to see how your personal weapon combo has been impacted. In general, however, it seems that magic users have been nerfed a bit, so if you’re an Ice Gauntlet or Fire Staff user, you’ll definitely want to take a close look at the patch notes.
Today’s update is quite possibly the biggest one that New World has received since its launch. New World came up from maintenance just a short time ago, which means the update is now live. So, if you’re a New World player, dive in and take it for a spin.
Microsoft and Nvidia today announced that they trained what they claim is the largest and most capable AI-powered language model to date: Megatron-Turing Natural Language Generation (MT-NLG). The successor to the companies’ Turing NLG 17B and Megatron-LM models, MT-NLG contains 530 billion parameters and achieves “unmatched” accuracy in a broad set of natural language tasks, Microsoft and Nvidia say, including reading comprehension, commonsense reasoning, and natural language inference.
“The quality and results that we have obtained today are a big step forward in the journey towards unlocking the full promise of AI in natural language. The innovations of DeepSpeed and Megatron-LM will benefit existing and future AI model development and make large AI models cheaper and faster to train,” Nvidia’s senior director of product management and marketing for accelerated computing, Paresh Kharya, and group program manager for the Microsoft Turing team, Ali Alvi, wrote in a blog post. “We look forward to how MT-NLG will shape tomorrow’s products and motivate the community to push the boundaries of natural language processing (NLP) even further. The journey is long and far from complete, but we are excited by what is possible and what lies ahead.”
Training massive language models
In machine learning, parameters are the part of the model that’s learned from historical training data. Generally speaking, in the language domain, the correlation between the number of parameters and sophistication has held up remarkably well. Language models with large numbers of parameters, more data, and more training time have been shown to acquire a richer, more nuanced understanding of language, for example gaining the ability to summarize books and even complete programming code.
To train MT-NLG, Microsoft and Nvidia say that they created a training dataset with 270 billion tokens from English-language websites. Tokens are the smaller units into which text is split for natural language processing; they can be words, characters, or parts of words. Like all AI models, MT-NLG had to “train” by ingesting a set of examples to learn patterns among data points, like grammatical and syntactical rules.
The dataset largely came from The Pile, an 825GB collection of 22 smaller datasets created by the open source AI research effort EleutherAI. The Pile spans academic sources (e.g., arXiv, PubMed), communities (StackExchange, Wikipedia), code repositories (GitHub), and more, which Microsoft and Nvidia say they curated and combined with filtered snapshots of the Common Crawl, a large collection of webpages including news stories and social media posts.
Figure: The data used to train MT-NLG.
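The three kinds of tokens can be illustrated with a toy tokenizer. This Python sketch splits text into words and into characters. It also splits a single word into subword pieces using greedy longest-match lookup against a tiny, hypothetical vocabulary. Production tokenizers (byte-pair encoding, WordPiece, and similar) learn vocabularies of tens of thousands of pieces from data. This sketch is not the scheme used for MT-NLG.

```python
def word_tokens(text: str) -> list[str]:
    """Split on whitespace."""
    return text.split()

def char_tokens(text: str) -> list[str]:
    """One token per character."""
    return list(text)

def subword_tokens(word: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match-first split of `word` into pieces from `vocab`.

    Falls back to a single character when no vocabulary entry matches.
    """
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate piece until it is in the vocabulary.
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:  # nothing matched: emit one character
            end = start + 1
        pieces.append(word[start:end])
        start = end
    return pieces

TOY_VOCAB = {"token", "iz", "ation", "s"}  # hypothetical vocabulary

print(word_tokens("tokenization matters"))
print(char_tokens("token"))
print(subword_tokens("tokenization", TOY_VOCAB))
```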
Training took place across 560 Nvidia DGX A100 servers, each containing 8 Nvidia A100 80GB GPUs.
When benchmarked, Microsoft says, MT-NLG can infer basic mathematical operations even when the symbols are “badly obfuscated.” While not extremely accurate, the model seems to go beyond memorization for arithmetic and manages to complete tasks containing questions that prompt it for an answer, a major challenge in NLP.
It’s well-established that models like MT-NLG can amplify the biases in data on which they were trained, and indeed, Microsoft and Nvidia acknowledge that the model “picks up stereotypes and biases from the [training] data.” That’s likely because a portion of the dataset was sourced from communities with pervasive gender, race, physical, and religious prejudices, which curation can’t completely address.
In a paper, the Middlebury Institute of International Studies’ Center on Terrorism, Extremism, and Counterterrorism claims that GPT-3 and similar models can generate “informational” and “influential” text that might radicalize people into far-right extremist ideologies and behaviors. A group at Georgetown University has used GPT-3 to generate misinformation, including stories around a false narrative, articles altered to push a bogus perspective, and tweets riffing on particular points of disinformation. Other studies, like one published by Intel, MIT, and Canadian AI initiative CIFAR researchers in April, have found high levels of stereotypical bias from some of the most popular open source models, including Google’s BERT, XLNet, and Facebook’s RoBERTa.
Microsoft and Nvidia claim that they’re “committed to working on addressing [the] problem” and encourage “continued research to help in quantifying the bias of the model.” They also say that any use of Megatron-Turing in production “must ensure that proper measures are put in place to mitigate and minimize potential harm to users,” and follow tenets such as those outlined in Microsoft’s Responsible AI Principles.
“We live in a time [when] AI advancements are far outpacing Moore’s law. We continue to see more computation power being made available with newer generations of GPUs, interconnected at lightning speeds. At the same time, we continue to see hyper-scaling of AI models leading to better performance, with seemingly no end in sight,” Kharya and Alvi continued. “Marrying these two trends together are software innovations that push the boundaries of optimization and efficiency.”
The cost of large models
Projects like MT-NLP, AI21 Labs’ Jurassic-1, Huawei’s PanGu-Alpha, Naver’s HyperCLOVA, and the Beijing Academy of Artificial Intelligence’s Wu Dao 2.0 are impressive from an academic standpoint, but building them doesn’t come cheap. For example, the training dataset for OpenAI’s GPT-3, one of the world’s largest language models, was 45 terabytes in size, enough to fill ninety 500GB hard drives.
AI training costs dropped 100-fold between 2017 and 2019, according to one source, but the totals still exceed the compute budgets of most startups. The inequity favors corporations with extraordinary access to resources at the expense of small-time entrepreneurs, cementing incumbent advantages.
For example, OpenAI’s GPT-3 required an estimated 3.1423 × 10^23 floating-point operations (FLOPs) of compute during training. In computer science, FLOPS (floating-point operations per second) is a measure of raw processing performance, typically used to compare different types of hardware. Assuming each Nvidia V100 GPU, a common GPU available through cloud services, sustained 28 teraflops (28 trillion floating-point operations per second), a single training run would cost about $4.6 million in cloud compute. A single Nvidia RTX 8000 GPU with 15 teraflops of compute would be substantially cheaper, but it would take 665 years to finish the training.
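The arithmetic behind these estimates is simply total operations divided by sustained throughput. The sketch below is a back-of-the-envelope check, assuming every GPU runs at its quoted peak for the entire run (real training never does) and assuming a hypothetical cloud price of $1.50 per V100-hour, a figure that does not come from the estimates above.

```python
# Back-of-the-envelope GPT-3 training estimates.
# Assumes perfect utilization at peak throughput, which real runs never reach.

SECONDS_PER_YEAR = 365 * 24 * 60 * 60
GPT3_TOTAL_FLOPS = 3.1423e23  # total floating-point operations for one training run


def training_years(total_flops: float, flops_per_second: float) -> float:
    """Years of compute on one device sustaining `flops_per_second`."""
    return total_flops / flops_per_second / SECONDS_PER_YEAR


def training_cost_usd(total_flops: float, flops_per_second: float, usd_per_hour: float) -> float:
    """Rental cost: device-hours needed multiplied by an hourly price."""
    device_hours = total_flops / flops_per_second / 3600
    return device_hours * usd_per_hour


V100_FLOPS = 28e12     # 28 teraflops per V100
RTX8000_FLOPS = 15e12  # 15 teraflops per RTX 8000
HYPOTHETICAL_V100_USD_PER_HOUR = 1.50  # assumption for illustration only

v100_gpu_years = training_years(GPT3_TOTAL_FLOPS, V100_FLOPS)
rtx8000_years = training_years(GPT3_TOTAL_FLOPS, RTX8000_FLOPS)
v100_cost = training_cost_usd(GPT3_TOTAL_FLOPS, V100_FLOPS, HYPOTHETICAL_V100_USD_PER_HOUR)

print(f"V100: {v100_gpu_years:.0f} GPU-years, about ${v100_cost / 1e6:.1f} million")
print(f"RTX 8000: {rtx8000_years:.0f} years on a single card")
```

Under those assumptions the V100 case works out to roughly 356 GPU-years and about $4.7 million, in the same range as the $4.6 million estimate, and the RTX 8000 case to about 664 years, in line with the 665-year figure up to rounding. Spreading the run across many V100s shortens the wall-clock time but not the number of GPU-hours billed, which is why the cost estimate barely depends on how many GPUs are reserved.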
Microsoft and Nvidia say that they observed between 113 and 126 teraflops per GPU while training MT-NLP. The cost is likely to have been in the millions of dollars.
A Synced report estimated that a fake news detection model developed by researchers at the University of Washington cost $25,000 to train, and Google spent around $6,912 to train a language model called BERT that it used to improve the quality of Google Search results. Storage costs also quickly mount when dealing with datasets at the terabyte — or petabyte — scale. To take an extreme example, one of the datasets accumulated by Tesla’s self-driving team — 1.5 petabytes of video footage — would cost over $67,500 to store in Azure for three months, according to CrowdStorage.
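Storage cost scales linearly with both size and duration. Working backward from the Tesla example, and assuming decimal units (1 petabyte = 1,000,000 GB), the quoted total implies a rate of roughly $0.015 per GB per month. The sketch below derives that rate and reuses it for a longer retention period; the rate is inferred from the numbers above, not taken from Azure's price list.

```python
GB_PER_PB = 1_000_000  # decimal units, as storage is usually billed


def implied_rate_usd_per_gb_month(total_usd: float, size_pb: float, months: float) -> float:
    """Back out a flat per-GB, per-month price from a quoted total."""
    return total_usd / (size_pb * GB_PER_PB * months)


def storage_cost_usd(size_pb: float, months: float, usd_per_gb_month: float) -> float:
    """Flat-rate storage cost: size x duration x price."""
    return size_pb * GB_PER_PB * months * usd_per_gb_month


# 1.5 PB for three months at "over $67,500"
rate = implied_rate_usd_per_gb_month(67_500, 1.5, 3)
one_year = storage_cost_usd(1.5, 12, rate)

print(f"Implied rate: ${rate:.3f} per GB-month")
print(f"Keeping 1.5 PB for a year: about ${one_year:,.0f}")
```

Because the quoted figure is "over $67,500", the derived rate, and the one-year cost of about $270,000 that follows from it, should be read as lower bounds.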
The effects of AI and machine learning model training on the environment have also been brought into relief. In June 2019, researchers at the University of Massachusetts at Amherst released a report estimating that training a certain model, including the search for its neural architecture, can emit roughly 626,000 pounds of carbon dioxide, equivalent to nearly five times the lifetime emissions of the average U.S. car. OpenAI itself has conceded that models like Codex require significant amounts of compute, on the order of hundreds of petaflop/s-days, which contributes to carbon emissions.
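A petaflop/s-day is a unit of total work rather than speed: 10^15 floating-point operations per second sustained for one day, or 8.64 × 10^19 operations. A small conversion helper makes the scale concrete; here it is applied to the GPT-3 training estimate of about 3.14 × 10^23 operations quoted earlier, assuming that estimate counts total operations. The value of 300 petaflop/s-days is an illustrative stand-in for "hundreds," not a reported figure.

```python
SECONDS_PER_DAY = 24 * 60 * 60
FLOPS_PER_PETAFLOP_S_DAY = 1e15 * SECONDS_PER_DAY  # 8.64e19 operations


def to_petaflop_s_days(total_flops: float) -> float:
    """Convert a total operation count into petaflop/s-days."""
    return total_flops / FLOPS_PER_PETAFLOP_S_DAY


def to_total_flops(petaflop_s_days: float) -> float:
    """Convert petaflop/s-days back into a total operation count."""
    return petaflop_s_days * FLOPS_PER_PETAFLOP_S_DAY


gpt3_pfs_days = to_petaflop_s_days(3.1423e23)
print(f"GPT-3: about {gpt3_pfs_days:,.0f} petaflop/s-days")

# Illustrative only: 300 petaflop/s-days expressed as raw operations
print(f"300 petaflop/s-days = {to_total_flops(300):.2e} operations")
```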
In a sliver of good news, the cost per FLOP and of basic machine learning operations has been falling over the past few years. A 2020 OpenAI survey found that since 2012, the amount of compute needed to train a model to the same performance on a popular image classification benchmark, ImageNet, has been halving every 16 months. Other recent research suggests that large language models aren’t always more complex than smaller models, depending on the techniques used to train them.
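A fixed halving period compounds quickly: if the compute needed for a given result halves every 16 months, then after t months it has fallen by a factor of 2^(t/16). The sketch below assumes the trend is a clean exponential, which real measurements only approximate; the 84-month span (roughly 2012 to 2019) is an illustrative choice.

```python
def compute_reduction_factor(months: float, halving_period_months: float = 16.0) -> float:
    """How many times less compute is needed after `months`, given a halving period."""
    return 2 ** (months / halving_period_months)


for months in (16, 32, 84):
    print(f"After {months} months: {compute_reduction_factor(months):.1f}x less compute")
```

Seven years at that pace works out to roughly a 38-fold reduction, which shows why even a modest-sounding halving period matters over a decade of research.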
Maria Antoniak, a natural language processing researcher and data scientist at Cornell University, says that when it comes to natural language, it’s an open question whether larger models are the right approach. While some of the best benchmark performance scores today come from large datasets and models, the payoff from dumping enormous amounts of data into models is uncertain.
“The current structure of the field is task-focused, where the community gathers together to try to solve specific problems on specific datasets,” Antoniak told VentureBeat in a previous interview. “These tasks are usually very structured and can have their own weaknesses, so while they help our field move forward in some ways, they can also constrain us. Large models perform well on these tasks, but whether these tasks can ultimately lead us to any true language understanding is up for debate.”
Just as Respawn teased over the weekend, today the first gameplay trailer for Apex Legends: Emergence dropped. While the launch trailer for the Emergence season – which was released last week – introduced us to Seer, the season’s new Legend, this one shows us Seer (and everyone else) in action. These Apex Legends gameplay trailers are generally a mix of in-game cinematics and gameplay rather than being straight gameplay, but we still get a few good looks in this one.
In addition to showing us Seer in action, the Emergence gameplay trailer also shows off the revamped World’s Edge map. As it turns out, big changes are coming to World’s Edge in the Emergence season, as the map is now scarred with lava-filled fractures thanks to aggressive resource mining and harvesting.
The cataclysm has created new points of interest on the World’s Edge map. For instance, Sorting Factory has been swallowed by a lava-filled sinkhole, creating the new Lava Siphon POI. In a blog post to the Apex Legends website, Respawn also confirms that the Refinery has been replaced by a much larger structure called the Climatizer and that a lava-filled fissure runs from the Climatizer to Fragment East. Both Lava Siphon and the Climatizer will offer ridable gondolas, too, so we expect plenty of interesting fights to play out on those.
While the trailer does give us a look at Seer in battle, we’re still unsure what he’s capable of doing exactly. Respawn hasn’t detailed his kit yet, so his flashy combat in the trailer doesn’t have much context to it until that happens. The trailer also features a look at the Rampage LMG, a new gun debuting in the Emergence season.
At the tail end of the trailer, we even get a glimpse at some of the battle pass rewards for the new season. More specifics on Seer, the Rampage LMG, and the Emergence battle pass are likely to be revealed in the coming days, so we’ll let you know when Respawn shares more.