Researchers find new vulnerability with Apple Silicon chips

Researchers have released details of an Apple Silicon vulnerability dubbed “Augury.” However, it doesn’t seem to be a huge issue at the moment.

Jose Rodrigo Sanchez Vicarte from the University of Illinois at Urbana-Champaign and Michael Flanders of the University of Washington published their findings on a flaw in Apple Silicon. The vulnerability stems from Apple’s implementation of the Data Memory-Dependent Prefetcher (DMP).

In short, a DMP looks at memory to determine what content to “prefetch” for the CPU. The researchers found that Apple’s M1, M1 Max, and A14 chips used an “array of pointers” pattern that loops through an array and dereferences the contents.

This could leak data that is never actually read, because the prefetcher dereferences it anyway. Apple’s implementation differs from a traditional prefetcher, as the paper explains:

“Once it has seen *arr[0] … *arr[2] occur (even speculatively!) it will begin prefetching *arr[3] onward. That is, it will first prefetch ahead the contents of arr and then dereference those contents. In contrast, a conventional prefetcher would not perform the second step/dereference operation.”

Because the CPU cores never read the data, defenses that try to track access to the data don’t work against the Augury vulnerability.

David Kohlbrenner, assistant professor at the University of Washington, downplayed the impact of Augury, noting that Apple’s DMP “is about the weakest DMP an attacker can get.”

The good news here is that this is about the weakest DMP an attacker can get. It only prefetches when content is a valid virtual address, and has a number of odd limitations. We show this can be used to leak pointers and break ASLR.

We believe there are better attacks possible.

— David Kohlbrenner (@dkohlbre) April 29, 2022

For now, the researchers say that only pointers can be leaked, and even then only within the sandboxed environment they used to study the vulnerability. Apple was notified before the public disclosure, so a patch is likely incoming.

Apple issued a March 2022 patch for MacOS Monterey that fixed some nasty Bluetooth and display bugs. It also patched two vulnerabilities that allowed an application to execute code with kernel-level privileges.

Other critical fixes to Apple’s desktop operating system include one that patched a vulnerability that exposed browsing data in the Safari browser.

Finding bugs in Apple’s hardware can sometimes net a pretty profit. A Ph.D. student from Georgia Tech found a major vulnerability that allowed unauthorized access to the webcam. Apple rewarded him handsomely, paying about $100,000 for his efforts.


Repost: Original Source and Author Link


Engadget Podcast: Dorsey leaves Twitter and a dive into Qualcomm’s new Snapdragon chips

This week, Cherlynn and Devindra discuss the significance of Jack Dorsey leaving Twitter. Will the social network thrive, or stumble, after losing its co-founder for a second time? Also, Cherlynn explains what’s up with all of Qualcomm’s new Snapdragon chips for phones, computers and… portable gaming consoles? Is it enough to take on Apple’s M1 chips? Or will Windows once again hold Snapdragon PCs back?

Listen below, or subscribe on your podcast app of choice. If you’ve got suggestions or topics you’d like covered on the show, be sure to email us or drop a note in the comments! And check out our other podcasts, The Morning After and Engadget News!



  • What is Twitter without founder Jack Dorsey? – 1:21

  • Qualcomm’s Snapdragon 8 Gen 1 chip – 22:38

  • Updates from the Theranos / Elizabeth Holmes trial – 45:44

  • Spotify’s Wrapped feature is available this week – 51:41

  • Working on – 56:18

  • Pop culture picks – 59:20

Video livestream

Hosts: Cherlynn Low and Devindra Hardawar
Producer: Ben Ellman
Livestream producers: Julio Barrientos, Luke Brooks
Graphics artists: Luke Brooks, Kyle Maack
Music: Dale North and Terrence O’Brien

All products recommended by Engadget are selected by our editorial team, independent of our parent company. Some of our stories include affiliate links. If you buy something through one of these links, we may earn an affiliate commission.



Google is using AI to design its next generation of AI chips more quickly than humans can

Google is using machine learning to help design its next generation of machine learning chips. The algorithm’s designs are “comparable or superior” to those created by humans, say Google’s engineers, but can be generated much, much faster. According to the tech giant, work that takes months for humans can be accomplished by AI in under six hours.

Google has been working on how to use machine learning to create chips for years, but this recent effort, described this week in a paper in the journal Nature, seems to be the first time its research has been applied to a commercial product: an upcoming version of Google’s own TPU (tensor processing unit) chips, which are optimized for AI computation.

“Our method has been used in production to design the next generation of Google TPU,” write the paper’s authors, co-led by Google research scientists Azalia Mirhoseini and Anna Goldie.

AI, in other words, is helping accelerate the future of AI development.

In the paper, Google’s engineers note that this work has “major implications” for the chip industry. It should allow companies to more quickly explore the possible architecture space for upcoming designs and more easily customize chips for specific workloads.

An editorial in Nature calls the research an “important achievement,” and notes that such work could help offset the forecasted end of Moore’s Law — an axiom of chip design from the 1970s that states that the number of transistors on a chip doubles every two years. AI won’t necessarily solve the physical challenges of squeezing more and more transistors onto chips, but it could help find other paths to increasing performance at the same rate.

Google’s TPU chips are offered as part of its cloud services and used internally for AI research.
Photo: Google

The specific task that Google’s algorithms tackled is known as “floorplanning.” This usually requires human designers who work with the aid of computer tools to find the optimal layout on a silicon die for a chip’s sub-systems. These components include things like CPUs, GPUs, and memory cores, which are connected together using tens of kilometers of minuscule wiring. Deciding where to place each component on a die affects the eventual speed and efficiency of the chip. And, given both the scale of chip manufacturing and the number of computational cycles involved, nanometer-scale changes in placement can end up having huge effects.

Google’s engineers note that designing floor plans takes “months of intense effort” for humans, but, from a machine learning perspective, there is a familiar way to tackle this problem: as a game.

AI has proven time and time again that it can outperform humans at board games like chess and Go, and Google’s engineers note that floorplanning is analogous to such challenges. Instead of a game board, you have a silicon die. Instead of pieces like knights and rooks, you have components like CPUs and GPUs. The task, then, is simply to find each board’s “win conditions.” In chess that might be checkmate; in chip design, it’s computational efficiency.

Google’s engineers trained a reinforcement learning algorithm on a dataset of 10,000 chip floor plans of varying quality, some of which had been randomly generated. Each design was tagged with a specific “reward” function based on its success across different metrics like the length of wire required and power usage. The algorithm then used this data to distinguish between good and bad floor plans and generate its own designs in turn.

As we’ve seen when AI systems take on humans at board games, machines don’t necessarily think like humans and often arrive at unexpected solutions to familiar problems. When DeepMind’s AlphaGo played human champion Lee Sedol at Go, this dynamic led to the infamous “move 37” — a seemingly illogical piece placement by the AI that nevertheless led to victory.

Nothing quite so dramatic happened with Google’s chip-designing algorithm, but its floor plans nevertheless look quite different from those created by a human. Instead of neat rows of components laid out on the die, sub-systems look like they’ve almost been scattered across the silicon at random. An illustration from Nature shows the difference, with the human design on the left and machine learning design on the right. You can also see the general difference in the image below from Google’s paper (orderly humans on the left; jumbled AI on the right), though the layout has been blurred as it’s confidential:

A human-designed chip floor plan is on the left, and the AI-designed floor plan on the right. The images have been blurred by the paper’s authors as they represent confidential designs.
Image: Mirhoseini, A. et al

This paper is noteworthy, particularly because its research is now being used commercially by Google. But it’s far from the only aspect of AI-assisted chip design. Google itself has explored using AI in other parts of the process like “architecture exploration,” and rivals like Nvidia are looking into other methods to speed up the workflow. The virtuous cycle of AI designing chips for AI looks like it’s only just getting started.

Update, Thursday Jun 10th, 3:17PM ET: Updated to clarify that Google’s Azalia Mirhoseini and Anna Goldie are co-lead authors of the paper.



KeepTruckin uses Ambarella AI chips to monitor truck drivers


KeepTruckin, a fleet management company, said today it will use Ambarella AI processors in its dashboard camera to monitor both road conditions and truck driver awareness.

KeepTruckin will use the chips in its AI Dashcam, which can signal alerts if a driver is unaware of an upcoming hazard, or if the driver is simply distracted or drowsy.

The system uses Ambarella’s CV22 CVflow Edge AI vision system on chip (SoC). The AI Dashcam uses a single CV22 SoC to simultaneously provide AI and image processing for its dual-camera system, which integrates one camera for the front advanced driver assistance system (ADAS) with incident recording, and a second RGB-infrared camera for the driver-monitoring system (DMS) with driver recording.

“The more the number of cameras, the higher the processing needs are,” Udit Budhia, director of marketing at Ambarella, said in an interview with VentureBeat.

Ambarella designs the chips to be able to handle AI processing at the edge, but without consuming a lot of power. The CV22 will run KeepTruckin’s proprietary AI algorithms for real-time high-risk behavior detection and active warnings directly on the small form factor device, with minimal heat dissipation.

Above: Ambarella’s CVflow architecture is the basis for a lot of chip families.

Image Credit: Ambarella

The camera that looks out on the road can use CV22 to produce warnings if a driver is following too close, is drifting out of a lane, may have a pending collision, is speeding, or is violating traffic laws. Using the same chip and running multiple simultaneous neural network models, the in-cabin camera can monitor for driver fatigue, distraction, and policy violations, such as contextual cell phone use or seatbelt monitoring, in combination with data from the front camera.

“One of the other advantages of having the power and efficiency and the developer support has been we can detect more with the same capabilities,” said Abhishek Gupta, group product manager at KeepTruckin, in an interview with VentureBeat. “If you have more power efficiency, you can run more AI models. So if you run more AI models, you can actually show customers a lot more behavior that needs to be corrected in real time. This brings a lot more value long term to customers.”

Jai Ranganathan, senior vice president of product at KeepTruckin, said in an interview that the fleet management innovation is enabled by Ambarella’s scalable range of CVflow AI vision SoCs, which are all supported by a common software development kit (SDK).

“We do this for people moving goods, people doing construction work, people in oil and gas — all kinds of different applications,” Ranganathan said. “We are bringing them a new generation of camera, in partnership with Ambarella.”

Above: Jai Ranganathan of KeepTruckin.

Image Credit: Ambarella

The AI Dashcam is connected to the KeepTruckin Vehicle Gateway, which uploads the pre-analyzed data, video, and still images to KeepTruckin’s cloud-based fleet management software in real time. The CV22 SoC integrates Ambarella’s image signal processor, which provides 1440p resolution HDR videos across all lighting conditions, while utilizing its on-chip H.264/H.265 encoding to reduce transmission bandwidth and storage costs.

“We have a bunch of requirements that are fairly onerous and more than most AI services do,” Ranganathan said.

Updating tech

KeepTruckin can upload additional features to the CV22 over time via over-the-air software updates, delivering incremental value to clients that invest in the platform. Moreover, KeepTruckin’s model training becomes increasingly precise thanks to its in-house safety team, which assesses quality in real time, adds risk context, and provides input that shortens model training and development cycles.

In a truck, the conditions are tough for electronics, since the dashboards get hot and power efficiency is important.

“The more inferences we can do per watt, the much better it is for our deep learning models,” Ranganathan said. “That’s a big element of what we care about. There’s obviously a cost issue too. We’re in a pretty cutthroat industry. Our customers care about having really good value for the money. Doing all this in real time at the edge is why we favored Ambarella.”

KeepTruckin has more than 2,500 employees. It has raised $450 million from investors including Google Ventures (GV), and it supports around 400,000 trucks in its network.

Above: KeepTruckin’s camera can see the road and the driver too.

Image Credit: Ambarella

Needless to say, trucks carry a lot of force, and accidents are terrible for anyone involved. Anything the company can do to reduce accident rates and improve safety is valuable, Ranganathan said. In the long term, this could reduce insurance costs and help the overall economy.

It’s about reducing accidents on the road.

“If you’re using your cell phone, if your seatbelt is off, these are risky behaviors that need to be corrected because the continuation of all these things will lead to accidents,” Gupta said. “Alerting the driver directly at the edge, whether it’s through the mobile app experience or coaching the driver after the fact, this is really where the entire platform comes together. You want to be able to impact the driver who’s the one who’s actually making all these decisions in real time.”

Protecting drivers

The cameras have to be as reliable as any security camera. The system has to have accurate AI, and it also has to capture images with high resolution. Coming out with 1440p resolution as well as night mode IR tech puts the camera at the leading edge. The tech can help discern the highest-risk drivers from the lower-risk drivers.

Above: KeepTruckin uses Ambarella’s AI chips to detect driver alerts.

Image Credit: Ambarella

“We found that our score is five times more predictive than the leading safety score you’ll see from the industry,” Gupta said.

It can reward positive behaviors and also figure out how to retain drivers better, as driver turnover is a difficult problem. If KeepTruckin wants to upgrade to better performance, it can upgrade its software or upgrade to a different CVflow processor, said Budhia.

“The camera is really small, and the power consumption needs to be extremely low. There is no fan in this design. So that also helps,” Budhia said. “The KeepTruckin team can easily import their models trained on different networks to our device.”

Asked how KeepTruckin protects drivers’ privacy, Ranganathan said, “We take data and driver privacy very seriously. Our AI Dashcam provides both front and dual camera options, as well as privacy covers. We encourage drivers to check with their fleet managers for more detailed information on how their videos may be used or shared by their fleet.”





New AMD Ryzen 5000G Chips Solve a Big PC Building Problem

AMD just released two new Ryzen 5000G processors — the Ryzen 5 5600G and Ryzen 7 5700G. Although budget-focused APUs are par for the course with newer architectures, these two chips arrive at a very opportune time. The GPU shortage is still in effect, and both APUs fill a gap in the PC building space.

Over the past several months, the price of last-gen APUs has gone up in response to the GPU shortage. For example, we recommended the Ryzen 5 3400G in our best $500 gaming PC build at well over twice its list price. The two-year-old chip should sell for around $150, but it was nearly $330 at the time of publication.

These new chips from AMD hit on two fronts. In addition to featuring the new Zen 3 architecture, the chips are priced in line with how they should perform. The $359 Ryzen 7 5700G, for example, outclasses the 3400G in Fortnite by 23% at 1080p, according to AMD’s numbers. AMD also says it provides a 1.45x increase in Cinebench R20 and a 1.44x increase in PCMark 10.

Here are the specs of the new chips:

                        Ryzen 5 5600G   Ryzen 7 5700G
Cores                   6               8
Threads                 12              16
Base clock              3.9GHz          3.8GHz
Boost clock             4.4GHz          4.6GHz
Total cache             19MB            20MB
Graphics compute units  7               8
Graphics speed          1.9GHz          2GHz
TDP                     65W             65W
Price                   $259            $359

Thanks to the GPU pricing crisis, many builders have turned to picking up an APU. Although integrated graphics are never a sure bet for gaming, they’re still capable of running games with trimmed-down settings at lower resolutions. The logic is pretty straightforward — buy an APU for now to scratch the gaming itch, and add in a graphics card later once prices have dropped.

The problem was that APUs became the hot ticket, leading to issues like the vastly overpriced 3400G. The 5600G and 5700G fill that gap nicely, offering builders the opportunity to put together a gaming PC that can actually play games without taking out a new line of credit.

As for the gaming performance you can expect, AMD says the 5600G is capable of 79 frames per second (fps) in Civilization VI, 33 fps in Assassin’s Creed Odyssey, and 98 fps in Fortnite, all at 1080p with Low settings. The 5700G is only slightly more powerful in gaming, matching the 5600G in Assassin’s Creed Odyssey and Fortnite while moving up to 84 fps in Civilization VI. 

AMD originally announced these processors at Computex, and they’re now making it to store shelves. Both parts are available today across retailers at their list price. If all things go well, they should be available at that list price for a while, but it’s too soon to say if they’ll suffer a similar fate as the 3400G.

Although we haven’t had the chance to test the chips ourselves, they look like the perfect addition to a budget build without a dedicated graphics card. And that’s something PC builders have needed for a while.




Intel’s revised roadmap looks beyond 1 nanometer chips

Forget about “SuperFin Enhanced,” the previous name for the node powering Intel’s upcoming 10nm Alder Lake processors. Now, that node is just called “Intel 7,” according to the company’s revised roadmap. But don’t go thinking that means Intel is somehow delivering a 7nm processor early — its long-delayed “Meteor Lake” 7nm chip still won’t ship until 2023, and its node has been renamed to “Intel 4.” Confused yet? It’s almost like Intel is trying to attach a new number to these upcoming products so we’ll forget it’s losing the shrinking-transistor war against AMD.

But Intel’s prospects are more interesting as we look ahead to 2024, when the company expects to finalize the design for its first chips with transistors smaller than 1 nanometer. They’ll be measured in angstroms, instead. The “Intel 20A” node will be powered by “RibbonFET” transistors, the company’s first new transistor architecture since the arrival of FinFET in 2011. It’ll be coupled with PowerVia, a technology that can move power delivery to the rear of a chip wafer, which should make signal transmission more efficient.

Intel CEO Pat Gelsinger


“Building on Intel’s unquestioned leadership in advanced packaging, we are accelerating our innovation roadmap to ensure we are on a clear path to process performance leadership by 2025,” Intel’s new CEO Pat Gelsinger (above) said during the “Intel Accelerated” livestream today. “We are leveraging our unparalleled pipeline of innovation to deliver technology advances from the transistor up to the system level. Until the periodic table is exhausted, we will be relentless in our pursuit of Moore’s Law and our path to innovate with the magic of silicon.”

Before it reaches the angstrom era of chips, though, the company also plans to release a processor on an “Intel 3” node in 2023. You can think of it as a super-powered version of its 7nm architecture, with around an 18 percent performance-per-watt improvement over Intel 4. It’ll likely fill the timing gap between Meteor Lake chips in 2023 and the Intel 20A products in 2024. Intel is also daring to call its shot beyond 2024: it’s working on an “Intel 18A” node that’ll further improve on its RibbonFET design.

For consumers, this roadmap means you can expect chips to get steadily faster and more efficient over the next five years. If anything, the announcements today show that Intel is trying to move beyond the 10nm and 7nm delays that have dogged it for ages. 

As we’ve previously argued, it’s ultimately a good thing for the tech industry if Intel can finally regain its footing. Its $20 billion investment in two Arizona-based fabrication plants was a clear sign that Gelsinger aimed to bring the company into new territory. But now that it’s laid out a new timeline, there’ll be even more pressure for Intel not to let things slip once again. 

All products recommended by Engadget are selected by our editorial team, independent of our parent company. Some of our stories include affiliate links. If you buy something through one of these links, we may earn an affiliate commission.



AMD May Launch 64-Core Threadripper 5000 Chips in August

A new report suggests that AMD’s upcoming and as-yet-unannounced Threadripper 5000 processors will arrive in August, with general availability coming in September. The updated high-end desktop range is reportedly built on the Zen 3 architecture powering Ryzen 5000 chips, which should offer a significant performance improvement over the current Threadripper 3000 chips.

AMD will announce Threadripper 5000 chips, codenamed Chagall, in August, according to a report from MoePC. The biggest change compared to Threadripper 3000 chips is the updated Zen 3 architecture, which boasts up to a 19% instructions-per-clock (IPC) improvement compared to Zen 2. We saw a similar shift with Ryzen 5000, which massively improved on the single-core performance of Ryzen 3000 chips.

High-end desktop enthusiasts have been eagerly waiting for the updated design, which has been rumored to launch in August for a few months. The latest report confirms the launch window, barring any delays from AMD. The Zen 3 cores in Ryzen 5000 chips use a refined 7nm process from chipmaker TSMC, and Threadripper 5000 shouldn’t be any different.

Outside of the updated Zen 3 cores, Threadripper 5000 is almost identical to Threadripper 3000. The flagship processor will reportedly feature 64 cores and 128 threads, along with support for PCIe 4.0 and DDR4 memory. It will also reportedly use the same sTRX4 socket as Threadripper 3000, unlike the next-gen Ryzen chips, which will likely use a different socket design.

Although Threadripper 5000 won’t use a different socket design, the report suggests that AMD will release the chips alongside a new motherboard chipset. It’s not yet clear, however, if the updated motherboard design will be ready at launch.

Threadripper 5000 chips will also reportedly feature double the L3 cache per core complex: 32MB, compared to 16MB on Threadripper 3000. The report says the chips won’t feature AMD’s recently announced 3D V-Cache technology, which lets AMD stack an extra layer of L3 cache on top of the chip package. The first chips featuring this technology are set to arrive in early 2022 with Zen 3-based Ryzen CPUs.

A person holding the AMD Ryzen Threadripper 1950X.
Bill Roberson/Digital Trends

Although AMD hasn’t confirmed anything yet, the launch of Threadripper 5000 chips should come soon. AMD has already brought the Zen 3 architecture to the consumer Ryzen range and the server-grade Epyc range. Ryzen 5000 APUs featuring Zen 3 cores are also set to launch in August. That leaves the Threadripper range, which hasn’t seen an update in two and a half years.

Threadripper 5000 chips are exciting, though they may not see the same gen-on-gen upgrade as Ryzen 5000. The improvements to single-core performance are clear, but Threadripper chips excel in workloads that demand a lot of cores. And with 64 of them available, that shouldn’t be an issue.

We don’t have any pricing details for Threadripper 5000 yet, but AMD will likely carry the same pricing and naming structure over from Threadripper 3000 chips. If that’s the case, you can expect to pay about $4,000 for the flagship 64-core chip and about $1,400 for the cheapest 24-core chip.




Is Google A.I. Better at Designing Chips Than Engineers?

Could artificial intelligence be better at designing chips than human experts? A group of researchers from Google’s Brain Team attempted to answer this question and came back with interesting findings. It turns out that a well-trained A.I. is capable of designing computer microchips — and with great results. So great, in fact, that Google’s next generation of A.I. computer systems will include microchips created with the help of this experiment.

Azalia Mirhoseini, a computer scientist on Google Research’s Brain Team, explained the approach in an issue of Nature together with several colleagues. Artificial intelligence usually has an easy time beating a human mind at games such as chess. Some might say that A.I. can’t think like a human, but in the case of microchips, that turned out to be the key to finding some out-of-the-box solutions.

Designing a microchip involves “floor planning,” a lengthy process that involves the work of human experts with the help of computer tools. The goal is to find the optimal layout for all the subsystems on a chip, thus providing the best possible performance. Minuscule changes to the placement of each component can have a massive impact on how powerful the chip is going to be, be it a processor, a graphics card, or a memory core.

Google’s engineers admit that designing floor plans for a new microchip takes “months of intense effort” for a whole team of people. However, Google Research’s Brain Team based in Mountain View, California, seems to have cracked the code that makes the whole process simpler. The answer? Treating floor planning as a game.

As reported by Azalia Mirhoseini and Anna Goldie, both co-leaders of the research team, the A.I. was trained to play a game of finding the most efficient chip design. Using a dataset of 10,000 microchip floor plans, the team used a reinforcement learning algorithm to set apart the good and the bad floor plans. Metrics such as the length of wire, power usage, chip size, and more were taken into consideration.

As the A.I. got better at discerning the most optimal chip configurations, it also got better at producing its own. In the process, it found some unique approaches to the placement of parts. This inspired the experts to try something new, such as reducing the distance between components by placing them in doughnut shapes.

Although previous attempts at simplifying the process have been made, five decades’ worth of research hadn’t produced a solution. Until now, automated planning techniques were unable to match the performance of human-designed chips.

According to Anna Goldie, this is because the algorithm learns from experience. “Previous approaches didn’t learn anything with each chip,” said Goldie, pointing out the use of machine learning.

What used to take a team of experts several months can now be replicated by artificial intelligence in under six hours. The resulting microchip floor plans are either of the same quality as those made by humans or, in some cases, superior to them. As such, Google’s new findings could save hundreds, if not thousands, of work hours for each new generation of computer chips.

The company is now using these A.I.-made chips for further studies. The scientists suggest that the use of these more powerful chips may contribute to further advances in the research, including the use of A.I. for things such as vaccine testing or city planning. As A.I. becomes more and more widespread, there will certainly be even more big discoveries to watch out for in the near future.




Google used reinforcement learning to design next-gen AI accelerator chips


In a preprint paper published a year ago, scientists at Google Research including Google AI lead Jeff Dean described an AI-based approach to chip design that could learn from past experience and improve over time, becoming better at generating architectures for unseen components. They claimed it completed designs in under six hours on average, which is significantly faster than the weeks it takes human experts in the loop.

While the work wasn’t entirely novel — it built upon a technique Google engineers proposed in a paper published in March 2020 — it advanced the state of the art in that it implied the placement of on-chip transistors can be largely automated. Now, in a paper published in the journal Nature, the original team of Google researchers claim they’ve fine-tuned the technique to design an upcoming, previously unannounced generation of Google’s tensor processing units (TPU), application-specific integrated circuits (ASICs) developed specifically to accelerate AI.

If made publicly available, the Google researchers’ technique could enable cash-strapped startups to develop their own chips for AI and other specialized purposes. Moreover, it could help to shorten the chip design cycle to allow hardware to better adapt to rapidly evolving research.

“Basically, right now in the design process, you have design tools that can help do some layout, but you have human placement and routing experts work with those design tools to kind of iterate many, many times over,” Dean told VentureBeat in a previous interview. “It’s a multi-week process to actually go from the design you want to actually having it physically laid out on a chip with the right constraints in area and power and wire length and meeting all the design roles or whatever fabrication process you’re doing. We can essentially have a machine learning model that learns to play the game of [component] placement for a particular chip.”

AI chip design

A computer chip is divided into dozens of blocks, each of which is an individual module, such as a memory subsystem, compute unit, or control logic system. These wire-connected blocks can be described by a netlist, a graph of circuit components like memory components and standard cells including logic gates (e.g., NAND, NOR, and XOR). Chip “floorplanning” involves placing netlists onto two-dimensional grids called canvases so that performance metrics like power consumption, timing, area, and wirelength are optimized while adhering to constraints on density and routing congestion.
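To make the floorplanning objective concrete, here is a minimal sketch of a netlist as a list of connected components, scored with the half-perimeter wirelength (HPWL) proxy commonly used in placement tools. The cell names and the toy netlist are illustrative, not Google's actual format:

```python
# Minimal sketch of a netlist and a common wirelength proxy (HPWL).
# Cell names and coordinates below are illustrative only.

def hpwl(net, positions):
    """Half-perimeter wirelength of one net: the semi-perimeter of the
    bounding box around all cells the net connects."""
    xs = [positions[cell][0] for cell in net]
    ys = [positions[cell][1] for cell in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def total_wirelength(netlist, positions):
    """Sum HPWL over every net in the netlist."""
    return sum(hpwl(net, positions) for net in netlist)

# Three cells placed on a 2-D canvas; three nets connecting them.
positions = {"mem0": (0, 0), "alu0": (4, 0), "ctl0": (0, 3)}
netlist = [("mem0", "alu0"), ("mem0", "ctl0"), ("alu0", "ctl0")]
print(total_wirelength(netlist, positions))  # 4 + 3 + 7 = 14
```

HPWL is only one of the metrics mentioned above; a real evaluator also scores power, timing, and congestion.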

Since the 1960s, many automated approaches to chip floorplanning have been proposed, but none has achieved human-level performance. Moreover, the exponential growth in chip complexity has rendered these techniques unusable on modern chips. Human chip designers must instead iterate for months with electronic design automation (EDA) tools, taking a register transfer level (RTL) description of the chip netlist and generating a manual placement of that netlist onto the chip canvas. On the basis of this feedback, which can take up to 72 hours, the designer either concludes that the design criteria have been achieved or provides feedback to upstream RTL designers, who then modify low-level code to make the placement task easier.

The Google team’s solution is a reinforcement learning method capable of generalizing across chips, meaning that it can learn from experience to become both better and faster at placing new chips.

Gaming the system

Training AI-driven design systems that generalize across chips is challenging because it requires learning to optimize the placement of all possible chip netlists onto all possible canvases. In point of fact, chip floorplanning is analogous to a game with various pieces (e.g., netlist topologies, macro counts, macro sizes and aspect ratios), boards (canvas sizes and aspect ratios), and win conditions (the relative importance of different evaluation metrics or different density and routing congestion constraints). Even one instance of this “game” — placing a particular netlist onto a particular canvas — has more possible moves than the Chinese board game Go.
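As a rough sanity check on that comparison, the count of ordered placements can be computed with Python's standard library. The macro count and grid size below are illustrative, not figures from the paper; Go's roughly 10^360 game-tree size is the commonly quoted figure:

```python
import math

# Illustrative sizes: 1,000 macros placed onto a 10,000-cell canvas.
cells, macros = 10_000, 1_000

# Ordered ways to assign distinct grid cells to the macros: P(cells, macros).
placements = math.perm(cells, macros)

# Compare orders of magnitude against Go's ~10^360 game-tree size.
print(math.log10(placements) > 360)  # True: already far past Go's scale
```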

The researchers’ system aims to place a “netlist” graph of logic gates, memory, and more onto a chip canvas, such that the design optimizes power, performance, and area (PPA) while adhering to constraints on placement density and routing congestion. The graphs range in size from millions to billions of nodes grouped in thousands of clusters, and typically, evaluating the target metrics takes from hours to over a day.

Starting with an empty chip, the Google team’s system places components sequentially until it completes the netlist. To guide the system in selecting which components to place first, components are sorted by descending size; placing larger components first reduces the chance that no feasible placement will be available for them later.
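A toy version of that sequential, largest-first loop might look like the following greedy sketch. The real system chooses positions with a learned reinforcement-learning policy rather than this greedy rule, and the grid, macro sizes, and cost function here are illustrative assumptions:

```python
# Toy sequential placer: sort macros by descending area, then greedily put
# each one on the free grid cell that minimizes wirelength so far.
# Illustrative only; the actual system uses a learned RL policy.

def place_sequentially(macros, grid_w, grid_h, nets):
    """macros: {name: (width, height)}; nets: tuples of macro names."""
    order = sorted(macros, key=lambda m: macros[m][0] * macros[m][1],
                   reverse=True)  # largest components first
    placed = {}
    free = {(x, y) for x in range(grid_w) for y in range(grid_h)}
    for name in order:
        def cost(cell):
            # HPWL over nets whose placed endpoints include this trial cell.
            trial = dict(placed, **{name: cell})
            total = 0
            for net in nets:
                pts = [trial[m] for m in net if m in trial]
                if len(pts) > 1:
                    xs, ys = zip(*pts)
                    total += (max(xs) - min(xs)) + (max(ys) - min(ys))
            return total
        best = min(free, key=cost)
        placed[name] = best
        free.remove(best)
    return placed
```

Because every macro consumes a cell, placing the big ones first keeps the remaining free cells from fragmenting, which is the intuition the article describes.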


Above: Macro placements of Ariane, an open source RISC-V processor, as training progresses. On the left, the policy is being trained from scratch, and on the right, a pre-trained policy is being fine-tuned for this chip. Each rectangle represents an individual macro placement.

Image Credit: Google

Training the system required creating a dataset of 10,000 chip placements, where the input is the state associated with the given placement and the label is the reward for the placement (i.e., wirelength and congestion). The researchers built this dataset by picking five different chip netlists and applying an AI algorithm to create 2,000 diverse placements for each one.
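The reward labels can be pictured as a weighted cost over the two metrics just mentioned, negated so that higher reward means a better placement. The weights below are illustrative assumptions, not the paper's values:

```python
# Sketch of a placement reward label: a negated, weighted combination of
# wirelength and congestion. Weights here are illustrative only.

def reward(wirelength, congestion, w_wl=1.0, w_cong=0.5):
    # Lower wirelength and congestion are better, so negate the cost.
    return -(w_wl * wirelength + w_cong * congestion)

print(reward(120.0, 8.0))   # -124.0
print(reward(90.0, 20.0))   # -100.0: shorter wires outweigh worse congestion here
```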

The system took 48 hours to “pre-train” on an Nvidia Volta graphics card and 10 CPUs, each with 2GB of RAM. Fine-tuning initially took up to 6 hours, but in later benchmarks, applying the pre-trained system to a new netlist without fine-tuning generated a placement in less than a second on a single GPU.

In one test, the Google researchers compared their system’s recommendations with a manual baseline: the production design of a previous-generation TPU chip created by Google’s TPU physical design team. Both the system and the human experts consistently generated viable placements that met timing and congestion requirements, but the AI system also outperformed or matched manual placements in area, power, and wirelength while taking far less time to meet design criteria.

Future work

Google says that its system’s ability to generalize and generate “high-quality” solutions has “major implications,” unlocking opportunities for co-optimization with earlier stages of the chip design process. Large-scale architectural explorations were previously impossible because it took months of effort to evaluate a given architectural candidate. However, modifying a chip’s design can have an outsized impact on performance, the Google team notes, and might lay the groundwork for full automation of the chip design process.

Moreover, because the Google team’s system simply learns to map the nodes of a graph onto a set of resources, it might be applicable to a range of applications including city planning, vaccine testing and distribution, and cerebral cortex mapping. “[While] our method has been used in production to design the next generation of Google TPU … [we] believe that [it] can be applied to impactful placement problems beyond chip design,” the researchers wrote in the paper.



AMD is Borrowing a Key Intel Feature For its Next-Gen Chips

AMD’s upcoming AM5 socket will reportedly use a different design. The next generation of AMD processors will use a land grid array (LGA) socket instead of the pin grid array (PGA) socket that AMD currently uses with its Ryzen 5000 range and has traditionally used across processor generations.

The information comes from ExecutableFix on Twitter, who has leaked other details about upcoming AMD products in the past. They also claim that the next-gen AM5 socket will be centered on the 600-series chipset, which is said to support dual-channel DDR5 memory and PCIe 4.0. That last bit runs counter to earlier rumors about AMD’s next-gen processors, which were originally rumored to support PCIe 5.0.

AM5 ????
– LGA-1718
– Dual-channel DDR5
– PCI-e 4.0
– 600 series chipset

— ExecutableFix (@ExecuFix) May 22, 2021

Intel started using LGA sockets in 2004 and has used them since. For LGA sockets, the CPU has contact pads on the bottom of the chip. These make contact with an array of pins that are located on the motherboard. PGA sockets are the exact opposite. The pins are located on the CPU itself, while the motherboard socket includes holes for those pins to slide into.

The AM5 socket will reportedly use an LGA1718 design, meaning there will be 1,718 pins on the motherboard. That’s nearly 400 more pins than the current AM4 design, and according to ExecutableFix, AMD will continue using the 40mm x 40mm size that AM4 is based on. LGA sockets offer a higher pin density, so it makes sense for AMD to finally jump to using an LGA socket.

AMD has already moved some of its product line to LGA sockets. Threadripper processors that use the TR4 and sTRX4 socket are based on an LGA design, as is the SP3 socket that AMD uses for its server-grade Epyc processors.

With the launch of Ryzen 5000 processors, AMD announced that it would move away from the AM4 socket that it has used since the original Ryzen range. The new AM5 socket will launch alongside AMD’s next desktop platform, which is based on a 5nm manufacturing process and rumored to launch in 2022.
