
AMD boosts performance of datacenters, technical computing, HPC, and AI

Santa Clara, California-based Advanced Micro Devices (AMD) says its “Milan-X” EPYC processors with 3D V-Cache, set to launch in early 2022, will deliver a “50% average uplift” for technical computing workloads.

It also said its Instinct MI200 GPUs, also launching in early 2022, will boost high-performance computing (HPC) and AI. AMD made the announcements today at its Accelerated Data Center Premiere virtual event.

HPC is one area where AMD has bragging rights, given that its designs were chosen for Oak Ridge National Laboratory’s Frontier supercomputer, one of the first exascale systems, capable of exceeding a quintillion (10^18) calculations per second. Frontier pairs Cray’s new Shasta architecture and Slingshot interconnect with AMD EPYC and Instinct processors, assembled with four GPUs to one CPU in each node, according to the project website. Currently under construction, Frontier is scheduled to be available to scientists early next year.

“We’re bringing the CPUs, GPUs, and software together into a unified system architecture to power exascale computing,” Ram Peddibhotla, AMD corporate vice president, product management, said in a preview briefing for journalists.

While few businesses today aspire to exascale performance, those with technical computing workloads like electronics design, structural analysis, computational fluid dynamics, and the finite element analysis techniques used in engineering simulations will benefit from improvements to EPYC, according to AMD. For example, EPYC shows a 66% performance improvement for RTL verification, a critical process in electronic design automation.

“Verification proves that each structure in the design does what it’s supposed to do,” Peddibhotla explained. “It helps catch defects early in the process before a chip is baked into silicon.” Designers taking advantage of this improvement will get the choice of finishing verification faster and getting to market sooner, or packing more tests into the same amount of time to improve quality, he said.

Above: Frontier supercomputer

AMD says EPYC benefits from continued improvements in its 3D chiplet manufacturing process, which boosts the amount of L3 cache per core complex die (CCD) from 32 to 96 megabytes. In an 8-CCD package, counting the other types of cache as well, the total is “804 megabytes of cache per socket at the top of the stack — an incredible amount of cache,” Peddibhotla said. That means the processor can keep more data on-chip, without relying on slower server memory or storage.
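As a back-of-the-envelope check on that figure, the arithmetic below reconstructs the 804MB total for a top-of-stack 64-core part. The per-core L1 and L2 sizes are assumptions based on published Zen 3 specifications, not numbers from the announcement.

```python
# Hedged sanity check of the "804MB of cache per socket" claim for a
# 64-core Milan-X part. L1/L2 sizes are assumed Zen 3 values.
CCDS_PER_SOCKET = 8
L3_PER_CCD_MB = 96            # 32MB base L3 + 64MB stacked 3D V-Cache
CORES_PER_SOCKET = 64
L2_PER_CORE_KB = 512          # assumed Zen 3 L2 size per core
L1_PER_CORE_KB = 32 + 32      # assumed 32KB instruction + 32KB data per core

l3_mb = CCDS_PER_SOCKET * L3_PER_CCD_MB            # 768MB of L3
l2_mb = CORES_PER_SOCKET * L2_PER_CORE_KB // 1024  # 32MB of L2
l1_mb = CORES_PER_SOCKET * L1_PER_CORE_KB // 1024  # 4MB of L1
total_mb = l3_mb + l2_mb + l1_mb
print(total_mb)  # 804
```

Under those assumptions, the L3 alone accounts for 768MB, with L2 and L1 making up the remaining 36MB.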

AMD says its latest GPU for datacenters will perform 9.5 times faster for high-performance computing (HPC) and 1.2 times faster for AI workloads than competing GPUs — like those from Nvidia. The Instinct MI200 is the latest in a line of GPUs specifically designed for datacenters, as opposed to gaming and desktop graphics. For this update, AMD particularly focused on improving performance for double-precision floating-point operations, which is why the performance improvements claimed are bigger for HPC than for AI processing. “We targeted this device to do really, really well on the toughest scientific problems requiring double-precision math, and that’s where we made the biggest step forward,” said Brad McCreadie, corporate VP of datacenter GPU accelerators at AMD.

The performance improvement varies by type of HPC workload. For example, McCreadie said the Instinct MI200 performs 2.5 times faster for the types of vector operations used in vaccine simulations.

More targeted toward AI developers are the release of ROCm 5.0, AMD’s open source software stack for GPU computing, which integrates with popular frameworks such as PyTorch and TensorFlow, and the launch of the Infinity Hub, a collection of code and templated containers to help developers get started.

AMD also announced the third generation of its Infinity architecture for interconnecting CPUs and GPUs, which it says can deliver up to 800GB/s of total aggregate bandwidth to reduce data movement and simplify memory management.

Despite fierce competition with Nvidia, Intel, and others, AMD reported 55% revenue growth in the most recent quarter.

VentureBeat

VentureBeat’s mission is to be a digital town square for technical decision-makers to gain knowledge about transformative technology and transact.

Our site delivers essential information on data technologies and strategies to guide you as you lead your organizations. We invite you to become a member of our community, to access:

  • up-to-date information on the subjects of interest to you
  • our newsletters
  • gated thought-leader content and discounted access to our prized events, such as Transform 2021: Learn More
  • networking features, and more

Become a member



The untapped potential of HPC + graph computing

In the past few years, AI has crossed the threshold from hype to reality. Today, with unstructured data growing by 23% annually in an average organization, the combination of knowledge graphs and high performance computing (HPC) is enabling organizations to exploit AI on massive datasets.

Full disclosure: Before I talk about how critical graph computing + HPC is going to be, I should tell you that I’m CEO of a graph computing, AI, and analytics company, so I certainly have a vested interest and perspective here. But I’ll also tell you that our company is one of many in this space — DGraph, MemGraph, TigerGraph, Neo4j, Amazon Neptune, and Microsoft’s CosmosDB, for example, all use some form of HPC + graph computing. And there are many other graph companies and open-source graph options, including OrientDB, Titan, ArangoDB, Nebula Graph, and JanusGraph. So there’s a bigger movement here, and it’s one you’ll want to know about.

Knowledge graphs organize data from seemingly disparate sources to highlight relationships between entities. While knowledge graphs themselves are not new (Facebook, Amazon, and Google have invested a lot of money over the years in knowledge graphs that can understand user intents and preferences), their coupling with HPC gives organizations the ability to spot anomalies and other patterns in data at unparalleled scale and speed.

There are two main reasons for this.

First, graphs can be very large: Data sizes of 10-100TB are not uncommon. Organizations today may have graphs with billions of nodes and hundreds of billions of edges. In addition, nodes and edges can have a lot of property data associated with them. Using HPC techniques, a knowledge graph can be sharded across the machines of a large cluster and processed in parallel.
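As a minimal sketch of the sharding idea, the pure-Python snippet below hash-partitions a toy edge list across two “machines” by source node. The hash function, node names, and shard count are illustrative assumptions; production systems use smarter partitioners that also minimize cross-shard edges.

```python
from collections import defaultdict

def node_shard(node, num_shards):
    # Deterministic toy hash; real systems use consistent hashing or
    # graph-aware partitioners to balance load across the cluster.
    return sum(node.encode()) % num_shards

def shard_edges(edges, num_shards):
    """Assign each edge to the shard that owns its source node."""
    shards = defaultdict(list)
    for src, dst in edges:
        shards[node_shard(src, num_shards)].append((src, dst))
    return shards

edges = [("alice", "bob"), ("bob", "carol"),
         ("carol", "alice"), ("dave", "bob")]
shards = shard_edges(edges, num_shards=2)
# Each machine in the cluster would now process only its own shard's
# edges, in parallel with the others.
```

Because every edge lands on exactly one shard, a traversal or analytics pass can run on all shards concurrently, with cross-shard edges handled by message passing between machines.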

The second reason HPC techniques are essential for large-scale computing on graphs is the need for fast analytics and inference in many application domains. One of the earliest use cases I encountered was with the Defense Advanced Research Projects Agency (DARPA), which first used knowledge graphs enhanced by HPC for real-time intrusion detection in their computer networks. This application entailed constructing a particular kind of knowledge graph called an interaction graph, which was then analyzed using machine learning algorithms to identify anomalies. Given that cyberattacks can go undetected for months (hackers in the recent SolarWinds breach lurked for at least nine months), the need for suspicious patterns to be pinpointed immediately is evident.

Today, I’m seeing a number of other fast-growing use cases emerge that are highly relevant and compelling for data scientists, including the following.

Financial services — fraud, risk management and customer 360

Digital payments are gaining more and more traction — more than three-quarters of people in the US use some form of digital payments. However, the amount of fraudulent activity is growing as well. Last year the dollar amount of attempted fraud grew 35%. Many financial institutions still rely on rules-based systems, which fraudsters can bypass relatively easily. Even those institutions that do rely on AI techniques can typically analyze only the data collected in a short period of time due to the large number of transactions happening every day. Current mitigation measures therefore lack a global view of the data and fail to adequately address the growing financial fraud problem.

A high-performance graph computing platform can efficiently ingest data corresponding to billions of transactions through a cluster of machines, and then run a sophisticated pipeline of graph analytics, such as centrality metrics, along with graph AI algorithms for tasks like clustering and node classification, often using Graph Neural Networks (GNNs) to generate vector-space representations for the entities in the graph. These enable the system to identify fraudulent behaviors and detect money laundering more robustly. GNN computations are very floating-point intensive and can be sped up by exploiting tensor computation accelerators.
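As a toy illustration of the centrality stage of such a pipeline, the sketch below runs a PageRank-style score over a small directed transaction graph in pure Python. The account names, graph, and iteration count are illustrative assumptions, not details from any real pipeline.

```python
def pagerank(graph, damping=0.85, iters=50):
    """Plain-Python PageRank over a dict of node -> list of out-neighbors."""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for src, dsts in graph.items():
            if dsts:
                share = damping * rank[src] / len(dsts)
                for dst in dsts:
                    new[dst] += share
            else:
                # Dangling node: spread its rank evenly over all nodes
                for n in nodes:
                    new[n] += damping * rank[src] / len(nodes)
        rank = new
    return rank

# Several accounts funnel transactions into "mule", giving it an
# outsized centrality score worth a fraud analyst's attention.
graph = {"a": ["mule"], "b": ["mule"], "c": ["mule"], "mule": ["a"]}
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))  # mule
```

In a real deployment this computation would be sharded across a cluster and combined with GNN-derived embeddings rather than run on a four-node toy graph.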

Secondly, HPC and knowledge graphs coupled with graph AI are essential to conduct risk assessment and monitoring, which has become more challenging with the escalating size and complexity of interconnected global financial markets. Risk management systems built on traditional relational databases are inadequately equipped to identify hidden risks across a vast pool of transactions, accounts, and users because they often ignore relationships among entities. In contrast, a graph AI solution learns from the connectivity data and not only identifies risks more accurately but also explains why they are considered risks. It is essential that the solution leverage HPC to reveal the risks in a timely manner before they become more serious.

Finally, a financial services organization can aggregate various customer touchpoints and integrate them into a consolidated, 360-degree view of the customer journey. With millions of disparate transactions and interactions by end users across different bank branches, financial services institutions can evolve their customer engagement strategies, better identify credit risk, personalize product offerings, and implement retention strategies.

Pharmaceutical industry — accelerating drug discovery and precision medicine

Between 2009 and 2018, U.S. biopharmaceutical companies spent about $1 billion to bring each new drug to market. A significant fraction of that money is wasted exploring potential treatments in the laboratory that ultimately do not pan out. As a result, it can take 12 years or more to complete the drug discovery and development process. The COVID-19 pandemic, in particular, has thrust the importance of cost-effective and swift drug discovery into the spotlight.

A high-performance graph computing platform can enable researchers in bioinformatics and cheminformatics to store, query, mine, and develop AI models using heterogeneous data sources to reveal breakthrough insights faster. Timely and actionable insights can not only save money and resources but also save human lives.

Challenges in this data and AI-fueled drug discovery have centered on three main factors — the difficulty of ingesting and integrating complex networks of biological data, the struggle to contextualize relations within this data, and the complications in extracting insights across the sheer volume of data in a scalable way. As in the financial sector, HPC is essential to solving these problems in a reasonable time frame.

The main use cases under active investigation at all major pharmaceutical companies include drug hypothesis generation and precision medicine for cancer treatment, using heterogeneous data sources such as bioinformatics and cheminformatic knowledge graphs along with gene expression, imaging, patient clinical data, and epidemiological information to train graph AI models. While there are many algorithms to solve these problems, one popular approach is to use Graph Convolutional Networks (GCN) to embed the nodes in a high-dimensional space, and then use the geometry in that space to solve problems like link prediction and node classification.
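To make the embedding idea concrete, here is a deliberately minimal, dependency-free sketch of one GCN-style propagation step followed by a dot-product link score. Real GCNs use learned weight matrices and nonlinearities; the tiny drug/gene/disease graph and feature vectors here are invented for illustration.

```python
def gcn_layer(adj, feats):
    """One propagation step: average each node's features with its
    neighbors' (a self-loop included), as in a GCN without weights."""
    out = {}
    for node, nbrs in adj.items():
        neigh = list(nbrs) + [node]
        dim = len(feats[node])
        out[node] = [sum(feats[m][i] for m in neigh) / len(neigh)
                     for i in range(dim)]
    return out

def link_score(feats, u, v):
    """Dot product in embedding space as a candidate-link score."""
    return sum(a * b for a, b in zip(feats[u], feats[v]))

# Hypothetical biomedical mini-graph: a drug and a disease share a gene.
adj = {"drug": {"geneA"}, "geneA": {"drug", "disease"}, "disease": {"geneA"}}
feats = {"drug": [1.0, 0.0], "geneA": [0.5, 0.5], "disease": [0.0, 1.0]}

emb = gcn_layer(adj, feats)
# After propagation, nodes sharing a neighbor move closer together, so
# "drug"-"disease" scores higher than in the raw feature space, hinting
# at a predicted link even though no direct edge exists.
score = link_score(emb, "drug", "disease")
```

The same mechanism, stacked into several learned layers and trained on bioinformatics knowledge graphs, is what lets GCNs surface drug-repurposing hypotheses via link prediction.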

Another important aspect is the explainability of graph AI models. AI models cannot be treated as black boxes in the pharmaceutical industry, because actions taken on their outputs can have dire consequences. Cutting-edge explainability methods such as GNNExplainer and Guided Gradient (GGD) methods are very compute-intensive and therefore require high-performance graph computing platforms.

The bottom line

Graph technologies are becoming more prevalent, and organizations and industries are learning how to make the most of them effectively. While there are several approaches to using knowledge graphs, pairing them with high performance computing is transforming this space and equipping data scientists with the tools to take full advantage of corporate data.

Keshav Pingali is CEO and co-founder of Katana Graph, a high-performance graph intelligence company. He holds the W.A. “Tex” Moncrief Chair of Computing at the University of Texas at Austin, is a Fellow of the ACM, IEEE, and AAAS, and is a Foreign Member of the Academia Europaea.



Liqid integrates HPC management tool with Slurm orchestration engine

Liqid has integrated its software for dynamically composing compute and storage resources in high performance computing (HPC) environments with the open source Slurm Workload Manager used to orchestrate jobs on these platforms.

The integration of Liqid Matrix Software with the open source orchestration engine will make it easier for IT organizations to dynamically scale HPC workloads up and down as needed, Liqid CEO Sumit Puri said. That capability has become more critical as IT teams increasingly run AI workloads on HPC platforms configured with graphics processing units (GPUs), Puri added.

Liqid Matrix Software makes it possible to dynamically aggregate bare-metal resources — such as GPUs, x86 and Arm processors, NVMe storage, network interface cards (NICs), host bus adapters, field-programmable gate arrays, and memory — and then assign them to a specific workload. It also provides peer-to-peer connectivity that enables those resources to be aggregated across multiple HPC systems.

Slurm, meanwhile, is an orchestration engine widely employed in HPC environments to dynamically scale resources in much the same way Kubernetes does in IT environments running containers. The one prerequisite is that systems running Liqid Matrix Software need to support the Peripheral Component Interconnect (PCI) Express 3.0 expansion bus standard, which provides I/O virtualization capabilities. Most recently, Liqid revealed it is collaborating with Broadcom to create reference kits for version 4.0 of PCI Express, which doubles the overall throughput available.
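For context, the Slurm side of such a deployment is an ordinary batch script; the composition of bare-metal resources happens beneath the scheduler. The sketch below uses standard Slurm directives, with an assumed partition name and job command.

```shell
#!/bin/bash
# Illustrative Slurm batch script requesting GPU resources; the partition
# name and training command are assumptions, not from the article.
#SBATCH --job-name=train-model
#SBATCH --partition=gpu          # assumed partition name
#SBATCH --nodes=1
#SBATCH --gres=gpu:2             # request two GPUs, however they are composed
#SBATCH --time=02:00:00

srun python train.py
```

From the job's point of view, dynamically composed GPUs look the same as locally attached ones, which is what makes this kind of scheduler integration attractive.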

“For the first time in history, every device in the datacenter speaks a common language,” Puri said.

Liqid is also working with VMware to make its software available via the console VMware provides to manage virtual infrastructure. VMware recently expanded its alliance with Nvidia to make GPUs more accessible to the average IT administrator.

Organizations are looking to maximize utilization rates on HPC platforms to increase the value of investments they have made in existing platforms, Puri noted. Most recently, Liqid won a $32 million contract from the U.S. Department of Defense to maximize utilization of a pair of supercomputers located at the Supercomputing Resource Center at Aberdeen Proving Ground in Maryland, which provide access to 15 petaflops of performance. Those systems are based on Intel Xeon Platinum 9200 CPUs featuring Intel DL Boost technology and Nvidia A100 Tensor Core GPUs.

Rather than having to rely on HPC platforms built using proprietary processors found in, for example, a Cray supercomputer, Liqid is betting that more HPC workloads will wind up being deployed on lower-cost commercial processors from Intel, Arm, and Nvidia. The software Liqid provides makes it possible to manage systems based on those processors as if they were one logical entity.

It’s not clear to what degree AI workloads will be running on-premises versus on the cloud, where orchestration is generally managed by the cloud service providers. However, given the prevalence of HPC platforms that have already been paid for and deployed, it’s highly probable that many organizations will prefer to leverage what amounts to an already sunk cost. In other cases, security and compliance concerns require IT organizations to continue to invest in on-premises systems.

Regardless of approach, HPC platforms are about to become a mainstay of many IT environments as the number of AI workloads continues to increase. Longer-term, those workloads are going to migrate to the network edge, Puri said. As that trend evolves, he said, it will become crucial for IT teams to manage bare-metal infrastructure at higher levels of abstraction.

But given the cost of GPUs, most IT organizations will likely remain eager to optimize any platform that makes use of them for the foreseeable future.
