
Google releases TF-GNN for creating graph neural networks in TensorFlow


Google today released TensorFlow Graph Neural Networks (TF-GNN) in alpha, a library designed to make it easier to work with graph structured data using TensorFlow, its machine learning framework. Used in production at Google for spam and anomaly detection, traffic estimation, and YouTube content labeling, Google says that TF-GNN is designed to “encourage collaborations with researchers in industry.”

A graph represents the relations (edges) between a collection of entities (nodes or vertices), such as objects, places, or people, all of which can store data. Directionality can be ascribed to the edges to describe information flow, traffic flow, and more.

More often than not, the data in machine learning problems is structured or relational and thus can be described with a graph. Fundamental research on GNNs is decades old, but recent advances have led to great achievements in many domains, like modeling the transition of glass from a liquid to a solid and predicting pedestrian, cyclist, and driver behavior on the road.


Above: Graphs can model the relationships between many different types of data, including web pages (left), social connections (center), or molecules (right).

Image Credit: Google

Indeed, GNNs can be used to answer questions about multiple characteristics of graphs. By working at the graph level, they can try to predict aspects of the entire graph, for example identifying the presence of certain “shapes” like circles in a graph that might represent close social relationships. GNNs can also be used on node-level tasks to classify the nodes of a graph or at the edge level to discover connections between entities.


TF-GNN provides building blocks for implementing GNN models in TensorFlow. Beyond the modeling APIs, the library also delivers tooling around the task of working with graph data, including a data-handling pipeline and example models.

Also included with TF-GNN is an API to create GNN models that can be composed with other types of AI models. In addition to this, TF-GNN ships with a schema to declare the topology of a graph (and tools to validate it), helping to describe the shape of training data.
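
For a flavor of what this looks like in code, here is a minimal sketch of constructing a small heterogeneous graph with the alpha tensorflow_gnn package. The node set, edge set, and feature names are invented for illustration, and the API shown reflects the alpha release, so exact names may differ in later versions.

```python
# Minimal sketch based on the TF-GNN alpha API (tensorflow_gnn); the "user"
# node set, "follows" edge set, and "age" feature are invented placeholders.
import tensorflow as tf
import tensorflow_gnn as tfgnn

# A tiny heterogeneous graph: 3 "user" nodes and 2 "follows" edges.
graph = tfgnn.GraphTensor.from_pieces(
    node_sets={
        "user": tfgnn.NodeSet.from_fields(
            sizes=tf.constant([3]),
            features={"age": tf.constant([[24.0], [31.0], [57.0]])},
        )
    },
    edge_sets={
        "follows": tfgnn.EdgeSet.from_fields(
            sizes=tf.constant([2]),
            adjacency=tfgnn.Adjacency.from_indices(
                source=("user", tf.constant([0, 1])),
                target=("user", tf.constant([1, 2])),
            ),
        )
    },
)

print(graph.node_sets["user"]["age"])  # per-node feature tensor
```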

“Graphs are all around us, in the real world and in our engineered systems … In particular, given the myriad types of data at Google, our library was designed with heterogeneous graphs in mind,” Google’s Sibon Li, Jan Pfeifer, Bryan Perozzi, and Douglas Yarrington wrote in the blog post introducing TF-GNN.

TF-GNN adds to Google’s growing collection of TensorFlow libraries, which spans TensorFlow Privacy, TensorFlow Federated, and TensorFlow.Text. More recently, the company open-sourced TensorFlow Similarity, which trains models that search for related items — for example, finding similar-looking clothes and identifying currently playing songs.


The untapped potential of HPC + graph computing

In the past few years, AI has crossed the threshold from hype to reality. Today, with unstructured data growing by 23% annually in an average organization, the combination of knowledge graphs and high performance computing (HPC) is enabling organizations to exploit AI on massive datasets.

Full disclosure: Before I talk about how critical graph computing + HPC is going to be, I should tell you that I’m CEO of a graph computing, AI, and analytics company, so I certainly have a vested interest and perspective here. But I’ll also tell you that our company is one of many in this space — DGraph, MemGraph, TigerGraph, Neo4j, Amazon Neptune, and Microsoft’s CosmosDB, for example, all use some form of HPC + graph computing. And there are many other graph companies and open-source graph options, including OrientDB, Titan, ArangoDB, Nebula Graph, and JanusGraph. So there’s a bigger movement here, and it’s one you’ll want to know about.

Knowledge graphs organize data from seemingly disparate sources to highlight relationships between entities. While knowledge graphs themselves are not new (Facebook, Amazon, and Google have invested a lot of money over the years in knowledge graphs that can understand user intents and preferences), their coupling with HPC gives organizations the ability to understand anomalies and other patterns in data at unparalleled rates of scale and speed.

There are two main reasons for this.

First, graphs can be very large: Data sizes of 10-100TB are not uncommon. Organizations today may have graphs with billions of nodes and hundreds of billions of edges. In addition, nodes and edges can have a lot of property data associated with them. Using HPC techniques, a knowledge graph can be sharded across the machines of a large cluster and processed in parallel.
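
As a toy illustration of the sharding idea (not how any particular HPC engine does it), the following sketch hash-partitions an edge list across a handful of workers so that each one can process its slice of the graph independently; production partitioners are far more sophisticated and try to minimize edges that cross machines.

```python
# Toy illustration of sharding an edge list across a cluster by hashing the
# source node ID. NUM_WORKERS and the edge list are made up for the example.
NUM_WORKERS = 4

edges = [(0, 5), (1, 6), (5, 2), (7, 3), (2, 7)]  # (source, target) pairs

shards = {w: [] for w in range(NUM_WORKERS)}
for src, dst in edges:
    shards[hash(src) % NUM_WORKERS].append((src, dst))

# Each worker now processes only its shard in parallel, e.g. computing the
# out-degree of the nodes it owns.
for worker, shard in shards.items():
    out_degree = {}
    for src, _ in shard:
        out_degree[src] = out_degree.get(src, 0) + 1
    print(worker, out_degree)
```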

The second reason HPC techniques are essential for large-scale computing on graphs is the need for fast analytics and inference in many application domains. One of the earliest use cases I encountered was with the Defense Advanced Research Projects Agency (DARPA), which first used knowledge graphs enhanced by HPC for real-time intrusion detection in their computer networks. This application entailed constructing a particular kind of knowledge graph called an interaction graph, which was then analyzed using machine learning algorithms to identify anomalies. Given that cyberattacks can go undetected for months (hackers in the recent SolarWinds breach lurked for at least nine months), the need for suspicious patterns to be pinpointed immediately is evident.

Today, I’m seeing a number of other fast-growing use cases emerge that are highly relevant and compelling for data scientists, including the following.

Financial services — fraud, risk management and customer 360

Digital payments are gaining more and more traction — more than three-quarters of people in the US use some form of digital payments. However, the amount of fraudulent activity is growing as well. Last year the dollar amount of attempted fraud grew 35%. Many financial institutions still rely on rules-based systems, which fraudsters can bypass relatively easily. Even those institutions that do rely on AI techniques can typically analyze only the data collected in a short period of time due to the large number of transactions happening every day. Current mitigation measures therefore lack a global view of the data and fail to adequately address the growing financial fraud problem.

A high-performance graph computing platform can efficiently ingest data corresponding to billions of transactions through a cluster of machines, and then run a sophisticated pipeline of graph analytics such as centrality metrics and graph AI algorithms for tasks like clustering and node classification, often using graph neural networks (GNNs) to generate vector space representations for the entities in the graph. These enable the system to identify fraudulent behaviors and support anti-money-laundering efforts more robustly. GNN computations are very floating-point intensive and can be sped up by exploiting tensor computation accelerators.
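
As a small, single-machine stand-in for such a pipeline, the sketch below builds a toy transaction graph with networkx and ranks accounts by a centrality metric (PageRank); the accounts, edges, and flagging threshold are invented, and a production system would distribute this work across a cluster and feed GNN embeddings into downstream classifiers.

```python
# Single-machine stand-in for a graph-analytics pipeline: build a small
# transaction graph with networkx and rank accounts by PageRank centrality.
# The accounts, edges, and threshold are invented for illustration.
import networkx as nx

g = nx.DiGraph()
g.add_edges_from([
    ("acct_a", "acct_b"), ("acct_b", "acct_c"),
    ("acct_c", "acct_a"), ("acct_d", "acct_c"),
    ("acct_e", "acct_c"),
])

centrality = nx.pagerank(g)  # centrality metric over the transaction graph
suspicious = [n for n, score in centrality.items() if score > 0.25]

print(sorted(centrality.items(), key=lambda kv: -kv[1]))
print("flagged for review:", suspicious)
```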

Second, HPC and knowledge graphs coupled with graph AI are essential for conducting risk assessment and monitoring, which have become more challenging with the escalating size and complexity of interconnected global financial markets. Risk management systems built on traditional relational databases are inadequately equipped to identify hidden risks across a vast pool of transactions, accounts, and users because they often ignore relationships among entities. In contrast, a graph AI solution learns from the connectivity data and not only identifies risks more accurately but also explains why they are considered risks. It is essential that the solution leverage HPC to reveal the risks in a timely manner before they become more serious.

Finally, a financial services organization can aggregate various customer touchpoints and integrate them into a consolidated, 360-degree view of the customer journey. With millions of disparate transactions and interactions by end users across different bank branches, financial services institutions can evolve their customer engagement strategies, better identify credit risk, personalize product offerings, and implement retention strategies.

Pharmaceutical industry — accelerating drug discovery and precision medicine

Between 2009 and 2018, U.S. biopharmaceutical companies spent about $1 billion to bring a new drug to market. A significant fraction of that money is wasted exploring potential treatments in the laboratory that ultimately do not pan out. As a result, it can take 12 years or more to complete the drug discovery and development process. The COVID-19 pandemic, in particular, has thrust the importance of cost-effective and swift drug discovery into the spotlight.

A high-performance graph computing platform can enable researchers in bioinformatics and cheminformatics to store, query, mine, and develop AI models using heterogeneous data sources to reveal breakthrough insights faster. Timely and actionable insights can not only save money and resources but also save human lives.

Challenges in this data and AI-fueled drug discovery have centered on three main factors — the difficulty of ingesting and integrating complex networks of biological data, the struggle to contextualize relations within this data, and the complications in extracting insights across the sheer volume of data in a scalable way. As in the financial sector, HPC is essential to solving these problems in a reasonable time frame.

The main use cases under active investigation at all major pharmaceutical companies include drug hypothesis generation and precision medicine for cancer treatment, using heterogeneous data sources such as bioinformatics and cheminformatic knowledge graphs along with gene expression, imaging, patient clinical data, and epidemiological information to train graph AI models. While there are many algorithms to solve these problems, one popular approach is to use Graph Convolutional Networks (GCN) to embed the nodes in a high-dimensional space, and then use the geometry in that space to solve problems like link prediction and node classification.
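
To make the GCN idea concrete, here is a generic, textbook-style sketch in numpy: one propagation step with a symmetrically normalized adjacency matrix, followed by scoring a candidate link with a dot product between the resulting node embeddings. The graph, features, and weights are random placeholders rather than any company's actual model.

```python
# Toy GCN propagation step in numpy, followed by dot-product link scoring.
# All values are invented for illustration; a real model learns W by training.
import numpy as np

A = np.array([[0, 1, 0, 0],      # adjacency matrix of a 4-node graph
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.random.randn(4, 8)        # initial node features
W = np.random.randn(8, 16)       # weight matrix (random here)

A_hat = A + np.eye(4)            # add self-loops
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
H = np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W, 0)  # ReLU activation

# Link prediction: score a candidate edge by the dot product of embeddings.
score_0_3 = H[0] @ H[3]
print("score for candidate link (0, 3):", score_0_3)
```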

Another important aspect is the explainability of graph AI models. AI models cannot be treated as black boxes in the pharmaceutical industry, as actions can have dire consequences. Cutting-edge explainability methods such as GNNExplainer and Guided Gradient (GGD) methods are very compute-intensive and therefore require high-performance graph computing platforms.

The bottom line

Graph technologies are becoming more prevalent, and organizations and industries are learning how to make the most of them effectively. While there are several approaches to using knowledge graphs, pairing them with high performance computing is transforming this space and equipping data scientists with the tools to take full advantage of corporate data.

Keshav Pingali is CEO and co-founder of Katana Graph, a high-performance graph intelligence company. He holds the W.A. “Tex” Moncrief Chair of Computing at the University of Texas at Austin, is a Fellow of the ACM, IEEE, and AAAS, and is a Foreign Member of the Academia Europaea.


What are graph neural networks (GNN)?

Graphs are everywhere around us. Your social network is a graph of people and relations. So is your family. The roads you take to go from point A to point B constitute a graph. The links that connect this webpage to others form a graph. When your employer pays you, your payment goes through a graph of financial institutions.

Basically, anything that is composed of linked entities can be represented as a graph. Graphs are excellent tools to visualize relations between people, objects, and concepts. Beyond visualizing information, however, graphs can also be good sources of data to train machine learning models for complicated tasks.

Graph neural networks (GNN) are a type of machine learning algorithm that can extract important information from graphs and make useful predictions. With graphs becoming more pervasive and richer with information, and artificial neural networks becoming more popular and capable, GNNs have become a powerful tool for many important applications.

Transforming graphs for neural network processing


Every graph is composed of nodes and edges. For example, in a social network, nodes can represent users and their characteristics (e.g., name, gender, age, city), while edges can represent the relations between the users. A more complex social graph can include other types of nodes, such as cities, sports teams, news outlets, as well as edges that describe the relations between the users and those nodes.

Unfortunately, the graph structure is not well suited for machine learning. Neural networks expect to receive their data in a uniform format. Multi-layer perceptrons expect a fixed number of input features. Convolutional neural networks expect a grid that represents the different dimensions of the data they process (e.g., width, height, and color channels of images).

Graphs can come in different structures and sizes, which does not conform to the rectangular arrays that neural networks expect. Graphs also have other characteristics that make them different from the type of information that classic neural networks are designed for. For instance, graphs are “permutation invariant,” which means changing the order and position of nodes doesn’t make a difference as long as their relations remain the same. In contrast, changing the order of pixels results in a different image and will cause the neural network that processes them to behave differently.

To make graphs useful to deep learning algorithms, their data must be transformed into a format that can be processed by a neural network. The type of formatting used to represent graph data can vary depending on the type of graph and the intended application, but in general, the key is to represent the information as a series of matrices.

Above: A social graph of interconnected user profiles, and the node tables of names and basic biographical information derived from it.

For example, consider a social network graph. The nodes can be represented as a table of user characteristics. The node table, where each row contains information about one entity (e.g., user, customer, bank transaction), is the type of information that you would provide a normal neural network.

But graph neural networks can also learn from other information that the graph contains. The edges, the lines that connect the nodes, can be represented in the same way, with each row containing the IDs of the users and additional information such as date of friendship, type of relationship, etc. Finally, the general connectivity of the graph can be represented as an adjacency matrix that shows which nodes are connected to each other.

When all of this information is provided to the neural network, it can extract patterns and insights that go beyond the simple information contained in the individual components of the graph.
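
Here is a small sketch of those three pieces as arrays, using an invented three-user social graph: a node feature table, an edge table, and the adjacency matrix built from it.

```python
# A tiny social graph encoded as the three structures described above.
# Users, features, and friendships are invented for illustration.
import numpy as np

# Node table: one row per user -> [age, years_on_platform]
node_features = np.array([
    [34, 6],   # user 0
    [27, 2],   # user 1
    [41, 9],   # user 2
])

# Edge table: one row per friendship -> [user_a, user_b, years_of_friendship]
edges = np.array([
    [0, 1, 3],
    [1, 2, 1],
])

# Adjacency matrix: entry (i, j) is 1 if users i and j are connected.
adjacency = np.zeros((3, 3), dtype=int)
for a, b, _ in edges:
    adjacency[a, b] = adjacency[b, a] = 1

print(adjacency)
```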

Graph embeddings

Above: Graph data about users and their relations is transformed into numerical graph embeddings.

Graph neural networks can be created like any other neural network, using fully connected layers, convolutional layers, pooling layers, etc. The type and number of layers depend on the type and complexity of the graph data and the desired output.

The GNN receives the formatted graph data as input and produces a vector of numerical values that represent relevant information about nodes and their relations.

This vector representation is called “graph embedding.” Embeddings are often used in machine learning to transform complicated information into a structure that can be differentiated and learned. For example, natural language processing systems use word embeddings to create numerical representations of words and the relations between them.

How does the GNN create the graph embedding? When the graph data is passed to the GNN, the features of each node are combined with those of its neighboring nodes. This is called “message passing.” If the GNN is composed of more than one layer, then subsequent layers repeat the message-passing operation, gathering data from neighbors of neighbors and aggregating them with the values obtained from the previous layer. For example, in a social network, the first layer of the GNN would combine the data of the user with those of their friends, and the next layer would add data from the friends of friends and so on. Finally, the output layer of the GNN produces the embedding, which is a vector representation of the node’s data and its knowledge of other nodes in the graph.

Interestingly, this process is very similar to how convolutional neural networks extract features from pixel data. Accordingly, one very popular GNN architecture is the graph convolutional neural network (GCN), which uses convolution layers to create graph embeddings.
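
The message-passing step described above can be sketched in a few lines of numpy: each layer averages a node's features with those of its neighbors and applies a weight matrix, so stacking two layers mixes in information from friends of friends. This is a generic mean-aggregation sketch with made-up shapes and random weights, not any particular library's implementation.

```python
# Generic message-passing sketch: each layer aggregates neighbor features
# (mean over the node and its neighbors) and applies a weight matrix.
# Shapes and weights are invented; a real GNN would learn W1 and W2.
import numpy as np

adjacency = np.array([[0, 1, 1, 0],
                      [1, 0, 0, 1],
                      [1, 0, 0, 0],
                      [0, 1, 0, 0]], dtype=float)
features = np.random.randn(4, 8)             # one feature vector per node
W1, W2 = np.random.randn(8, 8), np.random.randn(8, 4)

def message_pass(h, adj, w):
    adj_self = adj + np.eye(len(adj))        # include the node's own features
    mean_neighbors = adj_self @ h / adj_self.sum(axis=1, keepdims=True)
    return np.maximum(mean_neighbors @ w, 0)  # ReLU

h1 = message_pass(features, adjacency, W1)    # mixes in direct friends
embeddings = message_pass(h1, adjacency, W2)  # mixes in friends of friends
print(embeddings.shape)                       # (4, 4) node embeddings
```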

Applications of graph neural networks


Once you have a neural network that can learn the embeddings of a graph, you can use it to accomplish different tasks.

Here are a few applications for graph neural networks:

Node classification: One of the powerful applications of GNNs is adding new information to nodes or filling gaps where information is missing. For example, say you are running a social network and you have spotted a few bot accounts. Now you want to find out if there are other bot accounts in your network. You can train a GNN to classify other users in the social network as “bot” or “not bot” based on how close their graph embeddings are to those of the known bots.

Edge prediction: Another way to put GNNs to use is to find new edges that can add value to the graph. Going back to our social network, a GNN can find users (nodes) who are close to you in embedding space but who aren’t your friends yet (i.e., there isn’t an edge connecting you to each other). These users can then be introduced to you as friend suggestions.

Clustering: GNNs can glean new structural information from graphs. For example, in a social network where everyone is in one way or another related to others (through friends, or friends of friends, etc.), the GNN can find nodes that form clusters in the embedding space. These clusters can point to groups of users who share similar interests, activities, or other inconspicuous characteristics, regardless of how close their relations are. Clustering is one of the main tools used in machine learning–based marketing.
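
Assuming you already have node embeddings from a trained GNN, the sketch below illustrates the last two ideas with scikit-learn: ranking non-friends by cosine similarity for friend suggestions, and grouping users with k-means clustering. The embeddings and friendship data are random placeholders.

```python
# Downstream uses of node embeddings: friend suggestion by cosine similarity
# and community detection by k-means clustering. The embeddings are random
# placeholders standing in for a trained GNN's output.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(6, 16))        # one embedding per user
friends_of_user0 = {1, 2}                    # existing edges for user 0

# Edge prediction: suggest the most similar users who aren't friends yet.
sims = cosine_similarity(embeddings[0:1], embeddings)[0]
candidates = [(u, s) for u, s in enumerate(sims)
              if u != 0 and u not in friends_of_user0]
suggestions = sorted(candidates, key=lambda us: -us[1])[:2]
print("friend suggestions for user 0:", suggestions)

# Clustering: group users into communities of similar embeddings.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)
print("cluster assignments:", labels)
```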

Graph neural networks are very powerful tools. They have already found powerful applications in domains such as route planning, fraud detection, network optimization, and drug research. Wherever there is a graph of related entities, GNNs can help get the most value from the existing data.

Ben Dickson is a software engineer and the founder of TechTalks. He writes about technology, business, and politics.

This story originally appeared on Bdtechtalks.com. Copyright 2021


Airbnb CTO says graph neural networks will be big in 2021



Executives have to prioritize whether to experiment with cutting-edge technologies or wait to see results from other implementations first, Airbnb chief technology officer Vanja Josifovski said in a conversation with VentureBeat founder and CEO Matt Marshall at VentureBeat’s Transform 2021 virtual conference. Most enterprises — even large ones — have constrained resources, so they have to decide which technologies to invest in and which to wait out.

Typically, the decision is to use state-of-the-art technologies in critical areas and avoid experimental or emerging technology in all the other areas, Josifovski said.

“It’s one of the hardest parts of my job because I do want to hire the best and smartest people, but then I do want to channel that ability into the areas that will provide business impact,” Josifovski said. “In some cases, [we] refrain from using state of the art until we think that we’ll get the return back.”

Josifovski and Marshall discussed some of the innovative trends in artificial intelligence (AI). “If we look at what’s happening today, there are some amazing technologies coming up,” Josifovski said, such as graph neural networks, transformer models, and language models.

Graph neural networks

Graph neural networks will be a major trend in 2021, Josifovski predicted. At its core, the deep learning paradigm was built around structured data, like images, and sequential data, like text. However, the structure that data must take for a model to work can be rigid. Graph neural networks, in contrast, allow a more flexible architecture because the data defines the architecture of the model.

“Graph neural networks is a next iteration that allows us to use a lot more data within the deep learning framework in a much more natural way,” Josifovski said. “I feel that they will open a whole new area, where you’re going to be able to apply the deep learning paradigm a lot easier on a whole different set of data.”

Pinterest has used the model to build a recommendation feature, and Uber built a fraud detection model, for example.

Language models

While large language models are “an amazing technological achievement,” it may be too soon to work with them, Josifovski said. Being able to scale these models is a relatively new concept, and the challenge is finding the data to train them. Using models in production, he added, requires predictability. There have been good examples of using the models to generate text and webpages; these are a good fit because they aren’t “mission critical,” Josifovski said. In contrast, this type of work won’t initially fit well in machines like self-driving cars.

While language models don’t currently work with chatbots, Josifovski believes they will in the future.

Center for innovation?

In the early years, academia was the center of innovation and research for AI, with large companies developing some proprietary technologies, Josifovski said. Over time, waves of innovation in AI came from bigger companies, like Google, Amazon, Microsoft, and Facebook. As many of the technologies become commoditized, Josifovski predicts another shift, this time to smaller, independent companies. In areas like storage and cloud infrastructure management, well-resourced companies will provide the infrastructure to allow smaller players to develop AI.

“The center of gravity will slightly shift from the larger companies into smaller independent companies,” Josifovski said. “We will see a full ecosystem of companies that [have] been developed now and will shape the future.”


Lucata raises $11.9M to accelerate graph analytics with specialized hardware



Graph analytics startup Lucata today announced that it raised $11.9 million in series B funding, bringing its total raised to nearly $30 million. Notre Dame, Middleburg Capital Development, Blu Ventures, Hunt Holdings, Maulick Capital, Varian Capital, Samsung Ventures, and Irish Angels participated in the round, which CEO Michael Maulick says will be put toward commercializing the company’s computing architecture for graph analytics and AI use cases.

Graph analytics is a set of techniques that allows companies to drill down into the interrelationships between organizations, people, and things. The applications span cybersecurity, logistics, neural networks, natural language processing, and ecommerce, but one increasingly popular use case is fraud detection. For large credit card issuers, financial fraud can cost tens of billions of dollars a year. If these companies could run real-time graph analytics on large graph databases, some experts assert, they could detect fraud hours sooner than what’s possible today.

New York-based Lucata offers a hardware platform — Pathfinder — that ostensibly enables organizations to better support large graph analytics workloads. The company leverages “migrating threads” to conduct high-performance, “multi-hop” analytics, including on databases with over 1 trillion vertices. Organizations can use existing graph database software or custom solutions to analyze deep connections on expanded graphs.

“Lucata was founded in 2008 as Emu Technology by Peter Kogge, Jay Brockman, and Ed Upchurch. The company was [started] to commercialize migrating thread technology, which was developed and patented by the founders to address the scale and performance limitations of traditional computing architectures for big data,” Maulick told VentureBeat via email. “Migrating thread technology enables the creation of shared RAM and CPU pools that allow users to process monolithic big data datasets in real-time with no data pruning or database sharding.”

Graph technology

According to Maulick, current machine learning and AI model training on large, sparse datasets often leverage approaches that can skew the results. One method is to reduce the size of the dataset by pruning — i.e., deleting — significant amounts of data during loading that are thought to be unimportant. The other technique is to “shard” the loaded data into smaller subsets of data, which the model training process sequentially processes.

Bias or skew can creep into the models if important data is deleted during pruning. But with Lucata’s technology, Maulick argues, users can avoid this by loading entire datasets into a single RAM image, leading to improved accuracy during training.

“Companies that would potentially benefit from using [our] Lucata computing architecture to improve the performance of their software include Redis Labs, TigerGraph, Neo4j, and other graph database vendors. In addition, software vendors and cloud providers that offer solutions which leverage common machine learning and AI processing frameworks such as PyTorch, TensorFlow and Apache Spark would potentially benefit from using the Lucata Pathfinder platform,” Maulick said.

Because Lucata’s hardware relies on DRAM chips that are in short supply, owing to the worldwide semiconductor shortage, the company anticipates its production schedule will be impacted going forward. But even with this being the case, year-over-year revenue from 2020 to 2021 is internally projected to grow 100%.

Lucata has a workforce of 20 people across its offices in Palo Alto, New York City, and South Bend, Indiana, which it expects will expand to 34 by the end of 2021. “The pandemic impacted the work of our employees in our physical office in New York City but has not had a significant impact on those of our employees who work remotely,” Maulick said.


Graph database platform Neo4j raises $325M to inform decision-making



Graph platform Neo4j today announced that it raised $325 million at an over $2 billion valuation in a series F round led by Eurazeo, with additional investment from GV. The capital, which brings the company’s total raised to date to over $500 million, will be put toward expanding Neo4j’s platform, workforce, and customer base, the company says.

Markets and Markets anticipates the graph database market will reach $2.4 billion by 2023 from $821.8 million in 2018. And analysts at Gartner expect that enterprise graph processing and graph databases will grow 100% annually through 2022, facilitating decision-making in 30% of organizations by 2023. Graph databases and graph-oriented databases leverage graph structures for semantic queries, with nodes, edges, and properties that store and represent data. They’re a type of non-relational technology that depicts the relationships connecting various entities — like two people in a social network, for instance — and that can analyze interconnected data.

Neo4j offers an open source NoSQL graph database written in Java and Scala with a declarative query language called Cypher. It supports a number of applications, including identity and access management, knowledge graph augmentation, and network and database infrastructure monitoring, as well as risk reporting compliance and social media graphs.
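
For a sense of what querying Neo4j looks like, here is a minimal sketch using the official Python driver to run a Cypher MATCH query; the connection URI, credentials, and the Person/FRIEND data model are placeholders for illustration.

```python
# Minimal Cypher query through the official Neo4j Python driver.
# The URI, credentials, and the (:Person)-[:FRIEND]->(:Person) data model
# are placeholders for illustration.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

query = """
MATCH (p:Person {name: $name})-[:FRIEND]->(friend:Person)
RETURN friend.name AS name
"""

with driver.session() as session:
    result = session.run(query, name="Alice")
    for record in result:
        print(record["name"])

driver.close()
```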

Neo4j’s founders encountered performance problems with relational database management systems, which inspired their decision to build the first Neo4j prototype. Emil Eifrem, the founder and CEO of the company, sketched what today is known as the property graph model on an airplane napkin during a flight to Mumbai in 2000. A property graph is a type of graph where relationships are not only connections but carry a name and some properties.

“Neo4j has been downloaded more than 120 million times by over 200 million developers, more than 50,000 of which are trained. Our main competition is legacy SQL systems that are bogged down by low-performance queries,” Eifrem told VentureBeat via email. “We see competition as a good thing, as smaller companies tend to stake out market niches that might go unidentified by the larger leaders. Competition fuels innovation, as it motivates every vendor to be better, and that’s good news for customers.”

On the backend

Neo4j features constant time traversals that can scale up to billions of nodes, a flexible property graph schema that adapts over time, and drivers for popular programming languages like JavaScript, .NET, Go, and Python. It’s compliant with ACID (atomicity, consistency, isolation, and durability) requirements, meaning it guarantees database transactions even in the event of power failures and errors. And on the AI front, it supports high-performance graph queries on large datasets.


Above: An example of a graph database created with the Neo4j platform.

Image Credit: Neo4j

Development on Neo4j began in 2003, and it’s been publicly available since 2007 in two editions: a free Community edition and an Enterprise edition. The Enterprise edition adds hot backups, parallel graph algorithms, LDAP and active directory integration, multi-clustering, larger graphs, and more.

“Graph technologies are a purpose-built method for adding and leveraging context from data and are increasingly integrated with machine learning and AI solutions in order to add contextual information … Graphs also serve as a source of truth for AI-related data and components for greater reliability. This is especially important for AI bias. Providing these context and connections to AI systems to have more situationally appropriate outcomes mirrors the decisions in the same way humans do,” Eifrem said. “Graphs can also greatly increase the accuracy of machine learning models with the data you already have. Graphs increase the dimensionality of your data by adding relationships which we know are highly predictive of behavior.”

Graph database growth

Gartner predicts that graph processing and graph databases “will grow at 100% annually over the next few years to accelerate data preparation and enable more complex and adaptive data [analytics].” In a Neo Technology survey conducted by Evans Data Corporation, 49% of companies said that they anticipate taking on real-time recommendations through graph databases in the next two years. Fifty-eight percent said that they’re already using graph databases at scale.

Data analytics is the science of analyzing raw data to extract meaningful insights. A range of organizations can use data to boost their marketing strategies, increase their bottom line, personalize their content, and better understand their customers. Businesses that use big data increase their profits by an average of 8%, according to a survey conducted by BARC.

Startups like TigerGraph, MongoDB, Cambridge Semantics, DataStax, and others compete with Neo4j in a graph database market expected to be worth $2.4 billion by 2023, in addition to incumbents like Microsoft and Oracle. Even Amazon threw its hat in the graph database ring in November 2017 with the launch of Neptune, a fully managed graph database powered by its Amazon Web Services division.

But Neo4j — which has over 500 employees — has achieved a few pretty impressive milestones, including more than 3 million downloads as of November 2018 and over 300 enterprise subscription users. The company counts among its current and previous customers Lyft, Walmart, eBay, Adobe, Orange, Monsanto, IBM, Microsoft, Cisco, Medium, Airbnb, NASA, and the U.S. Army.

Neo4j customer Meredith Corporation says it scaled its Neo4j graph to analyze 30 billion nodes of digital traffic and has tested capacity to accommodate 100 billion in the future. Recently, Neo4j itself demonstrated real-time query performance against a graph with over 200 billion nodes and more than a trillion relationships running on over a thousand machines.

Last year, Neo4j introduced Neo4j for Graph Data Science, which the company claims is the first data science environment built to harness the predictive power of relationships for scenarios like fraud detection, customer and patient journey tracking, and drug discovery. It arrived alongside Neo4j Aura Professional on Google Cloud Platform, a fully integrated graph database service on the Google Cloud Marketplace designed for small and medium-size businesses. Neo4j also recently debuted the Neo4j BI Connector, which presents live graph datasets for analysis within popular business intelligence technologies including Tableau and Looker. And the company rolled out the Neo4j Connector for Apache Spark, an integration tool to move data bi-directionally between the Neo4j Graph Platform and Apache Spark.

In addition to Eurazeo and GV, Creandum also participated in San Mateo, California-based Neo4j’s latest fundraising round, as did Greenbridge Partners, DTCO, Lightrock, and One Peak Partners. Neo4j previously closed a $40 million venture round led by One Peak.


Katana Graph raises $28.5 million to handle unstructured data at scale

Katana Graph, a startup that helps businesses analyze and manage unstructured data at scale, today announced a $28.5 million series A round led by Intel Capital.

Katana Graph was founded by University of Texas at Austin computer science professor Keshav Pingali and assistant professor Chris Rossbach. The company helps businesses ingest large amounts of data into memory, CEO Pingali told VentureBeat in a phone interview. The UT-Austin research group started working with graph processing and unstructured data two years ago and began by advising DARPA on projects that deal with data at scale. Katana Graph works in Python and compiles data using C++.

As with algorithm auditing, AIOps, and model monitoring and management services, startups have emerged to help businesses analyze and label data, which may be why Labelbox raised $40 million and Databricks raised $1 billion.

Katana Graph is currently working with customers in health, pharmaceuticals, and security.

“One of the customers we’re engaged with has a graph with 4.3 trillion pages, and that is an enormous amount of data. So ingesting that kind of data into the memory of a cluster is a big problem, and what we were able to do with the ingest time is reduce the ingest time from a couple of days to about 20 minutes,” Pingali said.

Today’s round included participation from WRVI Capital, Nepenthe Capital, Dell Technologies Capital, and Redline Capital.

Katana Graph was founded in March 2020 and is based in Austin, Texas. The company has 25 employees and is using the funding to expand its marketing, sales, and engineering teams.
