The untapped potential of HPC + graph computing


Over the past few years, AI has crossed the threshold from hype to reality. Today, with unstructured data growing 23% per year in the average organization, the combination of knowledge graphs and high performance computing (HPC) enables organizations to leverage AI on massive data sets.

Full disclosure: Before I talk about the importance of graph computing + HPC, I have to tell you that I am the CEO of a graph computing, AI, and analytics company, so I definitely have an interest and a perspective here. But I’ll also tell you that our company is one of many in this space – Dgraph, Memgraph, TigerGraph, Neo4j, Amazon Neptune, and Microsoft’s Cosmos DB, for example, all use some form of HPC + graph computing. And there are plenty of other graph companies and open source graph options out there, including OrientDB, Titan, ArangoDB, Nebula Graph, and JanusGraph. So there is a bigger movement here, and it is one you will want to know about.

Knowledge graphs organize data from seemingly disparate sources to highlight relationships between entities. While knowledge graphs themselves are not new (Facebook, Amazon, and Google have invested heavily over the years in knowledge graphs that can understand user intentions and preferences), their coupling with HPC lets organizations understand anomalies and other patterns in data at unmatched speed and scale.

There are two main reasons for this.

First, graphs can be very large: data sizes of 10 to 100 TB are not uncommon. Organizations today can have graphs with billions of nodes and hundreds of billions of edges. In addition, nodes and edges can carry a lot of property data. Using HPC techniques, a knowledge graph can be partitioned across the machines of a large cluster and processed in parallel.
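To make the partitioning idea concrete, here is a minimal, illustrative sketch (not any vendor’s implementation) of hash-partitioning a graph’s edge list so that each machine in a cluster can work on its own shard independently:

```python
# Conceptual sketch: hash-partition a graph's edge list across workers,
# so each machine processes only its shard in parallel.

def partition_edges(edges, num_workers):
    """Assign each edge to a worker by hashing its source node ID."""
    shards = [[] for _ in range(num_workers)]
    for src, dst in edges:
        shards[hash(src) % num_workers].append((src, dst))
    return shards

edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
shards = partition_edges(edges, num_workers=2)
# Every edge lands in exactly one shard; a worker can then compute,
# say, out-degrees for its own shard without touching the others.
assert sum(len(s) for s in shards) == len(edges)
```

Real systems use more sophisticated partitioners that minimize the number of edges crossing machine boundaries, since cross-partition edges drive network traffic.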

The second reason HPC techniques are essential for large-scale computation on graphs is the need for rapid analysis and inference in many application areas. One of the first use cases I encountered was with the Defense Advanced Research Projects Agency (DARPA), which used HPC-enhanced knowledge graphs for real-time intrusion detection in its computer networks. This application involved building a special kind of knowledge graph called an interaction graph, which was then analyzed with machine learning algorithms to identify anomalies. Since cyberattacks can go undetected for months (the hackers behind the recent SolarWinds breach lurked for at least nine months), the need to identify suspicious patterns immediately is obvious.
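As a toy illustration of what an interaction graph looks like: nodes are hosts, edges are observed connections. The host names and the simple fan-out heuristic below are made up for the example; a real system would feed the graph to ML models instead.

```python
from collections import defaultdict

def build_interaction_graph(connection_log):
    """Map each source host to the set of hosts it contacted."""
    graph = defaultdict(set)
    for src, dst in connection_log:
        graph[src].add(dst)
    return graph

def flag_high_fanout(graph, threshold):
    """Hosts contacting unusually many distinct peers (e.g., a scanner)."""
    return {host for host, peers in graph.items() if len(peers) > threshold}

log = [("h1", "h2"), ("h1", "h3"), ("h1", "h4"), ("h1", "h5"), ("h2", "h3")]
g = build_interaction_graph(log)
assert flag_high_fanout(g, threshold=3) == {"h1"}
```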

Today, I see a number of other fast-growing use cases emerging that are very relevant and compelling for data scientists, including the following.

Financial Services – Fraud, Risk Management and Customer 360

Digital payments are gaining more and more ground – more than three-quarters of people in the United States use some form of digital payment. However, the number of fraudulent activities is also increasing. Last year, the dollar amount of attempted fraud increased by 35%. Many financial institutions still rely on rules-based systems, which fraudsters can bypass quite easily. Even institutions that rely on AI techniques can usually only analyze data collected over a short period of time due to the large number of transactions that occur every day. Current mitigation measures therefore lack a holistic view of the data and fail to adequately address the growing problem of financial fraud.

A high-performance graph computing platform can efficiently ingest data corresponding to billions of transactions across a cluster of machines, then run a sophisticated pipeline of graph analytics such as centrality metrics and graph AI algorithms for tasks such as node clustering and classification, often using graph neural networks (GNNs) to generate vector space representations of graph entities. These enable the system to identify fraudulent behavior and combat money laundering more effectively. GNN computations are floating-point intensive and can be accelerated with tensor compute accelerators.
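As a minimal illustration of the kind of centrality metric such a pipeline computes (the account names are hypothetical, and real platforms run this distributed over billions of edges rather than in plain Python):

```python
from collections import Counter

def degree_centrality(edges):
    """Fraction of other nodes each account transacts with directly."""
    degree = Counter()
    nodes = set()
    for a, b in edges:
        nodes.update((a, b))
        degree[a] += 1
        degree[b] += 1
    n = len(nodes)
    return {v: degree[v] / (n - 1) for v in nodes}

transactions = [("acct1", "acct2"), ("acct1", "acct3"),
                ("acct1", "acct4"), ("acct2", "acct3")]
scores = degree_centrality(transactions)
# acct1 touches all 3 other accounts, so its centrality is 1.0 --
# a hub account worth a closer look in a fraud investigation.
assert scores["acct1"] == 1.0
```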

Second, HPC and knowledge graphs coupled with graph AI are essential for risk assessment and monitoring, which has become more difficult with the growing size and complexity of interconnected global financial markets. Risk management systems built on traditional relational databases are ill-equipped to identify risks hidden in a large pool of transactions, accounts, and users, because they often ignore the relationships between entities. In contrast, a graph AI solution learns from connectivity data and not only identifies risks more accurately but also explains why they are considered risks. It is critical that the solution leverage HPC to reveal risks in a timely manner, before they escalate.

Finally, a financial services organization can aggregate various customer touchpoints and integrate them into a consolidated, 360-degree view of the customer journey. With millions of disparate transactions and interactions by end users – and across different bank branches – financial services institutions can evolve their customer engagement strategies, better identify credit risk, personalize product offerings, and implement loyalty strategies.

Pharmaceutical industry – accelerating drug discovery and precision medicine

Between 2009 and 2018, US biopharmaceutical companies spent around $1 billion per new drug brought to market. A significant fraction of that money is wasted exploring potential treatments in the lab that ultimately don’t work. As a result, the drug discovery and development process can take 12 years or more. The COVID-19 pandemic, in particular, has highlighted the importance of cost-effective and rapid drug discovery.

A high-performance graph computing platform can enable bioinformatics and chemistry researchers to store, query, mine, and develop AI models using heterogeneous data sources to reveal breakthrough insights more quickly. Timely, actionable insights can not only save money and resources but also save human lives.

The challenges of this data- and AI-powered approach to drug discovery center on three main factors: the difficulty of ingesting and integrating complex networks of biological data, the struggle to contextualize relationships within this data, and the complications of extracting insights from the sheer volume of data in a scalable way. As in the financial industry, HPC is essential for resolving these issues within a reasonable timeframe.

Top use cases under active investigation at all major pharmaceutical companies include drug hypothesis generation and precision medicine for cancer treatment, using heterogeneous data sources such as bioinformatics and cheminformatics knowledge graphs along with gene expression, imaging, patient clinical data, and epidemiological information to train graph AI models. While there are many algorithms for solving these problems, a popular approach is to use graph convolutional networks (GCNs) to embed the nodes in a high-dimensional space, then use the geometry of that space to solve problems such as link prediction and node classification.
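For the curious, the propagation rule of a single GCN layer – H' = ReLU(Â H W), where Â is the symmetrically normalized adjacency matrix with self-loops – can be sketched in a few lines of NumPy. The weights here are random placeholders; real models learn them from data.

```python
import numpy as np

def gcn_layer(adj, features, weights):
    """One GCN propagation step: H' = ReLU(A_hat @ H @ W)."""
    a_tilde = adj + np.eye(adj.shape[0])            # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_tilde.sum(axis=1)))
    a_hat = d_inv_sqrt @ a_tilde @ d_inv_sqrt       # symmetric normalization
    return np.maximum(a_hat @ features @ weights, 0.0)  # ReLU

adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)   # tiny 3-node path graph
features = np.eye(3)                        # one-hot node features
rng = np.random.default_rng(0)
weights = rng.standard_normal((3, 2))       # embed nodes into 2 dimensions
embeddings = gcn_layer(adj, features, weights)
assert embeddings.shape == (3, 2)           # one embedding per node
```

Stacking several such layers lets each node’s embedding reflect its multi-hop neighborhood, which is what makes the resulting geometry useful for link prediction and node classification.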

Another important aspect is the explainability of graph AI models. AI models cannot be treated as black boxes in the pharmaceutical industry, as actions can have dire consequences. State-of-the-art explainability methods such as GNNExplainer and Guided Gradient (GGD) are computationally intensive and therefore require high-performance graph computing platforms.

The bottom line

Graph technologies are becoming more prevalent, and organizations and industries are learning how to make the most of them. While there are several approaches to working with knowledge graphs, pairing them with high performance computing transforms the space and gives data scientists the tools to take full advantage of business data.

Keshav Pingali is CEO and co-founder of Katana Graph, a high-performance graph intelligence company. He holds the W.A. “Tex” Moncrief Chair in Computer Science at the University of Texas at Austin, is a Fellow of the ACM, IEEE, and AAAS, and is a foreign member of Academia Europaea.


VentureBeat’s mission is to be a digital town square for technical decision-makers to learn about transformative technology and transact. Our site provides essential information on data technologies and strategies to guide you as you lead your organizations.

