Graph Acceleration with Multi-Node Multi-GPU (MNMG): TigerGraph (CPU) vs. cuGraph (GPU)

Introduction

This report analyses the performance of several graph algorithms executed on TigerGraph (CPU) and cuGraph (GPU). The goal is to measure the efficiency and speedup gained from GPU acceleration with cuGraph compared with traditional CPU-based execution in TigerGraph. The benchmarked algorithms are PageRank, Jaccard and Louvain. The analysis focuses on execution times, speedup and potential discrepancies between the two platforms.

Experimental Setup

Cluster Resources

Challenges Faced

During the performance evaluation, several challenges were encountered that required addressing:

  1. Issues during TigerGraph Installation: Installing TigerGraph on the cluster ran into several obstacles, including improper core utilisation, trials of multiple TigerGraph versions and problems with the seg_size parameter. We ultimately settled on TigerGraph 3.9.x with seg_size set to 10.
  2. Configuration of Dask cuGraph: Configuring Dask cuGraph for efficient GPU utilisation proved complex. We experimented with several parameters, such as GPUS_PER_NODE, WORKER_RMM_POOL_SIZE and DASK_CUDA_INTERFACE, to find the optimal settings.
  3. Cross-Party Communication for Core Utilisation: TigerGraph was not using the available CPU cores effectively, which degraded performance. We worked with the other parties involved to diagnose and resolve the low core utilisation.
  4. Multi-Node GPU Server Availability: To scale GPU-accelerated runs further, we contacted server providers, specifically Dell, about the availability of multi-node GPU servers that could provide more GPU resources for cuGraph.
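
As a rough illustration of how the three parameters named in item 2 might feed into a worker launch, the sketch below builds a `dask-cuda-worker` command line from environment variables. It rests on several assumptions. GPUS_PER_NODE is taken to select GPUs 0..N-1 through CUDA_VISIBLE_DEVICES. WORKER_RMM_POOL_SIZE is taken to map to `--rmm-pool-size`, and DASK_CUDA_INTERFACE to `--interface`. The function name, the default of one GPU and the scheduler-file path are all hypothetical and do not come from the scripts used in this evaluation.

```python
import os
import shlex


def build_dask_cuda_worker_cmd(env, scheduler_file="/path/to/scheduler.json"):
    """Return (extra_env, argv) for launching dask-cuda-worker on one node.

    Hypothetical mapping (an assumption, not the report's actual scripts):
      GPUS_PER_NODE        -> CUDA_VISIBLE_DEVICES=0,...,N-1
      WORKER_RMM_POOL_SIZE -> --rmm-pool-size
      DASK_CUDA_INTERFACE  -> --interface
    """
    gpus = int(env.get("GPUS_PER_NODE", "1"))  # hypothetical default: 1 GPU
    if gpus < 1:
        raise ValueError("GPUS_PER_NODE must be at least 1")
    # dask-cuda starts one worker process per visible GPU.
    extra_env = {"CUDA_VISIBLE_DEVICES": ",".join(str(i) for i in range(gpus))}

    argv = ["dask-cuda-worker", "--scheduler-file", scheduler_file]
    if env.get("WORKER_RMM_POOL_SIZE"):
        # Pre-allocate an RMM memory pool on each GPU (e.g. "12GB").
        argv += ["--rmm-pool-size", env["WORKER_RMM_POOL_SIZE"]]
    if env.get("DASK_CUDA_INTERFACE"):
        # Network interface used for worker communication (e.g. "ib0").
        argv += ["--interface", env["DASK_CUDA_INTERFACE"]]
    return extra_env, argv


if __name__ == "__main__":
    extra_env, argv = build_dask_cuda_worker_cmd(os.environ)
    print(" ".join(f"{k}={v}" for k, v in extra_env.items()), shlex.join(argv))
```

Take the hypothetical values GPUS_PER_NODE=4, WORKER_RMM_POOL_SIZE=12GB and DASK_CUDA_INTERFACE=ib0. With these, the sketch exposes GPUs 0 to 3 and appends `--rmm-pool-size 12GB --interface ib0`. Keeping the mapping in one function makes it easy to sweep settings, which is what the tuning described above required.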

Algorithm Performance

PageRank Algorithm

The PageRank algorithm measures the importance of each node in a graph based on the links that point to it. The following table presents the execution times and speedup achieved when running PageRank on both platforms:

GPU execution times are significantly lower than CPU execution times, which gives a notable speedup for every graph size tested.
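
Speedup here is the ratio of CPU execution time to GPU execution time. A minimal sketch of that calculation follows. It also includes a geometric-mean summary across graph sizes, which is a common way to aggregate ratios. The report does not say how, or whether, speedups were aggregated, so treat that part as our own choice. The timings in the example are hypothetical and are not results from this evaluation.

```python
from statistics import geometric_mean


def speedup(cpu_seconds, gpu_seconds):
    """GPU speedup over CPU: values above 1 mean the GPU run was faster."""
    if cpu_seconds <= 0 or gpu_seconds <= 0:
        raise ValueError("execution times must be positive")
    return cpu_seconds / gpu_seconds


def summarise(timings):
    """Per-graph speedups plus their geometric mean.

    timings maps a graph name to a (cpu_seconds, gpu_seconds) pair.
    """
    per_graph = {name: speedup(cpu, gpu) for name, (cpu, gpu) in timings.items()}
    return per_graph, geometric_mean(list(per_graph.values()))


if __name__ == "__main__":
    # Hypothetical timings for illustration only, not measured results.
    demo = {"graph-a": (120.0, 4.0), "graph-b": (300.0, 2.5)}
    per_graph, gmean = summarise(demo)
    print(per_graph, round(gmean, 2))
```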

Note: We could not run PageRank on graphs larger than ldbc-26 because of out-of-memory (OOM) errors.
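
To make concrete what both platforms compute in this benchmark, PageRank can be illustrated with a minimal pure-Python power iteration. This is a textbook sketch on a toy edge list, not TigerGraph's or cuGraph's implementation. The damping factor of 0.85 is the conventional choice. The tolerance, the iteration cap and the handling of dangling nodes (uniform redistribution) are assumptions made for the example.

```python
def pagerank(edges, num_nodes, damping=0.85, tol=1e-8, max_iter=200):
    """Power-iteration PageRank over a directed edge list of (src, dst) pairs.

    Rank held by dangling nodes (no out-edges) is redistributed uniformly,
    so the ranks always sum to 1.
    """
    out_deg = [0] * num_nodes
    for src, _ in edges:
        out_deg[src] += 1

    rank = [1.0 / num_nodes] * num_nodes
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in range(num_nodes) if out_deg[v] == 0)
        # Teleport term plus the uniformly spread dangling mass.
        base = (1.0 - damping + damping * dangling) / num_nodes
        new_rank = [base] * num_nodes
        for src, dst in edges:
            # Each node splits its damped rank evenly across its out-edges.
            new_rank[dst] += damping * rank[src] / out_deg[src]
        delta = sum(abs(a - b) for a, b in zip(new_rank, rank))  # L1 change
        rank = new_rank
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    # Toy graph: 0 -> 1 -> 2 -> 0, plus 3 -> 2 (node 3 has no in-links).
    ranks = pagerank([(0, 1), (1, 2), (2, 0), (3, 2)], num_nodes=4)
    print([round(r, 4) for r in ranks])
```

In the toy graph, node 2 ends up with the highest rank because it receives links from both 1 and 3. Node 3 keeps only the teleport share, (1 - 0.85) / 4.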

Louvain Algorithm

The Louvain algorithm detects communities within a graph by greedily optimising modularity. The following table presents the execution times and speedup achieved when running Louvain on both platforms:

Again, the GPU execution time is significantly lower, resulting in substantial speedup across all graph sizes.

Note: We could not run Louvain on graphs larger than ldbc-26 because of out-of-memory (OOM) errors.
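
Louvain repeatedly moves nodes between communities to increase modularity and then collapses each community into a super-node. The sketch below computes only that objective, Newman modularity, for an undirected, unweighted graph. It is not an implementation of Louvain itself, and the toy graph is illustrative.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of  L_c / m - (d_c / (2m))^2, where m is the
    number of edges, L_c the edges inside c, and d_c the total degree of c.
    edges: (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to a community id.
    """
    m = len(edges)
    inside = defaultdict(int)      # L_c: edges with both endpoints in c
    degree_sum = defaultdict(int)  # d_c: sum of node degrees in c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            inside[cu] += 1
    return sum(inside[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


if __name__ == "__main__":
    # Two triangles joined by a single bridge edge (2, 3).
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    split = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
    merged = {n: 0 for n in range(6)}
    print(modularity(edges, split), modularity(edges, merged))
```

Splitting the two triangles gives Q = 5/14, about 0.357. Putting every node in one community gives Q = 0. Louvain would therefore prefer the split.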

Jaccard Similarity Algorithm

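The Jaccard similarity coefficient measures the overlap between the neighbourhoods of two vertices. The following table presents the execution times and speedup achieved when running Jaccard on both platforms: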
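
For a vertex pair, the coefficient is the size of the intersection of their neighbour sets divided by the size of the union. Below is a minimal pure-Python sketch on a hypothetical toy graph. It does not reflect how cuGraph or TigerGraph compute the coefficient at scale, for example which vertex pairs they score. Returning 0.0 when both neighbourhoods are empty is our own convention.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets from (u, v) edge pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def jaccard(adj, u, v):
    """|N(u) & N(v)| / |N(u) | N(v)|, or 0.0 if both neighbourhoods are empty."""
    nu, nv = adj.get(u, set()), adj.get(v, set())
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


if __name__ == "__main__":
    adj = build_adjacency([(0, 1), (0, 2), (0, 3), (4, 1), (4, 2)])
    # N(0) = {1, 2, 3}, N(4) = {1, 2}: 2 shared out of 3 distinct neighbours.
    print(jaccard(adj, 0, 4))
```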

Existing Issues

Conclusion

The experimental results show that GPU-accelerated cuGraph substantially outperforms CPU-based TigerGraph for both PageRank and Louvain on the graph sizes tested. Execution times on the GPU are drastically lower, which yields significant speedups. This indicates that GPU acceleration can improve the efficiency and performance of graph algorithms in a range of applications. However, algorithm-specific optimisations and platform compatibility may affect the speedup achieved in other scenarios.
