Network-aware optimization of communications for parallel matrix multiplication on hierarchical HPC platforms

Research output: Contribution to journal › Article › peer-review

Abstract

Communications on hierarchical heterogeneous high-performance computing platforms can be optimized based on topology and performance information. For MPI, a major programming tool for such platforms, a number of topology-aware and performance-aware implementations of collective operations have been proposed to schedule messages optimally. This approach improves application performance and does not require modifying the application source code. However, it is applicable to collective operations only and does not affect the parts of the application that are based on point-to-point exchanges. In this paper, we address the problem of efficient execution of data-parallel applications on interconnected clusters and present optimizations that improve data partitioning by taking into account the entire communication flow of the application. This approach is also non-intrusive to the source code, but it is application-specific. For illustration, we use parallel matrix multiplication, where the matrices are partitioned into irregular two-dimensional rectangles that are assigned to different processors and arranged in columns, and the processors communicate over this partition vertically and horizontally. By rearranging the rectangles, we can minimize communications between different levels of the network hierarchy. Finding the optimal arrangement is NP-complete; therefore, we propose two heuristic approaches based on evaluating the communication flow on the given network topology. We demonstrate the correctness and efficiency of the proposed approaches with experimental results on multicore nodes and interconnected heterogeneous clusters.
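The rearrangement idea can be illustrated with a toy model. The sketch below is not one of the paper's two heuristics, and its cost function is an assumption made for illustration: it stacks rectangles top to bottom in each column, treats the shared row range of two rectangles in different columns as a proxy for their horizontal traffic, and sums that overlap over pairs of processors in different clusters. The names `row_intervals`, `cross_cluster_overlap`, `best_arrangement`, and the sample data are hypothetical.

```python
from itertools import permutations, product


def row_intervals(column):
    """Stack (proc, height) rectangles top to bottom; return (proc, top, bottom)."""
    placed, top = [], 0.0
    for proc, height in column:
        placed.append((proc, top, top + height))
        top += height
    return placed


def cross_cluster_overlap(columns, cluster_of):
    """Toy cost (assumption): total row overlap between rectangles in
    different columns whose processors belong to different clusters."""
    placed = [row_intervals(col) for col in columns]
    cost = 0.0
    for a in range(len(placed)):
        for b in range(a + 1, len(placed)):
            for p, top_p, bot_p in placed[a]:
                for q, top_q, bot_q in placed[b]:
                    overlap = min(bot_p, bot_q) - max(top_p, top_q)
                    if overlap > 0 and cluster_of[p] != cluster_of[q]:
                        cost += overlap
    return cost


def best_arrangement(columns, cluster_of):
    """Exhaustively try every ordering of rectangles within each column.
    The search space grows factorially, so this only works for tiny inputs."""
    candidates = product(*(permutations(col) for col in columns))
    return min(candidates, key=lambda cols: cross_cluster_overlap(cols, cluster_of))


# Hypothetical example: two columns, four processors, two clusters "A" and "B".
columns = [[(0, 0.5), (1, 0.5)], [(2, 0.5), (3, 0.5)]]
cluster_of = {0: "A", 1: "B", 2: "B", 3: "A"}

before = cross_cluster_overlap(columns, cluster_of)
best = best_arrangement(columns, cluster_of)
after = cross_cluster_overlap(best, cluster_of)
print(before, after)
```

In this example, swapping the two rectangles in the second column places processors of the same cluster in the same rows, so the toy cost drops to zero. The exhaustive search is only practical for a handful of rectangles, which is consistent with the abstract's point that heuristics are needed for realistic sizes.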
Original language: English
Pages (from-to): 802-821
Number of pages: 20
Journal: Concurrency Computation
Volume: 28
Issue number: 3
Publication status: Published - 2016

Keywords

  • communication performance models
  • data partitioning
  • heterogeneous computing
  • matrix multiplication
  • performance-aware optimization
  • topology-aware optimization
