Abstract
Optimal partitioning of a square computational domain over several heterogeneous processors, balancing the load of the processors and minimizing the inter-processor communication cost, is crucial for data-parallel dense linear algebra and other applications with a similar communication pattern on modern hybrid servers. Although a solution has been found for two processors, the cases of three or more processors remain open. The state-of-the-art solution for three processors uses an approximate communication cost function that fails to accurately account for the total amount of data moved between processors, thus leaving the question of its global optimality unanswered. In this work, we formulate and solve the mathematical problem of optimally partitioning a real-valued square over three heterogeneous processors using a new cost function that accurately accounts for the total amount of data communicated between processors. We also develop an original method for accurate experimental evaluation of the communication time of data movement between memories of the compute devices in the hybrid platform during the execution of data-parallel applications. We successfully use this method in the experimental validation of our mathematical results. Finally, we propose a communication energy model predicting the dynamic energy consumption of data movement between processors and experimentally validate its accuracy. This model predicts, and the experiments confirm, that the performance-optimal partition is not necessarily energy-optimal.
Original language | English |
---|---|
Article number | 9328411 |
Pages (from-to) | 17229-17244 |
Number of pages | 16 |
Journal | IEEE Access |
Volume | 9 |
Publication status | Published - 2021 |
Keywords
- communication optimization
- data parallelism
- data partitioning
- energy model
- energy of communication
- heterogeneous computing
- matrix multiplication
- non-rectangular partitioning
- performance model