Abstract
Collaborative deep neural network (DNN) inference over edge and cloud is emerging as an effective approach for enabling several Internet of Things (IoT) applications. Edge devices are typically resource-constrained and hence cannot afford the computational complexity of DNNs. Researchers have therefore resorted to a collaborative computing approach, in which a DNN is partitioned between edge and cloud. Recent work on DNN partitioning has either focused on bandwidth-specific partitioning or relied on offline benchmarking of DNN layers. However, edge devices are inherently heterogeneous and possess varying levels and types of resources. In this work, we therefore propose resource-aware partitioning of DNNs for accelerating collaborative inference over edge-cloud. The proposed approach provides the flexibility to partition a DNN according to the nature and scale of the resources available on a given edge device. Unlike the state of the art, we exploit different types of DNN complexity to partition DNNs for heterogeneous edge devices. For example, in a bandwidth-constrained scenario, our approach achieved a 40% efficiency gain over the offline benchmarking approach. Thus, given the differing computational, storage, and energy requirements of edge devices, this approach provides a suitable configuration for edge-cloud synergetic inference.
Original language | English |
---|---|
Pages (from-to) | 5649-5655 |
Number of pages | 7 |
Journal | Proceedings - IEEE Global Communications Conference, GLOBECOM |
Publication status | Published - 2022 |
Externally published | Yes |
Event | 2022 IEEE Global Communications Conference, GLOBECOM 2022 - Virtual, Online, Brazil; Duration: 4 Dec 2022 → 8 Dec 2022 |
Keywords
- Collaborative DNN inference
- Edge intelligence
- Internet of things
- Resource efficiency