TY - JOUR
T1 - Investigating the Impact of Encoder Architectures and Batch Size on Depth Estimation through Semantic Consistency
AU - Nosheen, Iqra
AU - Iqbal, Talha
AU - Ullah, Ihsan
AU - Ennis, Cathy
AU - Madden, Michael G.
N1 - Publisher Copyright:
© This is an open access article published by the IET under the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/)
PY - 2024
Y1 - 2024
N2 - Traditional methods for depth estimation rely on supervised learning with resource-intensive LiDAR data. Virtual synthetic datasets provide a cost-effective alternative, but bridging the domain gap between synthetic and real-world data remains a significant challenge. In existing work, this gap is addressed through domain adaptation techniques that align the feature distributions of the synthetic (source) and real-world (target) domains. Our study explores the efficacy of different encoder architectures (ResNet variants with 35, 50, 101, 101-with-attention, and 152 convolution layers) and two batch sizes (2 and 4) for the depth estimation task. Our experiments show that ResNet101 without attention and ResNet101 with attention provide the best performance at batch sizes 2 and 4, respectively, compared to the other models. Conversely, the deepest architecture considered, ResNet152, shows the lowest performance, indicating that increasing network depth does not necessarily improve results for depth estimation tasks. These findings provide valuable insights for developing more effective depth estimation algorithms and suggest future directions in hyperparameter optimization and semantic consistency modeling.
AB - Traditional methods for depth estimation rely on supervised learning with resource-intensive LiDAR data. Virtual synthetic datasets provide a cost-effective alternative, but bridging the domain gap between synthetic and real-world data remains a significant challenge. In existing work, this gap is addressed through domain adaptation techniques that align the feature distributions of the synthetic (source) and real-world (target) domains. Our study explores the efficacy of different encoder architectures (ResNet variants with 35, 50, 101, 101-with-attention, and 152 convolution layers) and two batch sizes (2 and 4) for the depth estimation task. Our experiments show that ResNet101 without attention and ResNet101 with attention provide the best performance at batch sizes 2 and 4, respectively, compared to the other models. Conversely, the deepest architecture considered, ResNet152, shows the lowest performance, indicating that increasing network depth does not necessarily improve results for depth estimation tasks. These findings provide valuable insights for developing more effective depth estimation algorithms and suggest future directions in hyperparameter optimization and semantic consistency modeling.
KW - Batch sizes
KW - Depth estimation
KW - Encoder architectures
KW - Image translation
KW - Semantic consistency
UR - https://www.scopus.com/pages/publications/85216749976
U2 - 10.1049/icp.2024.3295
DO - 10.1049/icp.2024.3295
M3 - Conference article
AN - SCOPUS:85216749976
SN - 2732-4494
VL - 2024
SP - 134
EP - 137
JO - IET Conference Proceedings
JF - IET Conference Proceedings
IS - 10
T2 - 26th Irish Machine Vision and Image Processing Conference, IMVIP 2024
Y2 - 21 August 2024 through 23 August 2024
ER -