TY - GEN
T1 - MULTI-MODAL SELF-SUPERVISED REPRESENTATION LEARNING FOR EARTH OBSERVATION
AU - Jain, Pallavi
AU - Schoen-Phelan, Bianca
AU - Ross, Robert
N1 - Publisher Copyright:
© 2021 IEEE
PY - 2021
Y1 - 2021
N2 - Self-supervised learning (SSL) has reduced the performance gap between supervised and unsupervised learning due to its ability to learn invariant representations. This is a boon to domains like Earth Observation (EO), where labelled data is scarce but unlabelled data is freely available. While transfer learning from generic RGB pre-trained models is still commonplace in EO, we argue that a good EO domain-specific pre-trained model is essential for downstream tasks with limited labelled data. Hence, we explored the applicability of SSL with multi-modal satellite imagery for downstream tasks. For this, we utilised the state-of-the-art SSL architectures BYOL and SimSiam to train on EO data. To obtain better invariant representations, we treated multi-spectral (MS) images and synthetic aperture radar (SAR) images as separate augmented views of an image and maximised their similarity. Our work shows that by learning single-channel representations through non-contrastive learning, our approach can significantly outperform ImageNet pre-trained models on a scene classification task. We further explored the usefulness of a momentum encoder by comparing the two architectures, BYOL and SimSiam, but did not identify a significant difference in performance between the models.
AB - Self-supervised learning (SSL) has reduced the performance gap between supervised and unsupervised learning due to its ability to learn invariant representations. This is a boon to domains like Earth Observation (EO), where labelled data is scarce but unlabelled data is freely available. While transfer learning from generic RGB pre-trained models is still commonplace in EO, we argue that a good EO domain-specific pre-trained model is essential for downstream tasks with limited labelled data. Hence, we explored the applicability of SSL with multi-modal satellite imagery for downstream tasks. For this, we utilised the state-of-the-art SSL architectures BYOL and SimSiam to train on EO data. To obtain better invariant representations, we treated multi-spectral (MS) images and synthetic aperture radar (SAR) images as separate augmented views of an image and maximised their similarity. Our work shows that by learning single-channel representations through non-contrastive learning, our approach can significantly outperform ImageNet pre-trained models on a scene classification task. We further explored the usefulness of a momentum encoder by comparing the two architectures, BYOL and SimSiam, but did not identify a significant difference in performance between the models.
KW - Satellite images
KW - Self-supervised learning
KW - Unsupervised learning
UR - http://www.scopus.com/inward/record.url?scp=85126055646&partnerID=8YFLogxK
U2 - 10.1109/IGARSS47720.2021.9553741
DO - 10.1109/IGARSS47720.2021.9553741
M3 - Conference contribution
AN - SCOPUS:85126055646
T3 - International Geoscience and Remote Sensing Symposium (IGARSS)
SP - 3241
EP - 3244
BT - IGARSS 2021 - 2021 IEEE International Geoscience and Remote Sensing Symposium, Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 IEEE International Geoscience and Remote Sensing Symposium, IGARSS 2021
Y2 - 12 July 2021 through 16 July 2021
ER -