TY - JOUR
T1 - SeTransformer
T2 - A Transformer-Based Code Semantic Parser for Code Comment Generation
AU - Li, Zheng
AU - Wu, Yonghao
AU - Peng, Bin
AU - Chen, Xiang
AU - Sun, Zeyu
AU - Liu, Yong
AU - Paul, Doyle
N1 - Publisher Copyright:
© 1963-2012 IEEE.
PY - 2023/3/1
Y1 - 2023/3/1
N2 - Automated code comment generation technologies can help developers understand code intent, which can significantly reduce the cost of software maintenance and revision. The latest studies in this field mainly depend on deep neural networks, such as convolutional neural networks and recurrent neural networks. However, these methods may not generate high-quality and readable code comments due to the long-term dependency problem, i.e., the code blocks needed to summarize information are far apart from each other. Owing to the long-term dependency problem, these methods forget the feature information of earlier inputs during training. In this article, to solve the long-term dependency problem and extract both the textual and structural information from program code, we propose a novel improved Transformer-based comment generation method, named SeTransformer. Specifically, SeTransformer takes the code tokens and the abstract syntax tree (AST) of a program as its inputs, and then leverages the self-attention mechanism to analyze the textual and structural features of the code simultaneously. Experimental results on a public corpus gathered from large-scale open-source projects show that our method can significantly outperform five state-of-the-art baselines (such as Hybrid-DeepCom and AST-Attendgru). Furthermore, we also conducted a questionnaire survey of developers, and the results show that SeTransformer can generate higher-quality comments than the other baselines.
AB - Automated code comment generation technologies can help developers understand code intent, which can significantly reduce the cost of software maintenance and revision. The latest studies in this field mainly depend on deep neural networks, such as convolutional neural networks and recurrent neural networks. However, these methods may not generate high-quality and readable code comments due to the long-term dependency problem, i.e., the code blocks needed to summarize information are far apart from each other. Owing to the long-term dependency problem, these methods forget the feature information of earlier inputs during training. In this article, to solve the long-term dependency problem and extract both the textual and structural information from program code, we propose a novel improved Transformer-based comment generation method, named SeTransformer. Specifically, SeTransformer takes the code tokens and the abstract syntax tree (AST) of a program as its inputs, and then leverages the self-attention mechanism to analyze the textual and structural features of the code simultaneously. Experimental results on a public corpus gathered from large-scale open-source projects show that our method can significantly outperform five state-of-the-art baselines (such as Hybrid-DeepCom and AST-Attendgru). Furthermore, we also conducted a questionnaire survey of developers, and the results show that SeTransformer can generate higher-quality comments than the other baselines.
KW - Code comment generation
KW - convolutional neural network (CNN)
KW - deep learning
KW - program comprehension
KW - Transformer
UR - http://www.scopus.com/inward/record.url?scp=85127463067&partnerID=8YFLogxK
U2 - 10.1109/TR.2022.3154773
DO - 10.1109/TR.2022.3154773
M3 - Article
AN - SCOPUS:85127463067
SN - 0018-9529
VL - 72
SP - 258
EP - 273
JO - IEEE Transactions on Reliability
JF - IEEE Transactions on Reliability
IS - 1
ER -