TY - GEN
T1 - Named Entity Recognition in Spanish Biomedical Literature
T2 - 26th Conference of Open Innovations Association FRUCT, FRUCT 2020
AU - Akhtyamova, Liliya
N1 - Publisher Copyright: © 2020 FRUCT.
PY - 2020/4
Y1 - 2020/4
AB - Named Entity Recognition (NER) is the first step in knowledge acquisition when dealing with an unknown corpus of texts. Once these entities have been extracted, we can form a parameter space and address text mining problems such as concept normalization, speech recognition, etc. Recent advances in NER are tied to contextualized word embeddings, which transform text into a form suitable for deep learning. In this paper, we show how an NER model detects pharmacological substances, compounds, and proteins in a dataset obtained from the Spanish Clinical Case Corpus (SPACCC). To achieve this goal, we train the BERT language representation model from scratch and fine-tune it for our problem. As expected, this model outperforms the NER model trained over standard word embeddings. We further conduct an error analysis that traces the origins of the models' errors and proposes strategies to further improve the model's quality.
UR - http://www.scopus.com/inward/record.url?scp=85085035588&partnerID=8YFLogxK
U2 - 10.23919/FRUCT48808.2020.9087359
DO - 10.23919/FRUCT48808.2020.9087359
M3 - Conference contribution
AN - SCOPUS:85085035588
T3 - Conference of Open Innovation Association, FRUCT
SP - 3
EP - 9
BT - Proceedings of the 26th Conference of Open Innovations Association FRUCT, FRUCT 2020
A2 - Balandin, Sergey
A2 - Paramonov, Ilya
A2 - Tyutina, Tatiana
PB - IEEE Computer Society
Y2 - 23 April 2020 through 24 April 2020
ER -