TY - JOUR
T1 - Testing contextualized word embeddings to improve NER in Spanish clinical case narratives
AU - Akhtyamova, Liliya
AU - Martínez, Paloma
AU - Verspoor, Karin
AU - Cardiff, John
N1 - Publisher Copyright:
© 2020 Institute of Electrical and Electronics Engineers Inc.. All rights reserved.
PY - 2020
Y1 - 2020
N2 - In the Big Data era, there is an increasing need to fully exploit and analyze the huge quantity of information available about health. Natural Language Processing (NLP) technologies can contribute by extracting relevant information from unstructured data contained in Electronic Health Records (EHR) such as clinical notes, patients' discharge summaries and radiology reports. The extracted information can help in health-related decision making processes. The Named Entity Recognition (NER) task, which detects important concepts in texts (e.g., diseases, symptoms, drugs, etc.), is crucial in the information extraction process yet has received little attention in languages other than English. In this work, we develop a deep learning-based NLP pipeline for biomedical entity extraction in Spanish clinical narratives. We explore the use of contextualized word embeddings, which incorporate context variation into word representations, to enhance named entity recognition in Spanish language clinical text, particularly of pharmacological substances, compounds, and proteins. Various combinations of word and sense embeddings were tested on the evaluation corpus of the PharmacoNER 2019 task, the Spanish Clinical Case Corpus (SPACCC). This data set consists of clinical case sections extracted from open access Spanish-language medical publications. Our study shows that our deep-learning-based system with domain-specific contextualized embeddings coupled with stacking of complementary embeddings yields superior performance over a system with integrated standard and general-domain word embeddings. With this system, we achieve performance competitive with the state-of-the-art.
AB - In the Big Data era, there is an increasing need to fully exploit and analyze the huge quantity of information available about health. Natural Language Processing (NLP) technologies can contribute by extracting relevant information from unstructured data contained in Electronic Health Records (EHR) such as clinical notes, patients' discharge summaries and radiology reports. The extracted information can help in health-related decision making processes. The Named Entity Recognition (NER) task, which detects important concepts in texts (e.g., diseases, symptoms, drugs, etc.), is crucial in the information extraction process yet has received little attention in languages other than English. In this work, we develop a deep learning-based NLP pipeline for biomedical entity extraction in Spanish clinical narratives. We explore the use of contextualized word embeddings, which incorporate context variation into word representations, to enhance named entity recognition in Spanish language clinical text, particularly of pharmacological substances, compounds, and proteins. Various combinations of word and sense embeddings were tested on the evaluation corpus of the PharmacoNER 2019 task, the Spanish Clinical Case Corpus (SPACCC). This data set consists of clinical case sections extracted from open access Spanish-language medical publications. Our study shows that our deep-learning-based system with domain-specific contextualized embeddings coupled with stacking of complementary embeddings yields superior performance over a system with integrated standard and general-domain word embeddings. With this system, we achieve performance competitive with the state-of-the-art.
KW - Clinical case narratives
KW - Contextualized word embeddings
KW - Deep learning
KW - Language representations
KW - Named entity recognition
KW - Natural language processing
KW - Spanish language
UR - https://www.scopus.com/pages/publications/85101421017
U2 - 10.1109/ACCESS.2020.3018688
DO - 10.1109/ACCESS.2020.3018688
M3 - Article
AN - SCOPUS:85101421017
SN - 2169-3536
VL - 8
SP - 164717
EP - 164726
JO - IEEE Access
JF - IEEE Access
ER -