TY - JOUR
T1 - Measuring semantic similarity of documents with weighted cosine and fuzzy logic
AU - Huetle-Figueroa, Juan
AU - Perez-Tellez, Fernando
AU - Pinto, David
N1 - Publisher Copyright:
© 2020 - IOS Press and the authors. All rights reserved.
PY - 2020
Y1 - 2020
N2 - Currently, semantic analysis is used in different fields, such as information retrieval, the biomedical domain, and natural language processing. The primary focus of this research work is on using semantic methods, the cosine similarity algorithm, and fuzzy logic to improve the matching of documents. The algorithms were applied to plain texts, in this case CVs (resumes) and job descriptions. WordNet synsets were used to enrich semantic similarity methods such as Wu-Palmer similarity (WUP), Leacock-Chodorow similarity (LCH), and path similarity (hypernym/hyponym). Additionally, keyword extraction was used to create a postings list in which keywords were weighted. This research work addresses the task of recruiting new personnel in companies that publish job descriptions and, reciprocally, of finding a company for workers who publish their resumes. A new gold standard was required to compare the proposed methods, so a web application was designed to match the documents manually and create it. The new gold standard thereby made it possible to confirm the benefits of semantically enriching the cosine algorithm. Finally, the results were compared with the new gold standard to evaluate the proposed methods. The measures used for the analysis were precision, recall, and F-measure, leading to the conclusion that semantically weighted cosine similarity can be used to obtain better similarity scores.
AB - Currently, semantic analysis is used in different fields, such as information retrieval, the biomedical domain, and natural language processing. The primary focus of this research work is on using semantic methods, the cosine similarity algorithm, and fuzzy logic to improve the matching of documents. The algorithms were applied to plain texts, in this case CVs (resumes) and job descriptions. WordNet synsets were used to enrich semantic similarity methods such as Wu-Palmer similarity (WUP), Leacock-Chodorow similarity (LCH), and path similarity (hypernym/hyponym). Additionally, keyword extraction was used to create a postings list in which keywords were weighted. This research work addresses the task of recruiting new personnel in companies that publish job descriptions and, reciprocally, of finding a company for workers who publish their resumes. A new gold standard was required to compare the proposed methods, so a web application was designed to match the documents manually and create it. The new gold standard thereby made it possible to confirm the benefits of semantically enriching the cosine algorithm. Finally, the results were compared with the new gold standard to evaluate the proposed methods. The measures used for the analysis were precision, recall, and F-measure, leading to the conclusion that semantically weighted cosine similarity can be used to obtain better similarity scores.
KW - Semantic similarity
KW - cosine enrichment
KW - document similarity
KW - keyword enrichment
KW - semantic matching
UR - http://www.scopus.com/inward/record.url?scp=85091064690&partnerID=8YFLogxK
U2 - 10.3233/JIFS-179889
DO - 10.3233/JIFS-179889
M3 - Article
AN - SCOPUS:85091064690
SN - 1064-1246
VL - 39
SP - 2263
EP - 2278
JO - Journal of Intelligent and Fuzzy Systems
JF - Journal of Intelligent and Fuzzy Systems
IS - 2
ER -