Measuring semantic similarity of documents with weighted cosine and fuzzy logic

Juan Huetle-Figueroa, Fernando Perez-Tellez, David Pinto

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)

Abstract

Semantic analysis is currently used in many fields, such as information retrieval, the biomedical domain, and natural language processing. This research focuses on using semantic methods, the cosine similarity algorithm, and fuzzy logic to improve the matching of documents. The algorithms were applied to plain texts, in this case CVs (resumes) and job descriptions. WordNet synsets were used to enrich semantic similarity methods such as Wu-Palmer similarity (WUP), Leacock-Chodorow similarity (LCH), and path similarity (hypernym/hyponym). Additionally, keyword extraction was used to create a postings list in which the keywords were weighted. The work addresses two reciprocal tasks: recruiting new personnel for companies that publish job descriptions, and finding a company for workers who publish their resumes. Comparing the proposed methods required a new gold standard, so a web application was designed to match the documents manually and thereby create it. Finally, the results were compared with the new gold standard to check the efficiency of the proposed methods, which confirmed the benefits of enriching the cosine algorithm semantically. The measures used for the analysis were precision, recall, and F-measure, and they indicate that semantically weighted cosine similarity can be used to obtain better similarity scores.
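As a rough illustration of the idea in the abstract, the sketch below compares a plain weighted cosine with a semantically enriched one and then scores a set of matches against a gold standard. It is not the authors' implementation: it uses the soft cosine formulation as a stand-in, the keyword weights and the `TERM_SIM` table are hypothetical values (in practice the term scores might come from a WordNet measure such as Wu-Palmer), and all function names are invented for this example.

```python
import math

# Hypothetical keyword weights, e.g. produced by a keyword-extraction step.
cv = {"python": 2.0, "developer": 1.0, "database": 1.0}
job = {"python": 2.0, "programmer": 1.0, "sql": 1.0}

# Hypothetical term-to-term similarities in [0, 1]; these stand in for
# WordNet-based scores and are NOT values taken from the paper.
TERM_SIM = {
    frozenset(("developer", "programmer")): 0.9,
    frozenset(("database", "sql")): 0.7,
}


def term_sim(a, b):
    """Similarity of two terms: 1.0 if identical, table lookup otherwise."""
    if a == b:
        return 1.0
    return TERM_SIM.get(frozenset((a, b)), 0.0)


def plain_cosine(u, v):
    """Weighted cosine that only rewards exact keyword overlap."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0


def soft_dot(u, v):
    """Dot product in which every term pair contributes by its similarity."""
    return sum(
        wu * wv * term_sim(a, b) for a, wu in u.items() for b, wv in v.items()
    )


def semantic_cosine(u, v):
    """Soft cosine: related (not only identical) keywords raise the score."""
    denom = math.sqrt(soft_dot(u, u)) * math.sqrt(soft_dot(v, v))
    return soft_dot(u, v) / denom if denom else 0.0


def precision_recall_f1(predicted, gold):
    """Precision, recall, and F-measure of predicted matches vs. a gold set."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


plain_score = plain_cosine(cv, job)
semantic_score = semantic_cosine(cv, job)
scores = precision_recall_f1({"job1", "job2", "job3"}, {"job1", "job3", "job4", "job5"})
print(round(plain_score, 4), round(semantic_score, 4), scores)
```

With these toy inputs the enriched score is higher than the plain one because "developer"/"programmer" and "database"/"sql" count as partial matches, which is the effect the abstract attributes to semantic weighting.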

Original language: English
Pages (from-to): 2263-2278
Number of pages: 16
Journal: Journal of Intelligent and Fuzzy Systems
Volume: 39
Issue number: 2
Publication status: Published - 2020

Keywords

  • Semantic similarity
  • cosine enrichment
  • document similarity
  • keyword enrichment
  • semantic matching
