English WordNet Taxonomic Random Walk Pseudo-Corpora

Filip Klubička, Alfredo Maldonado, Abhijit Mahalunkar, John D. Kelleher

Research output: Contribution to conference › Paper › peer-review

Abstract

This resource description paper describes the creation and properties of a set of pseudo-corpora generated artificially from a random walk over the English WordNet taxonomy. Our WordNet taxonomic random walk implementation allows different random walk hyperparameters to be explored and a variety of pseudo-corpora to be generated. We find that different combinations of the walk’s hyperparameters result in varying statistical properties of the generated pseudo-corpora. We have published the 81 pseudo-corpora used in our previous research. Because these do not exhaust all possible hyperparameter combinations, we have also published a codebase that allows additional WordNet taxonomic pseudo-corpora to be generated as needed. Ultimately, such pseudo-corpora can be used to train taxonomic word embeddings, as a way of transferring taxonomic knowledge into a word embedding space.
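The abstract describes the walk only at a high level. As a rough illustration, the following Python sketch generates a pseudo-corpus by walking a tiny hand-made is-a taxonomy and then sweeps a small hyperparameter grid to show that corpus statistics shift with the settings. The toy taxonomy, the hyperparameter names (`walks_per_node`, `max_len`, `p_up`) and the statistics are hypothetical assumptions chosen for illustration; they are not the authors' codebase or its actual hyperparameters.

```python
import itertools
import random
from typing import Dict, List

# Hypothetical toy is-a taxonomy (NOT real WordNet data): child -> hypernyms.
HYPERNYMS: Dict[str, List[str]] = {
    "animal": [],
    "carnivore": ["animal"],
    "canine": ["carnivore"],
    "feline": ["carnivore"],
    "dog": ["canine"],
    "wolf": ["canine"],
    "cat": ["feline"],
}


def invert(hypernyms: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Derive the hyponym map (parent -> children) from the hypernym map."""
    hyponyms: Dict[str, List[str]] = {node: [] for node in hypernyms}
    for child, parents in hypernyms.items():
        for parent in parents:
            hyponyms[parent].append(child)
    return hyponyms


HYPONYMS = invert(HYPERNYMS)


def random_walk(start: str, max_len: int, p_up: float, rng: random.Random) -> List[str]:
    """Walk from `start` along taxonomy edges, emitting one pseudo-sentence.

    When both directions exist, the walk moves to a random hypernym with
    probability `p_up` and to a random hyponym otherwise; when only one
    direction exists, it takes that one. Hypothetical parameters.
    """
    sentence = [start]
    node = start
    while len(sentence) < max_len:
        up, down = HYPERNYMS[node], HYPONYMS[node]
        if up and (not down or rng.random() < p_up):
            node = rng.choice(up)
        elif down:
            node = rng.choice(down)
        else:
            break  # isolated node: nowhere to go
        sentence.append(node)
    return sentence


def generate_corpus(walks_per_node: int, max_len: int, p_up: float, seed: int = 0) -> List[str]:
    """Build a pseudo-corpus: `walks_per_node` walks starting from every node."""
    rng = random.Random(seed)
    return [
        " ".join(random_walk(start, max_len, p_up, rng))
        for start in sorted(HYPERNYMS)
        for _ in range(walks_per_node)
    ]


def corpus_stats(corpus: List[str]) -> Dict[str, float]:
    """A few simple statistics that change with the walk hyperparameters."""
    tokens = [tok for line in corpus for tok in line.split()]
    return {
        "sentences": float(len(corpus)),
        "mean_len": len(tokens) / len(corpus),
        "root_share": tokens.count("animal") / len(tokens),  # share of root tokens
    }


# Sweep a small, hypothetical hyperparameter grid and compare the corpora.
for walks, max_len, p_up in itertools.product([1, 3], [3, 6], [0.2, 0.8]):
    stats = corpus_stats(generate_corpus(walks, max_len, p_up))
    print(f"walks={walks} max_len={max_len} p_up={p_up}: {stats}")
```

Each line of a generated corpus is a space-separated "sentence" of taxonomically related words, so it can be passed to an ordinary word-embedding trainer. To walk the real English WordNet noun hierarchy rather than the toy dictionary, one could derive neighbours from NLTK's `Synset.hypernyms()` and `Synset.hyponyms()`; the published corpora follow the authors' own walk procedure, which this sketch does not reproduce.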
Original language: English
Publication status: Published - 2020
Event: The 12th Language Resources and Evaluation Conference - Marseilles, France
Duration: 11 May 2020 to 16 May 2020

Conference

Conference: The 12th Language Resources and Evaluation Conference
Country/Territory: France
City: Marseilles
Period: 11/05/20 to 16/05/20

Keywords

  • pseudo-corpora
  • random walk
  • English WordNet
  • taxonomy
  • hyperparameters
  • taxonomic word embeddings
  • word embedding space
