Abstract
This resource description paper describes the creation and properties of a set of pseudo-corpora generated artificially by a random walk over the English WordNet taxonomy. Our implementation of the WordNet taxonomic random walk exposes the walk’s hyperparameters, so that a variety of pseudo-corpora can be generated. We find that different combinations of these hyperparameters yield pseudo-corpora with different statistical properties. We have published a total of 81 pseudo-corpora used in our previous research. Because these do not exhaust all possible hyperparameter combinations, we have also published a codebase that allows additional WordNet taxonomic pseudo-corpora to be generated as needed. Ultimately, such pseudo-corpora can be used to train taxonomic word embeddings, as a way of transferring taxonomic knowledge into a word embedding space.
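The general idea of a taxonomic random walk can be illustrated with a minimal sketch. This is not the published codebase: the toy taxonomy stands in for WordNet, and the function names and hyperparameters (`direction`, `stop_prob`, `max_len`, `n_sentences`) are hypothetical choices made for illustration only. Each walk starts at a random node, moves along hypernym or hyponym edges, and emits the visited words as one pseudo-sentence.

```python
import random

# Toy hypernym -> hyponyms map standing in for WordNet (illustrative only).
TOY_TAXONOMY = {
    "entity": ["animal", "artifact"],
    "animal": ["dog", "cat"],
    "artifact": ["tool"],
    "tool": ["hammer"],
    "dog": [],
    "cat": [],
    "hammer": [],
}


def build_parents(taxonomy):
    """Invert the hyponym map so each node lists its hypernyms."""
    parents = {node: [] for node in taxonomy}
    for parent, children in taxonomy.items():
        for child in children:
            parents[child].append(parent)
    return parents


def random_walk_sentence(taxonomy, parents, rng,
                         direction="both", stop_prob=0.3, max_len=10):
    """Walk the taxonomy from a random node and return the visited words.

    direction, stop_prob and max_len are assumed hyperparameters,
    not necessarily those of the published implementation.
    """
    node = rng.choice(sorted(taxonomy))
    sentence = [node]
    while len(sentence) < max_len:
        neighbours = []
        if direction in ("up", "both"):
            neighbours += parents[node]
        if direction in ("down", "both"):
            neighbours += taxonomy[node]
        # Stop at a dead end, or at random with probability stop_prob.
        if not neighbours or rng.random() < stop_prob:
            break
        node = rng.choice(neighbours)
        sentence.append(node)
    return sentence


def generate_pseudo_corpus(taxonomy, n_sentences, seed=0, **walk_kwargs):
    """Generate n_sentences pseudo-sentences with a seeded random generator."""
    rng = random.Random(seed)
    parents = build_parents(taxonomy)
    return [random_walk_sentence(taxonomy, parents, rng, **walk_kwargs)
            for _ in range(n_sentences)]


if __name__ == "__main__":
    for sentence in generate_pseudo_corpus(TOY_TAXONOMY, 5, direction="up"):
        print(" ".join(sentence))
```

Under these assumptions, changing `direction`, `stop_prob`, or `max_len` changes properties of the output such as sentence length and the relative frequency of each word, which is the kind of variation the abstract attributes to different hyperparameter combinations. The resulting sentences could be written one per line and passed to any standard word embedding trainer.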
| Original language | English |
|---|---|
| Publication status | Published - 2020 |
| Event | The 12th Language Resources and Evaluation Conference - Marseilles, France. Duration: 11 May 2020 → 16 May 2020 |
Conference
| Conference | The 12th Language Resources and Evaluation Conference |
|---|---|
| Country/Territory | France |
| City | Marseilles |
| Period | 11/05/20 → 16/05/20 |
Keywords
- pseudo-corpora
- random walk
- English WordNet
- taxonomy
- hyperparameters
- taxonomic word embeddings
- word embedding space