Assessing the usefulness of different feature sets for predicting the comprehension difficulty of text

Brian Mac Namee, John D. Kelleher, Noel Fitzpatrick

Research output: Contribution to journal › Conference article › peer-review

Abstract

Within English second language acquisition there is an enthusiasm for using authentic text as learning materials in classroom and online settings. This enthusiasm, however, is tempered by the difficulty of finding authentic texts at suitable levels of comprehension difficulty for specific groups of learners. An automated way to rate the comprehension difficulty of a text would make finding suitable texts a much more manageable task. While readability metrics have been in use for over 50 years now, they capture only a small part of what constitutes comprehension difficulty. In this paper we examine other features of texts that are related to comprehension difficulty and assess their usefulness in building automated prediction models. We investigate readability metrics, vocabulary-based features, and syntax-based features, and show that the best prediction accuracies are possible with a combination of all three.

Original language: English
Pages (from-to): 12-25
Number of pages: 14
Journal: CEUR Workshop Proceedings
Volume: 2086
Publication status: Published - 2017
Event: 25th Irish Conference on Artificial Intelligence and Cognitive Science, AICS 2017 - Dublin, Ireland
Duration: 7 Dec 2017 to 8 Dec 2017

Keywords

  • English second language acquisition
  • authentic text
  • comprehension difficulty
  • readability metrics
  • vocabulary-based features
  • syntax-based features
  • prediction models
