Abstract
In English second language acquisition there is enthusiasm for using authentic texts as learning materials in classroom and online settings. This enthusiasm, however, is tempered by the difficulty of finding authentic texts at a suitable level of comprehension difficulty for specific groups of learners. An automated way to rate the comprehension difficulty of a text would make finding suitable texts a much more manageable task. Although readability metrics have been in use for over 50 years, they capture only a small part of what constitutes comprehension difficulty. In this paper we examine other features of texts that are related to comprehension difficulty and assess their usefulness in building automated prediction models. We investigate readability metrics, vocabulary-based features, and syntax-based features, and show that the best prediction accuracy is achieved with a combination of all three.
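As an illustration only, the three feature families named in the abstract could be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's feature set: the Flesch Reading Ease formula stands in for readability metrics, a type-token ratio for vocabulary-based features, and mean sentence length as a crude proxy for syntax-based features. The function names `count_syllables` and `text_features` are hypothetical, and the syllable counter is a rough vowel-group heuristic rather than a dictionary lookup.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (assumption): one syllable per run of vowels, minimum one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def text_features(text: str) -> dict:
    """Compute three illustrative features; not the features used in the paper."""
    # Naive sentence split on terminal punctuation; abbreviations are not handled.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    n_sent, n_words = len(sentences), len(words)
    if n_sent == 0 or n_words == 0:
        raise ValueError("text must contain at least one sentence and one word")

    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = n_words / n_sent
    syllables_per_word = syllables / n_words

    return {
        # Readability metric: standard Flesch Reading Ease formula.
        "flesch_reading_ease": 206.835
        - 1.015 * words_per_sentence
        - 84.6 * syllables_per_word,
        # Vocabulary-based feature: distinct word forms divided by total words.
        "type_token_ratio": len({w.lower() for w in words}) / n_words,
        # Syntax proxy (assumption): longer sentences tend to be structurally harder.
        "mean_sentence_length": words_per_sentence,
    }


if __name__ == "__main__":
    print(text_features("The cat sat. The dog ran fast."))
```

A feature dictionary of this kind could then be passed to any standard classifier or regressor. Which model and which concrete features work best is the question the paper itself investigates.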
Original language | English |
---|---|
Pages (from-to) | 12-25 |
Number of pages | 14 |
Journal | CEUR Workshop Proceedings |
Volume | 2086 |
Publication status | Published - 2017 |
Event | 25th Irish Conference on Artificial Intelligence and Cognitive Science, AICS 2017 - Dublin, Ireland. Duration: 7 Dec 2017 → 8 Dec 2017 |
Keywords
- English second language acquisition
- authentic text
- comprehension difficulty
- readability metrics
- vocabulary-based features
- syntax-based features
- prediction models