Abstract
Many staff in higher education sense that useful information is buried within their data, yet they are unsure how to access it or even what questions it can answer. This is particularly so with free-text survey responses from large student cohorts. This paper examines valid and repeatable methods for analyzing such data, seeking to minimize computational and analyst workload by maximizing the use of machine learning to accommodate the large volume of data. We evaluate clustering and topic modelling as methods for analyzing one year's data from a national student survey in Ireland, an anonymized dataset with more than 44,700 respondents. The primary focus was on free-text responses to two questions: one asking students to identify the best aspects of their reported experience, and one asking them to identify aspects that need improvement. Two unsupervised learners, K-means and Latent Dirichlet Allocation (LDA), were used to identify key themes emerging from the text data. K-means proved computationally expensive and failed to usefully categorize significant minorities of the data. In contrast, topic modelling had relatively low overheads and effectively categorized more than 97% of the sample data into themes that could be usefully considered in the business domain. In this research, topic modelling provided an effective method for analyzing such text data, provided that careful consideration was given to determining the appropriate initial number of topics for configuring the algorithm.
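
The contrast between the two learners can be illustrated with a minimal sketch. It assumes scikit-learn and uses invented toy responses; the vectorizer settings, the number of clusters and topics, and the `THRESHOLD` cut-off are hypothetical choices for illustration and are not taken from the paper.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Toy stand-ins for free-text survey responses (invented for illustration).
responses = [
    "lecturers were helpful and supportive",
    "helpful lecturers and small classes",
    "library needs longer opening hours",
    "more study space in the library",
    "parking on campus is expensive",
    "campus parking needs improvement",
]

# K-means on TF-IDF vectors: each response receives exactly one cluster label.
tfidf = TfidfVectorizer(stop_words="english").fit_transform(responses)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(tfidf)
cluster_labels = kmeans.labels_

# LDA on raw term counts: each response receives a distribution over topics.
# n_components is the initial number of topics, which must be chosen up front.
counts = CountVectorizer(stop_words="english").fit_transform(responses)
lda = LatentDirichletAllocation(n_components=3, random_state=0)
doc_topics = lda.fit_transform(counts)
dominant_topic = doc_topics.argmax(axis=1)

# Hypothetical coverage measure: share of responses whose strongest topic
# weight exceeds an assumed cut-off (not the paper's criterion).
THRESHOLD = 0.5
coverage = (doc_topics.max(axis=1) > THRESHOLD).mean()

print("K-means labels:", cluster_labels)
print("LDA dominant topics:", dominant_topic)
print("Share above threshold:", coverage)
```

Rerunning the LDA step with different values of `n_components` and inspecting the resulting themes is one simple way to explore how sensitive the output is to the initial number of topics.
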
Original language | English |
---|---|
Pages (from-to) | 89-100 |
Number of pages | 12 |
Journal | CEUR Workshop Proceedings |
Volume | 3383 |
Publication status | Published - 2022 |
Event | 1st Finnish Learning Analytics and Artificial Intelligence in Education Conference, FLAIEC 2022 - Joensuu, Finland; Duration: 29 Sep 2022 → 30 Sep 2022 |
Keywords
- clustering
- free text
- higher education
- k-means
- LDA
- machine learning
- student survey
- topic modelling
- unsupervised