Eavesdropping Hackers: Detecting Software Vulnerability Communication on Social Media Using Text Mining

Susan McKeever, Brian Keegan, Andrei Queiroz

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The cyber security community is seeking new forms of protection against hacker attacks. One emerging approach is to investigate security-related messages exchanged on Deep/Dark Web channels and even on Surface Web channels. This approach can be supported by supervised machine learning models and text mining techniques. In our work, we compare a variety of machine learning algorithms, text representations and dimension reduction approaches by how accurately they detect software-vulnerability-related communications. Given the imbalanced nature of the three public datasets used, we investigate appropriate sampling approaches to boost the detection accuracy of our models. In addition, we examine how feature reduction techniques, such as Document Frequency Reduction, Chi-square and Singular Value Decomposition (SVD), can be used to reduce the number of features of the model without impacting detection performance. We conclude that: (1) a Support Vector Machine (SVM) algorithm used with a traditional Bag of Words representation achieved the highest accuracies; (2) increasing the minority class with the Random Oversampling technique improves the detection performance of the model by 5% on average; and (3) the number of features of the model can be reduced by up to 10% without affecting detection performance. We also provide the labelled dataset used in this work for further research. These findings can support Cyber Security Threat Intelligence (CTI) in the use of text mining techniques for detecting security-related communication.
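The abstract names the pipeline components but not how they are implemented. As an illustration only, the following is a minimal, dependency-free Python sketch of one such pipeline under our own assumptions: whitespace tokenisation, a Bag of Words count vector with Document Frequency Reduction (`min_df`), Random Oversampling of the minority class, Chi-square feature selection keeping the top `k` terms, and a linear SVM trained with the Pegasos stochastic subgradient method (without an intercept term, for brevity). All function names, the toy messages and the parameter values (`min_df`, `k`, `lam`, `epochs`) are hypothetical; this is not the authors' implementation, and SVD is omitted. The order of oversampling before feature selection is also an assumption.

```python
import random
from collections import Counter


def tokenize(text):
    # Lowercase whitespace tokenisation; a real pipeline would also handle punctuation and stop words.
    return text.lower().split()


def build_vocab(docs, min_df=1):
    # Document Frequency Reduction: keep terms that occur in at least `min_df` documents.
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return sorted(term for term, n in df.items() if n >= min_df)


def bag_of_words(doc, vocab):
    # Raw term counts over a fixed vocabulary (traditional Bag of Words).
    counts = Counter(tokenize(doc))
    return [counts[term] for term in vocab]


def random_oversample(X, y, seed=0):
    # Duplicate randomly chosen minority-class rows until both classes are the same size.
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == -1]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = list(range(len(y))) + extra
    return [X[i] for i in idx], [y[i] for i in idx]


def chi_square_scores(X, y):
    # Chi-square statistic of each term's presence against the label, from a 2x2 contingency table.
    n, n_pos = len(y), sum(1 for label in y if label == 1)
    scores = []
    for j in range(len(X[0])):
        a = sum(1 for row, label in zip(X, y) if row[j] > 0 and label == 1)  # present, positive
        b = sum(1 for row, label in zip(X, y) if row[j] > 0 and label != 1)  # present, negative
        c = n_pos - a                                                         # absent, positive
        d = n - a - b - c                                                     # absent, negative
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores.append(n * (a * d - b * c) ** 2 / denom if denom else 0.0)
    return scores


def select_top_k(X, scores, k):
    # Keep the k highest-scoring feature columns (ties broken by column index).
    keep = sorted(sorted(range(len(scores)), key=lambda j: -scores[j])[:k])
    return [[row[j] for j in keep] for row in X], keep


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    # Pegasos-style stochastic subgradient descent on the L2-regularised hinge loss.
    # Simplification: no intercept term, so the hyperplane passes through the origin.
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order, t = list(range(len(y))), 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1 - eta * lam) * wj for wj in w]  # shrink step from the regulariser
            if margin < 1:  # hinge loss active: move w towards y_i * x_i
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) > 0 else -1


# Hypothetical toy data: +1 = vulnerability-related, -1 = unrelated (imbalanced 2 vs 4).
docs = [
    "new exploit for cve in apache",
    "exploit code released for router cve",
    "anyone want lunch today",
    "great lunch spot downtown",
    "lunch meeting moved to friday",
    "lunch was cold again",
]
labels = [1, 1, -1, -1, -1, -1]

vocab = build_vocab(docs, min_df=1)
X = [bag_of_words(d, vocab) for d in docs]
X_bal, y_bal = random_oversample(X, labels)
X_sel, kept = select_top_k(X_bal, chi_square_scores(X_bal, y_bal), k=4)
w = train_linear_svm(X_sel, y_bal)

print([vocab[j] for j in kept])  # → ['cve', 'exploit', 'for', 'lunch']
print(predict(w, [bag_of_words("poc exploit posted", vocab)[j] for j in kept]))  # → 1
```

A standard-library-only sketch keeps the example self-contained; in practice an established library would more likely be used (for example, scikit-learn's `CountVectorizer`, `SelectKBest` with `chi2`, `TruncatedSVD` and `LinearSVC`, with imbalanced-learn's `RandomOverSampler`). Whatever the tooling, oversampling should be applied to the training split only, so that duplicated minority messages do not leak into the evaluation data.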
Original language: English
Title of host publication: CYBER 2019: The Fourth International Conference on Cyber-Technologies and Cyber-Systems
Pages: 41-48
Publication status: Published - 2019
Event: Cyber 2019 - Porto, Portugal
Duration: 1 Jan 2019 → …

Conference

Conference: Cyber 2019
Country/Territory: Portugal
City: Porto
Period: 1/01/19 → …

Keywords

  • Cyber security
  • hacker attacks
  • Deep/Dark Web
  • Surface Web
  • supervised machine learning
  • text mining
  • software-vulnerability-related communications
  • sampling approaches
  • feature reduction techniques
  • Support Vector Machine
  • Bag of Words
  • Random Oversampling
  • Cyber Security Threat Intelligence
