Data Quality Assessment of Comma Separated Values Using Linked Data Approach

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Citations (Scopus)

Abstract

With an increasing amount of structured data on the web, the need to understand it and convert it into linked data is growing. One of the most common data formats is Comma Separated Values (CSV). However, CSV offers no easy way to describe metadata such as datatype, data quality, and data provenance alongside the data. Therefore, to publish a CSV file on the web, it must be converted into a linked data format. Many approaches exist to facilitate the conversion from structured data to linked data, but all of them require additional domain knowledge. The goal of this research is to assist publishers in converting CSV files into linked data without human intervention, while exposing the data's quality and the root causes of data quality violations. The proposed framework consists of two modules. The first module converts the given CSV file into a knowledge graph based on a proposed ontology, and appends data quality information to the graph. The second module identifies the triples that violate the data quality constraints. The results show that it is possible to convert a CSV file into a knowledge graph enriched with its quality information without the help of external mappings.
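
The two-module flow described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the `EX` namespace, the `violates` and `datatype` predicates, the majority-vote type heuristic, and the function names `csv_to_triples` and `find_violations` are all assumptions made for illustration, and triples are modelled as plain Python tuples rather than with the proposed ontology.

```python
import csv
import io

EX = "http://example.org/"  # hypothetical namespace, not the paper's ontology


def is_int(text):
    """Return True if the cell text parses as an integer."""
    try:
        int(text)
        return True
    except ValueError:
        return False


def csv_to_triples(csv_text):
    """Module 1 (sketch): emit value triples plus quality-violation triples."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    triples = []
    for column in reader.fieldnames or []:
        values = [row.get(column) or "" for row in rows]
        filled = [v for v in values if v.strip()]
        # Assumed heuristic: a column is "integer" if most filled cells parse as int.
        int_count = sum(is_int(v) for v in filled)
        col_type = "integer" if int_count * 2 > len(filled) else "string"
        triples.append((EX + "column/" + column, EX + "datatype", col_type))
        for i, value in enumerate(values):
            cell = f"{EX}row/{i}/{column}"
            if not value.strip():
                # Empty cell: record the violation, no value triple is emitted.
                triples.append((cell, EX + "violates", "missing_value"))
                continue
            triples.append((cell, EX + "value", value))
            if col_type == "integer" and not is_int(value):
                triples.append((cell, EX + "violates", "datatype_mismatch"))
    return triples


def find_violations(triples):
    """Module 2 (sketch): select the cells whose triples record a violation."""
    return [(s, o) for s, p, o in triples if p == EX + "violates"]


sample = "id,age\n1,34\n2,abc\n3,\n4,41\n"
for cell, problem in find_violations(csv_to_triples(sample)):
    print(cell, problem)
```

Because the quality information is stored as ordinary triples next to the data, the second step reduces to a filter over the graph, and each violation points back to the specific cell that caused it.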

Original language: English
Title of host publication: Business Information Systems Workshops - BIS 2021 International Workshops, Revised Selected Papers
Editors: Witold Abramowicz, Sören Auer, Milena Stróżyna
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 240-250
Number of pages: 11
ISBN (Print): 9783031042157
DOIs
Publication status: Published - 2022
Event: 24th International Conference on Business Information Systems, BIS 2021 - Virtual, Online
Duration: 14 Jun 2021 to 17 Jun 2021

Publication series

Name: Lecture Notes in Business Information Processing
Volume: 444 LNBIP
ISSN (Print): 1865-1348
ISSN (Electronic): 1865-1356

Conference

Conference: 24th International Conference on Business Information Systems, BIS 2021
City: Virtual, Online
Period: 14/06/21 to 17/06/21

Keywords

  • CSV
  • Data quality
  • Knowledge graphs
  • Linked data
  • Quality assessment
  • Root cause analysis
