Towards a Better Replica Management for Hadoop Distributed File System

  • Hilmi Egemen Ciritoglu
  • Takfarinas Saber
  • Teodora Sandra Buda
  • John Murphy
  • Christina Thorpe

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-reviewed

Abstract

The Hadoop Distributed File System (HDFS) is the storage of choice when it comes to large-scale distributed systems. In addition to being efficient and scalable, HDFS provides high throughput and reliability through the replication of data. Recent work exploits this replication feature by dynamically varying the replication factor of in-demand data as a means of increasing data locality and achieving a performance improvement. However, to the best of our knowledge, no study has been performed on the consequences of varying the replication factor. In particular, our work is the first to show that although HDFS deals well with increasing the replication factor, it experiences problems with decreasing it. This leads to unbalanced data, hot spots, and performance degradation. In order to address this problem, we propose a new workload-aware balanced replica deletion algorithm. We also show that our algorithm successfully maintains the data balance and achieves up to 48% improvement in execution time when compared to HDFS, while only creating an overhead of 1.69% on average.
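The imbalance described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not specify; it is a simplified greedy heuristic, assuming a hypothetical in-memory map from block IDs to the nodes that hold their replicas. When the replication factor is lowered, it removes each excess replica from the currently most loaded node, so that per-node block counts stay close to each other. The function name `choose_replicas_to_delete` and the node and block names are illustrative only, and workload awareness is deliberately left out.

```python
from collections import Counter


def choose_replicas_to_delete(block_locations, target_factor):
    """Pick excess replicas to delete while keeping nodes balanced.

    block_locations: dict mapping a block ID to the list of node names
        that currently hold a replica of it (hypothetical structure).
    target_factor: the new, lower replication factor.

    Returns (deletions, load), where deletions is a list of
    (block_id, node) pairs and load is the resulting replica count
    per node.
    """
    # Current number of replicas stored on each node.
    load = Counter(
        node for nodes in block_locations.values() for node in nodes
    )
    deletions = []
    for block_id in sorted(block_locations):
        holders = list(block_locations[block_id])
        while len(holders) > target_factor:
            # Greedy choice: drop the replica on the most loaded holder.
            # Ties are broken by node name to keep the result deterministic.
            victim = max(holders, key=lambda n: (load[n], n))
            holders.remove(victim)
            load[victim] -= 1
            deletions.append((block_id, victim))
    return deletions, dict(load)


# Illustrative input: three blocks at replication factor 3 on four nodes.
blocks = {
    "b1": ["n1", "n2", "n3"],
    "b2": ["n1", "n2", "n4"],
    "b3": ["n1", "n3", "n4"],
}
deletions, load = choose_replicas_to_delete(blocks, target_factor=2)
print(deletions)
print(load)
```

In this toy input, node `n1` starts with one more replica than the others, and the greedy choice removes that surplus first. A workload-aware variant, as the abstract proposes, would presumably also weigh how often each node's data is accessed, but that detail is not given here.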

Original language: English
Title of host publication: Proceedings - 2018 IEEE International Congress on Big Data, BigData Congress 2018 - Part of the 2018 IEEE World Congress on Services
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 104-111
Number of pages: 8
ISBN (Electronic): 9781538672327
Publication status: Published - 7 Sep 2018
Externally published: Yes
Event: 7th IEEE International Congress on Big Data, BigData Congress 2018, Part of the 2018 IEEE World Congress on Services - San Francisco, United States
Duration: 2 Jul 2018 to 7 Jul 2018

Publication series

Name: Proceedings - 2018 IEEE International Congress on Big Data, BigData Congress 2018 - Part of the 2018 IEEE World Congress on Services

Conference

Conference: 7th IEEE International Congress on Big Data, BigData Congress 2018, Part of the 2018 IEEE World Congress on Services
Country/Territory: United States
City: San Francisco
Period: 2/07/18 to 7/07/18

Keywords

  • Hadoop Distributed File System
  • Replication Factor
  • Software Performance
