Method Proposed for Avoidance of Data Replication in Distributed Storage System
Abstract
Data replication is the process of copying and maintaining files in multiple locations to ensure consistency, reliability, and accessibility. Data replication in the cloud offers many benefits, but it also has drawbacks, such as increased storage cost. The replication process can consume significant resources, including CPU, memory, and network bandwidth, which may degrade the performance of primary systems or applications. In addition, replicating data across different regions or cloud providers may expose it to additional security risks if proper encryption and access controls are not implemented. As the volume of data grows, replication can become more complex and resource-intensive, requiring additional infrastructure or optimization to handle the increased load [1].
The study considers the examples of GFS and HDFS, two of the most widely used distributed file systems for managing the large clusters where big data resides [3].
Avoiding data replication can be advantageous in terms of cost efficiency, resource utilization, and system simplicity. Efficient resource utilization in cloud storage means optimizing how storage resources are allocated, used, and managed in order to achieve cost savings, performance improvements, and operational efficiency. The study suggests the HoneyBee algorithm for resource optimization. HoneyBee is a cloud-native storage system designed to optimize data placement and resource utilization in distributed storage environments.
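To make the storage-cost drawback of replication concrete, the following minimal sketch computes how much raw capacity a dataset consumes at different replication factors. The function names and the 100 TB dataset size are illustrative assumptions, not figures from the study; a factor of 3 corresponds to the default replication setting in HDFS.

```python
# Illustrative sketch only: names and dataset size are hypothetical,
# not taken from the study.

TB = 10**12  # one terabyte in bytes (decimal)


def raw_storage_needed(logical_bytes: int, replication_factor: int) -> int:
    """Raw capacity consumed when every block is stored replication_factor times."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return logical_bytes * replication_factor


def replication_overhead(logical_bytes: int, replication_factor: int) -> int:
    """Extra capacity spent on copies beyond the single original."""
    return raw_storage_needed(logical_bytes, replication_factor) - logical_bytes


if __name__ == "__main__":
    logical = 100 * TB  # assumed dataset size
    for rf in (1, 2, 3):
        raw = raw_storage_needed(logical, rf)
        extra = replication_overhead(logical, rf)
        print(f"replication factor {rf}: raw = {raw // TB} TB, overhead = {extra // TB} TB")
```

Under these assumptions, a replication factor of 3 triples the raw capacity required, which is the cost that an approach avoiding replication aims to save.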