Challenges of Big Data Management
Abstract
Now and in the near future, big-data storage holds great promise for large-scale data storage and retrieval. Hadoop and other big-data frameworks have introduced a data storage model fundamentally distinct from that of traditional database management systems, and many enterprise applications now rely on these architectural features, creating a need for further development. Drawing on years of experience with Hadoop workloads, we have encountered the network challenges inherent in HDFS and similar file systems. Big-data storage systems are crucial platforms in these settings, providing the scalability and reliability needed to store and process vast amounts of data. Workflow-based data handling has been achieved either by employing application-specific overlays that map the output of one task to the input of another in a sequential pipeline or, more recently, by leveraging the MapReduce programming model. However, the MapReduce model does not suit every scientific application, and when a large-scale workflow is deployed across multiple data centres, geographically distributed computation becomes bottlenecked by data transfers, incurring high costs and significant latencies. In this context, big-data systems underpin the storage and processing of large data sets. Our approach fills a crucial gap between the self-describing data formats that scientists commonly use for data distribution and sharing and the big-data systems that are increasingly important for scientific analysis. Building on this approach, we have extended two important and widely used big-data systems to directly support the storage and analysis of scientific data stored in self-describing formats.
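To make the pipeline pattern concrete, the following minimal Python sketch (our own illustration, not code from Hadoop or the systems discussed here) chains two MapReduce-style stages on a single machine, so that the output of the first stage becomes the input of the second. The helper function, toy data, and stage logic are all hypothetical; a real deployment would distribute the map, shuffle, and reduce phases across a cluster and HDFS.

```python
from collections import defaultdict

# A minimal, single-machine sketch of the MapReduce model:
# map_fn emits (key, value) pairs, which are grouped by key
# and then folded by reduce_fn.
def map_reduce(records, map_fn, reduce_fn):
    grouped = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):      # map phase
            grouped[key].append(value)         # shuffle: group values by key
    return [(k, reduce_fn(k, vs)) for k, vs in grouped.items()]  # reduce phase

# Stage 1: count word occurrences in some toy input lines.
lines = ["big data storage", "big data analysis", "data workflow"]
word_counts = map_reduce(
    lines,
    map_fn=lambda line: [(word, 1) for word in line.split()],
    reduce_fn=lambda word, counts: sum(counts),
)

# Stage 2: the output of stage 1 is fed directly as the input of stage 2,
# grouping words by their total count -- the sequential-pipeline pattern in
# which one task's output serves as the next task's input.
counts_to_words = map_reduce(
    word_counts,
    map_fn=lambda pair: [(pair[1], pair[0])],
    reduce_fn=lambda count, words: sorted(words),
)

print(word_counts)      # e.g. [('big', 2), ('data', 3), ('storage', 1), ...]
print(counts_to_words)  # e.g. [(2, ['big']), (3, ['data']), (1, [...])]
```

The sketch also hints at the bottleneck noted above: between stages, every intermediate key-value pair must be materialised and moved, and when the stages run in different data centres that movement becomes an expensive wide-area transfer.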