Improved Job Execution in Hadoop using Task Deduplication Approach
Abstract
Nearly all applications are now accessible online, and the widespread use of social media, sensor networks, and public systems generates data at an enormous rate. Managing and processing this volume of data requires large-scale storage systems, and the data must also be handled and stored securely. Massively distributed data centers have therefore been built for processing and storing data. In the rapidly evolving big data era, organizations must be able to process and analyze massive amounts of data efficiently in order to extract valuable insights. To achieve better job execution while preserving data integrity and security, data must be accessed only with the appropriate authorization. We propose a secure, metadata-driven strategy built on Hadoop that improves data processing and storage by reducing unnecessary data movement and job execution time.
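To illustrate the general idea of metadata-driven task deduplication, the following is a minimal, self-contained Java sketch under stated assumptions, not the paper's implementation: it fingerprints a job's input metadata (path, size, modification time) and skips resubmitting a job whose input has not changed. The class and method names (TaskDeduplicator, shouldRun) and the in-memory metadata store are illustrative placeholders for whatever persistent store and Hadoop job-submission path the actual system uses.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Conceptual sketch (assumption, not the authors' code): decide whether a job
// needs to run by comparing a fingerprint of its input metadata against what
// was recorded the last time the same job ran.
public class TaskDeduplicator {

    // In-memory stand-in for a persistent metadata store
    // (a real deployment might keep this in HDFS or a database).
    private final Map<String, String> metadataStore = new HashMap<>();

    // Fingerprint the input's metadata (path, size, modification time) rather
    // than its full contents, so the check itself causes no extra data movement.
    static String fingerprint(String path, long sizeBytes, long modifiedMillis) throws Exception {
        MessageDigest sha = MessageDigest.getInstance("SHA-256");
        byte[] digest = sha.digest(
                (path + "|" + sizeBytes + "|" + modifiedMillis).getBytes(StandardCharsets.UTF_8));
        return Base64.getEncoder().encodeToString(digest);
    }

    // Returns true if the job should be submitted; false if an identical
    // input was already processed, in which case earlier output can be reused.
    boolean shouldRun(String jobName, String path, long sizeBytes, long modifiedMillis) throws Exception {
        String fp = fingerprint(path, sizeBytes, modifiedMillis);
        if (fp.equals(metadataStore.get(jobName))) {
            return false;               // duplicate task: skip re-execution
        }
        metadataStore.put(jobName, fp); // record the new fingerprint before launching
        return true;
    }

    public static void main(String[] args) throws Exception {
        TaskDeduplicator dedup = new TaskDeduplicator();
        // First submission runs; an identical resubmission is deduplicated.
        System.out.println(dedup.shouldRun("wordcount", "/data/logs", 1_048_576L, 1_700_000_000_000L)); // true
        System.out.println(dedup.shouldRun("wordcount", "/data/logs", 1_048_576L, 1_700_000_000_000L)); // false
    }
}
```

Keying the check on lightweight metadata rather than file contents is what keeps the deduplication step cheap; the trade-off is that inputs rewritten with identical metadata would be missed, which a content checksum could catch at higher cost.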