Data Lineage Strategies – A Modernized View
Abstract
Data lineage describes data sources, the data derived from them, and the transformations applied to produce that derived data. It matters to enterprises because it shows how a given piece of data was derived and whether it is acceptable for use in a particular analytical output. Gaps in this understanding can lead to regulatory and compliance issues in many domains. We propose a modernized data lineage strategy that accounts for current approaches to data management, including current metadata stores and logical data structures. We present specific approaches for modernizing existing data lineage strategies within the data warehousing and data virtualization paradigms. Finally, we present research directions in the context of big data systems and data brokering for open data. Our modernized view of data lineage will benefit data management professionals, researchers, software and tool developers, and stakeholders who seek to promote open data.