Published 2020 | Version v1
Publication

Coverage-based rewriting for data preparation

Description

The development of technological solutions that satisfy non-discrimination requirements is currently one of the main challenges in data processing. Concepts such as fairness, i.e., lack of bias, and diversity, i.e., the degree to which different kinds of objects are represented in a dataset, have recently been taken into account in the design of non-discriminating set selection, ranking, and OLAP approaches. However, information extraction also underlies back-end data processing, where data are prepared (e.g., extracted and transformed, usually through SQL queries) before being loaded into a data warehouse for further front-end processing. An unfair data preparation process may therefore significantly affect front-end analysis. For example, a category that is underrepresented in the warehouse may remain underrepresented in most subsequent processes. The guarantee that each category of interest is sufficiently represented in the prepared data is known as coverage. Starting from this observation, in this paper we propose an approach for automatically rewriting back-end queries whose results do not satisfy given coverage constraints into the "closest" queries that do satisfy them. Through rewriting, coverage-based modifications of data preparation steps are traced for further processing. We also present preliminary experimental results and identify some directions for future work.
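To make the idea of a coverage constraint and of a "closest" rewritten query more concrete, here is a minimal sketch under strong simplifying assumptions. It is not the authors' rewriting algorithm. It assumes a single table, a query of the form `SELECT * FROM rows WHERE value <= threshold`, coverage constraints of the form "group g must appear at least k times in the result", and it measures closeness only as the smallest increase of the threshold. All names (`Row`, `select`, `satisfies_coverage`, `rewrite_threshold`) are hypothetical illustrations.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Row:
    group: str    # hypothetical sensitive attribute (e.g., a demographic category)
    value: float  # hypothetical attribute used in the selection predicate


def select(rows: List[Row], threshold: float) -> List[Row]:
    """Evaluate the assumed query: SELECT * FROM rows WHERE value <= threshold."""
    return [r for r in rows if r.value <= threshold]


def satisfies_coverage(result: List[Row], constraints: Dict[str, int]) -> bool:
    """Check that each group g appears at least constraints[g] times in the result."""
    counts: Dict[str, int] = {}
    for r in result:
        counts[r.group] = counts.get(r.group, 0) + 1
    return all(counts.get(g, 0) >= k for g, k in constraints.items())


def rewrite_threshold(
    rows: List[Row], threshold: float, constraints: Dict[str, int]
) -> Optional[float]:
    """Return the smallest threshold >= the original one whose result satisfies
    all coverage constraints, or None if even the whole table cannot satisfy them.

    Relaxing `value <= t` only adds rows, so group g reaches k rows exactly when
    t is at least the k-th smallest value of g; the answer is the maximum of
    these per-group bounds and the original threshold.
    """
    needed = threshold
    for g, k in constraints.items():
        if k <= 0:
            continue  # a non-positive requirement is trivially satisfied
        values = sorted(r.value for r in rows if r.group == g)
        if len(values) < k:
            return None  # no relaxation of this predicate can cover group g
        needed = max(needed, values[k - 1])
    return needed


# Small illustrative dataset (made up for this example).
rows = [
    Row("A", 10), Row("A", 20), Row("A", 30),
    Row("B", 40), Row("B", 55), Row("B", 70),
]
constraints = {"A": 2, "B": 2}

original = select(rows, 30)
print(satisfies_coverage(original, constraints))  # the original query misses group B
new_threshold = rewrite_threshold(rows, 30, constraints)
print(new_threshold)
print(satisfies_coverage(select(rows, new_threshold), constraints))
```

With the sample data, the original predicate `value <= 30` returns no rows from group B, so the coverage check fails; the sketch relaxes the threshold to 55, the smallest value that admits two B rows while keeping the A rows. Real data preparation queries involve joins, several predicates, and other notions of distance between queries, so the paper's approach is expected to be more general than this single-threshold illustration.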

Additional details

Created:
April 14, 2023
Modified:
November 29, 2023