Published May 17, 2022 | Version v1
Publication

Test-driven anonymization in health data: a case study on assistive reproduction

Description

Artificial intelligence (AI) is a broad field whose prevalence in the health sector has increased in recent years. Clinical data are the staple that feeds intelligent healthcare applications, but because of their sensitive nature, sharing them with third parties and allowing those parties to use them requires compliance with both confidentiality agreements and security measures. Data anonymization modifies the data to increase privacy and to reduce the risk of unintentional disclosure of sensitive information. Although anonymization improves privacy, these modifications also harm the functional suitability of the data and can affect the applications that consume the anonymized data, especially data-centric ones such as AI tools. To reach a trade-off between both qualities (privacy and functional suitability), we use the Test-Driven Anonymization (TDA) approach, which incrementally anonymizes the data used to train the AI tools and validates them with the real data until quality is maximized. The approach is evaluated on a real-world dataset from the Spanish Institute for the Study of the Biology of Human Reproduction (INEBIR). The anonymized datasets are used to train AI tools, and the dataset that achieves the best trade-off between privacy and functional quality requirements is selected. The results show that TDA can be successfully applied to anonymize the clinical data of INEBIR, so that the data can be transferred to third parties without transgressing user privacy and those parties can develop useful AI tools with the anonymized data.

Abstract

Ministerio de Economía y Competitividad TIN2016-76956-C3-1-R

Ministerio de Economía y Competitividad TIN2016-76956-C3-2-R (POLOLAS)

Additional details

Created:
December 4, 2022
Modified:
November 29, 2023