Published December 9, 2022 | Version v1
Publication

Data Quality in Artificial Intelligence

Description

This thesis is part of the AI-TIE project coordinated by Haaga-Helia University of Applied Sciences. The main goal of the project is to support SMEs in developing and growing their business in Finland by utilizing artificial intelligence solutions. The aim of the thesis, which was carried out in 2022, is to study the importance of data quality in AI development, to examine the dimensions of data quality, and to identify common problems and good practices affecting data quality in companies that already use or plan to implement artificial intelligence. The theory section explains what is meant by artificial intelligence and what good data quality means from the perspective of artificial intelligence. In addition, the study explores what data is and how data quality can be measured and evaluated. The structure of the interview and survey used in the research was selected by examining and comparing methods. The research part of the thesis uses a concurrent mixed methods approach. Based on interviews and surveys, the research section examines the views of professionals in the field on the different dimensions of data quality and the related challenges and good practices from the perspective of AI development. Based on the results of the study, relevancy was considered the most challenging dimension of data quality in AI development: it was selected as one of the most challenging data quality dimensions in six of the seven surveys. The reasons given included the difficulty of predicting what kind of data should be collected for future needs and the need for a sufficient contextual understanding of the business and its needs. A comprehensive understanding of business problems from both a technical and a business perspective was considered important for starting to collect relevant data.
The study also produced dimension-specific development suggestions and good practices for improving each data quality dimension. The results of the thesis can be used to evaluate and improve the quality of existing data and to support the planning of future data needs from the perspective of artificial intelligence. The results can also be utilized in developing a data quality maturity model for the path toward a production-ready AI application.

Additional details

Identifiers

URL
http://www.theseus.fi/handle/10024/786090
URN
urn:oai:www.theseus.fi:10024/786090

Origin repository

Origin repository
HH