Published May 24, 2023 | Version v1
Publication

A quantitative methodology to identify related features in data sets

Description

This paper develops a methodology that quantifies the dependence between features in a data set. The methodology uses the Ameva discretization algorithm; in particular, it uses the Ameva coefficient to quantify the dependence. Furthermore, a new coefficient called entropy is proposed for cases where the Ameva discretization algorithm cannot be applied. Matrices of inter-dependence are then built, providing a degree of dependence between each pair of features. Finally, to verify the qualities of this methodology, a simple feature-discarding method based on it is applied to a well-known data set in a classification process, and promising results are obtained for the resulting system.
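
As a rough illustration of how a matrix of inter-dependence might be assembled, the following minimal sketch computes a chi-square-based, Ameva-style coefficient for every pair of already-discretized features. It rests on assumptions not stated in this record: the coefficient is taken as chi2 / (k * (l - 1)), a form commonly associated with Ameva, where k and l are the numbers of distinct values of the two features; the function names `ameva_coefficient` and `dependence_matrix` are hypothetical. The paper's exact formulation, and its entropy coefficient, may differ.

```python
from collections import Counter


def ameva_coefficient(xs, ys):
    """Ameva-style dependence of discrete feature xs with respect to ys.

    Assumed definition (not taken from this record): chi2 / (k * (l - 1)),
    where k is the number of distinct values in xs and l in ys.
    """
    if len(xs) != len(ys):
        raise ValueError("features must have the same number of samples")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # contingency table counts n_ij
    row = Counter(xs)             # marginal counts n_i.
    col = Counter(ys)             # marginal counts n_.j
    k, l = len(row), len(col)
    if n == 0 or l < 2:
        return 0.0                # a constant feature carries no dependence
    # Pearson chi-square written as N * (sum_ij n_ij^2 / (n_i. * n_.j) - 1)
    chi2 = n * (sum(c * c / (row[x] * col[y]) for (x, y), c in joint.items()) - 1.0)
    return chi2 / (k * (l - 1))


def dependence_matrix(columns):
    """Pairwise coefficients for a dict mapping feature name -> list of values."""
    names = list(columns)
    return {a: {b: ameva_coefficient(columns[a], columns[b]) for b in names}
            for a in names}


if __name__ == "__main__":
    data = {
        "f1": [0, 0, 1, 1],
        "f2": [0, 0, 1, 1],  # identical to f1
        "f3": [0, 1, 0, 1],  # independent of f1
    }
    for name, row_values in dependence_matrix(data).items():
        print(name, row_values)
```

Under this assumed definition the matrix is not symmetric in general, because the normalizer k * (l - 1) changes when the two features swap roles. A feature-discarding rule could then drop one feature of any pair whose coefficient exceeds a chosen threshold; such a threshold would be a tuning choice, not something this record specifies.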

Abstract

Ministerio de Ciencia e Innovación TIN2009-14378-C02-01

Additional details

Created:
May 26, 2023
Modified:
November 28, 2023