A fuzzy system for combining filter features selection methods
CATENI, Silvia; COLLA, Valentina; VANNUCCI, Marco
2017-01-01
Abstract
Feature selection is considered one of the most important data pre-processing steps in many modelling fields, especially for prediction and classification. It belongs to the wider class of data mining procedures, since it identifies the variables that most affect a given phenomenon from an analysis of the available data, thereby increasing knowledge of the process or phenomenon under consideration. There are three main categories of feature selection approaches, namely filter, wrapper and embedded methods; this work focuses on the first and, in particular, on a fuzzy logic-based procedure that combines some traditional filter methods. Filter methods exploit intrinsic properties of the data to select features before the learning task and, compared with the other kinds of approaches, require less computational time and are suitable for datasets with a large number of instances and features. To assess the effectiveness of the proposed approach, several tests have been performed. Different classifiers have been designed and applied to binary classification on different datasets: some widely used public datasets with many instances and features, and two datasets from the metal industry. The results are presented and discussed in the paper.
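As an illustration of the general idea of combining filter methods, the sketch below scores each feature with two common filter criteria (absolute Pearson correlation with the label and a Fisher-style score), rescales both to [0, 1], and averages them into a single ranking. This is not the procedure described in the paper: the abstract does not specify which filters are combined or how the fuzzy system is built, so plain averaging is used here as a stand-in for the fuzzy inference step. All function names and the synthetic data are hypothetical.

```python
import numpy as np


def abs_correlation(X, y):
    """Filter score 1: absolute Pearson correlation of each feature with the label."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom = np.where(denom == 0, 1.0, denom)  # constant features get score 0
    return np.abs(Xc.T @ yc) / denom


def fisher_score(X, y):
    """Filter score 2: squared class-mean difference over summed class variances."""
    X0, X1 = X[y == 0], X[y == 1]
    num = (X0.mean(axis=0) - X1.mean(axis=0)) ** 2
    den = X0.var(axis=0) + X1.var(axis=0)
    den = np.where(den == 0, 1.0, den)  # guard against division by zero
    return num / den


def min_max(scores):
    """Rescale scores to [0, 1] so that different filters are comparable."""
    span = scores.max() - scores.min()
    return (scores - scores.min()) / span if span > 0 else np.zeros_like(scores)


def combined_ranking(X, y):
    """Average the normalised filter scores (assumed stand-in for fuzzy inference)."""
    scores = np.vstack([min_max(abs_correlation(X, y)),
                        min_max(fisher_score(X, y))])
    combined = scores.mean(axis=0)
    # Feature indices sorted from most to least relevant
    return np.argsort(combined)[::-1], combined


# Synthetic binary classification data: only feature 2 depends on the label.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 5))
X[:, 2] += 2.0 * y

ranking, combined = combined_ranking(X, y)
print("Features ranked by combined score:", ranking)
```

Because both filters operate on the data alone, the ranking is computed before any classifier is trained, which is what keeps the cost of filter methods low compared with wrapper and embedded approaches.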