Title: A Survey on Evolutionary Multiobjective Feature Selection in Classification: Approaches, Applications, and Challenges

Authors: Ruwang Jiao, Bach Hoai Nguyen, Bing Xue, Mengjie Zhang

Published in: IEEE Transactions on Evolutionary Computation

Citation information: DOI 10.1109/TEVC.2023.3292527

Abstract

The paper discusses the two objectives of feature selection, which are ==to maximize classification accuracy and minimize the number of selected features==, making it inherently a multi-objective task. It provides a broad survey of existing research in this area, focusing on recent approaches, applications, current challenges, and future directions.

Intro

  1. Importance of Multi-objective Feature Selection: Feature selection has the potential to improve classification performance, reduce the dimensionality of data, decrease storage space, enhance computational efficiency, and facilitate data visualization and understanding. It is also a complex and challenging combinatorial optimization problem, with its challenges being threefold:

    1. The exponential growth of the search space with the number of features.
    2. Complex interactions among features.
    3. The inherent multi-objective nature of feature selection, where the two main objectives often conflict.
  2. Methods and Applications of Multi-objective Feature Selection: The paper provides a broad survey of current research on multi-objective feature selection in classification, with a specific focus on the latest methodologies and applications, which offer theoretical insights and can also be applied to practical problems.
  3. Current Challenges and Future Directions: The paper discusses the key challenges facing multi-objective feature selection and provides guidance for future developments, including the exponential growth of the search space, complex interactions between features, and the difficulty of optimizing multiple conflicting objectives.

Multi-objective Evolutionary Feature Selection Algorithms

Description of the Problem

A multiobjective optimization problem can be mathematically expressed as:
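
$$
\min_{\mathbf{x} \in \Omega} \; F(\mathbf{x}) = \big(f_1(\mathbf{x}),\, f_2(\mathbf{x}),\, \ldots,\, f_m(\mathbf{x})\big)
$$

where $\Omega$ is the decision space and $f_1, \ldots, f_m$ are the $m$ (often conflicting) objective functions, all minimized without loss of generality. A solution $\mathbf{x}$ dominates $\mathbf{y}$ if $\mathbf{x}$ is no worse on every objective and strictly better on at least one; the solutions not dominated by any other constitute the Pareto front. In MOFS, $m = 2$ is the typical case: the classification error rate and the number (or ratio) of selected features.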

Multi-objective feature selection (MOFS) aims to optimize classification accuracy and the number of selected features simultaneously. Following the general multi-objective framework presented in the paper, designing and improving a MOFS method typically involves several steps: initialization, evaluation, offspring generation, environmental selection, and final decision-making. The evaluation step can be further divided into solution representation and evaluation-function design.
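
As a concrete anchor for the subsections that follow, here is a minimal Python skeleton of that loop. All component names (`initialize`, `evaluate`, `generate_offspring`, `environmental_selection`, `decision_making`) are placeholders for the steps discussed below, not an API from the paper.

```python
# Minimal skeleton of the MOFS loop described above; the five components
# are supplied by the caller and are detailed in the following subsections.
def mofs_loop(initialize, evaluate, generate_offspring,
              environmental_selection, decision_making,
              pop_size, n_generations):
    population = initialize(pop_size)                 # Initialization
    for _ in range(n_generations):
        offspring = generate_offspring(population)    # Offspring generation
        population = environmental_selection(         # Environmental selection
            population + offspring, evaluate, pop_size)
    return decision_making(population, evaluate)      # Final decision-making
```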

Solution representation

Choosing a suitable data structure to represent candidate solutions is the first step in designing a working MOFS method.

Solution representation can be divided into four categories: vector, tree, graph, and matrix.

  • Vector: The vector-based representation is the most commonly used in the evolutionary FS community. It can be grouped into two categories: binary encoding and continuous encoding (a sketch of both appears after this list).
  • Tree: Tree-based genetic programming is the representative method using this representation, allowing feature interactions and feature selection to be optimized simultaneously.
  • Graph: The graph structure representation primarily pertains to feature selection approaches based on Ant Colony Optimization (ACO).
  • Matrix: The matrix structure representation is primarily employed in sparse machine learning algorithms.

Each representation method possesses its own merits and limitations, and the choice of the most suitable method relies on the employed evolutionary computation approach and the representation requirements of the solution.
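
A minimal sketch of the two vector encodings, assuming NumPy; the 0.6 threshold used to decode the continuous vector is an arbitrary illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)
n_features = 10

# Binary encoding: bit i = 1 means feature i is selected.
binary_ind = rng.integers(0, 2, size=n_features)
print(np.flatnonzero(binary_ind))           # indices of selected features

# Continuous encoding: each gene lies in [0, 1] and a threshold decodes
# the vector into a feature subset (gene > theta => feature selected).
continuous_ind = rng.random(n_features)
theta = 0.6                                 # illustrative threshold
print(np.flatnonzero(continuous_ind > theta))
```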

Evaluation functions

Based on how much a classification algorithm is involved during the FS process, MOFS methods can be roughly categorized into filter, wrapper, and embedded methods, along with hybrids of these.

  • Filter methods: Typically, the evaluation of a feature subset revolves around the inherent properties of the data, such as information-theoretic measures, correlation measures, similarity measures, consistency measures, statistical measures, and fuzzy set theory measures. These evaluation metrics are known for their computational efficiency and their ability to generalize well across different classification algorithms. However, it is important to note that they may exhibit inferior classification performance compared to wrapper methods.
  • Wrapper methods: Wrapper methods employ the accuracy of a machine learning algorithm to evaluate feature subsets. They can yield high classification accuracy for that particular classification algorithm, often at the cost of high computational time and weaker generalization of the selected features to other classification algorithms (a wrapper-style sketch appears below).
  • Embedded methods: Embedded methods involve automating the feature selection process within machine learning algorithms. Specifically, these methods integrate feature selection with model training. By incorporating feature selection directly into the learning process, embedded methods often exhibit superior classification performance compared to filter methods. Additionally, they tend to be more efficient than wrapper methods.
  • Hybrid methods: Hybrid approaches encompass the partial or complete combination of the aforementioned methods to effectively leverage their individual strengths.

In general, the design of evaluation functions primarily takes into account two aspects: the quality of classification and the size of the feature subset. However, other factors can also be considered, such as fairness, reliability, feature cost, and model interpretability. Different evaluation functions can be employed within various methods to measure the importance and contribution of features.
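
As an illustration of a wrapper-style bi-objective evaluation, the sketch below scores a binary feature mask by its cross-validated error rate and the ratio of selected features; the KNN classifier, 5-fold cross-validation, scikit-learn's wine dataset, and the example subset are all illustrative choices:

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def wrapper_objectives(mask, X, y):
    """Return (error rate, ratio of selected features), both to be minimized."""
    if not mask.any():                          # empty subset: worst error
        return 1.0, 0.0
    acc = cross_val_score(KNeighborsClassifier(), X[:, mask], y, cv=5).mean()
    return 1.0 - acc, mask.sum() / mask.size

X, y = load_wine(return_X_y=True)
mask = np.zeros(X.shape[1], dtype=bool)
mask[[0, 6, 9]] = True                          # an arbitrary example subset
print(wrapper_objectives(mask, X, y))
```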

Initialization

In multi-objective feature selection, it is important to consider how the initial solutions are distributed in the objective space. Plain random initialization may not yield a desirable distribution, as the paper illustrates.

Hence, existing improved initialization methods primarily focus on two aspects: the ==quantity== of features and the ==quality== of features, aiming to provide better initial solutions.

  • Quantity: Methods considering the quantity of features employ strategies such as uniform sampling or probability-based adjustments. With plain random initialization, each feature is selected with probability 0.5, so subset sizes follow a binomial distribution concentrated around 50% of the features. A good initialization algorithm should instead spread the initial solutions evenly across feature quantities, enhancing the diversity of the solutions (a minimal sketch appears after this list).
  • Quality: Methods considering the quality of features incorporate indicators such as relevance and conditional entropy during the initialization phase to filter out less important features, yielding initial solutions that are more likely to achieve high classification performance early on.

By considering both the quantity and quality of features during the initialization process, these improved methods aim to enhance the effectiveness and efficiency of multi-objective feature selection by providing better initial solutions.
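
A minimal sketch of quantity-aware initialization, drawing each individual's subset size uniformly before picking that many features at random (the function name and parameters are illustrative):

```python
import numpy as np

def uniform_size_init(pop_size, n_features, rng):
    """Spread initial solutions evenly along the 'number of features' axis,
    instead of the binomial clustering around 50% that plain random
    initialization produces."""
    population = np.zeros((pop_size, n_features), dtype=bool)
    for ind in population:
        k = rng.integers(1, n_features + 1)    # uniform subset size in 1..n
        ind[rng.choice(n_features, size=k, replace=False)] = True
    return population

rng = np.random.default_rng(42)
print(uniform_size_init(8, 20, rng).sum(axis=1))   # sizes spread over 1..20
```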

Offspring generation

Offspring generation in MOFS is used to discover more promising feature subsets. The commonly used operators include crossover and/or mutation (a sketch of the basic binary operators follows this list). Improved offspring-generation methods usually focus on four aspects:

  • Reducing the number of selected features: These methods remove irrelevant features by means of evolutionary pattern mining, unsupervised neural networks, or population-distribution detection; for example, sparse optimization techniques can shrink the selected subset.
  • Offspring generation based on feature importance: Feature-importance-based methods use population distribution information, or combine filter and wrapper measures, to generate better solutions.
  • Performing local search: Local search improves search performance by exploring feature subsets adjacent to the non-dominated or elite solutions.
  • Retaining building blocks of feature subsets: These methods preserve useful local structures of feature subsets and the interactions between features, for example via Bayesian networks or by mining non-dominated solutions.
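
A minimal sketch of the plain binary operators, uniform crossover followed by bit-flip mutation with the common 1/n per-bit rate; the improved operators surveyed above build on or replace these basics:

```python
import numpy as np

def crossover_mutate(parent_a, parent_b, rng, pm=None):
    """Uniform crossover of two binary masks, then bit-flip mutation."""
    n = parent_a.size
    pm = 1.0 / n if pm is None else pm       # common default mutation rate
    take_a = rng.random(n) < 0.5             # uniform crossover mask
    child = np.where(take_a, parent_a, parent_b)
    flips = rng.random(n) < pm               # bit-flip mutation
    return child ^ flips

rng = np.random.default_rng(0)
a = rng.integers(0, 2, 12).astype(bool)
b = rng.integers(0, 2, 12).astype(bool)
print(crossover_mutate(a, b, rng).astype(int))
```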

Environmental selection

Environmental selection (population update) plays a vital role in evolution, as it determines which candidate solutions survive; it is the driving force of population evolution.
Current evolutionary multi-objective optimization (EMO) methods for MOFS can be mainly classified into Pareto dominance-based and decomposition-based approaches (a minimal sketch of the former follows this list).

  • Pareto dominance-based: In Pareto dominance-based EMO, fitness assignment usually combines non-dominated sorting with a density estimator (e.g., crowding distance, fitness sharing, entropy, adaptive grids, parallel coordinates).
  • Decomposition-based: The core idea of the decomposition-based EMO is to transform a multi-objective optimization problem into several single-objective optimization problems which are simultaneously solved using information from their neighboring subproblems.
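
A minimal sketch of the two ingredients of Pareto dominance-based selection, a dominance check with first-front extraction plus the crowding-distance density estimator (both objectives are treated as minimized; the toy objective vectors are illustrative):

```python
import numpy as np

def dominates(a, b):
    """a dominates b: no worse on every objective, strictly better on one."""
    return np.all(a <= b) and np.any(a < b)

def first_front(objs):
    """Indices of the non-dominated solutions."""
    return [i for i in range(len(objs))
            if not any(dominates(objs[j], objs[i])
                       for j in range(len(objs)) if j != i)]

def crowding_distance(objs):
    """Density estimator: larger means a less crowded (more diverse) solution."""
    objs = np.asarray(objs, dtype=float)
    n, m = objs.shape
    dist = np.zeros(n)
    for k in range(m):
        order = np.argsort(objs[:, k])
        dist[order[0]] = dist[order[-1]] = np.inf    # keep boundary solutions
        span = objs[order[-1], k] - objs[order[0], k]
        if span > 0:
            dist[order[1:-1]] += (objs[order[2:], k] - objs[order[:-2], k]) / span
    return dist

objs = np.array([(0.10, 5), (0.08, 9), (0.10, 7), (0.20, 2), (0.05, 12)])
front = first_front(objs)                            # -> [0, 1, 3, 4]
print(front, crowding_distance(objs[front]))
```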

Decision making

The goal of MOFS is to provide a representative subset of the non-dominated solutions for decision makers. From the user's point of view, however, only a single solution is ultimately needed out of the many non-dominated solutions obtained during the training phase. Without specific constraints, choosing that single solution from the non-dominated set for unseen test data is not a trivial task. There are three primary decision-making techniques for multi-objective machine learning:

  • Objective preference-based: Objective preference-based methods aim to select the final solution based on the priority of the different objectives. For instance, when the classification error rate and the number of selected features are the two objectives, it is common to choose the solution with the lowest classification error rate, since classification performance is usually considered more important than the number of features.
  • Knee point-based: Without any prior preference information, a ==knee point== is a solution for which a small improvement in one objective would cause a large deterioration in at least one other objective, which makes it intriguing to decision makers in posterior decision making (see the knee-selection sketch after this list).
  • Ensemble-based: The ensemble-based methods endeavor to extract and leverage useful information from multiple trade-off solutions, and generate a more reliable and robust solution based on these non-dominated solutions.
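
As an illustration of knee-based selection, the sketch below uses one common heuristic: normalize the front, then pick the solution farthest from the straight line through its two extreme points. Other knee definitions exist in the literature; the toy front is illustrative:

```python
import numpy as np

def knee_point(front):
    """Index of the knee on a 2-D minimization front: the point with the
    largest perpendicular distance to the line joining the two extremes
    of the normalized front."""
    f = np.asarray(front, dtype=float)
    f = (f - f.min(axis=0)) / np.ptp(f, axis=0)   # normalize each objective
    lo = f[np.argmin(f[:, 0])]                    # best on objective 1
    hi = f[np.argmin(f[:, 1])]                    # best on objective 2
    d = hi - lo
    # perpendicular distance via the 2-D cross product |d x (p - lo)| / |d|
    dist = np.abs(d[0] * (f[:, 1] - lo[1]) - d[1] * (f[:, 0] - lo[0]))
    return int(np.argmax(dist / np.linalg.norm(d)))

front = [(0.30, 1), (0.12, 3), (0.10, 6), (0.09, 10), (0.08, 15)]
print(knee_point(front))   # -> 1: (0.12, 3) bends the front most sharply
```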

Applications

Multi-objective feature selection is widely used across practical domains:

  • Image and signal processing: improves tasks such as facial emotion recognition, handwritten digit classification, and hyperspectral image classification by selecting discriminative features that balance subset size against accuracy.
  • Biomedical field: enhances cancer detection, drug discovery, and medical diagnosis by selecting crucial genes.
  • Business and finance: improves credit scoring and credit card fraud detection by selecting pertinent financial information.
  • Networks and network services: strengthens intrusion detection, network security, and information retrieval by selecting essential features.
  • Engineering and related fields: applied to software defect prediction, churn prediction, energy consumption forecasting, environmental monitoring, and public health.

These applications demonstrate the widespread practicality of multi-objective feature selection across diverse domains.

Challenges and Future Directions

Challenges

  1. Online MOFS with streaming data
  2. Multi-modal MOFS
  3. MOFS in multi-label classification
  4. Transfer learning for MOFS
  5. Many-objective FS in classification
  6. MOFS for multi-modal learning
  7. Multiobjective optimization for simultaneous FS and instance selection

Future Directions

  1. Handling different preferences between objectives
  2. Handling highly-discrete Pareto front
  3. Handling partially conflicting objectives
  4. Designing more appropriate performance metrics
  5. Handling objective selection bias
  6. Handling solution duplication
  7. Visualizing final results
  8. Applying EMO to sparse learning-based FS
  9. Discretization-based MOFS