## Section: New Results

### Artificial evolution, fractal analysis and applications

Abstract:

This document presents a selection of research works to which I have contributed. It is structured around two themes, artificial evolution and signal regularity analysis, and consists of three main parts: Part I, Artificial evolution; Part II, Estimation of signal regularity; and Part III, Applications, combination of signal processing, fractal analysis and artificial evolution. To set the context and explain the coherence of the rest of the document, the manuscript begins with an introduction, Chapter 1, which lists the collaborators and the research projects carried out. Theoretical contributions focus on two areas, evolutionary algorithms and the measurement of signal regularity, and are presented in Part I and Part II respectively. These two themes are then applied to real problems in Part III.

Part I, Artificial Evolution, consists of 8 chapters. Chapter 2 contains a brief presentation of various types of evolutionary algorithms (genetic algorithms, evolution strategies and genetic programming) and presents some contributions in this area, which are detailed later in the document.

Chapter 3, entitled "Prediction of Expected Performance for a Genetic Programming Classifier", proposes a method to predict the expected performance of a genetic programming (GP) classifier without having to run the program or sample potential solutions in the search space. For a given classification problem, a pre-processing step is first proposed to simplify the feature extraction process. The characteristics of the problem are then extracted. Finally, a PEP (prediction of expected performance) model is used, which takes the characteristics of the problem as input and produces the predicted classification error on the test set as output. The PEP model itself is built by supervised learning with GP.
Then, to refine this work, an approach using several PEP models is developed, each becoming a specialized predictor of expected performance (SPEP) dedicated to a particular group of problems. Both the PEP and SPEP models were able to accurately predict the performance of a GP classifier, and the SPEP approach gave the best results.

Chapter 4, entitled "A comparison of fitness-case sampling methods for genetic programming", presents an extensive comparative study of four fitness-case sampling methods, namely Interleaved Sampling, Random Interleaved Sampling, Lexicase Selection and the proposed Keep-Worst Interleaved Sampling. The algorithms are compared on 11 symbolic regression problems and 11 supervised classification problems, using 10 synthetic benchmarks and 12 real-world datasets. They are evaluated on test performance, overfitting and average program size, and compared with a standard GP search. The experimental results suggest that fitness-case sampling methods are particularly useful for difficult real-world symbolic regression problems, improving performance, reducing overfitting and limiting code growth. On the other hand, fitness-case sampling does not appear to improve upon GP performance for supervised binary classification.

Chapter 5, entitled "Evolving Genetic Programming Classifiers with Novelty Search", deals with Novelty Search (NS), a new and unique approach to search and optimization in which the explicit objective function is replaced by a measure of solution novelty. This chapter proposes an NS-based GP algorithm for supervised classification. The algorithm is validated on real-world benchmarks for binary and multiclass problems, and the results show that NS can solve real-world classification tasks. Moreover, two new versions of the NS algorithm are proposed: Probabilistic NS (PNS) and a variant of Minimal Criteria NS (MCNS).
The former models the behavior of each solution as a random vector and eliminates all of the original NS parameters while reducing the computational overhead of the NS algorithm. The latter uses a standard objective function to constrain and bias the search towards high-performance solutions. This chapter also discusses the effects of NS on GP search dynamics and code growth. The results show that NS can be used as a realistic alternative for supervised classification and that, for binary problems specifically, the NS algorithm exhibits an implicit bloat control ability.

In Chapter 6, entitled "Evaluating the Effects of Local Search in Genetic Programming", a memetic GP that incorporates a local search (LS) strategy to refine GP individuals expressed as syntax trees is studied in the context of symbolic regression. A simple parametrization of GP trees is proposed, in which each function is weighted by a parameter (unique for each function used in the construction of a tree). These parameters are then optimized using a trust-region optimization algorithm, which thus serves as the local search method. Different heuristics are then tested on several benchmark and real-world problems to determine which individuals of the population should be subjected to LS. The results show that the best performance (in terms of both solution quality and bloat control) is achieved when LS is applied to all of the solutions or to random individuals chosen from the top percentile (with respect to fitness) of the population.

Chapter 7, entitled "A Local Search Approach to Genetic Programming for Binary Classification", proposes a memetic GP tailored for binary classification problems, extending the work on symbolic regression presented in the previous chapter.
In particular, a small linear subtree is added on top of the root node of the original tree, and each node in a tree is weighted by a real-valued parameter, which is then numerically optimized using the trust-region algorithm as a local search method. Experimental results show that potential classifiers produced by GP are improved by the local searcher, and hence that the overall search is improved, achieving substantial performance gains. Application to well-known benchmarks provided results competitive with the state of the art.

Chapter 8, entitled "RANSAC-GP: Dealing with Outliers in Symbolic Regression with Genetic Programming", presents a hybrid methodology based on the RANdom SAmple Consensus (RANSAC) algorithm and GP, called RANSAC-GP. RANSAC is an approach to deal with outliers in parameter estimation problems, widely used in computer vision and related fields. This work presents the first application of RANSAC to symbolic regression with GP. The proposed algorithm is able to deal with extreme amounts of contamination in the training set, evolving highly accurate models even when the proportion of outliers reaches 90%.

Part II, Estimation of signal regularity, consists of 3 chapters. Chapter 9, entitled "Hölderian Regularity", provides some reminders and some theoretical contributions on the estimation of Hölderian regularity. Details are given on the estimation of the Hölder exponent using the oscillation method or a wavelet transform. These approaches and improved versions of them are compared on synthetic signals. The FracLab software, in which all the above methods have been integrated, is also presented at the end of this chapter.
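
As an illustration of the oscillation-based approach mentioned for Chapter 9, the following minimal Python sketch estimates a pointwise Hölder exponent as the slope of log-oscillation against log-radius. It is one common way of turning the oscillation idea into an estimator and is not taken from FracLab or from the manuscript; the function names (`oscillation`, `holder_exponent_oscillation`), the choice of radii and the test signal are assumptions made for this example.

```python
import math


def oscillation(samples, index, radius):
    """Max minus min of the signal within `radius` samples of `index`."""
    lo = max(0, index - radius)
    hi = min(len(samples), index + radius + 1)
    window = samples[lo:hi]
    return max(window) - min(window)


def holder_exponent_oscillation(samples, index, radii):
    """Least-squares slope of log(oscillation) versus log(radius).

    Hypothetical helper: radii are expressed in samples, which only shifts
    the intercept of the regression, not its slope.
    """
    xs, ys = [], []
    for r in radii:
        osc = oscillation(samples, index, r)
        if osc > 0:  # log is undefined on flat windows, so skip them
            xs.append(math.log(r))
            ys.append(math.log(osc))
    if len(xs) < 2:
        raise ValueError("need at least two radii with non-zero oscillation")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Assumed test signal: a cusp |t - t0|^0.5 sampled on a regular grid.
cusp = [(abs(i - 500) / 1000.0) ** 0.5 for i in range(1001)]
print(holder_exponent_oscillation(cusp, 500, [1, 2, 4, 8, 16, 32]))
```

On this cusp the regression recovers an exponent close to 0.5 at the singular point. Real signals are noisier, and the range of radii then has to be chosen with care.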
The work proposed in Chapter 10, entitled "Theoretical comparison of the DFA and variants for the estimation of the Hurst exponent", involves a theoretical and numerical comparison between Detrended Fluctuation Analysis (DFA) and its variants, namely DMA, AFA, RDFA and the proposed Continuous DFA method, in which the trend is constrained to be continuous. DFA is a well-established method to detect long-range correlations in time series. It has been used in a wide range of applications, from biomedical applications to signal denoising. It allows the Hurst exponent of a pure mono-fractal time series to be estimated, and operates as follows: after integration, the signal is split into segments. Using a least-squares criterion, local trends are deduced. The resulting piecewise linear trend is then subtracted from the whole signal. The power of the residual is computed for different segment lengths, and its log-log representation allows the Hurst exponent to be deduced. The comparison performed in this chapter is based on a new common matrix formalism that expresses, for all these methods, the square of the fluctuation function in terms of the instantaneous correlation function of the process. When the process under study is wide-sense stationary, the statistical mean of the square of the fluctuation function is thus expressed, without any approximation, as a weighted sum of the terms of the autocorrelation function. More precisely, for each method, the mathematical expectation of the square of the fluctuation function can be seen as the autocorrelation function, evaluated at lag zero, of the output of a filter that depends on the method, i.e. the power of the filter output.
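
The DFA steps listed above (integration, segmentation, least-squares detrending, residual power, log-log fit) can be sketched in plain Python as follows. This is a minimal illustration of standard first-order DFA only, not of the DMA, AFA, RDFA or Continuous DFA variants, nor of the matrix formalism of Chapter 10; the function names, the segment lengths and the white-noise test signal are assumptions made for this example.

```python
import math
import random


def linear_fit(xs, ys):
    """Ordinary least-squares line y = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def dfa_exponent(signal, segment_lengths):
    """Slope of log F(n) versus log n for first-order DFA."""
    # Step 1: integrate the centred signal to obtain the profile.
    mean = sum(signal) / len(signal)
    profile, total = [], 0.0
    for value in signal:
        total += value - mean
        profile.append(total)

    log_n, log_f = [], []
    for n in segment_lengths:
        xs = list(range(n))
        n_segments = len(profile) // n  # leftover samples are ignored
        squared_residual = 0.0
        for s in range(n_segments):
            # Steps 2-3: local least-squares trend, subtracted per segment.
            ys = profile[s * n:(s + 1) * n]
            a, b = linear_fit(xs, ys)
            squared_residual += sum((y - (a * x + b)) ** 2
                                    for x, y in zip(xs, ys))
        # Step 4: power of the residual for this segment length.
        f_squared = squared_residual / (n_segments * n)
        log_n.append(math.log(n))
        log_f.append(0.5 * math.log(f_squared))

    # Step 5: the slope of the log-log plot; for stationary noise-like
    # signals it is read as the Hurst exponent.
    slope, _ = linear_fit(log_n, log_f)
    return slope


# Assumed test signal: white Gaussian noise.
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(4096)]
print(dfa_exponent(noise, [8, 16, 32, 64, 128]))
```

For white Gaussian noise the estimated exponent is expected to be close to 0.5, with some bias at short segment lengths.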
In the general case, this analytical framework provides a means of comparing the DFA and its variants that differs from a traditional performance study on synthetic signals, and explains the different behaviours of these regularity estimation methods through the proposed filter analysis. Chapter 11 contains two patents with THALES AVS related to the work presented in the previous chapter.

Part III of this manuscript, Applications, combination of signal processing, fractal analysis and artificial evolution, contains contributions that combine the tools previously mentioned in order to develop new tools, as in Chapters 12 and 13, or that address real problems in the biomedical field, as in Chapters 14, 15 and 16.

Chapter 12, entitled "The Estimation of Hölderian Regularity using Genetic Programming", presents a GP approach to synthesize estimators for the pointwise Hölder exponent in 2D signals. The optimization problem is to minimize the error between a prescribed regularity and the estimated regularity given by an image operator. The search for optimal estimators is then carried out using a GP algorithm. Experiments confirm that the GP operators produce a good estimation of the Hölder exponent in images of multifractional Brownian motions. In fact, the evolved estimators outperform a traditional method by as much as one order of magnitude. These results provide further empirical evidence that GP can solve difficult problems of applied mathematics.

In Chapter 13, entitled "Optimization of the Hölder Image Descriptor using a Genetic Algorithm", a local descriptor based on the Hölder exponent is studied. The proposal is to find an optimal number of dimensions for the descriptor using a genetic algorithm (GA). To guide the GA search, fitness is computed based on the performance of the descriptor when applied to standard region matching problems.
This criterion is quantified using the F-Measure, derived from recall and precision analysis. Results show that it is possible to reduce the size of the canonical Hölder descriptor without degrading the quality of its performance. In fact, the best descriptor found through the GA search is nearly 70% smaller and achieves the same performance on standard tests.

Chapter 14, entitled "Interactive evolution for cochlear implants fitting", presents a study that aims to make cochlear implants more adaptable to the environment and to simplify the fitting process, by designing and using a specific interactive evolutionary algorithm combined with signal processing. Real experiments on volunteer implanted patients are presented and show the efficiency of interactive evolution for this purpose.

In Chapter 15, entitled "Feature extraction and classification of EEG signals. The use of a genetic algorithm for an application on alertness prediction", the development of computer systems for the automatic analysis and classification of mental states of vigilance, i.e. a person’s state of alertness, is studied. Such a task is relevant to diverse domains where a person is expected or required to be in a particular state. For instance, pilots, security personnel or medical staff are expected to be in a highly alert state, and a brain-computer interface could help confirm this or detect possible problems. In this chapter, a combination of an evolutionary algorithm and signal processing is used. The purpose of this algorithm is to select an electrode and a frequency range to use in order to discriminate between the two states of vigilance. This approach determined the most useful electrode for the classification task. Using the recording from this electrode, the prediction obtained has a reliability rate of 89.33%.

In Chapter 16, entitled "Regularity and Matching Pursuit Feature Extraction for the Detection of Epileptic Seizures", a novel methodology for feature extraction on EEG signals that allows a highly accurate classification of epileptic states is presented. Specifically, Hölderian regularity and the Matching Pursuit algorithm are used as the main feature extraction techniques, and are combined with basic statistical features to construct the final feature sets. These sets are then passed to a Random Forests classification algorithm to differentiate between epileptic and non-epileptic readings. Several versions of the basic problem are tested and statistically validated, producing perfect accuracy in most problems and 97.6% […] a well-known database, reveals that the proposal achieves state-of-the-art performance. The experimental results suggest that a feature extraction methodology composed of regularity analysis, a Matching Pursuit algorithm and time-domain statistical measures, used together with a classifier, produces a system that can predict epileptic states with competitive performance that matches or even surpasses other novel methods.

Finally, the last chapter concludes this manuscript and provides perspectives for future work.

Authors: Pierrick Legrand