Section: New Results
Keywords: AUC-based Learning, Feature Selection, Human-Computer Interaction and Visual Data Mining, Methodological Aspects, Meta-learning and Competence Maps, Inductive Logic Programming, Constraint Satisfaction and Phase Transition, Bounded Relational Reasoning, Phase Transitions.
Stochastic Optimisation for ML and DM
Representation, Feature Selection, and Learning Criteria
At the core of Machine Learning is the representation of the problem domain. Building an appropriate representation, a.k.a. Feature Extraction (I. Guyon, S. Gunn, M. Nikravesh, and L. Zadeh, eds. Feature Extraction, Foundations and Applications. Physica-Verlag, Springer, 2006.) or Constructive Induction, includes Feature Selection and Feature Construction.
Some results predating TAO regarding Feature Selection exploit the stochasticity of EC-based learning. The ROGER algorithm (ROC-based GEnetic learneR) (M. Sebag, J. Azé, and N. Lucas. ROC-based evolutionary learning: Application to medical data mining. In Xin Yao et al., eds, Proc. PPSN 2004, LNCS 3242, pages 384–396. Springer Verlag, 2004.) is based on the evolutionary optimisation (While Thorsten Joachims has shown that Support Vector Machines are amenable to AUC optimisation (Int. Conf. on Machine Learning, 2005, best paper award), in practice only greedy optimisation is performed for the sake of tractability.) of the Area Under the ROC Curve (AUC) criterion, which is equivalent to the Wilcoxon-Mann-Whitney statistic. The ensemble of hypotheses provided by independent ROGER runs is used for sensitivity analysis and to achieve feature selection (K. Jong, E. Marchiori, and M. Sebag. Ensemble Learning with Evolutionary Computation: Application to Feature Ranking. In Xin Yao et al., eds, Proc. PPSN 2004, LNCS 3242, pages 1133–1142. Springer Verlag, 2004.).
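The AUC/Wilcoxon-Mann-Whitney equivalence mentioned above can be made concrete: the AUC is the fraction of (positive, negative) example pairs that the scoring hypothesis ranks correctly, with ties counting one half. A minimal sketch (illustrative only, not the ROGER implementation):

```python
def auc_wmw(scores_pos, scores_neg):
    """AUC as the Wilcoxon-Mann-Whitney statistic: the fraction of
    (positive, negative) pairs ranked correctly; ties count for 0.5."""
    total = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                total += 1.0
            elif sp == sn:
                total += 0.5
    return total / (len(scores_pos) * len(scores_neg))
```

For instance, `auc_wmw([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])` ranks 8 of the 9 pairs correctly, giving 8/9. An evolutionary learner such as ROGER can use this pairwise criterion directly as a fitness function, whereas gradient-based learners must resort to smooth surrogates.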
Another kind of ensemble-based feature selection has recently been devised and applied to DNA microarray analysis [Oops!], focusing on the notions of Type I and Type II errors (distinguishing relevant from irrelevant features using feature rankings based on independent criteria). (A Pascal workshop related to Type I and Type II errors, Multiple Simultaneous Hypotheses Testing, was organized by O. Teytaud et al. in May 2007.)
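The underlying idea of ensemble-based feature selection can be sketched as a voting scheme: a feature retained by only one run is more likely a Type I error (false positive) of that run, while a feature selected by most independent rankings is likely genuinely relevant. A minimal sketch, assuming rankings are given as lists of feature indices (this is an illustration, not the authors' algorithm):

```python
from collections import Counter

def ensemble_select(rankings, top_k, min_votes):
    """rankings: list of feature-index lists, best feature first.
    A feature is retained when it appears in the top_k of at least
    min_votes independent rankings; features selected by few runs are
    treated as likely Type I errors of individual criteria."""
    votes = Counter()
    for ranking in rankings:
        votes.update(ranking[:top_k])
    return sorted(f for f, v in votes.items() if v >= min_votes)
```

Raising `min_votes` trades Type I errors (spurious features kept) against Type II errors (relevant features discarded), which is precisely the trade-off studied in the multiple-hypothesis-testing setting.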
A new criterion for graphical model learning, stressing the complexity of the graph structure, has been proposed in Sylvain Gelly's PhD [Oops!]; the advantage of this criterion in terms of learning consistency has been demonstrated in the specific but practically relevant case of a small learning sample, when the graph structure is not the true one.
In the context of unsupervised learning, a new latent-clustering-based criterion has been proposed in the SELECT project-team, and tackled by TAO using evolutionary approaches [Oops!].
Finally, AUC-like learning criteria are being considered in Arpad Rimmel's PhD, aimed at handling imbalanced classification problems with many more features than examples, motivated by chemometry applications. The dialogue with the experts in the applicative context (Accamba ANR) has not yet permitted the assessment of the approach.
Hypothesis Search Space
The great applicative successes of Support Vector Machines (V. N. Vapnik, Statistical Learning Theory, J. Wiley, 1998.) are partly explained by the fact that prior knowledge about the problem domain can be rigorously encapsulated in the (manually designed) kernel. Previous work related to the use of prior knowledge in TAO, such as Carlos Kavka's PhD (Carlos Kavka. Evolutionary Design of Geometric-Based Fuzzy Systems. PhD thesis, Université Paris-Sud, July 2006.), aimed at merging the best of both worlds: the expert provides his knowledge (specific fuzzy rules), the scope of which is automatically determined and optimized using EC.
In a more theoretical perspective, the feasibility of learning in higher-order logic spaces has been investigated from an average-case perspective [Oops!]; a new framework has been developed to study the expected undecidability (the probability of meeting undecidable clauses) and its convergence along learning.
Another way of exploring the hypothesis space is based on ensemble learning, in the hope that the whole can perform better than the sum of its parts. Along this line, a multi-objective evolutionary ensemble learning approach has been proposed, leading to some insights into how to ensure the diversity of the hypotheses along evolution or in the final population, and how to select the best ensemble [Oops!].
Independently, motivated by the search for active neural cell assemblies or relevant patterns (in the context of the ACI NeuroDyne), the spatio-temporal data mining of Magneto-Encephalographic datasets has been formalised as a multi-objective optimisation problem (finding large spatio-temporal areas with high signal correlation). An extension to multi-objective multi-modal optimisation was required to capture the several neural cell assemblies in interplay [Oops!]. Interestingly, the search for discriminant patterns among the relevant patterns turns out to be significantly easier than directly searching for discriminant patterns (Vojtech Krmicek and Michèle Sebag. Functional brain imaging with multi-objective multi-modal evolutionary optimization. In Th. Runarsson et al., eds, Proc. PPSN 2006, LNCS 4193, pages 382–391. Springer Verlag, 2006.).
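The bi-objective formulation above (maximise both the size of a spatio-temporal area and its signal correlation) relies on the notion of Pareto dominance: no single best solution exists, so the optimiser returns the set of non-dominated trade-offs. A minimal sketch of Pareto filtering, assuming candidate areas are summarised as (size, correlation) pairs (an illustration, not the NeuroDyne code):

```python
def pareto_front(points):
    """Keep the points not dominated by any other point,
    with both objectives to be maximised."""
    front = []
    for p in points:
        dominated = any(q[0] >= p[0] and q[1] >= p[1] and q != p
                        for q in points)
        if not dominated:
            front.append(p)
    return front
```

A multi-modal extension, as required here, additionally preserves several distinct solutions per trade-off region, so that each neural cell assembly in interplay is represented on the front.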
Phase Transition and Competence Maps
ML can simultaneously be viewed as an optimisation problem and a constraint satisfaction problem (CSP). Inspired by the Phase Transition paradigm developed in the CSP community since the early 90s, Lorenza Saitta and Attilio Giordana have been studying relational learning in terms of order parameters; some implications regarding the limitations of existing relational learners were demonstrated in an early collaboration with TAO members (1999). The order parameters define a landscape, making it possible to depict the average behaviour of any related algorithm through a Competence Map. The comparison of the competence maps related to various θ-subsumption algorithms was shown to be instrumental in building a meta-layer, automatically selecting the best (on average) algorithm depending on the problem instance at hand (Fast Theta-Subsumption with Constraint Satisfaction Algorithms, J. Maloberti and M. Sebag, Machine Learning Journal, 2004, pp 137–174.). The approach has been applied to propositional decision tree learning in Nicolas Baskiotis' PhD (C4.5 Competence Map: a Phase Transition-inspired Approach, N. Baskiotis and M. Sebag. In Proc. Int. Conf. on Machine Learning, ICML 2004, Morgan Kaufmann, pp 73–80.) and to grammatical inference [Oops!]. In the latter case, some unexpected biases of prominent learners, e.g. RPNI, have been discovered and tentative explanations have been provided. The link between CSP, linear programming and Support Vector Machines has been further investigated in Romaric Gaudel's PhD, leading to new bounds on the generalization error; notably, the approach provides lower bounds which are applicable for small sample sizes [Oops!].
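The meta-layer built on competence maps can be sketched as a simple lookup: each algorithm's map records its average success rate per cell of the order-parameter space, and the meta-layer picks, for the cell containing the instance at hand, the algorithm with the best average competence. A minimal sketch, with hypothetical algorithm names and cells (an illustration of the selection principle, not the published meta-learner):

```python
def select_algorithm(competence_maps, instance_cell):
    """competence_maps: {algorithm_name: {cell: average_success_rate}}.
    instance_cell: the order-parameter cell of the problem at hand.
    Returns the algorithm with the best average competence in that cell;
    unknown cells default to a competence of 0.0."""
    return max(competence_maps,
               key=lambda algo: competence_maps[algo].get(instance_cell, 0.0))
```

The value of the scheme is that the selection cost is negligible compared with running the wrong solver: the competence maps are built offline from systematic experiments across the order-parameter landscape.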