Section: Scientific Foundations
Recent years have seen several breakthroughs related to the ML search space, including Deep Networks (DN (Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle. Greedy Layer-Wise Training of Deep Networks. NIPS 2006, pp 153–160, MIT Press, 2007.)), Echo State Networks (ESN (H. Jaeger. The "echo state" approach to analysing and training recurrent neural networks. German National Research Center for Information Technology, technical report GMD Report 148, 2001.)) and Liquid State Machines (LSM (NIPS 2006 Workshop on Echo State Networks and Liquid State Machines. H. Jaeger and H. Haas and J. C. Principe Eds. 2006.)). These frameworks offer a compact or sparse coding of complex functions (e.g. Deep Networks require a logarithmic rather than exponential number of units to encode the canonical parity problem). They were long discarded because their full training is a hopelessly ill-behaved optimization problem; new heuristics have since been shown to achieve efficient learning, although their theoretical properties are not yet fully understood. For instance, DNs consider a sequence of tractable optimization problems (unsupervised learning), iteratively growing the network until it reaches a region where the supervised learning problem becomes “reasonably” tractable (Notably, TAO did some early work along this line of research. Cascade mechanisms embedded in logical learning enable the construction of disjunctive hypotheses while handling conjunctive ones, in a transparent and cost-effective way (M. Sebag and M. Schoenauer, Incremental Learning of Rules and Meta-Rules. In B. Porter and R. Mooney, Eds, Proc. ICML90, Morgan Kaufmann, 1990); handling a sequence of objective functions within a population-based optimization scheme, thanks to the Behavioral Memory mechanism, was shown to be effective in tackling optimization problems which could not be addressed up front (M. Schoenauer and S. Xanthakis, Constrained GA Optimization. In S. Forrest, Ed., Proc ICGA'93, pp 573-580, Morgan Kaufmann, 1993)). In ESNs, a compact representation (sparse graph + set of weights) is mapped onto a complex function space (dynamic systems and limit cycles).
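The compactness argument for the parity problem can be made concrete with a small sketch. It is an illustration under stated assumptions, not the construction used in the cited papers: the function names `parity_tree` and `parity_dnf_terms` are hypothetical, and the counts are for Boolean gates rather than neural units. The deep version chains two-input XOR gates in a balanced tree (n-1 gates, depth growing logarithmically with n), while the shallow version is a two-level OR-of-ANDs form, which for parity needs one term per odd-weight input, i.e. 2^(n-1) terms.

```python
from itertools import product


def parity_tree(bits):
    """Compute parity with a balanced tree of two-input XOR gates.

    Assumes at least one input bit.
    Returns (parity, number_of_gates, depth).
    """
    layer = list(bits)
    gates = 0
    depth = 0
    while len(layer) > 1:
        next_layer = []
        for i in range(0, len(layer) - 1, 2):
            next_layer.append(layer[i] ^ layer[i + 1])
            gates += 1
        if len(layer) % 2:  # an unpaired bit passes through unchanged
            next_layer.append(layer[-1])
        layer = next_layer
        depth += 1
    return layer[0], gates, depth


def parity_dnf_terms(n):
    """List the AND-terms of a two-level (OR of ANDs) parity circuit.

    Each term is one input assignment with an odd number of ones.
    """
    return [x for x in product((0, 1), repeat=n) if sum(x) % 2 == 1]


if __name__ == "__main__":
    for n in (4, 8, 16):
        _, gates, depth = parity_tree([1] * n)
        print(n, "inputs:", gates, "XOR gates at depth", depth,
              "vs", len(parity_dnf_terms(n)), "shallow terms")
```

The shallow term count doubles with every added input, whereas the tree only gains one gate per input; this is the kind of gap the parity example refers to.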
Interestingly, the distinction between the search space (referred to as genotypic space) and the solution space (referred to as phenotypic space) has long been identified as a major source of effectiveness for Evolutionary Computation (R. C. Lewontin, The Genetic Basis of Evolutionary Change, Columbia University Press, 1974.). The merits of this distinction can be illustrated by the so-called developmental representations, a prototype of which is the Cellular Encoding (F. Gruau. Neural Network Synthesis using Cellular encoding and the Genetic Algorithm. PhD thesis, Ecole Normale Superieure de Lyon, 1994), more recently extended toward embryogenic representations based on various biological models, from plant growth to Genetic Regulatory Networks. Developmental representations map a compact search space (parametric or tree-structured, e.g. programs) onto a complex, usually non-parametric, solution space (e.g. analog circuits).
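A toy developmental mapping gives a feel for how a compact genotype unfolds into a much larger phenotype. The sketch below is not Cellular Encoding; it assumes a Lindenmayer-style string rewriting system (a classic model of plant growth), and the names `develop` and `GENOTYPE` are hypothetical. The genotype is a start symbol plus two rewrite rules; the phenotype is the string obtained after a number of parallel rewriting steps.

```python
def develop(axiom, rules, steps):
    """Grow a phenotype by rewriting every symbol in parallel, `steps` times.

    Symbols without a rule are copied unchanged.
    """
    phenotype = axiom
    for _ in range(steps):
        phenotype = "".join(rules.get(symbol, symbol) for symbol in phenotype)
    return phenotype


# Genotype (assumption for illustration): Lindenmayer's "algae" system.
GENOTYPE = {"axiom": "A", "rules": {"A": "AB", "B": "A"}}

if __name__ == "__main__":
    for steps in (3, 10, 20):
        phenotype = develop(GENOTYPE["axiom"], GENOTYPE["rules"], steps)
        print(steps, "steps ->", len(phenotype), "symbols")
```

The search operates on the two rules only, while the object being evaluated grows with the number of development steps; changing a single rule changes the whole grown structure.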
Taking advantage of the statistical learning and evolutionary cultures in TAO, our first research objective will be to study the diverse frameworks enabling a compact description of complex solutions through procedural heuristics, referred to as Deep Representations (DRs). The theoretical study will focus on the following two aspects:
DRs allow a huge solution space to be searched by exploring a comparatively restricted and well-identified search space, which either offers some performance guarantees, or was found to be the only feasible way to obtain any result at all (J. Koza, Genetic Programming III: Automatic Synthesis of Analog Circuits. MIT Press, 1999). Tools from statistical ML, e.g. covering numbers, will be used to analyze the genotypic/phenotypic mappings in the EC literature.
The feasibility of learning/optimization requires some stability of the search landscape, meaning that most genotypic changes result in small phenotypic differences. At the same time, the search space should offer “sufficiently” many shortcuts toward various regions of the solution space. This property, referred to as versatility, implies that additional information enables efficient jumps in the phenotypic space, making the most of efficient active learning or exploration strategies. The stability/versatility tradeoff will be studied in the spirit of active learning (S. Dasgupta. Coarse Sample Complexity Bounds for Active Learning, NIPS'05, MIT Press, 2005).
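The stability/versatility tradeoff can be probed empirically on small genotype-to-phenotype mappings. The sketch below is only one crude way to make the two notions measurable, not a definition taken from this project: `stability` is assumed to be the fraction of single-bit mutations that move the phenotype by at most a tolerance, and `largest_jump` (a hypothetical proxy for versatility) is the biggest phenotypic move a single mutation can produce. Two toy decoders are compared: standard binary and unary (count of ones).

```python
from itertools import product


def binary_decode(genotype):
    """Standard binary code: bit i (least significant first) is worth 2**i."""
    return sum(bit << i for i, bit in enumerate(genotype))


def unary_decode(genotype):
    """Unary code: the phenotype is the number of ones."""
    return sum(genotype)


def mutation_effects(decode, n_bits):
    """Phenotypic distance caused by every single-bit flip of every genotype."""
    effects = []
    for genotype in product((0, 1), repeat=n_bits):
        base = decode(genotype)
        for i in range(n_bits):
            mutant = genotype[:i] + (1 - genotype[i],) + genotype[i + 1:]
            effects.append(abs(decode(mutant) - base))
    return effects


def stability(decode, n_bits, tolerance=1):
    """Fraction of single-bit mutations whose phenotypic effect is small."""
    effects = mutation_effects(decode, n_bits)
    return sum(e <= tolerance for e in effects) / len(effects)


def largest_jump(decode, n_bits):
    """Largest phenotypic move reachable with one mutation."""
    return max(mutation_effects(decode, n_bits))


if __name__ == "__main__":
    for name, decode in (("binary", binary_decode), ("unary", unary_decode)):
        print(name, stability(decode, 4), largest_jump(decode, 4))
```

On these toy mappings the unary code is perfectly stable but offers no shortcuts, while the binary code allows long jumps at the price of many disruptive mutations; the mappings of interest in this section would sit somewhere between these extremes.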
From an application perspective, the search for deep representations is relevant to the ongoing Gennetec project (investigating Gene Regulatory Networks from a Genetic Programming perspective), and to the Symbrion IP (as the target representation should allow learning at different time scales, e.g. involving both evolutionary optimization and on-line learning).