Project : cortex
Section: New Results
Data exploitation and interpretation
Participants : Frédéric Alexandre, Shadi Al Shehabi, Mohammed Attik, Yann Boniface, Laurent Bougrain, Hervé Frezza-Buet, Randa Kassab, Jean-Charles Lamirel, Georges Schutz.
This research aims at adapting classical models of connectionism (cf. § 3.1) to extend their use to data interpretation and knowledge extraction (cf. § 3.2).
Our knowledge extraction approaches focus on two main techniques: unsupervised networks and pruning algorithms for supervised networks. Among pruning methods, we are interested in the subset in which a connection is removed according to a relevance criterion often called the weight saliency (also termed sensitivity). More precisely, the weight with the smallest saliency is the one whose removal produces the smallest variation of the error. We proposed several variants based on Optimal Brain Surgeon (OBS) and Unit-Optimal Brain Surgeon (Unit-OBS) [15][16]. The first variant, called F-OBS, performs a backward selection for variable selection by successively removing single weights between the input variables and the hidden units of a fully connected multilayer perceptron (MLP). The second one removes a subset of non-significant weights in one step. The last one combines these two properties. The first motivation of this work is that the weight saliency distributions differ across the layers of an MLP. It can be observed experimentally that the first layer is more stable, which explains why the saliencies in the first layer are small compared to those of the other layers. Accordingly, it can be interesting to selectively remove weights in the first layer. The second motivation is to propose a novel way to select the weights in the Generalized Optimal Brain Surgeon (G-OBS) method. We used statistical methods to compare the empirical performance of these variants. Unit-OBS gives better results more often than F-OBS when a large number of variables are pruned, but our algorithm is faster and better preserves the variables associated with the rules to extract. We proposed an implementation of G-OBS with a criterion that eliminates a subset of weights by selecting those with the smallest saliencies, which makes G-OBS faster.
The results obtained are comparable to those of OBS, which makes G-OBS usable as a fast method for MLP topology optimization. To make F-OBS faster, we proposed GF-OBS, which eliminates several weights at the same time. Moreover, we presented a comparison between OBS and Unit-OBS that is more detailed than previous studies. We have also presented new algorithms for variable selection in MLPs. We have shown [15][16] the advantages of applying OBS to the architecture obtained by a variable selection method. We have also presented new hybrid methods for variable selection, based on the previous idea that only the weights between the first and the second layer should be removed at first.
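As an illustration of the saliency criterion discussed above, here is a minimal sketch of one pruning step in the standard OBS formulation, where the saliency of weight q is w_q^2 / (2 [H^-1]_qq) and the remaining weights are corrected using column q of the inverse Hessian. It assumes the inverse Hessian of the error is already available; the function names and the toy values are hypothetical, and the sketch does not reproduce the F-OBS, G-OBS or GF-OBS selection rules.

```python
def obs_saliencies(weights, h_inv):
    """Standard OBS saliency L_q = w_q^2 / (2 * [H^-1]_qq) for every weight q."""
    return [w * w / (2.0 * h_inv[q][q]) for q, w in enumerate(weights)]


def obs_prune_step(weights, h_inv):
    """Remove the least salient weight and correct the remaining ones."""
    saliencies = obs_saliencies(weights, h_inv)
    q = min(range(len(weights)), key=saliencies.__getitem__)
    scale = weights[q] / h_inv[q][q]
    # OBS correction: delta_w = -(w_q / [H^-1]_qq) * (column q of H^-1)
    updated = [w - scale * h_inv[i][q] for i, w in enumerate(weights)]
    updated[q] = 0.0  # zero by construction; set explicitly to avoid round-off
    return q, saliencies[q], updated


# Toy values (assumed): three weights and a symmetric inverse Hessian.
weights = [0.8, -0.05, 1.2]
h_inv = [[0.5, 0.1, 0.0],
         [0.1, 0.4, 0.05],
         [0.0, 0.05, 0.25]]
pruned_index, saliency, new_weights = obs_prune_step(weights, h_inv)
print(pruned_index, saliency, new_weights)
```

In this toy case the second weight has by far the smallest saliency, so it is the one removed; restricting the candidate indices to first-layer weights would be one way to approximate the layer-selective idea described above.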
These techniques were applied to geographical information systems (cf. § 8.2) and medical databases (cf. § 8.4). In the latter domain, we have studied how self-organizing algorithms can provide useful cues for differentiating EEG signals in epileptic seizure prediction and vigilance state identification [18]. This has also led to the FPGA implementation of a portable system (cf. § 6.4) [19][20]. All these applications have been made possible by our DynNet software library (cf. § 5.3).
We have also studied the interpretation of databases with a strong temporal component, namely databases of sensor signals from a very complex industrial machine in the steelmaking domain. In particular, we have proposed a hybridization between self-organizing maps and dynamic time warping [35].
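One possible way to combine the two techniques, given here only as a sketch, is to replace the Euclidean distance used to select the best matching unit of a map by a dynamic time warping (DTW) distance, so that signals of different lengths or speeds can be compared. The function names and the toy prototypes below are assumptions made for illustration; the actual hybridization of [35] may differ.

```python
def dtw_distance(a, b):
    """Classical DTW distance between two 1-D sequences (absolute-value cost)."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cost of the best alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] aligned again
                                 cost[i][j - 1],      # b[j-1] aligned again
                                 cost[i - 1][j - 1])  # both sequences advance
    return cost[n][m]


def best_matching_unit(signal, prototypes):
    """Index of the map prototype closest to the signal under DTW."""
    return min(range(len(prototypes)),
               key=lambda k: dtw_distance(signal, prototypes[k]))


# Toy prototypes (assumed): flat, rising and falling sensor profiles.
prototypes = [[0, 0, 0], [1, 2, 3], [3, 2, 1]]
signal = [1, 1, 2, 3, 3]  # a slower version of the rising profile
bmu = best_matching_unit(signal, prototypes)
print(bmu)
```

The slowed-down rising signal is matched to the rising prototype with zero DTW cost, which a point-by-point Euclidean comparison could not express because the lengths differ.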
We are also working on unsupervised approaches for the design of an information retrieval and data mining system. This requires specific models that provide strong interaction capabilities with the user as well as extended capabilities of adaptation to the context. An important example of such a model is the map conjunction model (i.e. the multicriteria classification model) that we have developed. This model, named MicroNOMAD-MultiSOM, is an important extension of the Kohonen SOM model (cf. § 5.4). Its automatic deduction capabilities are a major advantage over the usual classification methods of data analysis. Indeed, the latter do not allow the dynamic management of several viewpoints, which can be regarded as different dimensions of the same information.
This year we focused on the overall validation of our approach by comparing it with classical models, such as probabilistic models. We have proven that the intercommunication mechanism between viewpoints can be regarded as a Bayesian inference, provided that the propagation mechanism is suitably adapted [13]. This proof is an important advance for our approach because the adaptation indirectly increases the accuracy of the deductions obtained with the model. It thus opens new ways of accurately comparing different classifications obtained on the same data [8]. In the framework of webometrics and science evaluation, it also allows us to compare the behaviour of our model with that of more classical network analysis models. This led us to demonstrate its added value compared to these models [9].
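To give an intuition of what a Bayesian reading of inter-viewpoint propagation could look like, the following sketch estimates P(target class | source class) from the data shared by two viewpoints and uses it to transfer an activity from one classification to the other. This is an illustrative assumption, not the propagation rule established in [13]; all names and the toy labels are hypothetical.

```python
from collections import Counter


def conditional_table(source_labels, target_labels):
    """Estimate P(target class | source class) from data seen by both viewpoints.

    source_labels[i] and target_labels[i] are the classes of datum i
    in the source and target viewpoints respectively.
    """
    pair_counts = Counter(zip(source_labels, target_labels))
    source_counts = Counter(source_labels)
    return {(s, t): c / source_counts[s] for (s, t), c in pair_counts.items()}


def propagate(source_activity, table):
    """Transfer class activities: act(t) = sum_s P(t | s) * act(s)."""
    target_activity = {}
    for (s, t), p in table.items():
        target_activity[t] = (target_activity.get(t, 0.0)
                              + p * source_activity.get(s, 0.0))
    return target_activity


# Toy data (assumed): four documents classified under two viewpoints.
source_labels = ["a", "a", "b", "b"]
target_labels = ["x", "y", "y", "y"]
table = conditional_table(source_labels, target_labels)
result = propagate({"a": 0.5, "b": 0.5}, table)
print(result)
```

With this normalisation the propagated activities sum to the same total as the source activities, which is what makes them readable as probabilities.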
Documentary data have the characteristic that each datum is individually described in a low-dimensional space, with little overlap between the descriptions of different data. The resulting global description space is therefore large but sparse. Dimension reduction and outlier elimination thus become mandatory for optimising the analysis of such data. This year we explored different techniques. One of our significant advances in this area is a proposed adaptation of singular value decomposition techniques for data selection and cleaning. This proposal relies on the quasi-symbolic evaluation criteria for measuring the quality of numerical classification that we proposed last year. More specifically, these criteria allowed us to highlight the defects of classical methods, such as latent semantic indexing, and to set up this new proposal. In a complementary way, we are also investigating nonlinear data projection techniques.
Our comparison of topographic methods also highlights the superiority of Neural Gas over other topographic methods, such as SOM or Growing Neural Gas, for documentary information analysis [8]. Indeed, Neural Gas appeared to be the most stable method for analyses conducted on a small number of classes when data are sparse, as documentary data are. Taking this result into account, we are currently extending the Neural Gas model to adapt it to our multicriteria classification approach. Our new multi Gas approach also led us to focus on information visualisation techniques for representing relationships between classes initially defined in high-dimensional spaces. One of our recent lines of research thus consists in developing a hyperbolic visualisation model based on the definition of hierarchies of Gas classes. A model proposal has been set up this year.
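For readers unfamiliar with Neural Gas, the sketch below shows one adaptation step of the standard algorithm: prototypes are ranked by their distance to the input, and each one moves towards it with a step that decays exponentially with its rank, without any fixed map topology. The parameter names (epsilon, lam) and the toy values are assumptions; the multi-viewpoint extension mentioned above is not represented.

```python
import math


def neural_gas_step(prototypes, x, epsilon=0.5, lam=1.0):
    """One Neural Gas update: rank prototypes by distance to x, then adapt them."""
    def sq_dist(p):
        return sum((pi - xi) ** 2 for pi, xi in zip(p, x))

    # order[rank] = index of the prototype holding that rank (0 = closest)
    order = sorted(range(len(prototypes)), key=lambda k: sq_dist(prototypes[k]))
    updated = [list(p) for p in prototypes]
    for rank, k in enumerate(order):
        step = epsilon * math.exp(-rank / lam)  # step size decays with the rank
        updated[k] = [pi + step * (xi - pi) for pi, xi in zip(prototypes[k], x)]
    return updated


# Toy example (assumed values): two prototypes and one 2-D input.
prototypes = [[0.0, 0.0], [4.0, 0.0]]
new_prototypes = neural_gas_step(prototypes, [1.0, 0.0])
print(new_prototypes)
```

In practice epsilon and lam are usually decreased over the course of training so that the prototypes settle.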
The limitations of numerical classification methods, such as MicroNOMAD-MultiSOM, stem from the interpretation errors they may generate when non-specialists use them without preliminary care for the precise analysis of a given domain. Symbolic methods, for their part, have the limitation of delivering results of unmanageable size when used for the same goal. Having set up a matching mechanism between Galois lattices and SOM maps and defined a quasi-symbolic evaluation criterion for measuring the quality of numerical classification, we are pursuing our study of the complementarities between the two types of methods. This year we developed a new principle of knowledge extraction that consists in using an unsupervised neural network as a front-end for extracting rules. An unsupervised neural network copes with the rule inflation inherent to symbolic methods because its synthesis capabilities can be used both to reduce the number of rules and to extract the most significant ones. Moreover, rule extraction is made easier when the network is a multi-viewpoint unsupervised model with weak topological constraints and generalization capabilities, such as the Multigas model we have already proposed.
Another technique stemming from our systemic approach is under development. It is a specific novelty detection model based on Moore-Penrose projectors. This technique is in its preliminary phase. Nevertheless, our first experiment leads us to expect promising results both for user profile modelling and for the analysis of flows of permanently changing information. This technique is currently used in the framework of an international project (cf. § 8.4).
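The general idea of projector-based novelty detection can be sketched as follows, assuming that novelty is measured as the part of a new vector that lies outside the subspace spanned by the already known patterns, i.e. the residual (I - X X^+) x where X X^+ is the Moore-Penrose projector onto that subspace. Here the projection is computed with a Gram-Schmidt orthonormal basis, which gives the same result as the projector; the function names and the toy patterns are hypothetical and the model under development may differ.

```python
import math


def orthonormal_basis(patterns, tol=1e-10):
    """Gram-Schmidt basis of the subspace spanned by the known patterns."""
    basis = []
    for p in patterns:
        r = list(p)
        for b in basis:
            coeff = sum(ri * bi for ri, bi in zip(r, b))
            r = [ri - coeff * bi for ri, bi in zip(r, b)]
        norm = math.sqrt(sum(ri * ri for ri in r))
        if norm > tol:  # skip patterns that are already in the subspace
            basis.append([ri / norm for ri in r])
    return basis


def novelty(x, basis):
    """Residual of x after projection onto the known subspace: (I - X X^+) x."""
    residual = list(x)
    for b in basis:
        coeff = sum(xi * bi for xi, bi in zip(x, b))
        residual = [ri - coeff * bi for ri, bi in zip(residual, b)]
    return residual


def novelty_score(x, basis):
    """Norm of the novelty residual; zero means nothing new in x."""
    return math.sqrt(sum(r * r for r in novelty(x, basis)))


# Toy patterns (assumed): known profiles only use the first two dimensions.
basis = orthonormal_basis([[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
print(novelty_score([3.0, -1.0, 0.0], basis))  # combination of known patterns
print(novelty_score([2.0, 3.0, 5.0], basis))   # third dimension is new
```

A vector that is a combination of known patterns gets a zero score, while any component outside the known subspace shows up directly in the residual.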