Section: New Results
Thematic Web Warehousing
Participants : Serge Abiteboul, Hélène Gagliardi, Alban Galland, Fayçal Hamdi, Nobal Niraula, Nathalie Pernelle, Chantal Reynaud, Fatiha Saïs, Brigitte Safar.
The reference reconciliation problem consists in deciding whether different identifiers refer to the same data (the same person, the same conference, etc.). The logical and numerical approach named LN2R that we have developed has been detailed in  . This approach computes three sets of reference pairs: those that (1) refer to the same data, (2) do not refer to the same data and (3) may refer to the same data. In addition to the reconciliation and non-reconciliation decisions, the logical method L2R infers the semantic equivalence of heterogeneous basic values, which are stored in a dictionary. We are studying how this dictionary can be automatically refined to improve the reconciliation results, in the setting of a collaboration with THALES (the HEDI project). In  we have shown how reference reconciliation is used in a data warehouse building process, where data is extracted from the original sources, transformed to conform to the ontology and then reconciled using the ontology. To enhance user confidence in the results of data reconciliation methods that are numerical, global and ontology-driven, we have proposed an explanation approach in which explanations are computed and represented as colored Petri nets.
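To illustrate the three-way outcome of the numerical step, here is a minimal sketch, not the actual LN2R implementation: a generic string similarity partitions reference pairs into reconciled, non-reconciled and undecided sets. The similarity measure and both thresholds are hypothetical placeholders.

```python
# Illustrative sketch only (not LN2R): a numerical reconciliation step
# partitioning reference pairs with two hypothetical thresholds.
from difflib import SequenceMatcher
from itertools import product

def similarity(a, b):
    """Crude string similarity in [0, 1] between two reference labels."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def reconcile(refs1, refs2, t_same=0.9, t_diff=0.4):
    """Split pairs into (same, different, possible) sets.
    t_same / t_diff are illustrative parameters, not those of LN2R."""
    same, diff, maybe = [], [], []
    for r1, r2 in product(refs1, refs2):
        s = similarity(r1, r2)
        if s >= t_same:
            same.append((r1, r2))
        elif s <= t_diff:
            diff.append((r1, r2))
        else:
            maybe.append((r1, r2))
    return same, diff, maybe

same, diff, maybe = reconcile(["VLDB Conference"], ["VLDB Conf.", "ICDE"])
```

In the real approach the logical method L2R additionally propagates reconciliation and non-reconciliation decisions through schema knowledge, which a purely string-based score cannot do.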
The issue of data fusion arises once reconciliations have been determined. The objective of fusion is to obtain a unique representation of each real-world entity. We have proposed a fusion approach that handles the uncertainty in attribute values using a formalism based on belief functions, whose shapes are derived from a set of criteria within the evidence theory framework  . The aim now is to build a flexible approach for querying the fused data in which user preferences are taken into account.
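As a toy illustration of the evidence-theory machinery, the sketch below combines two mass functions over candidate attribute values with Dempster's rule of combination; the criteria-driven belief-function shapes of our actual approach are much richer, and the sources and masses here are invented.

```python
# Minimal sketch of value fusion with Dempster's rule of combination
# (evidence theory). Sources, values and masses are illustrative only.

def dempster_combine(m1, m2):
    """Combine two mass functions given as dicts {frozenset: mass}."""
    combined = {}
    conflict = 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + ma * mb
            else:
                conflict += ma * mb          # mass on disjoint hypotheses
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    norm = 1.0 - conflict                    # Dempster normalization
    return {s: m / norm for s, m in combined.items()}

# Two sources assign mass to candidate values of one fused attribute.
m1 = {frozenset({"Paris"}): 0.7, frozenset({"Paris", "Orsay"}): 0.3}
m2 = {frozenset({"Paris"}): 0.6, frozenset({"Paris", "Orsay"}): 0.4}
fused = dempster_combine(m1, m2)
```

Combining concordant sources concentrates mass on the singleton value, which is exactly the behavior a fusion step relies on to pick a unique representation.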
Mapping between ontologies
We pursue our work on TaxoMap in the setting of the WebContent and Geonto projects. Following up on issues previously investigated, such as the use of support knowledge published this year  , we focused on two main points: the alignment of very large ontologies and mapping refinement.
Very large ontologies have been built in domains such as medicine or agronomy, and the challenge now lies in scaling up alignment techniques that often perform complex tasks. We proposed two partitioning methods designed to take the alignment objective into account as early as possible in the partitioning process. These methods transform the two ontologies to be aligned into two sets of blocks of limited size. Furthermore, the elements of the two ontologies that might be aligned are grouped into a minimal set of blocks, and the comparison is then performed on these blocks. We experimented with the two methods on various pairs of ontologies and the results are promising. This work received the best application paper award at EGC2009  and has been selected for a book chapter in English  .
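The block idea can be sketched as follows, under simplifying assumptions of our own (concepts reduced to labels, shared labels taken as anchor candidates, greedy packing): likely matches are concentrated in few blocks so that only a small number of block pairs needs a full comparison. This is not the actual TaxoMap partitioning algorithm.

```python
# Hypothetical sketch: partition two concept sets into size-bounded
# blocks so that anchor concepts (labels shared by both ontologies,
# hence likely alignment candidates) land in the first few blocks.

def make_blocks(concepts, anchors, max_size):
    """Greedily pack concepts into blocks of at most max_size,
    placing anchor concepts first (stable sort keeps input order)."""
    ordered = sorted(concepts, key=lambda c: c not in anchors)
    return [ordered[i:i + max_size] for i in range(0, len(ordered), max_size)]

onto1 = ["River", "Lake", "Road", "Bridge", "Forest", "City"]
onto2 = ["Lake", "City", "Highway", "Stream"]
anchors = set(onto1) & set(onto2)        # shared labels: {"Lake", "City"}

blocks1 = make_blocks(onto1, anchors, max_size=3)
blocks2 = make_blocks(onto2, anchors, max_size=3)
# Only block pairs that both contain anchors need a fine-grained
# comparison; the other pairs can be skipped, which is what makes
# alignment of very large ontologies tractable.
```

The real methods exploit the ontology structure (not just labels) when forming blocks, but the goal is the same: bound the size of each comparison while keeping candidate matches together.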
We investigated mapping refinement because current ontology matchers are not effective for all application domains or ontologies, and the quality of their results can very often be improved by taking the specificities of the ontology domain into account. We proposed an environment, called the TaxoMap framework, based on TaxoMap, which helps an expert specify treatments based on alignment results. The aim is to refine these results or to merge, restructure or enrich ontologies  . Currently, this approach has been applied to mapping refinement in the topographic field within the ANR project GEONTO.
At the same time, development of TaxoMap has been pursued. Terminological techniques have been improved with a better morpho-syntactic analysis, and we introduced new structural techniques. This allowed us to participate for the third time in the international alignment contest OAEI2009  , which consists of applying matching systems to ontology pairs and evaluating their results. We took part in five tests and experimented with our algorithm on large multilingual ontologies (English, French, German). Our participation in the campaign allowed us to test the robustness of TaxoMap, our partitioning algorithms and the new structural techniques.
We have also worked on the alignment of generic and specific models in the setting of Adaptive Hypermedia (AH), in order to help AH creators reuse their models in a platform made up only of generic components. We developed a Protégé plug-in that assists designers in specializing generic models using their own models  . The plug-in includes two parts. The knowledge part gathers the meta-model, based on the OWL meta-model, and deduction rules. The processing part is made of components that interact with an inference engine (Jess) and the OWL Protégé editor. We use the Protégé OWL API to manipulate OWL models and the SWRL Jess bridge to execute SWRL rules with the Jess inference engine.
Finally, we initiated work on the global comparison of ontologies. The aim is to be able to assess and compare the points of view behind each particular conceptualization of the world. We studied which insights could be discovered from ontology matching, introducing initial ideas towards a global comparison of ontologies  .
Integration of web resources
We investigated the integration of resources available on the Web into Adaptive Hypermedia Systems (AHS). More and more metadata describing resources are available on the Web in Semantic Web languages, and can be reused. Our aim is to build an open-corpus AHS by, on the one hand, reusing AHS technologies, particularly the adaptation engine which is the heart of these systems, and, on the other hand, reusing resources and their descriptions available on the Web. Moreover, we want to allow the creator of an adaptive system not only to reuse the adaptation strategies that come with the system, but also to specify their own. To that end, we propose a pattern-based approach for expressing adaptation strategies in a simple, semi-automatic way. It allows the creator of an adaptive system to define elementary adaptations by instantiating adaptation patterns. These elementary adaptations can then be combined, making it possible to specify adaptation strategies in an easy and flexible manner. We distinguish adaptive navigation according to two main criteria: the selection operations performed to obtain the resources proposed to the user, and the elements of the domain model involved in the selection process. We validated our approach using the GLAM adaptation engine and showed that GLAM rules can be automatically generated from pattern-based adaptations.
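The pattern idea can be sketched informally as follows; the resource metadata, pattern names and combination operator are invented for the example and do not reflect the actual GLAM rule language.

```python
# Hypothetical sketch: an adaptation pattern is a parameterized
# selection over resource metadata; instantiating it yields an
# elementary adaptation, and chaining adaptations gives a strategy.

def pattern_select(predicate):
    """An adaptation pattern: instantiating it with a predicate yields
    an elementary adaptation filtering resources by their metadata."""
    return lambda resources: [r for r in resources if predicate(r)]

def strategy(*adaptations):
    """Combine elementary adaptations by chaining their selections."""
    def run(resources):
        for adapt in adaptations:
            resources = adapt(resources)
        return resources
    return run

resources = [
    {"title": "Intro", "level": "beginner", "topic": "algebra"},
    {"title": "Proofs", "level": "advanced", "topic": "algebra"},
    {"title": "Maps", "level": "beginner", "topic": "geometry"},
]
beginner_only = pattern_select(lambda r: r["level"] == "beginner")
on_algebra = pattern_select(lambda r: r["topic"] == "algebra")
recommend = strategy(beginner_only, on_algebra)
```

Here `recommend(resources)` keeps only the beginner-level algebra resource; in the actual approach, such combined selections are what gets translated into GLAM adaptation rules.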
In a separate line of research, we have carried out work on Liquid Queries, a new paradigm for querying combined Web information systems. A liquid query provides the user with an interface containing a certain number of attributes, which can be modified if the user requires it, e.g., adding attributes by joining with other, dynamically identified, data sources, aggregating the content on some criteria, omitting some columns, etc. Behind liquid queries stands the SeCo execution engine, a database-like system for joining independent Web information sources. This work is carried out in collaboration with Politecnico di Milano  .
We consider a set of views stating possibly conflicting facts. Negative facts in the views may come, e.g., from functional dependencies in the underlying database schema. We want to predict the truth values of the facts. Beyond simple methods such as voting (typically rather accurate), we explore techniques based on “corroboration”, i.e., taking into account trust in the views. We introduce three fixpoint algorithms corresponding to different levels of complexity of an underlying probabilistic model. They all estimate both the truth values of facts and the trust in the views. We present experimental studies on synthetic and real-world data. This analysis illustrates how, and in which contexts, these methods improve corroboration results over voting methods. We believe that corroboration can serve in a wide range of applications such as source selection on the Semantic Web, data quality assessment or semantic annotation cleaning in social networks. This work lays the foundations for a wide range of techniques for solving these more complex problems.
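The shared skeleton of such fixpoint algorithms can be sketched as follows: alternate between estimating the truth of each fact from the trust of the views stating it, and re-estimating each view's trust from how well it agrees with the estimated truths. This is only an illustrative skeleton with a made-up prior and weighting, not one of the three probabilistic models of our work.

```python
# Simplified corroboration fixpoint sketch (illustrative only).

def corroborate(views, n_iter=20):
    """views: {view_name: {fact: True/False statement}}.
    Returns (truth estimate per fact, trust per view), both in [0, 1]."""
    facts = {f for stmts in views.values() for f in stmts}
    trust = {v: 0.8 for v in views}        # hypothetical uniform prior
    truth = {}
    for _ in range(n_iter):
        # Truth of a fact = trust-weighted vote of the views stating it.
        for f in facts:
            num = sum(trust[v] for v, s in views.items() if s.get(f) is True)
            den = sum(trust[v] for v, s in views.items() if f in s)
            truth[f] = num / den if den else 0.5
        # Trust of a view = average agreement with estimated truths.
        for v, stmts in views.items():
            scores = [truth[f] if val else 1 - truth[f]
                      for f, val in stmts.items()]
            trust[v] = sum(scores) / len(scores) if scores else 0.5
    return truth, trust

views = {
    "v1": {"f1": True, "f2": True},
    "v2": {"f1": True, "f2": False},
    "v3": {"f1": False, "f2": False},
}
truth, trust = corroborate(views)
```

On this toy input, the majority opinion on each fact prevails and the view agreeing with both majorities ends up with the highest trust; plain voting would reach the same truth values here, and the interesting cases are precisely those where trust-weighting and voting diverge.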
Use of the Web to share personal data is increasing rapidly with the emergence of Web 2.0 and social-network applications. However, users currently have to trust all the different hosts of their data and face difficulties with updates. To overcome this problem, we are studying a model of a distributed knowledge base with access control and cryptographic functionality. The model allows exchanging documents, access control statements, keys and instructions in a distributed setting. We are considering different implementations of this model that can leverage technologies such as DHTs or gossiping.
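The basic intuition, that an untrusted host only ever stores ciphertext while keys circulate separately among authorized participants, can be illustrated by the toy model below. The `Host` class, the XOR "cipher" and all names are invented for the example; a real implementation would rely on proper cryptographic primitives and on the actual exchange model.

```python
# Toy sketch of the exchange model: documents are encrypted before
# being placed on untrusted hosts (e.g. DHT nodes), and only
# participants holding the key can read them. XOR is NOT a secure
# cipher; it merely stands in for real encryption here.
import secrets

def xor_bytes(data, key):
    """Symmetric toy 'encryption': XOR data with a repeating key."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

class Host:
    """Untrusted storage node: stores opaque blobs by identifier."""
    def __init__(self):
        self.store = {}
    def put(self, doc_id, blob):
        self.store[doc_id] = blob
    def get(self, doc_id):
        return self.store[doc_id]

key = secrets.token_bytes(16)    # held only by authorized participants
host = Host()
host.put("doc1", xor_bytes(b"my private data", key))
plaintext = xor_bytes(host.get("doc1"), key)
```

The point of the model is that the same pattern extends to access-control statements and key exchange, so trust in the hosts themselves is no longer required.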
In such a social network, participants may bring conflicting opinions. We have studied the problem of trying to corroborate information coming from a very large number of participants. We have proposed and evaluated various algorithms towards this goal.