Project Team Edelweiss

Section: New Results

Graph Based Knowledge Representation

Knowledge Graph Abstract Machine

Participants : Olivier Corby, Catherine Faron-Zucker, Fabien Gandon.

KGRAM (Knowledge Graph Abstract Machine) is a generic interpreter for the W3C SPARQL Query Language that operates not only on RDF graphs but on labelled graphs in general. The interpreter interacts with the target graph through proxies that each implement an interface: Producer enumerates edges from the target graph, Evaluator evaluates filters, and Matcher takes entailments into account.
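The division of labour between the three proxies can be illustrated with a minimal Python sketch. This is not the KGRAM API: the class names `ListProducer`, `SubPropertyMatcher` and `CallableEvaluator`, their method names, and the toy graph are all assumptions made for illustration. The sketch only handles conjunctions of edge patterns with filters, and entailment is reduced to a hypothetical sub-property table.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Tuple

Edge = Tuple[str, str, str]        # (source, label, target)
Env = Dict[str, str]               # variable name -> bound value
Filter = Callable[[Env], bool]     # a filter is any predicate over bindings


def is_var(term: str) -> bool:
    return term.startswith("?")


class ListProducer:
    """Producer role (hypothetical): enumerates candidate edges of the graph."""

    def __init__(self, edges: Iterable[Edge]) -> None:
        self.edges = list(edges)

    def candidates(self, pattern: Edge, env: Env) -> Iterator[Edge]:
        # A real producer could use indexes; this one enumerates everything.
        return iter(self.edges)


class SubPropertyMatcher:
    """Matcher role (hypothetical): accepts an edge whose label is the query
    label or one of its declared sub-properties."""

    def __init__(self, sub_property_of: Dict[str, str]) -> None:
        self.sub_property_of = sub_property_of

    def label_matches(self, query_label: str, edge_label: str) -> bool:
        seen = set()
        current: Optional[str] = edge_label
        while current is not None and current not in seen:
            if current == query_label:
                return True
            seen.add(current)
            current = self.sub_property_of.get(current)
        return False


class CallableEvaluator:
    """Evaluator role (hypothetical): evaluates a filter against bindings."""

    def holds(self, condition: Filter, env: Env) -> bool:
        return bool(condition(env))


def match_edge(pattern: Edge, edge: Edge, env: Env,
               matcher: SubPropertyMatcher) -> Optional[Env]:
    """Return env extended with the bindings of this match, or None."""
    new_env = dict(env)
    for position, (q, g) in enumerate(zip(pattern, edge)):
        if is_var(q):
            if new_env.setdefault(q, g) != g:
                return None            # conflicts with an earlier binding
        elif position == 1:
            if not matcher.label_matches(q, g):
                return None
        elif q != g:
            return None
    return new_env


def evaluate(patterns: List[Edge], filters: List[Filter],
             producer: ListProducer, matcher: SubPropertyMatcher,
             evaluator: CallableEvaluator,
             env: Optional[Env] = None) -> Iterator[Env]:
    """Backtracking evaluation of a conjunction of edge patterns."""
    env = {} if env is None else env
    if not patterns:
        if all(evaluator.holds(f, env) for f in filters):
            yield env
        return
    first, rest = patterns[0], patterns[1:]
    for edge in producer.candidates(first, env):
        extended = match_edge(first, edge, env, matcher)
        if extended is not None:
            yield from evaluate(rest, filters, producer, matcher,
                                evaluator, extended)


graph = [("alice", "hasFriend", "bob"), ("bob", "knows", "carol")]
producer = ListProducer(graph)
matcher = SubPropertyMatcher({"hasFriend": "knows"})
answers = list(evaluate([("?x", "knows", "?y")],
                        [lambda env: env["?y"] != "carol"],
                        producer, matcher, CallableEvaluator()))
```

Because the interpreter only talks to the three proxies, swapping `ListProducer` for a producer over another kind of labelled graph would leave `evaluate` unchanged, which is the point of the design described above.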

This year, work has been done to upgrade KGRAM to the SPARQL 1.1 Query Language and Update. It implements most of the current version of the recommendation, except the service statement, and passes almost all W3C SPARQL 1.1 test cases.

In addition, the Corese Semantic Web Factory has been redesigned and modularized into release 3.0 entirely based on KGRAM interfaces and proxies. Corese 3.0 is a new lightweight RDF/S implementation with SPARQL 1.1. We ported the former Inference Rule engines (forward and backward engines) onto Corese 3.0. We also ported former SPARQL extensions: approximate search based on ontological distance, SQL and XPath in SPARQL 1.1, edge enumeration and length of Property Path, pragmas.

This new version is already used in several applications, including cartography at IGN [28], design constraint modeling at CSTB [35], and technological watch in the ISICIL ANR project. It is also used in several PhD theses in the team. A list of applications can be found on the Corese Web site.

Semantic Web Graph Visualization

Participants : Olivier Corby, Nicolas Delaforge, Erwan Demairy, Fabien Gandon [contact] .

Thanks to an INRIA grant (ADT), we are designing and developing a Semantic Web Gephi Plugin. This plugin couples Corese with the Gephi Open Graph Visualization Platform to provide a framework for querying and visualizing RDF data, taking their schemas into account. See the project web pages.

Semantic Social Network Analysis

Participants : Guillaume Erétéo, Fabien Gandon.

The PhD thesis of Guillaume Erétéo [14] in the context of the ANR project ISICIL allowed us to analyze the characteristics of the heterogeneous social networks that emerge from the use of web-based social applications, with an original contribution that leverages Social Network Analysis with Semantic Web frameworks. Social Network Analysis (SNA) proposes graph algorithms to characterize the structure of a social network and its strategic positions.

Semantic Web frameworks allow representing and exchanging knowledge across web applications with a rich typed graph model (RDF), a query language (SPARQL) and schema definition frameworks (RDFS and OWL). In this thesis, we merged both models in order to go beyond the mining of the link structure of social graphs by integrating two approaches: (1) semantic processing of the network typing and (2) emerging knowledge of online activities.

In particular we investigated how (1) to bring online social data to ontology-based representations, (2) to conduct a social network analysis that takes advantage of the rich semantics of such representations, and (3) to semantically detect and label communities of online social networks and social tagging activities.
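As an illustration of point (2), a classical SNA indicator such as in-degree can be parametrized by a relation type and computed over that relation together with its sub-relations. The following Python sketch is only an assumption of how such a measure could look; the `ex:` relation names, the single-parent hierarchy table and the function names are hypothetical and are not taken from the thesis.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]   # (subject, relation, object)


def sub_relations(relation: str, sub_property_of: Dict[str, str]) -> Set[str]:
    """Return the relation plus every relation declared, directly or
    transitively, as one of its sub-properties (child -> parent table)."""
    result = {relation}
    changed = True
    while changed:
        changed = False
        for child, parent in sub_property_of.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result


def in_degree(triples: Iterable[Triple], relation: str,
              sub_property_of: Dict[str, str]) -> Counter:
    """In-degree of each node, counting only links typed by the given
    relation or by one of its sub-relations."""
    accepted = sub_relations(relation, sub_property_of)
    degree: Counter = Counter()
    for _, predicate, obj in triples:
        if predicate in accepted:
            degree[obj] += 1
    return degree


# Hypothetical schema and social data.
hierarchy = {"ex:colleagueOf": "ex:knows", "ex:friendOf": "ex:knows"}
network = [
    ("ex:ann", "ex:colleagueOf", "ex:cat"),
    ("ex:bob", "ex:friendOf", "ex:cat"),
    ("ex:cat", "ex:follows", "ex:ann"),
]
knows_degree = in_degree(network, "ex:knows", hierarchy)
friend_degree = in_degree(network, "ex:friendOf", hierarchy)
```

Querying with the general relation aggregates all of its specializations, while querying with a specific relation restricts the indicator to that kind of link.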

This work was published in [15], [14].

Index Summarizing the Content of RDF Triple Stores

Participants : Adrien Basse, Fabien Gandon, Isabelle Mirbel.

We are interested in designing an architecture to support the distribution of a SPARQL query over a small, fixed number of RDF repositories. The key stage is to characterize the content of each server's base in order to predict whether that server could contribute to the answer of a query. In the context of the PhD thesis of Adrien Basse, we propose an algorithm to extract a compact representation of the content of an RDF store. We improved the canonical representation of RDF graphs based on DFS code proposed in the literature by providing a join operator that reduces the number of redundant patterns generated.
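The role such a content summary plays in query distribution can be sketched in Python as follows. This is deliberately much simpler than the DFS-code-based index described above: each store is summarized by the set of predicates it contains, which is an assumption made for illustration, and the function names and sample data are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

Triple = Tuple[str, str, str]   # (subject, predicate, object)


def summarize(triples: Iterable[Triple]) -> FrozenSet[str]:
    """Simplified summary of a store: the set of predicates it uses."""
    return frozenset(predicate for _, predicate, _ in triples)


def may_contribute(summary: FrozenSet[str],
                   query_patterns: Iterable[Triple]) -> bool:
    """Predict whether a store could match at least one query pattern."""
    for _, predicate, _ in query_patterns:
        if predicate.startswith("?"):
            if summary:              # a variable predicate matches any edge
                return True
        elif predicate in summary:
            return True
    return False


def select_servers(summaries: Dict[str, FrozenSet[str]],
                   query_patterns: List[Triple]) -> List[str]:
    """Names of the servers worth sending the query to."""
    return sorted(name for name, summary in summaries.items()
                  if may_contribute(summary, query_patterns))


# Hypothetical stores and query.
stores = {
    "server_a": [("ex:ann", "foaf:knows", "ex:bob")],
    "server_b": [("ex:doc1", "dc:title", "Report")],
}
summaries = {name: summarize(triples) for name, triples in stores.items()}
selected = select_servers(summaries, [("?p", "foaf:knows", "?q")])
```

A predicate set can only rule servers out; a summary made of graph patterns, as in the approach above, can discriminate between stores that share predicates but connect them differently.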

Rules for the Web of Data

Participants : Oumy Seye, Olivier Corby.

In the context of this PhD thesis, the focus is on rules for the Web of Data. We are interested in integrating the Rule Interchange Format (RIF), the W3C recommendation for exchanging rules on the Web, with other W3C technologies. This year's aim was to study how RIF-BLD can be integrated into Semantic Web technologies. RIF-BLD (Basic Logic Dialect) is the dialect of RIF for logic-based systems. First, we studied the state of the art. Second, we improved the RIF-BLD parser for both the presentation syntax and the XML syntax. As RIF-BLD can be used with RDF data and OWL ontologies, it is interesting to consider RIF inferences in queries on RDF graph structures. That is why we finally studied the integration of RIF-BLD into the Corese Semantic Web engine. In this last step, we implemented the mapping from the abstract syntax tree of RIF-BLD to the abstract syntax tree of SPARQL. Thus, we can now execute the logical inferences of RIF-BLD in the backward engine of Corese.

We have a paper accepted at EGC 2012 presenting RIF2SPARQL [44], a translation of RIF-BLD statements into SPARQL that performs the logical inferences of RIF-BLD on the Corese Semantic Web Factory. These inferences are implemented with a backward chaining approach, on top of the RIF-BLD to SPARQL mapping that we designed and implemented.
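To give an idea of what a rule-to-query mapping looks like, the Python sketch below turns a simple Horn rule, such as the RIF-BLD rule `Forall ?x ?y ?z (ex:uncle(?x ?y) :- And(ex:brother(?x ?z) ex:parent(?z ?y)))`, into a SPARQL CONSTRUCT query. It is not the RIF2SPARQL implementation: it does not parse RIF, it works on a hand-built rule object rather than on abstract syntax trees, and the `Atom` and `Rule` classes, the reading of a binary atom `p(s o)` as the triple `s p o`, and the example namespace are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Atom:
    """A binary atom p(s o), read here as the RDF triple pattern 's p o'."""
    predicate: str
    subject: str
    obj: str

    def to_triple(self) -> str:
        return f"{self.subject} {self.predicate} {self.obj} ."


@dataclass(frozen=True)
class Rule:
    """A Horn rule: the head holds whenever every body atom holds."""
    head: Atom
    body: Tuple[Atom, ...]


def rule_to_sparql(rule: Rule, prefixes: Dict[str, str]) -> str:
    """Render the rule as CONSTRUCT { head } WHERE { body }."""
    lines = [f"PREFIX {name}: <{iri}>" for name, iri in sorted(prefixes.items())]
    lines.append("CONSTRUCT { " + rule.head.to_triple() + " }")
    lines.append("WHERE {")
    lines.extend("  " + atom.to_triple() for atom in rule.body)
    lines.append("}")
    return "\n".join(lines)


uncle_rule = Rule(
    head=Atom("ex:uncle", "?x", "?y"),
    body=(Atom("ex:brother", "?x", "?z"), Atom("ex:parent", "?z", "?y")),
)
query = rule_to_sparql(uncle_rule, {"ex": "http://example.org/"})
print(query)
```

In a backward chaining setting, such a query would be evaluated on demand when a pattern matching the rule head is requested, rather than being applied ahead of time to saturate the graph.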

Collaborative Management of Interlingual Knowledge

Participants : Maxime Lefrançois, Fabien Gandon.

We are interested in bridging the gap between the world of natural language and the world of the Semantic Web, in particular to support multilingual access to the Web of Data and the management of interlingual knowledge bases. We introduce the ULiS approach, which aims at designing a pivot-based NLP technique called Universal Linguistic System that uses Semantic Web formalisms and is compliant with the Meaning-Text Theory. Through ULiS, a user could interact with an Interlingual Knowledge Base (IKB) in controlled natural language. Linguistic resources themselves (e.g., dictionary, grammar) are part of a specific IKB; thus, actors may enhance them (i.e., the model of the controlled natural language) through requests in controlled natural language (e.g., add new lexical units, add grammar rules).

In [30] we proposed a novel approach to define Interlingual Lexical Units classes in the Interlingual Lexical Ontology so that they support the projection of their lexicographic definition on themselves using the OWL formalism. This approach is compliant with the Meaning-Text Theory.

In [31], [40] we introduced three basic interaction scenarios for ULiS, and we proposed and gave an overview of the layered architecture of ULiS: meta-ontology, ontology, facts; and ontology, interlingual knowledge, situational knowledge.

We have started a collaboration with the RELIEF project that deals with the construction of a French Lexical Network (Alain Polguère, CNRS-ATILF).

Reuse of Data Analytics Contents and Processes

Participants : Corentin Follenfant, Fabien Gandon, Olivier Corby.

Industrial Business Intelligence (BI) proposes tools and methods to perform data analysis over heterogeneous enterprise sources. They allow one to harvest, federate, cleanse, annotate, query, organize and visualize data in order to support decision making with human-readable documents such as reports, dashboards, mobile visualizations. Such processes currently require expertise in technical domains like relational modeling in order to produce relevant content.

Users willing to do so without following the learning curve have to reuse existing content to create new content, and need to be guided throughout the workflow. Recommender systems can help ease their progression, but most of them operate inside walled gardens for specific tasks instead of assisting users throughout their workflow.

Semantic Web tools allow us to provide a common ground for modeling the different operations that compose BI workflows with RDFS vocabularies, capturing the usage of the underlying transformation operators within document repositories with RDF graphs, and enabling further composition and reuse of BI operations to achieve new analyses. In [38] we introduced an extension of the RDF Data Cube vocabulary to describe these operations as flexible services that are composed by matching the interfaces of multidimensional data structures, and we validated this model on a production repository containing 900 BI documents decomposed into 8000 document snippets.

The underlying sequence of operations specific to each snippet was then extracted into a unique RDF graph. Aggregate SPARQL queries allow us to compute basic usage statistics for BI operations that can feed recommender systems such as BI workflow wizards. Besides refining the proposed model, next steps include evaluating the technical usability of SPARQL property path patterns for data lineage and for identifying frequent patterns in sequences of BI operations.
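The kind of usage statistics mentioned above can be sketched in plain Python. The sketch assumes the operation sequence of each snippet has already been extracted into a list; the counts correspond roughly to what a SPARQL query with COUNT and GROUP BY would return over the RDF graph. The snippet identifiers, operation names and function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical data: snippet identifier -> ordered list of BI operations.
Sequences = Dict[str, List[str]]


def operation_counts(sequences: Sequences) -> Counter:
    """How often each operation is used across all snippets."""
    return Counter(op for ops in sequences.values() for op in ops)


def consecutive_pairs(sequences: Sequences) -> Counter:
    """How often one operation is directly followed by another."""
    return Counter(pair
                   for ops in sequences.values()
                   for pair in zip(ops, ops[1:]))


snippets = {
    "snippet1": ["filter", "aggregate", "sort"],
    "snippet2": ["filter", "aggregate"],
    "snippet3": ["join", "filter"],
}
usage = operation_counts(snippets)
pairs = consecutive_pairs(snippets)
most_used = usage.most_common(1)[0][0]
```

A wizard could use the pair counts to suggest the operation that most often follows the one the user has just applied; longer frequent patterns would need a dedicated sequence mining step.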

This PhD Thesis is done with a CIFRE industrial grant from SAP Research.