
Section: New Results

Understanding Query Behavior and Explaining Linked Data

Participants : Fabien Gandon, Rakebul Hasan.

Our main research goal is to understand how to assist users in querying [63] and consuming [64] Linked Data. In querying Linked Data, we help users by providing information on how a query may behave. In addition, we provide information about the behavior of similar queries executed in the past. Users can use this information for query construction and refinement. Accurately predicting query behavior is also important for workload management, query scheduling, and query optimization. In consuming Linked Data, we explain why a given piece of data exists and how the data was derived. Users can use these explanations to understand and debug Linked Data. Overall, we address the following research questions:


How to predict query behavior prior to executing the query?


How to explain Linked Data?

Predicting query behavior

To predict query behavior prior to query execution, we apply machine learning techniques to the logs of executed queries. We work with SPARQL queries and predict how long a query would take to execute. We use the frequencies and the cardinalities of the SPARQL algebra operators of a query as its features. We also extract a compact set of features from the basic graph patterns belonging to the query. We achieve high accuracy (R² = 0.837) using k-nearest neighbors regression. We also suggest similar queries from the query log using an efficient nearest-neighbor search. Users can use these suggestions to understand the behavior of similar past queries, and to construct and refine their queries accordingly.
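As a rough illustration of the idea, the following minimal sketch predicts an execution time by averaging the times of the k closest logged queries and returns those queries as suggestions. It is not the implementation used in this work: the operator list, the log entries, the timings, and the plain Euclidean distance with a linear scan are all assumptions made for the example, and the basic graph pattern features are left out.

```python
import math

# Hypothetical operator names; the actual feature set is not listed in the text.
OPERATORS = ["bgp", "join", "leftjoin", "filter", "union"]


def to_vector(op_counts):
    """Turn a dict of operator frequencies into a fixed-length feature vector."""
    return [float(op_counts.get(op, 0)) for op in OPERATORS]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(log, op_counts, k=3):
    """Predict an execution time as the mean time of the k nearest logged queries.

    log: list of (query_id, op_counts, exec_time_seconds) tuples.
    Returns (predicted_time, ids_of_the_k_nearest_queries).
    """
    target = to_vector(op_counts)
    # Linear scan for clarity; an index structure would be needed for large logs.
    ranked = sorted(log, key=lambda entry: euclidean(to_vector(entry[1]), target))
    neighbors = ranked[:k]
    predicted = sum(entry[2] for entry in neighbors) / len(neighbors)
    return predicted, [entry[0] for entry in neighbors]


# Made-up query log, for illustration only.
QUERY_LOG = [
    ("q1", {"bgp": 1}, 0.1),
    ("q2", {"bgp": 1, "filter": 1}, 0.2),
    ("q3", {"bgp": 2, "join": 1}, 0.9),
    ("q4", {"bgp": 3, "join": 2, "union": 1}, 2.0),
    ("q5", {"bgp": 2, "leftjoin": 1}, 1.1),
]

predicted, similar = knn_predict(QUERY_LOG, {"bgp": 2, "join": 1, "filter": 1}, k=2)
print(predicted, similar)
```

The same neighbor list serves both purposes described above: its average gives the prediction, and its members are the similar past queries shown to the user.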

Explaining Linked Data

The diverse and distributed nature of Linked Data presents opportunities for large-scale data integration and reasoning over cross-domain data. In this scenario, consumers of Linked Data may need explanations for debugging or understanding ontologies. A consumer may also want a short explanation to get an overview of the reasoning. We propose to publish explanation-related metadata as Linked Data. This enables us to explain derived data in the distributed setting of Linked Data. We present the Ratio4TA vocabulary to describe explanation metadata, along with guidelines to publish this metadata as Linked Data. In addition, we summarize explanations using four measures: centrality, coherence, abstractness, and similarity. Users can specify their explanation filtering criteria, that is, the types of information they are interested in. We evaluate our summarization approach by comparing the summaries it generates against ground-truth summaries produced by humans. Our explanation summarization approach achieves roughly 60% to 70% accuracy for small summaries.
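The summarization step can be pictured as ranking the statements of an explanation and keeping the best few. The sketch below is only one possible reading: it assumes the four measures have already been computed per statement and combines them with a weighted sum, which the text does not specify. The statement identifiers, scores, weights, and the simple overlap ratio used for comparison with a human summary are all hypothetical.

```python
MEASURES = ("centrality", "coherence", "abstractness", "similarity")


def summarize(statements, weights, size):
    """Return the ids of the `size` highest-scoring statements.

    The weighted sum is an assumption of this sketch, not the published method.
    """
    def score(stmt):
        return sum(weights[m] * stmt["scores"][m] for m in MEASURES)

    ranked = sorted(statements, key=score, reverse=True)
    return [stmt["id"] for stmt in ranked[:size]]


def overlap(summary, reference):
    """Fraction of the reference summary that also appears in the generated one."""
    return len(set(summary) & set(reference)) / len(reference)


# Made-up statements with precomputed measure values in [0, 1].
STATEMENTS = [
    {"id": "s1", "scores": {"centrality": 0.9, "coherence": 0.8, "abstractness": 0.5, "similarity": 0.7}},
    {"id": "s2", "scores": {"centrality": 0.2, "coherence": 0.4, "abstractness": 0.9, "similarity": 0.1}},
    {"id": "s3", "scores": {"centrality": 0.6, "coherence": 0.7, "abstractness": 0.6, "similarity": 0.9}},
    {"id": "s4", "scores": {"centrality": 0.1, "coherence": 0.2, "abstractness": 0.3, "similarity": 0.2}},
]
EQUAL_WEIGHTS = {m: 0.25 for m in MEASURES}

summary = summarize(STATEMENTS, EQUAL_WEIGHTS, size=2)
agreement = overlap(summary, ["s1", "s2"])  # hypothetical human-made summary
print(summary, agreement)
```

Changing the weights is one way a user's filtering criteria could influence which statements survive in the summary.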