Section: New Results

Modeling XML document transformations

XML database queries, logic and automata

Keywords: Querying, streaming, node selection queries, tree transformations, XPath, XQuery.

Participants: Olivier Gauwin, Emmanuel Filiot, Mathias Samuelides, Sławek Staworko, Anne-Cécile Caron, Joachim Niehren, Yves Roos, Sophie Tison [correspondent].

Gauwin, Niehren, and Roos [12] introduce streaming tree automata (STAs), a new notion of tree automata for unranked trees. Besides being of interest for streaming XML processing, STAs can be shown to be exactly as expressive as both Alur's (2007) nested word automata and Neumann and Seidl's (1998) pushdown forest automata. The advantage of STAs is that they operate directly on unranked trees, rather than on nested words or forests.
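The following is a minimal sketch of how an automaton of this kind might process an unranked tree as a stream of opening and closing events. It is not the definition from [12]: the rule format (push a stack symbol on opening, pop it on closing), the class name StreamingTreeAutomaton, the helper events, and the example property "every a-node has a b-child" are all assumptions chosen for illustration.

```python
from typing import Dict, Iterable, Iterator, List, Set, Tuple

Tree = Tuple[str, tuple]   # (label, children); hypothetical tree encoding
Event = Tuple[str, str]    # ("open" | "close", label)


def events(tree: Tree) -> Iterator[Event]:
    """Serialize a tree into the open/close event stream seen by the automaton."""
    label, children = tree
    yield ("open", label)
    for child in children:
        yield from events(child)
    yield ("close", label)


class StreamingTreeAutomaton:
    """Deterministic automaton over tag events with a stack (assumed rule format)."""

    def __init__(
        self,
        initial: str,
        finals: Set[str],
        open_rules: Dict[Tuple[str, str], Tuple[str, str]],
        close_rules: Dict[Tuple[str, str, str], str],
    ) -> None:
        self.initial = initial
        self.finals = finals
        self.open_rules = open_rules    # (state, label) -> (pushed symbol, next state)
        self.close_rules = close_rules  # (state, label, popped symbol) -> next state

    def accepts(self, stream: Iterable[Event]) -> bool:
        state = self.initial
        stack: List[str] = []
        for kind, label in stream:
            if kind == "open":
                rule = self.open_rules.get((state, label))
                if rule is None:
                    return False        # no applicable rule: reject
                symbol, state = rule
                stack.append(symbol)
            else:
                if not stack:
                    return False        # ill-nested stream
                symbol = stack.pop()
                nxt = self.close_rules.get((state, label, symbol))
                if nxt is None:
                    return False
                state = nxt
        return not stack and state in self.finals


def every_a_has_b_child() -> StreamingTreeAutomaton:
    """Example automaton: accept trees in which each a-node has some b-child.

    The state records whether the current node has seen a b-child so far;
    the stack remembers the same flag for each ancestor.
    """
    labels = ("a", "b")
    flags = ("unseen", "seen")
    open_rules: Dict[Tuple[str, str], Tuple[str, str]] = {}
    close_rules: Dict[Tuple[str, str, str], str] = {}
    for flag in flags:
        for label in labels:
            # Save the parent's flag, updated if the opened child is a b-node.
            parent_flag = "seen" if label == "b" else flag
            open_rules[(flag, label)] = (parent_flag, "unseen")
            for saved in flags:
                if label == "a" and flag == "unseen":
                    continue            # a-node closing without a b-child: no rule
                close_rules[(flag, label, saved)] = saved
    return StreamingTreeAutomaton("unseen", {"unseen", "seen"}, open_rules, close_rules)
```

With this encoding, the tree a(b) would be written ("a", (("b", ()),)) and checked with every_a_has_b_child().accepts(events(tree)). The stack never grows beyond the depth of the tree, which is the property that makes this style of automaton attractive for streams.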

Gauwin, Caron, Niehren, and Tison [19] apply STAs to streaming query answering. They investigate earliest query answering, as needed for query answering with optimal memory management. They propose a new algorithm for earliest query answering that requires only polynomial time in combined complexity. It applies to n-ary node selection queries in unranked trees defined by deterministic STAs. This class is highly expressive in that it captures all MSO-definable n-ary queries (although not up to a polynomial-time translation). As a corollary, they obtain an earliest query answering algorithm for CoreXPath 2.0 with polynomial time data complexity. This seems close to optimal: they show that deciding earliest selection is coNP-hard for XPath, even when restricted to Forward XPath with the descendant axis only. Without determinism, earliest selection becomes DEXPTIME-complete for n-ary queries defined by any kind of tree automata.

Filiot and Tison [18] investigate the variable independence problem for n-ary queries in trees defined by MSO formulas with n free first-order variables. They show how to decide whether a regular query is equivalent to a union of Cartesian products, independently of the input tree. They also introduce variable independence with respect to a dependence forest between blocks of variables, and prove that it is decidable.

Filiot, Talbot (J.M. Talbot was a member of Mostrare until 2006 and is now professor in Marseille), and Tison [17] study TAGEDs (tree automata with global equality and disequality constraints). This kind of tree automaton can test (dis)equalities between subtrees that may be arbitrarily far apart. In particular, it is equipped with a (dis)equality relation on states, so that whenever two subtrees t and t' evaluate (in an accepting run) to two states that are in the (dis)equality relation, they must be (dis)equal. They prove decidability of emptiness for several classes and give two applications of TAGEDs: decidability of an extension of monadic second-order logic with tree isomorphism tests, and decidability of unification with membership constraints.
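As an illustration of the global constraints described above, the sketch below checks whether a given state labeling of a tree is an accepting run. It is a simplified reading, not the construction from [17]: the tuple encodings of trees and runs, the function names, the rule format, and the decision to compare only pairs of distinct nodes are assumptions. It verifies a candidate run rather than searching for one.

```python
from typing import Iterator, Set, Tuple

Tree = Tuple[str, tuple]   # (label, children); hypothetical encoding
Run = Tuple[str, tuple]    # (state, child runs), intended to mirror the tree's shape
Rule = Tuple[str, Tuple[str, ...], str]   # (label, child states, resulting state)


def respects_rules(tree: Tree, run: Run, rules: Set[Rule]) -> bool:
    """Local check: every node's transition must be one of the automaton's rules."""
    label, children = tree
    state, child_runs = run
    if len(children) != len(child_runs):
        return False
    child_states = tuple(r[0] for r in child_runs)
    if (label, child_states, state) not in rules:
        return False
    return all(respects_rules(c, r, rules) for c, r in zip(children, child_runs))


def state_subtree_pairs(tree: Tree, run: Run) -> Iterator[Tuple[str, Tree]]:
    """Yield (state, subtree) for every node of the tree."""
    yield run[0], tree
    for child, child_run in zip(tree[1], run[1]):
        yield from state_subtree_pairs(child, child_run)


def is_accepting_run(
    tree: Tree,
    run: Run,
    rules: Set[Rule],
    finals: Set[str],
    eq: Set[Tuple[str, str]],
    neq: Set[Tuple[str, str]],
) -> bool:
    """Check the local rules, the final state, and the global (dis)equality constraints."""
    if run[0] not in finals or not respects_rules(tree, run, rules):
        return False
    nodes = list(state_subtree_pairs(tree, run))
    for i, (p, s) in enumerate(nodes):
        for j, (q, t) in enumerate(nodes):
            if i == j:
                continue    # assumption: constraints relate distinct nodes only
            if (p, q) in eq and s != t:
                return False
            if (p, q) in neq and s == t:
                return False
    return True


# Example automaton (assumed): accepts exactly the trees f(t, t) over labels a and f.
RULES: Set[Rule] = {
    ("a", (), "q"),
    ("a", (), "qe"),
    ("f", ("q", "q"), "q"),
    ("f", ("q", "q"), "qe"),
    ("f", ("qe", "qe"), "qf"),
}
FINALS = {"qf"}
EQ = {("qe", "qe")}         # subtrees evaluated to qe must all be equal
NEQ: Set[Tuple[str, str]] = set()
```

The example language {f(t, t)} is not regular, so no ordinary tree automaton recognizes it; here the single pair (qe, qe) in the equality relation is what enforces that both children of the root are the same tree.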

Staworko, Filiot and Chomicki [23] investigate the problem of querying (regular) sets of XML documents represented with tree automata, and consider n-ary tree automata queries whose expressive power captures MSO on trees. Because finite automata can represent infinite sets of documents, they propose the notions of universal and existential query answers, that is, answers present in all documents and in some document, respectively. They study the complexity of query answering and show that computing existential query answers is in PTIME under the assumption that the arity of the query is a fixed parameter. On the other hand, computing universal query answers is EXPTIME-complete, but they show that it is in PTIME if one assumes that the query is fixed (data complexity). Finally, the framework captures problems central to many novel XML applications, such as querying inconsistent XML documents. In particular, they demonstrate how to use this framework to compute consistent query answers in XML documents that do not satisfy the schema.
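To make the two notions of answers concrete, here is a small sketch over a finite, explicitly listed set of documents. This is only an illustration: the framework in [23] works on possibly infinite regular sets represented by tree automata, which this code does not attempt. The tree encoding, the identification of nodes by their child-index path, the example query select_label, and the convention for an empty document set are all assumptions.

```python
from typing import Callable, Iterable, Set, Tuple

Tree = Tuple[str, tuple]    # (label, children); hypothetical encoding
Path = Tuple[int, ...]      # node identified by child indices from the root (assumption)
Query = Callable[[Tree], Set[Path]]


def select_label(tree: Tree, label: str, path: Path = ()) -> Set[Path]:
    """Example monadic query: paths of all nodes carrying the given label."""
    node_label, children = tree
    found: Set[Path] = {path} if node_label == label else set()
    for i, child in enumerate(children):
        found |= select_label(child, label, path + (i,))
    return found


def existential_answers(docs: Iterable[Tree], query: Query) -> Set[Path]:
    """Answers present in at least one document: union of the per-document answers."""
    answers: Set[Path] = set()
    for doc in docs:
        answers |= query(doc)
    return answers


def universal_answers(docs: Iterable[Tree], query: Query) -> Set[Path]:
    """Answers present in every document: intersection of the per-document answers."""
    doc_list = list(docs)
    if not doc_list:
        return set()        # convention chosen here for an empty set of documents
    answers = set(query(doc_list[0]))
    for doc in doc_list[1:]:
        answers &= query(doc)
    return answers


# Two candidate documents, for instance two possible repairs of an inconsistent one.
DOC1: Tree = ("r", (("a", ()), ("b", ())))
DOC2: Tree = ("r", (("a", ()), ("a", ())))
```

For the query that selects a-nodes, the first child is an answer in both documents and is therefore a universal answer, while the second child is an a-node only in DOC2 and is therefore only an existential answer.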

Niehren collaborated with Kuhlmann from Saarbrücken [21] on monadic second-order logic (MSO) for totally ordered trees. Totally ordered trees are ground terms equipped with an additional total order on their nodes. They provide a formal model for data that comes with both a hierarchical and a sequential structure; one example of such data is streaming in natural language, another is natural language sentences, where the sequential structure is given by word order and the hierarchical structure is given by grammatical relations between words. They show that the MSO satisfiability problem for unrestricted structures is undecidable, but give a decision procedure for practically relevant subclasses, based on tree automata.

Roos, Terlutte and Latteux [13] define the notion of biRFSA, which is a residual finite state automaton (RFSA) whose reverse is also an RFSA. The languages recognized by such automata are called biRFSA languages. They prove that the canonical RFSA of a biRFSA language is a minimal NFA for this language and that each minimal NFA for this language is a sub-automaton of the canonical RFSA. This leads to a characterization of the family of biRFSA languages. They also define the family of biseparable automata and prove that every biseparable NFA is uniquely minimal among all NFAs recognizing the same language, improving the result of H. Tamm and E. Ukkonen for bideterministic automata.

Tison and Roos started the PhD project of Groz in September, jointly with Caron and André. They investigate XML database security, especially access control for XML documents.

Programming languages

Keywords: Concurrency, stochastic programming, system biology, semantics, rewriting.

Participants: Joachim Niehren [correspondent], Sophie Tison.

Niehren continues to participate in the BioComputing activity, led by Lhoussaine at the LIFL. Together, they started a cooperation with John and Uhrmacher [20] from Rostock. This has resulted in the attributed pi calculus ($\pi(L)$), an extension of the stochastic pi calculus with attributed processes and attribute-dependent synchronization. A stochastic simulator for this modeling language for systems biology has been presented and implemented. This shows that the extension by attributes can be handled with reasonable efficiency.

Niehren continues his cooperation with Schmidt-Schauß and Sabel from Frankfurt and Schwinghammer from Saarbrücken. In [22] they investigate methods and tools for analysing translations between programming languages with respect to observational semantics. The behaviour of programs is observed in terms of may-and-must convergence in arbitrary contexts, and adequacy of translations, i.e., the reflection of program equivalence, is taken to be the fundamental correctness condition. For compositional translations they propose a notion of convergence equivalence as a means for proving adequacy. This technique avoids explicit reasoning about contexts, and is able to deal with the subtle role of typing in implementations of language extensions.

Tison continues her cooperation with Godoy from Barcelona and Maneth from Sydney [25]. They study the well-known open problem of the decidability of regularity preservation by a homomorphism for regular tree languages. They consider two interesting subclasses. First, they prove that regularity preservation is decidable in polynomial time when the domain language is constructed over a monadic signature, i.e., over a signature where all symbols have arity 0 or 1. Second, they prove decidability for the case where non-linearity of the homomorphism is restricted to the root node (or nodes of bounded depth) of any input term. They also prove the decidability of the following problem: given a set of terms with regular constraints on the variables, is its set of ground instances regular? This extends previous results, in which regular constraints were not considered.

