Section: Scientific Foundations
XML Processing
Participants: Everardo Bárcenas-Patiño, Melisachew Chekol, Pierre Genevès, Nabil Layaïda, Vincent Quint.
Given the prominent role of XML for representing all kinds of data on the web and elsewhere, processing XML structures has become a key issue. Dedicated languages for processing XML structures already exist, such as XSLT and XQuery; they abstract over data through a tree-based data model and provide a powerful execution model. Our research follows this approach.
These specialized languages are expected to have certain properties that help solve the most common problems: expressiveness, verifiability, efficiency, reusability, evolvability, scalability, correctness, etc. We study these properties using the fundamental connections between language theory, mathematical logic, structured languages, and query languages.
The research published so far in the literature is often limited to establishing new theoretical properties and complexity bounds. Our research differs in that, in addition to these goals, we seek resolution algorithms, efficient implementation techniques, and concrete designs that can be applied directly to XML systems. We also consider some properties to be of particular importance for XML structure processing, namely:
Type checking: The types we consider are structural constraints over documents and data, expressed in formalisms such as DTD, XML Schema, or RELAX NG. Few techniques are able to exploit typing information about the input or output documents to provide type-safe processing. In this domain, algorithmic advances have led to the creation of research languages, such as XDuce, based on efficient containment checking for regular tree types. However, many challenges remain. While type-checking full XSLT or XQuery is undecidable in general (these are Turing-complete languages), one challenge is to push the “decidability envelope” further for type-checking standard XML transformations. In particular, one of the most difficult issues is to find techniques for analyzing XPath queries in the presence of regular tree types. Another challenge is to provide effective algorithms that are usable in practice for realistic scenarios.
Efficiency: XML processing languages may benefit from static analysis whenever performance is a concern. Static analysis techniques usually rely on robust formal semantics to support the development of optimized compilers and runtimes.
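To make the notion of a type as a structural constraint more concrete, the following minimal sketch checks a document against a DTD-like regular tree type. It is an illustration only, not one of the techniques discussed above: it validates a single document dynamically rather than analyzing a program statically. The schema, the element names, and the function `conforms` are hypothetical, and content models are encoded as ordinary regular expressions over child element names.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical DTD-like type: each element name maps to a regular
# expression over the sequence of its children's names, where every
# name is followed by one space (so "title " never matches "titles ").
SCHEMA = {
    "book": r"(title )(author )+(chapter )*",
    "chapter": r"(title )(para )*",
    "title": r"",
    "author": r"",
    "para": r"",
}


def conforms(element, schema):
    """Return True if the subtree rooted at `element` satisfies `schema`."""
    model = schema.get(element.tag)
    if model is None:
        return False  # element not declared in the schema
    # Encode the children as a string such as "title author chapter ".
    children = "".join(child.tag + " " for child in element)
    if re.fullmatch(model, children) is None:
        return False  # children violate the content model
    # Every child must itself conform, recursively.
    return all(conforms(child, schema) for child in element)


valid_doc = ET.fromstring(
    "<book><title/><author/><chapter><title/><para/></chapter></book>"
)
invalid_doc = ET.fromstring("<book><author/><title/></book>")

print(conforms(valid_doc, SCHEMA))
print(conforms(invalid_doc, SCHEMA))
```

Type-safe processing in the sense described above goes further: it aims to guarantee, before any document is processed, that every output of a transformation conforms to such a type.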
Most of our work so far has focused on the XPath query language, for which we try to check properties statically, with or without types (schemas).
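As a rough illustration of what checking a property statically in the presence of a type can mean, the sketch below decides whether a very restricted query can ever select a node in a document of a given type, without looking at any document. It is a toy under strong assumptions and not the analysis developed in this work: queries are absolute paths using only the child axis and element names, the hypothetical table `ALLOWED_CHILDREN` lists only which children an element may have (order and cardinality are ignored), and every declared element is assumed to be allowed to occur.

```python
# Hypothetical schema abstraction: for each element name, the set of
# element names that may appear among its children.
ALLOWED_CHILDREN = {
    "book": {"title", "author", "chapter"},
    "chapter": {"title", "para"},
    "title": set(),
    "author": set(),
    "para": set(),
}
ROOT = "book"  # assumed name of the document element


def may_select(path, allowed=ALLOWED_CHILDREN, root=ROOT):
    """Return True if the child-axis path could select a node in some
    document of the type; False means the query is always empty."""
    steps = path.strip("/").split("/")
    if steps[0] != root:
        return False  # the path does not start at the document element
    # Each step must name a child that the schema allows under its parent.
    for parent, child in zip(steps, steps[1:]):
        if child not in allowed.get(parent, set()):
            return False
    return True


print(may_select("/book/chapter/para"))
print(may_select("/book/author/para"))
```

A query reported as always empty, such as the second one, can be flagged before execution. Handling the other XPath axes, predicates, and full regular tree types requires considerably more elaborate techniques than this table lookup.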