Team Gallium

Section: New Results

Type systems

Partial type inference with first-class polymorphism

Participants: Didier Rémy, Boris Yakobowski, Didier Le Botlan [INSA Toulouse].

The ML language uses simple types (first-order types without quantifiers) enriched with type schemes (simple types with outermost universal quantifiers). This allows for simple type inference based on first-order unification, relieving the user of the burden of writing type annotations. However, it only enables a limited form of parametric polymorphism. In contrast, System F uses second-order types (types with inner universal quantifiers at arbitrary depth), which are much more expressive. As a result, type inference is undecidable in System F, which forces the user to provide all type annotations.
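To make the role of first-order unification concrete, here is a minimal sketch in Python of unification over simple types. It is only an illustration: the names `TVar`, `TCon`, `arrow`, `unify` and `apply_subst` are hypothetical and not taken from any ML implementation, and type schemes and generalization are deliberately left out.

```python
from dataclasses import dataclass


class UnificationError(Exception):
    """Raised when two simple types have no common instance."""


@dataclass(frozen=True)
class TVar:
    name: str


@dataclass(frozen=True)
class TCon:
    name: str
    args: tuple = ()


def arrow(dom, cod):
    """Function type dom -> cod."""
    return TCon("->", (dom, cod))


INT, BOOL = TCon("int"), TCon("bool")


def resolve(t, subst):
    """Follow variable bindings until an unbound variable or a constructor."""
    while isinstance(t, TVar) and t in subst:
        t = subst[t]
    return t


def occurs(v, t, subst):
    """True if variable v occurs in t (after applying subst)."""
    t = resolve(t, subst)
    if t == v:
        return True
    if isinstance(t, TCon):
        return any(occurs(v, a, subst) for a in t.args)
    return False


def unify(a, b, subst=None):
    """Extend subst (a dict from TVar to type) so that a and b become equal."""
    subst = {} if subst is None else subst
    a, b = resolve(a, subst), resolve(b, subst)
    if a == b:
        return subst
    if isinstance(a, TVar):
        if occurs(a, b, subst):
            raise UnificationError("occurs check failed")
        subst[a] = b
        return subst
    if isinstance(b, TVar):
        return unify(b, a, subst)
    if a.name != b.name or len(a.args) != len(b.args):
        raise UnificationError(f"cannot unify {a.name} with {b.name}")
    for x, y in zip(a.args, b.args):
        unify(x, y, subst)
    return subst


def apply_subst(t, subst):
    """Replace every bound variable in t by its binding."""
    t = resolve(t, subst)
    if isinstance(t, TCon):
        return TCon(t.name, tuple(apply_subst(x, subst) for x in t.args))
    return t
```

For instance, unifying `arrow(TVar("a"), INT)` with `arrow(BOOL, TVar("b"))` binds `a` to `bool` and `b` to `int`, while unifying `a` with `arrow(a, INT)` is rejected by the occurs check.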

Didier Le Botlan and Didier Rémy have proposed a type system, called MLF, that enables type synthesis as in ML while retaining the expressiveness of System F [2]. Type annotations are required only on parameters of functions that are used polymorphically in their body. All other type annotations, including all type abstractions and type applications, are inferred. Remarkably, type inference in MLF reduces to a new form of unification that amounts to performing first-order unification in the presence of second-order types.

The initial study of MLF was the topic of Didier Le Botlan's Ph.D. dissertation [36]. Didier Le Botlan and Didier Rémy have continued their work on MLF, focusing on simplifying the formalism. An interesting restriction of MLF retains most of its expressiveness while being simpler and more intuitive: in this restriction, types can be interpreted as sets of System F types, and type instantiation becomes set inclusion on the semantics. This justifies a posteriori the type-instance relation of MLF, which was previously defined only by syntactic means. This work has been submitted for journal publication.

Boris Yakobowski, who started his Ph.D. under Didier Rémy's supervision in October 2004, is pursuing the investigation of MLF with the aim of simplifying its presentation. Using graphs rather than terms to represent types has eliminated most of the notational redundancies. Graph types are the superposition of a dag representation of first-order terms and a binding tree that describes where and how variables are bound. This representation is much more canonical than syntactic types. It has led to a linear-time unification algorithm on graph types, which decomposes into standard first-order unification on the underlying dags and a simple computation on the underlying binding trees. These results have been presented at the workshop on Types in Language Design and Implementation.

Didier Rémy presented both of these new results in an invited talk at the ML workshop.

Graphic types can also be extended to represent type inference constraints internally. This is a stepping stone toward efficient type inference for MLF. It also allows for a direct specification of type inference via the generation of typing constraints, and for a more direct proof of type soundness, by showing that the typing constraints of a program entail the typing constraints of its reduct. These two new results are to be submitted for publication. Boris Yakobowski has also implemented a prototype, both for pedagogical purposes and to verify the efficiency of the approach in practice, i.e. that the constant overhead of the graph representation is quite small.

First-class module systems

Participants: Benoît Montagu, Didier Rémy.

Advanced module systems have now been in use for two decades in modern, statically typed languages. Modules are easy to understand intuitively and easy to use in simple cases. However, they remain surprisingly hard to formalize and often become harder to use in larger, more complex, but practical examples. In fact, useful features such as recursive modules or mixins are technically challenging and still an active topic of research.

This persistent gap between the apparent simplicity and the formal complexity of modules is surprising. We have identified at least two orthogonal sources of complexity, widthwise and depthwise. On the one hand, the stratified presentation of modules as a small calculus of functions and records on top of the underlying base language duplicates the base constructs and therefore complicates the language as a whole. On the other hand, the use of paths to designate abstract types relative to value variables, so as to keep track of sharing, pulls in the whole not-so-simple formalism of dependent types, even though only a very limited form of dependent types is effectively used.

Our goal is to provide a new presentation of modules that is conceptually more economical while retaining (or increasing) the expressiveness and conciseness of current approaches. We rely on first-class modules to avoid duplication of constructs, on (a new form of) open existential types to represent type abstraction, and on a new form of paths in types that do not depend on values to preserve conciseness.

Preliminary investigations, presented in Benoît Montagu's master's dissertation, are promising. This work has also been described in an article to be submitted for publication.

In the future, we intend to exploit the first-class nature of our approach to increase the expressiveness and conciseness of the module sub-language, and the simplicity of the theoretical formalism to tackle recursive modules and mixins.

Exact type checking for XML transformations

Participants: Alain Frisch, Haruo Hosoya [University of Tokyo].

Type systems for programming languages are usually sound but incomplete: they reject some programs that would not cause type errors at run time. There are good reasons for this incompleteness: for most programming languages, exact (complete) typing is undecidable. However, this might not be the case for type systems for XML transformation languages, which usually rely on tree automata and regular tree languages to precisely constrain the structure of documents.

We are interested in importing results from the theory of tree transducers into programming languages for XML. There is a strong analogy between top-down tree transducers and functional programs (top-down traversal of values through pattern matching and mutually recursive functions). There is a rich literature about tree transducers. Many of the existing formalisms enjoy a property of exact type-checking: given two regular tree languages interpreted as input and output constraints, it is possible to decide without any approximation whether a given tree transducer is sound with respect to this specification.
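The analogy can be illustrated with a small Python sketch. Everything in it is a made-up example: the transducer is written as two mutually recursive functions, `even` and `odd`, over trees encoded as tuples `(label, child, ...)`, and the output constraint is a deterministic bottom-up tree automaton `OUT_DELTA` whose state `"ok"` means "every leaf is b". Note that the sketch only checks individual outputs at run time, whereas exact type-checking decides the same property statically for all inputs at once.

```python
# A top-down tree transducer written as mutually recursive functions.
# Trees are tuples: ("a",), ("b",), or ("node", left, right).

def even(t):
    """State `even`: rename every leaf to b, process children in state `odd`."""
    label, *kids = t
    if label == "node":
        return ("node",) + tuple(odd(k) for k in kids)
    return ("b",)


def odd(t):
    """State `odd`: copy leaves unchanged, process children in state `even`."""
    label, *kids = t
    if label == "node":
        return ("node",) + tuple(even(k) for k in kids)
    return t


# Output constraint as a deterministic bottom-up tree automaton:
# (label, states of the children) -> state. Missing entries go to "bad".
OUT_DELTA = {
    ("a", ()): "bad",
    ("b", ()): "ok",
    ("node", ("ok", "ok")): "ok",
}


def run(t, delta=OUT_DELTA, sink="bad"):
    """Evaluate the automaton bottom-up and return the state reached at the root."""
    label, *kids = t
    states = tuple(run(k, delta, sink) for k in kids)
    return delta.get((label, states), sink)


good_input = ("node", ("node", ("a",), ("a",)), ("b",))
bad_input = ("node", ("a",), ("b",))
print(run(even(good_input)))  # output satisfies the constraint
print(run(even(bad_input)))   # output violates it: a leaf `a` survives at odd depth
```

The second input is a counterexample showing that this particular transducer is not sound with respect to the "every leaf is b" output constraint. An exact type-checker would report this without running the transducer.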

One reason for the relative lack of interest in tree transducer techniques within the programming language community is that most of the problems are EXPTIME-complete and the algorithms are quite complex. We believe that a proper reformulation of the algorithms will allow us to define interesting classes of transformations that support efficient type-checking and to experiment with original implementation techniques.

We are particularly interested in the formalism of so-called macro-tree transducers, which directly capture the essence of top-down functional transformations with accumulators. We have obtained a new backward type-inference algorithm for this kind of transducer. From a deterministic bottom-up tree automaton describing the output type, this algorithm produces, in polynomial time, an alternating tree automaton that represents all the valid input trees (alternating tree automata can have both conjunctive and disjunctive transitions, which makes them exponentially more succinct than normal tree automata). The type-checking problem then reduces to checking emptiness of alternating tree automata, which is a DEXPTIME-complete problem in general, although some algorithms are efficient in many common situations. In particular, we have established that a transducer that traverses the input tree a bounded number of times results in an alternating tree automaton whose emptiness can be checked in polynomial time (and most transducers that appear in practice satisfy this condition).
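The idea of backward inference can be sketched in Python for the much simpler case of top-down transducers without accumulators. This is an illustration under stated assumptions, not the algorithm described above: the rule encoding, the names `RULES`, `OUT_DELTA`, `inverse`, `holds` and `accepts`, and the example transducer are all hypothetical, and the sketch makes no attempt to share subformulas, so it does not claim the polynomial bound. A state of the inferred alternating automaton is a pair `(f, q)`, read as "function `f` maps this input tree to an output in state `q`".

```python
# Output type: a deterministic, complete bottom-up automaton over leaves a, b
# and binary nodes. State "ok" means "every leaf is b"; "bad" is its complement.
OUT_DELTA = {
    ("a", ()): "bad",
    ("b", ()): "ok",
    ("node", ("ok", "ok")): "ok",
    ("node", ("ok", "bad")): "bad",
    ("node", ("bad", "ok")): "bad",
    ("node", ("bad", "bad")): "bad",
}

# Transducer rules: (function, input label) -> right-hand side, where
# ("call", g, i) applies function g to the i-th child of the input node and
# ("out", label, [rhs, ...]) builds an output node.
RULES = {
    ("even", "node"): ("out", "node", [("call", "odd", 0), ("call", "odd", 1)]),
    ("even", "a"): ("out", "b", []),
    ("even", "b"): ("out", "b", []),
    ("odd", "node"): ("out", "node", [("call", "even", 0), ("call", "even", 1)]),
    ("odd", "a"): ("out", "a", []),
    ("odd", "b"): ("out", "b", []),
}


def inverse(rhs, q):
    """Formula over the input children that holds iff rhs evaluates into state q."""
    if rhs[0] == "call":
        _, g, i = rhs
        return ("atom", i, (g, q))
    _, label, subs = rhs
    options = []
    for (lab, child_states), target in OUT_DELTA.items():
        if lab == label and target == q and len(child_states) == len(subs):
            # Conjunction: every sub-term must reach the state this transition expects.
            options.append(("and",) + tuple(
                inverse(s, qi) for s, qi in zip(subs, child_states)))
    # Disjunction over all output transitions that lead to q.
    return ("or",) + tuple(options)


def holds(formula, kids):
    """Evaluate a formula against the children of a concrete input node."""
    tag = formula[0]
    if tag == "atom":
        _, i, state = formula
        return accepts(state, kids[i])
    if tag == "and":
        return all(holds(f, kids) for f in formula[1:])
    return any(holds(f, kids) for f in formula[1:])


def accepts(state, tree):
    """Does the inferred alternating automaton accept `tree` from state (f, q)?"""
    f, q = state
    label, *kids = tree
    rhs = RULES.get((f, label))
    return rhs is not None and holds(inverse(rhs, q), kids)
```

With these definitions, `accepts(("even", "bad"), ("node", ("a",), ("b",)))` holds: the inferred automaton recognizes this input as one that `even` maps outside the output type. Type-checking against an input type then amounts to asking whether any tree of that type is accepted from `("even", "bad")`, which is an emptiness question.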

We have also developed various optimization heuristics and have implemented an efficient emptiness check for alternating tree automata. Combined with the backward type-inference algorithm, this gives the first usable type-checking tool for realistic macro-tree transducers. We have benchmarked our tool with several XML transformations on the XHTML DTD (which is quite large). For all our examples, the type-checking takes at most a few seconds and usually a few milliseconds.
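For comparison, a naive emptiness check for alternating tree automata fits in a few lines of Python. The sketch below is a plain bottom-up fixpoint over sets of states, exponential in the worst case, and it includes none of the optimization heuristics mentioned above. The formula encoding and the names `holds`, `reachable_state_sets` and `is_empty` are assumptions made for this example only.

```python
from itertools import product

# Formulas are True, False, ("atom", i, q), ("and", f, ...) or ("or", f, ...).
# The atom (i, q) reads: "the i-th child is accepted from state q".


def holds(formula, child_sets):
    """Evaluate a formula, given for each child the set of states accepting it."""
    if isinstance(formula, bool):
        return formula
    tag = formula[0]
    if tag == "atom":
        _, i, q = formula
        return q in child_sets[i]
    if tag == "and":
        return all(holds(f, child_sets) for f in formula[1:])
    if tag == "or":
        return any(holds(f, child_sets) for f in formula[1:])
    raise ValueError(f"unknown formula tag: {tag!r}")


def reachable_state_sets(states, arity, delta):
    """All sets S such that some tree is accepted from exactly the states in S."""
    reached = set()
    changed = True
    while changed:
        changed = False
        for symbol, n in arity.items():
            # Snapshot `reached`, since new sets are added during the loop.
            for children in product(list(reached), repeat=n):
                accepted = frozenset(
                    q for q in states
                    if holds(delta.get((q, symbol), False), children))
                if accepted not in reached:
                    reached.add(accepted)
                    changed = True
    return reached


def is_empty(state, states, arity, delta):
    """True iff no tree is accepted from `state`."""
    return not any(state in s for s in reachable_state_sets(states, arity, delta))


# Example: leaves a and b, binary node f.
ARITY = {"a": 0, "b": 0, "f": 2}
STATES = ["has_a", "has_b", "both", "never"]
DELTA = {
    ("has_a", "a"): True,
    ("has_b", "b"): True,
    ("has_a", "f"): ("or", ("atom", 0, "has_a"), ("atom", 1, "has_a")),
    ("has_b", "f"): ("or", ("atom", 0, "has_b"), ("atom", 1, "has_b")),
    # Conjunctive transition: the tree must contain both an a and a b.
    ("both", "f"): ("and",
                    ("or", ("atom", 0, "has_a"), ("atom", 1, "has_a")),
                    ("or", ("atom", 0, "has_b"), ("atom", 1, "has_b"))),
    # Accepts no finite tree: there is no leaf transition for "never".
    ("never", "f"): ("and", ("atom", 0, "never"), ("atom", 1, "never")),
}
```

In this example the state `both` is non-empty (the tree f(a, b) is a witness) while `never` is empty. The number of reachable sets can be exponential in the number of states, which is consistent with the general complexity of the problem and explains why optimizations matter in practice.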

These results were presented at the DBPL 2007 conference.

