Team VASY

Inria / Raweb 2003
Project: VASY

# Project : vasy

## Section: New Results

### Languages and Compilation Techniques

#### Compilation of LOTOS

Participants : Damien Bergamini, David Champelovier, Hubert Garavel, Wendelin Serwe.

In 2003, work took place, essentially in the framework of the FormalFame contract (see § 6.3.1 and § 7.2), to enhance the Lotos tools present in the Cadp toolbox. As regards the Cæsar.adt compiler for the data part of Lotos:

• We fixed 4 remaining bugs in the compiler front-end common to Cæsar and Cæsar.adt (lexical analysis, syntactic error recovery, and static semantics).

• We designed and implemented a fixpoint algorithm for the detection of Lotos types whose domain of values is either finite or manually bounded by the specifier.
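
The finiteness detection can be pictured as a least-fixpoint computation over the type declarations. The following Python sketch is only an illustration under simplifying assumptions, not the algorithm implemented in Cæsar.adt: types are reduced to lists of constructors, each given by the names of its argument types, and the function `finite_types`, the `bounded` parameter, and the sample types are hypothetical.

```python
def finite_types(constructors, bounded=()):
    """Return the set of type names whose value domain is finite.

    constructors: dict mapping a type name to a list of constructors,
                  each constructor being the list of its argument type names.
    bounded:      type names assumed to be bounded manually by the specifier.
    """
    finite = set(bounded)
    changed = True
    while changed:  # iterate until nothing more can be added (least fixpoint)
        changed = False
        for name, ctors in constructors.items():
            if name in finite:
                continue
            # A type is finite if every argument of every constructor is finite.
            if all(arg in finite for ctor in ctors for arg in ctor):
                finite.add(name)
                changed = True
    return finite


# Hypothetical example types.
TYPES = {
    "Bool": [[], []],               # false, true
    "Colour": [[], [], []],         # red, green, blue
    "Pair": [["Bool", "Colour"]],   # pair(Bool, Colour)
    "Nat": [[], ["Nat"]],           # zero, succ(Nat)
    "Packet": [["Nat", "Bool"]],    # packet(Nat, Bool)
}

print(sorted(finite_types(TYPES)))
# ['Bool', 'Colour', 'Pair']
print(sorted(finite_types(TYPES, bounded={"Nat"})))
# ['Bool', 'Colour', 'Nat', 'Packet', 'Pair']
```

Because the iteration starts from the empty set (plus the bounded types), a recursive type such as `Nat` is never added unless it is declared bounded, which is the conservative answer.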

As regards the Cæsar compiler for the process part of Lotos:

• We fixed 2 bugs in Cæsar's optimization E2 and simulation phases.

• The Exec/Cæsar programming interface was enhanced with new functionalities that give access to Petri net-related information (number of places, number of transitions, last transition fired, etc.), as well as means to randomize the firing of $\tau$-transitions; these features proved to be useful for the FormalFame contract.

• We introduced a heuristic algorithm that permutes the various fields of state vectors so as to save memory by reducing the unused "padding" bits introduced by machine word alignment constraints; although the average memory gain measured on a large set of benchmarks is disappointing (2%), it is still worthwhile when millions of state vectors are to be stored in main memory.

• Similarly, to reduce the memory size of transition labels, we introduced several optimization techniques: compaction of transition numbers, use of bit fields, and field permutation (as for state vectors), all of which led to an average memory gain of 45%.

• The code generated by Cæsar for computing a hash function on transition labels was improved; in practice, the average number of hash collisions is divided by a factor ranging from 2.5 (on Pc/Linux) to 3.9 (on Sun/Solaris).

• The code generated by Cæsar for converting transition labels to character strings was improved in several ways; in practice, this makes the exploration of an entire transition system between 1.25 and 4 times faster.

• In the Exec/Cæsar mode, the code generated for firing each transition was made faster by delaying certain computations (e.g., current state storage and next state computation) as long as possible, until it is certain that the transition will actually be fired (which is only the case if all guard predicates are true and if the environment selects that particular transition); otherwise, these computations can be safely skipped. Another profitable optimization consists in avoiding recomputing the successor state information several times for the same state (a situation that may occur when the environment is not immediately ready to accept a transition). On the Ilu benchmark studied in the FormalFame contract, these optimizations led to a speed improvement between 24% (on Pc/Linux) and 43% (on Sun/Solaris).
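
To illustrate why permuting fields can reduce padding, the Python sketch below computes the size of a record laid out in a given field order and compares it with the size after a simple reordering. It is a simplified model, not the heuristic used in Cæsar: it assumes byte-granularity, C-struct-like alignment rules (whereas Cæsar reasons on bits), and the field names, sizes, and the "largest alignment first" ordering are hypothetical.

```python
def layout_size(fields):
    """Size in bytes of a record laid out in the given order, C-struct style.

    fields: list of (name, size, alignment) tuples.
    """
    offset = 0
    max_align = 1
    for _name, size, align in fields:
        offset += (-offset) % align   # padding inserted to reach the alignment
        offset += size
        max_align = max(max_align, align)
    return offset + (-offset) % max_align  # tail padding


def permute(fields):
    """Heuristic (assumed here): place the most strictly aligned fields first."""
    return sorted(fields, key=lambda f: f[2], reverse=True)


# Hypothetical state vector: (field name, size, alignment).
STATE_VECTOR = [
    ("flag", 1, 1),
    ("counter", 4, 4),
    ("mode", 1, 1),
    ("address", 8, 8),
    ("channel", 2, 2),
]

print(layout_size(STATE_VECTOR))           # 32
print(layout_size(permute(STATE_VECTOR)))  # 16
```

The gain on this hand-picked example is much larger than the 2% average reported above, which is to be expected: most real state vectors contain little padding to begin with.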

We also investigated techniques for state space reduction, our goal being to decrease the size of the graphs generated by Cæsar while still preserving strong bisimulation between the original and reduced graphs.

We considered the approach based on live variable analysis, first proposed by H. Garavel and Juan Galvez [42]. The basic idea is to assign a canonical value to any variable that is no longer used, so as to avoid distinguishing state vectors that differ only by the values of variables not used in the future. This is done by adapting classical data flow analysis to the extended Petri nets generated by Cæsar and by resetting each variable to zero as soon as it ceases to be live.
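
The Python sketch below illustrates this idea on a toy four-location sequential program rather than on the extended Petri nets handled by Cæsar. The control-flow encoding `CFG`, the variables `x` and `y`, and the functions `live_variables`, `successors`, and `explore` are all hypothetical; the sketch only shows how resetting dead variables merges states that would otherwise be distinguished.

```python
# Each location: (defined variables, used variables, successor locations).
CFG = {
    0: ({"x"}, set(), [1]),   # x := any value in 0..2
    1: ({"y"}, {"x"}, [2]),   # y := x + 1
    2: (set(), {"y"}, [3]),   # output y
    3: (set(), set(), [0]),   # loop back to location 0
}


def live_variables(cfg):
    """Backward data-flow fixpoint: variables live on entry to each location."""
    live_in = {loc: set() for loc in cfg}
    changed = True
    while changed:
        changed = False
        for loc, (defs, uses, succs) in cfg.items():
            live_out = set().union(*(live_in[s] for s in succs))
            new_in = uses | (live_out - defs)
            if new_in != live_in[loc]:
                live_in[loc] = new_in
                changed = True
    return live_in


def successors(state):
    """Transition relation of the toy program; a state is (location, x, y)."""
    loc, x, y = state
    if loc == 0:
        return [(1, v, y) for v in range(3)]
    if loc == 1:
        return [(2, x, x + 1)]
    if loc == 2:
        return [(3, x, y)]
    return [(0, x, y)]


def explore(reset):
    """Count reachable states, optionally resetting dead variables to zero."""
    live = live_variables(CFG)

    def canonical(state):
        loc, x, y = state
        if not reset:
            return state
        return (loc, x if "x" in live[loc] else 0, y if "y" in live[loc] else 0)

    initial = canonical((0, 0, 0))
    seen, todo = {initial}, [initial]
    while todo:
        for nxt in map(canonical, successors(todo.pop())):
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return len(seen)


print(explore(reset=False))  # 22
print(explore(reset=True))   # 8
```

On this example, stale values of `y` left over from the previous loop iteration are what inflate the unreduced state space; resetting them removes the distinction.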

In 2003, we generalized the approach of [42] to handle so-called hierarchical units, i.e., the possibility to split each process into a set of concurrent sub-processes at an arbitrary nesting depth. In this model, concurrent processes do not share variables; however, the variables of a parent process can be read (but not modified) by its child sub-processes, a situation for which we designed several heuristics.

We implemented our ideas in a prototype version of Cæsar (about 3,600 lines of additional C code), which we applied to a benchmark suite of 469 Lotos specifications. For 98 examples (21%), the size of graphs generated by Cæsar was divided by a mean factor of 11.6. On some examples, we even observed a reduction factor of 300.

#### Compilation of the E-LOTOS Data Part

Participants : David Champelovier, Hubert Garavel.

As regards the data part of E-Lotos, we continued to improve the Traian compiler (see §  5.2), which is distributed on the Internet (see §  9.1) and used intensively within the Vasy team as a development tool for compiler construction [6].

In 2003, we released a new version 2.3 of Traian, which supersedes the previous version 2.2 issued in 2002. This development effort, which increased the software size from 48,000 to 55,000 lines of code, completes the integration in Traian of the code optimizations studied by Claude Chaudet in 1999 (see § 5.2.3 in the 1999 Vasy activity report and § 5.2.1 in the 2002 Vasy activity report). It also brings a higher degree of symmetry between Traian and the Cæsar.adt compiler for the data part of Lotos (see § 5.1). In addition to several bug fixes, the new version of Traian brings useful enhancements:

• Particular classes of Lotos NT data types (enumerated types, tuples, natural numbers, singleton types, and isomorphic types) are now recognized automatically and implemented optimally.

• For recursive types, heuristics reduce the number of Lotos NT types implemented using pointers; however, a compiler directive exists to force a given Lotos NT type to be implemented using pointers.

• It is now possible to give pointer types a canonical representation by storing all their values in hash tables, which avoids allocating multiple instances of the same value; in the framework of enumerative verification, this technique allows significant savings in memory space (on a benchmark proposed by Jan Friso Groote, we observed that the amount of memory needed was divided by 400).

• As for Lotos, it is now possible to split Lotos NT specifications into several files using a compiler directive.

In parallel, we pursued the design of Traian 3.0, a new generation compiler that could handle the data parts of both Lotos and Lotos NT, so as to merge Cæsar.adt and Traian 2.3 into a single compiler. In 2003, the requirement base for Traian 3.0 grew from 140 to 198 entries.

#### Compilation of the E-LOTOS Process Part

Participants : Aurore Collomb, Hubert Garavel, Frédéric Lang, Guillaume Schaeffer.

Compiling the process part of E-Lotos and Lotos NT is a difficult problem as these languages combine concurrency, quantitative time, and exceptions. To deal with these problems progressively, we chose to focus first on the sequential processes present in E-Lotos and Lotos NT. We designed a formalism named Ntif (New Technology Intermediate Form) to be used as an intermediate language for compiling and verifying E-Lotos and Lotos NT processes.

Ntif allows one to specify extended automata parameterized by typed variables. Each transition is labeled with an action (which allows communication with the environment according to the rendezvous semantics of process algebras) and a sequential code fragment to read and/or write variables. Compared to classical "condition/action" (or "guarded commands") automata, Ntif provides high level control structures (statements "case", "if-then-else", "while", etc.); this avoids the introduction of spurious intermediate states and transitions, as well as the duplication of boolean conditions, an important source of errors [5].

In 2003, we started introducing quantitative time concepts in Ntif. In the vein of E-Lotos, we added a "wait" operator that lets a given amount of time elapse, timing tags on actions to express deadlines and urgency, and a construct to capture the time elapsed between the instant an action is enabled and the instant it actually occurs. We defined the semantics of this extension and started to prove suitable properties (e.g., time additivity) using the Coq theorem prover. The semantics was also assessed by modeling several classical timed protocols (e.g., the Bounded Retransmission Protocol and Fischer's protocol) using Ntif.
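
For reference, time additivity is commonly stated as follows in timed process calculi. The notation is an assumption made here and not necessarily that of the Ntif semantics: $s \xrightarrow{d} s'$ means that state $s$ can let a duration $d$ elapse and reach state $s'$.

$$
s \xrightarrow{d_1} s' \;\wedge\; s' \xrightarrow{d_2} s'' \;\Longrightarrow\; s \xrightarrow{d_1 + d_2} s''
$$

In words, two consecutive delays can always be merged into a single delay of the same total duration.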

In parallel, the existing tools for Ntif were enhanced in several ways:

• For modularity reasons, we merged the Nt2Dot tool (which visualizes Ntif descriptions graphically) and the Nt2If tool (which unfolds Ntif descriptions to produce lower level formalisms) into a single tool, named Ntif. The architecture of this new tool supports several compiler back-ends that translate Ntif into a variety of languages and formats.

• A file inclusion mechanism was implemented, which allows Ntif descriptions to be split into several files.

• The static semantics checking phase of Ntif was entirely rewritten to be more efficient and to display better error messages. Static checks for proper variable initialization were added, which revealed uninitialized variables in existing Ntif specifications.

• Two new back-ends were developed, which translate Ntif to the input languages used by the TReX and Uppaal tools for timed automata. In the case of Uppaal, both Xta and Xml formats can be generated. The back-ends translate the high level Ntif timed constructs into clocks, time guards, and time progress conditions that express the impossibility to enter or stay in a given state.

• We added to Ntif two standard libraries (lossy buffers and write-only buffers). In the general case, these buffers are expressed as normal Ntif processes. However, when translating to the TReX input language, these buffers are recognized and implemented as TReX built-in buffers for optimization purposes.

These improvements increased the size of the Ntif tool from 6,500 to 13,300 lines of code (9,500 lines of Lotos NT code, 2,200 lines of Syntax code, and 1,600 lines of C code).