Section: Scientific Foundations
Strong coupling among the parts of an application severely hampers its evolution. It is therefore crucial to answer the following questions: How can we support the substitution of certain parts while limiting the impact on others? How can we identify reusable parts? How should we modularize an object-oriented application?
Having good classes does not imply good application layering, the absence of cycles between packages, or the reuse of well-identified parts. Which notion of cohesion makes sense in the presence of late binding and frameworks? Indeed, frameworks define a context that can be extended by subclassing or composition: in this case, packages can have low cohesion without this being a problem for evolution. How can we obtain algorithms that work on real cases? Which criteria should we select for a given remodularization?
We plan to enrich Moose, our reengineering environment, with a new set of analyses  ,  . We decompose our approach into three main, potentially overlapping steps:
- Tools for understanding applications at large: packages/modules,
- Remodularization analyses, and
- Software Quality and Open DashBoard.
Tools for understanding applications at large: packages/modules
Context and Problems. As we are going to design and evaluate several algorithms and analyses to remodularize applications, we need a way to understand and assess the results we will obtain. Our experience with real application analyses taught us that analyses tend to produce a huge amount of data, which we must understand and correlate with the original source code  . The problem is that understanding large systems is already difficult  ,  ,  ,  , but in our case we need to understand both an existing system and the results of our analysis. The parallel between software programs and cities is commonly used to reason about evolution  ,  . While interesting, this metaphor does not scale because the location of houses carries no semantic information about the connections between classes. A notion of radar has also been proposed  , but this mechanism suffers from the same problem.
There is therefore a definite need for ways to support the understanding of large applications at the level of their structure.
Research Agenda. We are going to study the problems raised by the understanding of applications at larger levels of granularity such as packages/modules. We will develop a set of conceptual tools to support this understanding. These tools will certainly be visual, such as the Distribution Map software visualization  , or based on the definition of new metrics taking into account the complexity of packages. This step is crucial as a support for the remodularization analyses that we want to perform. The following tasks are currently ongoing:
The Qualixo model was originally implemented on top of the Java platform. An implementation of this model, named MoQam (Moose Quality Assessment Model), is under development in Moose, our free and open-source reengineering environment. A first experiment has been conducted  . Exporters from Moose to the Squale software are under development.
- Cohesion Metric Assessment.
We are assessing the metrics and practices originally used in the Qualixo model. We are also compiling a number of metrics for cohesion and coupling assessment. We want to assess the relevance of each of these metrics in a software quality setting.
Dependency Structure Matrix (DSM), an approach developed in the context of process optimization, has been successfully applied to identify software dependencies among packages and subsystems. A number of algorithms help organize the matrix into a form that reflects the architecture and highlights patterns and problematic dependencies between subsystems. However, existing DSM implementations often lack important information in their visualizations to fully support a reengineering effort. We plan to enrich them to improve their usefulness for assessing a system's overall structure.
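To make the idea concrete, here is a minimal sketch of how a DSM is built from package-level dependencies and how a problematic cyclic dependency surfaces in it. The package names and reference counts are hypothetical illustration data, not from any project analyzed here:

```python
# Dependency Structure Matrix sketch: cell [i][j] counts references
# from package i to package j. All names/counts are made up for illustration.
deps = {
    "ui":   {"core": 12, "util": 3},
    "core": {"util": 7, "ui": 1},   # core -> ui closes a cycle with ui -> core
    "util": {},
}

packages = sorted(deps)
index = {p: i for i, p in enumerate(packages)}
n = len(packages)
dsm = [[0] * n for _ in range(n)]
for src, targets in deps.items():
    for dst, count in targets.items():
        dsm[index[src]][index[dst]] = count

# Mutual (cyclic) dependencies show up as symmetric non-zero cell pairs.
cycles = [(a, b) for a in packages for b in packages
          if a < b and dsm[index[a]][index[b]] and dsm[index[b]][index[a]]]
print(cycles)  # -> [('core', 'ui')]
```

Real DSM tools additionally reorder rows and columns (e.g., by partitioning) so that acyclic dependencies fall below the diagonal; the symmetric-pair check above only catches the simplest two-package cycles.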
Context and Problems. It is a well-known practice to layer applications, with bottom layers being more stable than top layers  . Until now, few works have attempted to identify layers in practice: Mudpie  is a first cut at identifying cycles between packages as well as package groups potentially representing layers. DSM (dependency structure matrix)  ,  seems suited to such a task, but no serious empirical experience validates this claim yet. On the side of remodularization algorithms, many were defined for procedural languages  . However, object-oriented programming languages raise specific problems linked to late binding and to the fact that a package need not be systematically cohesive, since it can be an extension of another one  ,  .
Some approaches based on Formal Concept Analysis  show that such an analysis can be used to identify modules. However, the presented example is small and not representative of real code. Other clustering algorithms  ,  have been proposed to identify modules  ,  . Once again, the specific characteristics of object-oriented programming are not taken into account. This is a challenge, since object-oriented programming tends to scatter class definitions over multiple packages and inheritance hierarchies. In addition, the existing algorithms or analyses often only work on toy applications. In the context of real applications, other constraints exist, such as minimizing code perturbation, minimizing hierarchy changes, and paying attention to code ownership, layers, or library minimization. Our approach will have to take these aspects into account.
Many different software metrics exist in the literature  ,  ,  , such as the McCabe complexity metrics  . In the more specific case of object-oriented programming, assessing cohesion and coupling has been the focus of several metrics. However, their success is rather mixed, as the number of criticisms raised shows. For example, LCOM  has been highly criticized  ,  ,  ,  ,  ,  . Other approaches have been proposed, such as RFC and CBO  , to assess coupling between classes. However, many other metrics have not been the subject of careful analysis, such as Data Abstraction Coupling (DAC) and Message Passing Coupling (MPC)  , and some metrics are not clearly specified (MCX, CCO, CCP, CRE)  . New cohesion measures taking class usage into account were also proposed  ,  .
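As a reference point for the discussion, here is a sketch of the original LCOM definition (Chidamber and Kemerer): count the method pairs that share no instance attribute, subtract the pairs that share at least one, and floor the result at zero. The class description is hypothetical illustration data:

```python
# LCOM sketch: |method pairs sharing no attribute| - |pairs sharing one|,
# floored at 0. The method/attribute sets below are made up for illustration.
from itertools import combinations

methods = {  # method name -> set of instance attributes it touches
    "deposit":  {"balance"},
    "withdraw": {"balance"},
    "owner":    {"name"},
}

disjoint = sharing = 0
for (_, a), (_, b) in combinations(methods.items(), 2):
    if a & b:
        sharing += 1
    else:
        disjoint += 1

lcom = max(disjoint - sharing, 0)
print(lcom)  # -> 1  (two disjoint pairs, one pair sharing 'balance')
```

One frequently raised criticism is visible even here: the floor at zero means very different classes can all score 0, and the metric ignores attributes reached through inherited methods, which is exactly the late-binding issue discussed above.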
Research Agenda. We will work on the following items:
- Characterization of “good” modularization.
Any remodularization effort must use a quality function that allows the programmer to compare two possible decompositions of the system and choose which one represents a more desirable modularization. Remodularization consists in trying to maximize such a function. The typical function used by most researchers is some measure of cohesion/coupling. However, manual system modularization may rely on many different considerations: implemented functionalities, historical considerations, clients or markets served, etc. We want to evaluate various modularization quality functions against existing modularizations to identify their respective strengths and weaknesses.
- Cohesion and coupling metric evaluation and definition.
Chidamber's well-known cohesion metric, LCOM, has been strongly criticized  ,  ,  ,  . However, the proposed solutions rarely take into account that a class is an incremental definition and, as such, can exist in several packages at once. For example, LCOM* flattens inheritance to determine the cohesion of a class. In addition, these metrics are not adapted to packages. We will thus work on the assessment of existing cohesion metrics for classes, define new ones for packages where necessary, and work on coupling metrics as well  ,  . This work is also related to the notion of software quality treated below.
- Build an empirical validation of DSM and enhancements.
We want to assess the ability of Dependency Structure Matrices (DSM) to support remodularization. DSM is good at identifying cyclic dependencies; we now want to know whether it can also reveal misplaced classes and groups of packages acting as layers. For this purpose we will perform controlled experiments and, in a second phase, apply DSM to one of the selected case studies. Based on these results, we will propose enhancements or a specific DSM.
- Layer identification.
We want to propose an approach to identify layers based on a semi-automatic classification of packages and of the class interrelationships they contain. Taking into account the wishes or knowledge of the designer or maintainer should also be supported. We will try to apply different algorithms and adapt them to the specific context of object-oriented programming  .
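One simple starting point for automatic layer identification, assuming an acyclic package graph (cyclic package groups would first have to be collapsed into strongly connected components), is to assign each package a layer equal to the depth of the longest dependency chain below it, so that layer 0 is the most stable bottom layer. The package graph is hypothetical illustration data:

```python
# Layer identification sketch on an acyclic package dependency graph:
# layer(p) = 1 + max layer of the packages p depends on; leaves are layer 0.
# The dependency graph is made up for illustration.
from functools import lru_cache

deps = {
    "gui":    ["domain", "util"],
    "domain": ["util"],
    "util":   [],
}

@lru_cache(maxsize=None)
def layer(pkg):
    below = deps.get(pkg, [])
    return 1 + max(layer(d) for d in below) if below else 0

layers = {p: layer(p) for p in deps}
print(layers)  # -> {'gui': 2, 'domain': 1, 'util': 0}
```

Semi-automatic classification would then let the maintainer pin packages to layers and flag packages whose computed layer contradicts the intended architecture.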
Within the context of the REMOOSE Associated Team with the Geodes group of the DIRO, we plan to use explanation-based constraint programming to help remodularization. It is not clear whether this will work, but several research questions are worth investigating: How can a remodularization situation be modeled as a declarative set of constraints? Is explanation-based constraint programming useful and scalable for this model?
Companies often look for an assessment of their software quality. Several models of software quality have been proposed. J.A. McCall  , with his Factor-Criteria-Metrics model, identified more than 50 candidate factors that may be used to assess software quality. Among those factors, only 11 were retained; these are characterized by 23 criteria that represent the internal project quality view. This approach is not easily used because of the high number of metrics (more than 300), some of which cannot be computed automatically. In an effort of standardization, the ISO (International Organization for Standardization) and the IEC (International Electrotechnical Commission) jointly defined the ISO 9126 standard in 1999. This standard, currently being restructured, will be composed of 4 parts: quality model (ISO 9126-1), external metrics (ISO 9126-2), internal metrics (ISO 9126-3), and quality in use metrics (ISO 9126-4). There is also a family of work focusing on design evaluation as a quality criterion  ,  ,  , as well as new quality models: QMOOD, for example, is a hierarchical quality model that proposes to link quality criteria directly to object-oriented software metrics  , while other works focus on linking different high-level criteria with software code metrics  ,  .
Research Agenda. Since software quality is by definition fuzzy and many parameters must be taken into account, we consider that precisely defining a unique notion of software quality is a Holy Grail of software engineering. The question nevertheless remains relevant and important. We plan to work on the two following items in the context of the Squale project, in contact with the Qualixo company:
Quality Model. We want to study existing quality models and, in particular, develop models that take into account (1) possible overlaps in the sources of information: it is important to know whether a model measures the same aspects several times using different metrics; and (2) the combination of indicators: software quality models often happily combine metrics, but at the price of losing the explicit relationships between the indicators' contributions.
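The second concern can be sketched concretely: a hierarchical model that keeps each indicator's contribution explicit normalizes raw metrics to marks, combines them into criteria with named weights, and only then aggregates a global mark. All metric names, marks, and weights below are hypothetical illustration data, not the Squale or Qualixo model:

```python
# Hierarchical quality model sketch: metrics -> criteria -> global mark,
# with explicit weights so no contribution is hidden. All values are
# made up for illustration.
metrics = {"lcom": 0.4, "cbo": 0.7, "comment_ratio": 0.9}  # marks in [0, 1]

criteria = {  # criterion -> {metric: weight}
    "cohesion":      {"lcom": 1.0},
    "coupling":      {"cbo": 1.0},
    "documentation": {"comment_ratio": 1.0},
}
factor_weights = {"cohesion": 0.4, "coupling": 0.4, "documentation": 0.2}

def mark(weighted):
    # Weighted average of the child metric marks.
    total = sum(weighted.values())
    return sum(metrics[m] * w for m, w in weighted.items()) / total

criterion_marks = {c: mark(w) for c, w in criteria.items()}
quality = sum(criterion_marks[c] * w for c, w in factor_weights.items())
print(round(quality, 2))  # -> 0.62
```

Because the weights are explicit at every level, one can trace exactly how much a poor cohesion mark dragged down the global score, which is precisely what is lost when metrics are combined ad hoc. Overlap detection (concern 1) would additionally require checking whether two criteria reference correlated metrics.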