- A1.1.8. Security of architectures
- A1.1.10. Reconfigurable architectures
- A1.1.13. Virtualization
- A1.3.4. Peer to peer
- A1.3.5. Cloud
- A1.3.6. Fog, Edge
- A1.5.1. Systems of systems
- A1.6. Green Computing
- A2.1.7. Distributed programming
- A2.1.10. Domain-specific languages
- A2.5.2. Component-based Design
- A2.6. Infrastructure software
- A2.6.1. Operating systems
- A2.6.2. Middleware
- A2.6.3. Virtual machines
- A2.6.4. Resource management
- A3.1.2. Data management, querying and storage
- A3.1.3. Distributed data
- A3.1.8. Big data (production, storage, transfer)
- A4.1. Threat analysis
- A4.4. Security of equipment and software
- A4.9. Security supervision
- B2. Health
- B4. Energy
- B4.5.1. Green computing
- B5.1. Factory of the future
- B6.3. Network functions
- B6.4. Internet of things
- B6.5. Information systems
- B7. Transport and logistics
- B8. Smart Cities and Territories
1 Team members, visitors, external collaborators
- Shadi Ibrahim [Inria, Researcher, until Sep 2020]
- Adrien Lebre [Team leader, IMT Atlantique, Professor, HDR]
- Hélène Coullon [IMT Atlantique, Chair]
- Remous Aris Koutsiamanis [IMT Atlantique, Associate Professor, from Sep 2020]
- Thomas Ledoux [IMT Atlantique, Associate Professor, Full Professor from Oct. 2020, HDR]
- Jean-Marc Menaud [IMT Atlantique, Professor, HDR]
- Jacques Noyé [IMT Atlantique, Associate Professor]
- Mario Südholt [IMT Atlantique, Professor, HDR]
- David Guyon [Inria, until Sep 2020]
- Thomas Lambert [Inria, until Sep 2020]
- Jonathan Pastor [IMT Atlantique, until Jul 2020]
- Simon Robillard [IMT Atlantique]
- Hamza Sahli [Inria, until Apr 2020]
- Alexandre Van Kempen [IMT Atlantique, from Mar 2020 until Nov 2020]
- Maxime Belair [Orange Labs, CIFRE]
- Fatima-Zahra Boujdad [IMT Atlantique, until Mar 2020]
- Emile Cadorel [Armines, until Sep 2020]
- Maverick Chardet [Inria, until Sep 2020]
- Marie Delavergne [Inria]
- David Espinel [Orange Labs, CIFRE]
- Wilmer Edicson Garzon Alfonso [IMT Atlantique]
- Karim Manaouil [Inria, from Oct 2020 until Dec 2020]
- Jolan Philippe [IMT Atlantique]
- Dimitri Saingre [Armines]
- Sirine Sayadi [IMT Atlantique]
- Ronan-Alexandre Cherrueau [Inria, Engineer]
- Jad Darrous [Inria, Engineer, until Aug 2020]
- Brice Nedelec [Sigma, Engineer]
- Rémy Pottier [IMT Atlantique, Engineer, until Oct 2020]
- Charlene Servantie [IMT Atlantique, Engineer, until Jun 2020]
- Matthieu Simonin [Inria, Engineer]
Interns and Apprentices
- Karim Manaouil [Inria, until Jul 2020]
- Nathan Rossard [Inria, from Mar 2020 until Aug 2020]
- Anne-Claire Binétruy [Inria]
- Twinkle Jain [Northeastern University, Boston - USA, from Feb 2020 until Aug 2020]
2 Overall objectives
2.1 STACK in a Nutshell
The STACK team addresses challenges related to the management and advanced usages of Utility Computing infrastructures (i.e., Cloud, Fog, Edge, and beyond). More specifically, the team is interested in delivering appropriate system abstractions to operate and use massively geo-distributed infrastructures, from the lowest (system) levels to the highest (application development) ones, and addressing crosscutting dimensions such as energy or security. These infrastructures are critical for the emergence of new kinds of applications related to the digitalization of industry and the public sector (a.k.a. the Industrial and Tactile Internet).
2.2 Toward a STACK for Geo-Distributed Infrastructures
With the advent of Cloud Computing, modern applications have been developed on top of advanced software stacks composed of low-level system mechanisms, advanced middleware and software abstractions. While each of these layers has been designed to enable developers to efficiently use ICT resources without dealing with the burden of the underlying infrastructure aspects, the complexity of the resulting software stack has become a key challenge. As an example, Map/Reduce frameworks such as Hadoop have been developed to benefit from the CPU/storage capacities of distinct servers. Running such frameworks on top of a virtualized cluster (i.e., in a Cloud) can lead to critical situations if the resource management system decides to consolidate all the VMs on the same physical machine 99. In other words, self-management decisions taken in isolation at one level (infrastructure, middleware, or application) may indirectly interfere with the decisions taken by another layer, and globally affect the performance of the whole stack. Considering that geo-distributed ICT infrastructures significantly differ from Cloud Computing ones regarding heterogeneity, resiliency, and the potential massive distribution of resources and networking environments 63, 60, we can expect the complexity of the software stacks to increase further. Such an assumption can be illustrated, for instance, by the software architecture proposed in 2016 by the ETSI Mobile Edge Computing Industry Specification Group 88. This architecture is structured around a new layer in charge of orchestrating distinct independent cloud systems, a.k.a. Virtual Infrastructure Managers (VIMs) in their terminology. By reusing VIMs, ETSI targets an edge computing resource management that behaves in the same fashion as Cloud Computing ones.
While mitigating development requirements, such a proposal hides all the management decisions that might be taken in the VIM of one particular site and thus may lead to conflicting decisions and consequently to non-desired states overall.
Through the STACK team, we propose to investigate the software stack challenge as a whole. We claim it is the only way to limit as much as possible the complexity of the next generation software stack of geo-distributed ICT infrastructures. To reach our goal, we will identify major building blocks that should compose such a software stack, how they should be designed (i.e., from the internal algorithms to the APIs they should expose), and finally how they should interact with each other.
Delivering such a software stack is an ambitious objective that goes beyond the activities of one research group. However, our expertise, our involvements in different consortiums (such as OpenStack) as well as our participation in different collaborative projects enable STACK members to contribute to this challenge in terms of architecture models, distributed system mechanisms and software artefacts, and finally, guideline reports on opportunities and constraints of geo-distributed ICT infrastructures.
3 Research program
STACK research activities are organized around four research topics. The first two are related to the resource management mechanisms and the programming support that are mandatory to operate and use geo-distributed ICT resources (compute, storage, network, and IoT devices). They are transverse to the System/Middleware/Application layers, which generally compose a software stack, and nurture each other (i.e., the resource management mechanisms will leverage abstractions/concepts proposed by the programming support axis, and reciprocally). The third and fourth research topics are related to the Energy and Security dimensions (both also crosscutting the three software layers). Although they could have been merged with the first two axes, we identified them as independent research directions due to their critical aspects with respect to the societal challenges they represent. In the following, we detail the actions we plan to undertake in each research direction.
3.2 Resource Management
The challenge in this axis is to identify, design or revise mechanisms that are mandatory to operate and use a set of massively geo-distributed resources in an efficient manner 47. This covers challenges at the scale of individual nodes, within one site (i.e., one geographical location), and throughout the whole geo-distributed ICT infrastructure. It is noteworthy that the network community has been investigating similar challenges for the last few years 69. To benefit from their expertise, in particular on how to deal with intermittent networks, STACK members have recently initiated exchanges and collaborative actions with network research groups and telcos (see Sections 9.1 and 10). We emphasize, however, that we do not deliver contributions related to network equipment/protocols. The scientific and technical achievements we aim to deliver are related to the (distributed) system aspects.
Performance Characterization of Low-Level Building Blocks
Although Cloud Computing has enabled the consolidation of services and applications into a subset of servers, current operating system mechanisms do not provide appropriate abstractions to prevent (or at least control) the performance degradation that occurs when several workloads compete for the same resources 99. Keeping in mind that server density is going to increase with physical machines composed of more and more cores, and that applications will be more and more data-intensive, it is mandatory to identify interferences that appear at a low level on each dimension (compute, memory, network, and storage) and propose counter-measures. In particular, previous studies 99, 58 on the pros and cons of the current technologies – virtual machines (VMs) 75, 83, containers and microservices – which are used to consolidate applications on the same server, should be extended: in addition to evaluating the performance we can expect from each of these technologies on a single node, it is important to investigate interferences that may result from cross-layer and remote communications 100. We will consider in particular all interactions related to the geo-distributed system mechanisms/services that are mandatory to operate and use geo-distributed ICT infrastructures.
Geo-Distributed System Mechanisms
Although several studies have highlighted the advantages of geo-distributed ICT infrastructures in various domains (see Section 4), progress on how to operate and use such infrastructures remains marginal. Current solutions 29 30 are rather close to the initial Cisco Fog Computing proposal that only allows running domain-specific applications on edge resources and centralized Cloud platforms 37 (in other words, these solutions do not allow running stateful workloads in isolated environments such as containers or VMs). More recently, solutions leveraging the idea of federating VIMs (such as the aforementioned ETSI MEC proposal 88) have been proposed. ONAP 62, an industry-driven solution, enables the orchestration and automation of virtual network functions across distinct VIMs. From the academic side, FogBow 40 aims to support federations of Infrastructure-as-a-Service (IaaS) providers. Finally, NIST initiated a collaborative effort with IEEE to advance Federated Cloud platforms through the development of a conceptual architecture and a vocabulary 2. Although all these projects provide valuable contributions, they face the aforementioned orchestration limitations (i.e., they do not manage decisions taken in each VIM). Moreover, they have all been designed by considering only the developer/user's perspective. They provide abstractions to manage the life cycle of geo-distributed applications, but do not deliver means to administer the physical resources.
To cope with the specifics of Wide-Area Networks while delivering, also at the edge, most of the features that made Cloud Computing solutions successful, our community should first identify the limitations/drawbacks of current resource management system mechanisms with respect to Fog/Edge requirements and propose revisions where needed 68, 81.
To achieve this aim, STACK members propose to first conduct a series of studies aiming at understanding the software architecture and footprint of major services that are mandatory for operating and using Fog/Edge infrastructures (storage backends, monitoring services, deployment/reconfiguration mechanisms, etc.). Leveraging these studies, we will investigate how these services should be deployed in order to deal with resource constraints, performance variability, and network split-brains. We will rely on contributions that have been accomplished in distributed algorithms and self-* approaches over the last decade. In the short and medium term, we plan to evaluate the relevance of NewSQL systems 50 to store internal states of distributed system mechanisms in an edge context, and extend our proposals on new storage backends such as key/value stores 49, 94 and burst buffers 101. We also plan to conduct new investigations on data-stream frameworks for Fog and Edge infrastructures 44. These initial contributions should enable us to identify general rules to deliver other advanced system mechanisms that will be mandatory at higher levels, in particular for the deployment and reconfiguration manager in charge of orchestrating all resources.
Capacity Planning and Placement Strategies
An objective shared by users and providers of ICT infrastructures is to limit the operational costs as much as possible while providing the expected and requested quality of service (QoS). To optimize this cost while meeting QoS requirements, data and applications have to be placed in the best possible way onto physical resources according to data sources, data types (streams, graphs), application constraints (real-time requirements) and objective functions. Furthermore, the placement of applications must evolve through time to cope with fluctuations in application resource needs as well as the physical events that occur at the infrastructure level (resource creation/removal, hardware failures, etc.). This placement problem, a.k.a. the deployment and reconfiguration challenge as described in Section 3.3, can be modeled in many different ways, most of the time as multi-dimensional and multi-objective bin-packing problems or as scheduling problems, which are known to be difficult to solve. Many studies have been performed, for example, to optimize the placement of virtual machines onto ICT infrastructures 77. STACK will inherit the knowledge acquired through previous activities in this domain, particularly its use of constraint programming strategies in autonomic managers 73, 72, relying on MAPE (monitor, analyze, plan, and execute) control loops. While constraint programming approaches are known to scale poorly, they enable the composition of various constraints without requiring changes to heuristic algorithms each time a new constraint has to be considered 71. We believe this is a strong advantage for dealing with the diversity brought by geo-distributed ICT infrastructures. Moreover, we have shown in previous work that decentralized approaches can tackle the scalability issue while delivering placement decisions that are good enough and sometimes close to optimal 87.
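To make the bin-packing view of placement concrete, the following Python sketch places VMs onto hosts using a first-fit-decreasing heuristic over two resource dimensions (CPU and RAM). It is a toy illustration of the problem shape only, not the constraint-programming approach used in our autonomic managers; all names and capacities are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Host:
    """A physical machine with remaining CPU and RAM capacities (toy units)."""
    name: str
    cpu: int
    ram: int
    vms: list = field(default_factory=list)

    def fits(self, vm):
        return vm["cpu"] <= self.cpu and vm["ram"] <= self.ram

    def place(self, vm):
        # Consume capacity and record the placement.
        self.cpu -= vm["cpu"]
        self.ram -= vm["ram"]
        self.vms.append(vm["name"])

def first_fit_decreasing(vms, hosts):
    """Greedy heuristic for the multi-dimensional bin-packing view of placement.

    Sort VMs by decreasing resource demand and put each one on the first
    host that can accommodate it; returns the mapping VM name -> host name,
    or raises if some VM cannot be placed anywhere.
    """
    mapping = {}
    for vm in sorted(vms, key=lambda v: (v["cpu"], v["ram"]), reverse=True):
        for host in hosts:
            if host.fits(vm):
                host.place(vm)
                mapping[vm["name"]] = host.name
                break
        else:
            raise RuntimeError(f"no capacity for {vm['name']}")
    return mapping
```

Real placement engines must additionally handle the objective functions, locality, and reconfiguration constraints discussed above, which is where constraint programming becomes attractive.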
Leveraging this expertise, we propose, first, to identify new constraints raised by massively geo-distributed infrastructures (e.g., data locality, energy, security, reliability and the heterogeneity and mobility of the underlying infrastructure). Based on this preliminary study, we will explore new placement strategies not only for computation sandboxes but for data (location, replication, streams, etc.) in order to benefit from the geo-distribution of resources and meet the required QoS. These investigations should lead to collaborations with operational research and optimization groups such as TASC, another research group from IMT Atlantique.
Second, we will leverage contributions made on the previous axis “Performance Characterization of Low-Level Building Blocks” to determine how the deployment of the different units (software components and data sets) should be executed in order to reduce as much as possible the time to reconfigure the system (i.e., the Execution phase in the control loop). In recent work 83, we have shown that the provisioning of a new virtual machine should be done carefully to mitigate boot penalties. More generally, proposing an efficient action plan for the Execution phase will be a major point, as Wide-Area-Network specifics may lead to significant delays, in particular when the amount of data to be manipulated is large.
Finally, we will investigate new approaches to decentralize the placement process while considering the geo-distributed context. Among the different challenges to address, we will study how a combination of autonomic managers, at both the infrastructure and application levels 57, could be proposed in a decentralized manner. Our first idea is to geo-distribute a fleet of small control loops over the whole infrastructure. By improving the locality of data collection and weakening combinatorics, these loops would allow the system to address responsiveness and quality expectations.
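The fleet-of-small-control-loops idea can be sketched as a per-site MAPE iteration. The sketch below is a minimal illustration under assumed signatures; the `monitor`/`analyze`/`plan`/`execute` functions and the toy autoscaling policy are hypothetical, not the team's algorithms.

```python
def mape_step(monitor, analyze, plan, execute, state):
    """One iteration of a MAPE (Monitor-Analyze-Plan-Execute) control loop.

    Each site runs its own small loop over locally collected metrics, so
    decisions stay responsive and the combinatorics of planning stay small.
    All four phases are injected as plain functions (hypothetical signatures).
    """
    metrics = monitor(state)        # collect local metrics only
    symptoms = analyze(metrics)     # detect SLO violations, drift, ...
    if not symptoms:
        return state                # nothing to do on this site
    actions = plan(symptoms, state) # compute a local reconfiguration plan
    return execute(actions, state)  # apply it, yielding the new state

def demo_autoscale(state):
    """Toy autoscaling loop for one edge site: add a replica when the
    average load exceeds a threshold (illustrative policy only)."""
    return mape_step(
        monitor=lambda s: s["load"],
        analyze=lambda load: ["overload"] if sum(load) / len(load) > 0.8 else [],
        plan=lambda symptoms, s: [("scale_out", 1)],
        execute=lambda actions, s: {**s, "replicas": s["replicas"] + actions[0][1]},
        state=state,
    )
```

Geo-distributing such loops then raises the coordination questions mentioned above: each loop only sees local state, so conflicting local decisions must be reconciled.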
3.3 Programming Support
We pursue two main research directions relative to new programming support: first, developing new programming models with appropriate support in existing languages (libraries, embedded DSLs, etc.) and, second, providing new means for deployment and reconfiguration in geo-distributed ICT environments, principally supporting the mapping of software onto the infrastructure. For both directions two levels of challenges are considered. On the one hand, the generic level refers to efforts on programming support that can be applied to any kind of distributed software, application or system. On this level, contributions could thus be applied to any of the three layers addressed by STACK (i.e., system, middleware or application). On the other hand, the corresponding generic programming means may not be appropriate in practice (e.g., requirements for more dedicated support, performance constraints, etc.), even if they may lead to interesting general properties. For this reason, a specific level is also considered. This level could be based on the generic one but addresses specific cases or domains.
Programming Models and Languages Extensions
The current landscape of programming support for cloud applications is fragmented. This fragmentation stems from apparently different needs for various kinds of applications: web-based applications, computation-based applications focusing on the organization of the computation, and data-based applications, with, in the last case, a quite strong dichotomy between applications considering data as sets or relations, close to traditional database applications, and applications considering data as real-time streams. This has led to various programming models, in a loose sense, including for instance microservices, graph processing, dataflows, streams, etc. These programming models have mostly been offered to the application programmer in the guise of frameworks, each offering subtle variants of the programming models with various implementation decisions favoring particular application and infrastructure settings. Whereas most frameworks are dedicated to a given programming model, e.g., basic Pregel 82, Hive 95, Hadoop 96, some of them are more general-purpose through the provision of several programming models, e.g., Flink 43 and Spark 79. Finally, some dedicated language support has been considered for some models (e.g., the language SPL underlying IBM Streams 74) as well as core languages and calculi (e.g., 39, 92).
This situation raises a number of challenges on its own, related to a better structuring of the landscape. It is necessary to better understand the various programming models and their possible relations, with the aim of facilitating, if not their complete integration, at least their composition, at the conceptual level but also with respect to their implementations, as specific languages and frameworks.
Switching to massively geo-distributed infrastructures adds to these challenges by leading to a new range of applications (e.g., smart-* applications) that, by nature, require mixing these various programming models, together with a much more dynamic management of their runtime.
In this context, STACK would like to explore two directions:
- First, we propose to contribute to generic programming models and languages to address the composability of different programming models 52. For example, we aim at providing a generic stream data processing model that can operate under both data stream 43 and operation stream 102 modes, so that streams can be processed in micro-batches to favor high throughput or record by record to sustain low latency. Software engineering properties such as separation of concerns and composition should help address such challenges 32, 93. They should also facilitate the software deployment and reconfiguration challenges discussed below.
- Second, we plan to revise relevant programming models, the associated specific languages, and their implementations according to the massive geo-distribution of the underlying infrastructure, the data sources, and application end-users. For example, although SPL is extensible and distributed, it has been designed to run on multi-cores and clusters 74. It does not provide the level of dynamicity required by geo-distributed applications (e.g., to handle topology changes, loss of connectivity at the edge, etc.).
Moreover, as more network data transfers will happen within a massively geo-distributed infrastructure, correctness of data transfers should be guaranteed. This has potential impact from the programming models to their implementations.
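As a minimal illustration of the micro-batch vs. record-by-record duality mentioned above, the following Python sketch runs the same operator under either execution mode. The API is hypothetical and deliberately ignores windowing, state, and fault tolerance:

```python
def process_stream(records, op, mode="record", batch_size=8):
    """Apply an operator to a stream under two execution modes.

    mode="record": call op on each record as it arrives (low latency).
    mode="batch":  buffer records into micro-batches and call op once per
                   batch (higher throughput, amortized per-call overhead).
    A generic model would let the runtime switch modes without changing op,
    which always takes and returns a list of records.
    """
    if mode == "record":
        for r in records:
            yield op([r])[0]
    elif mode == "batch":
        buf = []
        for r in records:
            buf.append(r)
            if len(buf) == batch_size:
                yield from op(buf)
                buf = []
        if buf:                 # flush the trailing partial batch
            yield from op(buf)
    else:
        raise ValueError(mode)
```

The point of a generic model is that `op` is written once; only the runtime's buffering policy changes between the two modes.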
Deployment and Reconfiguration Challenges
The second research direction deals with the complexity of deploying distributed software (whatever the layer: application, middleware or system) onto an underlying infrastructure. As both the deployed pieces of software and the infrastructures addressed by STACK are large, massively distributed, heterogeneous and highly dynamic, the deployment process cannot be handled manually by developers or administrators. Furthermore, as already mentioned in Section 3.2, the initial deployment of distributed software will evolve through time because of the dynamicity of both the deployed software and the underlying infrastructures. When considering reconfiguration, which encompasses deployment as a specific case, the problem becomes more difficult for two main reasons: (1) the current state of both the deployed software and the infrastructure has to be taken into account when deciding on a reconfiguration plan, and (2) as the software is already running, the reconfiguration should minimize disruption time while avoiding inconsistencies 80, 85. Many deployment tools have been proposed both in academia and industry 54. For example, Ansible 3, Chef 4 and Puppet 5 are very well-known generic tools that automate the deployment process through sets of batch instructions organized in groups (e.g., playbooks in Ansible). Some tools are specific to a given environment, like Kolla to deploy OpenStack, or the deployment manager embedded within Spark. Few reconfiguration capabilities, such as scaling and restarting after a fault, are available in production tools 6 7. Academia has contributed generic deployment and reconfiguration models, most of which are component-based. Component models divide distributed software into a set of component instances (or modules) and their assembly, where components are connected through well-defined interfaces 93.
Thus, modeling the reconfiguration process consists in describing the life cycle of the different components and their interactions. Most component-based approaches offer a fixed life cycle, i.e., identical for any component 59. Two main contributions are able to customize life cycles: Fractal 42, 35 and its evolutions 32, 33, 56, and Aeolus 51. In Fractal, the control part of a component (e.g., its life cycle) is itself modeled as a component assembly, which is highly flexible. Aeolus, on the other hand, offers finer control over both the evolution and the synchronization of the deployment process by modeling each component life cycle with a finite state machine.
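The Aeolus-style idea of modeling each component life cycle as a finite state machine can be sketched as follows. This is a simplified illustration; real Aeolus models also express ports and dependencies between components, which are omitted here:

```python
class Component:
    """A component whose life cycle is a finite state machine (Aeolus-style).

    Unlike fixed life cycles, the set of states and transitions is supplied
    per component, so a developer can model, e.g., a 'configured' or
    'bootstrapped' stage specific to one piece of software.
    """
    def __init__(self, name, initial, transitions):
        self.name = name
        self.state = initial
        # transitions: {(current_state, action): next_state}
        self.transitions = transitions

    def fire(self, action):
        """Take one transition; reject actions invalid in the current state."""
        key = (self.state, action)
        if key not in self.transitions:
            raise ValueError(f"{self.name}: '{action}' invalid in state '{self.state}'")
        self.state = self.transitions[key]
        return self.state

# A database component with a custom life cycle (illustrative names):
db = Component(
    "db",
    initial="uninstalled",
    transitions={
        ("uninstalled", "install"): "installed",
        ("installed", "configure"): "configured",
        ("configured", "start"): "running",
        ("running", "stop"): "configured",
    },
)
```

A reconfiguration plan is then a sequence of such transitions across components, and the planner's job is to order them while respecting inter-component dependencies.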
A reconfiguration raises at least five questions, all of them correlated: (1) why does the software have to be reconfigured? (monitoring, modeling and analysis), (2) what should be reconfigured? (software modeling and analysis), (3) how should it be reconfigured? (software modeling and planning decisions), (4) where should it be reconfigured? (infrastructure modeling and planning decisions), and (5) when should it be reconfigured? (scheduling algorithms). STACK will contribute to all aspects of the reconfiguration process as described above. However, according to the expertise of STACK members, we will focus mainly on the first three questions: why, what and how, leaving the where and when questions to collaborations with operational research and optimization teams.
First of all, we would like to investigate why software has to be reconfigured. Many reasons could be mentioned, such as hardware or software fault tolerance, mobile users, dynamicity of software services, etc. All these reasons are somehow related to the Quality of Service (QoS) or the Service Level Agreement (SLA) between the user and the Cloud provider. We would first like to explore the specificities of QoS and SLAs in the case of massively geo-distributed ICT environments 89. Being able to formalize this question will facilitate analyzing the requirements of a reconfiguration.
Second, we think that deployment and reconfiguration models for massively geo-distributed ICT environments should enhance four important properties. First, as low-latency applications and systems will be subject to deployment and reconfiguration, performance and the ability to scale are important. Second, as many different kinds of deployments and reconfigurations will run concurrently within the infrastructure, processes have to be reliable, which is facilitated by a fine-grained control of the process. Finally, as many different software elements will be subject to deployment and reconfiguration, common generic models and engines for deployment and reconfiguration should be designed 41. For these reasons, we intend to go beyond Aeolus by: first, leveraging the expression of parallelism within the deployment process, which should lead to better performance; second, improving the separation of concerns between the component developer and the reconfiguration developer; third, enhancing the possibility to perform concurrent and decentralized reconfigurations.
Research challenges relative to programming support have been presented above. Many of these challenges are related, in different manners, to the resource management level of STACK or to crosscutting challenges, i.e., energy and security. First, one can notice that any programming model or deployment and reconfiguration implementation should be based on mechanisms related to resource management challenges. For this reason, all challenges addressed within this section are linked with the lower-level building blocks presented in Section 3.2. Second, as detailed above, deployment and reconfiguration address at least five questions. The what? question is naturally related to programming support. However, the why?, how?, where? and when? questions are also related to Section 3.2, for example to monitoring and capacity planning. Moreover, regarding the deployment and reconfiguration challenges, one can note that the same goals recursively apply when deploying the control building blocks themselves (bootstrap issue). This reinforces the need to design generic deployment and reconfiguration models and frameworks. These low-level models should then be used as back-ends to higher-level solutions. Finally, as energy and security are crosscutting themes within the STACK project, many additional energy and security considerations could be added to the above challenges. For example, our deployment and reconfiguration frameworks and solutions could be used to guarantee the deployment of end-to-end security policies or to answer specific energy constraints 70, as detailed in the next section.
3.4 Energy
The overall electrical consumption of DCs grows according to the demand for Utility Computing. Considering that the latter has been continuously increasing since 2008, the energy footprint of Cloud services overall is nowadays critical, with about 91 billion kilowatt-hours of electricity 91. Besides the ecological impact, energy consumption is a predominant criterion for providers since it determines a large part of the operational cost of their infrastructure. Among the different approaches that have been investigated to reduce the energy footprint, some studies have investigated the use of renewable energy sources to power microDCs 64. Workload distribution for geo-distributed DCs is another promising approach 66, 78, 97. Our research will extend these results with the ultimate goal of considering the different opportunities to control the energy footprint across the whole stack (hardware and software opportunities, renewable energy, thermal management, etc.). In particular, we identified several challenges that we will address in this context within the STACK framework.
First, we propose to evaluate the energy efficiency of low-level building blocks, from the viewpoints of computation (VMs, containers, microkernels, microservices) 55 and data (hard drives, SSDs, in-memory storage, distributed file systems). For computations, in the continuity of our previous work 53, 73, we will investigate workload placement policies according to energy (minimizing energy consumption, power capping, thermal load balancing, etc.). Regarding the data dimension, we will investigate, in particular, the trade-offs between energy consumption and data availability, durability and consistency 48, 94. Our ambition is to propose an adaptive energy-aware data layout and replication scheme to ensure data availability with minimal energy consumption. It is noteworthy that these new activities will also consider our previous work on DCs partially powered by renewable energy (see the SeDuCe project, in Section 7.2), with the ultimate goal of reducing the CO2 footprint.
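The availability-vs-energy trade-off of replication can be illustrated with a deliberately naive model: independent node failures, and energy growing linearly with the replication factor. The numbers and function names below are illustrative only, not measurements or the team's actual scheme:

```python
def replication_tradeoff(replica_counts, p_node_up, watts_per_replica):
    """Toy model of the availability-vs-energy trade-off of replication.

    For each candidate replication factor r, data is available if at least
    one of the r replicas sits on a live node (independent failures assumed),
    while energy grows linearly with r. Returns (r, availability, watts).
    """
    return [
        (r, 1 - (1 - p_node_up) ** r, r * watts_per_replica)
        for r in replica_counts
    ]

def cheapest_meeting_sla(options, target_availability):
    """Smallest-energy option whose availability meets the target, or None."""
    viable = [o for o in options if o[1] >= target_availability]
    return min(viable, key=lambda o: o[2]) if viable else None
```

An adaptive scheme would re-run this kind of decision continuously as node reliability and (renewable) power availability change.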
Second, we will complete current studies to understand pros and cons of massively geo-distributed infrastructures from the energy perspective. Addressing the energy challenge is a complex task that involves considering several dimensions such as the energy consumption due to the physical resources (CPU, memory, disk, network), the performance of the applications (from the computation and data viewpoints), and the thermal dissipation caused by air conditioning in each DC. Each of these aspects can be influenced by each level of the software stack (i.e., low-level building blocks, coordination and autonomous loops, and finally application life cycle). In previous projects, we have studied and modeled the consumption of the main components, notably the network, as part of a single microDC. We plan to extend these models to deal with geo-distribution. The objective is to propose models that will enable us to refine our placement algorithms as discussed in the next paragraph. These models should be able to consider the energy consumption induced by all WAN data exchanges, including site-to-site data movements as well as the end users' communications for accessing virtualized resources.
Third, we expect to implement green-energy-aware balancing strategies, leveraging the aforementioned contributions.
Although the infrastructures we envision increase complexity (because WAN aspects should also be taken into account), the geo-distribution of resources brings several opportunities from the energy viewpoint. For instance, it is possible to define several workload/data placement policies according to renewable energy availability. Moreover, a tightly-coupled software stack allows users to benefit from such a widely distributed infrastructure in a transparent way while enabling administrators to balance resources in order to benefit from green energy sources when available.
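A green-energy-aware balancing policy of the kind envisioned here can be sketched greedily: prefer the site with the most renewable headroom, and fall back to grid capacity otherwise. This is a hypothetical illustration with made-up site names and capacities, not the strategy the team will implement:

```python
def green_balance(jobs_kw, sites):
    """Greedy green-energy-aware balancing sketch.

    Each site advertises its currently available renewable power (kW) and a
    fallback grid capacity; jobs (expressed as a power demand) are assigned
    to the site with the most spare renewable power first, spilling to grid
    capacity only when no renewable headroom is left.

    sites: {name: {"green": kW available, "grid": kW available}}
    Returns a list of (job, site, energy_source) tuples.
    """
    placement = []
    for job, demand in jobs_kw:
        # Prefer the site with the largest renewable headroom that fits.
        green = [(s, c) for s, c in sites.items() if c["green"] >= demand]
        if green:
            site, cap = max(green, key=lambda sc: sc[1]["green"])
            cap["green"] -= demand
            placement.append((job, site, "green"))
            continue
        grid = [(s, c) for s, c in sites.items() if c["grid"] >= demand]
        if not grid:
            raise RuntimeError(f"no capacity for {job}")
        site, cap = max(grid, key=lambda sc: sc[1]["grid"])
        cap["grid"] -= demand
        placement.append((job, site, "grid"))
    return placement
```

In practice such a policy would also have to weigh the WAN cost of moving workloads against the green-energy gain, which is exactly the modeling question raised above.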
An important difficulty, compared to centralized infrastructures, is related to data sharing between software instances. In particular, we will study issues raised by the distribution and replication of services across several microDCs. In this new context, many challenges must be addressed: where should the data be placed (Cloud, Edge) in order to mitigate data movements? What is the impact of these two options in terms of energy consumption, network traffic and response time? How should the consistency of replicated data/services be managed? All these aspects must be studied and integrated into our placement algorithms.
Fourth, we will investigate the energy footprint of the current techniques that address failure and performance variability in large-scale systems. For instance, stragglers (i.e., tasks that take significantly longer to finish than the normal execution time) are a natural result of performance variability, and they cause extra resource and energy consumption. Our goal is to understand the energy overhead of these techniques and to introduce new handling techniques that take into consideration the energy efficiency of the platform 86.
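The detection side of such handling techniques can be illustrated with a common rule of thumb (the 1.5x-median threshold below is an assumption made for the sketch, not the criterion used in 86):

```python
# Illustrative straggler detector: flag tasks whose runtime exceeds
# 1.5x the median runtime of their job. An energy-aware handler would
# then decide whether relaunching a straggler is worth its extra energy.
from statistics import median

def stragglers(runtimes: dict, factor: float = 1.5) -> set:
    """runtimes: task -> elapsed seconds. Returns the flagged tasks."""
    cutoff = factor * median(runtimes.values())
    return {task for task, r in runtimes.items() if r > cutoff}
```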
Finally, in order to answer specific energy constraints, we want to reify energy aspects at the application level and propose a metric related to the use of energy (Green SLA 31), for example to describe the maximum allowed CO2 emissions of a Fog/Edge service. Unlike other approaches 67, 36, 65 that attempt to identify the best trade-off, we want to offer developers/end-users the opportunity to select the best choice between application performance, correctness and energy footprint. Such a capability will require reifying the energy dimension at the level of big-data and interactive applications. Besides, with the emergence of renewable energy (e.g., solar panels for microDCs), investigating the energy consumption vs. performance trade-off 70 and the smart usage of green energy for geo-distributed ICT services seems promising. For example, we want to offer developers/end-users the opportunity to control the scaling of applications based on this trade-off instead of current approaches that only consider application load. Providing such a capability will also require appropriate software abstractions.
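The kind of end-user choice we have in mind can be sketched as follows (the policy names, the replica formula and the thresholds are invented for illustration; this is not an existing STACK API):

```python
# Sketch of letting an end-user pick a trade-off between performance
# and energy footprint when scaling an application (a "Green SLA").
def scale_decision(load: float, green_ratio: float, policy: str) -> int:
    """Return a number of replicas.
    load: current load in [0, 1]; green_ratio: share of renewable
    energy currently available in [0, 1]; policy: user preference."""
    base = max(1, round(load * 10))      # load-only baseline
    if policy == "performance":          # ignore the energy source
        return base
    if policy == "green":                # scale up only on green energy
        return max(1, round(base * green_ratio))
    raise ValueError(policy)
```

A "performance" user keeps today's load-driven behavior, while a "green" user accepts fewer replicas when little renewable energy is available.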
Because of their large size and complex software structure, geo-distributed applications and infrastructures are particularly exposed to security and privacy issues 90. They are subject to numerous security vulnerabilities that are frequently exploited by malicious attackers in order to exfiltrate personal, institutional or corporate data. Securing these systems requires security and privacy models and corresponding techniques that are applicable at all software layers, in order to guard interactions at each level but also between levels. However, very few security models exist for the lower layers of the software stack, and no model enables the handling of interactions involving the complete software stack. Any modification to its implementation, deployment status, configuration, etc., may introduce new security and privacy issues or trigger existing ones. Finally, applications that execute on top of the software stack may introduce security issues or be affected by vulnerabilities of the stack. Overall, security and privacy issues are therefore interdependent with all other activities of the STACK team and constitute an important research topic for the team.
As part of the STACK activities, we principally consider security and privacy issues related to the vertical and horizontal compositions of software components forming the software stack and the distributed applications running on top of it. Modifications to the vertical composition of the software stack affect different software levels at once. As an example, side-channel attacks often target virtualized services (i.e., services running within VMs); attackers may exploit insecure hardware caches at the system level to exfiltrate data from computations at the higher level of VM services 84, 98. Security and privacy issues also affect horizontal compositions, that is, compositions of software abstractions on one level: most frequently, horizontal compositions are considered at the level of applications/services, but they are also relevant at the system or middleware level, such as compositions involving encryption and database fragmentation services.
The STACK members aim at addressing two main research issues: enabling full-stack (vertical) security and per-layer (horizontal) security. Both of these challenges are particularly hard in the context of large geo-distributed systems because they are often executed on heterogeneous infrastructures and are part of different administrative domains and governed by heterogeneous security and privacy policies. For these reasons they typically lack centralized control, are frequently subject to high latency and are prone to failures.
Concretely, we will consider two classes of security and privacy issues in this context. First, on a general level, we strive for a method for programming and reasoning about compositions of security and privacy mechanisms including, but not limited to, encryption, database fragmentation and watermarking techniques. Currently, no such general method exists; compositions have only been devised for specific and limited cases, for example, compositions that support the commutation of specific encryption and watermarking techniques 76, 45. We provided preliminary results on such compositions 46 and have extended them to biomedical, notably genetic, analyses in the e-health domain 38. Second, on the level of security and privacy properties, we will focus on isolation properties that can be guaranteed through vertical and horizontal composition techniques. We have proposed first results in this context in the form of a compositional notion of distributed side-channel attacks that operate on the system and middleware levels 34.
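A toy illustration of horizontally composing two such mechanisms, database fragmentation and encryption (the XOR "cipher" below is a placeholder for a real encryption scheme; only the composition structure matters):

```python
# Toy composition of two privacy mechanisms: a record is first
# fragmented across sites (round-robin), then each fragment is
# encrypted with its site's key. Reversing the two steps is exactly
# the kind of commutation question a composition method must answer.
def xor_encrypt(data: bytes, key: int) -> bytes:
    """Placeholder symmetric cipher (XOR is its own inverse)."""
    return bytes(b ^ key for b in data)

def fragment(record: bytes, n_sites: int) -> list:
    """Split a record into n_sites round-robin fragments."""
    return [record[i::n_sites] for i in range(n_sites)]

def fragment_then_encrypt(record: bytes, keys: list) -> list:
    """Fragment first, then encrypt each fragment per-site."""
    frags = fragment(record, len(keys))
    return [xor_encrypt(f, k) for f, k in zip(frags, keys)]
```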
It is noteworthy that the STACK members do not have to be experts on the individual security and privacy mechanisms, such as watermarking and database fragmentation. We are, however, well-versed in their main properties so that we can integrate them into our composition model. We also interact closely with experts in these techniques and the corresponding application domains, notably e-health, in the context of the PrivGen project8, see Section 10.
More generally, we highlight that security issues in distributed systems are very closely related to the other STACK challenges, dimensions and research directions. Guaranteeing security properties across the software stack and throughout software layers in highly volatile and heterogeneous geo-distributed systems is expected to harness and contribute results to the self-management capabilities investigated as part of the team's resource management challenges. Furthermore, security and privacy properties are crosscutting concerns that are intimately related to the challenges of application life cycle management. Similarly, the security issues are also closely related to the team's work on programming support. This includes new means for programming, notably in terms of event and stream programming, but also the deployment and reconfiguration challenges, notably concerning automated deployment. As a crosscutting functionality, the security challenges introduced above must be met in an integrated fashion when designing, constructing, executing and adapting distributed applications as well as managing distributed resources.
4 Application domains
Supporting industrial actors and open-source communities in building an advanced software management stack is a key element to favor the advent of new kinds of information systems as well as web applications. Augmented reality, telemedicine and e-health services, smart-city, smart-factory, smart-transportation and remote security applications are under investigation. Although STACK does not intend to address directly the development of such applications, understanding their requirements is critical to identify how the next generation of ICT infrastructures should evolve and which software abstractions are appropriate for operators, developers and end-users. STACK team members have been exchanging since 2015 with a number of industrial groups (notably Orange Labs and Airbus), a few medical institutes (public and private ones) and several telecommunication operators in order to identify both opportunities and challenges in each of these domains, described hereafter.
4.2 Industrial Internet
The Industrial Internet domain gathers applications related to the convergence between the physical and the virtual world. This convergence has been made possible by the development of small, lightweight and cheap sensors as well as complex industrial physical machines that can be connected to the Internet. It is expected to improve most processes of daily life and decision processes in all societal domains, affecting all corresponding actors, be they individuals and user groups, large companies, SMEs or public institutions. The corresponding applications cover: the improvement of business processes of companies and the management of institutions (e.g., accounting, marketing, cloud manufacturingi, etc.); the development of large “smart” applications handling large amounts of geo-distributed data and a large set of resources (video analytics, augmented reality, etc.); the advent of future medical prevention and treatment techniques thanks to the intensive use of ICT systems, etc. We expect our contributions will favor the rise of efficient, correct and sustainable massively geo-distributed infrastructures that are mandatory to design and develop such applications.
4.3 Internet of Skills
The Internet of Skills is an extension of the Industrial Internet to human activities. It can be seen as the ability to deliver physical experiences remotely (i.e., via the Tactile Internet). Its main supporters advocate that it will revolutionize the way we teach, learn, and interact with pervasive resources. As most applications of the Internet of Skills are related to real-time experiences, latency may be even more critical than for the Industrial Internet and raises the locality of computations and resources to a priority. In addition to identifying how Utility Computing infrastructures can cope with this requirement, it is important to determine how the quality of service of such applications should be defined and how latency and bandwidth constraints can be guaranteed at the infrastructure level.
The e-Health domain constitutes an important societal application domain of the two previous areas. The STACK team is investigating distribution, security and privacy issues in the fields of systems and personalized (aka. precision) medicine. The overall goal in these fields is the development of medication and treatment methods that are tailored towards small groups or even individual patients.
We are working, as part of the ongoing PrivGen CominLabs collaborative project, on new means for the sharing of genetic data and applications in the Cloud. More generally, we are applying and developing corresponding techniques for the medical domains of genomics, immunobiology and transplantology in the international network SHLARC and the regional networks SysMics and Oncoshare (see Section 10): there, we investigate how to ensure security and preserve privacy when potentially sensitive personal data is moved and processed by distributed biomedical analyses.
We are also involved in the SyMeTRIC regional initiative where preliminary studies have been conducted in order to build a common System Medicine computing infrastructure to accelerate the discovery and validation of bio-markers in the fields of oncology, transplantation, and chronic cardiovascular diseases. The challenges were related to the need of being able to perform analyses on data that cannot be moved between distinct locations.
The STACK team will continue to contribute to the e-Health domain by harnessing advanced architectures, applications and infrastructures for the Fog/Edge.
4.5 Network Virtualization and Mobile Edge Services
Telecom operators have been among the first to advocate the deployment of massively geo-distributed infrastructures, in particular through working groups such as Mobile Edge Computing at the European Telecommunication Standards Institute9. The initial reason is that geo-distributed infrastructures will enable Telecom operators to virtualize a large part of their resources and thus reduce capital and operational costs. As an example, we are investigating through the I/O Lab, the joint lab between Orange and Inria, how Centralized Radio Access Networks (a.k.a. C-RAN or Cloud-RAN) can be supported for 5G networks. We highlight that our expertise is not on the network side but rather on where and how we can deploy, allocate and reconfigure software components, which are mandatory to operate a C-RAN infrastructure, in order to guarantee the quality of service expected by the end-users. Finally, working with actors from the network community is a valuable advantage for a distributed system research group such as STACK. Indeed, achievements made within one of the two communities serve the other.
5 Social and environmental responsibility
5.1 Footprint of research activities
In addition to the international travels10, the environmental footprint of our research activities is linked to our intensive use of large-scale testbeds such as Grid'5000 (STACK members are often in the top-10 list of the largest consumers). Although access to such facilities is critical to move forward in our research roadmap, it is important to recognize that they have a strong environmental impact, as described in the next paragraph.
5.2 Impact of research results
The environmental impact of digital technology is a major scientific and societal challenge. Even though software remains a virtual object, it is executed on very real hardware that contributes to the carbon footprint. This impact materializes during the manufacture/destruction of hardware infrastructure (estimated at 45% of digital consumption in 2018 by The Shift Project) and during the software use phase via terminals, networks and data centers (estimated at 55%). STACK members have been studying various approaches for several years to reduce the energy footprint of digital infrastructures during the use phase. The work carried out revolves around two main axes: (i) reducing the energy footprint of infrastructures and (ii) adapting the software applications hosted by these infrastructures according to the energy available. More precisely, this second axis investigates possible improvements that could be made by the end-users of the software themselves. At scale, involving end-users in decision-making processes concerning energy consumption would lead to more frugal Cloud computing. For instance, in the GL4MA project (cf. Section 10), we propose that end-users customize software in SaaS mode to contribute to reducing their carbon footprint by using either fewer resources or resources powered directly by renewable energy.
6 Highlights of the year
Regarding scientific results, the team has produced a number of outstanding results on the management of resources and data in large-scale infrastructures. In particular, the team published a survey in ACM Computing Surveys on the service placement problem in Fog and Edge Computing 1. The team also obtained three publications in the 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid 2020), a major conference in the area of Cloud Computing systems 4, 2, 3.
On the software side, the team has pursued its efforts on the development of the EnosLib library and the resulting artifacts to help researchers perform experiment campaigns: https://
In 2020, the team has received one individual award:
- Adrien Lebre received the IEEE TCC Editorial Excellence and Eminence Award to recognize his exceptional contributions to the TCC Editorial Board.
We would also like to highlight two elements that underline the visibility and recognition of the team nationally and internationally. First, at the national level, the team obtained two grants from the Appel ANR générique 2020 for the SeMaFoR and PICNIC projects. The former's objective is to model, design and develop a generic and decentralised solution for the self-management of Fog resources using a Fog Architecture Description Language, a collaborative/consensual decision-making process and an automatic reconfiguration coordination mechanism. The latter addresses the problem of transferring large data sets between two geographically distant sites. More precisely, the objective is to develop network mechanisms in the Linux kernel to send a file in parallel, breaking the VM bottleneck by exploiting the property that the file is (fully or partly) duplicated on several disks.
Second, at the international level, Hélène Coullon obtained a 20% Adjunct Professor position at the Arctic University of Norway in Tromsø (UiT)13. This position was offered to Hélène after an invited talk (at the end of 2019) and discussions on collaborations with the team involved in the DAO project14 led by Otto Anshus. The position started in September 2020 and will end in August 2022. Hélène is responsible for giving a few courses to UiT students (Masters and PhD) in advanced distributed systems. Furthermore, this position facilitates interactions for research collaborations on the DAO project, to which Hélène brings expertise in dynamic software configuration and reconfiguration.
7 New software and platforms
7.1 New software
- Name: Madeus Application Deployer
- Keywords: Automatic deployment, Distributed Software, Component models, Cloud computing
- Scientific Description: MAD is a Python implementation of the Madeus deployment model for multi-component distributed software. Precisely, it makes it possible to: 1. describe the deployment process and the dependencies of distributed software components in accordance with the Madeus model, 2. describe an assembly of components, resulting in a functional distributed software, 3. automatically deploy the component assembly of distributed software following the operational semantics of Madeus.
- Functional Description: MAD is a Python implementation of the Madeus deployment model for multi-component distributed software. Precisely, it makes it possible to: 1. describe the deployment process and the dependencies of distributed software components in accordance with the Madeus model, 2. describe an assembly of components, resulting in a functional distributed software, 3. automatically deploy the component assembly of distributed software following the operational semantics of Madeus.
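As a rough illustration of what the model enables (this is not the actual MAD API), an assembly can be seen as a dependency graph whose deployment order must respect component dependencies:

```python
# Minimal sketch of a Madeus-like assembly: components declare the
# components they depend on, and deployment proceeds in an order that
# respects those dependencies (topological sort).
from graphlib import TopologicalSorter

def deploy(assembly: dict) -> list:
    """assembly maps each component to the set of components it
    depends on. Returns a valid deployment order."""
    order = list(TopologicalSorter(assembly).static_order())
    for component in order:
        pass  # here MAD would execute the component's deployment actions
    return order
```

Note that the real Madeus model additionally exploits parallelism between independent deployment actions; this sketch is purely sequential.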
- Release Contributions: Initial submission with basic functionalities of MAD
- News of the Year: Operational prototype.
- Publications: hal-01858150, hal-01897803
- Authors: Maverick Chardet, Hélène Coullon, Christian Pérez, Dimitri Pertin
- Contacts: Maverick Chardet, Christian Pérez, Dimitri Pertin, Hélène Coullon
- Participants: Christian Pérez, Dimitri Pertin, Hélène Coullon, Maverick Chardet
- Partners: IMT Atlantique, LS2N, LIP
- Keywords: Cloud storage, Virtual Machine Image, Geo-distribution
Nitro is a storage system that is designed to work in geo-distributed cloud environments (i.e., over WAN) to efficiently manage Virtual Machine Images (VMIs).
Nitro employs fixed-size deduplication to store VMIs. This technique contributes to minimizing the network cost. Also, Nitro incorporates a network-aware scheduling algorithm (based on a max-flow algorithm) to determine which chunks should be pulled from which site in order to reconstruct the corresponding image on the destination site in minimal (provisioning) time.
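A simplified stand-in for this scheduling idea (Nitro's actual algorithm is a max-flow formulation; the greedy sketch below only conveys the intuition of pulling each chunk from the replica site whose link finishes earliest):

```python
# Greedy sketch of network-aware chunk pulling: assign each missing
# chunk to the replica site whose link to the destination would
# complete the transfer earliest, given what is already scheduled.
def schedule_pulls(chunks: dict, bandwidth: dict, chunk_size: float) -> dict:
    """chunks: chunk_id -> set of sites holding a replica.
    bandwidth: site -> bandwidth to the destination site.
    Returns chunk_id -> chosen source site."""
    busy = {site: 0.0 for site in bandwidth}   # accumulated transfer time
    plan = {}
    for chunk, sites in sorted(chunks.items()):
        src = min(sites, key=lambda s: busy[s] + chunk_size / bandwidth[s])
        busy[src] += chunk_size / bandwidth[src]
        plan[chunk] = src
    return plan
```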
- Functional Description: Geo-distributed storage system to optimize image (VM, containers, ...) management, in terms of cost and time, in geographically distributed cloud environments (i.e., data centers connected over WAN).
gitlab.inria.fr/jdarrous/nitro
- Authors: Jad Darrous, Shadi Ibrahim, Christian Pérez
- Contacts: Jad Darrous, Shadi Ibrahim, Christian Pérez
- Keywords: Simulation, Virtualization, Scheduling
VMPlaceS is a dedicated framework to evaluate and compare VM placement algorithms. This framework is composed of two major components: the injector and the VM placement algorithm. The injector is the generic part of the framework (i.e., the one you can directly use) while the VM placement algorithm is the part you want to study (or compare with available algorithms). Currently, VMPlaceS is released with three algorithms:
Entropy, a centralized approach using a constraint programming approach to solve the placement/reconfiguration VM problem
Snooze, a hierarchical approach where each manager of a group invokes Entropy to solve the placement/reconfiguration VM problem. Note that the original implementation of Snooze uses a specific heuristic to solve this problem; for the sake of simplicity, we have simply reused the Entropy scheduling code.
DVMS, a distributed approach that dynamically partitions the system and invokes Entropy on each partition.
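For intuition, a toy centralized baseline in the spirit of the algorithms VMPlaceS compares might look as follows (Entropy actually relies on constraint programming; first-fit decreasing is only an illustrative heuristic):

```python
# Toy centralized VM placement: first-fit decreasing by CPU demand
# onto identical hosts, opening a new host only when needed.
def first_fit_decreasing(vms: dict, capacity: int) -> list:
    """vms: vm -> cpu demand; all hosts share the same capacity.
    Returns a list of per-host VM lists."""
    hosts = []  # each entry is [free_capacity, [vms]]
    for vm, demand in sorted(vms.items(), key=lambda kv: -kv[1]):
        for h in hosts:
            if h[0] >= demand:      # fits on an already-open host
                h[0] -= demand
                h[1].append(vm)
                break
        else:                       # no host fits: open a new one
            hosts.append([capacity - demand, [vm]])
    return [h[1] for h in hosts]
```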
beyondtheclouds.github.io/VMPlaceS/
- Contact: Adrien Lèbre
- Participants: Adrien Lèbre, Jonathan Pastor, Mario Südholt
- Name: Experimental eNvironment for OpenStack
- Keywords: OpenStack, Experimentation, Reproducibility
Enos workflow:
A typical experiment using Enos is a sequence of several phases:
- enos up: Enos reads the configuration file, gets machines from the resource provider and prepares the next phase.
- enos os: Enos deploys OpenStack on the machines. This phase relies heavily on Kolla deployment.
- enos init-os: Enos bootstraps the OpenStack installation (default quotas, security rules, ...).
- enos bench: Enos runs a list of benchmarks. Enos supports Rally and Shaker benchmarks.
- enos backup: Enos backs up the metrics gathered, the logs and the configuration files from the experiment.
enos.readthedocs.io/en/stable/
- Contacts: Adrien Lèbre, Matthieu Simonin
- Partner: Orange Labs
- Name: EnOSlib is a library to help you with your experiments
- Keywords: Distributed Applications, Distributed systems, Evaluation, Grid Computing, Cloud computing, Experimentation, Reproducibility, Linux, Virtualization
EnOSlib is a library to help you with your distributed application experiments. The main parts of your experiment logic are made reusable by the following EnOSlib building blocks:
- Reusable infrastructure configuration: the provider abstraction allows you to run your experiment on different environments (locally with Vagrant, Grid'5000, Chameleon and more).
- Reusable software provisioning: in order to configure your nodes, EnOSlib exposes different APIs with different levels of expressivity.
- Reusable experiment facilities: tasks help you organize your experimentation workflow.
EnOSlib is designed for experimentation purposes: benchmarks in a controlled environment, academic validation, etc.
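The provider abstraction can be illustrated with a hypothetical sketch (the class and method names below are invented for the example and are not the real EnOSlib API):

```python
# Hypothetical illustration of a provider abstraction: the same
# experiment logic runs on different testbeds by swapping the provider.
class Provider:
    def init(self) -> list:
        raise NotImplementedError

class VagrantProvider(Provider):
    def init(self) -> list:
        return ["vm-1", "vm-2"]                # local virtual machines

class Grid5000Provider(Provider):
    def init(self) -> list:
        return ["node-1.testbed.example"]      # illustrative testbed node

def run_experiment(provider: Provider) -> dict:
    nodes = provider.init()                    # experiment logic unchanged
    return {"nodes": nodes, "status": "configured"}
```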
discovery.gitlabpages.inria.fr/enoslib/
- Publications: hal-01664515, hal-01689726
- Contact: Matthieu Simonin
- Name: Concerto
- Keywords: Reconfiguration, Distributed Software, Component models, Dynamic software architecture
- Functional Description: Concerto is an implementation of the formal model Concerto, written in Python. Concerto allows one to: 1. describe the life-cycle and the dependencies of software components, 2. describe a component assembly that forms the overall life-cycle of a distributed software, 3. automatically reconfigure a Concerto assembly of components by using a set of reconfiguration instructions as well as a formal operational semantics.
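A minimal sketch of the idea (a toy subset of the reconfiguration instructions; the real Concerto semantics also covers ports, dependencies and parallel behaviors):

```python
# Toy Concerto-like reconfiguration: components expose a life-cycle
# state, and a reconfiguration program drives state changes through
# instructions ("pushB" pushes a behavior onto a component here).
class Component:
    def __init__(self, name: str):
        self.name = name
        self.state = "uninstalled"

def reconfigure(components: dict, program: list) -> dict:
    """program: list of (instruction, component, target_state)."""
    for instr, comp, target in program:
        if instr == "pushB":
            components[comp].state = target
        else:
            raise ValueError(instr)
    return {c.name: c.state for c in components.values()}
```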
gitlab.inria.fr/VeRDi-project/concerto
- Contacts: Maverick Chardet, Hélène Coullon, Christian Pérez
- Partners: IMT Atlantique, LS2N, LIP
7.2 New platforms
OpenStack is the de facto open-source management system to operate and use Cloud Computing infrastructures. Started in 2012, the OpenStack foundation gathers 500 organizations, including groups such as Intel, AT&T, RedHat, etc. The software platform relies on tens of services with a 6-month development cycle. It is composed of more than 2 million lines of code, mainly in Python, just for the core services. While these aspects make the whole ecosystem quite fast-moving, they are also good signs of the maturity of this community.
We created and animated between 2016 and 2018 the Fog/Edge/Massively Distributed (FEMDC) Special Interest Group15 and have been contributing to the Performance working group since 2015. The former investigates how OpenStack can address Fog/Edge Computing use cases whereas the latter addresses scalability, reactivity and high-availability challenges. In addition to releasing white papers and guidelines 60, the major result from the academic viewpoint is the aforementioned EnOS solution, a holistic framework to conduct performance evaluations of OpenStack (control and data plane). In May 2018, the FEMDC SIG turned into a larger group under the control of the OpenStack foundation. This group gathers large companies such as Verizon, AT&T, etc. Although our involvement has been less important in 2020, our participation is still significant. For instance, we co-signed the second white paper delivered by the edge working group in 2020 61.
Grid'5000 is a large-scale and versatile testbed for experiment-driven research in all areas of computer science, with a focus on parallel and distributed computing including Cloud, HPC and Big Data. It provides access to a large amount of resources: 12000 cores and 800 compute nodes grouped in homogeneous clusters, featuring various technologies (GPU, SSD, NVMe, 10G and 25G Ethernet, Infiniband, Omni-Path) and advanced monitoring and measurement features for trace collection of networking and power consumption, providing a deep understanding of experiments. It is highly reconfigurable and controllable. STACK members are strongly involved in the management and the supervision of the testbed, notably through the steering committee or the SeDuCe testbed described hereafter.
The SeDuCe project aims to deliver a research testbed dedicated to holistic studies of the energy aspects of data centers. Part of the Nantes site of Grid'5000, this infrastructure is composed of probes that measure the power consumption of each server, each switch and each cooling system, and that also measure the temperature at the front and back of each server. These sensors enable research to cover the full spectrum of the energy aspects of data centers, such as cooling and power consumption depending on experimental conditions.
The testbed is connected to renewable energy sources (solar panels). This “green” datacenter enables researchers to perform real experiment-driven studies in fields such as temperature-based scheduling or “green”-aware software (i.e., software that takes into account renewable energies and weather conditions).
In 2020, we developed a first version of a deployment and reservation system for Edge Computing (a Raspberry Pi cluster). In November 2020, we also began the study of a device-to-cloud deployment system, funded by the CNRS through the Kabuto project, in connection with the SILECS project.
STACK members are involved in the definition and bootstrap of the SILECS infrastructure. This infrastructure can be seen as a merge of the Grid'5000 and FIT testbeds, with the goal of providing a common platform for experimental computer science (Next Generation Internet, Internet of Things, clouds, HPC, big data, etc.). In 2020, STACK members took part in the PIA3 SILECS proposal submitted to the national PIA-3 call 16 and in the submission of the European ESFRI SLICES proposal 17.
8 New results
8.1 Resource Management
Participants: Adwait Jitendra Bauskar, Ronan-Alexandre Cherrueau, Bastien Confais, Marie Delavergne, David Espinel, David Guyon, Shadi Ibrahim, Remous Aris Koutsiamanis, Thomas Lambert, Adrien Lebre, Karim Manaouil, Javier Rojas Balderrama, Matthieu Simonin, Alexandre Van Kempen.
In 2020, we had multiple contributions in the field of resource management of geo-distributed cloud infrastructures, with a particular focus on Fog and Edge computing. First, we present contributions regarding the resource placement problem, for both traditional and data streaming applications. We then address Fog/Edge networking complications, concerning both the network itself and storage. To improve reproducibility and experimentation in resource management, we also introduce a new simulation tool for edge computing use-cases. Following our multi-year work on this subject, we then re-evaluate previously suggested best practices and the readiness of modern resource management systems (Kubernetes) for the edge computing case. Finally, we also start addressing the IoT part of the Cloud-IoT continuum by presenting a scheduler for wireless IoT network resources.
In 5, we presented a survey of current research on the Service Placement Problem (SPP) in Fog/Edge Computing. To support the large and varied applications generated by the Internet of Things (IoT), Fog Computing has been introduced to complement Cloud Computing and offer Cloud-like services at the edge of the network with low latency and real-time responses. The large scale, geographical distribution and heterogeneity of edge computational nodes make service placement in such infrastructures a challenging issue. The diversity of user expectations and of IoT device characteristics further complicates the deployment problem. Based on a new classification scheme, we analysed current proposals and discussed issues and challenges our community should deal with.
In 21, we propose a model to evaluate Maximum Sustainable Throughput (MST) for operators placement in the Edge, with a strong focus on the heterogeneous nature of network. We then introduce an optimal operators placement for stream data applications in the Edge using constraint programming. Through simulations, we show how existing placement strategies that target overall communications reduction often fail to keep up with the rate of data streams. Importantly, the constraint programming-based operators placement is able to sustain up to 5x increased data ingestion compared to baseline strategies including resource-aware placements and graph partitioning-based placement.
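At toy scale, the placement objective can be illustrated with an exhaustive search (the actual work uses constraint programming and a much richer network model; the capacities and rates below are illustrative):

```python
# Toy exhaustive search for the operator-placement idea behind MST:
# place each stream operator on an edge node so that the capacity left
# on the most loaded node (the sustainable margin) is maximized.
from itertools import product

def best_placement(operators, nodes, link_capacity, rate):
    """link_capacity: node -> ingress capacity; rate: per-operator
    stream rate. Returns the placement tuple with the best margin."""
    def sustainable(placement):
        load = {n: 0.0 for n in nodes}
        for op, n in zip(operators, placement):
            load[n] += rate
        return min(link_capacity[n] - load[n] for n in nodes)
    return max(product(nodes, repeat=len(operators)), key=sustainable)
```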
In 13, we introduce DIMINET, a distributed module to interconnect independent networking resources across multiple locations in an automated and transparent manner. It is nowadays accepted that the deployment of a geo-distributed cloud infrastructure, leveraging for instance Points-of-Presence at the edge of the network, could better fit the requirements of Network Function Virtualization services and Internet of Things applications. The envisioned architecture to operate such a widely distributed infrastructure relies on executing one instance of a Virtual Infrastructure Manager (VIM) per location and implementing appropriate code to enable collaborations between them when needed. However, delivering the mechanisms that allow these collaborations is a complex and error-prone task. This is particularly true for the mechanism in charge of establishing connectivity among VIM instances on demand. Besides the reconfiguration of network equipment, the main challenge is to design a mechanism that can offer the usual network virtualization operations to users while dealing with the scalability and intermittent-network properties of geo-distributed infrastructures. To deal with these challenges, we designed DIMINET, a DIstributed Module for Inter-site NETworking services. DIMINET relies on a decentralized architecture where each agent communicates with others only when needed. Moreover, there is no global view of all networking resources; each agent is in charge of interconnecting resources that have been created locally. This approach enables us to mitigate management traffic and keep each site operational in case of network partitions. It is also a promising approach for making other cloud services collaborative on demand.
In 22, we describe in detail the storage solution we designed between 2017 and 2019 for Fog/Edge infrastructures. Our initial proposal consisted in coupling an object store system and a scale-out NAS (Network Attached Storage), allowing both scalability and performance. This architecture has been extended with a new protocol inspired by the Domain Name System (DNS) to manage replicas in a context of mobility. The different versions have been evaluated over Grid'5000, each time showing promising performance.
In 12, we introduce a simulation engine dedicated to the evaluation and comparison of scheduling and data-movement policies for edge computing use cases. Although several strategies have been proposed, they have been evaluated through ad hoc simulator extensions that are, when available at all, usually not maintained. This is a critical problem because it prevents researchers from easily performing fair comparisons between proposals. We address this limitation with an extension to the Batsim/SimGrid toolkit. Our tool includes a plug-in system that allows researchers to add new models in order to cope with the diversity of edge computing devices, as well as an injector that allows the simulator to replay a series of events captured on real infrastructures. We demonstrated the relevance of this toolkit by studying two scheduling strategies combined with four data-movement policies on top of a simulated version of the Qarnot Computing platform, a production edge infrastructure based on smart heaters. We chose this use case because it illustrates both the heterogeneity and the uncertainties of edge infrastructures. Our ultimate goal is to gather industry and academia around a common simulator, so that efforts made by one group can be reused by others.
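The injector mechanism can be pictured as a small event-replay loop. The sketch below is purely illustrative: the event kinds, handler API, and trace format are hypothetical, not the actual Batsim plug-in interface.

```python
import heapq

def replay(events, handlers):
    """Replay a trace of (timestamp, kind, payload) events captured on a
    real platform, dispatching each one to the simulator in time order."""
    heap = list(events)
    heapq.heapify(heap)
    log = []
    while heap:
        ts, kind, payload = heapq.heappop(heap)
        handlers[kind](ts, payload)   # hand the event to the simulator
        log.append((ts, kind))
    return log

# Hypothetical trace: a job arrival, then a smart heater going down and up.
trace = [(12.0, "node_down", "heater-3"), (5.0, "job_submit", {"id": 1}),
         (20.0, "node_up", "heater-3")]
seen = []
handlers = {k: (lambda ts, p, k=k: seen.append(k))
            for k in ("node_down", "node_up", "job_submit")}
order = replay(trace, handlers)
```

Replaying the same trace against two scheduling policies is what makes the comparison fair: both see exactly the same sequence of platform events.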
In 26, we present a breakthrough approach in the design of resource management systems (RMSes) for the edge. Two years ago, at HotEdge'18, we alerted our community to the importance of delivering an RMS to favor the advent of the edge computing paradigm. While new initiatives have been proposed, they are far from offering the features expected to administrate and use geo-distributed infrastructures. We claim, however, that our community can move forward by focusing on collaborations between multiple instances of the same RMS instead of developing a new solution from scratch. Our proposal leverages service concepts, dynamic composition, and programming software abstractions. Beyond the development of a resource management system for edge infrastructures, our proposal may lead to a new way of distributing applications where intermittent connectivity is the norm.
In 28, we provide reflections on how Kubernetes, the well-known container orchestration platform, can cope with the specifics of edge infrastructures. With the advent of the Edge Computing era, DevOps teams expect to find at the edge the features that made the success of containerized applications in the cloud. However, orchestration systems have not been designed to deal with the geo-distribution of resources: latency, intermittent networks, locality awareness, etc. In other words, it is unclear whether they can be used directly on top of such massively distributed infrastructures or whether they must be revised. To provide first elements of an answer, we conducted an experimental campaign analyzing the impact of WAN links on vanilla Kubernetes. This study raised several challenges our community should consider in order to better address geo-distribution concerns in orchestration platforms.
In 19, we propose and evaluate a flexible centralized controller for scheduling wireless networks. This work targets wireless networks within the wider Internet of Things (IoT) field and in particular addresses the requirements and limitations of the narrower Industrial Internet of Things (IIoT) sub-field. The overall aim is to produce wireless networking solutions for industrial applications. The challenges include providing high-reliability and low-latency guarantees, comparable to existing wired solutions, over a noisy wireless medium and using generally computationally and energy-constrained network nodes. We describe the development of a centralized controller for Wireless Industrial Networks, currently aimed at the IEEE Std 802.15.4-2015 Time Slotted Channel Hopping protocol. Our controller takes a high-level, network-centric problem description as input, translates it into a low-level representation, uses that representation to retrieve a solution from a Satisfiability Modulo Theories (SMT) solver, and translates the solution back into a higher-level network-centric representation. By generating configurable and flexible schedules, our solution lets these applications benefit from the added flexibility, higher ease of deployment, and lower deployment cost offered by wireless networks.
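The controller's constraint model can be illustrated on a toy Time Slotted Channel Hopping cell-allocation problem. The brute-force search below merely enumerates the same constraint space that an SMT solver explores far more efficiently; the link names and the two constraints shown are illustrative assumptions, not the paper's exact encoding.

```python
from itertools import product

def schedule(links, n_slots, n_channels):
    """Assign each directed link a (timeslot, channel) cell such that
    (1) no two links use the same cell, and (2) links sharing a node
    never use the same timeslot (a node cannot send and receive at once)."""
    cells = list(product(range(n_slots), range(n_channels)))

    def ok(assign):
        for i, (u1, v1) in enumerate(links):
            for j, (u2, v2) in enumerate(links):
                if i < j:
                    (s1, c1), (s2, c2) = assign[i], assign[j]
                    if (s1, c1) == (s2, c2):
                        return False               # same cell twice
                    if s1 == s2 and {u1, v1} & {u2, v2}:
                        return False               # node conflict
        return True

    for assign in product(cells, repeat=len(links)):
        if ok(assign):
            return dict(zip(links, assign))
    return None  # unsatisfiable with this slotframe size

# Three links of a small star topology around node "A".
plan = schedule([("B", "A"), ("C", "A"), ("A", "D")], n_slots=3, n_channels=2)
```

Because all three links touch node "A", a valid schedule must place them in three distinct timeslots, which is exactly the kind of fact an SMT solver derives from the constraints.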
8.2 Programming Support
Participants: Maverick Chardet, Hélène Coullon, Jolan Philippe.
In 2020, our contributions on programming support are related to two aspects that, at first glance, appear unrelated. Yet both concern the autonomic adaptation of systems. Making distributed systems autonomous in their deployment and management is of high importance at the scale of Fog and Edge computing, both to avoid human errors and to guarantee safety properties. Autonomic computing is usually introduced through the MAPE-K loop, where (M) is the monitoring of the system and its environment, (A) is the analysis of the situation and the decision to adapt, (P) is the planning of the adaptation, and (E) is the execution of the plan; these four steps share a common knowledge base (K). The first contribution detailed below addresses the (A) phase, in the specific case of low-code platforms, by computing the most suitable parallelism and configuration strategy according to the current state (i.e., application and environment). The second contribution builds a generic model and tool for the efficient and safe execution of distributed-system adaptations (E), i.e., reconfigurations. By formalizing its semantics and offering performance-prediction tools, this second contribution also opens the door to the automated computation of a reconfiguration plan (P).
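A minimal MAPE-K iteration can be sketched as follows; the threshold, the scale-up action, and the layout of the knowledge base are hypothetical, for illustration only.

```python
def mape_k_step(monitor, analyze, plan, execute, knowledge):
    """One iteration of a MAPE-K autonomic loop: each phase reads and
    updates the shared knowledge base (K)."""
    knowledge["state"] = monitor(knowledge)          # (M) observe
    decision = analyze(knowledge)                    # (A) decide
    if decision is not None:
        actions = plan(decision, knowledge)          # (P) plan
        execute(actions, knowledge)                  # (E) act
    return knowledge

# Toy loop: scale a service up when the observed load exceeds a threshold.
k = {"replicas": 1, "load": 0.9}
mape_k_step(
    monitor=lambda k: {"load": k["load"]},
    analyze=lambda k: "scale_up" if k["state"]["load"] > 0.8 else None,
    plan=lambda d, k: [("add_replica", 1)],
    execute=lambda acts, k: k.update(replicas=k["replicas"]
                                     + sum(n for _, n in acts)),
    knowledge=k,
)
```

The two contributions of this section plug into such a loop at different phases: the first replaces the `analyze` step, the second the `plan`/`execute` steps.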
In 14, we discuss programming support to automate the choice of the most adequate parallelism strategy and distributed configuration for low-code development platforms. Low-code development platforms are taking an important place in the model-driven engineering ecosystem, raising new challenges, among which transparent efficiency and scalability. Indeed, the increasing size of models makes it difficult to interact with them efficiently. To tackle this scalability issue, some tools are built upon specific computational strategies exploiting reactivity or parallelism. However, their performance may vary depending on the specific nature of their usage. Choosing the most suitable computational strategy for a given usage is a difficult task that should be automated. Besides, the most efficient solutions may be obtained by using several strategies at the same time. This paper motivates the need for a transparent multi-strategy execution mode for model-management operations. We present an overview of the different computational strategies used in the model-driven engineering ecosystem, and use a running example to introduce the benefits of mixing strategies for performing a single computation. This example helps us present our design ideas for a multi-strategy model-management system. The code-related and DevOps challenges that emerged from this analysis are also presented.
In 11, we tackle the dynamic reconfiguration of distributed software systems. Reconfiguration is gaining interest because of the emergence of dynamic IoT and smart applications as well as large-scale dynamic infrastructures (e.g., Fog and Edge computing). When quality of service and experience are of prime importance, efficient reconfiguration is necessary, as is performance predictability to decide when a reconfiguration should occur. This paper tackles the efficient execution of a reconfiguration plan and its predictability with Concerto, a reconfiguration model supporting a high level of parallelism in reconfiguration operations. Evaluations performed on synthetic cases and on two real production scenarios show that Concerto provides better performance than state-of-the-art systems, with accurate time estimation.
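The benefit of parallelism in reconfiguration can be illustrated with a small makespan computation: each action starts as soon as its dependencies complete. The actions, durations, and dependencies below are hypothetical and do not reflect Concerto's actual component model.

```python
def parallel_makespan(durations, deps):
    """Earliest-finish times of reconfiguration actions when every action
    starts as soon as all of its dependencies have completed; the
    makespan is the latest finish time."""
    finish = {}

    def ft(action):
        if action not in finish:
            start = max((ft(d) for d in deps.get(action, [])), default=0)
            finish[action] = start + durations[action]
        return finish[action]

    return max(ft(a) for a in durations)

durations = {"stop_app": 1, "update_db": 5, "update_app": 3, "start_app": 1}
deps = {"update_app": ["stop_app"], "start_app": ["update_app", "update_db"]}
# update_db (5) runs concurrently with stop_app + update_app (1 + 3),
# so the plan finishes in 6 time units instead of the sequential 10.
makespan = parallel_makespan(durations, deps)
```

Predicting such finish times from per-action duration estimates is what enables the time-estimation capability mentioned above.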
8.3 Energy-aware computing
Participants: Emile Cadorel, Hélène Coullon, Remous Aris Koutsiamanis, Thomas Ledoux, Jean-Marc Menaud, Jonathan Pastor, Dimitri Saingre, Yewan Wang.
In 2020, we achieved three contributions in the field of energy-aware computing. First, we addressed blockchain-based solutions and proposed a framework to benchmark different blockchain technologies (e.g., their CPU consumption). Then, tackling the problem of scheduling multi-user workflows from the Cloud provider's point of view, we proposed an algorithm that optimizes two metrics: energy consumption and fairness between users. Finally, we addressed the issue of network energy consumption in the context of IoT devices.
Over the last few years, research activity on blockchain technologies has increased considerably. Since the introduction of Bitcoin, some projects have emerged to create or improve blockchain features such as privacy, while others propose to overcome technical limitations such as scalability and energy consumption. New proposals are often evaluated with ad hoc tools and experimental environments. Reproducing these new contributions and comparing them with the state of the art of blockchain technologies is therefore complicated. To the best of our knowledge, only a few tools partially address the design of a generic benchmark for blockchain technologies (e.g., load generation). In 15, we introduce BCTMark, a generic framework for benchmarking blockchain technologies on an emulated network in a reproducible way. To illustrate the portability of experiments using BCTMark, we conducted experiments on two different testbeds: a cluster of Dell PowerEdge R630 servers (Grid'5000) and a cluster of Raspberry Pi 3+ boards. Experiments were conducted on three different blockchain systems (Ethereum Clique/Ethash and Hyperledger Fabric) to measure their CPU consumption and energy footprint for different numbers of clients.
In 10, we deal with the problem of scheduling multi-user scientific workflows, with unpredictable random arrivals and uncertain task execution times, in a Cloud environment from the Cloud provider's point of view. The solution is a deadline-sensitive online algorithm, named NEARDEADLINE, that optimizes two metrics: energy consumption and fairness between users. Scheduling workflows in a private Cloud environment is a difficult optimization problem, as capacity constraints must be fulfilled in addition to the dependency constraints between workflow tasks. Furthermore, NEARDEADLINE is built upon a new workflow execution platform. As far as we know, no existing work combines both energy consumption and fairness metrics in its optimization problem. Experiments conducted on a real infrastructure (clusters of Grid'5000) demonstrate that the NEARDEADLINE algorithm offers real benefits in reducing energy consumption and enhancing user fairness.
In 18, we propose a set of extensions to a popular routing protocol for Industrial IoT devices to improve energy consumption and reduce the waiting time for devices joining the network. The IPv6 Routing Protocol for Low-Power and Lossy Networks (RPL) is the de facto routing protocol for Low-Power and Lossy Networks (LLNs). It is a proactive, link-layer-agnostic routing protocol standardized as RFC 6550 by the Internet Engineering Task Force (IETF). Based on the distance-vector technique, RPL builds a Destination Oriented Directed Acyclic Graph (DODAG) topology. To establish and maintain routes, RPL nodes propagate DODAG-related information with broadcast DODAG Information Object (DIO) control packets, whose transmission frequency is governed by the Trickle timer algorithm: the less stable the network, the more DIOs are transmitted. Thus, when a new node intends to join the RPL DODAG, it listens for a DIO message from nearby nodes, which may take a very long time if the network is in a stable state. RFC 6550 therefore provides the DODAG Informational Solicitation (DIS) message to solicit DIOs from nearby RPL nodes, similar to the Router Solicitation of IPv6 Neighbor Discovery. However, this solicitation procedure is not the most efficient one: it resets the Trickle timers of the nodes that receive the DIS message, which then transmit an unnecessarily large number of DIOs that congest the network and consume energy. In this paper, we propose to augment RFC 6550, the RPL routing protocol, with additional DIS flags and options that allow an RPL node to better control how nearby RPL nodes respond to its solicitation for DIOs. Our performance evaluation in Contiki-NG and COOJA demonstrates that we can reduce the number of control packets in the network by up to 45.5%.
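The Trickle dynamics behind this problem can be sketched in a few lines. This is a simplified model of the interval evolution described in RFC 6206, ignoring randomized transmission times and redundancy suppression; the interval bounds are arbitrary example values.

```python
def trickle_intervals(i_min, i_max, events):
    """Evolution of a Trickle timer interval: it doubles after each
    consistent period (capped at i_max, so DIOs become rare in a stable
    network) and collapses back to i_min on an inconsistency, such as a
    received DIS soliciting DIOs."""
    interval = i_min
    trace = [interval]
    for event in events:
        if event == "consistent":
            interval = min(interval * 2, i_max)
        else:  # "inconsistent": DIS received, topology change, ...
            interval = i_min
        trace.append(interval)
    return trace

# A stable network lets the interval grow; a single DIS collapses it,
# triggering a burst of DIOs from every node that heard the solicitation.
trace = trickle_intervals(1, 16, ["consistent"] * 5 + ["inconsistent"])
```

The collapse on the last event is exactly the behavior the proposed DIS flags and options aim to moderate.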
In 20, we propose and evaluate three different ways of implementing multi-path routing in Industrial IoT devices to improve reliability, latency, and energy consumption. The IPv6 Routing Protocol for Low-Power and Lossy Networks (RPL) is designed for Internet of Things (IoT) networks to generate routes between devices with minimal processing. The protocol creates a DODAG (Destination Oriented Directed Acyclic Graph) network topology through the use of DODAG Information Object (DIO) control packets; the DODAG routes data packets upstream to the destination device. To obtain a reliable network, we implement Packet Replication and Elimination (PRE), which performs multi-path data transmission via multiple parent devices. However, there is no standard way to select an alternative path. This paper presents three types of Alternative Parent (AP) selection following a braided model, and analyzes their performance in terms of delay and the trade-off between network traffic and reliability.
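The elimination half of PRE can be sketched as duplicate suppression on sequence numbers: a packet replicated over the preferred and alternative parents may reach a node several times, but only the first copy is forwarded. This is a simplified illustration, not the paper's implementation.

```python
def eliminate_duplicates(arrivals):
    """PRE elimination at one node: replicas of the same packet arrive
    over several braided paths; forward each sequence number once and
    drop later copies."""
    seen = set()
    forwarded = []
    for seq, parent in arrivals:
        if seq not in seen:
            seen.add(seq)
            forwarded.append((seq, parent))
    return forwarded

# Copies of packets 1 and 2 arrive via the preferred and alternative parent.
out = eliminate_duplicates([(1, "pref"), (1, "alt"), (2, "alt"), (2, "pref")])
```

Replication improves reliability (either copy suffices), while elimination bounds the extra traffic; the three AP selection strategies differ in which "alt" parent is chosen.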
Finally, Yewan Wang defended her PhD 23 in March on the evaluation and modeling of the energy impact of data centers, in terms of hardware/software architecture and the associated environment. The main contribution is a global power model that estimates the overall energy consumption of a cluster using information such as ambient temperature, cooling-system configuration, and server loads.
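Such a global model can be pictured as a per-server linear combination of load and ambient temperature, summed over the cluster. The functional form and all coefficients below are hypothetical, chosen for illustration; they are not the fitted values from the thesis.

```python
def server_power(load, ambient_temp, p_idle=95.0, k_cpu=110.0,
                 k_temp=1.8, t_ref=22.0):
    """Illustrative linear power model for one server (watts): idle
    power, a CPU-load term, and a correction for ambient temperature
    above a reference point (fan speed, leakage current)."""
    return p_idle + k_cpu * load + k_temp * max(0.0, ambient_temp - t_ref)

def cluster_power(servers):
    """Cluster-level estimate: sum of the per-server models, each fed
    with that server's (load, ambient temperature) measurements."""
    return sum(server_power(load, temp) for load, temp in servers)

# Two lightly loaded servers in a cool room, one busy server in a warm room.
estimate = cluster_power([(0.1, 20.0), (0.1, 20.0), (0.9, 30.0)])
```

In practice the coefficients would be fitted per hardware generation from measurements, which is where cooling configuration and environment enter the model.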
8.4 Security and Privacy
Participants: Fatima Zahra Boujdad, Wilmer Edicson Garzon Alfonso, Sirine Sayadi, Mario Südholt.
This year the team has continued its work on the protection of sensitive data and the security of distributed biomedical analyses in the context of the PhD theses of Fatima-zahra Boujdad (in cooperation with computer scientists and geneticists working at INSERM laboratories in Brest), Wilmer Garzon (in cooperation with a Colombian technical university) and Sirine Sayadi (in cooperation with a medical group from the Nantes University Hospital).
In 16, we provided solutions to the problem of preserving the privacy of sensitive data in distributed biomedical analyses, notably in the field of precision medicine, that is, analyses and treatments tailored to small groups of patients or even individual patients.
An important aspect of precision medicine consists in patient-centered contextualization analyses that are used as part of biomedical interactive tools. Such analyses often harness data of large populations of patients from different research centers and can often benefit from a distributed implementation. However, performance and the security/privacy concerns of sharing sensitive biomedical data can become a major issue.
We investigated these issues in the context of the Kidney Transplantation Application (KITAPP). We presented a motivation for distributed implementations in this context, notably for computing percentiles for contextualization. We motivated the privacy and performance issues and presented a novel system architecture, as well as a distributed implementation, to tackle them. Its evaluation in a realistic multi-site environment shows that our approach reduces data sharing to a large extent, and thus enables imposing strong data-privacy guarantees on distributed biomedical analyses.
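One way to reduce data sharing for percentile computation is to exchange only aggregate histograms instead of raw patient values. The sketch below illustrates that general idea; the bin edges and the merging protocol are assumptions for illustration, not KITAPP's exact scheme.

```python
def local_histogram(values, edges):
    """Each site computes only bucket counts over agreed bin edges;
    raw patient values never leave the site."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return counts

def percentile_bucket(histograms, edges, q):
    """The coordinator merges the per-site histograms and locates the
    bucket containing the q-th percentile."""
    merged = [sum(col) for col in zip(*histograms)]
    target = q / 100.0 * sum(merged)
    cumulative = 0
    for i, count in enumerate(merged):
        cumulative += count
        if cumulative >= target:
            return (edges[i], edges[i + 1])
    return (edges[-2], edges[-1])

edges = [0, 10, 20, 30, 40]
h_site_a = local_histogram([3, 12, 25, 25], edges)   # stays aggregate
h_site_b = local_histogram([8, 33, 35, 14], edges)
bucket = percentile_bucket([h_site_a, h_site_b], edges, 50)
```

Only the bucket counts cross site boundaries, so the exchanged volume is fixed by the number of bins rather than by the cohort size, which is what enables the strong privacy guarantees mentioned above.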
9 Bilateral contracts and grants with industry
9.1 Bilateral contracts with industry
Participants: Ronan-Alexandre Cherrueau, Marie Delavergne, Adrien Lebre, Karim Manaouil, Matthieu Simonin.

Following the ENOS bilateral contract (“Contrat de Recherche Externalisé”) between Orange and Inria (Sep 2017-Oct 2018), we agreed with Orange Labs to pursue this collaboration with a second contract. This new contract, which will run for 18 months with a budget of 150K€, targets the following objectives:
- Strengthen the Enos framework and the resulting EnosLib solution (see Section 6.4 and Section 6.5).
- Define an experimental protocol allowing the automated and reproducible evaluation of resource management/orchestration systems in a WAN-wide context.
- Develop a DSL to reify location aspects at the CLI level in order to create new resources (image, VM, etc.) through a set of OpenStack instances while guaranteeing a notion of master copy.
10 Partnerships and cooperations
10.1 International initiatives
10.1.1 Inria International Labs
Associate Teams involved in the Inria International Lab:
- Title: Accelerating the Performance of Multi-Site Scientific applications through Coordinated Data management
- Duration: 2019 - 2021
- Coordinator: Shadi Ibrahim
- Scientific Data Management Group, Lawrence Berkeley National Laboratory (United States)
- Inria contact: Shadi Ibrahim
- Summary: Advances in computing, experimental, and observational facilities are enabling scientists to generate and analyze unprecedented volumes of data. A critical challenge facing scientists in this era of data deluge is storing, moving, sharing, retrieving, and gaining insight from massive collections of data efficiently. Existing data management and I/O solutions on high-performance computing (HPC) systems require significant enhancements to handle the three V’s of Big Data (volume, velocity, and variety) in order to improve productivity of scientists. Even more challenging, many scientific Big Data and machine learning applications require data to be shared, exchanged, and transferred among multiple HPC sites. Towards overcoming these challenges, in this project, we aim at accelerating scientific Big Data application performance through coordinated data management that addresses performance limitations of managing data across multiple sites. In particular, we focus on challenges related to the management of data and metadata across sites, distributed burst buffers, and online data analysis across sites.
10.2 International research visitors
10.2.1 Visits of international scientists
- Twinkle Jain, PhD student at Northeastern University, visited the STACK team from February to August 2020.
10.3 European initiatives
10.3.1 FP7 & H2020 Projects
Participants: Hélène Coullon, Jolan Philippe.
Hélène Coullon is a member of the advisory board of the Lowcomote ITN project (H2020), on the subject of low code platforms. In particular, Hélène brings her expertise in distributed and high performance computing applied to model-driven engineering. She supervises Jolan Philippe, one PhD student of the project and a member of the team.
Participants: Adrien Lebre.
Adrien Lebre is a member of the SLICES Design Study project (ESFRI) that targets a Europe-wide test-platform designed to support large-scale, experimental research. It will provide advanced compute, storage and network components, interconnected by dedicated high-speed links. Pushing forward, the project’s main goal is to strengthen the research excellence and innovation capacity of European researchers and scientists in the design and operation of digital infrastructures.
10.4 National initiatives
Participants: Brice Nédelec, Thomas Ledoux.

The Green Label for Microservices Architecture (GL4MA) project aims to design and develop a technological platform (tools, framework, dedicated languages) for the self-management of eco-responsible microservice architectures for the Cloud. The experiments will be carried out through case studies provided by Sigma Informatique, and the presence of renewable energy will initially be simulated. At the end of the project, the technological platform will be deployed as part of the CPER SeDuCe platform.
This project is funded by Ademe (Perfecto call) for 18 months (starting in September 2019) with an allocated budget of 116,480€ (the majority of which is dedicated to the R&D engineer's salary).
10.4.2 CominLabs laboratory of excellence
Participants: Remy Pottier, Jean-Marc Menaud.

The Kabuto project aims to develop a software solution allowing the reservation and deployment of IoT and Edge devices to HPC nodes. Started in November 2020 for one year, Kabuto funds a one-year postdoc position (70 K€).
Participants: Adrien Lebre, Alexandre Van Kempen.
The GRECO project (Resource manager for cloud of Things) is an ANR project (ANR-16-CE25-0016) running for 42 months (starting in January 2017 with an allocated budget of 522K€, 90K€ for STACK).
The consortium is composed of four partners: Qarnot Computing (coordinator) and three academic research groups (DATAMOVE and AMA from the LIG in Grenoble, and STACK from Inria Rennes Bretagne Atlantique).
The goal of the GRECO project 18 is to design a manager for clouds of things, acting at the IaaS, PaaS and SaaS layers of the cloud. To move toward this objective, we have been designing a simulator to support innovation in scheduling and data management systems. This simulator leverages the SimGrid/PyBatsim solution.
Participants: Shadi Ibrahim.
The KerStream project (Big Data Processing: Beyond Hadoop!) is an ANR JCJC (Young Researcher) project (ANR-16-CE25-0014-1) running for 48 months (starting in January 2017 with an allocated budget of 238K€).
The goal of the KerStream project is to address the limitations of Hadoop when running Big Data stream applications on large-scale clouds and to go a step beyond Hadoop by proposing a new approach, called KerStream, for scalable and resilient Big Data stream processing on clouds. The KerStream project can be seen as the first step towards developing the first French middleware that handles stream data processing at scale.
Participants: Emile Cadorel, Dimitri Saingre, Rémy Pottier, Hélène Coullon, Jean-Marc Menaud.
The HYDDA project aims to develop a software solution allowing the deployment of Big Data applications with a hybrid design (HPC/Cloud) on heterogeneous platforms (clusters, grids, private Clouds) and orchestrators (task schedulers like Slurm, virtual orchestrators like Nova for OpenStack or Swarm for Docker). The main questions we are investigating are:
- How to propose an easy-to-use service to host (from deployment to decommissioning) application components typed both Cloud and HPC?
- How to propose a service that unifies the HPCaaS (HPC as a service) and the Infrastructure as a Service (IaaS) in order to offer resources on demand and to take into account the specificities of scientific applications?
- How to optimize the resource usage of these platforms (CPU, RAM, disk, energy, etc.) in order to propose solutions at the lowest cost?
Started in 2017 (with an allocated budget of 4,000K€, of which 380K€ for STACK), the project ended in October 2020.
Participants: Adrien Lebre, Jean-Marc Menaud, Jonathan Pastor.
The SeDuCe project (Sustainable Data Centers: Bring Sun, Wind and Cloud Back Together) aims to design an experimental infrastructure dedicated to the study of data centers with a low energy footprint. This innovative data center will be the first experimental data center in the world for studying the energy impact of cloud computing and the contribution of renewable energy (solar panels, wind turbines) from the scientific, technological and economic viewpoints. This project is part of the national grid computing context (Grid'5000) and of the Constellation project, an inter-node project (Pays de la Loire, Brittany).
10.4.7 Connect Talent
Apollo (Connect Talent)
Participants: Shadi Ibrahim.
The Apollo project (Fast, efficient and privacy-aware Workflow executions in massively distributed Data-centers) is an individual research project “Connect Talent” running for 36 months (starting in November 2017 with an allocated budget of 201K€).
The goal of the Apollo project is to investigate novel scheduling policies and mechanisms for fast, efficient and privacy-aware data-intensive workflow executions in massively distributed data-centers.
10.4.8 Etoiles Montantes
Participants: Emile Cadorel, Hélène Coullon, Dimitri Pertin, Simon Robillard, Charlène Servantie.
VeRDi is an acronym for Verified Reconfiguration Driven by execution. The VeRDi project is funded by the French region Pays de la Loire, where Nantes is located. The project started in November 2018 and ends in June 2021 (extended due to Covid-19), with an allocated budget of 172,800€.
It aims at addressing distributed software reconfiguration in an efficient and verified way, and at building an argued, disruptive view of the problem. To do so, we want to validate the work already performed on deployment in the team and extend it to reconfiguration.
10.5 Regional initiatives
Participants: Jean-Marc Menaud, Mario Südholt.
The SysMics project aims at federating the NExT scientific community around a common objective: anticipating the emergence of systems medicine by co-developing three approaches in population-scale genomics: genotyping by sequencing, cell-by-cell profiling, and microbiome analysis. STACK investigates new means for secure and privacy-aware computations in the context of personalized medicine, notably genetic analyses.
This project is financed by the Nantes excellence initiative in Medicine and Informatics (NExT) with a global envelope of 150 K€ (10 K€ for our team) from 2018 to 2022.
Participants: Mario Südholt, Sirine Sayadi.
The SHLARC project is an international network involving more than 20 partners from more than 15 countries on four continents. The network aims at improving HLA imputation techniques in the domain of immunobiology, notably by investigating better computational methods for the corresponding biomedical analyses.
The ambition of the SHLARC is to bring together international expertise to solve essential questions on immune-related pathologies through innovative algorithms and the development of powerful computation tools. To achieve this goal, we defined three main objectives:
- Data. By bringing together scientists from around the world, we will collectively increase the amount of SNP+HLA data available, both in terms of quantity and diversity.
- Applied mathematical and computer sciences. We will further optimize SNP-HLA imputation methods using the attribute-bagging HIBAG tool, and particularly for genetically diverse and admixed populations.
- Accessibility and service to the scientific community. Following the Haplotype Reference Consortium (HRC) initiative, the network envisions building a free, user-friendly webserver where researchers can access improved imputation protocols by simply uploading their data and obtaining the best possible HLA imputation for their dataset.
In this context, the STACK team is working on improved analysis techniques that harness distributed infrastructures.
This project is financed by the Nantes excellence initiative in Medicine and Informatics (NExT) from 2019 to 2022, with a global envelope of 100 K€ (5 K€ for our team).
Participants: Mario Südholt.
The ONCOSHARe project (ONCOlogy big data SHAring for Research) will demonstrate, through a multidisciplinary cooperation within the Western CANCEROPOLE network, the feasibility and the added value of a Cancer Patient Centered Information Common for in-silico research. The STACK team will work on challenges to the security and the privacy of user data in this context.
This project is financed by three French regions from 2018 to 2021, with a global envelope of 150 K€ (10 K€ for our team).
11.1 Promoting scientific activities
11.1.1 Scientific events: organisation
- A. Lebre co-organized the GdR RSD yearly meeting (100 participants), Nantes, January 2020.
Member of the organizing committees
- Shadi Ibrahim: workshop co-chair for the 40th IEEE International Conference on Distributed Computing Systems (ICDCS 2020), November 29 - December 1, 2020, Singapore.
- Adrien Lebre is a member of the steering committee of the international conference on Fog and Edge Computing (ICFEC).
11.1.2 Scientific events: selection
Chair of conference program committees
- Hélène Coullon: track vice-chair of CCGrid 2020, track “Programming models and runtime systems”
- Shadi Ibrahim: program vice chair for the Architecture, Networking, Data Centers track of the 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid 2020), May 11 – 14, 2020.
- Shadi Ibrahim: track co-chair for the Data, Storage and Visualization Track of ACM HPCAsia 2020, Fukuoka, Japan, January, 2020.
Member of the conference program committees
- Hélène Coullon: Euro-Par 2020, ICCS 2020, ISCC 2020
- Shadi Ibrahim: SC 2020 (regular paper and poster tracks), ICDCS 2020, HiPC 2020, ICPADS 2020, ICA3PP 2020, HPDBC@IPDPS 2020, COMPAS 2020, HPSC 2020, BDPS@ICDCS 2020.
- Adrien Lebre: ICC 2020, Euro-Par 2020, IC2E 2020, ICDCS 2020, CloudCom 2020, UCC 2020.
- Jean-Marc Menaud: SMARTGREENS'20, SOFTCOM 20.
Member of the editorial boards
- Shadi Ibrahim: Young Associate Editor of the Springer Frontiers of Computer Science journal.
- Adrien Lebre: Associate Editor of the IEEE Transactions on Cloud Computing.
Reviewer - reviewing activities
- Hélène Coullon: Transactions on Cloud Computing, Transactions on Parallel and Distributed Systems
- Shadi Ibrahim: IEEE Transactions on Computers, IEEE Transactions on Network and Service Management, ACM Transactions on Modeling and Performance Evaluation of Computing Systems.
- Remous Aris Koutsiamanis: IEEE Transactions on Cloud Computing, IEEE Internet of Things Journal, Springer Wireless Networks.
- Jacques Noyé: PeerJ Computer Science.
11.1.4 Invited talks
- T. Ledoux gave a talk at the Technoférence of the Pôle Images & Réseaux, « Voyage dans le XaaS », June 2020.
11.1.5 Scientific expertise
- T. Ledoux contributed for the Allistene alliance to the national inter-alliances seminar "Energie décarbonée, changement climatique, santé environnementale et biodiversité : les impacts de nos choix sur les nouvelles voies de recherche interdisciplinaires", Dec. 2020.
- A. Lebre contributed to the Allistene working group dedicated to the definition of the Infrastructure roadmap (“Instrument de recherche”), Dec 2020.
11.1.6 Research administration
- A. Lebre has been a member of the executive committee of the GdR CNRS RSD “Réseaux et Systèmes distribués” and co-leader of its transversal action “Virtualization and Clouds” between 2015 and November 2020.
- A. Lebre is a member of the executive and architect committees of the Grid’5000 GIS (Groupement d’intérêt scientifique).
- T. Ledoux has been vice-head of the cross-cutting theme "Gestion de l’énergie et maîtrise des impacts environnementaux" at LS2N.
- Jean-Marc Menaud has been head of the scientific theme "Science du logiciel et systèmes distribués" at LS2N.
11.1.7 Member of the Selection Committee for Associate Professors
- H. Coullon was a member of a Selection Committee (CS) at the University of Rennes 1.
- H. Coullon was a member of a Selection Committee (CS) at the University Nice Sophia Antipolis.
- T. Ledoux was a member of a Selection Committee (CS) at INSA Rennes.
11.2 Teaching - Supervision - Juries
- S. Ibrahim is the co-coordinator of the international Master's program in Cloud Computing and Services at University of Rennes 1.
- T. Ledoux is the head of the apprenticeship program in Software Engineering (FIL) (www.imt-atlantique.fr/formation/ingenieur-par-apprentissage/ingenieurs-specialite-ingenierie-logicielle). This 3-year program leads to the award of a Master degree in Software Engineering from IMT Atlantique.
- H. Coullon is responsible for the Computer Science domain of the new apprenticeship program in Industry 4.0 (FIT) at IMT Atlantique. This 3-year program leads to a Master's degree in Industry 4.0 from IMT Atlantique.
- PhD: Marie Delavergne, director: A. Lebre.
- PhD: David Espinel, director: A. Lebre.
- PhD: Dimitri Saingre, advisor: T. Ledoux, director: J-M. Menaud.
- PhD: Maverick Chardet, advisor: H. Coullon, director: C. Perez (Avalon).
- PhD: Emile Cadorel, advisor: H. Coullon, director: J-M. Menaud.
- PhD: Jolan Philippe, advisors: H. Coullon, M. Tisi (NaoMod), director: G. Sunye (NaoMod).
- PhD: Yewan Wang, advisor: J.-M. Menaud.
- PhD: Maxime Belair, advisor: J.-M. Menaud.
- PhD: Fatima-Zahra Boujdad, advisor: M. Südholt.
- PhD: Wilmer Garzon, advisor: M. Südholt.
- PhD: Sirine Sayadi, advisor: M. Südholt.
- Postdoc: David Guyon, advisor: S. Ibrahim, until September 2020.
- Postdoc: Thomas Lambert, advisor: S. Ibrahim, until October 2020.
- Postdoc: Jonathan Pastor, advisor: J.-M. Menaud, until July 2020.
- Postdoc: Rémy Pottier, advisor: J.-M. Menaud.
- Postdoc: Simon Robillard, advisor: H. Coullon.
- Postdoc: Hamza Sahli, advisor: T. Ledoux, until April 2020.
- Postdoc: Alexandre Van Kempen, advisor: A. Lebre, until November 2020.
- Engineer: Ronan-Alexandre Cherrueau, advisor: A. Lebre.
- Engineer: Brice Nédelec, advisor: T. Ledoux.
- Engineer: Charlène Servantie, advisor: H. Coullon, until June 2020.
- Engineer: Jad Darrous, advisor: S. Ibrahim, until September 2020.
- H. Coullon was a member of the PhD committee of Salwa Swaf, "Formal Methods meet Security in a Cost Aware Cloud Brokerage Solution", Univ. Orléans, Dec. 15, 2020.
- S. Ibrahim was a reviewer of the PhD thesis of Muhammad Abdullah, "Performance-aware Cloud Resource Management for Microservices-based Applications", Punjab University, Pakistan, Sep. 2020.
- S. Ibrahim was a member of the PhD committee of Arif Ahmed, "Efficient Cloud Application Deployment in Distributed Fog Infrastructures", Université de Rennes 1, France, Jan. 20, 2020.
- A. Lebre was a member of the PhD committee of Adrien Faure, "Advanced Simulation for Resource Management", Grenoble Alpes University, Dec. 2, 2020.
- A. Lebre was a reviewer of the PhD committee of Arthur Chevalier, "Optimisation du placement des licences logicielles dans le Cloud pour un déploiement économique et efficient", Ecole Normale Supérieure de Lyon, Nov. 24, 2020.
- A. Lebre was a reviewer of the PhD committee of Bruno Donassolo, "Orchestration des applications IoT dans le Fog", Grenoble Alpes University, Nov. 4, 2020.
- A. Lebre was a member of the PhD committee of Jean Emile Dartois, "Leveraging Cloud unused heterogeneous resources for applications with SLA guarantees", Rennes 1 University, Sept. 4, 2020.
- A. Lebre was a member of the PhD committee of Gregoire Todeschi, "Optimisation des caches de fichiers dans les environnements virtualisés", Institut National Polytechnique de Toulouse (France), June 8, 2020.
- T. Ledoux was a member of the PhD committee of Francis Laniel, “Vers une consolidation mémoire pour les conteneurs grâce à un retour applicatif”, Sorbonne Univ., Nov. 9, 2020.
- T. Ledoux was a reviewer of the PhD committee of Neil Ayeb, “Autonomic and decentralized device management for the Internet of Things”, Univ. Grenoble Alpes, Nov. 25, 2020.
- J.-M. Menaud was a reviewer of the PhD committee of Mathieu Bacou, “Performance et gestion de ressources dans un cloud multi-virtualisé”, Univ. Toulouse, May 12, 2020.
- J.-M. Menaud was a reviewer of the PhD committee of Marc Platini, “Apprentissage machine appliqué à l'analyse et à la prédiction des défaillances dans les systèmes HPC”, Univ. Grenoble, May 20, 2020.
11.3.1 Internal or external Inria responsibilities
- A. Lebre is a co-director of the <I/O> Lab, a joint lab between Inria and Orange Labs.
- A. Lebre is a member of the scientific committee of the joint lab between Inria and Nokia Bell Labs.
11.3.2 Articles and contents
- A. Lebre is a co-author of the second white paper of the Edge Computing working group of the OpenStack Foundation [61].
- T. Ledoux has been head of the "Filière informatique nantaise" since Sept. 2020. This entity, created by the University of Nantes, Ecole Centrale de Nantes and IMT Atlantique, aims to bring together the main players in IT training in Nantes to ensure a coherent and ambitious training offer that meets the present and future challenges of the IT discipline. It is organised around a council made up of representatives from the academic and socio-economic worlds.
12 Scientific production
12.1 Major publications
- 1. 'An overview of service placement problem in Fog and Edge Computing'. ACM Computing Surveys 53(3), June 2020, Article 65, 35 pages.
- 2. 'Online Multi-User Workflow Scheduling Algorithm for Fairness and Energy Optimization'. CCGrid 2020: 20th International Symposium on Cluster, Cloud and Internet Computing, Melbourne, Australia, November 2020.
- 3. 'Predictable Efficiency for Reconfiguration of Service-Oriented Systems with Concerto'. CCGrid 2020: 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, Melbourne, Australia, IEEE, May 2020.
- 4. 'Multi-site Connectivity for Edge Infrastructures DIMINET: DIstributed Module for Inter-site NETworking'. CCGRID 2020 - 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, IEEE and The University of Melbourne, Melbourne, Australia, May 2020, 1-10.
12.2 Publications of the year
International peer-reviewed conferences
Conferences without proceedings
Scientific book chapters
Doctoral dissertations and habilitation theses
Reports & preprints
12.3 Cited publications
- 29. 'Akamai Cloudlets'. 2018 (accessed 2018-03-08). URL: http://cloudlets.akamai.com
- 30. 'Amazon Lambda@Edge'. 2018 (accessed 2018-03-08). URL: https://aws.amazon.com/lambda/edge/
- 31. 'Facilitating Greener IT through Green Specifications'. IEEE Software 31(3), May 2014, 56-63. URL: http://dx.doi.org/10.1109/MS.2014.19
- 32. 'GCM: a grid extension to Fractal for autonomous distributed components'. Annals of Telecommunications 64(1-2), 2009, 5-24. URL: http://dx.doi.org/10.1007/s12243-008-0068-8
- 33. 'Programming distributed and adaptable autonomous components--the GCM/ProActive framework'. Software: Practice and Experience, May 2014.
- 34. 'Side-Channels Beyond the Cloud Edge: New Isolation Threats and Solutions'. IEEE International Conference on Cyber Security in Networking (CSNet) 2017, Rio de Janeiro, Brazil, October 2017.
- 35. 'Component-based architecture: the Fractal initiative'. Annals of Telecommunications 64(1), February 2009, 1-4. URL: https://doi.org/10.1007/s12243-009-0086-1
- 36. 'Automatic Exploration of Datacenter Performance Regimes'. Proceedings of the 1st Workshop on Automated Control for Datacenters and Clouds (ACDC '09), Barcelona, Spain, ACM, 2009, 1-6. URL: http://doi.acm.org/10.1145/1555271.1555273
- 37. 'Fog computing and its role in the internet of things'. Proceedings of the First Edition of the MCC Workshop on Mobile Cloud Computing, ACM, 2012, 13-16.
- 38. 'Constructive Privacy for Shared Genetic Data'. CLOSER 2018 - 8th International Conference on Cloud Computing and Services Science, Funchal, Madeira, Portugal, March 2018, 1-8.
- 39. 'CPL: A Core Language for Cloud Computing'. 2016, 94-105. URL: http://doi.acm.org/10.1145/2889443.2889452
- 40. 'Fogbow: A Middleware for the Federation of IaaS Clouds'. The 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), IEEE, 2016, 531-534.
- 41. 'A Model-based Architecture for Autonomic and Heterogeneous Cloud Systems'. CLOSER 2018 - 8th International Conference on Cloud Computing and Services Science, vol. 1 (Best Paper Award), Funchal, Portugal, March 2018, 201-212.
- 42. 'An Open Component Model and Its Support in Java'. Component-Based Software Engineering, Springer Berlin Heidelberg, 2004, 7-22.
- 44. 'Towards Hierarchical Autonomous Control for Elastic Data Stream Processing in the Fog'. Euro-Par 2017: Parallel Processing Workshops, Santiago de Compostela, Spain, August 28-29, 2017, Revised Selected Papers, vol. 10659, Springer, 2018, 106-117.
- 45. 'Combined Encryption and Watermarking Approaches for Scalable Multimedia Coding'. In: K. Aizawa, Y. Nakamura, S. Satoh (eds.), Advances in Multimedia Information Processing - PCM 2004: 5th Pacific Rim Conference on Multimedia, Tokyo, Japan, November 30 - December 3, 2004, Proceedings, Part III, Springer Berlin Heidelberg, 2005, 356-363.
- 46. 'A Language for the Composition of Privacy-Enforcement Techniques'. IEEE RATSP 2015, The 2015 IEEE International Symposium on Recent Advances of Trust, Security and Privacy in Computing and Communications, Helsinki, Finland, August 2015.
- 47. 'Edge Computing Resource Management System: a Critical Building Block! Initiating the debate via OpenStack'. The USENIX Workshop on Hot Topics in Edge Computing (HotEdge '18), July 2018.
- 48. 'Exploring Energy-Consistency Trade-Offs in Cassandra Cloud Storage System'. 27th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), October 2015, 146-153.
- 49. 'An Object Store Service for a Fog/Edge Computing Infrastructure based on IPFS and Scale-out NAS'. 1st IEEE International Conference on Fog and Edge Computing (ICFEC 2017), 2017.
- 50. 'Spanner: Google's globally distributed database'. ACM Transactions on Computer Systems (TOCS) 31(3), 2013, 8.
- 51. 'Aeolus: A component model for the cloud'. Information and Computation 239 (Supplement C), 2014, 100-121. URL: http://www.sciencedirect.com/science/article/pii/S0890540114001424
- 52. 'Extensibility and Composability of a Multi-Stencil Domain Specific Framework'. International Journal of Parallel Programming, November 2017. URL: https://doi.org/10.1007/s10766-017-0539-5
- 53. 'Virtual Machine Placement for Hybrid Cloud using Constraint Programming'. ICPADS 2017, 2017.
- 54. 'Production Deployment Tools for IaaSes: an Overall Model and Survey'. IEEE International Conference on Future Internet of Things and Cloud (FiCloud) 2017, Prague, Czech Republic, August 2017.
- 55. 'Comparative Experimental Analysis of the Quality-of-Service and Energy-Efficiency of VMs and Containers' Consolidation for Cloud Applications'. SoftCOM: International Conference on Software, Telecommunications and Computer Networks, 2017.
- 56. 'FPath and FScript: Language support for navigation and reliable reconfiguration of Fractal architectures'. Annals of Telecommunications 64(1), February 2009, 45-63. URL: https://doi.org/10.1007/s12243-008-0073-y
- 57. 'A framework for the coordination of multiple autonomic managers in cloud environments'. 2013 IEEE 7th International Conference on Self-Adaptive and Self-Organizing Systems (SASO), IEEE, 2013, 179-188.
- 58. 'An updated performance comparison of virtual machines and linux containers'. 2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), IEEE, 2015, 171-172.
- 59. 'Deploying on the Grid with DeployWare'. Eighth IEEE International Symposium on Cluster Computing and the Grid, France, May 2008, 177-184.
- 60. 'Cloud Edge Computing: Beyond the Data Center (White Paper)'. January 2018 (accessed 2020-02-08). URL: https://www.openstack.org/assets/edge/OpenStack-EdgeWhitepaper-v3-online.pdf
- 61. 'Edge Computing: Next Steps in Architecture, Design and Testing'. January 2020 (accessed 2020-02-08). URL: https://www.openstack.org/edge-computing/edge-computing-next-steps-in-architecture-design-and-testing
- 62. 'Open Network Automation Platform'. 2018 (accessed 2018-03-08). URL: https://www.onap.org/
- 63. 'Edge-centric Computing: Vision and Challenges'. SIGCOMM Comput. Commun. Rev. 45(5), September 2015, 37-42. URL: http://doi.acm.org/10.1145/2831347.2831354
- 64. 'Parasol and GreenSwitch: Managing Datacenters Powered by Renewable Energy'. SIGARCH Comput. Archit. News 41(1), March 2013, 51-64.
- 65. 'GreenHadoop: Leveraging Green Energy in Data-processing Frameworks'. Proceedings of the 7th ACM European Conference on Computer Systems (EuroSys '12), Bern, Switzerland, ACM, 2012, 57-70. URL: http://doi.acm.org/10.1145/2168836.2168843
- 66. 'Geographical Load Balancing for Online Service Applications in Distributed Datacenters'. IEEE International Conference on Cloud Computing (CLOUD 2013), 2013.
- 67. 'Optimal Task Placement with QoS Constraints in Geo-Distributed Data Centers Using DVFS'. IEEE Transactions on Computers 64(7), July 2015, 2049-2059. URL: http://dx.doi.org/10.1109/TC.2014.2349510
- 68. 'You Can Teach Elephants to Dance: Agile VM Handoff for Edge Computing'. Proceedings of the Second ACM/IEEE Symposium on Edge Computing (SEC '17), San Jose, California, ACM, 2017. URL: http://doi.acm.org/10.1145/3132211.3134453
- 69. 'Network function virtualization: Challenges and opportunities for innovations'. IEEE Communications Magazine 53(2), February 2015, 90-97. URL: http://dx.doi.org/10.1109/MCOM.2015.7045396
- 70. 'Investigating Energy Consumption and Performance Trade-Off for Interactive Cloud Application'. IEEE Transactions on Sustainable Computing 2(2), April 2017, 113-126. URL: http://dx.doi.org/10.1109/TSUSC.2017.2714959
- 71. 'Btrplace: A flexible consolidation manager for highly available applications'. IEEE Transactions on Dependable and Secure Computing 10(5), 2013, 273-286.
- 72. 'Cluster-wide Context Switch of Virtualized Jobs'. Proceedings of the Virtualization Technologies in Distributed Computing Workshop (co-located with ACM HPDC '10), Chicago, Illinois, ACM, 2010, 658-666. URL: http://doi.acm.org/10.1145/1851476.1851574
- 73. 'Entropy: a consolidation manager for clusters'. Proceedings of the 2009 ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, ACM, 2009, 41-50.
- 74. 'SPL: An Extensible Language for Distributed Stream Processing'. ACM Transactions on Programming Languages and Systems (TOPLAS) 39(1), March 2017. URL: http://doi.acm.org/10.1145/3039207
- 75. 'Towards Pay-As-You-Consume Cloud Computing'. 2011 IEEE International Conference on Services Computing, July 2011, 370-377. URL: http://dx.doi.org/10.1109/SCC.2011.38
- 76. 'GenInfoGuard: A Robust and Distortion-Free Watermarking Technique for Genetic Data'. PLOS ONE 10(2), February 2015, 1-22. URL: https://doi.org/10.1371/journal.pone.0117717
- 77. 'Resource Management in Clouds: Survey and Research Challenges'. Journal of Network and Systems Management 23(3), July 2015, 567-619. URL: https://doi.org/10.1007/s10922-014-9307-7
- 78. 'Energy cost optimization for geographically distributed heterogeneous data centers'. 2015 Sixth International Green and Sustainable Computing Conference (IGSC), December 2015, 1-6. URL: http://dx.doi.org/10.1109/IGCC.2015.7393677
- 79. 'Learning Spark'. O'Reilly Media, February 2015.
- 80. 'The evolving philosophers problem: dynamic change management'. IEEE Transactions on Software Engineering 16(11), November 1990, 1293-1306. URL: http://dx.doi.org/10.1109/32.60317
- 81. 'Revising OpenStack to Operate Fog/Edge Computing Infrastructures'. The IEEE International Conference on Cloud Engineering (IC2E), April 2017, 138-148. URL: http://dx.doi.org/10.1109/IC2E.2017.35
- 82. 'Pregel: A System for Large-scale Graph Processing'. Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data (SIGMOD '10), Indianapolis, Indiana, USA, ACM, 2010, 135-146. URL: http://doi.acm.org/10.1145/1807167.1807184
- 83. 'Virtual Machine Boot Time Model'. 2017 25th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP), IEEE, 2017, 430-437.
- 84. 'Load-based covert channels between Xen virtual machines'. Proceedings of the 2010 ACM Symposium on Applied Computing, ACM, 2010, 173-180.
- 85. 'Runtime Software Adaptation: Framework, Approaches, and Styles'. Companion of the 30th International Conference on Software Engineering (ICSE Companion '08), Leipzig, Germany, ACM, 2008, 899-910. URL: http://doi.acm.org/10.1145/1370175.1370181
- 86. 'On Understanding the Energy Impact of Speculative Execution in Hadoop'. 2015 IEEE International Conference on Data Science and Data Intensive Systems, December 2015, 396-403.
- 87. 'Cooperative and reactive scheduling in large-scale virtualized platforms with DVMS'. Concurrency and Computation: Practice and Experience 25(12), 2013, 1643-1655.
- 88. 'Mobile-Edge Computing Architecture: The role of MEC in the Internet of Things'. IEEE Consumer Electronics Magazine 5(4), October 2016, 84-91. URL: http://dx.doi.org/10.1109/MCE.2016.2590118
- 89. 'SLA guarantees for cloud services'. Future Generation Computer Systems 54 (Supplement C), 2016, 233-246. URL: http://www.sciencedirect.com/science/article/pii/S0167739X15000801
- 90. 'Exploiting Geo-Distributed Clouds for a E-Health Monitoring System With Minimum Service Delay and Privacy Preservation'. IEEE Journal of Biomedical and Health Informatics 18(2), March 2014, 430-439. URL: http://dx.doi.org/10.1109/JBHI.2013.2292829
- 91. 'Energy-Efficient Data Centers'. IEEE Internet Computing 21(4), 2017, 6-7. URL: http://dx.doi.org/10.1109/MIC.2017.2911429
- 92. 'Components as Location Graphs'. Formal Aspects of Component Software - 11th International Symposium, FACS 2014, Bertinoro, Italy, September 10-12, 2014, Revised Selected Papers, 2014, 3-23. URL: https://doi.org/10.1007/978-3-319-15317-9_1
- 93. 'Component Software: Beyond Object-Oriented Programming'. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2002.
- 94. 'Characterizing Performance and Energy-Efficiency of the RAMCloud Storage System'. 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), June 2017, 1488-1498. URL: http://dx.doi.org/10.1109/ICDCS.2017.51
- 95. 'Data Warehousing and Analytics Infrastructure at Facebook'. Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data (SIGMOD '10), Indianapolis, Indiana, USA, ACM Press, 2010, 1013-1020. URL: http://doi.acm.org/10.1145/1807167.1807278
- 96. 'Hadoop: The Definitive Guide'. O'Reilly Media, April 2015.
- 97. 'Opportunities and challenges for data center demand response'. International Green Computing Conference, November 2014, 1-10. URL: http://dx.doi.org/10.1109/IGCC.2014.7039172
- 98. 'Security implications of memory deduplication in a virtualized environment'. 2013 43rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), IEEE, 2013, 1-12.
- 99. 'Managing performance overhead of virtual machines in cloud computing: A survey, state of the art, and future directions'. Proceedings of the IEEE 102(1), 2014, 11-31.
- 100. 'On the Root Causes of Cross-Application I/O Interference in HPC Storage Systems'. 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2016, 750-759. URL: http://dx.doi.org/10.1109/IPDPS.2016.50
- 101. 'Eley: On the Effectiveness of Burst Buffers for Big Data Processing in HPC Systems'. 2017 IEEE International Conference on Cluster Computing (CLUSTER), September 2017, 87-91. URL: http://dx.doi.org/10.1109/CLUSTER.2017.73
- 102. 'Discretized Streams: Fault-tolerant Streaming Computation at Scale'. Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (SOSP '13), Farmington, Pennsylvania, ACM, 2013, 423-438. URL: http://doi.acm.org/10.1145/2517349.2522737