## Section: New Results

### Efficient Queries and Compact Data Structures

#### Planar Spanners of Geometric Graphs

Participants : Nicolas Bonichon, Cyril Gavoille, Nicolas Hanusse, David Ilcinkas.

Θk-graphs are geometric graphs that appear in the context of graph navigation. The shortest-path metric of these graphs is known to approximate the Euclidean complete graph up to a factor depending on the cone number k and the dimension of the space.

TD-Delaunay graphs, a.k.a. triangular-distance Delaunay triangulations introduced by Chew, have been shown to be plane 2-spanners of the 2D Euclidean complete graph, i.e., the distance in the TD-Delaunay graph between any two points is no more than twice the distance in the plane.
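The notion of a t-spanner used here can be checked numerically on small instances. The following is a minimal sketch, not taken from the original work: the function name `stretch_factor` and the input format (a list of distinct points and a list of index pairs) are assumptions made for illustration. It computes the largest ratio between the distance in the graph and the Euclidean distance over all pairs of points.

```python
import math
from itertools import combinations

def stretch_factor(points, edges):
    """Largest ratio (graph distance / Euclidean distance) over all point pairs.

    Assumes the points are pairwise distinct and the graph is connected.
    """
    n = len(points)
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
    for u, v in edges:
        w = math.dist(points[u], points[v])  # edge weight = Euclidean length
        dist[u][v] = dist[v][u] = min(dist[u][v], w)
    # Floyd-Warshall all-pairs shortest paths (cubic time, fine for small inputs)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return max(dist[i][j] / math.dist(points[i], points[j])
               for i, j in combinations(range(n), 2))

# Unit square with only its four sides: opposite corners are at graph
# distance 2 and Euclidean distance sqrt(2).
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
sides = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(stretch_factor(square, sides))
```

A graph is a 2-spanner of the point set exactly when this value is at most 2.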

Orthogonal surfaces are geometric objects defined from independent sets of points of the Euclidean space. Orthogonal surfaces are well studied in combinatorics (orders, integer programming) and in algebra. From orthogonal surfaces, geometric graphs called geodesic embeddings can be built.

We have introduced a specific subgraph of the Θ6-graph defined in the 2D Euclidean space, namely the half-Θ6-graph, composed of the even-cone edges of the Θ6-graph. Our main contribution is to show that these graphs are exactly the TD-Delaunay graphs, and are strongly connected to the geodesic embeddings of orthogonal surfaces of coplanar points in the 3D Euclidean space.
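The cone-based construction described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' code: the function name `theta_graph`, the cone numbering (cone c covers directions from c·2π/k to (c+1)·2π/k, counter-clockwise from the positive x-axis) and the tie-breaking are choices made here. In each selected cone of each point, the point whose projection onto the cone bisector is nearest is kept; restricting a 6-cone graph to the even cones gives the even-cone subgraph.

```python
import math

def theta_graph(points, k=6, cones=None):
    """Directed cone-based edges (i, j) between point indices.

    For every point and every selected cone, keep the point of that cone
    whose orthogonal projection on the cone bisector is closest.
    The cone numbering is an assumption of this sketch.
    """
    if cones is None:
        cones = range(k)
    width = 2 * math.pi / k
    edges = set()
    for i, (px, py) in enumerate(points):
        best = {}  # cone index -> (projection length, point index)
        for j, (qx, qy) in enumerate(points):
            if i == j:
                continue
            angle = math.atan2(qy - py, qx - px) % (2 * math.pi)
            c = int(angle // width) % k
            if c not in cones:
                continue
            bisector = (c + 0.5) * width
            proj = (qx - px) * math.cos(bisector) + (qy - py) * math.sin(bisector)
            if c not in best or proj < best[c][0]:
                best[c] = (proj, j)
        for _, j in best.values():
            edges.add((i, j))
    return edges

pts = [(0, 0), (1, 0.5), (2, 1.2)]
print(sorted(theta_graph(pts, cones={0, 2, 4})))  # [(0, 1), (1, 2)]
```

Calling `theta_graph(pts)` with all six cones also returns the reverse arcs contributed by the odd cones.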

Using these new bridges between these three fields, we establish:

• Every Θ6-graph is the union of two spanning TD-Delaunay graphs. In particular, Θ6-graphs are 2-spanners of the Euclidean graph. It was not known that Θ6-graphs are t-spanners for some constant t, and Θ7-graphs were only known to be t-spanners for t ≈ 7.562.

• Every plane triangulation is TD-Delaunay realizable, i.e., every combinatorial plane graph for which all its interior faces are triangles is the TD-Delaunay graph of some point set in the plane. Such realizability property does not hold for classical Delaunay triangulations.

In collaboration with Ljubomir Perković, we have also worked on the question of bounded-degree planar spanners: what is the minimum δ such that there exists a planar spanner of degree at most δ for any point set? We have proposed an algorithm that computes a 6-spanner of degree at most 6. The best previously known bound on the maximum degree of a planar spanner was 14, with a stretch factor of 3.53.

#### Compression and Short Data Structures

##### Routing with Short Tables

Participants : Cyril Gavoille, Nicolas Hanusse, David Ilcinkas.

There are several techniques to manage sub-linear size routing tables (in the number of nodes of the platform) while guaranteeing almost shortest paths (cf.   for a survey of routing techniques).

Some techniques provide routes of length at most 1 + ε times the length of the shortest one (which is the definition of a stretch factor of 1 + ε) while maintaining a poly-logarithmic number of entries per routing table   ,  ,  . However, these techniques are not universal in the sense that they apply only to some classes of underlying topologies. Universal schemes exist. Typically they achieve Õ(√n)-entry local routing tables for a stretch factor of 3 in the worst case   ,  . Some experiments have shown that such methods, although universal, work very well in practice, on average, on realistic scale-free or existing topologies   .

While the fundamental question is to determine the best stretch-space trade-off for universal schemes, the challenge for platform routing would be to design specific schemes supporting reasonable dynamic changes in the topology or in the metric, at least for a limited class of relevant topologies. In this direction  have constructed (in polynomial time) network topologies for which nodes can be labeled once such that, however the link weights vary over time, shortest-path routing tables with compactness k can be designed, i.e., for each routing table the set of destinations using the same first outgoing edge can be grouped in at most k ranges of consecutive labels.
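The compactness of a single routing table, as defined above, can be computed directly. The sketch below is illustrative only: the function name `compactness` and the table format (a dictionary from integer destination labels to an outgoing-edge identifier) are assumptions, and ranges are taken as linear rather than cyclic, which is a simplification.

```python
def compactness(table):
    """Max, over outgoing edges, of the number of maximal runs of
    consecutive destination labels routed through that edge.

    `table` maps an integer destination label to a first outgoing edge.
    Runs are linear (no wrap-around), an assumption of this sketch.
    """
    runs = {}
    prev_label, prev_edge = None, None
    for label in sorted(table):
        edge = table[label]
        continues_run = (prev_label is not None
                         and prev_edge == edge
                         and label == prev_label + 1)
        if not continues_run:
            runs[edge] = runs.get(edge, 0) + 1
        prev_label, prev_edge = label, edge
    return max(runs.values(), default=0)

# Destinations 1-2 and 4-5 leave through edge "a", destination 3 through "b":
# edge "a" needs two ranges, so this table has compactness 2.
print(compactness({1: "a", 2: "a", 3: "b", 4: "a", 5: "a"}))  # 2
```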

Another aspect of the problem would be to model a realistic typical platform topology. Natural parameters (or characteristics) for this are its low dimensionality: low Euclidean or near-Euclidean networks, low growth dimension, or more generally, low doubling dimension.

In 2007, we improved compact routing schemes for planar networks, and more generally for networks excluding a fixed minor   . This latter family of networks includes (but is not restricted to) networks embeddable on surfaces of bounded genus and networks of bounded treewidth. The stretch factor of our scheme is constant and the size of each routing table is only polylogarithmic (independently of the degree of the nodes), and the scheme does not require renaming (or a new addressing) of the nodes: it is name-independent. More importantly, the scheme can be constructed efficiently in polynomial time, and the complexities do not hide large constants as we may encounter in Graph Minor Theory. This construction has been achieved by the design of a new sparse cover for planar graphs, solving a problem open since STOC '93.

In   , we have shown that routing in outerplanar networks can be done along the shortest paths with O(log n)-bit labels, where n is the number of nodes in the network, extending a result of Fraigniaud et al. obtained for trees. The solution can actually be generalized to k-cellular networks, a k-cellular network being roughly a network that is the union of k outerplanar networks. It is worth mentioning that such a scheme can be constructed in quadratic time.

In 2007, we also gave an invited lecture on compact routing schemes   at a workshop on Peer-to-Peer, Routing in Complex Graphs, and Network Coding in Thomson Labs in Paris.

In 2008, we proposed a minimum-stretch compact name-independent routing scheme   . This scheme is the basis of the Compact Routing Simulator we are developing in the Alcatel-Lucent Bell project.

##### Succinct Representation of Underlying Topologies

In order to optimize applications, the platform topology itself must be discovered, and thus represented in memory with some data structures. The size of the representation is an important parameter, for instance, in order to optimize the throughput during the exploration phase of the platform.

Classical data structures for representing a graph (matrix or list) can be significantly improved when the targeted graph falls in some specific classes or obeys some properties: the graph has bounded genus (embeddable on a surface of fixed genus), bounded tree-width (or c-decomposable), or bounded page number   ,  . Typically, planar topologies with n nodes (thus embeddable on the plane with no edge crossings) can be efficiently coded in linear time with at most 5n + o(n) bits supporting adjacency queries in constant time. This improves on the classical adjacency list by a non-negligible log n factor in size (the size is about 6n log n bits for an edge list), and also in query time   ,  ,  .
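To give a sense of the gap between the two sizes quoted above, here is a worked instance. The choice n = 2^20, the use of base-2 logarithms and the omission of the o(n) term are assumptions made only for this illustration.

```latex
\begin{align*}
\text{edge list:}\quad & 6n\log_2 n = 6 \cdot 2^{20} \cdot 20 = 125\,829\,120 \text{ bits},\\
\text{compact code:}\quad & 5n = 5 \cdot 2^{20} = 5\,242\,880 \text{ bits},\\
\text{ratio:}\quad & \frac{6n\log_2 n}{5n} = \frac{6}{5}\log_2 n = 24.
\end{align*}
```

The ratio grows with log n, so the saving becomes larger on larger platforms.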

In 2008, we gave a compact encoding scheme of pagenumber k graphs   .

##### Local Data Structures and Other Queries

The basic routing scheme and the overlay networks must also allow us to handle application-driven queries other than routing. Typically, divide-and-conquer parallel algorithms require computing many nearest common ancestor (NCA) queries in some tree decomposition. In a large-scale platform, if the current tree structure is fully or partially distributed, then the physical location of the NCA in the platform must be optimized. More precisely, the NCA computation must be performed from distributed pieces of information, and then addressed via the routing overlay network (cf.   for distributed NCA algorithms).

Recently, a theory of localized data structures has been developed (initiated by   ; see   for a survey). One associates with each node a label such that some given function (or predicate) of the nodes can be extracted from two or more labels. These labels are usually joined to the addresses or inserted into a global database index.

In relation to the project, queries involving the flow computation between any source-target pair of a capacitated network are of great interest   . Dynamic labeling schemes are also available for tree models   ,  , and need further work for their adaptation to more general topologies.

Finally, localized data structures have applications to platforms hosting large XML databases. Roughly speaking, pieces of a large XML file are distributed across some platform, and some queries (typically some SELECT ... FROM extractions) involve many tree ancestor queries   , the XML file structure being a tree. In this framework, distributed label-based data structures avoid the storage of a huge classical index database.
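A classical way to answer tree ancestor queries from labels alone is to give each node the interval of depth-first numbers covered by its subtree. The sketch below shows this folklore interval scheme (two numbers per node), not the shorter labels discussed in this report; the function names and the dictionary-based tree format are assumptions made for illustration.

```python
def ancestry_labels(children, root):
    """Label each node with (first, last): its DFS number and the largest
    DFS number found in its subtree."""
    labels, first, counter = {}, {}, 0
    stack = [(root, False)]
    while stack:
        node, finished = stack.pop()
        if finished:
            labels[node] = (first[node], counter - 1)
        else:
            first[node] = counter
            counter += 1
            stack.append((node, True))
            for child in reversed(children.get(node, [])):
                stack.append((child, False))
    return labels

def is_ancestor(label_u, label_v):
    """True if u is an ancestor of v (or u == v), using only the two labels."""
    return label_u[0] <= label_v[0] <= label_u[1]

# A toy XML-like tree: doc -> head, body; body -> p, ul; ul -> li
tree = {"doc": ["head", "body"], "body": ["p", "ul"], "ul": ["li"]}
labels = ancestry_labels(tree, "doc")
print(is_ancestor(labels["body"], labels["li"]))  # True
print(is_ancestor(labels["head"], labels["li"]))  # False
```

The query needs no access to the tree itself, which is the property that makes such labels usable when the pieces of the file are distributed.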

In 2007, we proved that it is possible to assign to each node of an n-node planar network a label of 2 log n + O(log log n) bits so that adjacency between two nodes can be retrieved from their labels   . Classical representations of planar graphs in the distributed setting were based on the Three Schnyder Trees decomposition, leading to 3 log n + O(log* n)-bit labels (FOCS '01). An intriguing question is whether a c log n-bit representation exists for planar graphs with c < 2.
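To illustrate what retrieving adjacency from labels means, here is the folklore scheme for rooted trees, in which a label is the pair (own identifier, parent identifier), about 2 log n bits. It is a much simpler setting than the planar scheme above and is shown only as an illustration; the function names and the parent-map input are assumptions.

```python
def tree_adjacency_labels(parent):
    """`parent` maps each node id to its parent id (the root maps to itself).
    The label of a node is the pair (own id, parent id)."""
    return {v: (v, p) for v, p in parent.items()}

def adjacent(label_u, label_v):
    """Decide adjacency in the tree from the two labels only."""
    (u, parent_u), (v, parent_v) = label_u, label_v
    return u != v and (parent_u == v or parent_v == u)

# Tree with root 0, children 1 and 2 of the root, and node 3 below node 1.
labels = tree_adjacency_labels({0: 0, 1: 0, 2: 0, 3: 1})
print(adjacent(labels[3], labels[1]))  # True
print(adjacent(labels[1], labels[2]))  # False (siblings)
```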

For trees, we can solve k-ancestry and distance-k queries with shorter labels   ,  . Previous solutions achieve log n + O(k² log log n)-bit labels [Alstrup-Bille-Rauhe 2005], whereas we have proved that log n + O(k log log n)-bit labels suffice. For interval graphs, we have given an optimal distance labeling scheme   , and we proposed a localized and compact data structure for comparability graphs   .

In   ,  ,  , we also analyzed the locality of the construction of sparse spanners. In   , we proposed an efficient first-order model checking using short labels.

Finally, we have started a collaboration with Andrew Twigg (Thomson - Labs) and Bruno Courcelle (LaBRI) about connectivity in semi-dynamic planar networks (see preliminary results here   and here   ). In this model, one must precompute some localized data structure (given as a label associated with each node) for a planar graph G, so that connectivity between any two nodes in G \ X, where X is any subset of nodes or edges, can be determined from the labels of the two nodes and the labels of the nodes (or endpoints of edges) of X. This field looks promising since it captures a kind of dynamicity of the network, and we hope to generalize this model and our results.

##### Distributed Greedy Coloring

Participants : Cyril Gavoille, Ralf Klasing, Adrian Kosowski.

Distributed Greedy Coloring is an interesting and intuitive variation of the standard Coloring problem. Given an order among the colors, a coloring is said to be greedy if there does not exist a vertex for which its associated color can be replaced by a color of lower position in the fixed order without violating the property that neighboring vertices must receive different colors. In  , we consider the problems of Greedy Coloring and Largest First Coloring (a variant of greedy coloring with strengthened constraints) in the Linial model of distributed computation, providing lower and upper bounds and a comparison to the (Δ + 1)-Coloring and Maximal Independent Set problems, with Δ being the maximum vertex degree in G.
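The definition of a greedy coloring can be turned into a short checker. The sketch below is centralized and only illustrates the definition; it is not the distributed algorithm studied in the Linial model. Colors are assumed to be the integers 0, 1, 2, ... in their natural order, and the function names are chosen here for illustration.

```python
def is_greedy(adj, color):
    """True if `color` is a proper coloring in which no vertex can be
    recolored with a smaller color while staying proper."""
    for v, neighbors in adj.items():
        used = {color[u] for u in neighbors}
        if color[v] in used:
            return False  # two neighbors share a color
        if any(c not in used for c in range(color[v])):
            return False  # v could take a smaller color
    return True

def sequential_greedy(adj, order):
    """Color vertices one by one with the smallest color unused by
    already colored neighbors; the result is always a greedy coloring."""
    color = {}
    for v in order:
        used = {color[u] for u in adj[v] if u in color}
        color[v] = next(c for c in range(len(adj) + 1) if c not in used)
    return color

path = {0: [1], 1: [0, 2], 2: [1]}
print(sequential_greedy(path, [0, 1, 2]))   # {0: 0, 1: 1, 2: 0}
print(is_greedy(path, {0: 0, 1: 1, 2: 2}))  # False: vertex 2 could use color 0
```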

##### Compression of Data Warehouse

Participants : Nicolas Hanusse, Radu Tofan.

In  , we propose a new view selection algorithm. Such an algorithm takes as input a fact table and computes a set of views to store in order to speed up queries. The performance of a view selection algorithm is usually measured by three criteria: (1) the amount of memory needed to store the selected views, (2) the query response time and (3) the time complexity of the algorithm. The first two measurements deal with the output of the algorithm. No existing solution gives a good trade-off between amount of memory and query cost with a small time complexity. We propose in this paper an algorithm guaranteeing a constant approximation factor of query response time with respect to the optimal solution. Moreover, the time complexity for a D-dimensional fact table is O(D·2^D), which corresponds to the fastest known algorithm. We provide an experimental comparison with two other well-known algorithms showing that our approach also gives good performance in terms of memory. Experiments are done in a centralized setting but our algorithm can easily be adapted to a parallel setting.
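The search space of view selection is the set of all group-by views of the fact table, one per subset of dimensions, hence 2^D candidates for D dimensions. The sketch below is a brute-force enumeration for illustration only, not the selection algorithm of the paper; the function name `view_sizes`, the tuple-based table format and the toy data are assumptions. It computes the number of rows of every candidate view, which is the memory cost a selection algorithm has to weigh against query time.

```python
from itertools import combinations

def view_sizes(fact_table, dims):
    """Row count of every group-by view over the given dimensions.

    Returns a dict mapping a tuple of dimension names to the number of
    distinct value combinations found in the fact table.
    """
    sizes = {}
    for r in range(len(dims) + 1):
        for view in combinations(range(len(dims)), r):
            groups = {tuple(row[i] for i in view) for row in fact_table}
            sizes[tuple(dims[i] for i in view)] = len(groups)
    return sizes

facts = [("fr", "tv", 2008), ("fr", "pc", 2008),
         ("de", "tv", 2008), ("de", "tv", 2009)]
sizes = view_sizes(facts, ("country", "product", "year"))
print(len(sizes))                       # 8 views for D = 3
print(sizes[("country", "product")])    # 3
```

This enumeration rescans the table for each view, so it is only practical for tiny examples.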

We also proposed a new algorithm that allows the administrator or user of a DBMS to choose which part of the data cube to optimize. This problem is called in the literature the views selection problem . The goal consists in choosing the best part of the whole data cube to precompute. Our contribution is to consider that the main constraint is the time to answer individual queries, whereas the memory constraint is the one usually considered  .

The next step consists in turning our approach into a parallel and distributed algorithm. We are currently experimenting with a parallel algorithm with a theoretical guarantee of performance. More precisely, given a constant f, the query time is at most f times the optimal query time (defined as the query time when the result has already been computed).

##### Maximal Frequent Itemsets

It turns out that our solution can be adapted to the problem of quickly finding the maximal frequent itemsets within a transaction table  . A transaction consists of a list of items. For a given frequency, we aim at computing the maximal itemsets that are frequent in a list of transactions. To our knowledge, there is no parallel algorithm with a guarantee of performance that computes the maximal frequent itemsets. Our solution, derived from the view selection algorithm, remains to be tested on real instances.
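To make the problem statement concrete, the following brute-force sketch enumerates all itemsets, keeps the frequent ones and then filters those that have no frequent proper superset. It takes exponential time and is meant only to illustrate the definition, not the approach derived from view selection; the function name and the use of an absolute support threshold are assumptions.

```python
from itertools import combinations

def maximal_frequent_itemsets(transactions, min_support):
    """Itemsets contained in at least `min_support` transactions and
    not strictly contained in another such itemset.

    `transactions` is a list of sets of items.
    """
    items = sorted(set().union(*transactions))
    frequent = []
    for r in range(1, len(items) + 1):
        for candidate in combinations(items, r):
            itemset = frozenset(candidate)
            support = sum(itemset <= t for t in transactions)
            if support >= min_support:
                frequent.append(itemset)
    return {f for f in frequent if not any(f < g for g in frequent)}

baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "d"}]
result = maximal_frequent_itemsets(baskets, min_support=2)
print(sorted(sorted(s) for s in result))  # [['a', 'b'], ['a', 'c']]
```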

#### Distributed Algorithms

##### Overlay and Small World Networks

An overlay network is a virtual network whose nodes correspond either to processors or to resources of the network. Virtual links may depend on the application; for instance, different overlay networks can be designed for routing and broadcasting.

These overlay networks should support insertion and deletion of users/resources, and thus they inherently have a high dynamism.

We should distinguish structured and unstructured overlay networks:

• In the first case, one aims at designing a network in which queries can be answered efficiently: greedy routing should work well (without backtracking), and the spreading of a piece of information should take a very short time and few messages. The natural topologies of these networks are graphs of small diameter and bounded degree (De Bruijn graphs for instance). However, dynamic maintenance of a precise structure is difficult and any perturbation of the topology gives no guarantee for the desired tasks.

• In the case of unstructured networks, there is no strict topology control. For the information retrieval task, the only attempt to bound the total number of messages consists of optimizing a flooding by taking into account statistics stored at each peer: number of requests that found an item traversing a given link, ...

In both approaches, the physical topology is not involved. To our knowledge, there exists only one attempt in this direction. The work of Abraham and Malkhi  deals with the design of routing tables for stable platforms.

We are interested in designing overlay topologies that take into account the physical topology.

Another approach is promising. If we relax the condition of designing an overlay network with a precise topology and only require some topological properties, we might construct very efficient overlay networks. Two directions can be considered: random graphs and small-world networks.

Random graphs are promising for broadcast and have been proposed for the update of replicated databases in order to minimize the total number of messages and the time complexity   ,  . The underlying topology is the complete graph but the communication graph (pairs of nodes that effectively interact) is much sparser. At each pulse of its local clock, each node tries to send or receive any new piece of information. The advantage of this approach is fault-tolerance. However, this epidemic spreading leads to a waste of messages since any node can receive the same update many times. We are interested in fixing this drawback and we think that it should be possible.

For several queries, recent solutions use small-world networks. This approach is inspired by experiments in social sciences   . It suggests that adding a few (non-uniform) random and uncoordinated virtual long links to every node drastically shrinks the diameter of the network. Moreover, paths with a small number of hops can be found  ,  ,  .

Solutions based on network augmentation (i.e. by adding virtual links to a base network) have proved to be very promising for large-scale networks. This technique is referred to as turning a network into a small-world network, also called the small-worldization process. Indeed, it makes it possible to transform many arbitrary networks into networks in which search operations can be performed in a greedy fashion and very quickly (typically in time poly-logarithmic in the size of the network). This property implies that some information, like the distance between nodes, can be easily (or locally) accessed. More formally, a network is f-navigable if greedy routing can be used to get routing paths of O(f) hops. Recently, many authors have aimed at finding networks that can be turned into log^O(1) n-navigable networks.
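Greedy routing in an augmented network can be sketched on the simplest base network, a ring. In the sketch below, each node knows its two ring neighbors plus its own long links, and forwards to whichever of these is closest to the target in ring distance. The ring base network, the function name `greedy_route` and the link format are assumptions made for illustration; no particular augmentation scheme from the literature is implemented.

```python
def greedy_route(n, long_links, source, target):
    """Greedy routing on an n-node ring augmented with directed long links.

    `long_links` maps a node to the list of its long-link endpoints.
    Returns the list of visited nodes, source and target included.
    """
    def ring_dist(a, b):
        d = abs(a - b) % n
        return min(d, n - d)

    path = [source]
    current = source
    while current != target:
        neighbors = [(current - 1) % n, (current + 1) % n]
        neighbors += long_links.get(current, [])
        # A ring neighbor always gets strictly closer, so the loop terminates.
        current = min(neighbors, key=lambda v: ring_dist(v, target))
        path.append(current)
    return path

print(greedy_route(16, {0: [8]}, 0, 10))  # [0, 8, 9, 10]
print(greedy_route(16, {}, 0, 10))        # [0, 15, 14, 13, 12, 11, 10]
```

With the single long link the route takes 3 hops instead of 6; the number of hops of such greedy paths is the quantity bounded in the definition of f-navigability.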

Our goal is to study more precisely the algorithmic performance of these new small-world networks (w.r.t. time, memory, pertinence, fault-tolerance, self-stabilization, ...) and to propose new networks of this kind, i.e. to construct the augmentation of the base network as well as to conceive the corresponding navigation algorithm. Like classical algorithms for routing and navigation (that are essentially based on greedy algorithms), the proposed solutions have to take into account that no entity has a global knowledge of the network. A first result in this direction is promising. In  , we proposed an economical distributed algorithm to turn a bounded-growth network into a small world. Moreover, the practical challenge will be to adapt such constructions to dynamic networks, at least under the models that are identified as relevant.

Can the small-worldization process be supported in dynamic platforms? Up to now, the literature on small-world networks only deals with the routing task. We are convinced that small-world topologies are also relevant for other tasks: quick broadcast, search in presence of faulty nodes, .... In general, we think that maintaining a small-world topology can be much more realistic than maintaining a rigidly structured overlay network and much more efficient for several tasks in unstructured overlay networks.

In 2007, we had two contributions dealing with overlay networks: (1) in  , there is a formal description of an algorithm turning any network into an n^(1/3)-navigable network. This article is particularly interesting since it is the first one that considers any input network in the small-worldization process; (2) in  ,  , we prove that local knowledge is not enough to search quickly for a target node in scale-free networks. Recent studies showed that many real networks are scale-free: the distribution of node degrees follows a power law of the form k^(-α) with α ∈ [2, 3], that is, the number of nodes of degree k is proportional to k^(-α). More precisely, we formally prove that in usual scale-free models, it takes Ω(n^(1/2)) steps to reach the target.

In 2008, we gave a small stretch polylogarithmic network navigability scheme using compact metrics   .

##### Image Retrieval in a Distributed Network

In  , we propose a new concept for browsing and searching in large collections of content-based indexed images. Our approach is inspired by greedy routing algorithms used in distributed networks. We define a navigation graph whose vertices represent images. The edges of the navigation graph are computed according to a similarity measure between indexed images. The resulting graph can be seen as an ad-hoc network of images in which a greedy routing algorithm can be applied for retrieval purposes. Experiments are done in a centralized setting and could be easily adapted to a distributed setting.

##### Mobile Agent Computing

In  , we consider networks in which there exists a harmful node, called a black hole, destroying any incoming mobile agent. The black hole search problem consists, for a team of mobile agents, in locating the black hole in the network. We prove that, for this problem, the pebble model is computationally as powerful as the whiteboard model; furthermore the complexity is exactly the same. More precisely, we prove that a team of two asynchronous agents, each endowed with a single identical pebble (that can be placed only on nodes, and with no more than one pebble per node) can locate the black hole in an arbitrary network of known topology; this can be done with Θ(n log n) moves, where n is the number of nodes, even when the links are not FIFO.

In the effort to understand the algorithmic limitations of computing by a swarm of robots, the research has focused on the minimal capabilities that allow a problem to be solved. The weakest of the commonly used models is Asynch where the autonomous mobile robots, endowed with visibility sensors (but otherwise unable to communicate), operate in Look-Compute-Move cycles performed asynchronously for each robot. The robots are often assumed (or required) to be oblivious: they keep no memory of observations and computations made in previous cycles. In the paper   , we consider the setting when the robots are dispersed in an anonymous and unlabeled graph, and they must perform the very basic task of exploration : within finite time every node must be visited by at least one robot and the robots must enter a quiescent state. The complexity measure of a solution is the number of robots used to perform the task. We study the case when the graph is an arbitrary tree and establish some unexpected results. We first prove that there are n-node trees where Ω(n) robots are necessary; this holds even if the maximum degree is 4. On the other hand, we show that if the maximum degree is 3, it is possible to explore with only O(log n / log log n) robots. The proof of the result is constructive. Finally, we prove that the size of the team is asymptotically optimal : we show that there are trees of degree 3 whose exploration requires Ω(log n / log log n) robots.

In  , we consider the problem of periodic graph exploration in which a mobile entity with constant memory, an agent , has to visit all n nodes of an arbitrary undirected graph G in a periodic manner. Graphs are supposed to be anonymous, that is, nodes are unlabeled. However, while visiting a node, the agent has to distinguish between edges incident to it. For each node v the endpoints of the edges incident to v are uniquely identified by different integer labels called port numbers . We are interested in the minimisation of the length of the exploration period. This problem is unsolvable if the local port numbers are set arbitrarily [L. Budach: Automata and labyrinths, Math. Nachrichten 86(1): 195-282 (1978) ]. However, surprisingly small periods can be achieved when carefully assigning the local port numbers. Dobrev et al. [S. Dobrev, J. Jansson, K. Sadakane, W.-K. Sung: Finding Short Right-Hand-on-the-Wall Walks in Graphs, 12th Colloquium on Structural Information and Communication Complexity SIROCCO, LNCS 3499, 127-139, 2005 ] described an algorithm for assigning port numbers, and an oblivious agent (i.e. an agent with no memory) using it, such that the agent explores all graphs of size n within period 10n. Providing the agent with a constant number of memory bits, the optimal length of the period was proved in   to be no more than 3.75n (using a different assignment of the port numbers). In this paper, we improve both these bounds. More precisely, we show a period of length at most (4 + 1/3)n for oblivious agents, and a period of length at most 3.5n for agents with constant memory. Moreover, we give the first non-trivial lower bound, 2.8n, on the period length for the oblivious case.

In  , we consider the problem of exploring an anonymous undirected graph using an oblivious robot. The studied exploration strategies are designed so that the next edge in the robot's walk is chosen using only local information, and so that some local equity (fairness) criterion is satisfied for the adjacent undirected edges. Such strategies can be seen as an attempt to derandomize random walks, and are natural undirected counterparts of the rotor-router model for symmetric directed graphs. The first of the studied strategies, known as Oldest-First (OF), always chooses the neighboring edge for which the most time has elapsed since its last traversal. Unlike in the case of symmetric directed graphs, we show that such a strategy in some cases leads to exponential cover time. We then consider another strategy called Least-Used-First (LUF) which always uses adjacent edges which have been traversed the smallest number of times. We show that any Least-Used-First exploration covers a graph G = (V, E) of diameter D within time O(D|E|), and in the long run traverses all edges of G with the same frequency.
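The Least-Used-First rule is easy to simulate. The sketch below is a minimal illustration, not the code used in the paper: the function name, the adjacency-list format and the tie-breaking rule (the first least-used edge in adjacency order) are assumptions. It returns the number of steps needed to traverse every edge at least once.

```python
def least_used_first(adj, start, max_steps):
    """Walk that always takes an incident edge with the fewest traversals.

    Ties are broken by adjacency-list order (an assumption of this sketch).
    Returns the step at which all edges have been traversed, or None if
    this does not happen within `max_steps` steps.
    """
    used = {frozenset((u, v)): 0 for u in adj for v in adj[u]}
    current, remaining = start, len(used)
    for step in range(1, max_steps + 1):
        nxt = min(adj[current], key=lambda v: used[frozenset((current, v))])
        edge = frozenset((current, nxt))
        if used[edge] == 0:
            remaining -= 1
        used[edge] += 1
        current = nxt
        if remaining == 0:
            return step
    return None

# Triangle 0-1-2 with a pendant node 3 attached to node 2: |E| = 4, D = 2.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(least_used_first(graph, start=0, max_steps=100))  # 6
```

On this instance the cover time of 6 steps is below D|E| = 8.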

The rotor-router model , also called the Propp machine , was first considered as a deterministic alternative to the random walk. It is known that the route in an undirected graph G = (V, E), where |V| = n and |E| = m, adopted by an agent controlled by the rotor-router mechanism eventually forms an Euler tour based on arcs obtained by replacing each edge in G by two arcs with opposite directions. The process of ushering the agent to an Euler tour is referred to as the lock-in problem . In recent work [V. Yanovski, I.A. Wagner, A.M. Bruckstein: A Distributed Ant Algorithm for Efficiently Patrolling a Network, Algorithmica 37: 165–186 (2003) ], Yanovski et al. proved that, independently of the initial configuration of the rotor-router mechanism in G, the agent locks in in time bounded by 2mD, where D is the diameter of G. In  , we examine the dependence of the lock-in time on the initial configuration of the rotor-router mechanism. The case study is performed in the form of a game between a player P intending to lock in the agent in an Euler tour as quickly as possible and its adversary A with the counter objective. First, we observe that in certain (easy) cases the lock-in can be achieved in time O(m). On the other hand we show that if adversary A is solely responsible for the assignment of ports and pointers, the lock-in time Θ(m·D) can be enforced in any graph with m edges and diameter D. Furthermore, we show that if A provides its own port numbering after the initial setup of pointers by P, the complexity of the lock-in problem is bounded by O(m·min{log m, D}). We also propose a class of graphs in which the lock-in requires time Ω(m·log m). In the remaining two cases we show that the lock-in requires time Ω(m·D) in graphs with the worst-case topology. In addition, however, we present non-trivial classes of graphs with a large diameter in which the lock-in time is O(m).

In  , we consider the model of exploration of an undirected graph G by a single agent which is called the rotor-router mechanism or the Propp machine (among other names). Let π_v indicate the edge adjacent to a node v which the agent took on its last exit from v. The next time the agent enters node v, first a “rotor” at node v advances pointer π_v to the edge which is next after the edge π_v in a fixed cyclic order of the edges adjacent to v. Then the agent is directed onto edge π_v to move to the next node. It was shown before that after the initial O(mD) steps, the agent periodically follows one established Eulerian cycle, that is, in each period of 2m consecutive steps the agent traverses each edge exactly twice, once in each direction. The parameters m and D are the number of edges in G and the diameter of G. We investigate the robustness of such exploration in the presence of faults in the pointers π_v or dynamic changes in the graph. We show that after the exploration establishes an Eulerian cycle,

• if at some step the values of k pointers π_v are arbitrarily changed, then a new Eulerian cycle is established within O(km) steps;

• if at some step k edges are added to the graph, then a new Eulerian cycle is established within O(km) steps;

• if at some step an edge is deleted from the graph, then a new Eulerian cycle is established within O(γm) steps, where γ is the smallest number of edges in a cycle in graph G containing the deleted edge.

Our proofs are based on the relation between Eulerian cycles and spanning trees known as the “BEST” Theorem (after de Bruijn, van Aardenne-Ehrenfest, Smith and Tutte).
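The rotor-router mechanism described above can be simulated directly. The following is a minimal sketch under stated assumptions: the function name, the adjacency-list format (the list order of each node plays the role of its fixed cyclic order) and the default initial pointers are choices made here for illustration. It records the arcs traversed, so that the periodic behaviour (each arc traversed once per period of 2m steps) can be observed after a warm-up of 2mD steps, the lock-in bound quoted earlier in this section.

```python
def rotor_router_walk(adj, start, steps, pointers=None):
    """Simulate the rotor-router and return the list of traversed arcs (u, v).

    pointers[v] is the index in adj[v] of the edge taken on the last exit
    from v; by default every pointer starts at index 0 (an assumption).
    """
    ptr = dict(pointers) if pointers else {v: 0 for v in adj}
    current, arcs = start, []
    for _ in range(steps):
        # Advance the rotor of the current node, then leave along it.
        ptr[current] = (ptr[current] + 1) % len(adj[current])
        nxt = adj[current][ptr[current]]
        arcs.append((current, nxt))
        current = nxt
    return arcs

# Triangle 0-1-2 with a pendant node 3 attached to node 2: m = 4, D = 2.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
m, D = 4, 2
arcs = rotor_router_walk(graph, start=0, steps=2 * m * D + 2 * m)
period = arcs[-2 * m:]
all_arcs = sorted((u, v) for u in graph for v in graph[u])
print(sorted(period) == all_arcs)  # True: each arc is traversed exactly once
```

Changing some pointers or edges after the warm-up and rerunning the walk is a simple way to observe the re-stabilization discussed in the list above.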

##### Other Results

Within the wider context of the project, we have published two book chapters on data gathering and energy consumption in wireless networks, respectively  ,  . We have also considered the problems of modeling of wireless networks  , energy efficiency in wireless networks  , efficient realization of specific classes of permutation networks  , and broadcasting in radio networks  .
