## Section: New Results

### Information systems

Participants: Sara Alouf, Eitan Altman, Konstantin Avrachenkov, Abdulhalim Dandoush, Alain Jean-Marie, Philippe Nain, Danil Nemirovsky, Sreenath Ramanath, Yuedong Xu.

#### Peer-to-peer networks

##### Content dissemination in P2P networks

In [85] and [84], E. Altman, P. Nain and Y. Xu, in collaboration with A. Shwartz (Technion), study the transient behavior of some peer-to-peer (P2P) networks. This work has two objectives. The first is to study rigorously the transient behavior of P2P networks in which information is replicated and disseminated according to epidemic-like dynamics. The second is to use the insight gained from this analysis to predict the efficiency of measures taken against P2P networks. A stochastic (Markov) model is introduced which extends a classical epidemic model and characterizes the behavior of a P2P swarm in the presence of free-riding peers. A second model is also considered, in which a peer initiates contact with another peer chosen at random. In both cases, the network is shown to exhibit a phase transition: a small change in the parameters causes a large change in the behavior of the network. It is shown, in particular, how the phase transition affects the measures that content provider networks may take against P2P networks that distribute non-authorized music or books, and how efficient counter-measures are.
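The kind of phase transition mentioned above can be illustrated with the classical deterministic SIR epidemic, which is only a stand-in here and not the Markov model of [85] and [84]. In this sketch, the interpretation of "infected" as peers holding and sharing the content, the function name `final_fraction_reached`, and the parameters `beta` (contact rate) and `gamma` (departure rate) are all assumptions made for illustration.

```python
def final_fraction_reached(beta, gamma, i0=1e-3, dt=0.01, max_steps=200_000):
    """Euler integration of the classical SIR fluid model
        ds/dt = -beta * s * i,   di/dt = beta * s * i - gamma * i.

    s: fraction of peers that never got the content,
    i: fraction currently holding and sharing it (illustrative reading).
    Returns the fraction of peers that ever obtained the content.
    """
    s, i = 1.0 - i0, i0
    for _ in range(max_steps):
        new_holders = beta * s * i * dt   # contacts that pass the content on
        departures = gamma * i * dt       # holders that stop sharing
        s -= new_holders
        i += new_holders - departures
        if i < 1e-9:                      # dissemination has died out
            break
    return 1.0 - s


if __name__ == "__main__":
    # Sweep the contact rate across the threshold beta / gamma = 1.
    for beta in (0.5, 0.9, 1.1, 2.0):
        print(beta, round(final_fraction_reached(beta, gamma=1.0), 4))
```

In this toy model the content reaches a negligible fraction of peers when `beta / gamma` is below 1 and a substantial fraction once the ratio exceeds 1, which is the qualitative behavior a counter-measure would try to exploit by pushing the parameters below the threshold.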

##### Storage in distributed/peer-to-peer systems

Distributed systems using a network of peers have become an alternative solution for storing data. These systems are based on three pillars: data fragmentation and dissemination among the peers, redundancy mechanisms to cope with peer churn, and repair mechanisms to recover lost or temporarily unavailable data. Traditional redundancy schemes are replication and erasure codes. A new class of network codes (regenerating codes) has been proposed recently. In prior efforts, A. Dandoush, S. Alouf, and P. Nain have studied the performance of peer-to-peer storage systems in terms of data lifetime and availability using the traditional redundancy schemes. In [78], [14] they compare the performance of distributed storage systems that use regenerating codes to those that use traditional redundancy schemes.
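To make the two traditional redundancy schemes concrete, the following sketch compares their data availability under a deliberately simple assumption that is not taken from [78] or [14]: each peer is online independently with probability `p`, and no repair takes place. Regenerating codes are not modeled. The function names are hypothetical.

```python
from math import comb


def replication_availability(p, copies):
    """Probability that at least one of `copies` full replicas is online."""
    return 1.0 - (1.0 - p) ** copies


def erasure_availability(p, n, k):
    """Probability that at least k of the n coded fragments are online,
    which is what an (n, k) erasure code needs to rebuild the data."""
    return sum(comb(n, j) * p**j * (1.0 - p) ** (n - j) for j in range(k, n + 1))


if __name__ == "__main__":
    p = 0.5
    # Both configurations store three times the original data volume.
    print("replication, 3 copies :", replication_availability(p, 3))
    print("erasure code, n=12 k=4:", erasure_availability(p, 12, 4))
```

With the same storage overhead, the erasure code spreads the risk over more peers, which in this static setting yields a higher availability than plain replication.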

##### Real-time control of contents download

The question of optimally prefetching data during navigation through a hyperlinked document has been studied by A. Jean-Marie, jointly with O. Morad (CNRS and University of Montpellier 2), as part of the Vooddo project (funded by the “Multimedia” program of the ANR). A Markov decision-theoretic model has been developed, and solution algorithms have been implemented. Preliminary results have been presented in [71] and at the *Probability Models in Performance Analysis* workshop (London, September 2010).
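As a rough illustration of what a Markov decision formulation of prefetching can look like, the sketch below runs value iteration on a toy problem. It is not the model of [71]: the page graph, the rule that exactly one linked page can be prefetched per step, the unit latency cost for a miss, and the discount factor are all assumptions, and `plan_prefetch` is a hypothetical name.

```python
def plan_prefetch(transitions, discount=0.9, tol=1e-9):
    """Value iteration for a toy prefetching problem.

    transitions[page] maps each linked page to the probability that the
    user navigates to it next. In every state one linked page may be
    prefetched; a miss costs 1 unit of latency, a hit costs 0.
    Returns (expected discounted cost per page, page to prefetch per page).
    """
    values = {page: 0.0 for page in transitions}
    while True:
        new_values, policy = {}, {}
        for page, successors in transitions.items():
            best = None
            for candidate in successors:          # action: prefetch `candidate`
                cost = sum(
                    prob * ((0.0 if nxt == candidate else 1.0) + discount * values[nxt])
                    for nxt, prob in successors.items()
                )
                if best is None or cost < best:
                    best, policy[page] = cost, candidate
            new_values[page] = best
        gap = max(abs(new_values[p] - values[p]) for p in values)
        values = new_values
        if gap < tol:                             # Bellman updates have converged
            return values, policy


if __name__ == "__main__":
    site = {
        "home": {"news": 0.7, "about": 0.3},
        "news": {"home": 0.4, "about": 0.6},
        "about": {"home": 1.0},
    }
    costs, prefetch = plan_prefetch(site)
    print(prefetch)
```

Because prefetching does not influence where the user goes in this toy setting, the optimal action is simply the most likely next page; a richer state (for example, cache contents or bandwidth budget) would be needed for the decision to involve a genuine trade-off.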

##### BitTyrant

The success of BitTorrent has fostered the development of variants of its basic components. Some of these variants adopt greedy approaches that exploit the intrinsic altruism of the original version of BitTorrent in order to maximize the benefit of participating in a torrent. G. Neglia, together with D. Carra (Univ. Verona) and P. Michiardi (Institut Eurecom), has studied BitTyrant, a recently proposed strategic client. The research is described in the Maestro 2008 activity report. Results have been extended and supported by PlanetLab experiments in [87].

#### Online social networks

In [39], K. Avrachenkov, in collaboration with B. Ribeiro and D. Towsley (Univ. Massachusetts), studies online social networks and proposes a hybrid sampling scheme that mixes independent uniform node sampling and random walk (RW)-based crawling. The authors show that their sampling method combines the strengths of both uniform and RW sampling while minimizing their drawbacks. In particular, the method increases the spectral gap of the random walk and hence accelerates convergence to the stationary distribution. The proposed method resembles PageRank but, unlike PageRank, preserves time-reversibility. Applying the hybrid RW to the problem of estimating the degree distributions of social networks yields promising results.
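One way to add uniform jumps to a random walk without losing time-reversibility is to give every pair of nodes an extra undirected edge of weight `alpha / n`; whether this matches the exact construction of [39] is an assumption. The sketch below builds the resulting transition matrix for a small graph and exposes its closed-form stationary distribution. The names `jump_walk_matrix` and `jump_walk_stationary` and the parameter `alpha` are illustrative.

```python
def jump_walk_matrix(adj, alpha):
    """Transition matrix of a random walk on the graph `adj` (0/1 adjacency
    matrix, undirected) augmented with weight alpha / n between all node pairs.
    From node i the walk jumps to a uniform node with probability
    alpha / (d_i + alpha) and otherwise follows a uniformly chosen edge."""
    n = len(adj)
    matrix = []
    for i in range(n):
        degree = sum(adj[i])
        matrix.append([(adj[i][j] + alpha / n) / (degree + alpha) for j in range(n)])
    return matrix


def jump_walk_stationary(adj, alpha):
    """Stationary distribution, proportional to d_i + alpha."""
    n = len(adj)
    total = sum(sum(row) for row in adj) + n * alpha
    return [(sum(row) + alpha) / total for row in adj]


if __name__ == "__main__":
    # Path 0 - 1 - 2 plus an isolated node 3 that a plain walk would never reach.
    graph = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
    print(jump_walk_stationary(graph, alpha=1.0))
```

Since the stationary probability of node `i` is known up to a constant, samples collected by such a walk can be reweighted by `1 / (d_i + alpha)` to estimate quantities such as the degree distribution without bias towards high-degree nodes.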

#### Document ranking and clustering on the Web

A random walk can be used to define a centrality measure on a directed graph. However, if the graph is reducible, the random walk will be absorbed in some subset of nodes and will never visit the rest of the graph. Google PageRank solves this problem by introducing uniform random jumps that occur with some probability. To date, there is no definitive answer on how to choose this probability. In [25], K. Avrachenkov and D. Nemirovsky, in collaboration with V. Borkar (Tata Institute of Fundamental Research, India), propose to use a parameter-free centrality measure based on the notion of a quasi-stationary distribution. Specifically, the authors suggest four centrality measures based on quasi-stationary distributions, analyze them and conclude that they produce approximately the same ranking.
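For readers unfamiliar with the notion, a quasi-stationary distribution of a substochastic matrix `T` (a random walk restricted to a set of nodes it eventually leaves) is the normalized left Perron eigenvector of `T`. The sketch below computes it by power iteration on a made-up 3-node matrix; it illustrates the basic notion only and is not claimed to reproduce any of the four measures of [25]. The function name and the example matrix are assumptions.

```python
def quasi_stationary(T, iterations=1000):
    """Power iteration for the left Perron eigenvector of a substochastic
    matrix T with positive entries. Returns (distribution, survival_rate),
    where distribution * T = survival_rate * distribution."""
    n = len(T)
    v = [1.0 / n] * n
    rate = 0.0
    for _ in range(iterations):
        w = [sum(v[i] * T[i][j] for i in range(n)) for j in range(n)]
        rate = sum(w)                 # probability mass that is not absorbed
        v = [x / rate for x in w]     # condition on not being absorbed yet
    return v, rate


if __name__ == "__main__":
    # Row sums below 1: the missing mass leaks out of this node set.
    T = [[0.1, 0.5, 0.3],
         [0.4, 0.1, 0.4],
         [0.3, 0.3, 0.2]]
    scores, rate = quasi_stationary(T)
    ranking = sorted(range(len(T)), key=lambda node: -scores[node])
    print(ranking, rate)
```

Ranking nodes by their quasi-stationary probability requires no jump probability to be tuned, which is the sense in which such a measure is parameter-free.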

In the third edition of the WePS (Web Person Search) campaign, K. Avrachenkov, in collaboration with E. Smirnova and B. Trousse (Axis, Inria) [66], has addressed the person name disambiguation problem, formulated as a clustering task. The aim was to exploit the intrinsic link relationships among Web pages for name resolution in Web search results. To date, link structure has not been used for this purpose, although the Web graph can be a rich source of information about latent semantic similarity between pages. Their approach hypothesizes that pages referring to the same person should be linked through the Web graph structure, namely through topically related pages. Their clustering algorithm officially ranked first in the WePS competition.

#### Fair resource allocation

"How can we allocate network resources fairly among different classes of users?" In [74], E. Altman, K. Avrachenkov and S. Ramanath introduce T-scale and multiscale fairness to address this question. These new concepts allow them to distribute network resources fairly among different classes of traffic. They illustrate the concepts with a number of examples from spectrum allocation and indoor-outdoor scenarios.