
Section: New Results

GOSSPLE: A radically new approach to navigating the digital information universe

GOSSPLE is the topic of the ERC Starting Grant led by Anne-Marie Kermarrec.

While the Internet has fully moved into homes, creating tremendous opportunities to exploit the huge amount of resources at the edge of the network, the Web has changed dramatically over the past years. User-generated content (Flickr, YouTube, Delicious, ...) has grown exponentially and social networks (Twitter, Facebook, etc.) have developed spectacularly. This offers great potential for leveraging information about users: their circles of friends, their interests, their activities, and the content they generate. It also provides striking evidence that navigating the Internet goes beyond traditional search engines. New and powerful tools are required that empower individuals in ways that Internet search alone never will.

The objective of Gossple is to provide an innovative and fully decentralized approach to navigating the digital information universe by placing users' affinities and preferences at the heart of the navigation process. Building on the peer-to-peer communication paradigm and harnessing the power of gossip-based algorithms, Gossple aims at personalizing Web navigation by means of a fully decentralized solution, for the sake of scalability and privacy. The Gossple challenges have been published in SSS 2009 [55].

The Gossple anonymous social network

Participants : Xiao Bai, Marin Bertier, Davide Frey, Anne-Marie Kermarrec, Vincent Leroy.

This work [25] has been done in collaboration with Prof. Rachid Guerraoui (EPFL). Gossple exploits the social dimension of the Internet to get “related” users indirectly connected, so that they refine each other's filtering procedures through implicit preferences. The network is organized around such preferences and affinities between users, and this network of affinities is at the heart of Gossple. The Gossple anonymous network provides each user with a personalized view of the network through a thrifty decentralized protocol that automatically infers personalized connections in Internet-scale systems. Gossple nodes continuously gossip digests of their interest profiles and locally compute a personalized view of the network, which is then leveraged to improve their Web navigation. The view covers multiple interests without any explicit support (such as explicit social links or an ontology) and without violating anonymity: the association between users and profiles is hidden.

Basically, every Gossple node has a randomly chosen proxy that gossips its profile digest on its behalf; the node transmits its profile to its proxy in encrypted form through an intermediary, which cannot decrypt the profile. To reduce bandwidth consumption, the gossip exchange procedure is thrifty: nodes do not exchange full profiles but only Bloom filters of them, until time reveals that the two nodes might indeed benefit from the exchange. To limit the number of profiles maintained by each node while still encompassing the various interests of the associated user, we introduce a new set cosine similarity, a generalization of the classical cosine similarity metric, together with an effective heuristic to compute it.
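The thrifty exchange can be illustrated with a minimal Python sketch. It is not the Gossple implementation: the `BloomFilter` class, its parameters (`size_bits`, `num_hashes`) and the helper names are assumptions made for illustration, and the similarity shown is the classical cosine similarity between two item sets, not the set cosine similarity introduced in this work.

```python
import hashlib
import math


class BloomFilter:
    """Compact, lossy digest of a profile (hypothetical parameters)."""

    def __init__(self, size_bits=256, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit array stored in a Python integer

    def _positions(self, item):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # False positives are possible, false negatives are not.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def digest_of(profile):
    """Build the digest that a node would gossip instead of its full profile."""
    bloom = BloomFilter()
    for item in profile:
        bloom.add(item)
    return bloom


def estimated_overlap(own_profile, remote_digest):
    """Items of the local profile that the remote profile probably contains."""
    return {item for item in own_profile if remote_digest.might_contain(item)}


def cosine_similarity(items_a, items_b):
    """Classical cosine similarity between two profiles seen as item sets."""
    if not items_a or not items_b:
        return 0.0
    return len(items_a & items_b) / math.sqrt(len(items_a) * len(items_b))


alice = {"gossip", "p2p", "bloom-filter", "privacy"}
bob = {"gossip", "p2p", "recommendation"}
# Alice only sees Bob's digest; she would ask for the full profile
# only if the estimated overlap looks promising.
print(estimated_overlap(alice, digest_of(bob)))
print(cosine_similarity(alice, bob))
```

Because a Bloom filter never yields false negatives, the estimated overlap always contains the true intersection; it may be slightly larger, which is the price paid for the bandwidth saving.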

Query expansion in Gossple

Participants : Marin Bertier, Davide Frey, Anne-Marie Kermarrec, Vincent Leroy.

This work [32] has been done in collaboration with Prof. Rachid Guerraoui (EPFL). A query expansion system seeks to extend a set of keywords (depicting a query) with additional ones to improve the query results. We consider here a collaborative tagging system, such as Delicious or CiteULike, where every node is associated with a tagging profile. We describe how to use the Gossple network to expand queries in a personalized way and significantly improve the completeness (recall) and accuracy (precision) of the results, with respect to the state-of-the-art personalized centralized approach, namely Social Ranking. Note that Gossple's personalized approach also automatically handles cases such as homonym ambiguity. For instance, the term “computer” or “fruit” will be added to the “apple” query depending on the node's profile.

This is applied in a collaborative tagging system, such as Delicious, where users annotate items with tags. Query expansion works as follows. On each node, we use the Gossple personalized network (called the GNet) to compute a data structure we call TagMap, a personalized view of the relations between the tags in the node's profile and those in its GNet. The TagMap is the only source of information used for the query expansion, and it is updated periodically to reflect the changes in the GNet. Only the scores associated with the tags affected by the modifications need to be updated. A query is then expanded using the TagMap through a centrality algorithm we call TagRank, which we derived from the celebrated PageRank algorithm. While PageRank computes the relative importance of Web pages (eigenvector centrality), TagRank computes the relative importance of tags on a given node. Our TagRank algorithm estimates the relevance of each tag in the TagMap with respect to the query and assigns a score to the tag. Our results, obtained on real traces crawled from Delicious, show that the Gossple query expansion mechanism improves both recall and precision.
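To make the idea of a PageRank-derived tag centrality concrete, here is a minimal Python sketch. It is an assumption-laden illustration, not the TagRank algorithm itself: the TagMap is modelled as a hypothetical dictionary of weighted tag-to-tag relations, and the scoring is a plain personalized PageRank (power iteration with restart on the query tags); the function name `tag_rank` and the parameters `damping` and `iterations` are illustrative choices.

```python
def tag_rank(tag_map, query_tags, damping=0.85, iterations=50):
    """Score tags relative to a query with a personalized PageRank.

    tag_map: {tag: {related_tag: weight}}, a hypothetical TagMap layout.
    query_tags: tags of the query; the random walk restarts on them.
    """
    query = sorted(set(query_tags))
    if not query:
        raise ValueError("query_tags must not be empty")

    tags = set(tag_map) | set(query)
    for related in tag_map.values():
        tags.update(related)

    restart = {t: (1.0 / len(query) if t in query else 0.0) for t in tags}
    scores = dict(restart)

    for _ in range(iterations):
        nxt = {t: (1.0 - damping) * restart[t] for t in tags}
        for tag in tags:
            out = tag_map.get(tag, {})
            total = sum(out.values())
            if total == 0:
                # Tag without outgoing relations: send its mass back to the query.
                for q in query:
                    nxt[q] += damping * scores[tag] / len(query)
                continue
            for related, weight in out.items():
                nxt[related] += damping * scores[tag] * weight / total
        scores = nxt
    return scores


# A node whose neighbourhood relates "apple" mostly to computing.
tag_map = {
    "apple": {"computer": 3.0, "fruit": 1.0},
    "computer": {"apple": 3.0},
    "fruit": {"apple": 1.0},
}
scores = tag_rank(tag_map, ["apple"])
expansion = sorted((t for t in scores if t != "apple"), key=scores.get, reverse=True)
print(expansion)  # candidate tags, most relevant first
```

With this TagMap, "computer" outranks "fruit" as an expansion candidate for "apple"; a node whose TagMap weighted the two relations the other way round would obtain the opposite ordering, which is the personalization effect described above.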

Gossip-based top-k processing in Gossple

Participants : Xiao Bai, Marin Bertier, Anne-Marie Kermarrec, Vincent Leroy.

This work [25] has been done in collaboration with Prof. Rachid Guerraoui (EPFL). Fine-grained personalization of top-k query processing requires maintaining inverted lists on a per-user basis, relying on the information held by users that share interests. This is almost impossible to achieve in a centralized approach, as the storage required is prohibitively large and the maintenance of millions of users' inverted lists would overwhelm a central server. Alternatively, each user may be in charge of storing her entire personal inverted lists. This requires each user to store the information stored by all related users. The number of related users potentially grows linearly with the number of users, so this storage would also ultimately be prohibitively large.

Instead, we propose the first fully decentralized personalized top-k algorithm. The inverted lists are not pre-computed, but computed on the fly based on information collected in a fully decentralized manner in the network. Each user identifies her personal network by gossiping user profiles and measuring similarity between users. Yet, each user stores only a very small number of full profiles (say 20) along with the IDs of the users of her personal network. Each top-k query is then gossiped in the network, harvesting the relevant information at each hop. Partial results are computed remotely and sent back to the requester, who sees her request refined by the minute. While the network is maintained at a low frequency to avoid overloading it, top-k queries speed up that frequency, refreshing the part of the network involved in the query and generating a wave of updates in the personalization process. Results obtained on a 10,000 Delicious trace show that with each node storing 20 profiles, top-k queries are satisfied in less than 10 cycles.
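The way a requester can refine her result as partial answers arrive may be sketched as follows in Python. This is a simplified illustration under stated assumptions, not the algorithm of the paper: a profile is modelled as a hypothetical `{item: set_of_tags}` dictionary, an item's score is the number of (profile, query tag) matches, and the function names are invented for the example.

```python
from collections import Counter


def partial_scores(stored_profiles, query_tags):
    """Scores computed remotely by one peer from the profiles it stores."""
    scores = Counter()
    for profile in stored_profiles:
        for item, tags in profile.items():
            matches = len(tags & query_tags)
            if matches:
                scores[item] += matches
    return scores


def merge(current, partial):
    """Requester side: fold a partial result into the running scores."""
    merged = Counter(current)
    merged.update(partial)  # Counter.update adds counts
    return merged


def top_k(scores, k):
    """Best k items, ties broken by item name for determinism."""
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:k]


query = {"gossip"}
peer_a = [{"url1": {"python", "gossip"}, "url2": {"python"}}]
peer_b = [{"url1": {"gossip"}, "url3": {"gossip"}}]

running = Counter()
for stored in (peer_a, peer_b):  # one partial result per gossip hop
    running = merge(running, partial_scores(stored, query))
    print(top_k(running, 2))  # the answer is refined after each hop
```

Keeping only additive partial scores means the requester never needs the remote profiles themselves, which matches the goal of storing just a small number of full profiles per node.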

Recommendation in Gossple

Participants : Davide Frey, Anne-Marie Kermarrec, Vincent Leroy, Afshin Moin, Christopher Thraves.

We are working on a decentralized recommender system to be used in peer-to-peer systems. This recommender system relies on epidemic protocols to form a personal network for each peer, using a correlation measure suitable for decentralized systems. A random walk graph model is then used to further estimate the confidence between users. This confidence finally serves as the weight for the ratings of the neighbors in the personal network in order to predict the unknown ratings of the central user. Finally, a diffusion phase is added to retrieve a higher number of recommendations.
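The prediction step, in which confidence serves as the weight of the neighbors' ratings, can be sketched in Python as a confidence-weighted average. The function name, the data layout and the choice of a plain weighted mean are assumptions for illustration; the confidence values are taken as given here rather than computed by a random walk.

```python
def predict_rating(neighbor_ratings, confidence, item):
    """Confidence-weighted average of the neighbors' ratings for an item.

    neighbor_ratings: {neighbor: {item: rating}} (hypothetical layout).
    confidence: {neighbor: weight}, assumed to be already computed.
    Returns None when no trusted neighbor has rated the item.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for neighbor, ratings in neighbor_ratings.items():
        weight = confidence.get(neighbor, 0.0)
        if item in ratings and weight > 0:
            weighted_sum += weight * ratings[item]
            total_weight += weight
    if total_weight == 0:
        return None
    return weighted_sum / total_weight


ratings = {"alice": {"paper1": 4.0}, "bob": {"paper1": 2.0, "paper2": 5.0}}
confidence = {"alice": 0.75, "bob": 0.25}
print(predict_rating(ratings, confidence, "paper1"))  # 3.5
```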

Papeer is a novel user-centric gossip-based paper indexing platform, through which users can search for scientific papers, share them with their collaborators, and find new people to collaborate with on new research topics. Unlike web-based tagging platforms, Papeer's decentralized architecture makes locally stored tags and content available to other users with similar research interests, through a personalized interest-based social network. The same social network constitutes the basis for Papeer's ability to recommend the coworkers who are most likely to be interested in a given paper or topic. We validated Papeer's recommendation approach by means of an experimental evaluation on a data trace with 13,000 users.

Gossip protocols in practice: NAT-resilient protocols

Participant : Anne-Marie Kermarrec.

Gossip peer sampling protocols now represent a solid basis to build and maintain peer-to-peer (p2p) overlay networks. They typically provide peers with a random sample of the network and maintain connectivity in highly dynamic settings. They rely on the assumption that, at any time, each peer is able to establish a communication with any of the peers of the sample provided by the protocol. Yet, this ignores the fact that a significant proportion of peers now sit behind NAT devices (70% is a fair ratio in the current Internet), which prevents direct communication without specific mechanisms. This has been largely ignored so far in the community. Our experiments demonstrate that the presence of NATs, which restricts the communication between peers, significantly hurts both the randomness of the provided samples and the connectivity of the p2p overlay network, in particular in the presence of a high rate of peer arrivals, departures and failures (aka churn). We proposed a NAT-resilient gossip peer sampling protocol, called Nylon, that accounts for the presence of NATs. Nylon is fully decentralized and spreads the extra load caused by the presence of NATs evenly between peers. Nylon ensures that a peer can always establish a communication, and therefore initiate a gossip, with any peer in its sample. This is achieved through a simple yet efficient mechanism that establishes a path of relays between peers. Our results show that the randomness of the generated samples is preserved and that the connectivity is not impacted, even in the presence of high churn and a high ratio of peers sitting behind NAT devices. This work has been published in ICDCS 2009 [60].
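The effect of NATs on a basic peer sampling exchange can be illustrated with a small Python sketch. It shows the baseline problem only, not Nylon's relay mechanism. The NAT model is a deliberate simplification and an assumption of this example: an exchange fails when both peers are behind a NAT and succeeds otherwise. The function name and the view-merging rule are likewise illustrative.

```python
import random


def gossip_round(views, natted, view_size, rng):
    """Run one round of view exchanges; return the number of blocked ones.

    views: {node: list of known peers}; every listed peer must be a key.
    natted: set of nodes assumed to sit behind a NAT.
    """
    blocked = 0
    for node in list(views):
        if not views[node]:
            continue
        partner = rng.choice(views[node])
        if node in natted and partner in natted:
            blocked += 1  # simplified model: no direct connection possible
            continue
        # Both peers merge what they know and keep a random subset.
        merged = set(views[node]) | set(views[partner]) | {node, partner}
        for peer in (node, partner):
            candidates = sorted(merged - {peer})
            views[peer] = rng.sample(candidates, min(view_size, len(candidates)))
    return blocked


rng = random.Random(42)
nodes = list(range(10))
# Start from a ring: each node knows its two successors.
views = {n: [(n + 1) % 10, (n + 2) % 10] for n in nodes}
natted = set(rng.sample(nodes, 7))  # 70% of the peers behind NATs

for _ in range(5):
    print("blocked exchanges:", gossip_round(views, natted, 3, rng))
```

Every blocked exchange is a sample entry that the peer could not actually use, which is the situation a relay-based protocol is designed to avoid.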

Search over social networks

Participants : Anne-Marie Kermarrec, Guang Tan.

This project targets advanced information needs that can hardly be satisfied by generic search engines, even in the foreseeable future, and thus require explicit human effort. Specifically, we study a popular network application: online question/answer services. Unlike existing solutions provided by centralized sites (e.g., Yahoo! Answers), our system, called AskBuddies, adopts a decentralized architecture over social networks such as MySpace, Friendster, MSN and Yahoo Messenger. The system aims to find the top-k best answerers for a query, considering both knowledge match and social distance, which we believe plays an important role in providing incentive. Efficient algorithms for identifying short chains of acquaintance, or referral chains, and for distributed top-k processing will be devised and implemented. By deploying and running a real system, we will examine user behaviors in such an application, and how social networks may benefit distributed information retrieval in more general contexts.

Cold start link prediction in social networks

Participant : Vincent Leroy.

This work has been done in collaboration with B. Barla Cambazoglu and F. Bonchi from Yahoo Research, Spain. In the traditional link prediction problem, a snapshot of a social network is used as a starting point to predict, by means of graph-theoretic measures, the links that are likely to appear in the future. In this work, we introduce Cold Start Link Prediction as the problem of predicting the structure of a social network when the network itself is totally missing while some other information regarding the nodes is available. We propose a two-phase method based on the Bootstrap Probabilistic Graph. The first phase generates an implicit social network in the form of a probabilistic graph. The second phase applies probabilistic graph-based measures to produce the final prediction. We assess our method empirically over a large data collection obtained from Flickr, using interest groups as the initial information. The experiments confirm the effectiveness of our approach.

