Section: New Results
Distributed Collaborative Systems - Collaborative Knowledge Building
Introduction
Distributed collaborative systems (DCS) facilitate and coordinate collaboration among multiple users who jointly carry out common tasks over computer networks. The explosion of Web 2.0, and especially of wiki systems, showed that a simple distributed collaborative system can transform a community of strangers into a community of collaborators. This is the main lesson taught by Wikipedia. Although many DCS are currently available, most of them rely on a centralized architecture and consequently suffer from its intrinsic problems: lack of fault tolerance, poor scalability, costly infrastructure and privacy issues.
Our main work focused on migrating DCS to pure peer-to-peer architectures. This requires developing new algorithms to enable collaborative editing of complex data and massive collaboration.
This year, we made several contributions: we extended algorithms to manage complex data types such as those of semantic wikis, we developed an algorithm that scales in the number of sites and the number of edits, we proposed a novel architecture for deploying wikis over structured peer-to-peer networks, and we proposed an approach for easing group collaboration over shared workspaces.
Scalable Optimistic Replication Algorithms for Peer-to-Peer Networks
Participants : Pascal Molli, Pascal Urso, Stéphane Weiss.
Several collaborative editing systems are becoming massive: they allow a huge number of users to quickly produce a huge amount of data. For instance, Wikipedia has been edited by 7.5 million users and reached 10 million articles in only 6 years. However, most collaborative editing systems are centralized, with costly scalability and poor fault tolerance. To overcome these limitations, we aim to provide a peer-to-peer collaborative editing system.
Peer-to-peer systems rely on replication to ensure scalability. A single object is replicated a limited number of times in structured networks (such as Distributed Hash Tables) or an unbounded number of times in unstructured peer-to-peer networks. In all cases, replication requires defining and maintaining the consistency of copies. Most approaches to maintaining consistency do not support peer-to-peer constraints such as churn, while the others rely on data “tombstones”. In these approaches, a deleted object is replaced by a tombstone instead of being removed from the document model. Tombstones cannot be directly removed without compromising document consistency; therefore, the overhead required to manage the document grows continuously.
This year, we designed a new optimistic replication algorithm called Logoot [38] that ensures consistency for linear structures. Logoot tolerates a large number of copies and does not require tombstones. The approach is based on immutable and totally ordered object position identifiers. Logoot supports multiple strategies [45] for building these identifiers. Its time complexity is only logarithmic in the document size. We evaluated and validated the scalability of Logoot and compared it with tombstone-based solutions on real data extracted from Wikipedia: all the modifications ever made to thirty of the most edited and longest Wikipedia pages.
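To illustrate the idea of totally ordered position identifiers, the sketch below shows one possible (simplified) generation strategy: an identifier is a list of (digit, site) pairs compared lexicographically, and a new identifier is generated strictly between two neighbors. This is an illustrative toy, not the exact algorithm or strategies published in [38] and [45].

```python
import random

def generate_between(p, q, site, base=100):
    """Generate a position identifier strictly between p and q.

    An identifier is a list of (digit, site) pairs, compared
    lexicographically. Identifiers are immutable and totally ordered,
    so a deleted line can simply be discarded together with its
    identifier: no tombstone is needed.
    Simplified illustration of one possible strategy only.
    """
    new_id, depth = [], 0
    while True:
        dp = p[depth][0] if depth < len(p) else 0
        dq = q[depth][0] if depth < len(q) else base
        if dq - dp > 1:                      # a free digit exists at this depth
            new_id.append((random.randint(dp + 1, dq - 1), site))
            return new_id
        if dq != dp:
            q = []                           # diverged below q: upper bound is now `base`
        # no room at this depth: keep following p and go one level deeper
        new_id.append((dp, p[depth][1] if depth < len(p) else site))
        depth += 1
```

Since Python compares lists of tuples lexicographically, `p < generate_between(p, q, s) < q` always holds, even when the gap between `p` and `q` forces the identifier to grow by one level.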
Distributed Collaborative Systems over Peer-to-Peer Structured Networks
Participants : Pascal Molli, Gérald Oster, Sergiu Dumitriu.
The ever-growing demand for digital information raises the need for content distribution architectures providing high storage capacity, data availability and good performance. While many simple solutions for the scalable distribution of quasi-static content exist, there is still no approach that ensures both scalability and consistency for highly dynamic content, such as the data managed inside wikis. In previous years, we studied and proposed a solution based on unstructured peer-to-peer networks. While these results were promising, the chosen architecture implies that the whole content (all wiki data) is replicated on every peer-to-peer node, an assumption that is not acceptable in many cases. Therefore, this year, we proposed a peer-to-peer solution for distributing and managing dynamic content over a structured peer-to-peer network. The proposed solution [58] , [32] combines two widely studied technologies: Distributed Hash Tables (DHT) and optimistic replication. In our “universal wiki” engine architecture (UniWiki), on top of a reliable, inexpensive and consistent DHT-based storage, any number of front-ends can be added, ensuring both read and write scalability, as well as suitability for large-scale scenarios.
A first prototype has been implemented in collaboration with Rubén Mondéjar, a PhD student at Universitat Rovira i Virgili, Catalonia (Spain). The implementation is based on Damon [30] , a distributed AOP middleware, thus separating distribution, replication and consistency responsibilities, and also making our system transparently usable by third-party wiki engines. UniWiki has proved viable and fairly efficient in large-scale scenarios.
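The combination of DHT-based storage and operation-based replication can be sketched as follows: each page is stored under a hashed key, front-ends send edits as operations rather than whole pages, and rendering replays the operation log. The class and method names are hypothetical stand-ins, not the UniWiki API, and the in-order replay is a placeholder for the real merge procedure.

```python
import hashlib

class ToyDHT:
    """Stand-in for a real DHT node set (single-process toy)."""
    def __init__(self):
        self.store = {}

    def append(self, key, op):
        self.store.setdefault(key, []).append(op)

    def get(self, key):
        return list(self.store.get(key, []))

def page_key(title):
    # DHT key: hash of the page title, so each page lands on one node set
    return hashlib.sha1(title.encode()).hexdigest()

class FrontEnd:
    """A wiki front-end: writes are sent as operations, never whole
    pages; reads replay the operation log fetched from the DHT."""
    def __init__(self, dht):
        self.dht = dht

    def edit(self, title, op):
        self.dht.append(page_key(title), op)

    def render(self, title):
        # Replaying operations in log order is a simplification of the
        # real integration step, which makes concurrent edits commute.
        text = []
        for kind, pos, line in self.dht.get(page_key(title)):
            if kind == "ins":
                text.insert(min(pos, len(text)), line)
            elif kind == "del" and pos < len(text):
                del text[pos]
        return text
```

Because the state in the DHT is an operation log rather than a page blob, any number of front-ends can be added: they all converge on the same rendering of a page.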
Easy Collaboration over Shared Workspaces
Participants : Claudia-Lavinia Ignat, Pascal Molli, Gérald Oster.
Existing tools for supporting parallel work have several disadvantages that prevent them from being widely used. Very often they require a complex installation and the creation of accounts for all group members. Users need to learn and deal with complex commands to use these collaborative tools efficiently. Some tools require users to abandon their favorite editors and force them to use a particular co-authorship application. In [29] , we proposed the DooSo6 collaboration tool, which offers support for parallel work, requires no installation and no creation of accounts, and is easy to use: users can keep working with their favorite editors. User authentication is achieved by means of a capability-based mechanism. A capability is defined as a pair (object reference, access right); a user who possesses a capability has the specified right to the referenced object. The system manages capabilities for publishing and updating shared projects. The prototype relies on the data synchronizer So6 (http://www.libresource.org/ ).
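A minimal sketch of such a capability mechanism: a capability couples an object reference with an access right, and possessing it is sufficient to exercise the right, with no user account involved. The HMAC construction below is a hypothetical way to make capabilities unforgeable; it is not claimed to be the mechanism implemented in DooSo6.

```python
import hashlib
import hmac
import secrets

# Known only to the server that issues capabilities (assumption of this sketch).
SERVER_SECRET = secrets.token_bytes(32)

def make_capability(project_ref, right):
    """A capability = (object reference, access right) plus an
    unforgeable tag; handing the triple to someone grants the right."""
    tag = hmac.new(SERVER_SECRET, f"{project_ref}:{right}".encode(),
                   hashlib.sha256).hexdigest()
    return (project_ref, right, tag)

def check_capability(cap):
    """Accept a request iff the presented capability was really issued:
    recompute the tag and compare in constant time."""
    project_ref, right, tag = cap
    expected = hmac.new(SERVER_SECRET, f"{project_ref}:{right}".encode(),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(tag, expected)
```

Sharing a project then amounts to sending the capability itself (for example inside an invitation link), which is what removes the need for accounts.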
Distributed Collaborative Knowledge Building
Participants : Hala Skaf-Molli, Gérôme Canals, Pascal Molli, Charbel Rahhal, Pascal Urso, Stéphane Weiss.
Semantic wikis are a new generation of wikis that combine the advantages of Web 2.0 and the Semantic Web. Existing semantic wikis are based on a centralized architecture, which contradicts the distributed social process of knowledge building [64] . The objective of this research is to build peer-to-peer semantic wikis for collaborative knowledge building. We are working on the following problems:
- Building distributed Semantic Wikis for distributed collaborative knowledge building.
- Knowledge personalization in distributed Semantic Wikis.
- Human-Computer collaboration for collaborative knowledge building.
We propose two approaches to peer-to-peer semantic wikis: the Swooki approach and the DSMW approach. Both are based on optimistic replication algorithms; they differ in the replication algorithm and the processes they support.
Collaborative Knowledge Building over Unstructured Peer-to-Peer Semantic Wikis
Swooki (http://wooki.sf.net/ ) is composed of a set of interconnected semantic wiki servers that form a peer-to-peer network. Wiki pages and their related semantic annotations are replicated over the network. Each peer offers all the services of a semantic wiki server. Swooki is built on an unstructured peer-to-peer network; a peer can join and leave the network at any moment.
Users collaborate to edit wiki pages and their related semantic annotations. A modification on a copy is executed locally and then broadcast to the other peers in the network, to be integrated at each node. The system is correct if it respects the CCI (Causality, Convergence and Intention preservation) consistency model.
To synchronize replicated semantic wiki pages, Swooki adapts the woot synchronization algorithm [73] . woot is designed to synchronize linear structures such as wiki pages, but not non-linear structures such as semantic data, which forms an RDF graph. We extended the woot algorithm to synchronize semantic data and to ensure the CCI consistency model on this data [43] , [35] . Swooki also integrates algorithms that support an undo mechanism [34] for reverting any modification of any user at any time. Swooki is the first peer-to-peer semantic wiki.
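One way to make concurrent operations on an RDF graph converge is to count each triple: adds and removes are unconditional increments and decrements, so integrating the same set of remote operations in any order yields the same counts, and the visible graph is the set of triples with a positive count. The sketch below is a toy model of this idea, not the algorithm published in [43] and [35].

```python
from collections import Counter

class TripleStore:
    """Toy replicated RDF store in which operations commute.

    Each (subject, predicate, object) triple carries a counter.
    Increments and decrements commute as integer sums, so two replicas
    that integrate the same operations, in any order, converge.
    """
    def __init__(self):
        self.counts = Counter()

    def add(self, triple):
        self.counts[triple] += 1

    def remove(self, triple):
        # Unconditional decrement keeps integration commutative: a
        # remove arriving before its matching add leaves a transient
        # negative count that the add later cancels out.
        self.counts[triple] -= 1

    def graph(self):
        """The visible RDF graph: triples stated more often than retracted."""
        return {t for t, n in self.counts.items() if n > 0}
```

Delivering the same three operations in two different orders to two replicas illustrates the convergence property that the CCI model requires.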
Collaborative Knowledge Building over Trusted Semantic Wikis Networks
The main objective of Swooki's replication algorithms is to provide better performance and fault tolerance. Another interesting objective is to support collaboration modes that preserve the privacy of users. In this case, every user maintains her own semantic wiki server and can decide to publish pages and to integrate pages published by other users [33] . This is the principle behind the DSMW approach. Collaboration in DSMW is based on the publish/subscribe model: the publication, propagation and integration of modifications are under the control of the user.
This mode of work can be generalized to communities. A community can maintain a semantic wiki server, decide to publish some pages to other communities, and integrate pages published by other communities. Such collaborative networks ensure the autonomy of communities and preserve their privacy. In addition, this is compatible with the social organization of knowledge networks.
To develop this system, we need algorithms to synchronize the network and algorithms to manage the publication and integration of modifications. DSMW uses the Logoot [38] algorithm to synchronize the semantic wiki pages. Logoot is an optimized version of woot: it ensures convergence and intention preservation provided that causality is ensured.
DSMW uses publish/subscribe to propagate modifications. We developed the DSMW ontology to formalize the publish/subscribe model and the algorithms needed to populate this ontology [33] . We demonstrated that the DSMW algorithms ensure causality; therefore, Logoot ensures convergence and intention preservation in DSMW.
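The causality argument can be sketched with a push/pull feed pair: a server publishes its patches onto a feed in local order, and a subscriber keeps a cursor so that pulls deliver each patch exactly once and in publication order. The class names below are illustrative, not the vocabulary of the DSMW ontology.

```python
class PushFeed:
    """A server publishes its local patches, in order, onto a feed."""
    def __init__(self):
        self.patches = []            # ordered log of published patches

    def publish(self, patch):
        self.patches.append(patch)

class PullFeed:
    """A subscriber remembers how far it has read, so each patch is
    integrated exactly once and in publication order. Within one feed,
    this order is enough to preserve causality between patches, which
    is the precondition Logoot needs for convergence."""
    def __init__(self, push_feed):
        self.push_feed = push_feed
        self.cursor = 0

    def pull(self):
        new = self.push_feed.patches[self.cursor:]
        self.cursor = len(self.push_feed.patches)
        return new
```

The user stays in control at both ends: nothing is published until `publish` is called, and nothing is integrated until the subscriber calls `pull`.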
We implemented these algorithms as an extension of Semantic MediaWiki. The first version of DSMW was released in October 2009 at http://www.dsmw.org .
Knowledge Personalization in Distributed Semantic Wikis
In semantic wikis, wiki pages are annotated with semantic data to facilitate navigation, information retrieval and ontology emergence. Semantic data represents the shared knowledge base that describes the common understanding of the community. However, in a collaborative knowledge building process, knowledge is basically created by individuals involved in a social process [61] . Therefore, it is fundamental to support personal knowledge building in a differentiated way. Currently, no available semantic wiki supports both personal and shared understandings. To overcome this problem, we propose a peer-to-peer collaborative knowledge building process and extend semantic wikis with personal annotation facilities to express personal understanding. In this work, we detail the personal semantic annotation model and show its implementation in distributed semantic wikis. We also detail an evaluation study which shows that personal annotations demand less cognitive effort than semantic data and are very useful to enrich the shared knowledge base [36] , [37] , [44] . This is joint research with the University of La Plata, Argentina.
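The separation between the two annotation spaces can be sketched as follows: shared annotations belong to the replicated state of a page, while personal annotations stay on their owner's peer and are simply excluded from what is propagated. All names here are illustrative, not the published personal semantic annotation model of [36] , [37] , [44].

```python
class AnnotatedPage:
    """Toy model of a page with two annotation spaces: shared
    annotations are part of the replicated state, personal annotations
    remain local to the owner's peer and never leave it."""
    def __init__(self, title):
        self.title = title
        self.shared = set()      # semantic annotations, replicated to all peers
        self.personal = {}       # owner id -> private annotations, kept local

    def annotate_shared(self, triple):
        self.shared.add(triple)

    def annotate_personal(self, owner, note):
        self.personal.setdefault(owner, set()).add(note)

    def replicated_state(self):
        # Only the title and the shared annotations are propagated;
        # personal annotations are filtered out here.
        return (self.title, frozenset(self.shared))
```

Keeping personal annotations out of `replicated_state` is what lets each user express a personal understanding without polluting the community's shared knowledge base.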