Section: New Results

Visual Analytics

Participants: Daniel Archambault, Romain Bourqui, Frédéric Gilbert, Pierre-Yves Koenig, Guy Melançon, Paolo Simonetto, Faraz Zaidi.

The overarching driving vision of visual analytics is to turn the information overload into an opportunity: Just as information visualization has changed our view on databases, the goal of Visual Analytics is to make our way of processing data and information transparent for an analytic discourse. The visualization of these processes will provide the means of communicating about them, instead of being left with the results. Visual Analytics will foster the constructive evaluation, correction and rapid improvement of our processes and models and – ultimately – the improvement of our knowledge and our decisions.

This description of Visual Analytics, borrowed from [1], thus calls for interactive visual representations that support the analytical processes involved in, for instance, the collection, inspection and classification of data. Visual analytics solutions should help users develop hypotheses about the phenomenon under study, ultimately leading them to validated conclusions supported by sound visual representations [30]. It is no surprise that Visual Analytics attracts the software industry in an effort to develop visual solutions that improve on classical interfaces to information systems [18].

A good example of a real-life situation where visual analytics can be used is the recent VAST Contest 2009. Participants were challenged to produce and combine tools helping analysts tackle different sets of log data describing the activity of embassy employees, one of whom was suspected of leaking classified data. Our participation in the contest won us two awards (see the VAST Contest website) [16].

Part of our solution relied on the visual inspection of graph patterns in a social network describing the links of employees with other employees and external partners, as a means of singling out potential criminals. A second investigation method was based on the design of a timeline-based graphical representation of all employees' activities during a full month. Visually pointing at unusual patterns in working hours and physical presence helped to rapidly identify potential criminals.

Interactive graph mining

Interactive graph mining is what we aim to design and realize. Efficient visual analytics requires astutely combining interaction with graph statistics and graph drawing. Building effective visualization systems is difficult, as it requires combining data analysis with understandable and intuitive graphical representations equipped with adequate user interaction.

The use of graph hierarchies remains a central paradigm we exploit. Often, after graph hierarchy construction, there exist several large metanodes that can contain tens or hundreds of thousands of nodes. In steerable graph drawing systems, these metanodes can be a problem, as they could take half an hour or more to draw in their entirety. Frequently, however, a user is interested in nodes that are close to a particular node or subgraph present in the hierarchy.

We have designed TugGraph, a system for exploring paths and proximity around nodes and subgraphs that are part of a hierarchy of subgraphs [11]. The approach modifies a pre-existing hierarchy in order to see how a node or subgraph of interest extends out into the larger graph. The system works well on graphs of hundreds of thousands of nodes and millions of edges and is able to present this information in a matter of seconds. TugGraph is a follow-up to previous work published in 2008 [3]. Figure 7 shows paths emanating from UBC into the larger Internet.

Figure 7. Exploration of the Net05 data set using (a) initial hierarchy decomposition and (b)-(g) TugGraph. The graph hierarchy shows a good initial decomposition but is unable to go further since the attribute information on this data set is minimal. TugGraph, however, shows how UBC (University of British Columbia, Vancouver, CA) connects to the Internet.
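The idea of looking at proximity around a node of interest inside a large metanode can be illustrated with a small sketch. This is not TugGraph's algorithm, which is described in [11]; it is a plain bounded breadth-first search, shown only to make the notion of "nodes close to a focus" concrete. The function names `nodes_within` and `split_metanode`, the adjacency-dictionary representation and the `max_hops` parameter are all assumptions made for this illustration.

```python
from collections import deque


def nodes_within(adjacency, focus, max_hops):
    """Return {node: hop distance} for every node within max_hops of a focus node.

    adjacency is a hypothetical dict mapping each node to an iterable of neighbours.
    """
    distance = {node: 0 for node in focus}
    queue = deque(focus)
    while queue:
        current = queue.popleft()
        if distance[current] == max_hops:
            continue  # do not expand beyond the hop budget
        for neighbour in adjacency.get(current, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    return distance


def split_metanode(adjacency, metanode, focus, max_hops):
    """Split one metanode (a set of node ids) into a near part and a far part."""
    reached = nodes_within(adjacency, focus, max_hops)
    near = {node for node in metanode if node in reached}
    far = set(metanode) - near
    return near, far


# Toy path graph a - b - c - d - e, with the focus node "a" outside the metanode.
adjacency = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
near, far = split_metanode(adjacency, {"b", "c", "d", "e"}, ["a"], max_hops=2)
print(sorted(near), sorted(far))  # ['b', 'c'] ['d', 'e']
```

Only the near part would then need to be drawn in detail, while the far part can stay collapsed, which is one way to avoid laying out a whole metanode when the user cares about a neighbourhood.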

Content Analysis (CA) typically contributes to visual analytics advances by providing methods and techniques to explore and analyze the contents of a set of documents in order to discover patterns and hidden knowledge [85], [93]. Document Content Visualization Systems can be used as a tool for CA, where the goal is to represent the textual contents of a set of documents in a visual form so as to facilitate mining and discovering patterns in a collection of documents [75], [56].

A typical application of CA in the web domain is the analysis of the set of web pages browsed by a user in order to find required information. While browsing a web page that has external links, users must follow each and every external link if further information is required. This task is not only time-consuming but also makes it difficult for users to relate the contents of web pages to each other. Moreover, most of the time users tend to collect a set of web pages rather than a single one to obtain information [94]. When browsing, this means that a user explores the links in the selected pages further to extract more information.

Now, it turns out that the networks emerging from CA or web foraging are typically scale-free and small-world, making them hard to analyze and manipulate. Astutely exploiting these scale-free and small-world properties, we developed techniques to support content analysis that help users visually analyze these networks, which represent the textual contents of a set of web pages [17]. The proposed system addresses two main problems in the analysis of complex networks. First, it reveals the community structures hidden in the network through simplification of the graph and clustering. Second, it provides a visualization that helps highlight the important concepts relating different clusters. Figure 8 shows the steps of the process that turns a complex small-world and scale-free graph into a much more readable and organized graph.

Figure 8. Pipeline architecture of the process computing communities out of a scale-free network of documents and extracted concepts.
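The "simplify, then cluster" step can be sketched in a few lines. The text does not state which simplification criterion or clustering method the system in [17] uses, so the choices here are assumptions made purely for illustration: edges are dropped when the Jaccard similarity of their endpoints' closed neighbourhoods falls below a threshold, and the connected components of what remains are taken as communities. The function names and the `threshold` parameter are hypothetical.

```python
def jaccard(adjacency, u, v):
    """Jaccard similarity of the closed neighbourhoods of u and v."""
    a = adjacency[u] | {u}
    b = adjacency[v] | {v}
    return len(a & b) / len(a | b)


def simplify(adjacency, threshold):
    """Keep only edges whose endpoints share enough neighbours (assumed criterion).

    adjacency is a symmetric dict mapping each node to a set of neighbours.
    """
    kept = {node: set() for node in adjacency}
    for u in adjacency:
        for v in adjacency[u]:
            if jaccard(adjacency, u, v) >= threshold:
                kept[u].add(v)
                kept[v].add(u)
    return kept


def components(adjacency):
    """Return the connected components of an undirected graph as a list of sets."""
    seen = set()
    result = []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        result.append(component)
    return result


# Two triangles joined by a single bridge edge 3 - 4.
graph = {
    1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
    4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5},
}
communities = components(simplify(graph, threshold=0.5))
print(sorted(sorted(c) for c in communities))  # [[1, 2, 3], [4, 5, 6]]
```

In this toy graph the bridge edge has a similarity of 1/3 and is removed, while edges inside each triangle score 0.75 or higher and are kept. Real document-concept networks would need a threshold tuned to their density.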

