
Overall Objectives

Today data is being generated at an unprecedented rate, so much that 90% of the world’s data has been created in the past two years. This significant increase in data volume is due to the new ways in which we gather data: from software tools that record system and user activities; from sensors and scientific instruments that monitor our built and natural environment; from medical instruments that enable genomic diagnosis of patients; and from user-initiated sources on the Web or social networks. Data often comes with semantics, which enrich its interpretation and enhance its value.

Importantly, we observe that in today’s data-intensive applications, variety is the norm, and it is likely to remain so for a while. This is because different applications are best served by different kinds of data: traditional commerce-oriented applications use relational databases, Web content management systems handle semistructured documents, sensors provide numerical streams, science applications manipulate arrays, highly heterogeneous data sets are often exported as RDF graphs, software system logs consist of structured text, etc. At the scale and speed of consumption of today’s Big Data, unifying data across such formats into a single architecture (the approach traditionally known as extract-transform-load, or ETL, in a data warehouse context) is no longer feasible. Instead, Cedar aims to invent expressive models and highly efficient data management tools, designed from the start for Big Data variety. Our tools will be designed for deployment in the cloud and validated at large scale.