LEADS: Large-Scale Elastic Architecture for Data-as-a-Service

Etienne Riviere
Université de Neuchâtel
Topics recommended for the 2016-2017 Work Programme: 

The topics of interest for the LEADS project and its members fall along two main lines: research on novel cloud service models and research on novel cloud infrastructures. Research on cloud services should address the challenges and reservations (e.g. as identified by ENISA) that companies associate with the shift to cloud computing. In particular, cloud services should address concerns about security and privacy. This is a difficult problem, especially when clouds support computations and application logic and not only blind storage. Research on privacy-preserving processing is required and, given the difficulty and importance of the task, should receive specific funding oriented towards fundamental research. It seems too early to consider privacy preservation and security in clouds only as part of innovation and integration projects. The exploitation potential of cloud services offering such guarantees is considerable and should not be neglected. Research on novel cloud services should also consider how important it is for companies to link their existing infrastructures, software solutions and data to the data and systems available in the cloud. Finally, research on cloud infrastructures should consider the opportunity for geographically distributed cloud services to integrate with the energy distribution network and with consumers and producers of data. This is complementary to making individual cloud elements energy-efficient, but has a higher potential for reducing the environmental footprint.

Project's major results: 

The LEADS project works towards answering the demand of companies to exploit the wealth of public data available on today's Internet, to combine it with private company data and to apply business-specific processing to masses of historical and real-time data. Companies do not always have the will or the capacity to operate in-house the large computing facilities required for these operations. The LEADS answer is to build a shared Data-as-a-Service platform running on an innovative infrastructure formed by a collection of geographically distributed micro-clouds. The LEADS project has contributed new techniques and software integrations for crawling, storing, querying and analysing publicly available Web data in real time, and for operating the service on top of the complex, hierarchical and dynamic infrastructure formed by the micro-clouds. The specific characteristics of this infrastructure led to the addition of multi-site and hybrid fault and consistency models to the Infinispan open source data grid, partitioning techniques for the ZooKeeper coordination service, a fully distributed and locality-aware version of the Apache Nutch Web crawler, and query and data placement schedulers that take into account the nature and cost associated with each micro-cloud. The project has also developed a query language that supports both declarative statements and real-time/streaming operations and allows specific business operations on data to be registered as part of (real-time) queries, together with interfaces to define queries (based on Apatar) and to exploit results visually; these are currently being integrated. Finally, the project features an example business case and application provided by adidas that combines all components of the project and drives the evaluation of the platform over a collection of deployed micro-clouds.

Potential exploitation strategy: 

The Data-as-a-Service (DaaS) model itself has potential for long-term exploitation, provided either by a collaboration between client companies or by a dedicated third-party commercial provider. The opportunity to establish such a provider company, or to bootstrap new projects towards the exploitation of the DaaS principle, is still being discussed in the project. Individual components of the platform have a high potential for exploitation, or are already exploited. Many of the project outcomes are open source. The storage layer, which also supports the base querying capabilities, is based on and extends Infinispan, an open source product and the base for the JBoss Data Grid technology. Several additions from the project are already exploited in this context, and further additions, e.g. towards interaction with the Apache Hadoop software stack, are immediate exploitation assets. System support for decentralised cloud infrastructures has potential for exploitation in energy-aware and local cloud offerings, which are emerging thanks to high-speed networks and environmental considerations.

An update since the last Concertation meeting (March 2014): 

The LEADS project is seeking to establish collaboration activities with the CloudSpaces FP7 project (http://cloudspaces.eu/), both technically and by means of joint events. CloudSpaces investigates innovative solutions for personal clouds and shares technical challenges with LEADS in terms of service partitioning, elasticity and platform awareness. LEADS is also in contact with the BigFoot project (http://bigfootproject.eu/) in this context. LEADS organised the CloudDP 2014 workshop (http://clouddp14.unine.ch/) on its key scientific interests, data management and cloud infrastructures, at the EuroSys 2014 conference in Amsterdam in April 2014. The workshop was organised jointly with the LinkedDesign FP7 project (http://www.linkeddesign.eu/). LEADS will propose organising the next edition of the workshop, CloudDP 2015, with the next edition of EuroSys, in April 2015 in Bordeaux. We would like to involve the HARNESS FP7 project (http://www.harness-project.eu/), which works on new-generation cloud computing platforms.