BigFoot: Big Data Analytics of Digital Footprints
The BigFoot project, launched in the last quarter of 2012, is a 3-year engagement to design and implement an all-in-one, optimised and efficient solution for the storage and analysis of large volumes of data. Building on existing open-source technologies (e.g. Hadoop and Spark) and new ones (e.g. NoDB), BigFoot targets automatic, self-tuned deployments of storage and processing engines, enriched with components that optimise operations and make efficient use of cloud resources (via OpenStack). The current and expected impact is to contribute BigFoot components to the open-source community, both in the context of cloud computing (BigFoot is an active contributor to the Sahara project within OpenStack) and of Big Data (BigFoot contributes modules that can be used to patch existing Hadoop deployments). From the scientific perspective, BigFoot is pushing novel architectures and mechanisms for efficient utilisation of cloud resources, new storage engines and a new integrated system that supports both batch and interactive analytics.
BigFoot is central to the interests of the project's industrial partners: Symantec and GridPocket. BigFoot will accelerate data analysis in business units at Symantec: examples include Symantec Security Response, a team of experts providing 24/7 security data analysis; the Brightmail department, which analyses spam; and the Deepsight team, which analyses real-time data collected by sensors distributed across the Internet. This will enable better protection for customers running Symantec's security software. GridPocket operates in SmartGrids, a rapidly growing business. It will use BigFoot's results to enhance its platform's scalability, and to design and deploy novel techniques for metering data analysis. Applications developed within BigFoot will help utility companies and other GridPocket customers understand and possibly reduce their energy consumption, and will increase grid reliability through failure detection, consumption forecasting and load analysis. BigFoot's academic partners benefit by transferring knowledge through lectures and laboratory sessions, and by building industrial relations with partners interested in the project's outputs. Furthermore, BigFoot's OpenStack-based platform is already in use by several fellow researchers.
BigFoot intends to deliver an enhanced Hadoop ecosystem, with contributions that can be packaged together as a custom distribution of Hadoop, comprising additional components not present in the current stable version as well as several improvements to existing components of the original software. Interoperability and compatibility are ensured by maintaining complete API compatibility with the vanilla version of Hadoop. BigFoot's contributions can also be exploited by users of other Hadoop distributions, since they are released as a set of patches to upstream software. These patches are continuously evolving, and they are released on the project's GitHub and BitBucket pages. Among the components that have been recently updated, we highlight: a module for OpenStack's Sahara that allows automated deployment of Spark clusters; OSMeF, a measurement framework for OpenStack; an implementation of decision trees for Spark; schedism, simulation software used to test and evaluate scheduling techniques; and a patched version of Hadoop that enables more effective preemption within the system.
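To give a flavour of what the decision-tree contribution computes, the sketch below illustrates the core idea that distributed tree learners for Spark build on: greedily choosing the feature and threshold that maximise information gain. This is a minimal, self-contained illustration of the standard algorithm, not the project's actual Spark implementation; the function names and the tiny dataset are our own, purely for exposition.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def best_split(rows, labels):
    """Return the (feature index, threshold, gain) maximising information gain.

    rows:   list of numeric feature vectors
    labels: class label for each row, in the same order
    """
    base = entropy(labels)
    best = (None, None, 0.0)
    for f in range(len(rows[0])):
        # Candidate thresholds: the distinct values of feature f.
        for t in sorted({r[f] for r in rows}):
            left = [l for r, l in zip(rows, labels) if r[f] <= t]
            right = [l for r, l in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # degenerate split, skip
            gain = (base
                    - (len(left) / len(rows)) * entropy(left)
                    - (len(right) / len(rows)) * entropy(right))
            if gain > best[2]:
                best = (f, t, gain)
    return best


# Four points, two numeric features; splitting feature 0 at <= 2
# separates the classes perfectly (gain of 1 bit).
feature, threshold, gain = best_split(
    [[1, 0], [2, 0], [8, 1], [9, 1]],
    [0, 0, 1, 1],
)
```

In a distributed setting such as Spark, the expensive part is evaluating candidate splits over partitioned data, which is typically done by aggregating per-partition label statistics rather than materialising the `left`/`right` lists as above.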