Demo Paper "Large-Scale Social-Media Analytics on Stratosphere" Accepted at WWW 2013

27 Mar 2013

Our demo submission
"Large-Scale Social-Media Analytics on Stratosphere"
by Christoph Boden, Marcel Karnstedt, Miriam Fernandez and Volker Markl
has been accepted for WWW 2013 in Rio de Janeiro, Brazil.

Visit our demo, and talk to us if you are attending WWW 2013.

Abstract:
The importance of social-media platforms and online communities - in business as well as public context - is more and more acknowledged and appreciated by industry and researchers alike. Consequently, a wide range of analytics has been proposed to understand, steer, and exploit the mechanics and laws driving their functionality and creating the resulting benefits. However, analysts usually face significant problems in scaling existing and novel approaches to match the data volume and size of modern online communities. In this work, we propose and demonstrate the usage of the massively parallel data processing system Stratosphere, based on second-order functions as an extended notion of the MapReduce paradigm, to provide a new level of scalability to such social-media analytics. Based on the popular example of role analysis, we present and illustrate how this massively parallel approach can be leveraged to scale out complex data-mining tasks, while providing a programming approach that eases the formulation of complete analytical workflows.
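
To give a rough idea of what "second-order functions as an extended notion of the MapReduce paradigm" means, here is a minimal single-machine sketch in Python. It is not Stratosphere code and does not use the PACT API: the helper names (`map_so`, `reduce_so`, `match_so`) and the toy data are assumptions made up for illustration. Each helper takes a user-defined first-order function and decides how records are grouped before that function is called; `match_so` is the kind of two-input, join-style function that goes beyond plain Map and Reduce.

```python
from collections import defaultdict


def map_so(udf, records):
    """Call udf on every record independently and concatenate its outputs."""
    return [out for rec in records for out in udf(rec)]


def reduce_so(udf, records, key):
    """Group records by key, then call udf once per (key, group)."""
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec)
    return [out for k, group in groups.items() for out in udf(k, group)]


def match_so(udf, left, right, key_left, key_right):
    """Call udf on every pair of left/right records that share a key."""
    index = defaultdict(list)
    for r in right:
        index[key_right(r)].append(r)
    return [out
            for l in left
            for r in index.get(key_left(l), [])
            for out in udf(l, r)]


# Hypothetical toy data: (user, post id) and (user, forum role).
posts = [("alice", "p1"), ("bob", "p2"), ("alice", "p3")]
users = [("alice", "admin"), ("bob", "member")]

# Count posts per user, then attach each user's role to the count.
counts = reduce_so(lambda user, group: [(user, len(group))],
                   posts, key=lambda p: p[0])
profile = match_so(lambda c, u: [(c[0], u[1], c[1])],
                   counts, users,
                   key_left=lambda c: c[0], key_right=lambda u: u[0])
```

In a parallel system, the grouping contract of each second-order function is what tells the runtime how records may be partitioned across machines; the sketch only mimics that contract in memory.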

ICDE 2013 Demo Preview

21 Nov 2012

This is a preview of our demo that will be presented at ICDE 2013 in Brisbane.
The demo shows how static code analysis can be leveraged to reorder UDF operators in data flow programs.

Detailed information can be found in our papers which are available on the publication page.

Stratosphere Demo Paper Accepted for BTW 2013

12 Nov 2012

Our demo submission
"Applying Stratosphere for Big Data Analytics"
has been accepted for BTW 2013 in Magdeburg, Germany.
The demo focuses on Stratosphere's query language Meteor, which has been presented in our paper "Meteor/Sopremo: An Extensible Query Language and Operator Model" [pdf] at the BigData workshop associated with VLDB 2012 in Istanbul.

Visit our demo, and talk to us if you are going to attend BTW 2013.

Abstract:
Analyzing big data sets as they occur in modern business and science applications requires query languages that allow for the specification of complex data processing tasks. Moreover, these ideally declarative query specifications have to be optimized, parallelized and scheduled for processing on massively parallel data processing platforms. This paper demonstrates the application of Stratosphere to different kinds of Big Data Analytics tasks. Using examples from different application domains, we show how to formulate analytical tasks as Meteor queries and execute them with Stratosphere. These examples include data cleansing and information extraction tasks, and a correlation analysis of microblogging and stock trade volume data that we describe in detail in this paper.

Stratosphere Demo Accepted for ICDE 2013

15 Oct 2012

Our demo submission
"Peeking into the Optimization of Data Flow Programs with MapReduce-style UDFs"
has been accepted for ICDE 2013 in Brisbane, Australia.
The demo illustrates the contributions of our VLDB 2012 paper "Opening the Black Boxes in Data Flow Optimization" [PDF] and [Poster PDF].

Visit our poster, enjoy the demo, and talk to us if you are going to attend ICDE 2013.

Abstract:
Data flows are a popular abstraction to define data-intensive processing tasks. In order to support a wide range of use cases, many data processing systems feature MapReduce-style user-defined functions (UDFs). In contrast to UDFs as known from relational DBMS, MapReduce-style UDFs have less strict templates. These templates do not alone provide all the information needed to decide whether they can be reordered with relational operators and other UDFs. However, it is well-known that reordering operators such as filters, joins, and aggregations can yield runtime improvements by orders of magnitude.
We demonstrate an optimizer for data flows that is able to reorder operators with MapReduce-style UDFs written in an imperative language. Our approach leverages static code analysis to extract information from UDFs which is used to reason about the reorderability of UDF operators. This information is sufficient to enumerate a large fraction of the search space covered by conventional RDBMS optimizers including filter and aggregation push-down, bushy join orders, and choice of physical execution strategies based on interesting properties.
We demonstrate our optimizer and a job submission client that allows users to peek step-by-step into each phase of the optimization process: the static code analysis of UDFs, the enumeration of reordered candidate data flows, the generation of physical execution plans, and their parallel execution. For the demonstration, we provide a selection of relational and non-relational data flow programs which highlight the salient features of our approach.
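
As a rough illustration of how information extracted from UDFs can be used to reason about reordering, the following Python sketch checks whether two adjacent record-at-a-time operators may be swapped. It is a simplified assumption, not the optimizer's implementation: the `UdfInfo` class, the `can_reorder` function, and the field names are invented for this example, the read and write sets are written by hand rather than derived by static code analysis, and the conditions used by the actual optimizer are more involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UdfInfo:
    """Hypothetical summary of a UDF: which record fields it reads and writes."""
    name: str
    reads: frozenset
    writes: frozenset


def can_reorder(a: UdfInfo, b: UdfInfo) -> bool:
    """Conservative test: swapping is allowed only if neither operator
    writes a field that the other one reads or writes."""
    return (a.writes.isdisjoint(b.reads)
            and b.writes.isdisjoint(a.reads)
            and a.writes.isdisjoint(b.writes))


# Hand-written summaries for three made-up operators.
lang_filter = UdfInfo("lang_filter", frozenset({"lang"}), frozenset())
tokenize = UdfInfo("tokenize", frozenset({"text"}), frozenset({"tokens"}))
score = UdfInfo("score", frozenset({"tokens"}), frozenset({"score"}))

# The filter touches no field that tokenize writes, so it can be pushed
# in front of tokenize; score depends on tokenize's output, so it cannot.
filter_can_move = can_reorder(lang_filter, tokenize)
score_can_move = can_reorder(tokenize, score)
```

Here `filter_can_move` is `True` and `score_can_move` is `False`, which mirrors the filter push-down case mentioned in the abstract: a selective filter that does not depend on an expensive UDF's output can run before it.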

Version 0.2 Released

21 Aug 2012

We are happy to announce that version 0.2 of the Stratosphere System has been released. It brings many performance improvements as well as a number of exciting new features, including:

  • The new Sopremo Algebra Layer and the Meteor Scripting Language
  • An entirely new tuple data model for the PACT API
  • Fault tolerance through local checkpoints
  • A ton of performance improvements on all layers
  • Support for plug-ins on the data flow channel layer
  • Many new library classes (for example new Input-/Output-Formats)

For a complete list of new features, check out the change log.