
The Technische Universität Berlin, Humboldt Universität zu Berlin, and the Hasso-Plattner-Institut in Potsdam are jointly researching "Information Management on the Cloud" through the "Stratosphere" Collaborative Research Unit funded by the Deutsche Forschungsgemeinschaft (DFG).

Stratosphere aims to significantly advance the state of the art in data processing on parallel, adaptive architectures. Stratosphere (named after the layer of the atmosphere above the clouds) explores the power of massively parallel computing for complex information management applications. Building on the expertise of the participating researchers, we aim to develop a novel, database-inspired approach to analyzing, aggregating, and querying very large collections of textual or (semi-)structured data on a virtualized, massively parallel cluster architecture.

Stratosphere conducts research on massively parallel data processing engines, a programming model for parallel data programs, robust optimization of declarative data flow programs, continuous re-optimization and adaptation during execution, data cleansing, and text mining. The unit will validate its work through a benchmark of overall system performance and through demonstrators in climate research, the biosciences, and linked open data.

The goal of Stratosphere is to jointly research and build a large-scale data processor based on concepts of robust and adaptive execution. We are researching the PACT programming model, which extends the functional map/reduce programming model with additional second-order functions. As the execution platform, we use Nephele, a massively parallel data flow engine that is also researched and developed within the project. We are examining real-world use cases in climate research, in information extraction from and integration of unstructured data in the life sciences, and in linked open data and social network graph data.
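
A second-order function takes a user-defined (first-order) function and decides how input records are grouped and handed to it. Besides Map and Reduce, the published PACT work describes the second-order functions Cross, Match, and CoGroup. The following is a minimal, single-process Python sketch of the semantics of these five, assuming records are simple (key, value) pairs and that each user function returns a list of output records. The pact_* names are hypothetical. This is not Stratosphere's actual API, and it ignores the parallel execution that Nephele provides.

```python
from collections import defaultdict
from itertools import product


def _group(records):
    """Collect the values of (key, value) records into one list per key."""
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    return groups


def pact_map(udf, records):
    """Map: call the UDF once per record; each call may emit zero or more records."""
    out = []
    for rec in records:
        out.extend(udf(rec))
    return out


def pact_reduce(udf, records):
    """Reduce: call the UDF once per key, with all values sharing that key."""
    out = []
    for key, values in _group(records).items():
        out.extend(udf(key, values))
    return out


def pact_cross(udf, left, right):
    """Cross: call the UDF once for every pair in the Cartesian product of both inputs."""
    out = []
    for l, r in product(left, right):
        out.extend(udf(l, r))
    return out


def pact_match(udf, left, right):
    """Match: call the UDF once for every pair of records with equal keys (an equi-join)."""
    right_groups = _group(right)
    out = []
    for key, lv in left:
        for rv in right_groups.get(key, []):
            out.extend(udf(key, lv, rv))
    return out


def pact_cogroup(udf, left, right):
    """CoGroup: call the UDF once per key, with the value groups of both inputs for that key."""
    lg, rg = _group(left), _group(right)
    keys = list(lg) + [k for k in rg if k not in lg]  # keys present in either input
    out = []
    for key in keys:
        out.extend(udf(key, lg.get(key, []), rg.get(key, [])))
    return out


if __name__ == "__main__":
    # Word count expressed with Map followed by Reduce.
    lines = [(0, "to be or"), (1, "not to be")]
    words = pact_map(lambda rec: [(w, 1) for w in rec[1].split()], lines)
    counts = pact_reduce(lambda k, vs: [(k, sum(vs))], words)
    print(sorted(counts))  # [('be', 2), ('not', 1), ('or', 1), ('to', 2)]
```

The design point the sketch tries to show is that the user function stays the same kind of simple, per-call code in every case; only the second-order function determines which records it sees together, which is what allows a system to choose a parallelization strategy on its own.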

The project is carried out jointly by Prof. Volker Markl (TU Berlin, Database Systems and Information Management Group), who will act as the unit's speaker, together with Prof. Odej Kao (TU Berlin, Distributed Systems Group), Prof. Johann-Christoph Freytag (HU Berlin, Database and Information Systems Group), Prof. Ulf Leser (HU Berlin, Knowledge Management in Bioinformatics), and Prof. Felix Naumann (HPI Potsdam, Database and Information Systems Group).