ADMS 2013
Fourth International Workshop on Accelerating Data Management Systems Using Modern Processor and Storage Architectures

 
Monday, August 26, 2013
 
In conjunction with VLDB 2013
Riva del Garda, Trento, Italy
 
 
 
 
 
 
 
Workshop Overview

The objective of this one-day workshop is to investigate opportunities in accelerating data management systems and workloads (which include traditional OLTP, data warehousing/OLAP, ETL, Streaming/Real-time, Business Analytics, and XML/RDF Processing) using processors (e.g., commodity and specialized Multi-core, GPUs, and FPGAs), storage systems (e.g., Storage-class Memories like SSDs and Phase-change Memory), and hybrid programming models like CUDA, OpenCL, and OpenACC.

The current data management scenario is characterized by the following trends: traditional OLTP and OLAP/data warehousing systems are being used for increasingly complex workloads (e.g., petabytes of data and complex queries under real-time constraints); applications are becoming far more distributed, often consisting of different data processing components; non-traditional domains such as bio-informatics, social networking, mobile computing, sensor applications, and gaming are generating growing quantities of data of different types; economic and energy constraints are leading to greater consolidation and virtualization of resources; and analyzing vast quantities of complex data is becoming more important than traditional transactional processing.

At the same time, there have been tremendous improvements in CPU and memory technologies. Newer processors offer greater compute and memory capabilities and are optimized for multiple application domains. Commodity systems are increasingly using multi-core processors with more than 4 cores per chip, and enterprise-class systems are using processors with 8 cores per chip, where each core can execute up to 4 simultaneous threads. Specialized multi-core processors such as GPUs have brought the computational capabilities of supercomputers to cheaper commodity machines. On the storage front, flash-based solid-state devices (SSDs) are becoming smaller in size, cheaper in price, and larger in capacity. Exotic technologies like phase-change memory are on the near-term horizon and can be game-changers in the way data is stored and processed.

Despite these trends, there is currently limited usage of these technologies in the data management domain. Naive usage of multi-core processors or SSDs often leads to an unbalanced system. It is therefore important to evaluate applications in a holistic manner to ensure effective utilization of CPU and memory resources. This workshop aims to understand the impact of modern hardware technologies on accelerating core components of data management workloads. Specifically, the workshop hopes to explore the interplay between overall system design, core algorithms, query optimization strategies, programming approaches, performance modelling and evaluation, etc., from the perspective of data management applications.

Topics of Interest

The suggested topics of interest include, but are not restricted to:

  • Hardware and System Issues in Domain-specific Accelerators
  • New Programming Methodologies for Data Management Problems on Modern Hardware
  • Query Processing for Hybrid Architectures
  • Large-scale I/O-intensive (Big Data) Applications
  • Parallelizing/Accelerating Analytical (e.g., Data Mining) Workloads
  • Autonomic Tuning for Data Management Workloads on Hybrid Architectures
  • Algorithms for Accelerating Multi-modal Multi-tiered Systems
  • Energy Efficient Software-Hardware Co-design for Data Management Workloads
  • Parallelizing Non-traditional (e.g., Graph Mining) Workloads
  • Algorithms and Performance Models for Modern Storage Sub-systems
  • Data Layout Issues for Modern Memory and Storage Hierarchies
  • Novel Applications of Low-Power Processors (e.g., ARM Processor based systems)
  • New Benchmarking Methodologies for Storage-class Memories


Keynote Presentations

Hadoop: Past, Present, and (possibly) Future

Milind Bhandarkar, Chief Scientist, Machine Learning Platforms, Pivotal Inc. (Slides)

Bio:  Milind Bhandarkar was a founding member of the team at Yahoo! that took Apache Hadoop from a 20-node prototype to a datacenter-scale production system, and has been contributing to and working with Hadoop since version 0.1.0. He started the Yahoo! Grid solutions team focused on training, consulting, and supporting hundreds of new migrants to Hadoop. Parallel programming languages and paradigms have been his area of focus for over 20 years, and were his area of specialization for his PhD (Computer Science) from the University of Illinois at Urbana-Champaign. He worked at the Center for Development of Advanced Computing (C-DAC), National Center for Supercomputing Applications (NCSA), Center for Simulation of Advanced Rockets, Siebel Systems, PathScale Inc. (acquired by QLogic), Yahoo! and LinkedIn. Currently, he is the Chief Scientist, Machine Learning Platforms, at Pivotal Inc.

Active Storage: Exploring a Scalable, Compute-In-Storage model by extending the Blue Gene/Q architecture with Integrated Non-volatile Memory

Blake Fitch, Senior Technical Staff Member, IBM T. J. Watson Research Center (Slides)

Bio:  Blake G. Fitch first joined the IBM T.J. Watson Research Center in 1985 as a student. He received his bachelor's degree in Computer Science from Antioch College in 1987. From 1987 until the present, he has remained with IBM to pursue interests in distributed and parallel systems. In 1990 he joined the Scalable Parallel Systems group, contributing to the research and development that culminated in the IBM scalable parallel system (SP) product. Since then, his research interests have focused on application frameworks and programming models suitable for production parallel computing environments. Practical applications of this work include contributions to the transputer-based control system for IBM's CMOS S/390 mainframes (IBM Boeblingen, Germany, 1994) and the architecture of IBM's Automatic Fingerprint Identification System parallel application (IBM Hursley, UK, 1996). In 1999, he joined the Blue Gene Project as the application architect for BlueMatter, a scalable classical molecular dynamics package. Mr. Fitch is currently a Senior Technical Staff Member at IBM Research and is the architect and technical lead for the Active Storage project. The Active Storage project aims to integrate non-volatile memory into highly scalable parallel system architectures (currently IBM Blue Gene/Q) and to explore system software and applications that leverage the new capabilities of such systems.

Workshop Program

9 am-5.30 pm, Meeting Room 300

8.50 am: Welcome Comments

9-10.30 am: Keynote by Milind Bhandarkar, Chief Scientist, Machine Learning Platforms, Pivotal Inc.

Hadoop: Past, Present, and (possibly) Future (Slides)

Apache Hadoop has rapidly become the de facto data processing platform, and is often mentioned synonymously with "Big Data". Hadoop started as a project within Apache Lucene and Nutch to scale the content backend for a web search engine. However, it is currently being used in the majority of Fortune 500 companies and in many other application domains, such as fraud detection at credit card companies, healthcare analytics, and churn detection and prevention at telecom companies. In this talk, we will reminisce about the early days of Hadoop at Yahoo, and lessons learned in scaling this platform from a 20-node prototype to a datacenter-wide production deployment. We will give an overview of the current state of the Hadoop ecosystem, and present some prominent patterns and use cases of this platform. We will also discuss how Hadoop is evolving, and its future as a platform for "Big Data" processing.

10.30-11 am: Coffee Break

11 am-12.30 pm: Session 1: Compute Optimizations

12.30-2.00 pm: Lunch

2.00-3.30 pm: Session 2: Memory/Storage Optimizations

3.30-4.00 pm: Coffee Break

4.00-5.30 pm: Keynote by Blake Fitch, Senior Technical Staff Member, IBM T. J. Watson Research Center

Active Storage: Exploring a Scalable, Compute-In-Storage model by extending the Blue Gene/Q architecture with Integrated Non-volatile Memory (Slides)

Emerging storage-class memories offer a set of challenges and opportunities in system architecture, programming models, and application design. We are exploring the close integration of emerging solid-state storage technologies in conjunction with high-performance networks and integrated processing capability. Specifically, we consider the extension of the Blue Gene/Q architecture by integrating Flash into the node to enable a scalable, data-centric computing platform. We are using BG/Q as a rapid prototyping platform, allowing us to build a research system based on an infrastructure with proven scalability to thousands of nodes. Our work also involves enabling a Linux environment with standard network interfaces on the BG/Q hardware. We plan to explore applications of this system architecture, including existing file systems and middleware as well as more aggressive compute-in-storage approaches. Compute-in-storage is intended to enable the use of high-performance computing (HPC) programming techniques (MPI) to implement data-centric algorithms (e.g., sort, join, graph) that execute on processing elements embedded within a storage system. This presentation will review the architectural extension to BG/Q, present a progress report on the project, and describe some early results.

Important Dates

  • Paper Submission: Friday, June 28, 2013 (Updated)
  • Notification of Acceptance: Friday, July 19, 2013
  • Camera-ready Submission: Monday, July 29, 2013
  • Workshop Date: Monday, August 26, 2013

Submission Instructions

The workshop proceedings will be published by VLDB.

Submission Site 

All submissions will be handled electronically via EasyChair.

Formatting Guidelines 

We will use the same document templates as the VLDB13 conference. You can find them here.

It is the authors' responsibility to ensure that their submissions adhere strictly to the VLDB format detailed here. In particular, authors may not modify the format in order to squeeze in more material. Submissions that do not comply with the formatting detailed here will be rejected without review.

The paper length is limited to 12 pages. Submissions of lesser length are acceptable as long as they adhere to the VLDB format. 

Organization

Workshop Co-Chairs

For questions regarding the workshop, please send email to contact@adms-conf.org.

Program Committee

  • Peter Baumann, Jacobs University
  • John Davis, Microsoft Research
  • Gregory Diamos, Nvidia
  • Christophe Dubach, University of Edinburgh
  • Frank Dehne, Carleton University
  • Maya Gokhale, Lawrence Livermore National Laboratory
  • Francesco Fusco, ETH Zurich
  • Tirthankar Lahiri, Oracle
  • Alfons Kemper, TU Munich
  • Rajaram Krishnamurthy, IBM
  • Stefan Manegold, CWI
  • C. Mohan, IBM Almaden Research
  • Nadathur Satish, Intel
  • Ji-Yong Shin, Cornell University
  • Sayantan Sur, Intel