ADMS 2014
Fifth International Workshop on Accelerating Data Management Systems Using Modern Processor and Storage Architectures

 
Monday, September 1, 2014
 
In conjunction with VLDB 2014
Hangzhou, China
 
 
 
Workshop Overview

The objective of this one-day workshop is to investigate opportunities in accelerating data management systems and workloads (which include traditional OLTP, data warehousing/OLAP, ETL, Streaming/Real-time, Business Analytics, and XML/RDF Processing) using processors (e.g., commodity and specialized Multi-core, GPUs, FPGAs, and ASICs), storage systems (e.g., Storage-class Memories like SSDs and Phase-change Memory), and programming models like MapReduce, GraphLab, CUDA, OpenCL, and OpenACC.

The current data management scenario is characterized by the following trends: traditional OLTP and OLAP/data warehousing systems are being used for increasingly complex workloads (e.g., petabytes of data, complex queries under real-time constraints, etc.); applications are becoming far more distributed, often consisting of different data processing components; non-traditional domains such as bio-informatics, social networking, mobile computing, sensor applications, and gaming are generating growing quantities of data of different types; economic and energy constraints are leading to greater consolidation and virtualization of resources; and analyzing vast quantities of complex data is becoming more important than traditional transactional processing.

At the same time, there have been tremendous improvements in CPU and memory technologies. Newer processors offer greater compute and memory capabilities and are optimized for multiple application domains. Commodity systems increasingly use multi-core processors with more than 6 cores per chip, and enterprise-class systems use processors with 8 cores per chip, where each core can execute up to 4 simultaneous threads. Specialized multi-core processors such as GPUs have brought the computational capabilities of supercomputers to cheaper commodity machines. On the storage front, flash-based solid state devices (SSDs) are becoming smaller in size, cheaper in price, and larger in capacity. Exotic technologies like phase-change memory are on the near-term horizon and can be game-changers in the way data is stored and processed.

In spite of these trends, these technologies currently see limited usage in the data management domain. Naive usage of multi-core processors or SSDs often leads to an unbalanced system. It is therefore important to evaluate applications in a holistic manner to ensure effective utilization of CPU and memory resources. This workshop aims to understand the impact of modern hardware technologies on accelerating core components of data management workloads. Specifically, the workshop hopes to explore the interplay between overall system design, core algorithms, query optimization strategies, programming approaches, performance modelling and evaluation, etc., from the perspective of data management applications.

Topics of Interest

The suggested topics of interest include, but are not restricted to:

  • Hardware and System Issues in Domain-specific Accelerators
  • New Programming Methodologies for Data Management Problems on Modern Hardware
  • Query Processing for Hybrid Architectures
  • Large-scale I/O-intensive (Big Data) Applications
  • Parallelizing/Accelerating Analytical (e.g., Data Mining) Workloads
  • Autonomic Tuning for Data Management Workloads on Hybrid Architectures
  • Algorithms for Accelerating Multi-modal Multi-tiered Systems
  • Energy Efficient Software-Hardware Co-design for Data Management Workloads
  • Parallelizing Non-traditional (e.g., Graph Mining) Workloads
  • Algorithms and Performance Models for Modern Storage Sub-systems
  • Exploitation of specialized ASICs
  • Novel Applications of Low-Power Processors and FPGAs
  • Exploitation of Transactional Memory for Database Workloads
  • Exploitation of Active Technologies (e.g., Active Memory, Active Storage, and Networking)

Workshop Program

8.45 am - 5 pm, Dragon Hotel, Diamond 4

8.45 am: Welcome Comments

8.50-10.20 am: Keynote by Prof. Dhabaleswar K. (DK) Panda, The Ohio State University

Accelerating Data Management and Processing on Modern Clusters with RDMA-Enabled Interconnects (slides)

Bio:  Dhabaleswar K. (DK) Panda is a Professor of Computer Science and Engineering at The Ohio State University. He has published over 300 papers in major journals and international conferences. Dr. Panda and his research group members have been doing extensive research on modern networking technologies including InfiniBand, High-Speed Ethernet and RDMA over Converged Enhanced Ethernet (RoCE). The MVAPICH2 (High Performance MPI over InfiniBand, iWARP and RoCE) and MVAPICH2-X software libraries, developed by his research group (http://mvapich.cse.ohio-state.edu), are currently being used by more than 2,150 organizations worldwide (in 72 countries). This software has enabled several InfiniBand clusters to get into the latest TOP500 ranking during the last decade. More than 210,000 downloads of this software have taken place from the project's website alone. This software package is also available with the software stacks of many network and server vendors, and Linux distributors. The new RDMA-enabled Apache Hadoop package, consisting of acceleration for HDFS, MapReduce and RPC, is publicly available from http://hadoop-rdma.cse.ohio-state.edu. Dr. Panda's research has been supported by funding from the US National Science Foundation, the US Department of Energy, and several industry partners including Intel, Cisco, Cray, SUN, Mellanox, QLogic, NVIDIA and NetApp. He is an IEEE Fellow and a member of ACM.

Abstract: Managing and processing large volumes of data is a significant challenge being faced by the Big Data community. This has substantial impact on designing and utilizing modern data management and processing systems in multiple tiers, from the front-end data accessing and serving to the back-end data analytics. This scenario has led to the emergence of many Big Data middleware systems, such as Memcached, HBase, Hadoop, and Spark. The design and deployment of modern clusters during the last decade has largely been fueled by the following three factors: 1) advances in multi-core/many-core technologies and accelerators, 2) Remote Direct Memory Access (RDMA)-enabled networking (InfiniBand and RoCE), and 3) Solid State Drives (SSDs). However, current Big Data middleware and the associated applications are not able to fully take advantage of these advanced features on modern clusters. This talk will examine opportunities and challenges in accelerating the performance of Big Data middleware (including Memcached, HBase, Hadoop, and Spark) in different data management and processing tiers with the latest technologies available on modern clusters. An overview of the High-Performance Big Data project (http://hibd.cse.ohio-state.edu) will be presented. High-performance designs using RDMA to accelerate the Memcached, HBase, Hadoop, and Spark frameworks on InfiniBand and RoCE clusters will be demonstrated. The presentation will also include initial results on optimizing the performance of Memcached with SSD support. An overview of a set of benchmarks (OSU HiBD Benchmarks, OHB) to evaluate the performance of different components in an isolated manner will also be presented.


10.20-10.30 am: Break


Session 1: Compute Acceleration (10.30 am -12.15 pm)

  • Multipredicate Join Algorithms for Accelerating Relational Graph Processing on GPUs, Haicheng Wu (Georgia Institute of Technology), Daniel Zinn, Molham Aref (LogicBlox Inc.), and Sudhakar Yalamanchili (Georgia Institute of Technology) (slides)
  • Data Parallel Quadtree Indexing and Spatial Query Processing of Complex Polygon Data on GPUs, Jianting Zhang, Simin You (City University of New York), and Le Gruenwald (The University of Oklahoma) (slides)
  • HASHI: An Application Specific Instruction Set Extension for Hashing, Oliver Arnold, Sebastian Haas, Gerhard Fettweis, Benjamin Schlegel, Thomas Kissinger, Tomas Karnagel, and Wolfgang Lehner (Technische Universität Dresden) (slides)
  • QTM: Modelling Query Execution with Tasks, Steffen Zeuch and Johann-Christoph Freytag (Humboldt Universität zu Berlin) (slides)

12.15-1.30 pm: Lunch


Session 2: Memory/Storage Acceleration (1:30-3.15 pm)

  • Flash-Conscious Cache Population for Enterprise Database Workloads, Hyojun Kim (IBM Research, Almaden), Ioannis Koltsidas, Nikolas Ioannou (IBM Research, Zurich), Sangeetha Seshadri, Paul Muench, Clement Dickey, and Lawrence Chiu (IBM Research, Almaden) (slides)
  • A Prolegomenon on OLTP Database Systems for Non-Volatile Memory, Justin Debrabant (Brown University), Joy Arulraj, Andrew Pavlo (Carnegie Mellon University), Michael Stonebraker (MIT CSAIL), Stan Zdonik (Brown University), and Subramanya Dulloor (Intel Labs) (slides)
  • An Approach for Hybrid-Memory Scaling Columnar In-Memory Databases, Bernhard Höppner (SAP AG), Ahmadshah Waizy (Fujitsu Technology Solutions GmbH), and Hannes Rauhe (SAP AG) (slides)
  • ERIS: A NUMA-Aware In-Memory Storage Engine for Analytical Workload, Thomas Kissinger, Tim Kiefer, Benjamin Schlegel, Dirk Habich, Daniel Molka, and Wolfgang Lehner (Technische Universität Dresden) (slides)

3.15-3.30 pm: Break


3.30-5 pm: Keynote by Tirthankar Lahiri, Oracle Corp.

Oracle's In-Memory Data Management Strategy: In-Memory in all Tiers, and for all Workloads (slides)

Bio:  Tirthankar Lahiri is the Vice President of Development at Oracle, and is responsible for the Data Technologies area for the Oracle Database (this area covers Data, Space, and Transaction management) as well as the Oracle TimesTen In-Memory Database. Tirthankar has 18 years of experience in the Database industry. He has worked extensively in a variety of Database Systems areas, for which he holds multiple patents: Manageability, Performance, Scalability, High Availability, Caching, Distributed Concurrency Control, In-Memory Data Management, etc. Tirthankar has a B.Tech in Computer Science from IIT Kharagpur, and an MS in Electrical Engineering from Stanford University. He was in the PhD program at Stanford, and his research areas included Multiprocessor Operating Systems and Semi-Structured Data.

Abstract: We describe Oracle's pragmatic two-pronged approach for delivering In-Memory Technology to both OLTP and Analytics use cases. On the one hand, Oracle TimesTen is a specialized, memory-resident relational database, designed for ultra-low response time. TimesTen typically runs within the application tier as an embeddable database and is deployed by thousands of customers requiring low-latency database access. On the other hand, the new Oracle 12c Database In-Memory option delivers general-purpose in-memory capabilities for the vast range of enterprise applications and massive database sizes supported by Oracle Database. Since the In-Memory option is built seamlessly into the Oracle Database engine, it is fully compatible with all of the functionality and high availability mechanisms of the Oracle Database, and can be used by applications without any changes whatsoever. The In-Memory option provides a unique dual-format row/column in-memory representation, thus avoiding the tradeoffs inherent in single-format in-memory databases. Unlike traditional in-memory databases, the new In-Memory option does not limit the size of the database to the size of available DRAM: numerous optimizations spanning DRAM, Flash, Disk, as well as machines in a RAC cluster, allow databases of virtually unlimited size. We describe the new In-Memory option in the context of the full spectrum of Oracle's numerous storage optimizations, showing that in-memory data management is an important and natural evolutionary enhancement to the existing deep technology stack of the Oracle Database.


Important Dates

  • Paper Submission: Friday, June 27, 2014, 11.59 pm PST. (Submission is closed.)
  • Notification of Acceptance: Friday, July 18, 2014
  • Camera-ready Submission: Monday, July 28, 2014
  • Workshop Date: Monday, September 1, 2014

Submission Instructions

The workshop proceedings will be published by VLDB and indexed via DBLP.

Submission Site 

All submissions will be handled electronically via EasyChair.

Formatting Guidelines 

We will use the same document templates as the VLDB 2014 conference. You can find them here.

It is the authors' responsibility to ensure that their submissions adhere strictly to the VLDB format detailed here. In particular, modifying the format with the objective of squeezing in more material is not allowed. Submissions that do not comply with the formatting detailed here will be rejected without review.

The paper length is limited to 12 pages. Shorter submissions are acceptable as long as they adhere to the VLDB format.

Organization

Workshop Co-Chairs

For questions regarding the workshop, please send email to contact@adms-conf.org.

Program Committee

  • Gustavo Alonso, ETH Zurich
  • Nipun Agarwal, Oracle Labs
  • T. Araki, NEC
  • Sean Baxter, Nvidia
  • David Cunningham, Google
  • Christophe Dubach, University of Edinburgh
  • Blake Fitch, IBM Watson Research
  • Franz Faerber, SAP
  • Arun Jagatheesan, Samsung
  • Kajan Kanagaratnam, IBM Toronto
  • Alfons Kemper, TU Munich
  • Rajaram Krishnamurthy, IBM
  • Qiong Luo, HKUST
  • Stefan Manegold, CWI
  • C. Mohan, IBM Almaden Research
  • Nadathur Satish, Intel
  • Sayantan Sur, Intel
  • Xiaodong Zhang, Ohio State University