ADMS 2021
Twelfth International Workshop on Accelerating Analytics and Data Management Systems Using Modern Processor and Storage Architectures

 
Monday, August 16, 2021
In conjunction with VLDB 2021

Virtual Conference via Whova and Zoom
(1.30-9.30pm Copenhagen / 7.30am-3.30pm EST / 4.30am-12.30pm PST)

 
 
Workshop Overview

The objective of this one-day workshop is to investigate opportunities for accelerating analytics workloads and data management systems, including traditional OLTP, data warehousing/OLAP, HTAP, ETL, streaming/real-time processing, business analytics (including machine learning and deep learning workloads), and data visualization, using modern processors (e.g., commodity and specialized multi-core, many-core, GPUs, and FPGAs), processing systems (e.g., hybrid, massively distributed clusters and cloud-based distributed computing infrastructure), networking infrastructures (e.g., RDMA over InfiniBand), memory and storage systems (e.g., storage-class memories such as SSDs, active memories, NVRAMs, and phase-change memory), multi-core and distributed programming paradigms such as CUDA/OpenCL, MPI/OpenMP, and MapReduce/Spark, and integration with data-science frameworks such as Sklearn, TensorFlow, or PyTorch. Exploratory topics such as DNA-based storage or quantum algorithms are also within the purview of the ADMS workshop.

The current data management scenario is characterized by the following trends: traditional OLTP and OLAP/data warehousing systems are being used for increasingly complex workloads (e.g., petabytes of data, complex queries under real-time constraints); applications are becoming far more distributed, often consisting of different data processing components; non-traditional domains such as bio-informatics, social networking, mobile computing, sensor applications, and gaming are generating growing quantities of data of different types; economic and energy constraints are leading to greater consolidation and virtualization of resources; and analyzing vast quantities of complex data is becoming more important than traditional transactional processing.

At the same time, there have been tremendous improvements in CPU and memory technologies. Newer processors offer greater compute and memory capabilities, are more power-efficient, and are optimized for multiple application domains. Commodity systems increasingly use multi-core processors with more than 6 cores per chip, and enterprise-class systems use processors with at least 32 cores per chip. Specialized multi-core processors such as GPUs have brought the computational capabilities of supercomputers to cheaper commodity machines. On the storage front, flash-based solid-state devices (SSDs) are becoming smaller in size, cheaper in price, and larger in capacity. Exotic technologies like phase-change memory are on the near-term horizon and can be game-changers in the way data is stored and processed.

In spite of these trends, these technologies currently see limited use in the data management domain. Naive exploitation of multi-core processors or SSDs often leads to unbalanced systems. It is, therefore, important to evaluate applications in a holistic manner to ensure effective utilization of CPU and memory resources. This workshop aims to understand the impact of modern hardware technologies on accelerating core components of data management workloads. Specifically, the workshop hopes to explore the interplay between overall system design, core algorithms, query optimization strategies, programming approaches, performance modelling and evaluation, etc., from the perspective of data management applications.

Topics of Interest

The suggested topics of interest include, but are not restricted to:

  • Hardware and System Issues in Domain-specific Accelerators
  • New Programming Methodologies for Data Management Problems on Modern Hardware
  • Query Processing for Hybrid Architectures
  • Large-scale I/O-intensive (Big Data) Applications
  • Parallelizing/Accelerating Machine Learning/Deep Learning Workloads
  • Autonomic Tuning for Data Management Workloads on Hybrid Architectures
  • Algorithms for Accelerating Multi-modal Multi-tiered Systems
  • Energy Efficient Software-Hardware Co-design for Data Management Workloads
  • Parallelizing non-traditional (e.g., graph mining) workloads
  • Algorithms and Performance Models for modern Storage Sub-systems
  • Exploitation of specialized ASICs
  • Novel Applications of Low-Power Processors and FPGAs
  • Exploitation of Transactional Memory for Database Workloads
  • Exploitation of Active Technologies (e.g., Active Memory, Active Storage, and Networking)
  • New Benchmarking Methodologies for Accelerated Workloads
  • Applications of HPC Techniques for Data Management Workloads
  • Acceleration in the Cloud Environments
  • Accelerating Data Science/Machine Learning Workloads
  • Exploratory Topics such as DNA Storage and Quantum Technologies

Keynote Speakers

  • Jia Shi, Oracle

    Jia Shi is Vice President of Exadata Development at Oracle. She leads a team of very talented software developers creating the features that make Exadata the fastest, most scalable, and most highly available platform for running all types of Oracle Database workloads. Her team has built many marquee features, such as Smart Flash Cache, Smart Flash Log, and the Persistent Memory Data and Commit Accelerators. She holds MS and BS degrees in Computer Science from Stanford University.

    Under the Hood of an Exadata Transaction - How did we harness the power of Persistent Memory?

    Persistent memory is a new silicon technology that adds a distinct storage tier, between DRAM and flash in performance, capacity, and price. Persistent memory is physically present on the memory bus of the storage server, so reads proceed at memory speed, much faster than flash, while writes are persistent and survive power cycles, unlike DRAM. Oracle has engineered the Exadata Smart PMEM Cache and Exadata Smart PMEM Log capabilities with Intel Optane Persistent Memory to achieve a significant boost in Oracle Database OLTP performance. Jia Shi, VP of Exadata Development, will describe the engineering details of their implementation in Exadata and how Oracle's software development teams are innovating with this technology. (For a flavor of programming against persistent memory, see the illustrative sketch after this list.)

  • Nikolay Sakharnykh, Nvidia

    Nikolay Sakharnykh is a senior AI developer technology manager at NVIDIA. He leads a team working on accelerating data analytics and machine learning applications on GPUs. He is interested in novel memory management techniques.

    Fast Data Compression on GPU and DPU

    Database applications manage large amounts of data, and the interconnect between the processors and the storage systems is often the main bottleneck. Lossless data compression is one of the most common approaches to alleviating this data-transfer bottleneck. However, many compression methods were designed decades ago and cannot be easily parallelized on modern massively parallel architectures like GPUs. In this keynote we review both well-known and novel compression techniques, present efficient parallelization strategies, and dive deep into performance analysis on modern GPUs. We will also introduce DPU compression engines, cover example usage scenarios, and compare GPU- and DPU-based compression solutions. (A sketch of one such scan-based parallel formulation appears after this list.)

  • Michael Gschwind, Facebook

    Michael Gschwind leads Accelerated Content Understanding at Facebook AI. He previously held leadership positions at Huawei and IBM, leading hardware and software development of AI and general-purpose systems. He is a founding member of the MLPerf benchmarking consortium, an IEEE Fellow, the author of over 100 technical papers, and the holder of over 800 patents in the field.

    Inference @ Scale: Accelerating AI models for over a billion users

    AI models are at the foundation of Facebook communities: identifying relevant and interesting content that our users delight in interacting with, translating content to transcend barriers, and keeping our communities safe by identifying inappropriate content, such as bullying, domestic violence, and terrorism in images, videos, and text. As we look for ever higher quality, larger-scale models to deliver on our mission to connect users and build safe communities, AI accelerators provide the foundation for scaling up quality while keeping power consumption manageable and delivering on our sustainability commitments. Open benchmarks such as MLPerf play an important role in this new application space, helping to foster innovation by creating a level playing field and a common reference point for novel solutions.
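
The keynote abstracts above touch on two techniques that short sketches can make concrete. First, persistent memory: the program below is a minimal, illustrative C++ sketch using the PMDK libpmem library to write a log record durably. It is not Oracle's Exadata implementation; the file path and record contents are hypothetical.

    // Illustrative only: durable append of a log record to persistent memory
    // via libpmem (PMDK). Assumes a PMEM-aware (DAX) filesystem mounted at
    // /mnt/pmem -- a hypothetical path, not Exadata's actual design.
    #include <libpmem.h>
    #include <cstdio>
    #include <cstring>

    int main() {
        size_t mapped_len;
        int is_pmem;
        // Create and map a 4 MiB file; on real PMEM this is a direct mapping.
        void *base = pmem_map_file("/mnt/pmem/redo.log", 4 << 20,
                                   PMEM_FILE_CREATE, 0666,
                                   &mapped_len, &is_pmem);
        if (base == nullptr) { std::perror("pmem_map_file"); return 1; }

        const char record[] = "txn 42 commit";
        if (is_pmem) {
            // Store plus cache flush: the write survives power loss, with
            // no block I/O on the critical path.
            pmem_memcpy_persist(base, record, sizeof(record));
        } else {
            std::memcpy(base, record, sizeof(record));
            pmem_msync(base, sizeof(record));  // fall back to msync
        }
        pmem_unmap(base, mapped_len);
        return 0;
    }

The point of the sketch is the critical path: a store into memory-mapped persistent memory followed by a cache flush is sufficient for durability, which is what makes commit accelerators of this kind attractive for OLTP.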

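Second, compression: classic run-length encoding looks inherently sequential, but it can be recast as data-parallel primitives (flag the head of each run, prefix-sum the flags to assign output slots, then scatter). The C++ sketch below shows this formulation on the CPU with std::inclusive_scan; the same structure maps onto GPU scan kernels. It illustrates the general approach, not the keynote's actual implementation.

    // Illustrative only: run-length encoding via data-parallel primitives.
    // Every step is either an independent per-element operation or a scan,
    // which is why this formulation parallelizes well on GPUs.
    #include <cstdint>
    #include <cstdio>
    #include <numeric>
    #include <vector>

    int main() {
        const std::vector<uint8_t> in = {7, 7, 7, 3, 3, 9, 9, 9, 9};
        const size_t n = in.size();

        // Step 1: flag the first element of every run (per-element, parallel).
        std::vector<uint32_t> head(n);
        for (size_t i = 0; i < n; ++i)
            head[i] = (i == 0 || in[i] != in[i - 1]) ? 1 : 0;

        // Step 2: inclusive prefix sum assigns each run an output slot.
        std::vector<uint32_t> pos(n);
        std::inclusive_scan(head.begin(), head.end(), pos.begin());

        // Step 3: scatter run values and start offsets (per-element, parallel).
        const size_t runs = pos[n - 1];
        std::vector<uint8_t> value(runs);
        std::vector<uint32_t> start(runs);
        for (size_t i = 0; i < n; ++i)
            if (head[i]) { value[pos[i] - 1] = in[i]; start[pos[i] - 1] = (uint32_t)i; }

        // Decode run lengths from consecutive start offsets.
        for (size_t r = 0; r < runs; ++r) {
            const size_t len = (r + 1 < runs ? start[r + 1] : n) - start[r];
            std::printf("value=%u length=%zu\n", (unsigned)value[r], len);
        }
        return 0;
    }
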
Workshop Schedule
1.30-9.30pm CET/Copenhagen
7.30am-3.30pm EST/New York
4.30am-12.30pm PST/San Francisco


Session 1 (1.30-3.30pm CET, 7.30-9.30am EST, 4.30-6.30am PST)

  • Delayed Parity Update for Bridging the Gap between Replication and Erasure Coding in Server-based Storage, Takayuki Fukatani, Hieu Hanh Le and Haruo Yokota, Tokyo Institute of Technology (Paper, Presentation)

  • Extending In-Memory OLTP with Persistent Memory, Hillel Avni, Nir Pachter, Aharon Avitzur and Vladi Vexler, Huawei (Paper, Presentation)

  • One Buffer Manager to Rule Them All: Using Distributed Memory with Cache Coherence over RDMA, Magdalena Pröbstl, Philipp Fent, Maximilian Schüle, Moritz Sichert, Thomas Neumann and Alfons Kemper, Technical University of Munich (Paper, Presentation)

  • Evaluating Lightweight Integer Compression Algorithms in Column-Oriented In-Memory DBMS, Linus Heinzl, Ben Hurdelhey, Martin Boissier, Michael Perscheid and Hasso Plattner, Hasso Plattner Institute (Paper, Presentation)


Break (3.30-4pm CET, 9.30-10am EST, 6.30-7am PST)


Session 2 (4-5.30pm CET, 10-11.30am EST, 7-8.30am PST)

  • OneJoin: Cross-Architecture, Scalable Edit Similarity Join for DNA Data Storage Using oneAPI, Eugenio Marinelli and Raja Appuswamy, EURECOM (Paper, Presentation)
  • (Keynote 1): Fast Data Compression on GPU and DPU, Nikolay Sakharnykh, Nvidia (Presentation)


Break (5.30-6pm CET, 11.30am -12pm EST, 8.30-9am PST)


Session 3 (6-7.30pm CET, 12-1.30pm EST, 9-10.30am PST)

  • Highlighting the Performance Diversity of Analytical Queries using VOILA, Tim Gubner and Peter Boncz, CWI (Paper, Presentation)
  • (Keynote 2): Under the Hood of an Exadata Transaction - How did we harness the power of Persistent Memory?, Jia Shi, Oracle (Presentation)


Break (7.30-8pm CET, 1.30-2pm EST, 10.30-11am PST)


Session 4 (8-9.30pm CET, 2-3.30pm EST, 11am-12.30pm PST)

  • Scaling Joins to a Thousand GPUs, Hao Gao and Nikolay Sakharnykh, Nvidia (Paper, Presentation)
  • (Keynote 3): Inference @ Scale: Accelerating AI models for over a billion users, Michael Gschwind, Facebook (Presentation)


Organization

Workshop Co-Chairs

       For questions regarding the workshop please send email to contact@adms-conf.org.

Program Committee

  • Bulent Abali, IBM Research
  • Meena Arunachalam, Intel
  • Spyros Blanas, The Ohio State University
  • Periklis Chrysogelos, EPFL
  • Yoav Etsion, Technion - Israel Institute of Technology
  • Michael Gschwind, Facebook
  • Niclas Hedam, ITU Denmark
  • Nuwan Jayasena, AMD Research
  • Anuva Kulkarni, Motional
  • Qiong Luo, HKUST
  • Danica Porobic, Oracle
  • Suprio Ray, University of New Brunswick
  • Eva Sitaridi, AWS
  • Nikolay Sakharnykh, Nvidia
  • Berni Schiefer, Snowflake
  • Sayantan Sur, Mellanox/Nvidia

Important Dates

  • Paper Submission: Monday, 28 June, 2021, 9 am EST
  • Notification of Acceptance: Friday, 16 July, 2021
  • Camera-ready Submission: Friday, 24 July, 2021
  • Workshop Date: Monday, 16 August, 2021

Submission Instructions

Submission Site 

All submissions will be handled electronically via EasyChair.

Formatting Guidelines 

We will use the same document templates as the VLDB21 conference. You can find them here.

It is the authors' responsibility to ensure that their submissions adhere strictly to the VLDB format detailed here. In particular, modifying the format with the objective of squeezing in more material is not allowed. Submissions that do not comply with the formatting detailed here will be rejected without review.

As per the VLDB submission guidelines, the length of a full paper is limited to 12 pages, excluding bibliography. However, shorter papers (at least 6 pages of content) are also encouraged.