ADMS 2021
Twelfth International Workshop on Accelerating Analytics and Data Management Systems Using Modern Processor and Storage Architectures

Monday, August 16, 2021
In conjunction with VLDB 2021

Virtual Conference via Whova and Zoom
(1.30-9.30pm Copenhagen / 7.30am-3.30pm EST / 4.30am-12.30pm PST)

Overview

The objective of this one-day workshop is to investigate opportunities for accelerating analytics workloads and data management systems, including traditional OLTP, data warehousing/OLAP, HTAP, ETL, streaming/real-time processing, business analytics (including machine learning and deep learning workloads), and data visualization. The workshop considers acceleration using modern processors (e.g., commodity and specialized multi-core, many-core, GPUs, and FPGAs), processing systems (e.g., hybrid, massively distributed clusters, and cloud-based distributed computing infrastructure), networking infrastructures (e.g., RDMA over InfiniBand), memory and storage systems (e.g., storage-class memories such as SSDs, active memories, NVRAMs, and phase-change memory), multi-core and distributed programming paradigms such as CUDA/OpenCL, MPI/OpenMP, and MapReduce/Spark, and integration with data-science frameworks such as Sklearn, TensorFlow, or PyTorch. Exploratory topics such as DNA-based storage or quantum algorithms are also within the purview of the ADMS workshop.

The current data management scenario is characterized by the following trends: traditional OLTP and OLAP/data warehousing systems are being used for increasingly complex workloads (e.g., petabytes of data and complex queries under real-time constraints); applications are becoming far more distributed, often consisting of different data processing components; non-traditional domains such as bioinformatics, social networking, mobile computing, sensor applications, and gaming are generating growing quantities of data of different types; economic and energy constraints are leading to greater consolidation and virtualization of resources; and analyzing vast quantities of complex data is becoming more important than traditional transactional processing.

At the same time, there have been tremendous improvements in CPU and memory technologies. Newer processors offer greater compute and memory capabilities, are power-efficient, and are optimized for multiple application domains. Commodity systems are increasingly using multi-core processors with more than 6 cores per chip, and enterprise-class systems are using processors with at least 32 cores per chip. Specialized multi-core processors such as GPUs have brought the computational capabilities of supercomputers to cheaper commodity machines. On the storage front, flash-based solid-state devices (SSDs) are becoming smaller in size, cheaper in price, and larger in capacity. Exotic technologies like phase-change memory are on the near-term horizon and can be game-changers in the way data is stored and processed.

Despite these trends, these technologies currently see limited use in the data management domain. Naive exploitation of multi-core processors or SSDs often leads to unbalanced systems. It is, therefore, important to evaluate applications in a holistic manner to ensure effective utilization of CPU and memory resources. This workshop aims to understand the impact of modern hardware technologies on accelerating core components of data management workloads. Specifically, the workshop hopes to explore the interplay between overall system design, core algorithms, query optimization strategies, programming approaches, and performance modelling and evaluation, from the perspective of data management applications.

Topics of Interest

The suggested topics of interest include, but are not restricted to:

Hardware and System Issues in Domain-specific Accelerators

New Programming Methodologies for Data Management Problems on Modern Hardware

Query Processing for Hybrid Architectures

Large-scale I/O-intensive (Big Data) Applications

Parallelizing/Accelerating Machine Learning/Deep Learning Workloads

Autonomic Tuning for Data Management Workloads on Hybrid Architectures

Algorithms for Accelerating Multi-modal Multi-tiered Systems

Energy Efficient Software-Hardware Co-design for Data Management Workloads

Parallelizing Non-traditional (e.g., Graph Mining) Workloads

Algorithms and Performance Models for Modern Storage Sub-systems

Exploitation of Specialized ASICs

Novel Applications of Low-Power Processors and FPGAs

Exploitation of Transactional Memory for Database Workloads

Exploitation of Active Technologies (e.g., Active Memory, Active Storage, and Networking)

New Benchmarking Methodologies for Accelerated Workloads

Applications of HPC Techniques for Data Management Workloads

Acceleration in Cloud Environments

Accelerating Data Science/Machine Learning Workloads

Exploratory Topics such as DNA Storage and Quantum Technologies

Keynote Speakers

Jia Shi, Oracle
Jia Shi is Vice President of Exadata Development at Oracle. She leads a team of software developers who create the features that enable Exadata to be the fastest, most scalable, and most highly available platform for running all types of Oracle Database workloads. Her team has built many marquee features, such as Smart Flash Cache, Smart Flash Log, and Persistent Memory Data and Commit Accelerators. She holds MS and BS degrees in Computer Science from Stanford University.
Under the Hood of an Exadata Transaction - How did we harness the power of Persistent Memory?
Persistent memory is a new silicon technology that adds a distinct storage tier whose performance, capacity, and price fall between DRAM and flash. Because the persistent memory sits physically on the memory bus of the storage server, reads occur at memory speed, much faster than flash. Unlike DRAM, writes are persistent and survive power cycles. Oracle has engineered the Exadata Smart PMEM Cache and Exadata Smart PMEM Log capabilities with Intel Optane Persistent Memory to achieve a significant boost in Oracle Database OLTP performance. Jia Shi, VP of Exadata Development, will describe the engineering details of their implementation in Exadata and how Oracle's software development teams are innovating with this technology.

Nikolay Sakharnykh, Nvidia
Nikolay Sakharnykh is a senior AI developer technology manager at NVIDIA. He leads a team working on accelerating data analytics and machine learning applications on GPUs. He is interested in novel memory management techniques.

Fast Data Compression on GPU and DPU
Database applications manage large amounts of data. Often the interconnect between the processors and storage systems is the main bottleneck. Lossless data compression is one of the most common approaches to alleviating the data transfer bottleneck. However, many compression methods were designed decades ago and can’t be easily parallelized on modern massively parallel architectures like GPUs. In this keynote we review both well-known and novel compression techniques. We present efficient parallelization strategies and take a deep dive into performance analysis on modern GPUs. We will also introduce DPU compression engines, cover example usage scenarios, and compare GPU- and DPU-based compression solutions.

Michael Gschwind, Facebook
Michael Gschwind leads Accelerated Content Understanding at Facebook AI. He has previously held leadership positions at Huawei and IBM, leading hardware and software development of AI and general-purpose systems. He is a founding member of the MLPerf benchmarking consortium, an IEEE Fellow, and the author of over 100 technical papers, and he holds over 800 patents in the field.

Inference @ Scale: Accelerating AI models for over a billion users
AI models are at the foundation of Facebook communities, identifying relevant and interesting content our users delight at interacting with, translating content to transcend barriers, and keeping our communities safe by identifying inappropriate content, such as bullying, domestic violence and terrorism in images, videos, and text. As we are looking for ever higher quality, larger scale models to deliver on our mission to connect users and build safe communities, AI accelerators provide the foundation for scaling up quality, while keeping power consumption manageable and sustainable, and delivering on our sustainability commitments. Open benchmarks such as MLPerf play an important role in this new application space to help foster innovation by creating a level playing field and common reference point for novel solutions.

Workshop Schedule
1.30-9.30pm CET/Copenhagen
7.30am-3.30pm EST/New York
4.30am-12.30pm PST/San Francisco

Session 1 (1.30-3.30pm CET, 7.30-9.30am EST, 4.30-6.30am PST)

Delayed Parity Update for Bridging the Gap between Replication and Erasure Coding in Server-based Storage, Takayuki Fukatani, Hieu Hanh Le and Haruo Yokota, Tokyo Institute of Technology (Paper, Presentation)

Extending In-Memory OLTP with Persistent Memory, Hillel Avni, Nir Pachter, Aharon Avitzur and Vladi Vexler, Huawei (Paper, Presentation)

One Buffer Manager to Rule Them All: Using Distributed Memory with Cache Coherence over RDMA, Magdalena Pröbstl, Philipp Fent, Maximilian Schüle, Moritz Sichert, Thomas Neumann and Alfons Kemper, Technical University of Munich (Paper, Presentation)

Evaluating Lightweight Integer Compression Algorithms in Column-Oriented In-Memory DBMS, Linus Heinzl, Ben Hurdelhey, Martin Boissier, Michael Perscheid and Hasso Plattner, Hasso Plattner Institute (Paper, Presentation)

Break (3.30-4pm CET, 9.30-10am EST, 6.30-7am PST)

Session 2 (4-5.30pm CET, 10-11.30am EST, 7-8.30am PST)

OneJoin: Cross-Architecture, Scalable Edit Similarity Join for DNA Data Storage Using oneAPI, Eugenio Marinelli and Raja Appuswamy, EURECOM (Paper, Presentation)

(Keynote 1): Fast Data Compression on GPU and DPU, Nikolay Sakharnykh, Nvidia (Presentation)

Break (5.30-6pm CET, 11.30am-12pm EST, 8.30-9am PST)

Session 3 (6-7.30pm CET, 12-1.30pm EST, 9-10.30am PST)

Highlighting the Performance Diversity of Analytical Queries using VOILA, Tim Gubner and Peter Boncz, CWI (Paper, Presentation)

(Keynote 2): Under the Hood of an Exadata Transaction - How did we harness the power of Persistent Memory?, Jia Shi, Oracle (Presentation)

Break (7.30-8pm CET, 1.30-2pm EST, 10.30-11am PST)

Session 4 (8-9.30pm CET, 2-3.30pm EST, 11am-12.30pm PST)

Scaling Joins to a Thousand GPUs, Hao Gao and Nikolay Sakharnykh, Nvidia (Paper, Presentation)

(Keynote 3): Inference @ Scale: Accelerating AI models for over a billion users, Michael Gschwind, Facebook (Presentation)

Organization

Workshop Co-Chairs

Rajesh Bordawekar, IBM T.J. Watson Research Center

Tirthankar Lahiri, Oracle

For questions regarding the workshop, please send email to contact@adms-conf.org.

Program Committee

Bulent Abali, IBM Research

Meena Arunachalam, Intel

Spyros Blanas, The Ohio State University

Periklis Chrysogelos, EPFL

Yoav Etsion, Technion - Israel Institute of Technology

Michael Gschwind, Facebook

Niclas Hedam, ITU Denmark

Nuwen Jayasena, AMD Research

Anuva Kulkarni, Motional

Qiong Luo, HKUST

Danica Porobic, Oracle

Suprio Ray, University of New Brunswick

Eva Sitaridi, AWS

Nikolay Sakharnykh, Nvidia

Berni Schiefer, Snowflake

Sayantan Sur, Mellanox/Nvidia

Important Dates

Paper Submission: Monday, 28 June, 2021, 9 am EST

Notification of Acceptance: Friday, 16 July, 2021

Camera-ready Submission: Friday, 24 July, 2021

Workshop Date: Monday, 16 August, 2021

Submission Instructions

Submission Site

All submissions will be handled electronically via EasyChair.

Formatting Guidelines

We will use the same document templates as the VLDB 2021 conference. You can find them here.
It is the authors' responsibility to ensure that their submissions adhere strictly to the VLDB format detailed here. In particular, it is not allowed to modify the format with the objective of squeezing in more material. Submissions that do not comply with the formatting detailed here will be rejected without review.

As per the VLDB submission guidelines, the paper length for a full paper is limited to 12 pages, excluding bibliography. However, shorter papers (at least 6 pages of content) are encouraged as well.