The objective of this one-day workshop is to investigate
opportunities for accelerating analytics workloads and data
management systems, including traditional OLTP, data warehousing/OLAP,
HTAP, ETL, streaming/real-time processing, business analytics
(including machine learning and deep learning workloads), and data
visualization. The workshop considers acceleration using modern
processors (e.g., commodity and specialized multi-core, many-core,
GPUs, and FPGAs), processing systems (e.g., hybrid, massively
distributed clusters, and cloud-based distributed computing
infrastructure), networking infrastructures (e.g., RDMA over
InfiniBand), memory and storage systems (e.g., storage-class memories
like SSDs, active memories, NVRAMs, and phase-change memory),
multi-core and distributed programming paradigms like CUDA/OpenCL,
MPI/OpenMP, and MapReduce/Spark, and integration with data-science
frameworks such as Sklearn, TensorFlow, or PyTorch. Exploratory topics
such as DNA-based storage or quantum algorithms are also within the
purview of the ADMS workshop.
The current data management scenario is characterized by the following trends: traditional OLTP and
OLAP/data warehousing systems are being used for increasingly complex workloads (e.g., petabytes of data,
complex queries under real-time constraints, etc.); applications are becoming far more distributed, often
consisting of different data processing components; non-traditional domains such as bioinformatics, social
networking, mobile computing, sensor applications, and gaming are generating growing quantities of data of
different types; economic and energy constraints are leading to greater consolidation and virtualization
of resources; and analyzing vast quantities of complex data is becoming more important than traditional
transactional processing.
At the same time, there have been tremendous improvements in CPU
and memory technologies. Newer processors offer greater compute and
memory capabilities, are power-efficient, and are optimized for
multiple application domains. Commodity systems are increasingly using
multi-core processors with more than 6 cores per chip, and
enterprise-class systems are using processors with at least 32 cores per
chip. Specialized multi-core processors such as GPUs have
brought the computational capabilities of supercomputers to cheaper
commodity machines. On the storage front, flash-based solid state
devices (SSDs) are becoming smaller in size, cheaper in price, and larger in
capacity. Exotic technologies like phase-change memory are on the
near-term horizon and can be game-changers in the way data is stored
and processed.
In spite of these trends, these technologies currently see limited
use in the data management domain. Naive exploitation of
multi-core processors or SSDs often leads to unbalanced systems. It
is, therefore, important to evaluate applications in a holistic manner
to ensure effective utilization of CPU and memory
resources. This workshop aims to understand the impact of modern
hardware technologies on accelerating core components of data
management workloads. Specifically, the workshop hopes to explore
the interplay between overall system design, core algorithms, query optimization
strategies, programming approaches, and performance modelling and
evaluation, from the perspective of data management applications.
The suggested topics of
interest include, but are not restricted to:
- Hardware and System Issues in Domain-specific Accelerators
- New Programming Methodologies for Data Management Problems on Modern Hardware
- Query Processing for Hybrid Architectures
- Large-scale I/O-intensive (Big Data) Applications
- Parallelizing/Accelerating Machine Learning/Deep Learning Workloads
- Autonomic Tuning for Data Management Workloads on Hybrid Architectures
- Algorithms for Accelerating Multi-modal Multi-tiered Systems
- Energy Efficient Software-Hardware Co-design for Data Management Workloads
- Parallelizing Non-traditional (e.g., Graph Mining) Workloads
- Algorithms and Performance Models for modern Storage Sub-systems
- Exploitation of specialized ASICs
- Novel Applications of Low-Power Processors and FPGAs
- Exploitation of Transactional Memory for Database Workloads
- Exploitation of Active Technologies (e.g., Active Memory, Active
Storage, and Networking)
- New Benchmarking Methodologies for Accelerated Workloads
- Applications of HPC Techniques for Data Management Workloads
- Acceleration in Cloud Environments
- Accelerating Data Science/Machine Learning Workloads
- Exploratory Topics such as DNA Storage and Quantum Technologies
- Jia Shi, Oracle
Jia Shi is Vice President of Exadata Development at Oracle. She leads a team of software developers who create the
features that enable Exadata to be the fastest, most scalable, and most highly available platform for running all types of Oracle Database
workloads. Her team has built many marquee features, such as Smart
Flash Cache, Smart Flash Log, and Persistent Memory Data and Commit
Accelerators. She holds MS and BS degrees in Computer Science from Stanford University.
Under the Hood of an Exadata Transaction - How did we harness the power of Persistent Memory?
Persistent memory is a new silicon technology, adding a distinct storage tier of performance, capacity,
and price between DRAM and flash. Persistent memory sits physically on the memory bus of the storage server,
resulting in reads at memory speed, much faster than flash. Unlike DRAM, writes are persistent and survive power cycles.
Oracle has engineered the Exadata Smart PMEM Cache and Exadata Smart PMEM Log capabilities with Intel Optane Persistent Memory
to achieve a significant boost in Oracle Database OLTP performance. Jia Shi, VP of Exadata Development, will describe
the engineering details of their implementation in Exadata and how
Oracle's software development teams are innovating with this
technology.
- Nikolay Sakharnykh, Nvidia
Nikolay Sakharnykh is a senior AI developer technology manager at
NVIDIA. He leads a team working on accelerating data analytics and
machine learning applications on GPUs. He is interested in novel
memory management techniques.
Fast Data Compression on GPU and DPU
Database applications manage large amounts of data. Often the
interconnect between the processors and storage systems is the main
bottleneck. Lossless data compression is one of the most common
approaches to alleviate the data transfer bottleneck. However, many
compression methods were designed decades ago and cannot be easily
parallelized on modern massively parallel architectures like GPUs. In
this keynote we review both well-known and novel compression
techniques. We present efficient parallelization strategies and take a
deep dive into performance analysis on modern GPUs. We will also introduce
DPU compression engines, cover example usage scenarios, and compare
GPU- and DPU-based compression solutions.
- Michael Gschwind, Facebook
Michael Gschwind leads Accelerated Content Understanding at
Facebook AI. He has previously held leadership positions at Huawei
and IBM, leading hardware and software development of AI and
general-purpose systems. He is a founding member of the MLPerf
benchmarking consortium, an IEEE Fellow, and the author of over 100
technical papers, and he holds over 800 patents in the field.
Inference @ Scale: Accelerating AI models for over a billion users
AI models are at the foundation of Facebook communities,
identifying relevant and interesting content our users delight in
interacting with, translating content to transcend barriers, and
keeping our communities safe by identifying inappropriate content,
such as bullying, domestic violence, and terrorism in images, videos,
and text. As we look for ever higher-quality, larger-scale
models to deliver on our mission to connect users and build safe
communities, AI accelerators provide the foundation for scaling up
quality while keeping power consumption manageable
and delivering on our sustainability commitments. Open benchmarks such
as MLPerf play an important role in this new application space to help
foster innovation by creating a level playing field and common
reference point for novel solutions.
Session 1 (1.30-3.30pm CET, 7.30-9.30am EST, 4.30-6.30am PST)
- Delayed Parity Update for Bridging the Gap between Replication and Erasure Coding in Server-based Storage, Takayuki Fukatani, Hieu Hanh Le and Haruo Yokota, Tokyo Institute of Technology (Paper, Presentation)
- Extending In-Memory OLTP with Persistent Memory, Hillel Avni, Nir Pachter, Aharon Avitzur and Vladi Vexler, Huawei (Paper, Presentation)
- One Buffer Manager to Rule Them All: Using Distributed Memory with Cache Coherence over RDMA, Magdalena Pröbstl, Philipp Fent, Maximilian Schüle, Moritz Sichert, Thomas Neumann and Alfons Kemper, Technical University of Munich (Paper, Presentation)
- Evaluating Lightweight Integer Compression Algorithms in Column-Oriented In-Memory DBMS, Linus Heinzl, Ben Hurdelhey, Martin Boissier, Michael Perscheid and Hasso Plattner, Hasso Plattner Institute (Paper, Presentation)
Break (3.30-4pm CET, 9.30-10am EST, 6.30-7am PST)
Session 2 (4-5.30pm CET, 10-11.30am EST, 7-8.30am PST)
- OneJoin: Cross-Architecture, Scalable Edit Similarity Join for DNA Data Storage Using oneAPI, Eugenio Marinelli and Raja Appuswamy, EURECOM (Paper, Presentation)
- (Keynote 1): Fast Data Compression on GPU and DPU, Nikolay Sakharnykh, Nvidia (Presentation)
Break (5.30-6pm CET, 11.30am-12pm EST, 8.30-9am PST)
Session 3 (6-7.30pm CET, 12-1.30pm EST, 9-10.30am PST)
- Highlighting the Performance Diversity of Analytical Queries using VOILA, Tim Gubner and Peter Boncz, CWI (Paper, Presentation)
- (Keynote 2): Under the Hood of an Exadata Transaction - How did we harness the power of Persistent Memory?, Jia Shi, Oracle (Presentation)
Break (7.30-8pm CET, 1.30-2pm EST, 10.30-11am PST)
Session 4 (8-9.30pm CET, 2-3.30pm EST, 11am-12.30pm PST)
- Scaling Joins to a Thousand GPUs, Hao Gao and Nikolay Sakharnykh, Nvidia (Paper, Presentation)
- (Keynote 3): Inference @ Scale: Accelerating AI models for over a billion users, Michael Gschwind, Facebook (Presentation)
Workshop Co-Chairs
For questions regarding the workshop, please send email to contact@adms-conf.org.
Program Committee
- Bulent Abali, IBM Research
- Meena Arunachalam, Intel
- Spyros Blanas, The Ohio State University
- Periklis Chrysogelos, EPFL
- Yoav Etsion, Technion - Israel Institute of Technology
- Michael Gschwind, Facebook
- Niclas Hedam, ITU Denmark
- Nuwen Jayasena, AMD Research
- Anuva Kulkarni, Motional
- Qiong Luo, HKUST
- Danica Porobic, Oracle
- Suprio Ray, University of New Brunswick
- Eva Sitaridi, AWS
- Nikolay Sakharnykh, Nvidia
- Berni Schiefer, Snowflake
- Sayantan Sur, Mellanox/Nvidia
- Paper Submission: Monday, 28 June, 2021, 9 am EST
- Notification of Acceptance: Friday, 16 July, 2021
- Camera-ready Submission: Friday, 24 July, 2021
- Workshop Date: Monday, 16 August, 2021
Submission Site
All submissions will be handled electronically via EasyChair.
Formatting Guidelines
We will use the same document templates as the VLDB21 conference. You can find them here.
It is the authors' responsibility to ensure that their submissions adhere strictly to the VLDB format detailed here. In particular, it is not allowed to modify the format with the objective of squeezing in more material. Submissions that do not comply with the formatting detailed here will be rejected without review.
As per the VLDB submission guidelines, the paper length for a full paper is limited to 12 pages, excluding bibliography. However, shorter papers (at least 6 pages of content) are encouraged as well.