The objective of this one-day workshop is to investigate opportunities in
accelerating analytics workloads and data management systems. Over the years, the scope of database
analytics has changed substantially, expanding from traditional OLAP, data warehousing, and ETL to HTAP,
streaming/real-time processing, and edge/IoT, and most recently to machine learning and deep learning workloads
such as generative AI and vector/semantic databases. The increasing use of Large Language Models (LLMs)
as a source of knowledge extraction for various end uses (e.g., in an AI assistant or agentic system)
creates new opportunities for database systems. At the same time, hardware and software capabilities have
seen tremendous improvements. The workshop aims to explore how database analytics can be accelerated
using modern processors (e.g., commodity and specialized multi-core and many-core processors, chiplets, GPUs, and
FPGAs), processing systems (e.g., hybrid, massively distributed clusters and cloud-based distributed
computing infrastructure), networking infrastructures (e.g., RDMA over InfiniBand), memory and storage
systems (e.g., storage-class memories like SSDs, active memories, NVRAM, and phase-change memory),
multi-core and distributed programming paradigms like CUDA, MPI/OpenMP, and
MapReduce/Spark, and integration with data-science/deep-learning frameworks such as scikit-learn,
TensorFlow, or PyTorch. Exploratory topics such as DNA-based storage or quantum algorithms are also
within the purview of the ADMS workshop. The intent of the ADMS workshop is to bring together people
from diverse fields such as computer architecture, high-performance computing, systems, and
programming languages to address key functionality and scalability problems in data management.
The current data management scenario is characterized by the following trends: traditional OLTP and
OLAP/data warehousing systems are being used for increasingly complex workloads (e.g., integration of various AI technologies, petabytes of data,
complex queries under real-time constraints, etc.); applications are becoming far more distributed, often
consisting of different data processing components; non-traditional domains such as bio-informatics, social
networking, mobile computing, sensor applications, and gaming are generating growing quantities of data of
different types; economic and energy constraints are leading to greater consolidation and virtualization
of resources; and analyzing vast quantities of complex data is becoming more important than traditional
transactional processing.
At the same time, there have been tremendous improvements in the CPU
and memory technologies. Newer processors offer greater compute and memory
capabilities, are more power-efficient, and are optimized for multiple application
domains. Commodity systems increasingly use
multi-core processors with more than 6 cores per chip, and
enterprise-class systems use processors with at least 32 cores per
chip. Specialized multi-core processors such as GPUs have brought the computational capabilities of supercomputers to cheaper
commodity machines. On the storage front, flash-based solid-state
devices (SSDs) are becoming smaller in size, cheaper in price, and larger in
capacity. Emerging technologies like phase-change memory are on the
near-term horizon and can be game-changers in the way data is stored
and processed.
Despite these trends, there is currently limited use of
these technologies in the data management domain. Naive exploitation of
multi-core processors or SSDs often leads to unbalanced systems. It
is, therefore, important to evaluate applications in a holistic manner
to ensure effective utilization of CPU and memory
resources. This workshop aims to understand the impact of modern
hardware technologies on accelerating core components of data
management workloads. Specifically, the workshop hopes to explore
the interplay between overall system design, core algorithms, query optimization
strategies, programming approaches, performance modelling and
evaluation, etc., from the perspective of data management applications.
The suggested topics of
interest include, but are not restricted to:
- Hardware and System Issues in Domain-specific Accelerators
- New Programming Methodologies for Data Management Problems on Modern Hardware
- Query Processing for Hybrid Architectures
- Large-scale I/O-intensive (Big Data) Applications
- Parallelizing/Accelerating Machine Learning/Deep Learning Workloads
- Accelerating training, inference, and storage of Large Language Models for Generative AI
- Autonomic Tuning for Data Management Workloads on Hybrid Architectures
- Algorithms for Accelerating Multi-modal Multi-tiered Systems
- Applications of GPUs and other data-parallel accelerators
- Energy Efficient Software-Hardware Co-design for Data Management Workloads
- Parallelizing non-traditional (e.g., graph mining) workloads
- Algorithms and Performance Models for modern Storage Sub-systems
- Exploitation of specialized ASICs
- Novel Applications of Low-Power Processors and FPGAs
- Exploitation of Transactional Memory for Database Workloads
- Exploitation of Active Technologies (e.g., Active Memory, Active
Storage, and Networking)
- New Benchmarking Methodologies for Accelerated Workloads
- Applications of HPC Techniques for Data Management Workloads
- Acceleration in the Cloud Environments
- Accelerating Data Science/Machine Learning Workloads
- Exploratory topics such as Generative AI, DNA-based Storage, and Quantum Technologies
Keynote 1
Title: Using what we know about our data: compact metadata and amortisation to exploit locality and sparsity
Speaker: Prof. Paul H J Kelly, Imperial College, London
Abstract: Parallelism is basically easy. Especially when the algorithms are dumb. But when we get smarter, and use more complicated data structures to support strategies that avoid redundant computation, things get messier. We need metadata that maps where we can find the non-zeroes, and where the repeated values are. This makes locality, parallelization and scheduling dependent on the actual data. This talk is about processing the metadata at runtime to determine how the computation itself should be done. The talk isn't about new research results; instead we map the problem space and some of the techniques, drawing on our diverse experience.
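To make the notion of compact metadata concrete, the sketch below (an illustration written for this overview, not material from the talk) shows a CSR sparse matrix-vector product in Python/NumPy: the row-pointer and column-index arrays are exactly the kind of metadata the abstract alludes to, and the kernel consults them at runtime so that it only touches the non-zero entries.

```python
# Minimal illustrative sketch (not from the talk): CSR metadata -- row pointers
# and column indices -- records where the non-zeroes are, and the kernel consults
# it at runtime so that only those entries are touched. Locality and
# parallelization therefore depend on the data itself.
import numpy as np

def csr_spmv(indptr, indices, data, x):
    """Compute y = A @ x for a CSR matrix A given by (indptr, indices, data)."""
    y = np.zeros(len(indptr) - 1, dtype=data.dtype)
    for row in range(len(y)):                      # rows are independent, hence parallelizable
        start, end = indptr[row], indptr[row + 1]  # metadata: this row's run of non-zeroes
        y[row] = np.dot(data[start:end], x[indices[start:end]])
    return y

# A 3x4 matrix with 5 non-zeroes, stored compactly:
indptr  = np.array([0, 2, 3, 5])
indices = np.array([0, 3, 1, 0, 2])
data    = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
x       = np.ones(4)
print(csr_spmv(indptr, indices, data, x))  # -> [30. 30. 90.]
```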
Keynote 2
Title: On the role of storage and hardware acceleration in modern data management systems
Speakers: Vincent Hsu, IBM Storage, and Haris Pozidis, IBM Research
Abstract: We are living in the era of data. Data is being generated and stored at unprecedented scales, estimated at 150 zettabytes in 2024, more than 90% of which is unstructured. This data can only be useful if it is used to generate insights, drive decisions and improve business processes. While this has been the case with structured data, using relational data management systems, such capabilities have been largely absent for unstructured data. With generative AI (GenAI) it is possible to process unstructured data, extract and encode (embed) semantic nibbles of information and organize these nibbles in so-called vector management systems (vector DBs), similarly to traditional relational databases. Analogous to SQL queries, vector DBs can be searched with natural language queries, using vector similarity search technology.
Data retrieval is a critical piece of information extraction and the driving mechanism of knowledge generation. By combining the efficiency and determinism of SQL search in relational DBs with the expressiveness of similarity search in vector DBs, one obtains a powerful combination, able to extract the most hidden of data insights. This is the first pillar of modern data management systems.
Another key trend in data management technology is the convergence of storage and data processing systems. Storage is where data lives, where access controls are maintained, and where security can be easily enforced. It is also the most natural, cost-effective, and energy-efficient place to execute the entire data transformation pipeline. Hybrid relational and vector DBs can be embedded in the storage system for higher data security, data freshness, and cost control. We argue that this is the second pillar of modern data management systems.
HW acceleration, offered by GPUs or other specialized accelerators, has been the enabler of the GenAI era, offering unprecedented processing capacity. In addition to its application in foundation model training and inference, HW acceleration has a pivotal role to play in data processing pipelines and hybrid search systems. GPUs and other accelerators can offer dramatic improvements in energy/performance efficiency and enable levels of scalability that are beyond the reach of traditional CPU-based computing. HW acceleration is the third pillar in the evolution of data management systems.
In this talk we will be discussing the above three main pillars of modern data management systems, as they are evolving to cater to the needs of GenAI and agentic applications and workflows.
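As a minimal illustration of the hybrid-search idea discussed above (a toy sketch with invented data and field names, not a description of any production system), the snippet below combines a relational-style predicate with cosine-similarity ranking over embeddings in Python/NumPy.

```python
# Toy sketch of hybrid search: filter on structured metadata (as in SQL),
# then rank the survivors by vector similarity. Data and fields are invented.
import numpy as np

docs = [
    {"id": 1, "year": 2024, "emb": np.array([0.9, 0.1, 0.0])},
    {"id": 2, "year": 2023, "emb": np.array([0.1, 0.9, 0.0])},
    {"id": 3, "year": 2024, "emb": np.array([0.7, 0.3, 0.1])},
]

def hybrid_search(query_emb, year, k=2):
    """Relational-style filter on 'year', then top-k by cosine similarity."""
    candidates = [d for d in docs if d["year"] == year]        # structured predicate
    q = query_emb / np.linalg.norm(query_emb)
    scored = [(float(d["emb"] @ q) / np.linalg.norm(d["emb"]), d["id"])
              for d in candidates]                              # cosine similarity
    return sorted(scored, reverse=True)[:k]                     # top-k ranking

print(hybrid_search(np.array([1.0, 0.0, 0.0]), year=2024))
# doc 1 ranks above doc 3; doc 2 is excluded by the metadata filter
```

A real system would replace this linear scan with an approximate nearest-neighbor index and push both the predicate and the similarity search down to where the data is stored, which is precisely the convergence the abstract argues for.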
Workshop Program
- (8.35-9 am) High Throughput GPU-Accelerated FSST String Compression Tim Anema, Delft University of Technology, Joost Hoozemans, Voltron Data, Zaid Al-Ars, Delft University of Technology, and H. Peter Hofstee, IBM
- (9-9.25 am) GPU-Accelerated Stochastic Gradient Descent for Scalable Operator Placement in Geo-Distributed Streaming Systems Tristan Joel Terhaag, Technische Universität Berlin, Xenofon Chatziliadis, Technische Universität Berlin, Eleni Tzirita Zacharatou, Hasso Plattner Institute, University of Potsdam, and Volker Markl, Technische Universität Berlin
- (9.30-9.55 am) A Hot Take on the Intel Analytics Accelerator for Database Management Systems Christos Laspias, Andrew Pavlo and Jignesh Patel, Carnegie Mellon University
- (10-10.30 am) Morning Break
- (10.30-10.55 am) A Data Aggregation Visualization System supported by Processing-in-Memory Junyoung Kim, Madhulika Balakumar and Kenneth Ross, Columbia University
- (11 am-12 pm) Keynote 1: Using what we know about our data: compact metadata and amortisation to exploit locality and sparsity Prof. Paul H J Kelly, Imperial College, London
- (12-1.30 pm) Lunch Break
- (1.30-1.55 pm) Demystifying CXL Memory Bandwidth Expansion for Analytical Workloads Georgiy Lebedev, Hamish Nicholson, Musa Ünal, Sanidhya Kashyap and Anastasia Ailamaki, EPFL
- (2-3 pm) Keynote 2: On the role of storage and hardware acceleration in modern data management systems Vincent Hsu, IBM Storage, and Haris Pozidis, IBM Research
- (3-3.30 pm) Afternoon Break
- (3.30-3.55 pm) CXL-Bench: Benchmarking Shared CXL Memory Access Marcel Weisgut, Hasso Plattner Institute, University of Potsdam, Daniel Ritter, SAP, Florian Schmeller, Hasso Plattner Institute, University of Potsdam, Pınar Tözün, IT University of Copenhagen, and Tilmann Rabl, Hasso Plattner Institute, University of Potsdam
- (4-4.25 pm) RISC-V Meets RDBMS: An Experimental Study of Database Performance on an Open Instruction Set Architecture Yizhe Zhang, Zhengyi Yang, Bocheng Han, University of New South Wales, Haoran Ning, Macquarie University, Xin Cao, John Shepherd, University of New South Wales, and Guanfeng Liu, Macquarie University
- (4.30-4.55 pm) Micro-architectural Exploration of the Relational Memory Engine (RME) in RISC-V and FireSim Cole Strickler, University of Kansas, Ju Hyoung Mun, Brandeis University, Connor Sullivan, University of Kansas, Denis Hoornaert, Technical University of Munich, Renato Mancuso, Manos Athanassoulis, Boston University, and Heechul Yun, University of Kansas
Workshop Co-Chairs
For questions regarding the workshop, please send email to contact@adms-conf.org.
Program Committee
- Francesco Fusco, IBM Research, Zurich
- Wentao Huang, National University of Singapore
- Julia Spindler, TUM
- Selim Tekin, Georgia Tech
- Hubert Mohr-Daurat, Imperial College, London
- Rathijit Sen, Microsoft
- Hong Min, IBM T. J. Watson Research Center
- Viktor Sanca, Oracle
- Subhadeep Sarkar, Brandeis University
Important Dates
- Paper Submission: Monday, 26 May, 2025, 9 am EST
- Notification of Acceptance: Friday, 20 June, 2025
- Camera-ready Submission: Friday, 18 July, 2025
- Workshop Date: Monday, 1 September, 2025
Submission Site
All submissions will be handled electronically via EasyChair.
Publication and Formatting Guidelines
The ADMS'25 proceedings will be published as a part of the official VLDB Workshop Proceedings and indexed via DBLP.
We will use the same document templates as the VLDB conference. You can find them here.
It is the authors' responsibility to ensure that
their submissions adhere
strictly to the VLDB format detailed here. In particular, it is not allowed to modify the format with the objective of squeezing in more material. Submissions that do not comply with the formatting detailed here will be rejected without review.
As per the VLDB submission guidelines, the paper length for a full paper is
limited to 12 pages, excluding
bibliography. However, shorter
papers (at least 6 pages of content) are encouraged as
well.