Skip to content
Paper Copilotâ„¢, originally my personal project, is now open to the public. I deeply appreciate your feedback and support.
twitter x github-circle reddit

Paper Copilot Paper Copilotâ„¢ Research Toolbox
  • Statistics
    • AI/ML
      • AAAI
        • 2025
      • ICLR
        • 2025
        • 2024
        • 2023
        • 2022
        • 2021
        • 2020
        • 2019
        • 2018
        • 2017
        • 2013
        • 2014
      • ICML
        • 2024
        • 2023
      • NeurIPS
        • 2024
          • Main Conference
          • Datasets & Benchmarks
          • Creative AI
          • High School Projects
        • 2023
          • Main Conference
          • Datasets & Benchmarks
        • 2022
          • Main Conference
          • Datasets & Benchmarks
        • 2021
          • Main Conference
          • Datasets & Benchmarks
      • UAI
        • 2024
    • Data Mining
      • KDD
        • 2024
          • Research Track
          • Applied Data Science Track
        • 2025
          • Research Track
          • Applied Data Science Track
    • Graphics
      • SIGGRAPH
      • SIGGRAPH Asia
    • Multimedia
      • ACMMM
        • 2024
    • NLP
      • ACL
        • 2024
      • COLM
        • 2024
      • EMNLP
        • 2024
        • 2023
    • Robotics
      • CoRL
        • 2024
        • 2023
        • 2022
        • 2021
      • ICRA
        • 2025
      • IROS
        • 2025
      • RSS
        • 2025
    • Vision
      • 3DV
        • 2025
      • CVPR
        • 2025
      • ECCV
        • 2024
      • ICCV
      • WACV
        • 2025
  • Accepted Papers
    • AI/ML
      • ICLR
        • 2025
        • 2024
        • 2023
        • 2022
        • 2021
        • 2020
        • 2019
        • 2018
        • 2017
        • 2014
        • 2013
      • NeurIPS
        • 2024
          • Main Conference
          • Dataset & Benchmark
        • 2023
          • Main Conference
          • Dataset & Benchmark
        • 2022
          • Main Conference
          • Dataset & Benchmark
        • 2021
          • Main Conference
          • Dataset & Benchmark
      • ICML
        • 2024
        • 2023
    • Graphics
      • SIGGRAPH
        • 2024
        • 2023
        • 2022
        • 2021
        • 2020
        • 2019
      • SIGGRAPH Asia
        • 2023
        • 2022
        • 2021
        • 2020
        • 2019
        • 2018
    • Vision
      • CVPR
        • 2024
        • 2023
        • 2022
        • 2021
        • 2020
        • 2019
        • 2018
        • 2017
        • 2016
        • 2015
        • 2014
        • 2013
      • ICCV
        • 2023
        • 2021
        • 2019
        • 2017
        • 2015
        • 2013
      • ECCV
        • 2024
        • 2022
        • 2020
        • 2018
      • WACV
        • 2024
        • 2023
        • 2022
        • 2021
        • 2020
  • Countdown
  • Map
    • 3D
    • 2D
  • Contact Us
    • About Us
    • Acknowledgment
    • Report Issues
  • twitter x github-circle reddit

Tag: SIGMOD

Home » SIGMOD

Balsa: Learning a Query Optimizer Without Expert Demonstrations

Query Optimization

Zongheng Yang, Wei-Lin Chiang, Sifei Luan, Gautam Mittal, Michael Luo, Ion Stoica

UC Berkeley

Portals
  • pdf
  • balsa
  • arXiv
  • ACM
Abstract

Query optimizers are a performance-critical component in every database system. Due to their complexity, optimizers take experts months to write and years to refine. In this work, we demonstrate for the first time that learning to optimize queries without learning from an expert optimizer is both possible and efficient. We present Balsa, a query optimizer built by deep reinforcement learning. Balsa first learns basic knowledge from a simple, environment-agnostic simulator, followed by safe learning in real execution. On the Join Order Benchmark, Balsa matches the performance of two expert query optimizers, both open-source and commercial, with two hours of learning, and outperforms them by up to 2.8$\times$ in workload runtime after a few more hours. Balsa thus opens the possibility of automatically learning to optimize in future compute environments where expert-designed optimizers do not exist.

2022 SIGMOD

Selectivity Functions of Range Queries are Learnable

Query Models for Cardinality Estimation

Xiao Hu, Yuxi Liu, Haibo Xiu, Pankaj K Agarwal, Debmalya Panigrahi, Sudeepa Roy, Jun Yang

Duke University

Portals
  • pdf
  • Project
  • ACM
Abstract

This paper explores the use of machine learning for estimating the selectivity of range queries in database systems. Using classic learning theory for real-valued functions based on shattering dimension, we show that the selectivity function of a range space with bounded VC-dimension is learnable. As many popular classes of queries (e.g., orthogonal range search, inequalities involving linear combination of attributes, distance-based search, etc.) represent range spaces with finite VC-dimension, our result immediately implies that their selectivity functions are also learnable. To the best of our knowledge, this is the first attempt at formally explaining the role of machine learning techniques in selectivity estimation, and complements the growing literature in empirical studies in this direction. Supplementing these theoretical results, our experimental results demonstrate that, empirically, even a basic learning algorithm with generic models is able to produce accurate predictions across settings, matching state-of-art methods designed for specific queries, and using training sample sizes commensurate with our theory.

2022 SIGMOD

Estimating Cardinalities with Deep Sketches

Query Models for Cardinality Estimation

Andreas Kipf, Dimitri Vorona, Jonas M

Technical University of Munich; University of Amsterdam; Centrum Wiskunde & Informatica

Portals
  • pdf
  • learnedcardinal...
  • arXiv
  • ACM
Abstract

We introduce Deep Sketches, which are compact models of databases that allow us to estimate the result sizes of SQL queries. Deep Sketches are powered by a new deep learning approach to cardinality estimation that can capture correlations between columns, even across tables. Our demonstration allows users to define such sketches on the TPC-H and IMDb datasets, monitor the training process, and run ad-hoc queries against trained sketches. We also estimate query cardinalities with HyPer and PostgreSQL to visualize the gains over traditional cardinality estimators.

2019 SIGMOD

Monotonic Cardinality Estimation of Similarity Selection: A Deep Learning Approach

Data Models for Cardinality Estimation

Yaoshu Wang, Chuan Xiao, Jianbin Qin, Xin Cao, Yifang Sun, Wei Wang, Makoto Onizuka

Shenzhen University; Osaka University; Nagoya University; The University of NewSouth Wales

Portals
  • pdf
  • arXiv
  • ACM
Abstract

Due to the outstanding capability of capturing underlying data distributions, deep learning techniques have been recently utilized for a series of traditional database problems. In this paper, we investigate the possibilities of utilizing deep learning for cardinality estimation of similarity selection. Answering this problem accurately and efficiently is essential to many data management applications, especially for query optimization. Moreover, in some applications the estimated cardinality is supposed to be consistent and interpretable. Hence a monotonic estimation w.r.t. the query threshold is preferred. We propose a novel and generic method that can be applied to any data type and distance function. Our method consists of a feature extraction model and a regression model. The feature extraction model transforms original data and threshold to a Hamming space, in which a deep learning-based regression model is utilized to exploit the incremental property of cardinality w.r.t. the threshold for both accuracy and monotonicity. We develop a training strategy tailored to our model as well as techniques for fast estimation. We also discuss how to handle updates. We demonstrate the accuracy and the efficiency of our method through experiments, and show how it improves the performance of a query optimizer.

2020 SIGMOD

The Case for a Learned Sorting Algorithm

Database Sorting

Ani Kristo, Kapil Vaidya, Ugur

Brown University; MIT; Intel Labs

Portals
  • pdf
  • Project
  • ACM
Abstract

Sorting is one of the most fundamental algorithms in Computer Science and a common operation in databases not just for sorting query results but also as part of joins (i.e., sort-merge-join) or indexing. In this work, we introduce a new type of distribution sort that leverages a learned model of the empirical CDF of the data. Our algorithm uses a model to efficiently get an approximation of the scaled empirical CDF for each record key and map it to the corresponding position in the output array. We then apply a deterministic sorting algorithm that works well on nearly-sorted arrays (e.g., Insertion Sort) to establish a totally sorted order. We compared this algorithm against common sorting approaches and measured its performance for up to 1 billion normally-distributed double-precision keys. The results show that our approach yields an average 3.38x performance improvement over C++ STL sort, which is an optimized Quicksort hybrid, 1.49x improvement over sequential Radix Sort, and 5.54x improvement over a C++ implementation of Timsort, which is the default sorting function for Java and Python.

2020 SIGMOD

ALEX: An Updatable Adaptive Learned Index

Updatable index

Jialin Ding, Umar Farooq Minhas, Jia Yu, Chi Wang, Jaeyoung Do, Yinan Li, Hantian Zhang, Badrish Chandramouli, Johannes Gehrke, Donald Kossmann, David Lomet, Tim Kraska

Massachusetts Institute of Technology; Microsoft Research; Arizona State University; Georgia Institute of Technology

Portals
  • pdf
  • YouTube
  • Project
  • ALEX
  • arXiv
  • ACM
Abstract

Recent work on "learned indexes" has changed the way we look at the decades-old field of DBMS indexing. The key idea is that indexes can be thought of as "models" that predict the position of a key in a dataset. Indexes can, thus, be learned. The original work by Kraska et al. shows that a learned index beats a B+Tree by a factor of up to three in search time and by an order of magnitude in memory footprint. However, it is limited to static, read-only workloads. In this paper, we present a new learned index called ALEX which addresses practical issues that arise when implementing learned indexes for workloads that contain a mix of point lookups, short range queries, inserts, updates, and deletes. ALEX effectively combines the core insights from learned indexes with proven storage and indexing techniques to achieve high performance and low memory footprint. On read-only workloads, ALEX beats the learned index from Kraska et al. by up to 2.2X on performance with up to 15X smaller index size. Across the spectrum of read-write workloads, ALEX beats B+Trees by up to 4.1X while never performing worse, with up to 2000X smaller index size. We believe ALEX presents a key step towards making learned indexes practical for a broader class of database workloads with dynamic updates.

2020 SIGMOD

Flood: Learning Multi-dimensional Indexes

Multi-dimensional Index
Error: Cannot create object