A list of papers on adopting machine learning techniques for data management tasks, mainly those published in top data management venues.
DB4ML - An In-Memory Database Kernel with Machine Learning Support. SIGMOD 2020, 159-173. Paper
Optimizing Machine Learning Workloads in Collaborative Environments. SIGMOD 2020, 1701-1716. Paper
Dynamic Parameter Allocation in Parameter Servers. PVLDB 13(11), 1877-1890, 2020. Paper
Crossbow: Scaling Deep Learning with Small Batch Sizes on Multi-GPU Servers. PVLDB 12(11), 1399-1413, 2019. Paper
PS2: Parameter Server on Spark. SIGMOD 2019: 376-388. Paper
MLlib*: Fast Training of GLMs Using Spark MLlib. ICDE 2019: 1778-1789. Paper
On Optimizing Operator Fusion Plans for Large-Scale Machine Learning in SystemML. PVLDB 11(12): 1755-1768, 2018. Paper
FlexPS: Flexible Parallelism Control in Parameter Server Architecture. PVLDB 11(5): 566-579, 2018. Paper
A Cost-based Optimizer for Gradient Descent Optimization. SIGMOD 2017, 977-992. Paper
SPOOF: Sum-Product Optimization and Operator Fusion for Large-Scale Machine Learning. CIDR 2017. Paper
SystemML: Declarative Machine Learning on Spark. PVLDB 9(13): 1425-1436, 2016. Paper Project
MLbase: A Distributed Machine-learning System. CIDR 2013. Paper Project
An Intermediate Representation for Optimizing Machine Learning Pipelines. PVLDB 12(11), 1553-1567, 2019. Paper
Democratizing Data Science through Interactive Curation of ML Pipelines. SIGMOD 2019, 1171-1188. Paper
Helix: Holistic Optimization for Accelerating Iterative Machine Learning. PVLDB 12(4), 446-460, 2018. Paper
KeystoneML: Optimizing pipelines for large-scale advanced analytics. ICDE 2017: 535-546. Paper Project
Tuple-oriented Compression for Large-scale Mini-batch Stochastic Gradient Descent. SIGMOD 2019, 1517-1534. Paper
SketchML: Accelerating Distributed Machine Learning with Data Sketches. SIGMOD 2018, 1269-1284. Paper
Compressed Linear Algebra for Large-Scale Machine Learning. PVLDB 9(12): 960-971, 2016. Paper
SPORES: Sum-Product Optimization via Relational Equality Saturation for Large Scale Linear Algebra. PVLDB 13(11), 1919-1932, 2020. Paper
Enabling and Optimizing Non-linear Feature Interactions in Linear Algebra Over Normalized Data. SIGMOD 2019, 1571-1588. Paper
Accelerating Generalized Linear Models with MLWeaving: A One-Size-Fits-All System for Any-precision Learning. PVLDB 12(7): 807-821, 2019. Paper
A Comparative Evaluation of Systems for Scalable Linear Algebra-based Analytics. PVLDB 11(13): 2168-2182, 2018. Paper Project
Towards Linear Algebra over Normalized Data. PVLDB 10(11): 1214-1225, 2017. Paper
Scalable Linear Algebra on a Relational Database System. ICDE 2017: 523-534. Paper
Learning Generalized Linear Models Over Normalized Data. SIGMOD 2015: 1969-1984. Paper
Vertica-ML: Distributed Machine Learning in Vertica Database. SIGMOD 2020, 755-768. Paper
Declarative Recursive Computation on an RDBMS. PVLDB 12(7): 822-835, 2019. Paper
In-Database Learning with Sparse Tensors. PODS 2018: 325-340. Paper
ColumnML: Column-Store Machine Learning with On-The-Fly Data Transformation. PVLDB 12(4), 348-361, 2018. Paper
The BUDS Language for Distributed Bayesian Machine Learning. SIGMOD 2017, 961-976. Paper
Are Key-Foreign Key Joins Safe to Avoid when Learning High-Capacity Classifiers? PVLDB 11(3), 366-379, 2017. Paper
Learning Linear Regression Models over Factorized Joins. SIGMOD 2016, 3-18. Paper
The MADlib Analytics Library or MAD Skills, the SQL. PVLDB 5(12): 1700-1711, 2012. Paper
Sketching Linear Classifiers over Data Streams. SIGMOD 2018: 757-772. Paper Code
DimBoost: Boosting Gradient Boosting Decision Tree to Higher Dimensions. SIGMOD 2018, 1363-1376. Paper
Scalable Training of Hierarchical Topic Models. PVLDB 11(7), 826-839, 2018. Paper
LDA*: A Robust and Large-scale Topic Modeling System. PVLDB 10(11), 1406-1417, 2017. Paper
Scalable Kernel Density Classification via Threshold-Based Pruning. SIGMOD 2017, 945-959. Paper
Heterogeneity-aware Distributed Parameter Servers. SIGMOD 2017: 463-478. Paper
WarpLDA: a Cache Efficient O(1) Algorithm for Latent Dirichlet Allocation. PVLDB 9(10): 744-755, 2016. Paper
Exploiting Matrix Dependency for Efficient Distributed Matrix Computation. SIGMOD 2015: 93-105. Paper