Index Selection Evaluation

This repository contains the source code for the evaluation platform presented in the paper Magic mirror in my hand, which is the best in the land? An Experimental Evaluation of Index Selection Algorithms. As part of this paper, we re-implemented 8 index selection algorithms (references listed below):

The drop heuristic [1]
An algorithm similar to the initial AutoAdmin algorithm [2]
An algorithm loosely following the DB2 advisor index selection [3]
The Relaxation algorithm [4]
CoPhy's approach [5]
Dexter [6]
The Extend algorithm [7]
An algorithm loosely following SQLServer's DTA Anytime index selection [8]

The implementations of the algorithms can be found under selection/algorithms. Documentation, also regarding the parameters of the algorithms, is part of the source files.

While some of the chosen algorithms are related to tools employed in commercial DBMS products, the re-implemented algorithms do not fully reflect the behavior and performance of the original tools, which may be continuously enhanced and optimized.

Citing the project

If you use (parts of) our implementation for research on index selection, please consider citing our paper:

@article{DBLP:journals/pvldb/KossmannHJS20,
  author    = {Jan Kossmann and
               Stefan Halfpap and
               Marcel Jankrift and
               Rainer Schlosser},
  title     = {Magic mirror in my hand, which is the best in the land? An Experimental
               Evaluation of Index Selection Algorithms},
  journal   = {Proc. {VLDB} Endow.},
  volume    = {13},
  number    = {11},
  pages     = {2382--2395},
  year      = {2020},
  url       = {http://www.vldb.org/pvldb/vol13/p2382-kossmann.pdf}
}

Cost Estimation

The evaluation platform allows utilizing cost estimations that are either based on hypothetical (what-if) or actual indexes. By default hypothetical indexes are used. For PostgreSQL, HypoPG is used to provide what-if capabilities. To assess the accuracy of HypoPG, a comparison of cost estimations based on actual and hypothetical indexes for the TPC-H and TPC-DS benchmarks is provided in this repository as well.

Further details regarding cost estimation will be provided by the corresponding paper as soon as it is published.

Usage

Install script:

./scripts/install.sh

Run index selection evaluation for the TPC-H benchmark:

python3 -m selection benchmark_results/tpch_wo_2_17_20/config.json (If the last parameter is omitted, the default config example_configs/config.json is used)

Run tests:

python3 -m unittest discover tests

Get coverage:

coverage run --source=selection -m unittest discover tests/
coverage html
open htmlcov/index.html

Extensibility

Adding new/custom workloads

Create a new folder in custom_workloads/ and place SQL text files in there. See, for example, custom_worloads/example or custom_worloads/Custom_vldb.
Create a corresponding configuration file that references this folder (JSON key "benchmark_name"), the database to use (JSON key "database_name"), and query files to include (JSON key "queries"). See, for example, example_configs/config_custom.json or example_configs/config_custom_vldb.json.

Adding a new algorithm:

Create a new algorithm class, based on selection/algorithms/example_algorithm.py

Algorithms can use the provided Workload information and cost estimation interface.

For example, (within a SelectionAlgorithm) the following code would estimate the cost for each workload query without indexes:
```
query_costs_without_indexes = {}
for query in workload.queries:
    query_costs_without_indexes[query] = self.cost_evaluation.calculate_cost(
        Workload([query]), set()
    )
```
Add algorithm class name in selection/index_selection_evaluation.py to this dictionary:

ALGORITHMS = {'extend': ExtendAlgorithm,
              'drop': DropHeuristicAlgorithm}

Create or adjust configuration files

Supporting a new database system:

The platform and algorithms support different database systems by using the DatabaseConnector interface, also indirectly via the CostEvaluation interface.

A new database connector class must be added for a new system.

Formatting and Linting

The code can be automatically formatted and linted by calling ./scripts/format.sh and ./scripts/lint.sh from the main folder.

References

[1] (Drop): Kyu-Young Whang: Index Selection in Relational Databases. FODO 1985: 487-500

[2] (AutoAdmin): Surajit Chaudhuri, Vivek R. Narasayya: An Efficient Cost-Driven Index Selection Tool for Microsoft SQL Server. VLDB 1997: 146-155

[3] (DB2Advis): Gary Valentin, Michael Zuliani, Daniel C. Zilio, Guy M. Lohman, Alan Skelley: DB2 Advisor: An Optimizer Smart Enough to Recommend Its Own Indexes. ICDE 2000: 101-110

[4] (Relaxation): Nicolas Bruno, Surajit Chaudhuri: Automatic Physical Database Tuning: A Relaxation-based Approach. SIGMOD Conference 2005: 227-238

[5] (CoPhy): Debabrata Dash, Neoklis Polyzotis, Anastasia Ailamaki: CoPhy: A Scalable, Portable, and Interactive Index Advisor for Large Workloads. Proc. VLDB Endow. 4(6): 362-372 (2011)

[6] (Dexter): Andrew Kane: https://medium.com/@ankane/introducing-dexter-the-automatic-indexer-for-postgres-5f8fa8b28f27

[7] (Extend): Rainer Schlosser, Jan Kossmann, Martin Boissier: Efficient Scalable Multi-attribute Index Selection Using Recursive Strategies. ICDE 2019: 1238-1249

[8] (Anytime): Not published yet. DTA documentation: https://docs.microsoft.com/de-de/sql/tools/dta/dta-utility?view=sql-server-ver15

Acknowledgements

We thank S. Chaudhuri, V. Narasayya, J. Rouhaud, A. Sharma, and D. Zilio for support, detailed answers, providing source code, and bug fixes.

Name		Name	Last commit message	Last commit date
Latest commit History 350 Commits
benchmark_results		benchmark_results
custom_workloads		custom_workloads
example_configs		example_configs
hypopg @ 57d711b		hypopg @ 57d711b
scripts		scripts
selection		selection
tests		tests
tpcds-kit @ 7fcaf27		tpcds-kit @ 7fcaf27
tpch-kit @ c9f35c0		tpch-kit @ c9f35c0
.gitignore		.gitignore
.gitmodules		.gitmodules
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Index Selection Evaluation

Citing the project

Cost Estimation

Usage

Extensibility

Adding new/custom workloads

Adding a new algorithm:

Supporting a new database system:

Formatting and Linting

References

Acknowledgements

About

Releases

Packages

Contributors 3

Languages

License

hyrise/index_selection_evaluation

Folders and files

Latest commit

History

Repository files navigation

Index Selection Evaluation

Citing the project

Cost Estimation

Usage

Extensibility

Adding new/custom workloads

Adding a new algorithm:

Supporting a new database system:

Formatting and Linting

References

Acknowledgements

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages