Upsert Small Segment Merger Task [Minion] #14305

Closed
tibrewalpratik17 opened this issue Oct 25, 2024 · 2 comments
Labels
PEP-Request Pinot Enhancement Proposal request to be reviewed. upsert

Comments

@tibrewalpratik17
Contributor

The concept of compaction traditionally refers to the process of making something denser or more tightly packed. In its current implementation, the Upsert-Compaction task in Apache Pinot operates at the segment level, where it rebuilds individual segments by removing unused or invalid rows. This approach has proven highly effective in controlling the disk usage of upsert tables.

However, the task proposed here addresses a different problem: the continuously growing number of segments in upsert tables. To mitigate it, we propose a multi-segment compaction model for upsert tables, in which multiple segments are combined, with invalid or unused rows removed, and re-uploaded as a single consolidated segment. This approach aims to reduce the overall segment count while retaining the storage-efficiency benefits of the current upsert-compaction mechanism.
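
To make the proposed model concrete, here is a minimal Java sketch. It assumes each segment can be modeled as a list of rows plus a bitmap marking which rows are still valid (the latest version of their primary key). The class and method names (`SmallSegmentMergeSketch`, `groupSegments`, `merge`) and the `maxRowsPerMergedSegment` cap are hypothetical illustrations, not Pinot APIs, and the greedy grouping policy is an assumption rather than the selection strategy in the design doc.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch of multi-segment upsert compaction; none of these names are Pinot APIs.
class SmallSegmentMergeSketch {

    // Simplified segment: its rows plus a bitmap of doc IDs that are still valid.
    public record Segment(String name, List<String> rows, BitSet validDocIds) {
        public int validCount() {
            return validDocIds.cardinality();
        }
    }

    // Greedily groups consecutive segments, starting a new group whenever adding the next
    // segment's valid rows would exceed the cap (an oversized segment forms its own group).
    public static List<List<Segment>> groupSegments(List<Segment> segments, int maxRowsPerMergedSegment) {
        List<List<Segment>> groups = new ArrayList<>();
        List<Segment> current = new ArrayList<>();
        int currentRows = 0;
        for (Segment s : segments) {
            int rows = s.validCount();
            if (!current.isEmpty() && currentRows + rows > maxRowsPerMergedSegment) {
                groups.add(current);
                current = new ArrayList<>();
                currentRows = 0;
            }
            current.add(s);
            currentRows += rows;
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }

    // Merges a group into one consolidated segment, copying only rows flagged as valid.
    public static Segment merge(String mergedName, List<Segment> group) {
        List<String> mergedRows = new ArrayList<>();
        for (Segment s : group) {
            BitSet valid = s.validDocIds();
            for (int docId = valid.nextSetBit(0); docId >= 0; docId = valid.nextSetBit(docId + 1)) {
                mergedRows.add(s.rows().get(docId));
            }
        }
        // Every row in the merged segment is valid at the time it is built.
        BitSet allValid = new BitSet(mergedRows.size());
        allValid.set(0, mergedRows.size());
        return new Segment(mergedName, mergedRows, allValid);
    }
}
```

Grouping consecutive segments in order is one simple choice that keeps each merged segment's rows close together in time. A real task likely also needs to replace the source segments atomically and handle rows invalidated while the merge is running, both of which this sketch omits.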

Sharing the design doc here for review and feedback from the community.

@Jackie-Jiang Jackie-Jiang added upsert PEP-Request Pinot Enhancement Proposal request to be reviewed. labels Nov 4, 2024
@Jackie-Jiang
Contributor

cc @klsince

@tibrewalpratik17
Contributor Author

Closing as completed via #14477
