
Upsert small segment merger task in minions #14477

Merged: 4 commits, Dec 12, 2024

Conversation

tibrewalpratik17 (Contributor)

@tibrewalpratik17 tibrewalpratik17 commented Nov 18, 2024

This PR relates to the PEP request: #14305

This PR adds a new minion task to merge small segments in an upsert table. More implementation details are in the design doc of the linked issue.

Test plan: Enabled this on one of the infinite-retention tables at Uber. The table had ~35k segments initially, and after running this task for ~2 days we were able to reach ~2k segments. The curve also flattens after reaching ~2k segments. We are using the default configs for this task, and the table generates ~500 segments daily. See the attached screenshot.

[Screenshot (2024-11-21): segment count dropping over time after enabling the task]

A few details:

  • Unlike upsert compaction, we enforce enableSnapshot to be enabled for running this task. This makes sense from a correctness perspective.
  • Example of a new segment name: compactmerged__table__5__1731976847331__0 (see the sketch after this list).
  • The creation time of the new segment in ZK is the max ZK creation time of the merged segments. Note that the segment name carries the task execution time, not the index creation time.
  • The code merges LLC segments and UploadedRealtime segments, which is the desired behavior.
  • Right now, we process one merging subtask per partition per task run. We will later explore allowing multiple merging subtasks per partition per task run.
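Below is a minimal sketch (not the actual Pinot implementation) of the naming and creation-time rules from the list above; the class and method names are illustrative only.

import java.util.List;

public final class MergedSegmentNamingSketch {
  // Builds the example name format shown above, e.g. compactmerged__table__5__1731976847331__0.
  // The embedded timestamp is the task execution time, not the index creation time.
  static String buildMergedSegmentName(String tableName, int partitionId, long taskRunTimeMs, int sequenceId) {
    return String.join("__", "compactmerged", tableName, String.valueOf(partitionId),
        String.valueOf(taskRunTimeMs), String.valueOf(sequenceId));
  }

  // The ZK creation time of the merged segment is the max ZK creation time of its inputs.
  static long mergedSegmentZkCreationTime(List<Long> creationTimesOfMergedSegments) {
    return creationTimesOfMergedSegments.stream().mapToLong(Long::longValue).max().orElseThrow();
  }
}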

@tibrewalpratik17 tibrewalpratik17 added the feature, release-notes, upsert, minion, and documentation labels on Nov 18, 2024

codecov-commenter commented Nov 18, 2024

Codecov Report

Attention: Patch coverage is 21.93548% with 242 lines in your changes missing coverage. Please review.

Project coverage is 63.98%. Comparing base (59551e4) to head (6d77e17).
Report is 1455 commits behind head on master.

Files with missing lines | Patch % | Lines
...tcompactmerge/UpsertCompactMergeTaskGenerator.java | 20.39% | 157 Missing and 3 partials ⚠️
...rtcompactmerge/UpsertCompactMergeTaskExecutor.java | 24.13% | 65 Missing and 1 partial ⚠️
...ctmerge/UpsertCompactMergeTaskExecutorFactory.java | 0.00% | 6 Missing ⚠️
...t/processing/framework/SegmentProcessorConfig.java | 44.44% | 4 Missing and 1 partial ⚠️
...rocessing/framework/SegmentProcessorFramework.java | 50.00% | 1 Missing and 1 partial ⚠️
...UpsertCompactMergeTaskProgressObserverFactory.java | 0.00% | 2 Missing ⚠️
.../org/apache/pinot/core/common/MinionConstants.java | 0.00% | 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #14477      +/-   ##
============================================
+ Coverage     61.75%   63.98%   +2.23%     
- Complexity      207     1600    +1393     
============================================
  Files          2436     2696     +260     
  Lines        133233   148389   +15156     
  Branches      20636    22740    +2104     
============================================
+ Hits          82274    94946   +12672     
- Misses        44911    46476    +1565     
- Partials       6048     6967     +919     
Flag | Coverage Δ
custom-integration1 | 100.00% <ø> (+99.99%) ⬆️
integration | 100.00% <ø> (+99.99%) ⬆️
integration1 | 100.00% <ø> (+99.99%) ⬆️
integration2 | 0.00% <ø> (ø)
java-11 | 63.95% <21.93%> (+2.24%) ⬆️
java-21 | 63.86% <21.93%> (+2.24%) ⬆️
skip-bytebuffers-false | 63.96% <21.93%> (+2.22%) ⬆️
skip-bytebuffers-true | 63.84% <21.93%> (+36.12%) ⬆️
temurin | 63.98% <21.93%> (+2.23%) ⬆️
unittests | 63.98% <21.93%> (+2.23%) ⬆️
unittests1 | 56.21% <42.85%> (+9.32%) ⬆️
unittests2 | 34.51% <20.00%> (+6.78%) ⬆️


@tibrewalpratik17 tibrewalpratik17 force-pushed the upsert_compact_merge branch 3 times, most recently from dc1df38 to 64cd7d6 on November 19, 2024 13:41
@tibrewalpratik17 tibrewalpratik17 marked this pull request as ready for review November 19, 2024 20:36
tibrewalpratik17 (Contributor, Author)

Marking it ready for review for early feedback!

@@ -199,4 +200,59 @@ public static class UpsertCompactionTask {
*/
public static final String NUM_SEGMENTS_BATCH_PER_SERVER_REQUEST = "numSegmentsBatchPerServerRequest";
}

public static class UpsertCompactMergeTask {
Contributor:

Broad question: say we run both merge-compaction and the segment refresh task for a table. How do we ensure there aren't any race conditions? The two tasks could end up processing the same set of segments concurrently.

Contributor (Author):

With merge-compaction we end up creating a new segment, and refresh just runs the replaceSegment flow in the metadata-manager. For refresh of an LLC segment we should be fine, as the newly generated UploadedRealtimeSegment would override the keys, which is what it should do.
For refresh of an UploadedRealtimeSegment there is an issue right now: we use creation time to resolve the latest uploaded segment, so there is an edge case where the creation times are equal and whichever segment gets refreshed / uploaded later dominates. Overall it doesn't matter in the broad sense, because for upserts we still point to exactly one record per key (see the sketch below).
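A hypothetical illustration of the edge case described above, assuming resolution by creation time with ties broken by processing order; SegmentCandidate and resolve are illustrative names, not Pinot APIs.

public final class UploadedSegmentResolutionSketch {
  record SegmentCandidate(String segmentName, long creationTimeMs) {}

  // Keep the candidate with the higher creation time; on a tie, the segment processed
  // later wins, which is the edge case noted above. Either way, exactly one record per
  // key remains, so upsert correctness is preserved.
  static SegmentCandidate resolve(SegmentCandidate current, SegmentCandidate incoming) {
    if (incoming.creationTimeMs() >= current.creationTimeMs()) {
      return incoming;
    }
    return current;
  }
}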

Contributor:

Correctness-wise we should be fine, I guess, but this will still impede or impact the segment refresh task. Can you create a separate issue for this with some context on the problem so we can pick it up later?


private static final String DEFAULT_BUFFER_PERIOD = "2d";
private static final int DEFAULT_NUM_SEGMENTS_BATCH_PER_SERVER_REQUEST = 500;

public static class SegmentMergerMetadata {
Contributor:

private class or @VisibleForTesting?
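A minimal sketch of the reviewer's suggestion, assuming SegmentMergerMetadata is only consumed inside the generator; the fields shown are illustrative, not necessarily the ones in the PR.

import com.google.common.annotations.VisibleForTesting;

public class UpsertCompactMergeTaskGeneratorSketch {
  // Keep the nested class package-private and mark the intent with Guava's
  // @VisibleForTesting (or make it fully private if tests don't need direct access).
  @VisibleForTesting
  static class SegmentMergerMetadata {
    private final String _segmentName;
    private final long _validDocCount;

    SegmentMergerMetadata(String segmentName, long validDocCount) {
      _segmentName = segmentName;
      _validDocCount = validDocCount;
    }
  }
}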

LOGGER.info("Finished task: {} with configs: {}. Total time: {}ms", taskType, configs, (endMillis - startMillis));

List<SegmentConversionResult> results = new ArrayList<>();
for (File outputSegmentDir : outputSegmentDirs) {
Contributor:

Just to be sure: could one task generate multiple new segments, or just one? If multiple segments can be created, how do we handle cases where some segments fail to get uploaded?

Contributor (Author):

The current implementation ensures we generate one segment. But even if we generated multiple segments, we wouldn't need to worry much about rollback: for whichever segments got uploaded, validDocIds are uploaded for them, and the corresponding input segments are discarded as their docs become invalid (assuming takeSnapshot has run at least once).

Contributor:

I see.

For M:N merging, I think the computation of alreadyMergedSegments might need some extensions, as you may want to retry some of the input segments, instead of skipping them if they are in some segment's custom map.

Contributor (Author):

> For M:N merging, I think the computation of alreadyMergedSegments might need some extensions, as you may want to retry some of the input segments, instead of skipping them if they are in some segment's custom map.

Yeah, or we ensure we always select complete groups of x segments while creating the N segments, such that the sum over x = M. For simplicity, we are currently resolving this in task generation itself by generating multiple X:1 subtasks instead (see the sketch below).
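A minimal sketch of the task-generation approach described above: pack small segments greedily into groups and emit one X:1 subtask per group instead of doing M:N merging. The record type and the maxDocsPerMergedSegment threshold name are illustrative, not the PR's actual config keys.

import java.util.ArrayList;
import java.util.List;

public final class MergeGroupingSketch {
  record SmallSegment(String name, long numDocs) {}

  static List<List<SmallSegment>> groupIntoSubtasks(List<SmallSegment> smallSegments,
      long maxDocsPerMergedSegment) {
    List<List<SmallSegment>> groups = new ArrayList<>();
    List<SmallSegment> current = new ArrayList<>();
    long currentDocs = 0;
    for (SmallSegment segment : smallSegments) {
      // Close the current group once adding the next segment would exceed the threshold.
      if (!current.isEmpty() && currentDocs + segment.numDocs() > maxDocsPerMergedSegment) {
        groups.add(current);
        current = new ArrayList<>();
        currentDocs = 0;
      }
      current.add(segment);
      currentDocs += segment.numDocs();
    }
    // Only emit a subtask when there is more than one segment left to merge.
    if (current.size() > 1) {
      groups.add(current);
    }
    return groups;
  }
}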

// get the max creation time of the small segments. This will be the index creation time for the new segment.
Optional<Long> maxCreationTimeOfMergingSegments = segmentMetadataList.stream().map(
SegmentMetadataImpl::getIndexCreationTime).reduce(Long::max);
if (maxCreationTimeOfMergingSegments.isEmpty()) {
Contributor:

Just to confirm: numInputSegments is always greater than 0, right?

Contributor (Author):

Yes, we have added a check in the generator code to ensure numInputSegments > 1.
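A tiny sketch, under the assumption described above, of the generator-side guard; the method name and exception are illustrative, not the PR's actual code.

import java.util.List;

public final class InputSegmentGuardSketch {
  // Emitting a merge subtask only when there is more than one input segment means the
  // executor's reduction over index creation times always sees a non-empty stream.
  static void checkInputSegments(List<String> inputSegmentNames) {
    if (inputSegmentNames.size() <= 1) {
      throw new IllegalStateException(
          "UpsertCompactMergeTask expects more than one input segment, got " + inputSegmentNames.size());
    }
  }
}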

int partitionID = entry.getKey();
List<SegmentMergerMetadata> segments = entry.getValue();
// task config thresholds
// TODO add output segment size as one of the thresholds
Contributor:

Can we create an issue for this for tracking?


@klsince klsince (Contributor) left a comment:

LGTM

@ankitsultana ankitsultana merged commit 626c45d into apache:master Dec 12, 2024
21 checks passed