Text-indexes leads to blocked consumption #14583

Open
tibrewalpratik17 opened this issue Dec 3, 2024 · 1 comment
@tibrewalpratik17 (Contributor) commented:

We have applied text indexes on our partial-upsert tables. During segment commit, we observe that the CONSUMING -> ONLINE state-transition thread gets blocked for hours in the following flow:

"HelixTaskExecutor-message_handle_thread_22" #132 daemon prio=5 os_prio=0 cpu=36015.35ms elapsed=367482.10s tid=0x00007efc6f8ed800 nid=0xf7 in Object.wait()  [0x00007efc656fc000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
	at java.lang.Object.wait([email protected]/Native Method)
	- waiting on <no object reference available>
	at org.apache.lucene.index.IndexWriter.doWait(IndexWriter.java:5410)
	- waiting to re-lock in wait() <0x00007f11147aaa10> (a org.apache.lucene.index.IndexWriter)
	at org.apache.lucene.index.IndexWriter.abortMerges(IndexWriter.java:2721)
	- waiting to re-lock in wait() <0x00007f11147aaa10> (a org.apache.lucene.index.IndexWriter)
	at org.apache.lucene.index.IndexWriter.rollbackInternalNoCommit(IndexWriter.java:2469)
	- waiting to re-lock in wait() <0x00007f11147aaa10> (a org.apache.lucene.index.IndexWriter)
	at org.apache.lucene.index.IndexWriter.rollbackInternal(IndexWriter.java:2449)
	- locked <0x00007f11147b03d0> (a java.lang.Object)
	at org.apache.lucene.index.IndexWriter.rollback(IndexWriter.java:2441)
	at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1364)
	at org.apache.pinot.segment.local.segment.creator.impl.text.LuceneTextIndexCreator.close(LuceneTextIndexCreator.java:195)
	at org.apache.pinot.segment.local.realtime.impl.invertedindex.RealtimeLuceneTextIndex.close(RealtimeLuceneTextIndex.java:179)
	at org.apache.pinot.segment.local.indexsegment.mutable.MutableSegmentImpl$IndexContainer.lambda$close$0(MutableSegmentImpl.java:1330)
	at org.apache.pinot.segment.local.indexsegment.mutable.MutableSegmentImpl$IndexContainer$$Lambda$2460/0x00007ee57b296108.accept(Unknown Source)
	at org.apache.pinot.segment.local.indexsegment.mutable.MutableSegmentImpl$IndexContainer$$Lambda$2461/0x00007ee57b295cb0.accept(Unknown Source)
	at java.util.HashMap.forEach([email protected]/HashMap.java:1337)
	at org.apache.pinot.segment.local.indexsegment.mutable.MutableSegmentImpl$IndexContainer.close(MutableSegmentImpl.java:1338)
	at org.apache.pinot.segment.local.indexsegment.mutable.MutableSegmentImpl.destroy(MutableSegmentImpl.java:990)
	at org.apache.pinot.core.data.manager.realtime.RealtimeSegmentDataManager.doDestroy(RealtimeSegmentDataManager.java:1343)
	at org.apache.pinot.segment.local.data.manager.SegmentDataManager.destroy(SegmentDataManager.java:82)
	at org.apache.pinot.core.data.manager.BaseTableDataManager.closeSegment(BaseTableDataManager.java:393)
	at org.apache.pinot.core.data.manager.BaseTableDataManager.releaseSegment(BaseTableDataManager.java:382)
	at org.apache.pinot.server.starter.helix.SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel.onBecomeOnlineFromConsuming(SegmentOnlineOfflineStateModelFactory.java:142)
	at jdk.internal.reflect.GeneratedMethodAccessor2034.invoke(Unknown Source)
	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke([email protected]/DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke([email protected]/Method.java:566)
	at org.apache.helix.messaging.handling.HelixStateTransitionHandler.invoke(HelixStateTransitionHandler.java:350)
	at org.apache.helix.messaging.handling.HelixStateTransitionHandler.handleMessage(HelixStateTransitionHandler.java:278)
	at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:97)
	at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:49)
	at java.util.concurrent.FutureTask.run([email protected]/FutureTask.java:264)
	at java.util.concurrent.ThreadPoolExecutor.runWorker([email protected]/ThreadPoolExecutor.java:1128)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run([email protected]/ThreadPoolExecutor.java:628)
	at java.lang.Thread.run([email protected]/Thread.java:829)

Since partial-upsert tables wait for the segment build to complete, the new CONSUMING segment is not added on the replica where this long wait is observed.
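
For context, the wait in the dump is inside Lucene rather than Pinot: closing the realtime text index ends up in IndexWriter.rollback(), which calls abortMerges() and then waits on the writer's monitor (IndexWriter.doWait) until every in-flight background merge acknowledges the abort. A minimal standalone sketch of that call path (illustrative only, not Pinot's actual code; the in-memory directory and field name are made up) looks roughly like this:

	import org.apache.lucene.analysis.standard.StandardAnalyzer;
	import org.apache.lucene.document.Document;
	import org.apache.lucene.document.Field;
	import org.apache.lucene.document.TextField;
	import org.apache.lucene.index.IndexWriter;
	import org.apache.lucene.index.IndexWriterConfig;
	import org.apache.lucene.store.ByteBuffersDirectory;
	import org.apache.lucene.store.Directory;

	public class RollbackWaitSketch {
	  public static void main(String[] args) throws Exception {
	    // Illustrative in-memory directory; Pinot keeps an on-disk Lucene index
	    // per consuming segment for its realtime text index.
	    Directory dir = new ByteBuffersDirectory();
	    IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()));

	    Document doc = new Document();
	    doc.add(new TextField("text_col", "some message body", Field.Store.NO));
	    writer.addDocument(doc);

	    // rollback() discards uncommitted changes. Internally it calls abortMerges(),
	    // which waits on the IndexWriter monitor until all running background merges
	    // acknowledge the abort -- the frames visible in the thread dump above.
	    writer.rollback();
	    dir.close();
	  }
	}

If a merge thread cannot make progress, that rollback (and therefore the segment destroy in MutableSegmentImpl and the Helix state transition) would block indefinitely, which matches the hours-long wait above.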

Some more details:

  • Pinot version: 1.1
  • This does not happen consistently; it occurs erratically on one of the replicas, and consumption stays blocked on that stuck replica.
  • Restarting the affected host is currently the only way to restore table freshness to the latest segment.

@itschrispeck shared a similar issue in ES: elastic/elasticsearch#107513

@hpvd commented Dec 3, 2024:

Just FYI: there was some work on indexing and ingestion in the Pinot 1.2 release:
https://github.com/apache/pinot/releases/tag/release-1.2.0
