We have applied text indexes on our partial-upsert tables. During segment commit, we observe that the CONSUMING -> ONLINE state-transition thread gets blocked for hours in this flow:
```
"HelixTaskExecutor-message_handle_thread_22" #132 daemon prio=5 os_prio=0 cpu=36015.35ms elapsed=367482.10s tid=0x00007efc6f8ed800 nid=0xf7 in Object.wait() [0x00007efc656fc000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(java.base@11/Native Method)
    - waiting on <no object reference available>
    at org.apache.lucene.index.IndexWriter.doWait(IndexWriter.java:5410)
    - waiting to re-lock in wait() <0x00007f11147aaa10> (a org.apache.lucene.index.IndexWriter)
    at org.apache.lucene.index.IndexWriter.abortMerges(IndexWriter.java:2721)
    - waiting to re-lock in wait() <0x00007f11147aaa10> (a org.apache.lucene.index.IndexWriter)
    at org.apache.lucene.index.IndexWriter.rollbackInternalNoCommit(IndexWriter.java:2469)
    - waiting to re-lock in wait() <0x00007f11147aaa10> (a org.apache.lucene.index.IndexWriter)
    at org.apache.lucene.index.IndexWriter.rollbackInternal(IndexWriter.java:2449)
    - locked <0x00007f11147b03d0> (a java.lang.Object)
    at org.apache.lucene.index.IndexWriter.rollback(IndexWriter.java:2441)
    at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1364)
    at org.apache.pinot.segment.local.segment.creator.impl.text.LuceneTextIndexCreator.close(LuceneTextIndexCreator.java:195)
    at org.apache.pinot.segment.local.realtime.impl.invertedindex.RealtimeLuceneTextIndex.close(RealtimeLuceneTextIndex.java:179)
    at org.apache.pinot.segment.local.indexsegment.mutable.MutableSegmentImpl$IndexContainer.lambda$close$0(MutableSegmentImpl.java:1330)
    at org.apache.pinot.segment.local.indexsegment.mutable.MutableSegmentImpl$IndexContainer$$Lambda$2460/0x00007ee57b296108.accept(Unknown Source)
    at org.apache.pinot.segment.local.indexsegment.mutable.MutableSegmentImpl$IndexContainer$$Lambda$2461/0x00007ee57b295cb0.accept(Unknown Source)
    at java.util.HashMap.forEach(java.base@11/HashMap.java:1337)
    at org.apache.pinot.segment.local.indexsegment.mutable.MutableSegmentImpl$IndexContainer.close(MutableSegmentImpl.java:1338)
    at org.apache.pinot.segment.local.indexsegment.mutable.MutableSegmentImpl.destroy(MutableSegmentImpl.java:990)
    at org.apache.pinot.core.data.manager.realtime.RealtimeSegmentDataManager.doDestroy(RealtimeSegmentDataManager.java:1343)
    at org.apache.pinot.segment.local.data.manager.SegmentDataManager.destroy(SegmentDataManager.java:82)
    at org.apache.pinot.core.data.manager.BaseTableDataManager.closeSegment(BaseTableDataManager.java:393)
    at org.apache.pinot.core.data.manager.BaseTableDataManager.releaseSegment(BaseTableDataManager.java:382)
    at org.apache.pinot.server.starter.helix.SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel.onBecomeOnlineFromConsuming(SegmentOnlineOfflineStateModelFactory.java:142)
    at jdk.internal.reflect.GeneratedMethodAccessor2034.invoke(Unknown Source)
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@11/DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(java.base@11/Method.java:566)
    at org.apache.helix.messaging.handling.HelixStateTransitionHandler.invoke(HelixStateTransitionHandler.java:350)
    at org.apache.helix.messaging.handling.HelixStateTransitionHandler.handleMessage(HelixStateTransitionHandler.java:278)
    at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:97)
    at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:49)
    at java.util.concurrent.FutureTask.run(java.base@11/FutureTask.java:264)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11/ThreadPoolExecutor.java:1128)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11/ThreadPoolExecutor.java:628)
    at java.lang.Thread.run(java.base@11/Thread.java:829)
```
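For context, the trace shows the closing thread stuck in `IndexWriter.doWait()`: `rollback()` holds the writer monitor and loops in a timed `wait()` until running merges acknowledge the abort, so if a merge never finishes the close blocks indefinitely in TIMED_WAITING. Below is a minimal, self-contained sketch of that monitor pattern (not Lucene code; `ToyWriter`, `mergeFinished`, and the 2s delay are all hypothetical stand-ins for illustration):

```java
import java.util.concurrent.CountDownLatch;

// Toy model of the IndexWriter monitor pattern: close() loops in a
// timed wait() on the writer monitor until the "merge" thread signals
// it has stopped -- mirroring abortMerges() + doWait() in the trace.
class ToyWriter {
    private boolean mergeRunning = true;

    // Analogue of doWait(): hold the monitor, wait in 1s increments
    // until the running merge acknowledges the abort.
    synchronized void close() throws InterruptedException {
        while (mergeRunning) {
            wait(1000); // timed wait => thread shows as TIMED_WAITING
        }
    }

    // Called by the merge thread once it actually stops.
    synchronized void mergeFinished() {
        mergeRunning = false;
        notifyAll();
    }
}

public class DoWaitSketch {
    public static void main(String[] args) throws Exception {
        ToyWriter writer = new ToyWriter();
        CountDownLatch closing = new CountDownLatch(1);

        // Simulate a long-running merge that only stops after ~2s.
        Thread merge = new Thread(() -> {
            try {
                closing.await();
                Thread.sleep(2000);
                writer.mergeFinished();
            } catch (InterruptedException ignored) { }
        });
        merge.start();

        long start = System.nanoTime();
        closing.countDown();
        writer.close(); // blocks until mergeFinished() runs
        long blockedMs = (System.nanoTime() - start) / 1_000_000;
        merge.join();
        System.out.println("close() blocked ~" + blockedMs + " ms");
    }
}
```

If the merge thread never reaches `mergeFinished()` (the toy analogue of a merge that never observes the abort), `close()` waits forever, which is the shape of the hang reported here.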
Since partial-upsert tables wait for the segment build to complete, the new CONSUMING segment is not added on the replica where this long wait is observed.
Some more details:
- Pinot version: 1.1
- The issue does not happen consistently; it occurs erratically on one of the replicas, and consumption is blocked on the stuck replica.
- Restarting the affected host is currently the only way to restore table freshness to the latest segment.
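One possible mitigation (a sketch only, not actual Pinot code) is to move the potentially blocking index close off the Helix state-transition thread and bound the wait, so a stuck Lucene rollback cannot block the CONSUMING -> ONLINE transition. Here `closeIndex` is a hypothetical stand-in for the `RealtimeLuceneTextIndex.close()` call, and the timeout value is arbitrary:

```java
import java.util.concurrent.*;

// Sketch: run a possibly-blocking close on a daemon worker thread and
// give up after a timeout, instead of hanging the calling thread.
public class BoundedClose {
    public static boolean closeWithTimeout(Runnable closeIndex, long timeoutMs) {
        ExecutorService ex = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "lucene-index-closer");
            t.setDaemon(true); // a hung close won't keep the JVM alive
            return t;
        });
        Future<?> f = ex.submit(closeIndex);
        try {
            f.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;  // close completed within the budget
        } catch (TimeoutException e) {
            return false; // close is still blocked; caller can log and proceed
        } catch (Exception e) {
            return false; // interrupted or close threw
        } finally {
            ex.shutdown();
        }
    }

    public static void main(String[] args) {
        // A fast close completes; a close that outlives the timeout does not.
        System.out.println(closeWithTimeout(() -> { }, 500));
        System.out.println(closeWithTimeout(() -> {
            try { Thread.sleep(5_000); } catch (InterruptedException ignored) { }
        }, 200));
    }
}
```

Note the trade-off: abandoning the close leaks the closer thread and whatever resources the writer holds until the merge finally aborts, so this only shifts the problem off the state-transition path rather than fixing the underlying Lucene hang.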
@itschrispeck shared a similar issue in ES: elastic/elasticsearch#107513