
IndicesClusterStateService blocks applier thread acquiring IndexShard#engineMutex #108817

Open
DaveCTurner opened this issue May 20, 2024 · 1 comment
Labels
>bug · :Distributed/Engine (Anything around managing Lucene and the Translog in an open shard) · Team:Distributed (Meta label for distributed team)

Comments

@DaveCTurner
Contributor

Observed in the wild during a cluster state application that took ~5s. Several GENERIC threads were all performing recoveries, running here:

    app/org.elasticsearch.server@8.15.0/org.elasticsearch.index.shard.IndexShard.createEngine(IndexShard.java:2146)
    app/org.elasticsearch.server@8.15.0/org.elasticsearch.index.shard.IndexShard.innerOpenEngineAndTranslog(IndexShard.java:2119)
    app/org.elasticsearch.server@8.15.0/org.elasticsearch.index.shard.IndexShard.openEngineAndSkipTranslogRecovery(IndexShard.java:2095)
    app/org.elasticsearch.server@8.15.0/org.elasticsearch.indices.recovery.PeerRecoveryTargetService.lambda$doRecovery$2(PeerRecoveryTargetService.java:310)
    app/org.elasticsearch.server@8.15.0/org.elasticsearch.indices.recovery.PeerRecoveryTargetService$$Lambda/0x000000c802a52458.apply(Unknown Source)
    app/org.elasticsearch.server@8.15.0/org.elasticsearch.action.ActionListenerImplementations$MappedActionListener.onResponse(ActionListenerImplementations.java:89)
    app/org.elasticsearch.server@8.15.0/org.elasticsearch.action.ActionListenerImplementations$DelegatingResponseActionListener.onResponse(ActionListenerImplementations.java:182)
    app/org.elasticsearch.server@8.15.0/org.elasticsearch.index.CompositeIndexEventListener.callListeners(CompositeIndexEventListener.java:290)
    app/org.elasticsearch.server@8.15.0/org.elasticsearch.index.CompositeIndexEventListener.lambda$callListeners$10(CompositeIndexEventListener.java:284)
    app/org.elasticsearch.server@8.15.0/org.elasticsearch.index.CompositeIndexEventListener$$Lambda/0x000000c802a75000.accept(Unknown Source)
    app/org.elasticsearch.server@8.15.0/org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:100)
    app/org.elasticsearch.server@8.15.0/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:984)
    app/org.elasticsearch.server@8.15.0/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
    java.base@21.0.2/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    java.base@21.0.2/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    java.base@21.0.2/java.lang.Thread.runWith(Thread.java:1596)
    java.base@21.0.2/java.lang.Thread.run(Thread.java:1583)

Meanwhile the applier thread appears to have been trying to close one of the recovering shards and was blocked here:

 0.0% [cpu=0.0%, other=0.0%] (0s out of 500ms) cpu usage by thread 'elasticsearch[es-es-search-74ff455b96-m8c5k][clusterApplierService#updateTask][T#1]'
  10/10 snapshots sharing following 21 elements
    app/org.elasticsearch.server@8.15.0/org.elasticsearch.index.shard.IndexShard.close(IndexShard.java:1724)
    app/org.elasticsearch.server@8.15.0/org.elasticsearch.index.IndexService.lambda$closeShard$16(IndexService.java:651)
    app/org.elasticsearch.server@8.15.0/org.elasticsearch.index.IndexService$$Lambda/0x000000c802b8fdd0.accept(Unknown Source)
    app/org.elasticsearch.server@8.15.0/org.elasticsearch.action.ActionListener.run(ActionListener.java:433)
    app/org.elasticsearch.server@8.15.0/org.elasticsearch.index.IndexService.closeShard(IndexService.java:629)
    app/org.elasticsearch.server@8.15.0/org.elasticsearch.index.IndexService.removeShard(IndexService.java:589)
    app/org.elasticsearch.server@8.15.0/org.elasticsearch.indices.cluster.IndicesClusterStateService.removeIndicesAndShards(IndicesClusterStateService.java:499)
    app/org.elasticsearch.server@8.15.0/org.elasticsearch.indices.cluster.IndicesClusterStateService.doApplyClusterState(IndicesClusterStateService.java:302)
    app/org.elasticsearch.server@8.15.0/org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:260)
    app/org.elasticsearch.server@8.15.0/org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:544)
    app/org.elasticsearch.server@8.15.0/org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:530)
    app/org.elasticsearch.server@8.15.0/org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:503)
    app/org.elasticsearch.server@8.15.0/org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:432)
    app/org.elasticsearch.server@8.15.0/org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:156)
    app/org.elasticsearch.server@8.15.0/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:917)
    app/org.elasticsearch.server@8.15.0/org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:217)
    app/org.elasticsearch.server@8.15.0/org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:183)
    java.base@21.0.2/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    java.base@21.0.2/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    java.base@21.0.2/java.lang.Thread.runWith(Thread.java:1596)
    java.base@21.0.2/java.lang.Thread.run(Thread.java:1583)
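
Taken together, the two traces show the contention: the recovery path builds the engine while holding IndexShard#engineMutex, and the applier thread blocks on the same monitor when closing the shard. A minimal sketch of the pattern (not the real IndexShard code; the class, fields and helper methods below are simplified placeholders inferred from the traces above):

    // Illustrative sketch only, not the actual IndexShard implementation.
    // SketchShard, buildEngineSlowly() and closeEngine() are made-up placeholders.
    class SketchShard {
        private final Object engineMutex = new Object(); // analogous to IndexShard#engineMutex
        private volatile Object engine;

        // Recovery path (cf. openEngineAndSkipTranslogRecovery -> createEngine above),
        // running on a GENERIC thread.
        void openEngine() {
            synchronized (engineMutex) {
                // Engine construction can be slow, so the monitor is held for a long time.
                engine = buildEngineSlowly();
            }
        }

        // Shard-close path (cf. IndicesClusterStateService -> IndexShard.close above),
        // running on the cluster applier thread, which must not block for long.
        void close() {
            synchronized (engineMutex) { // blocks until the recovery thread releases the monitor
                closeEngine(engine);
                engine = null;
            }
        }

        private Object buildEngineSlowly() { return new Object(); } // placeholder for engine construction
        private void closeEngine(Object e) {} // placeholder for engine shutdown
    }

While any recovering shard holds the monitor in openEngine(), the applier thread stalls in close(), which would be consistent with the ~5s cluster state application observed.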

Relates #89821

DaveCTurner added the >bug and :Distributed/Engine labels on May 20, 2024
elasticsearchmachine added the Team:Distributed label on May 20, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)
