-
Notifications
You must be signed in to change notification settings - Fork 24.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow LuceneSourceOperator to early terminate #108820
Conversation
Hi @iverase, I've created a changelog YAML for you. |
Pinging @elastic/es-analytical-engine (Team:Analytics) |
// Note: if (maxPageSize - currentPagePos) is a small "remaining" interval, this could lead to slow collection with a | ||
// highly selective filter. Having a large "enough" difference between max- and minPageSize (and thus currentPagePos) | ||
// alleviates this issue. | ||
maxPageSize - currentPagePos |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know the scorer logic in detail, but I'm curious to understand what would be the implications of limiting the numDocs
here instead, eg. using Math.min(maxPageSize - currentPagePos, remainingDocs)
here rather than throwing and catching an exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For what I see in the code, this value does not take into account deleted docs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes a lot of sense, thanks!
FWIW, it looks good to me, but please wait for Nhat or Nik for a check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @iverase.
We can make the LuceneSourceOperator collector to throw a CollectionTerminatedException whenever it has collected enough documents.
In the geo benchmarks, this is clearly showing as an issue as we are iterating all the matched documents to get the first 10:
We can make the LuceneSourceOperator collector to throw a CollectionTerminatedException whenever it has collected enough documents.