-
Notifications
You must be signed in to change notification settings - Fork 877
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(search): Multishard cutoffs #1924
base: main
Are you sure you want to change the base?
Conversation
24b2025
to
fe7559a
Compare
My condolences to the one reviewing it 🙂 |
I actually wrote Vlad - Kostas 1-1 only to not submit the comment because it would have been noise. Anyway, when you wrote this I could not resist not to write this msg 😛 I would argue mine was simpler because it was mostly boilerplate (I haven't looked yours TBH so it could be the same case) so it's not really 1-1 and probably you are getting the 👑 👨🍳 |
fe7559a
to
c94d9ad
Compare
c94d9ad
to
30ea6b8
Compare
30ea6b8
to
883d326
Compare
Please take a quick look at this. There is still some polishment left but the general "structure" is ready. I'd wish for this to be merged before we announce search because it's important for fast multishard queries PS: Yes there is lots of code and it's complicated 🙂 I'd start looking from MultiShardSearch::Run because it's the main entry point w everything else being tailored to fit it's needs |
883d326
to
7755b92
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I skimmed through this PR, it looks good.
What is missing, except for that TODO around how many items to retrieve?
Also see some minor questions/nits
size_t avg_shard_min = hits * intlog2(hits) / (12 + shard_set->size() / 10); | ||
avg_shard_min -= min(avg_shard_min, min(hits, size_t(5))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain the rationale here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Experimentally from #1892 😄 Those formulas might not be final
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd link to that, at least until you change it :)
// If all shards have at least avg min, keep the bare minimum needed to cover the request | ||
size_t limit = requested / shards + 1; | ||
|
||
// Aggregations like SORTBY and KNN reorder the result and thus introduce some variance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we have to fetch all matching for SORTBY?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No/Yes 🤓 We have to fetch all, but not serialize all. That is what the doc reference is for. If we find out when building the reply that some entries should have been included but are not serialized, we'll refill (see BuildSortedOrder())
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
8fe00de
to
52856f4
Compare
No description provided.