Reduce the maximum word proximity from 8 to 4 #3820
Draft
This is an experiment to evaluate the impact of storing fewer word pairs. I am not 100% sure that the implementation is correct, but I think it is good enough to run some initial experiments.
It reduces the size of the indexed smol-wiki-articles-3_4.csv dataset from 3.49GB to 2.19GB, a reduction of roughly 37%. In combination with #3819, the index size drops from 4.19GB to 2.18GB. This means that the index size would be almost halved between v1.2 and v1.3.
For movies.json, we go from 212MB to 116MB.
While I haven't launched any benchmark yet, indexing also feels significantly faster to me.
Search latency should also benefit a lot from it. However, I don't know what the impact on relevancy will be.
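
To give an intuition for why a smaller maximum proximity shrinks the index, here is a minimal Python sketch that counts the word pairs produced for a short text at two window sizes. It is not the code from this PR: the function name `word_pairs`, the inclusive distance range, and the sample sentence are all assumptions made for illustration, and the real indexer may define proximity differently.

```python
def word_pairs(words, max_proximity):
    """Collect (left, right, distance) triples for every pair of words
    whose positions are between 1 and max_proximity apart (inclusive).

    Hypothetical helper: the inclusive bound is an assumption of this
    sketch, not necessarily how the engine counts proximity.
    """
    pairs = set()
    for i, left in enumerate(words):
        for distance in range(1, max_proximity + 1):
            j = i + distance
            if j >= len(words):
                break  # no more words to the right of position i
            pairs.add((left, words[j], distance))
    return pairs


# Illustrative sample text, not taken from either dataset.
words = "the quick brown fox jumps over the lazy dog near the old river bank".split()

pairs_8 = word_pairs(words, 8)
pairs_4 = word_pairs(words, 4)

# Every pair kept with a window of 4 is also kept with a window of 8,
# so the smaller window can only store fewer entries.
print(len(pairs_8), len(pairs_4))
```

The pairs dropped by the smaller window are exactly the ones at distances 5 to 8, which is also where any relevancy impact would come from: queries whose terms sit that far apart in a document no longer get a proximity entry.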