Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Don't review: E2e #46670

Closed
wants to merge 8 commits into from
Closed

[WIP] Don't review: E2e #46670

wants to merge 8 commits into from

Conversation

GideonPotok
Copy link
Contributor

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

@github-actions github-actions bot added the SQL label May 20, 2024
@GideonPotok
Copy link
Contributor Author

@uros-db adding the map binaryValues to Mode doesn't persist the mappings as desired. It looks to me like the general pattern for accumulating anything in spark aggregation functions is by putting it into an InternalRow and having merge and update take internal row as parameter.

Is there another way you advise? Do I use a class that extends from InternalRow, which would include both the OpenHashMap[AnyRef, Long] as well as our Map[String, String] ? Let me know what the right pattern is because obviously having that element of state be a member of Mode implementation is not the standard way/ doesn't seem to work.

@uros-db
Copy link
Contributor

uros-db commented May 21, 2024

Yes, aggregation buffers are InternalRows in Spark, however - I would not suggest extending stuff as you described in order to get this working. This approach turned out to not be as straight-forward as we'd have hoped, and after all - this is yet another approach with an aux structure for proper collation-aware aggregation in Mode, so since we don't really have guarantees of better performance, I'd suggest sticking with GroupMapReduce for now (#46597). Sorry for leading you the wrong way with this one, I hoped this one would turn out better / easier to implement!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
2 participants