[Inductor][Flex-attention] Support different sequence lengths for Query and Key/Value #126639
Conversation
test/inductor/test_flex_attention.py (Outdated)
self._check_equal(golden_out, ref_out, compiled_out, fudge_factor, "Out")

# Check gradients
q_fudge_factor = 2.5 * fudge_factor
Can we refactor this a bit lol. I think we have this code copied 3 different times in this file.
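One way to address the duplication is a single helper that owns the tolerance logic, including the extra 2.5x fudge factor the original test applies to query gradients. This is a hedged sketch, not the PR's actual code: the helper names (`check_equal`, `check_out_and_grads`) and the tolerance formula are assumptions, and real tensors are stood in for by flat float lists so the sketch stays self-contained.

```python
# Hypothetical consolidation of the repeated output/gradient checks.
# Each of golden/ref/compiled is a flat list of floats here; in the
# real test these would be tensors compared with torch.testing utilities.
def max_abs_diff(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def check_equal(golden, ref, compiled, fudge_factor, label):
    # Allow the compiled result to deviate from golden by at most
    # fudge_factor times the reference's own deviation (plus epsilon).
    ref_err = max_abs_diff(golden, ref)
    comp_err = max_abs_diff(golden, compiled)
    assert comp_err <= fudge_factor * ref_err + 1e-7, f"{label} mismatch"

def check_out_and_grads(outs, grads_q, fudge_factor):
    # outs / grads_q: (golden, ref, compiled) triples.
    check_equal(*outs, fudge_factor, "Out")
    q_fudge_factor = 2.5 * fudge_factor   # query grads are noisier
    check_equal(*grads_q, q_fudge_factor, "Grad_Q")
```

With this in place, each of the three copy-pasted sites would collapse to one `check_out_and_grads(...)` call.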
@@ -188,34 +188,36 @@ def build_subgraph_buffer(
  Z = {{size("Q", 0)}}
  H = {{size("Q", 1)}}
- N_CTX = {{size("Q", 2)}}
+ M = {{size("Q", 2)}}
Let's call this Q_LEN and KV_LEN?
closing this in favor of #127678
Fixes #ISSUE_NUMBER
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang