
The implementation of ROPE seems a little off for k #30900

Closed
2 of 4 tasks
wzhcz8902 opened this issue May 20, 2024 · 1 comment

Comments

wzhcz8902 commented May 20, 2024

System Info

main branch

Who can help?

@ArthurZucker @younesbelkada @bojone

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

```python
def apply_rotary_pos_emb(q, k, cos, sin, position_ids=None, unsqueeze_dim=1):
    """Applies Rotary Position Embedding to the query and key tensors.

    Args:
        q (`torch.Tensor`): The query tensor.
        k (`torch.Tensor`): The key tensor.
        cos (`torch.Tensor`): The cosine part of the rotary embedding.
        sin (`torch.Tensor`): The sine part of the rotary embedding.
        position_ids (`torch.Tensor`, *optional*):
            Deprecated and unused.
        unsqueeze_dim (`int`, *optional*, defaults to 1):
            The 'unsqueeze_dim' argument specifies the dimension along which to unsqueeze cos[position_ids] and
            sin[position_ids] so that they can be properly broadcasted to the dimensions of q and k. For example, note
            that cos[position_ids] and sin[position_ids] have the shape [batch_size, seq_len, head_dim]. Then, if q and
            k have the shape [batch_size, heads, seq_len, head_dim], then setting unsqueeze_dim=1 makes
            cos[position_ids] and sin[position_ids] broadcastable to the shapes of q and k. Similarly, if q and k have
            the shape [batch_size, seq_len, heads, head_dim], then set unsqueeze_dim=2.

    Returns:
        `tuple(torch.Tensor)` comprising of the query and key tensors rotated using the Rotary Position Embedding.
    """
    cos = cos.unsqueeze(unsqueeze_dim)
    sin = sin.unsqueeze(unsqueeze_dim)
    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed
```
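
For context, `apply_rotary_pos_emb` relies on a `rotate_half` helper defined next to it in the modeling code, which was not included in the paste above. The sketch below redefines that helper and calls the function once, just to illustrate the `unsqueeze_dim=1` broadcasting described in the docstring; the tensor shapes and the from-scratch inverse-frequency angle schedule are illustrative assumptions, not code taken from the library:

```python
# Shape sketch, assuming the apply_rotary_pos_emb definition above is in scope.
import torch

def rotate_half(x):
    # Mirrors the helper defined next to apply_rotary_pos_emb: splits the last
    # dimension into two halves and maps (x1, x2) -> (-x2, x1).
    x1 = x[..., : x.shape[-1] // 2]
    x2 = x[..., x.shape[-1] // 2 :]
    return torch.cat((-x2, x1), dim=-1)

batch, heads, seq_len, head_dim = 2, 8, 16, 64
q = torch.randn(batch, heads, seq_len, head_dim)
k = torch.randn(batch, heads, seq_len, head_dim)

# Build cos/sin of shape [batch_size, seq_len, head_dim] with the usual
# inverse-frequency schedule, duplicated across the two halves of head_dim.
inv_freq = 1.0 / (10000.0 ** (torch.arange(0, head_dim, 2).float() / head_dim))
angles = torch.arange(seq_len).float()[:, None] * inv_freq[None, :]
angles = torch.cat((angles, angles), dim=-1)
cos = angles.cos()[None].expand(batch, -1, -1)
sin = angles.sin()[None].expand(batch, -1, -1)

# unsqueeze_dim=1 turns cos/sin into [batch, 1, seq_len, head_dim], which
# broadcasts against q and k of shape [batch, heads, seq_len, head_dim].
q_embed, k_embed = apply_rotary_pos_emb(q, k, cos, sin, unsqueeze_dim=1)
print(q_embed.shape, k_embed.shape)  # torch.Size([2, 8, 16, 64]) for both
```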

The rotary embedding is applied to q and k in exactly the same way, but shouldn't the two rotations differ, so that when q and k are multiplied together we end up with the relative position, i.e. the difference between their positions? The current implementation seems to give a relative position that is the sum of the positions of q and k, so I think we should put a negative sign in front of k's angle. A corrected implementation for k might then look like:

```python
k_embed = (k * cos) + (rotate_half(k) * (-sin))
```

Expected behavior

The rotation angles applied to q and k should differ by a sign.

@ArthurZucker (Collaborator) commented

Hey, I'd invite you to read https://blog.eleuther.ai/rotary-embeddings/, on which the current implementation of RoPE is based.

A quick test you can do is to change this directly and check the outputs of a Llama 2 or Llama 3 model 😉
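
For anyone who wants to check the math directly first, here is a minimal standalone sketch (not a test from the repository; `rotate_half` and the inverse-frequency angle schedule are re-implemented here rather than imported) showing that the unchanged same-sign formula already gives dot products that depend only on the position offset between q and k:

```python
# Sanity-check sketch: apply the same-sign rotation to both q and k, as in
# apply_rotary_pos_emb above, and verify that the resulting dot product
# depends only on the offset between their positions.
import torch

def rotate_half(x):
    # Same pairing convention as the modeling code: (x1, x2) -> (-x2, x1).
    x1 = x[..., : x.shape[-1] // 2]
    x2 = x[..., x.shape[-1] // 2 :]
    return torch.cat((-x2, x1), dim=-1)

def rope_cos_sin(pos, dim, base=10000.0):
    # Inverse-frequency schedule, duplicated across the two halves of dim.
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    angles = pos * inv_freq
    angles = torch.cat((angles, angles), dim=-1)
    return angles.cos(), angles.sin()

def rotate(x, pos, dim):
    cos, sin = rope_cos_sin(torch.tensor(float(pos)), dim)
    return (x * cos) + (rotate_half(x) * sin)  # same sign for q and k

torch.manual_seed(0)
dim = 64
q = torch.randn(dim)
k = torch.randn(dim)

# Positions (5, 2) and (12, 9) share the same offset of 3, so the rotated dot
# products agree; a pair with a different offset generally gives another value.
same_offset_a = rotate(q, 5, dim) @ rotate(k, 2, dim)
same_offset_b = rotate(q, 12, dim) @ rotate(k, 9, dim)
other_offset = rotate(q, 12, dim) @ rotate(k, 2, dim)

assert torch.allclose(same_offset_a, same_offset_b, atol=1e-4)
print(same_offset_a.item(), same_offset_b.item(), other_offset.item())
```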
