PaliGemma fix attention mask for finetunes #30918
Conversation
Hey! Can you provide a reproducer for:
Good catch! Main comment is that the separation between prefix and suffix is brittle as-is - we want to merge this fix as soon as possible, though!
```python
        if sequence_length != 1:
            causal_mask = torch.triu(causal_mask, diagonal=1)

        mask = input_ids == self.config.prefix_suffix_separator_index
```
This is not very robust unfortunately: one or several `\n` tokens might be present in the prefix. Could this depend on the labels passed instead?
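For illustration, a minimal sketch of that failure mode (the token ids are made up; `sep_id` just stands in for whatever id the `\n` token has):

```python
import torch

sep_id = 108  # hypothetical id for the "\n" token; the real id depends on the tokenizer
# A prefix that itself contains a newline, followed by the real prefix/suffix separator.
input_ids = torch.tensor([[5, 108, 7, 9, 108, 3, 4]])

mask = input_ids == sep_id
# argmax picks the FIRST match: index 1 (the newline inside the prefix),
# not index 4 (the actual separator), so the prefix/suffix split lands in the wrong place.
print(mask.int().argmax(dim=1, keepdim=True))  # tensor([[1]])
```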
Force-pushed from 03cfe8e to 9a732f4.
I accidentally ran style on the whole repo, and now the tests are failing because I had to force-push away the update to basically every file (இ﹏இ`。). I think all the style errors are in code not touched by this change.
Anyways, OK, I updated how we check where to split, using a new index in the labels. Constructing the right labels array is getting to be sort of tricky for new users, so the PaliGemmaProcessor now returns labels in the BatchEncoding for easier finetuning. This is sort of necessary because the prefix and suffix might change length during tokenization, and it'd be annoying to try to find the right index of that `\n`. Hope y'all are OK with this idea!
A lot of the logic you are re-implementing is natively supported in the tokenizer via `token_type_ids`. In this case I think it would be wise to leverage `create_token_type_ids_from_sequences`; check out codegen for an example.
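For context, a rough sketch (an assumption about the shape of such an override, not the actual Gemma/PaliGemma implementation) of what `create_token_type_ids_from_sequences` could look like, with prefix tokens getting type 0 and suffix/label tokens getting type 1:

```python
from typing import List, Optional

def create_token_type_ids_from_sequences(
    self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
) -> List[int]:
    # Single sequence (inference): everything is prefix.
    if token_ids_1 is None:
        return [0] * len(token_ids_0)
    # Pair (finetuning): prefix tokens get 0, suffix/label tokens get 1.
    return [0] * len(token_ids_0) + [1] * len(token_ids_1)
```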
Hey @probicheaux, left another review! I think your causal mask building is fine, we just need to remove the `training` flag from the processor and pass the labels explicitly. Let me know if you have time to add the suggestions!
```python
        super().__init__(image_processor, tokenizer)

    def __call__(
        self,
        text: Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]] = None,
        suffix: Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]] = None,
```
Suggested change:
```diff
-        suffix: Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]] = None,
+        labels: Optional[Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]]] = None,
```
```python
            suffix (`str`, `List[str]`, `List[List[str]]`):
                The suffix or batch of suffixes to be encoded. Only necessary for finetuning. See https://github.com/google-research/big_vision/blob/main/big_vision/configs/proj/paligemma/README.md
                for more information.
```
Suggested change:
```diff
-            suffix (`str`, `List[str]`, `List[List[str]]`):
-                The suffix or batch of suffixes to be encoded. Only necessary for finetuning. See https://github.com/google-research/big_vision/blob/main/big_vision/configs/proj/paligemma/README.md
-                for more information.
+            labels (`str`, `List[str]`, `List[List[str]]`):
+                The label suffix or batch of suffixes to be encoded. Only necessary for finetuning. See https://github.com/google-research/big_vision/blob/main/big_vision/configs/proj/paligemma/README.md
+                for more information.
```
Overall, replace `suffix` with `labels` wherever it's used, I think.
```diff
         else:
-            text_inputs = self.tokenizer(
+            inputs = self.tokenizer(
```
Here I think the key is using the `text_pair` option in the tokenizer along with `return_token_type_ids=True`. So:

```python
text = ["Is the witholded tax rate equal to 31%?"]
labels = ["Yes"]
self.tokenizer(text=text, text_pair=labels, return_token_type_ids=True)
```

This will give `inputs` the additional key `token_type_ids`, of the form `[[0, 0, 0, 0, 0, 0, 0, 0, 1, 1]]`, and will also have the correct shape for batched outputs. Then you already have your mask from `token_type_ids` and don't need to specify the `training` bool flag below. The token for the newline character has to be inserted before the first 1 and assigned token type id 0, so that it's part of the prompt with full block attention.
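One way to satisfy that newline constraint, sketched under the assumption that the separator is simply kept at the end of the prefix string (the checkpoint name is only for illustration, not what this PR ships):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/paligemma-3b-pt-224")

text = ["Is the witholded tax rate equal to 31%?\n"]  # keep "\n" inside the prefix text
labels = ["Yes"]
inputs = tokenizer(text=text, text_pair=labels, return_token_type_ids=True)

# The newline is tokenized as part of `text`, so it receives token_type_id 0 and stays
# inside the fully-attended prefix block; only the label tokens receive token_type_id 1.
print(inputs["token_type_ids"])
```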
```python
            causal_mask = self.construct_causal_mask_with_block_attention(
                attention_mask, labels, text_mask, inputs_embeds
            )
```
Here, the `token_type_ids` can be passed from the forward, since `inputs` are unpacked and will contain them. Then you already have your mask to build the causal mask!
```python
        mask = labels == self.config.prefix_suffix_separator_index
        # Get the index of the first \n in each row
        indices = mask.int().argmax(dim=1, keepdim=True)
```
Assuming you're getting the `token_type_ids` here, you just have to do something like this:

```diff
-        mask = labels == self.config.prefix_suffix_separator_index
-        # Get the index of the first \n in each row
-        indices = mask.int().argmax(dim=1, keepdim=True)
+        indices = (token_type_ids == 1).int().argmax(dim=1)
```
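A hypothetical continuation of that suggestion (variable names are mine, not from the PR), showing how those indices could be turned into a per-row prefix mask for the block-attention construction:

```python
import torch

token_type_ids = torch.tensor([[0, 0, 0, 0, 0, 1, 1, 1],
                               [0, 0, 0, 1, 1, 1, 1, 1]])
indices = (token_type_ids == 1).int().argmax(dim=1)        # first suffix position per row
positions = torch.arange(token_type_ids.shape[1])
prefix_mask = positions[None, :] < indices[:, None]        # True for image/prompt positions
```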
Merged as part of #30967
What does this PR do?
Previously, the PaliGemma attention mask was a 4d attention mask of all 1s, which preempted any downstream causal mask creation by Gemma. PaliGemma has full attention over the image tokens and the prefix of the prompt, and causal attention over the suffix. This PR implements that logic, allowing for finetuning PaliGemma without label leakage.
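To make the described attention pattern concrete, here is an illustrative sketch (function name, shapes, and the boolean convention are assumptions, not the exact code added by this PR) of how a block-attention-plus-causal mask can be assembled from a prefix indicator and a padding mask:

```python
import torch

def build_prefix_lm_mask(prefix_mask: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """
    prefix_mask:    (batch, seq_len) bool, True for image tokens and the prompt prefix.
    attention_mask: (batch, seq_len) bool, True for non-padding positions.
    Returns a (batch, 1, seq_len, seq_len) bool mask where True means "may attend".
    """
    seq_len = prefix_mask.shape[1]
    # Standard causal (lower-triangular) mask: each position sees itself and the past.
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool, device=prefix_mask.device))
    # Additionally allow every query to see any key inside the prefix block, which gives
    # the image + prompt tokens full bidirectional attention among themselves.
    mask = causal.unsqueeze(0) | prefix_mask[:, None, :]
    # Never attend to padding keys.
    mask = mask & attention_mask[:, None, :]
    return mask[:, None, :, :]
```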
(Additionally, I noticed that using GemmaTokenizerFast led to incorrect, character-by-character tokenization of some tokens, whereas GemmaTokenizer tokenized them properly. Be sure to set `use_fast=False` when instantiating the PaliGemma processor. I was going to open an issue for this observation; edit: ah, looks like that is a known issue?)
Who can review?
@ArthurZucker @molbap @merveenoyan