Title: CUDA RuntimeError: Unspecified Launch Failure during Training #30913
Comments
Hi @Hongjie1Chu !
I too am facing a similar issue.
Update: I downgraded PEFT to 0.10.0 and Transformers to 4.39.0 and it is working fine now.
Thanks for your answer!
Has there been a solution to this yet? I tried the latest version of Transformers and it still gives this issue. I want to use some of the new quantization methods.
Hi !
System Info
transformers version: 4.41.0

Who can help?
@ArthurZucker @younesbelkada @muellerzr
Why does this error occur when passing a custom device_map? The map I wrote differs from the auto-generated map only in device order. Why does that cause an error? Does the device order affect the execution results?
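For context, a device_map is just a dict mapping module names to device indices. Below is a minimal sketch of building one by hand for a 32-layer Llama-style model spread across 8 GPUs; the module names follow the Llama layout, and the assignment is illustrative, not the exact map from this issue:

```python
# Sketch: spread 32 decoder layers of a Llama-style model across 8 GPUs.
# The embedding goes on the first device, the final norm and lm_head on
# the last. Device numbers here are illustrative.
def make_device_map(num_layers=32, num_gpus=8):
    device_map = {"model.embed_tokens": 0}
    for i in range(num_layers):
        # Assign layers in contiguous blocks: layers 0-3 -> GPU 0, 4-7 -> GPU 1, ...
        device_map[f"model.layers.{i}"] = i * num_gpus // num_layers
    device_map["model.norm"] = num_gpus - 1
    device_map["lm_head"] = num_gpus - 1
    return device_map

device_map = make_device_map()
# Then pass it to from_pretrained, e.g.:
# model = AutoModelForCausalLM.from_pretrained(model_name, device_map=device_map)
```

The auto-generated map ("device_map='auto'") produces a contiguous block layout like this one, so a hand-written map that only permutes the device order can place adjacent layers on distant GPUs.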
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)

Reproduction
import torch
from torch import nn
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, Trainer, LlamaForCausalLM
from transformers import DataCollatorForLanguageModeling, DataCollatorWithPadding
from transformers.utils.fx import symbolic_trace
import argparse
import numpy as np
from datasets import load_metric, load_dataset

def compute_metrics(eval_preds):
    metric = load_metric("glue", "mrpc")
    logits, labels = eval_preds
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)

def tokenize_function(example):
    return tokenizer(example["sentence1"], example["sentence2"], truncation=True)

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--gpus', type=int, help='the number of gpus', default=8)
    parser.add_argument('--modelName', type=str, help='the name of the model', default='Llama2')
    parser.add_argument('--bs', type=int, help='the batch size', default=4)
Expected behavior
I want to know if the device order in the device_map affects the results.
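One way to compare a hand-written map against the auto-generated one (which can be read back from `model.hf_device_map` after loading) is to check whether consecutive decoder layers land on non-decreasing device indices, so activations only flow forward between GPUs. This is a heuristic sketch, not a Transformers API; `check_layer_order` and the toy maps below are hypothetical:

```python
# Heuristic sketch: verify that consecutive decoder layers sit on
# non-decreasing device indices, so hidden states hop forward
# (GPU 0 -> 1 -> ...) instead of bouncing between devices.
def check_layer_order(device_map, num_layers):
    devices = [device_map[f"model.layers.{i}"] for i in range(num_layers)]
    return all(a <= b for a, b in zip(devices, devices[1:]))

# Toy 4-layer maps on 2 GPUs: a map that only permutes the device
# order can fail this check.
ordered = {f"model.layers.{i}": i // 2 for i in range(4)}             # devices 0,0,1,1
shuffled = {f"model.layers.{i}": (i // 2 + 1) % 2 for i in range(4)}  # devices 1,1,0,0
```

Whether a reordered map should raise a CUDA launch failure (rather than just run slower) is exactly the open question in this issue; the check above only makes the difference between the two maps visible.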