I am training a causal language model (Llama2) using the standard Trainer, which handles multiple GPUs on its own (no accelerate or torchrun). When I train on a single GPU with batch size 1, everything works fine. However, as soon as I use more than one GPU or more than one example per batch, I get the following error:
ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length. Perhaps your features (`labels` in this case) have excessive nesting (inputs type `list` where type `int` is expected).
It doesn't seem like this error should have anything to do with multi-GPU training or batch size, but apparently it does. Here is my preprocessing function:
def preprocess_func(batch, tokenizer, max_source_length=512, max_target_length=128):
    inputs = []
    labels = []
    articles = batch["article"]
    summaries = batch["highlights"]
    for article, summary in zip(articles, summaries):
        input_text = article + "\nSummary: "
        target_text = summary + tokenizer.eos_token
        input_ids = tokenizer.encode(input_text, max_length=max_source_length, truncation=True)
        target_ids = tokenizer.encode(target_text, max_length=max_target_length, truncation=True)
        # Combine inputs and targets
        input_ids_combined = input_ids + target_ids
        # Create labels (no prediction needed for the input tokens, so set to -100)
        labels_combined = [-100] * len(input_ids) + target_ids
        inputs.append(input_ids_combined)
        labels.append(labels_combined)
    return {
        'input_ids': inputs,
        'labels': labels
    }
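For completeness, the preprocessing is applied with `datasets.map` in batched mode, roughly along these lines (the exact call may differ slightly; `raw_dataset` is just a placeholder name for the loaded dataset with "article"/"highlights" columns):

from functools import partial

# Roughly how the tokenized dataset is produced; `raw_dataset` is a
# placeholder for the loaded DatasetDict.
tokenized_dataset = raw_dataset.map(
    partial(preprocess_func, tokenizer=tokenizer),
    batched=True,
    remove_columns=raw_dataset["train"].column_names,
)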
The data collator and trainer are set up as follows:
# Data collator
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False,
)

# Trainer
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"],
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)
The documentation for DataCollatorForLanguageModeling states:
"Inputs are dynamically padded to the maximum length of a batch if they are not all of the same length."
so unequal example lengths within a batch should not be an issue.
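For what it's worth, a custom collator along the lines of the sketch below (padding `input_ids` with the pad token and `labels` with -100 up to the batch maximum) would presumably sidestep the error, but I would still like to understand why the default collator fails here. This is only an illustrative sketch, and it assumes `tokenizer.pad_token` has been set (e.g. to the EOS token, since the Llama tokenizer has no pad token by default):

import torch

def collate_with_label_padding(features, tokenizer, label_pad_token_id=-100):
    # Pad input_ids with the tokenizer's pad token and labels with -100
    # so every example in the batch reaches the longest sequence length.
    max_len = max(len(f["input_ids"]) for f in features)
    input_ids, attention_mask, labels = [], [], []
    for f in features:
        pad_len = max_len - len(f["input_ids"])
        input_ids.append(f["input_ids"] + [tokenizer.pad_token_id] * pad_len)
        attention_mask.append([1] * len(f["input_ids"]) + [0] * pad_len)
        labels.append(f["labels"] + [label_pad_token_id] * pad_len)
    return {
        "input_ids": torch.tensor(input_ids, dtype=torch.long),
        "attention_mask": torch.tensor(attention_mask, dtype=torch.long),
        "labels": torch.tensor(labels, dtype=torch.long),
    }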
Thanks a lot!