LoRA fine-tuning with adam-mini fails with RuntimeError: shape '[-1, 655360]' is invalid for input of size 40960 #6397
Open
1 task done
Labels
pending
This problem is yet to be addressed
Reminder
System Info
Reproduction
CUDA_VISIBLE_DEVICES=1 python src/train.py \
--stage sft \
--do_train True \
--model_name_or_path /home/vvv/llm_model/Qwen1.5-32B-Chat-AWQ \
--finetuning_type lora \
--template qwen \
--dataset_dir /home/vvv/LLaMA-Factory-0.9.1/data \
--dataset train-explore \
--cutoff_len 8192 \
--learning_rate 5.0e-5 \
--num_train_epochs 3 \
--max_samples 100000 \
--per_device_train_batch_size 2 \
--gradient_accumulation_steps 4 \
--lr_scheduler_type cosine \
--max_grad_norm 1.0 \
--logging_steps 2 \
--save_steps 30 \
--output_dir /home/vvv/LLaMA-Factory-0.9.1/saves/train-test/phone-num-liger \
--quantization_bit 4 \
--quantization_type fp4 \
--lora_rank 8 \
--lora_alpha 8 \
--lora_dropout 0.1 \
--lora_target all \
--plot_loss True \
--overwrite_output_dir True \
--overwrite_cache True \
--seed 1 \
--enable_liger_kernel True \
--use_adam_mini True
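The two numbers in the error from the title can be reproduced by hand, which may help narrow down where the reshape fails. The sketch below is only one possible reading and is not confirmed anywhere in this report: it assumes the base model has a hidden size of 5120 and 40 attention heads (values not stated above), and that the optimizer reshapes query/key gradients into per-head blocks of hidden_size * hidden_size / num_heads elements. Under those assumptions, a rank-8 LoRA matrix is far too small to be viewed that way.

```python
# Assumed model dimensions (NOT stated in the report; assumed here for
# Qwen1.5-32B and should be checked against the model's config.json).
hidden_size = 5120
num_heads = 40

# Taken from the command above: --lora_rank 8
lora_rank = 8

# Size of one per-head block of a full hidden_size x hidden_size projection,
# i.e. the column count a per-head reshape would request (assumption about
# how the optimizer partitions query/key weights).
per_head_block = hidden_size * hidden_size // num_heads

# Number of elements in a LoRA matrix of shape [lora_rank, hidden_size].
lora_a_numel = lora_rank * hidden_size

# A view of shape [-1, per_head_block] is only valid if the element count
# is a multiple of per_head_block.
fits = lora_a_numel % per_head_block == 0

print(per_head_block)  # 655360
print(lora_a_numel)    # 40960
print(fits)            # False
```

If this reading is right, the mismatch comes from combining a per-head partitioning scheme with LoRA adapter weights rather than from the dataset or the other training flags.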
Expected behavior
Use the adam-mini optimizer to reduce training time.
Others
No response