
LoRA fine-tuning with adam-mini raises RuntimeError: shape '[-1, 655360]' is invalid for input of size 40960 #6397

Open
TC10127 opened this issue Dec 19, 2024 · 1 comment
Labels
pending This problem is yet to be addressed

Comments

@TC10127

TC10127 commented Dec 19, 2024

Reminder

  • I have read the README and searched the existing issues.

System Info

[screenshot of system info]

Reproduction

CUDA_VISIBLE_DEVICES=1 python src/train.py \
    --stage sft \
    --do_train True \
    --model_name_or_path /home/vvv/llm_model/Qwen1.5-32B-Chat-AWQ \
    --finetuning_type lora \
    --template qwen \
    --dataset_dir /home/vvv/LLaMA-Factory-0.9.1/data \
    --dataset train-explore \
    --cutoff_len 8192 \
    --learning_rate 5.0e-5 \
    --num_train_epochs 3 \
    --max_samples 100000 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --max_grad_norm 1.0 \
    --logging_steps 2 \
    --save_steps 30 \
    --output_dir /home/vvv/LLaMA-Factory-0.9.1/saves/train-test/phone-num-liger \
    --quantization_bit 4 \
    --quantization_type fp4 \
    --lora_rank 8 \
    --lora_alpha 8 \
    --lora_dropout 0.1 \
    --lora_target all \
    --plot_loss True \
    --overwrite_output_dir True \
    --overwrite_cache True \
    --seed 1 \
    --enable_liger_kernel True \
    --use_adam_mini True

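For reference, the RuntimeError in the title can be reproduced in isolation with plain PyTorch: it means a view()/reshape() call asked for rows of 655360 elements on a tensor that only holds 40960 elements. A minimal sketch (the tensor below is purely illustrative and not the project's actual code path):

import torch

# Stand-in tensor with 40960 elements; in the real run this would be some
# parameter or optimizer-state buffer touched during training.
x = torch.zeros(40960)

try:
    # 40960 is not a multiple of 655360, so the view cannot be formed.
    x.view(-1, 655360)
except RuntimeError as e:
    print(e)  # shape '[-1, 655360]' is invalid for input of size 40960
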
Expected behavior

Use the adam-mini optimizer to reduce training time.

Others

No response

github-actions bot added the pending (This problem is yet to be addressed) label Dec 19, 2024
@TC10127
Author

TC10127 commented Dec 19, 2024

[screenshot]
