RuntimeError: The size of tensor a (151936) must match the size of tensor b (152064) at non-singleton dimension 1 #286
Comments
It seems that qwen14b and qwen1.5b use different vocabulary sizes, and KD methods generally require the teacher and student models to share the same vocabulary. However, the vocabulary difference between the different-sized Qwen models is just padding. Therefore, either cutting the larger vocabulary down to the smaller one or padding the smaller one up to the larger should work.
@t1101675 How do I pad the smaller vocabulary to the larger one?
@t1101675 Do I need to modify the size, re-train with SFT, and then perform distillation?
Yes. That said, I think the SFT re-training does not need to be extensive just to adapt the model to a few padding tokens.
@t1101675 How do I pad the smaller vocabulary to the larger one? I changed config.json, but it had no effect. Also, can't the code support automatic padding?
You probably need to resize the embeddings of the student model after changing config.json. |
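For reference, a minimal sketch of what that resize could look like with Hugging Face Transformers. The model path and output directory are placeholders, and the target size of 152064 is taken from the error message in this thread; adjust both to your own checkpoints:

```python
# Sketch: pad the student's vocabulary (151936) up to the teacher's (152064).
# "Qwen/Qwen1.5-1.8B" and "qwen-student-padded" are placeholder paths.
from transformers import AutoModelForCausalLM, AutoTokenizer

student = AutoModelForCausalLM.from_pretrained("Qwen/Qwen1.5-1.8B")
teacher_vocab_size = 152064  # vocab size of the 14B teacher, per the error message

# Editing config.json alone does not change any weights: the embedding and lm_head
# matrices must actually be resized. The extra rows are newly initialized padding
# slots that the model never needs to predict.
student.resize_token_embeddings(teacher_vocab_size)
student.save_pretrained("qwen-student-padded")

# Keep the tokenizer alongside the resized model so later scripts load a matching pair.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-1.8B")
tokenizer.save_pretrained("qwen-student-padded")
```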
When I try to train MiniLLM from qwen14b to qwen1.5b, I encounter the following problem:
```
[rank0]: Traceback (most recent call last):
[rank0]:   File "/miniLLM/LMOps-main/minillm/train_minillm.py", line 103, in <module>
[rank0]:     main()
[rank0]:   File "/miniLLM/LMOps-main/minillm/train_minillm.py", line 89, in main
[rank0]:     train(
[rank0]:   File "/miniLLM/LMOps-main/minillm/minillm/__init__.py", line 37, in train
[rank0]:     sampler.run_sample(args.num_rollouts_per_device)
[rank0]:   File "/miniLLM/LMOps-main/minillm/minillm/sampler.py", line 70, in run_sample
[rank0]:     gen_out = self.trainer.generate(**batch, return_dict_in_generate=True, mode=mode, teacher_mixed_sample=(self.args.teacher_mixed_alpha is not None), output_scores=True)
[rank0]:   File "/miniLLM/LMOps-main/minillm/minillm/trainer.py", line 618, in generate
[rank0]:     gen = model.generate(
[rank0]:   File "/miniLLM/LMOps-main/minillm/minillm/model.py", line 21, in generate
[rank0]:     return self.base_model.generate(**x)
[rank0]:   File "/usr/local/lib/python3.9/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/miniLLM/LMOps-main/minillm/transformers/src/transformers/generation/utils.py", line 2229, in generate
[rank0]:     result = self._sample(
[rank0]:   File "/miniLLM/LMOps-main/minillm/transformers/src/transformers/generation/utils.py", line 3331, in _sample
[rank0]:     probs = (1 - mix_in_alpha) * probs + mix_in_alpha * m_probs
[rank0]: RuntimeError: The size of tensor a (151936) must match the size of tensor b (152064) at non-singleton dimension 1
E1212 22:55:56.524920 139651309172544 torch/distributed/elastic/multiprocessing/api.py:826] failed (exitcode: 1) local_rank: 0 (pid: 739442) of binary: /usr/bin/python3
Traceback (most recent call last):
  File "/usr/local/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.9/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/torch/distributed/run.py", line 879, in main
    run(args)
  File "/usr/local/lib/python3.9/dist-packages/torch/distributed/run.py", line 870, in run
    elastic_launch(
  File "/usr/local/lib/python3.9/dist-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.9/dist-packages/torch/distributed/launcher/api.py", line 263, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
/miniLLM/LMOps-main/minillm/train_minillm.py FAILED
```
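The mixing step in `_sample` (`probs = (1 - mix_in_alpha) * probs + mix_in_alpha * m_probs`) requires the student and teacher distributions to have identical vocabulary dimensions, which is why the 151936 vs. 152064 mismatch surfaces here. A small sanity check before launching training can catch this early; the checkpoint paths below are placeholders for your local teacher and (padded) student:

```python
# Sketch: verify teacher and student vocab sizes agree before running train_minillm.py.
from transformers import AutoConfig

teacher_cfg = AutoConfig.from_pretrained("/path/to/teacher")   # placeholder path
student_cfg = AutoConfig.from_pretrained("/path/to/student")   # placeholder path

assert teacher_cfg.vocab_size == student_cfg.vocab_size, (
    f"vocab mismatch: teacher={teacher_cfg.vocab_size}, student={student_cfg.vocab_size}; "
    "pad the smaller model's embeddings (and lm_head) first"
)
```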