After training, when saving the model, I got the following error. It seems that the relative path is not constructed correctly.
100%███████████████████████████████████████████| 1/1 [00:05<00:00, 5.78s/it]
[INFO|trainer.py:3801] 2024-12-19 20:23:29,329 >> Saving model checkpoint to ../../models/wildfeedback-december/phi-wildfeedback-gpt4o-sft/checkpoint-1
Traceback (most recent call last):
  File "/app/src/llamafactory/launcher.py", line 23, in <module>
    launch()
  File "/app/src/llamafactory/launcher.py", line 19, in launch
    run_exp()
  File "/app/src/llamafactory/train/tuner.py", line 50, in run_exp
    run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/app/src/llamafactory/train/sft/workflow.py", line 101, in run_sft
    train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/transformers/trainer.py", line 2122, in train
    return inner_training_loop(
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/transformers/trainer.py", line 2541, in _inner_training_loop
    self._maybe_log_save_evaluate(tr_loss, grad_norm, model, trial, epoch, ignore_keys_for_eval)
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/transformers/trainer.py", line 3000, in _maybe_log_save_evaluate
    self._save_checkpoint(model, trial, metrics=metrics)
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/transformers/trainer.py", line 3090, in _save_checkpoint
    self.save_model(output_dir, _internal_call=True)
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/transformers/trainer.py", line 3706, in save_model
    self._save(output_dir, state_dict=state_dict)
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/transformers/trainer.py", line 3823, in _save
    self.model.save_pretrained(
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2809, in save_pretrained
    custom_object_save(self, save_directory, config=self.config)
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 623, in custom_object_save
    for needed_file in get_relative_import_files(object_file):
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 128, in get_relative_import_files
    new_imports.extend(get_relative_imports(f))
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 97, in get_relative_imports
    with open(module_file, "r", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/opt/conda/envs/ptca/lib/python3.10/site-packages/transformers/models/phi3/..generation.py'
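To illustrate how a path like the one in the error could arise, here is a minimal sketch. It is not the transformers implementation: `naive_resolve` and `dot_aware_resolve` are hypothetical helpers, and the `/pkg/...` directory layout is made up. The assumption is that a relative module name such as `..generation` gets appended verbatim to the importing file's directory instead of having its leading dots interpreted as parent-package levels.

```python
from pathlib import PurePosixPath


def naive_resolve(module_dir: str, relative_module: str) -> str:
    """Hypothetical: append the relative module name verbatim, dots included."""
    return str(PurePosixPath(module_dir) / f"{relative_module}.py")


def dot_aware_resolve(module_dir: str, relative_module: str) -> str:
    """Hypothetical: treat leading dots the way Python's relative imports do.

    One dot is the current package (module_dir); each extra dot goes up one level.
    """
    stripped = relative_module.lstrip(".")
    level = len(relative_module) - len(stripped)
    base = PurePosixPath(module_dir)
    for _ in range(max(level - 1, 0)):
        base = base.parent
    # Nested modules such as "a.b" map to "a/b.py".
    return str(base / (stripped.replace(".", "/") + ".py"))


# Made-up layout, used only for illustration.
print(naive_resolve("/pkg/models/phi3", "..generation"))
print(dot_aware_resolve("/pkg/models/phi3", "..generation"))
```

The first call yields a file name that literally starts with two dots inside the `phi3` directory, which has the same shape as the path in the `FileNotFoundError` above. The second call walks up one directory before appending the module name.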
Expected behavior
No response
Others
I used the same YAML config to train Llama and Qwen models and received no error. I can also save the model with the following code, which supposedly goes through the same save_pretrained method.
from transformers import AutoModel, AutoTokenizer

# Path to your model
model_path = "../../models/Phi-3-mini-4k-instruct"  # Update with your path
output_path = "../../models/wildfeedback-december/phi-wildfeedback-gpt4o-sft"  # Directory to save the model

try:
    # Load the model and tokenizer
    model = AutoModel.from_pretrained(model_path)
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    print("Model and tokenizer loaded successfully.")

    # Save the model and tokenizer
    model.save_pretrained(output_path)
    tokenizer.save_pretrained(output_path)
    print(f"Model and tokenizer saved successfully to {output_path}.")
except Exception as e:
    print(f"An error occurred: {e}")
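One way to check whether the two runs really take the same save path is to compare which class the model object is and whether it is flagged as a custom class. The sketch below is a diagnostic only. `describe_model_class` is a hypothetical helper, and it assumes that `save_pretrained` reaches `custom_object_save` (the call seen in the traceback) only when the model's internal `_auto_class` attribute is set. That attribute is internal to transformers and may change between versions.

```python
def describe_model_class(model) -> dict:
    """Hypothetical helper: report where a model's class comes from.

    `_auto_class` is read defensively because it is an internal attribute
    (assumption: a non-None value triggers the custom-code save path).
    """
    cls = type(model)
    return {
        "class": cls.__name__,
        "module": cls.__module__,
        "auto_class": getattr(model, "_auto_class", None),
    }


# Usage (assumes `model` was loaded as in the script above):
# print(describe_model_class(model))
```

If the model inside the training run reports a module or `auto_class` value that differs from the standalone script, the two runs are not saving the same kind of object, which would explain why only one of them fails.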
Could anyone please help? Thank you!
Reminder
System Info
llamafactory version: 0.9.2.dev0
Reproduction
command:
phi3.yaml: