Issue with Trace Option Causing TypeError in mLoRA Training #268
Comments
It seems PyTorch changed the metadata function. Can you check the type of
Thanks for the reply. The type of grad_fn.metadata is <class 'dict'>, and it is empty at this point.
Can we just change grad_fn.metadata() to grad_fn.metadata? I don't know when PyTorch changed this function.
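For anyone hitting the same TypeError, here is a minimal sketch of an accessor that tolerates both forms discussed above: `metadata` exposed as a plain dict attribute, or as a callable that returns one. The helper name `get_grad_fn_metadata` and the two stand-in node classes are hypothetical and exist only for illustration; they are not part of mLoRA or PyTorch, and the real fix in this thread was simply dropping the parentheses.

```python
from typing import Any, Dict


def get_grad_fn_metadata(grad_fn: Any) -> Dict[Any, Any]:
    """Return the metadata dict of an autograd node-like object.

    Works whether `metadata` is a dict attribute/property or a
    zero-argument callable returning a dict (assumed older style).
    """
    meta = getattr(grad_fn, "metadata", None)
    if callable(meta):
        # Assumed older style: metadata is invoked as a function.
        meta = meta()
    # Fall back to an empty dict if the attribute is missing or None.
    return meta if meta is not None else {}


# Hypothetical stand-ins for autograd nodes, used only to exercise the helper.
class AttributeStyleNode:
    def __init__(self) -> None:
        self.metadata = {"name": "attr"}


class CallableStyleNode:
    def metadata(self) -> Dict[str, str]:
        return {"name": "call"}


if __name__ == "__main__":
    print(get_grad_fn_metadata(AttributeStyleNode()))
    print(get_grad_fn_metadata(CallableStyleNode()))
```

Calling `grad_fn.metadata()` on a dict raises a TypeError because a dict is not callable, which matches the symptom reported here; the `callable` check avoids that without pinning a specific PyTorch version.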
Thank you! It works well. However, I’m unable to open test_report.nsys-rep with NVIDIA Nsight Compute. Could you provide a compilable version as mentioned in the pull request?
Thanks!
You can use any version, but you must make sure your NVIDIA Nsight Compute version is higher than the nsys CLI version.
I encountered an error when using the --trace option. The error message indicates the following:
I executed the command:
nsys profile -w true -t cuda,nvtx -s none -o test_report -f true -x true python mlora_train.py --base_model TinyLlama/TinyLlama-1.1B-Chat-v0.4 --device "cuda:0" --config /projects/bcrn/mLoRA/demo/lora/lora_case_1.yaml --trace
I also tried simply adding --trace to the normal commands.
Could you please help me understand why this error is occurring? And could you help me with using trace? Thanks!