How can DeepSpeed be configured to prevent the merging of parameter groups? #6878

The optimizer has been re-implemented to group parameters and set different learning rates for each group. However, after using DeepSpeed, all the param_groups are merged into one. How can this be prevented?
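For context, the setup being described is roughly the following (a hypothetical sketch; the model, learning rates, and config values are illustrative assumptions, not the original code):

```python
import torch
import deepspeed

# Hypothetical two-layer model standing in for the real one.
model = torch.nn.Sequential(
    torch.nn.Linear(32, 64),
    torch.nn.Linear(64, 10),
)

# Separate param groups with different learning rates (illustrative values).
optimizer = torch.optim.AdamW([
    {"params": model[0].parameters(), "lr": 1e-3},
    {"params": model[1].parameters(), "lr": 1e-4},
])

# Hand the pre-grouped optimizer to DeepSpeed; ZeRO stage 3 as in the issue.
engine, ds_optimizer, _, _ = deepspeed.initialize(
    model=model,
    optimizer=optimizer,
    config={"train_batch_size": 8, "zero_optimization": {"stage": 3}},
)
```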
Comments
@CLL112, DeepSpeed already supports this request. For example, we don't merge weights and biases, which are typically implemented as different param groups. It would be helpful to have a full repro for us to understand what is going on in your scenario.
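For reference, the weight/bias split mentioned above is usually built along these lines (a generic PyTorch sketch, not DeepSpeed-specific code; the model and decay values are illustrative):

```python
import torch

model = torch.nn.Linear(32, 10)

# Split parameters into two groups: weights with decay, biases without.
decay, no_decay = [], []
for name, p in model.named_parameters():
    (no_decay if name.endswith("bias") else decay).append(p)

optimizer = torch.optim.AdamW([
    {"params": decay, "weight_decay": 0.01},
    {"params": no_decay, "weight_decay": 0.0},
])
```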
I have rewritten the optimizer and separately set the learning rate for each parameter group. I printed the relevant parameters before initializing DeepSpeed, and the groups appeared as expected. However, when I print optimizer.param_groups after initializing DeepSpeed, the output only contains Param group 0. This is strange; there are no Param group 1 and 2. I am using DeepSpeed's ZeRO-3. Does this change the param groups?
@CLL112, can you please share a simple but full repro code to debug?
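A minimal repro along those lines might look like the sketch below. The model, learning rates, config values, and print format are assumptions for illustration; the point is to compare param_groups on the original optimizer with those on the optimizer DeepSpeed returns.

```python
import torch
import deepspeed

model = torch.nn.Sequential(
    torch.nn.Linear(32, 64),
    torch.nn.Linear(64, 10),
)
optimizer = torch.optim.AdamW([
    {"params": model[0].parameters(), "lr": 1e-3},
    {"params": model[1].parameters(), "lr": 1e-4},
])

def show_groups(opt, label):
    # Print the learning rate and parameter count of every group.
    for i, g in enumerate(opt.param_groups):
        print(f"{label} Param group {i}: lr={g['lr']} n_params={len(g['params'])}")

show_groups(optimizer, "before init:")

engine, ds_optimizer, _, _ = deepspeed.initialize(
    model=model,
    optimizer=optimizer,
    config={"train_batch_size": 8, "zero_optimization": {"stage": 3}},
)

# Inspect the optimizer DeepSpeed returns, not the original object:
# under ZeRO-3 it wraps the client optimizer and may restructure groups.
show_groups(ds_optimizer, "after init:")
```

Launching this through the DeepSpeed runner (e.g. `deepspeed repro.py`, where `repro.py` is a placeholder filename) ensures the distributed setup that deepspeed.initialize expects.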