
Exception: Current loss scale already at minimum - cannot decrease scale anymore #280

Open
Z-eloto opened this issue Nov 12, 2024 · 6 comments

Comments


Z-eloto commented Nov 12, 2024

Thank you for sharing your code.
When running gpt2/kd/kd_medium.sh on 2×3090 GPUs, the program encountered this error. What should I do? For example, should I adjust the learning rate?

@shiboyu1999

You can use fp32 to train the model or decrease the batch size to 1.
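In DeepSpeed, "use fp32" typically just means disabling the fp16 block in the JSON config. A minimal sketch under that assumption — the repo's actual config file may contain more options:

```json
{
  "train_micro_batch_size_per_gpu": 1,
  "zero_optimization": { "stage": 1 },
  "fp16": { "enabled": false }
}
```

With `fp16.enabled` set to false, training runs in fp32 and no dynamic loss scaling is involved, at the cost of roughly double the memory per tensor — which is also why the batch size may need to drop to 1 on a 24 GB 3090.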


Z-eloto commented Nov 19, 2024

> You can use fp32 to train the model or decrease the batch size to 1.

Thanks. I will try it. :)

@t1101675 (Contributor)

You can also try using bfloat16 by replacing ds_config_zero1_fp16.json in this line with ds_config_zero1_bf16.json.
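For reference, the essential difference between the two configs is the precision block. A hedged sketch of what ds_config_zero1_bf16.json likely contains — the actual file in the repo may differ:

```json
{
  "zero_optimization": { "stage": 1 },
  "bf16": { "enabled": true }
}
```

Because bf16 has the same 8-bit exponent as fp32, DeepSpeed does not use dynamic loss scaling with it, so the "loss scale at minimum" failure mode disappears entirely.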


Z-eloto commented Dec 4, 2024

> You can also try using bfloat16 by replacing ds_config_zero1_fp16.json in this line with ds_config_zero1_bf16.json.

OK, I have already resolved this problem. Thanks :)
But I also want to know: will this modification affect the experimental results?


t1101675 commented Dec 4, 2024

This will not affect the results much. In fact, bf16 will be more stable in training than fp16 and will not suffer from the "Current loss scale already at minimum" problem.
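The underlying issue: fp16's dynamic loss scaler halves the scale whenever gradients overflow, and when overflows keep recurring the scale is driven down to its minimum, producing this exception. fp16's narrow 5-bit exponent is the root cause. A small NumPy sketch of the range problem (NumPy has no native bf16, so fp32 — which shares bf16's 8-bit exponent — stands in for the range comparison):

```python
import numpy as np

# fp16 has a narrow exponent range: the largest finite value is 65504
# and the smallest subnormal is ~6e-8. Values outside this range
# overflow to inf or flush to zero.
loss = np.float16(1e5)        # 1e5 > 65504 -> overflows to inf
tiny_grad = np.float16(1e-8)  # 1e-8 < ~6e-8 -> flushes to 0

print(np.isinf(loss))    # True: an overflow like this makes the scaler cut the loss scale
print(tiny_grad == 0)    # True: the gradient information is silently lost

# fp32 (and bf16, which shares its 8-bit exponent) represents both values:
print(np.isinf(np.float32(1e5)))  # False
print(np.float32(1e-8) == 0)      # False
```

This is why bf16 trades precision (fewer mantissa bits) for range, and why it can train stably without any loss scaling at all.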


Z-eloto commented Dec 4, 2024

> This will not affect the results much. In fact, bf16 will be more stable in training than fp16 and will not suffer from the "Current loss scale already at minimum" problem.

I see. Many thanks!
