train error #42

Open
DUhaixia opened this issue May 12, 2022 · 1 comment

Comments

@DUhaixia

When training with a single GPU, running train.py raises the following error:

```
Traceback (most recent call last):
  File "D:/rhnet-daima/res-loglikelihood-regression-master/res-loglikelihood-regression-master/scripts/train.py", line 172, in <module>
    main()
  File "D:/rhnet-daima/res-loglikelihood-regression-master/res-loglikelihood-regression-master/scripts/train.py", line 45, in main
    mp.spawn(main_worker, nprocs=ngpus_per_node, args=(opt, cfg))
  File "D:\anaconda3\envs\PyTorch_YOLOv4-master\lib\site-packages\torch\multiprocessing\spawn.py", line 199, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "D:\anaconda3\envs\PyTorch_YOLOv4-master\lib\site-packages\torch\multiprocessing\spawn.py", line 157, in start_processes
    while not context.join():
  File "D:\anaconda3\envs\PyTorch_YOLOv4-master\lib\site-packages\torch\multiprocessing\spawn.py", line 118, in join
    raise Exception(msg)
Exception:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "D:\anaconda3\envs\PyTorch_YOLOv4-master\lib\site-packages\torch\multiprocessing\spawn.py", line 19, in _wrap
    fn(i, *args)
  File "D:\rhnet-daima\res-loglikelihood-regression-master\res-loglikelihood-regression-master\scripts\train.py", line 55, in main_worker
    init_dist(opt)
  File "D:\rhnet-daima\res-loglikelihood-regression-master\res-loglikelihood-regression-master\rlepose\utils\env.py", line 24, in init_dist
    world_size=opt.world_size, rank=opt.rank)
  File "D:\anaconda3\envs\PyTorch_YOLOv4-master\lib\site-packages\torch\distributed\distributed_c10d.py", line 434, in init_process_group
    init_method, rank, world_size, timeout=timeout
  File "D:\anaconda3\envs\PyTorch_YOLOv4-master\lib\site-packages\torch\distributed\rendezvous.py", line 82, in rendezvous
    raise RuntimeError("No rendezvous handler for {}://".format(result.scheme))
RuntimeError: No rendezvous handler for tcp://
```

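For context, the failure happens inside `torch.distributed.init_process_group`, which `init_dist` reaches with a `tcp://` init method; on the Windows PyTorch build shown in the traceback, the `tcp://` rendezvous scheme is apparently not registered, so the lookup in `rendezvous.py` raises. Below is a minimal sketch of the call shape only, not the repo's actual code; the backend, address, and port are illustrative assumptions.

```python
# Minimal sketch of the failing call, not taken from the repository.
# On builds where the "tcp" rendezvous handler is unregistered (as in the
# traceback above) this raises "RuntimeError: No rendezvous handler for tcp://";
# on builds with TCP support it simply initializes a one-process group.
import torch.distributed as dist

dist.init_process_group(
    backend="gloo",                       # assumption: gloo (NCCL is Linux-only)
    init_method="tcp://127.0.0.1:23456",  # hypothetical address/port
    world_size=1,
    rank=0,
)
dist.destroy_process_group()
```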
@Jeff-sjtu
Owner

Hi @DUhaixia, you should change WORLD_SIZE in the config file to 1 when you use one GPU for training.
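
A hedged sketch of verifying that setting before launching train.py; the config file name and the TRAIN → WORLD_SIZE key path are assumptions and may differ from the repo's actual config layout.

```python
# Hedged sketch: confirm WORLD_SIZE is 1 before launching single-GPU training.
# Both the file path and the key path are illustrative assumptions.
import yaml

with open("configs/256x192_res50_regress-flow.yaml") as f:  # hypothetical config path
    cfg = yaml.safe_load(f)

world_size = cfg.get("TRAIN", {}).get("WORLD_SIZE")
print("WORLD_SIZE =", world_size)
assert world_size == 1, "set WORLD_SIZE to 1 for one-GPU training"
```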
