Output
Server Output:
Sending 10 tagged messages
Waiting for messages to complete
munmap_chunk(): invalid pointer
Aborted (core dumped)
Server Backtrace:
(gdb) bt
#0 0x00007ffff6496aff in raise () from /lib64/libc.so.6
#1 0x00007ffff6469ea5 in abort () from /lib64/libc.so.6
#2 0x00007ffff64d9097 in __libc_message () from /lib64/libc.so.6
#3 0x00007ffff64e04ec in malloc_printerr () from /lib64/libc.so.6
#4 0x00007ffff64e079c in munmap_chunk () from /lib64/libc.so.6
#5 0x00007ffff7a88e0f in psm3_free_internal (ptr=0x735a80, curloc=0x7ffff7b12953 "prov/psm3/psm3/psm_ep.c:1163")
at prov/psm3/psm3/psm_utils.c:3964
#6 0x00007ffff7a63d41 in psm3_ep_close (ep=0x636ac0, mode=0, timeout_in=2000000000) at prov/psm3/psm3/psm_ep.c:1163
#7 0x00007ffff7a29b31 in psmx3_trx_ctxt_free (trx_ctxt=0x62b3a0, usage_flags=3) at prov/psm3/src/psmx3_trx_ctxt.c:223
#8 0x00007ffff7a11cea in psmx3_ep_close (fid=0x7349b0) at prov/psm3/src/psmx3_ep.c:234
#9 0x0000000000403fb1 in fi_close (fid=)
at /path_to_libfabric_install/include/rdma/fabric.h:632
#10 ft_close_fids () at common/shared.c:1792
#11 0x0000000000404a9a in ft_free_res () at common/shared.c:1862
#12 0x0000000000401b2a in main (argc=, argv=) at functional/rdm_tagged_peek.c:364
Client Output:
Peek for a bad msg
Peek w/ claim for a bad msg
Peek msg 1
Receive msg 1
Peek w/ claim msg 2
Receive claimed msg 2
Peek & discard msg 3
Checking to see if msg 3 was discarded
Peek w/ claim msg 4
Claim and discard msg 4
Receive msg 5
Receive msg 6
Receive msg 10
Receive msg 9
Receive msg 8
Receive msg 7
Environment:
rocky 8.7 mlnx 5.0
Additional context
Setting and unsetting FI_PROVIDER fixes this bug
Specific free() call that fails is freeing the hfi_nids struct in file psm_ep.c:1163
@zachdworkin Is this still reproducible? Can you provide any details on the system hardware and configuration?
I've been unable to reproduce this on our PSM test systems so far (tried RHEL 8.10 w/single MT28000 in eth mode on the commit just prior to the test disable commit).
Based on the stack trace above, this is hitting a libc malloc guard on free(). libc thinks it is freeing a memory-mapped pointer, which should not be the case here. This suggests that the private malloc header may have been overwritten, e.g. by a buffer underflow.
fi_rdm_tagged_peek fails to clean up on the server side with "munmap_chunk(): invalid pointer" if FI_PROVIDER="psm3" is set.
To Reproduce
server_cmd: FI_PROVIDER=psm3 fi_rdm_tagged_peek -p psm3 -E
client_cmd: FI_PROVIDER=psm3 fi_rdm_tagged_peek -p psm3 -E "server_address"
Expected behavior
Test passes successfully