Issues: Dao-AILab/flash-attention
Is flash_attn_with_kvcache() supposed to work for seqlen > 1? (#1402, opened Dec 20, 2024 by vince62s; a hedged usage sketch follows this list)
Understanding the Role of arrive in NamedBarrier Synchronization (#1400, opened Dec 19, 2024 by ziyuhuang123)
Why Doesn't FlashAttention3 Allow KV and O to Share Memory Space? (#1396, opened Dec 18, 2024 by ziyuhuang123)
When handling padding in seq_k, clear the g2s K tensor rather than keeping the default SMEM values (#1395, opened Dec 18, 2024 by NVIDIA-JerryChen)
Why does NamedBarrier in the epilogue use NumMmaThreads(256) + NumThreadsPerWarp(32)? (#1389, opened Dec 16, 2024 by ziyuhuang123)
[ROCm] benchmark_flash_attention.py failing with Memory Access Fault (#1381, opened Dec 11, 2024 by nikhil-tensorwave)
Can wgmma.async and barrier.arrive Ensure GEMM Completion Before Moving Forward? (#1373, opened Dec 6, 2024 by ziyuhuang123)
Sliding Window (Local Attention) possibly incorrect on the newest branch (#1366, opened Dec 3, 2024 by kilianhaefeli)
Is there any way to compile the code with the nvcc debug flag (-G)? (#1364, opened Dec 2, 2024 by Dev-Jahn)
Triton issues with the rotary helper flash_attn.layers.rotary.apply_rotary_emb_qkv_ (#1362, opened Nov 29, 2024 by albertotono)
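
For context on #1402, the sketch below shows one way flash_attn_with_kvcache is typically called with a query block longer than one token (the seqlen > 1 case the title asks about). The batch size, head counts, tensor shapes, and fill level of the cache are illustrative assumptions, not details taken from the issue itself.

```python
# Minimal sketch (assumed shapes/dtypes): attention over a pre-allocated KV cache
# with a multi-token query block. Requires a CUDA GPU and the flash-attn package.
import torch
from flash_attn import flash_attn_with_kvcache

batch, nheads, headdim = 2, 8, 64
cache_len, new_len = 128, 4  # new_len > 1 is the seqlen > 1 case asked about in #1402

q = torch.randn(batch, new_len, nheads, headdim, device="cuda", dtype=torch.float16)
k_new = torch.randn(batch, new_len, nheads, headdim, device="cuda", dtype=torch.float16)
v_new = torch.randn(batch, new_len, nheads, headdim, device="cuda", dtype=torch.float16)

# Pre-allocated cache; the first cache_seqlens positions are assumed to already be filled.
k_cache = torch.zeros(batch, cache_len, nheads, headdim, device="cuda", dtype=torch.float16)
v_cache = torch.zeros(batch, cache_len, nheads, headdim, device="cuda", dtype=torch.float16)
cache_seqlens = torch.full((batch,), 32, dtype=torch.int32, device="cuda")

# flash_attn_with_kvcache writes k_new/v_new into the cache at the cache_seqlens offsets
# and returns the attention output with shape (batch, new_len, nheads, headdim).
out = flash_attn_with_kvcache(
    q, k_cache, v_cache, k=k_new, v=v_new,
    cache_seqlens=cache_seqlens, causal=True,
)
print(out.shape)
```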