Issues: Dao-AILab/flash-attention
Is flash_attn_with_kvcache() supposed to work for seqlen > 1? (#1402, opened Dec 20, 2024 by vince62s; a hedged usage sketch follows this list)
Understanding the Role of arrive in NamedBarrier Synchronization (#1400, opened Dec 19, 2024 by ziyuhuang123)
Why Doesn't FlashAttention3 Allow KV and O to Share Memory Space? (#1396, opened Dec 18, 2024 by ziyuhuang123)
When handling padding in seq_k, clear the g2s K tensor rather than keeping the default SMEM values (#1395, opened Dec 18, 2024 by NVIDIA-JerryChen)
Why does NamedBarrier in the epilogue use NumMmaThreads(256) + NumThreadsPerWarp(32)? (#1389, opened Dec 16, 2024 by ziyuhuang123)
[ROCm] benchmark_flash_attention.py failing with Memory Access Fault (#1381, opened Dec 11, 2024 by nikhil-tensorwave)
Can wgmma.async and barrier.arrive Ensure GEMM Completion Before Moving Forward? (#1373, opened Dec 6, 2024 by ziyuhuang123)
Sliding Window (Local Attention) possibly incorrect on the newest branch (#1366, opened Dec 3, 2024 by kilianhaefeli)
Is there any way to compile the code with the nvcc debug flag (-G)? (#1364, opened Dec 2, 2024 by Dev-Jahn)
Triton issues with the rotary helper flash_attn.layers.rotary.apply_rotary_emb_qkv_ (#1362, opened Nov 29, 2024 by albertotono)
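
For context on #1402, the sketch below shows one way flash_attn_with_kvcache is typically called with a query block longer than one token (the seqlen > 1 case the title asks about). The batch size, head counts, tensor shapes, and fill level of the cache are illustrative assumptions, not details taken from the issue itself.

```python
# Minimal sketch (assumed shapes/dtypes): attention over a pre-allocated KV cache
# with a multi-token query block. Requires a CUDA GPU and the flash-attn package.
import torch
from flash_attn import flash_attn_with_kvcache

batch, nheads, headdim = 2, 8, 64
cache_len, new_len = 128, 4  # new_len > 1 is the seqlen > 1 case asked about in #1402

q = torch.randn(batch, new_len, nheads, headdim, device="cuda", dtype=torch.float16)
k_new = torch.randn(batch, new_len, nheads, headdim, device="cuda", dtype=torch.float16)
v_new = torch.randn(batch, new_len, nheads, headdim, device="cuda", dtype=torch.float16)

# Pre-allocated cache; the first cache_seqlens positions are assumed to already be filled.
k_cache = torch.zeros(batch, cache_len, nheads, headdim, device="cuda", dtype=torch.float16)
v_cache = torch.zeros(batch, cache_len, nheads, headdim, device="cuda", dtype=torch.float16)
cache_seqlens = torch.full((batch,), 32, dtype=torch.int32, device="cuda")

# flash_attn_with_kvcache writes k_new/v_new into the cache at the cache_seqlens offsets
# and returns the attention output with shape (batch, new_len, nheads, headdim).
out = flash_attn_with_kvcache(
    q, k_cache, v_cache, k=k_new, v=v_new,
    cache_seqlens=cache_seqlens, causal=True,
)
print(out.shape)
```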