
Add int8 to gemm w/ addmatrix and consider onednn provider #3040

Draft
alexbaden wants to merge 4 commits into main

Conversation

alexbaden
Contributor

Update the gemm addmatrix benchmark to support int8 inputs as well as bfloat16.

The int8 benchmark is pretty slow - not because Triton performance is bad (it is at least on par with bfloat16), but because PyTorch does not support int8 matmul on GPU, so we have to do the reference matmul on the CPU. That makes the benchmark something like 20x slower. To fix that, I changed the PyTorch accuracy check to run for only a few shapes instead of all of them - I tried to pick shapes that I thought were representative of the different cases, but am open to suggestions. Now the benchmark runs in a reasonable time.
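A minimal sketch of the idea described above, assuming a helper like the following (the names `CHECK_SHAPES`, `reference_int8_matmul`, and `maybe_check` are illustrative, not from the PR): the int8 reference result is computed on the CPU with int32 accumulation, and the slow check is skipped for all but a few representative shapes.

```python
import numpy as np

# Illustrative subset of shapes for which the (slow) CPU accuracy check runs.
CHECK_SHAPES = {(256, 256, 256), (1024, 1024, 1024), (4096, 128, 4096)}

def reference_int8_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """CPU reference: int8 x int8 -> int32, widening before the matmul
    so products and sums do not overflow the int8 range."""
    assert a.dtype == np.int8 and b.dtype == np.int8
    return a.astype(np.int32) @ b.astype(np.int32)

def maybe_check(m: int, n: int, k: int, a, b, triton_out) -> None:
    # Skip the CPU reference for shapes outside the sampled set, so the
    # benchmark as a whole stays fast.
    if (m, n, k) not in CHECK_SHAPES:
        return
    np.testing.assert_array_equal(triton_out, reference_int8_matmul(a, b))
```

The widening to int32 before the matmul mirrors how int8 GEMM hardware accumulates into 32-bit registers, so an exact equality check is meaningful here.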

A few open items need to be addressed:

  • for int8 we want a separate geomean (i.e. one geomean for bfloat16 and one for int8). What's the best way to keep int8 and bfloat16 separate? I could introduce an environment variable and run the benchmark twice - once with only bfloat16, once with only int8. Open to other suggestions.
  • for the onednn comparison, bfloat16 is fine, but there is no support for GPU matmul w/ int8. I don't think we want to run the comparison vs CPU (it takes too long and gives us no info), so I might need to introduce the env variable anyway to do one run with bfloat16 w/ onednn, and another run with int8 w/out onednn.
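The environment-variable switch floated in both bullets could look something like the sketch below. The variable name `BENCHMARK_DTYPE` and the function are assumptions for illustration, not part of the PR: each invocation of the script handles one dtype, and only the bfloat16 run keeps the onednn provider.

```python
import os

def select_dtypes() -> tuple[list[str], bool]:
    """Return (dtypes to benchmark, whether to include the onednn provider).

    BENCHMARK_DTYPE is a hypothetical env var: running the script once per
    value keeps bfloat16 and int8 in separate reports (and geomeans), and
    drops the onednn comparison for int8, which has no GPU matmul support.
    """
    dtype = os.environ.get("BENCHMARK_DTYPE", "bfloat16")
    if dtype == "int8":
        return ["int8"], False   # no onednn: no GPU int8 matmul available
    return ["bfloat16"], True    # bfloat16 run keeps the onednn provider
```

Driving the split through the environment (rather than a CLI flag) matches the suggestion in the PR description and keeps the CI invocation a simple two-line change.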

cc #3014

@Egor-Krivov
Contributor

Egor-Krivov commented Dec 18, 2024

I think we should treat them as 2 separate benchmarks in terms of reporting (so we have 2 lines: --benchmark gemm-postop-addmatrix & --benchmark gemm-postop-addmatrix-int8). Then all geomeans will work as intended. Otherwise we'll have to introduce some geomean groups in our database.

Then we either run the benchmark script twice with different dtypes to generate 2 separate report files (I'd prefer this), or modify our report script to add some filtering capability.

About onednn and int8 support: do we want to measure onednn at all? If not, and we run it only for validation, maybe we could run it in another precision, like fp32 or bf16, just to validate the output.
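The validation-in-another-precision idea could be sketched as follows (the function name and tolerance are assumptions): compute the reference in fp32 on whatever backend supports it, then compare the int8 kernel's int32 output against it with a tolerance instead of exact equality.

```python
import numpy as np

def validate_against_fp32(a_int8: np.ndarray, b_int8: np.ndarray,
                          candidate_int32: np.ndarray,
                          rtol: float = 1e-6) -> bool:
    """Validate an int8-GEMM result against an fp32 reference.

    Useful when the reference provider (e.g. onednn) lacks int8 matmul on
    the target device but does support fp32/bf16: the inputs are widened to
    fp32, multiplied, and compared with a relative tolerance rather than
    exact equality, since fp32 accumulation can differ from int32.
    """
    ref = a_int8.astype(np.float32) @ b_int8.astype(np.float32)
    return bool(np.allclose(candidate_int32.astype(np.float32), ref, rtol=rtol))
```

For small-to-moderate K the fp32 reference is exact (int8 products fit comfortably in fp32's 24-bit mantissa), so this catches real kernel bugs while sidestepping the missing int8 provider path.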

@Egor-Krivov
Contributor

My only issue with this PR right now is that all charts and geomeans for the addmatrix benchmark will be discontinued due to the new parameters. Hence my suggestion to introduce a separate benchmark for int8.

@vlad-penkin vlad-penkin linked an issue Dec 18, 2024 that may be closed by this pull request
Successfully merging this pull request may close these issues.

Add onednn to gemm benchmarks