Minimize difference between Intel port and OpenAI Triton #2030

Open
etiotto opened this issue Aug 28, 2024 · 14 comments · Fixed by #2034, #2035, #2038, #2049 or #2056

Comments

@etiotto
Contributor

etiotto commented Aug 28, 2024

I took a first pass at the differences between the latest OpenAI Triton code upstream and our fork. We have 66 common files showing a difference. To obtain the differences, use the following commands on the latest llvm-target (the commit ID is from our last merge):

```shell
# Search for the last merge commit id in the git log.
COMMIT_ID=`git log | grep "Merge commit" | head -1 | cut -d "'" -f2`

# Obtain the list of modified files and the difference.
echo "*********** MODIFIED FILES ***********"
git diff $COMMIT_ID --diff-filter=CDMRTUXB | grep "diff --" | cut -d"a" -f2- | cut -d" " -f1 | cut -d"/" -f2- 2>&1

echo "*********** DIFFERENCES ***********"
git diff $COMMIT_ID --diff-filter=CDMRTUXB 2>&1
```
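A shorter equivalent of the grep/cut pipeline above is `git diff --name-only`, which prints one changed path per line. The sketch below demonstrates it on a hypothetical throwaway repository (not the Triton checkout), so the commands are self-contained:

```shell
#!/bin/sh
# Sketch on a throwaway repository: `git diff --name-only` yields the same
# file list as the grep/cut pipeline above, one path per line.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email ci@example.com
git config user.name ci
printf 'v1\n' > common.txt
git add common.txt
git commit -q -m base
printf 'v2\n' > common.txt
git add common.txt
git commit -q -m change
COMMIT_ID=$(git rev-parse HEAD~1)

# List files modified since COMMIT_ID (M is one of the CDMRTUXB filter letters):
git diff --name-only "$COMMIT_ID" --diff-filter=CDMRTUXB

# Count them, as done later in this issue with `wc`:
git diff --name-only "$COMMIT_ID" --diff-filter=CDMRTUXB | wc -l
```

On the real fork, `COMMIT_ID` would be obtained as in the snippet above; `--name-only` avoids the brittle `cut -d"a"` parsing of the diff headers.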

The files containing differences are listed in the following table. The 2nd column, labeled "Upstreamable", indicates whether the differences in that file are upstreamable, and whether we should attempt to upstream them now or in the future (i.e., we need to upstream our BE in third_party before upstreaming the difference). Specifically:

  • "Future" means we cannot upstream now (e.g., it depends on Intel-specific features), but it should be upstreamable once the OpenAI community accepts our BE
  • "Now" means it is actionable now, and we should attempt to upstream it
  • "Partially" means a mix of both
  • "N" means not upstreamable.

The 3rd column indicates whether we should move the difference (or the file) into the third_party/intel directory. The 4th column indicates whether the difference can be reduced.

| File | Upstreamable | Movable to third_party/intel | Can be reduced? | Comments |
|---|---|---|---|---|
| .pre-commit-config.yaml | Now | N | N | Contains extra pre-commits |
| CMakeLists.txt | Future | N | ? | set LLVM_CONFIG needed? |
| LICENSE | Future | N | N | |
| README.md | Future | N | N | |
| bin/CMakeLists.txt | Future | N | N | |
| bin/RegisterTritonDialects.h | Future | N | N | |
| bin/triton-opt.cpp | Future | N | N | |
| docs/conf.py | N | N | Y | Make it common with upstream |
| docs/index.rst | Future | N | ? | Programming Guide section is not upstream |
| include/triton/Conversion/TritonGPUToLLVM/Utility.h | N | N | Y | Remove differences |
| include/triton/Conversion/TritonToTritonGPU/Passes.td | N | N | Y | Remove differences |
| include/triton/Dialect/Triton/IR/TritonTypes.td | N | N | ? | How to remove "F8E4M3B11FNUZ"? |
| include/triton/Dialect/TritonGPU/IR/TritonGPUAttrDefs.td | Now | N | N | Try to upstream changes |
| include/triton/Tools/Sys/GetEnv.hpp | Future | N | Future | Clean up INTEL env. variables |
| lib/Analysis/Utility.cpp | Future | N | N | |
| lib/Conversion/TritonGPUToLLVM/CMakeLists.txt | Future | N | N | |
| lib/Dialect/Triton/IR/Ops.cpp | Now | N | N | |
| lib/Dialect/TritonGPU/IR/CMakeLists.txt | Future | N | N | |
| lib/Dialect/TritonGPU/IR/Dialect.cpp | Future | N | N | Note: some changes for warp layout need to be removed (not upstreamable) |
| lib/Dialect/TritonGPU/IR/LinearLayoutConversions.cpp | Now | N | N | Try to upstream |
| lib/Target/CMakeLists.txt | Future | N | N | |
| pyproject.toml | ? | N | ? | Can remove "importlib_metadata" or upstream this change? |
| python/pyproject.toml | ? | N | ? | Can remove "importlib_metadata" or upstream the change? |
| python/setup.py | Upstream part of the changes now | N | Y | Can "get_install_requires" be removed? Address: # FIXME: pytorch<2.3.0 doesn't support numpy 2.0 |
| python/src/ir.cc | N | N | Y | TODO: align with upstream code to use i8; can we remove "get_threads_per_warp"? |
| python/src/llvm.cc | N | Y | N | Need SLPVectorization = true; how to make that pass work for us? Alternatively make it vendor specific? |
| python/test/regression/test_cast_matmul.py | Partially | N | Y | Device passing could be upstreamed? |
| python/test/regression/test_functional_regressions.py | Partially | N | Y | Remove import intel_extension_for_pytorch |
| python/test/unit/instrumentation/test_gpuhello.py | Partially | N | Y | Device passing could be upstreamed? |
| python/test/unit/language/assert_helper.py | Partially | N | Y | Pass device instead of hard-coding it |
| python/test/unit/language/print_helper.py | Partially | N | Y | Pass device instead of hard-coding it |
| python/test/unit/language/test_annotations.py | N | N | Y | Remove import intel_extension_for_pytorch |
| python/test/unit/language/test_block_pointer.py | Partially | N | Y | Remove import intel_extension_for_pytorch, is_cuda() |
| python/test/unit/language/test_conversions.py | Partially | N | Y | Pass device instead of hard-coding it |
| python/test/unit/language/test_core.py | Partially | N | Y | Need further investigation to reduce diff with upstream |
| python/test/unit/language/test_line_info.py | Partially | N | Y | Line numbers differ; try to reduce diff with upstream |
| python/test/unit/language/test_pipeliner.py | N | N | Y | Remove import intel_extension_for_pytorch |
| python/test/unit/language/test_random.py | N | N | Y | Remove import intel_extension_for_pytorch |
| python/test/unit/language/test_subprocess.py | N | N | Y | Need further investigation |
| python/test/unit/runtime/test_autotuner.py | N | N | Y | Remove import intel_extension_for_pytorch |
| python/test/unit/runtime/test_cache.py | Partially | N | Y | Pass device instead of hard-coding it |
| python/test/unit/runtime/test_cublas.py | N | N | Y | pytest.skip vs pytest.xfail difference |
| python/test/unit/runtime/test_driver.py | Future | N | Y | Remove import intel_extension_for_pytorch |
| python/test/unit/runtime/test_jit.py | Future | N | Y | Remove import intel_extension_for_pytorch |
| python/test/unit/runtime/test_launch.py | Future | N | Y | Pass device instead of hard-coding it |
| python/test/unit/runtime/test_subproc.py | N | N | Y | Remove import intel_extension_for_pytorch |
| python/triton/backends/compiler.py | Future | N | N | |
| python/triton/compiler/compiler.py | Future | N | N | Note: we need to add build_flags into metadata; upstream might accept? |
| python/triton/language/extra/`__init__.py` | Future | N | N | Should be upstreamable later |
| python/triton/language/semantic.py | Future | N | N | |
| python/triton/runtime/build.py | Future | N | N | |
| python/triton/testing.py | N | N | Y | Clean up USE_WALL_TIME |
| python/triton/tools/compile.py | N | N | ? | What does AMD do for this file? |
| python/tutorials/01-vector-add.py | Partially | N | Y | Remove import intel_extension_for_pytorch; can pass device? |
| python/tutorials/02-fused-softmax.py | N | N | Y | Remove import intel_extension_for_pytorch |
| python/tutorials/03-matrix-multiplication.py | Partially | N | Y | Remove import intel_extension_for_pytorch |
| python/tutorials/04-low-memory-dropout.py | Partially | N | Y | Remove import intel_extension_for_pytorch |
| python/tutorials/05-layer-norm.py | Partially | N | Y | Remove import intel_extension_for_pytorch |
| python/tutorials/06-fused-attention.py | Partially | N | Y | Remove import intel_extension_for_pytorch |
| python/tutorials/07-extern-functions.py | Partially | N | Y | Remove import intel_extension_for_pytorch |
| python/tutorials/08-grouped-gemm.py | Partially | N | Y | Remove import intel_extension_for_pytorch |
| test/CMakeLists.txt | Future | N | N | |
| third_party/nvidia/CMakeLists.txt | N | N | Y | Try to make common |
| unittest/Conversion/TritonGPUToLLVM/CMakeLists.txt | N | N | ? | Investigate |
| unittest/Conversion/TritonGPUToLLVM/PTXAsmFormatTest.cpp | N | N | ? | Investigate |
| unittest/Dialect/TritonGPU/CMakeLists.txt | Future | N | N | |

The following table summarizes features (or the lack thereof) that need to be redesigned in order to remove the differences they cause in common files. For example, "warp layout" is a feature of our advanced codegen path which requires some redesign to avoid changes in common files.

| Feature | Files affected | Comments |
|---|---|---|
| F8E4M3B11FNUZ | include/triton/Dialect/Triton/IR/TritonTypes.td, python/src/ir.cc | Determine how to align with upstream |
| warp layout | lib/Dialect/TritonGPU/IR/Dialect.cpp | How to handle warp layout without invasive changes? |
| passing device | unit tests/tutorials | Need to pass device to tests/tutorials rather than hard-coding "cuda" |
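For the "passing device" item, a minimal sketch of the intended pattern. The helper name `get_test_device` and the `TRITON_TEST_DEVICE` environment variable are hypothetical illustrations, not the actual fix; the point is that test files would resolve the device once instead of hard-coding "cuda":

```python
import os


def get_test_device(default: str = "cuda") -> str:
    """Hypothetical helper: resolve the device for tests/tutorials.

    Reads an (assumed) TRITON_TEST_DEVICE environment variable so the same
    test file runs on "cuda", "xpu", etc. without hard-coded device strings.
    """
    return os.environ.get("TRITON_TEST_DEVICE", default)


# A test would then allocate tensors with, e.g.:
#   x = torch.randn(128, device=get_test_device())
```

Upstream already moves in this direction by threading a `device` fixture/argument through tests, which is why the table marks many of these diffs as reducible.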

@anmyachev
Contributor

@etiotto I moved the PTXAsmFormatTest.cpp test into the third_party/nvidia folder in Triton PR #4608. There should be no differences from upstream in the unittest/Conversion/ folder now.

@etiotto etiotto linked a pull request Aug 30, 2024 that will close this issue
@etiotto
Contributor Author

etiotto commented Aug 30, 2024

After PR #2064 lands we will have 48 files with differences (down from 66):

```shell
git diff a78c9c40aca4f6ad80deef39682a32056ea8976f --diff-filter=CDMRTUXB | grep "diff --" | cut -d"a" -f2- | cut -d" " -f1 | cut -d"/" -f2- | wc
     48      48    1682
```

anmyachev added a commit that referenced this issue Dec 1, 2024
@anmyachev
Contributor

Status update:

@whitneywhtsang maybe it's time for a status update? (again)

@whitneywhtsang
Contributor

> Status update:
>
> @whitneywhtsang maybe it's time for a status update? (again)

Yup, will work on that.

anmyachev added a commit that referenced this issue Dec 3, 2024
anmyachev added a commit that referenced this issue Dec 3, 2024
Part of #2030

This change does not affect the pass rate.

---------

Signed-off-by: Anatoly Myachev <[email protected]>
@whitneywhtsang
Contributor

whitneywhtsang commented Dec 4, 2024

Status update:

@whitneywhtsang whitneywhtsang linked a pull request Dec 4, 2024 that will close this issue
@whitneywhtsang whitneywhtsang reopened this Dec 4, 2024
@vlad-penkin vlad-penkin changed the title Classify difference between Intel port and OpenAI Triton Minimize difference between Intel port and OpenAI Triton Dec 4, 2024
anmyachev added a commit that referenced this issue Dec 5, 2024
@whitneywhtsang whitneywhtsang linked a pull request Dec 5, 2024 that will close this issue
@anmyachev anmyachev reopened this Dec 6, 2024
@whitneywhtsang
Contributor

whitneywhtsang commented Dec 6, 2024

Status update:

  • # of modified files: 52
  • # of new files: 402
    • third_party/intel: 169

anmyachev added a commit that referenced this issue Dec 7, 2024
Part of #2030
Part of #2824

For all other compilers the situation is about the same, it is expected
that they are already in the paths. I don't think it should be any
different for Windows.

---------

Signed-off-by: Anatoly Myachev <[email protected]>
whitneywhtsang pushed a commit that referenced this issue Dec 9, 2024
…modifying lit config file (#2965)

Part of #2030

Signed-off-by: Anatoly Myachev <[email protected]>
@whitneywhtsang
Contributor

Status update:

  • # of modified files: 66
  • # of new files: 403
    • third_party/intel: 172
