Minimize difference between Intel port and OpenAI Triton #2030

Open
etiotto opened this issue Aug 28, 2024 · 14 comments · Fixed by #2034, #2035, #2038, #2049 or #2056

Comments

@etiotto
Contributor

etiotto commented Aug 28, 2024

I took a first pass at the differences between the latest OpenAI Triton code upstream and our fork. We have 66 common files showing a difference. To obtain the differences, use the following commands on the latest llvm-target (the commit ID is from our last merge):

```shell
# Search for the last merge commit id in the git log.
COMMIT_ID=`git log | grep "Merge commit" | head -1 | cut -d "'" -f2`

# Obtain the list of modified files and the difference.
echo "*********** MODIFIED FILES ***********"
git diff $COMMIT_ID --diff-filter=CDMRTUXB | grep "diff --" | cut -d"a" -f2- | cut -d" " -f1 | cut -d"/" -f2- 2>&1

echo "*********** DIFFERENCES ***********"
git diff $COMMIT_ID --diff-filter=CDMRTUXB 2>&1
```
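A shorter equivalent of the grep/cut pipeline above is `git diff --name-only`, which prints one changed path per line. The sketch below demonstrates it on a hypothetical throwaway repository (not the Triton checkout), so the commands are self-contained:

```shell
#!/bin/sh
# Sketch on a throwaway repository: `git diff --name-only` yields the same
# file list as the grep/cut pipeline above, one path per line.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email ci@example.com
git config user.name ci
printf 'v1\n' > common.txt
git add common.txt
git commit -q -m base
printf 'v2\n' > common.txt
git add common.txt
git commit -q -m change
COMMIT_ID=$(git rev-parse HEAD~1)

# List files modified since COMMIT_ID (M is one of the CDMRTUXB filter letters):
git diff --name-only "$COMMIT_ID" --diff-filter=CDMRTUXB

# Count them, as done later in this issue with `wc`:
git diff --name-only "$COMMIT_ID" --diff-filter=CDMRTUXB | wc -l
```

On the real fork, `COMMIT_ID` would be obtained as in the snippet above; `--name-only` avoids the brittle `cut -d"a"` parsing of the diff headers.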

The files containing differences are listed in the following table. The 2nd column, labeled "Upstreamable", indicates whether the differences in that file are upstreamable, and whether we should attempt to upstream them now or in the future (i.e., we need to upstream our BE in third_party before upstreaming the difference). Specifically:

  • "Future" means we cannot upstream now (e.g., it depends on Intel-specific features), but it should be upstreamable once the OpenAI community accepts our BE
  • "Now" means it is actionable now, and we should attempt to upstream it
  • "Partially" means a mix of both
  • "N" means not upstreamable.

The 3rd column indicates whether we should move the difference (or the file) into the third_party/intel directory. The 4th column indicates whether the difference can be reduced.

| File | Upstreamable | Movable to third_party/intel | Can be reduced? | Comments |
|---|---|---|---|---|
| .pre-commit-config.yaml | Now | N | N | Contains extra pre-commits |
| CMakeLists.txt | Future | N | ? | set LLVM_CONFIG needed? |
| LICENSE | Future | N | N | |
| README.md | Future | N | N | |
| bin/CMakeLists.txt | Future | N | N | |
| bin/RegisterTritonDialects.h | Future | N | N | |
| bin/triton-opt.cpp | Future | N | N | |
| docs/conf.py | N | N | Y | Make it common with upstream |
| docs/index.rst | Future | N | ? | Programming Guide section is not upstream |
| include/triton/Conversion/TritonGPUToLLVM/Utility.h | N | N | Y | Remove differences |
| include/triton/Conversion/TritonToTritonGPU/Passes.td | N | N | Y | Remove differences |
| include/triton/Dialect/Triton/IR/TritonTypes.td | N | N | ? | How to remove "F8E4M3B11FNUZ"? |
| include/triton/Dialect/TritonGPU/IR/TritonGPUAttrDefs.td | Now | N | N | Try to upstream changes |
| include/triton/Tools/Sys/GetEnv.hpp | Future | N | Future | Clean up INTEL env. variables |
| lib/Analysis/Utility.cpp | Future | N | N | |
| lib/Conversion/TritonGPUToLLVM/CMakeLists.txt | Future | N | N | |
| lib/Dialect/Triton/IR/Ops.cpp | Now | N | N | |
| lib/Dialect/TritonGPU/IR/CMakeLists.txt | Future | N | N | |
| lib/Dialect/TritonGPU/IR/Dialect.cpp | Future | N | N | Note: some changes for warp layout need to be removed (not upstreamable) |
| lib/Dialect/TritonGPU/IR/LinearLayoutConversions.cpp | Now | N | N | Try to upstream |
| lib/Target/CMakeLists.txt | Future | N | N | |
| pyproject.toml | ? | N | ? | Can remove "importlib_metadata" or upstream this change? |
| python/pyproject.toml | ? | N | ? | Can remove "importlib_metadata" or upstream the change? |
| python/setup.py | Upstream part of the changes now | N | Y | Can "get_install_requires" be removed? Address: # FIXME: pytorch<2.3.0 doesn't support numpy 2.0 |
| python/src/ir.cc | N | N | Y | TODO: align with upstream code to use i8; can we remove "get_threads_per_warp"? |
| python/src/llvm.cc | N | Y | N | Need SLPVectorization = true; how to make that pass work for us? Alternatively make it vendor specific? |
| python/test/regression/test_cast_matmul.py | Partially | N | Y | Device passing could be upstreamed? |
| python/test/regression/test_functional_regressions.py | Partially | N | Y | Remove import intel_extension_for_pytorch |
| python/test/unit/instrumentation/test_gpuhello.py | Partially | N | Y | Device passing could be upstreamed? |
| python/test/unit/language/assert_helper.py | Partially | N | Y | Pass device instead of hard-coding it |
| python/test/unit/language/print_helper.py | Partially | N | Y | Pass device instead of hard-coding it |
| python/test/unit/language/test_annotations.py | N | N | Y | Remove import intel_extension_for_pytorch |
| python/test/unit/language/test_block_pointer.py | Partially | N | Y | Remove import intel_extension_for_pytorch, is_cuda() |
| python/test/unit/language/test_conversions.py | Partially | N | Y | Pass device instead of hard-coding it |
| python/test/unit/language/test_core.py | Partially | N | Y | Need further investigation to reduce diff with upstream |
| python/test/unit/language/test_line_info.py | Partially | N | Y | Line numbers differ; try to reduce diff with upstream |
| python/test/unit/language/test_pipeliner.py | N | N | Y | Remove import intel_extension_for_pytorch |
| python/test/unit/language/test_random.py | N | N | Y | Remove import intel_extension_for_pytorch |
| python/test/unit/language/test_subprocess.py | N | N | Y | Need further investigation |
| python/test/unit/runtime/test_autotuner.py | N | N | Y | Remove import intel_extension_for_pytorch |
| python/test/unit/runtime/test_cache.py | Partially | N | Y | Pass device instead of hard-coding it |
| python/test/unit/runtime/test_cublas.py | N | N | Y | pytest.skip vs pytest.xfail difference |
| python/test/unit/runtime/test_driver.py | Future | N | Y | Remove import intel_extension_for_pytorch |
| python/test/unit/runtime/test_jit.py | Future | N | Y | Remove import intel_extension_for_pytorch |
| python/test/unit/runtime/test_launch.py | Future | N | Y | Pass device instead of hard-coding it |
| python/test/unit/runtime/test_subproc.py | N | N | Y | Remove import intel_extension_for_pytorch |
| python/triton/backends/compiler.py | Future | N | N | |
| python/triton/compiler/compiler.py | Future | N | N | Note: we need to add build_flags into metadata; upstream might accept? |
| python/triton/language/extra/`__init__.py` | Future | N | N | Should be upstreamable later |
| python/triton/language/semantic.py | Future | N | N | |
| python/triton/runtime/build.py | Future | N | N | |
| python/triton/testing.py | N | N | Y | Clean up USE_WALL_TIME |
| python/triton/tools/compile.py | N | N | ? | What does AMD do for this file? |
| python/tutorials/01-vector-add.py | Partially | N | Y | Remove import intel_extension_for_pytorch; can pass device? |
| python/tutorials/02-fused-softmax.py | N | N | Y | Remove import intel_extension_for_pytorch |
| python/tutorials/03-matrix-multiplication.py | Partially | N | Y | Remove import intel_extension_for_pytorch |
| python/tutorials/04-low-memory-dropout.py | Partially | N | Y | Remove import intel_extension_for_pytorch |
| python/tutorials/05-layer-norm.py | Partially | N | Y | Remove import intel_extension_for_pytorch |
| python/tutorials/06-fused-attention.py | Partially | N | Y | Remove import intel_extension_for_pytorch |
| python/tutorials/07-extern-functions.py | Partially | N | Y | Remove import intel_extension_for_pytorch |
| python/tutorials/08-grouped-gemm.py | Partially | N | Y | Remove import intel_extension_for_pytorch |
| test/CMakeLists.txt | Future | N | N | |
| third_party/nvidia/CMakeLists.txt | N | N | Y | Try to make common |
| unittest/Conversion/TritonGPUToLLVM/CMakeLists.txt | N | N | ? | Investigate |
| unittest/Conversion/TritonGPUToLLVM/PTXAsmFormatTest.cpp | N | N | ? | Investigate |
| unittest/Dialect/TritonGPU/CMakeLists.txt | Future | N | N | |

The following table summarizes features (or the lack thereof) that need to be redesigned in order to remove the differences they cause in common files. For example, "warp layout" is a feature of our advanced codegen path which requires some redesign to avoid changes in common files.

| Feature | Files affected | Comments |
|---|---|---|
| F8E4M3B11FNUZ | include/triton/Dialect/Triton/IR/TritonTypes.td, python/src/ir.cc | Determine how to align with upstream |
| warp layout | lib/Dialect/TritonGPU/IR/Dialect.cpp | How to handle warp layout without invasive changes? |
| passing device | unit tests/tutorials | Need to pass device to tests/tutorials rather than hard-coding "cuda" |
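For the "passing device" item, a minimal sketch of the intended pattern. The helper name `get_test_device` and the `TRITON_TEST_DEVICE` environment variable are hypothetical illustrations, not the actual fix; the point is that test files would resolve the device once instead of hard-coding "cuda":

```python
import os


def get_test_device(default: str = "cuda") -> str:
    """Hypothetical helper: resolve the device for tests/tutorials.

    Reads an (assumed) TRITON_TEST_DEVICE environment variable so the same
    test file runs on "cuda", "xpu", etc. without hard-coded device strings.
    """
    return os.environ.get("TRITON_TEST_DEVICE", default)


# A test would then allocate tensors with, e.g.:
#   x = torch.randn(128, device=get_test_device())
```

Upstream already moves in this direction by threading a `device` fixture/argument through tests, which is why the table marks many of these diffs as reducible.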

@anmyachev
Contributor

@etiotto I moved the PTXAsmFormatTest.cpp test into the third_party/nvidia folder in Triton PR #4608. There should be no differences from upstream in the unittest/Conversion/ folder now.

@etiotto etiotto linked a pull request Aug 30, 2024 that will close this issue
@etiotto
Contributor Author

etiotto commented Aug 30, 2024

After PR #2064 lands we will have 48 files with differences (down from 66):

```shell
git diff a78c9c40aca4f6ad80deef39682a32056ea8976f --diff-filter=CDMRTUXB | grep "diff --" | cut -d"a" -f2- | cut -d" " -f1 | cut -d"/" -f2- | wc
     48      48    1682
```

anmyachev added a commit that referenced this issue Dec 1, 2024
@anmyachev
Contributor

Status update:

@whitneywhtsang maybe it's time for a status update? (again)

@whitneywhtsang
Contributor

> Status update:
>
> @whitneywhtsang maybe it's time for a status update? (again)

Yup, will work on that.

anmyachev added a commit that referenced this issue Dec 3, 2024
anmyachev added a commit that referenced this issue Dec 3, 2024
Part of #2030

This change does not affect the pass rate.

---------

Signed-off-by: Anatoly Myachev <[email protected]>
@whitneywhtsang
Contributor

whitneywhtsang commented Dec 4, 2024

Status update:

@whitneywhtsang whitneywhtsang linked a pull request Dec 4, 2024 that will close this issue
@whitneywhtsang whitneywhtsang reopened this Dec 4, 2024
@vlad-penkin vlad-penkin changed the title Classify difference between Intel port and OpenAI Triton Minimize difference between Intel port and OpenAI Triton Dec 4, 2024
anmyachev added a commit that referenced this issue Dec 5, 2024
@whitneywhtsang whitneywhtsang linked a pull request Dec 5, 2024 that will close this issue
@anmyachev anmyachev reopened this Dec 6, 2024
@whitneywhtsang
Contributor

whitneywhtsang commented Dec 6, 2024

Status update:

  • # of modified files: 52
  • # of new files: 402
    • third_party/intel: 169

anmyachev added a commit that referenced this issue Dec 7, 2024
Part of #2030
Part of #2824

For all other compilers the situation is about the same, it is expected
that they are already in the paths. I don't think it should be any
different for Windows.

---------

Signed-off-by: Anatoly Myachev <[email protected]>
whitneywhtsang pushed a commit that referenced this issue Dec 9, 2024
…modifying lit config file (#2965)

Part of #2030

Signed-off-by: Anatoly Myachev <[email protected]>
@whitneywhtsang
Contributor

Status update:

  • # of modified files: 66
  • # of new files: 403
    • third_party/intel: 172
