EP (CPU TensorRT CUDA) accuracy test #22545
base: main
lintrunner found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.
from transformers import AutoModel, AutoTokenizer
from transformers import AutoModelForCausalLM
import torch
from transformers.onnx import export |
Check notice
Code scanning / CodeQL
Unused import Note test
import numpy as np
import time
import unittest
import onnx |
Check notice
Code scanning / CodeQL
Module is imported with 'import' and 'import from' Note test
Module 'onnxruntime.test.onnx' is imported with both 'import' and 'import from'.
'attention_mask': pytorch_inputs['attention_mask'].numpy(),
'onnx::Neg_2': torch.ones(1, dtype=torch.int64).numpy()  # ORT requires this input since it's in the exported graph
}
return model, pytorch_inputs, ort_inputs |
Check failure
Code scanning / CodeQL
Potentially uninitialized local variable Error test
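This class of warning usually appears when the returned names are only assigned inside `if`/`elif` branches. The body of `get_model_and_inputs` is not shown here, so the following is only a sketch of the general fix, using a hypothetical `select_inputs` helper and placeholder inputs: add an explicit `else` that raises, so every path either binds the variables or exits.

```python
def select_inputs(model_name: str) -> tuple:
    """Hypothetical stand-in for a loader that branches on the model name."""
    if model_name == "microsoft/resnet-18":
        kind, inputs = "vision", {"pixel_values": [[0.0]]}  # placeholder data
    elif model_name == "microsoft/Phi-3.5-mini-instruct":
        kind, inputs = "text", {"input_ids": [[1, 2, 3]]}  # placeholder data
    else:
        # Without this branch, `kind` and `inputs` could be unbound at the return.
        raise ValueError(f"Unsupported model: {model_name}")
    return kind, inputs
```

With the `else` in place, static analysis can see that the `return` is only reached after both names are assigned.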
def run_comparison(self, model_name, use_minimal_model=True, use_tensorrt=True, use_fp16=True, use_graph_opt=True, rtol=1e-2, atol=1e-2):
start_time = time.time()
model, pytorch_inputs, ort_inputs = get_model_and_inputs(model_name, use_minimal_model)
pytorch_output = run_model_in_pytorch(model, pytorch_inputs) |
Check failure
Code scanning / CodeQL
Potentially uninitialized local variable Error test
if model_name == "microsoft/Phi-3.5-mini-instruct":
fix_phi35_model(model_file)
providers = get_ep(use_tensorrt, use_fp16)
ort_output = run_model_in_ort(model_file, ort_inputs, providers, use_graph_opt=use_graph_opt) |
Check failure
Code scanning / CodeQL
Potentially uninitialized local variable Error test
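For readers following the snippet above: the implementation of `get_ep` is not shown in this excerpt. A minimal sketch of how such a helper might map the two flags to an ONNX Runtime provider list is given below. The name `build_providers` is hypothetical, and the fallback ordering (TensorRT, then CUDA, then CPU) is an assumption, not the PR's confirmed behavior.

```python
def build_providers(use_tensorrt: bool, use_fp16: bool) -> list:
    """Hypothetical helper: return a provider list for an ORT InferenceSession."""
    cpu = "CPUExecutionProvider"
    if not use_tensorrt:
        # Assumption: fp16 is ignored on the CPU-only path in this sketch.
        return [cpu]
    # A (name, options) tuple passes provider-specific options to ORT.
    trt = ("TensorrtExecutionProvider", {"trt_fp16_enable": use_fp16})
    # Assumption: fall back to CUDA, then CPU, for nodes TensorRT cannot take.
    return [trt, "CUDAExecutionProvider", cpu]
```

The returned list is in the form accepted by the `providers` argument of `onnxruntime.InferenceSession`.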
run_comparison(self, "microsoft/resnet-18",
use_minimal_model=False, use_tensorrt=False, use_fp16=False, use_graph_opt=False)

def test_resnet18_cpu_fp32(self): |
Check warning
Code scanning / CodeQL
Variable defined multiple times Warning test
Description
This test compares the output of the Hugging Face models below, run in PyTorch on CPU, against ONNX Runtime (ORT) with the CPU, TensorRT, and CUDA execution providers (EPs), under different configurations: fp16, ORT graph optimization disabled, a 1-layer transformer vs. the full model, and ResNet18 vs. ResNet50.
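The comparison step itself is not shown in this excerpt. A minimal sketch of an elementwise tolerance check is given below, assuming the `rtol=1e-2, atol=1e-2` defaults from the `run_comparison` signature and treating the PyTorch output as the reference. The helper name `outputs_match` is hypothetical.

```python
import numpy as np


def outputs_match(reference, candidate, rtol=1e-2, atol=1e-2):
    """Hypothetical helper: return (within_tolerance, max_abs_diff)."""
    reference = np.asarray(reference, dtype=np.float32)
    candidate = np.asarray(candidate, dtype=np.float32)
    if reference.shape != candidate.shape:
        # A shape mismatch is treated as a failure rather than broadcast.
        return False, float("inf")
    diff = np.abs(reference - candidate)
    max_abs_diff = float(diff.max()) if diff.size else 0.0
    # np.allclose checks |candidate - reference| <= atol + rtol * |reference|.
    ok = bool(np.allclose(candidate, reference, rtol=rtol, atol=atol))
    return ok, max_abs_diff
```

Returning the maximum absolute difference alongside the boolean makes it easier to see how close a failing configuration (for example fp16 on TensorRT) is to the threshold.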
Future work:
Motivation and Context