
prepack serialization follow up PR #22670

Open · wants to merge 20 commits into base: main

Conversation

frank-dong-ms (Contributor)

Description

Follow-up PR to #22256.

Changes the remaining kernels that have PrePack implemented.

frank-dong-ms requested a review from a team as a code owner · October 31, 2024 06:48
return initializer_name + seperator + node_name;
}

size_t GetMemoryAlignedOffset(size_t current_offset) {
  const size_t alignment_number_of_bytes = 64;

Check warning (Code scanning / PREfast): You can attempt to make 'onnxruntime::utils::GetMemoryAlignedOffset' constexpr unless it contains any undefined behavior (f.4).

Check warning (Code scanning / PREfast): The const variable 'alignment_number_of_bytes' can be computed at compile-time. Consider using constexpr (con.5).
@yuslepukhin (Member) · Oct 31, 2024:

This function should be constexpr.
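
A minimal sketch of the suggested change; the rounding logic is an assumption, since the full function body is not shown in this excerpt:

constexpr size_t GetMemoryAlignedOffset(size_t current_offset) {
  // 64-byte alignment, as in the snippet flagged by PREfast.
  constexpr size_t alignment_number_of_bytes = 64;
  // Round current_offset up to the next multiple of the alignment.
  return (current_offset + alignment_number_of_bytes - 1) & ~(alignment_number_of_bytes - 1);
}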

@yuslepukhin (Member) left a comment:

Let's address some important issues in this round first.

@@ -38,15 +39,25 @@ class Attention : public OpKernel, public AttentionCPUBase {
int input_idx,
/*out*/ bool& used_shared_buffers) override;

virtual std::optional<Tensor> GetPrePackTensor(int /*input_index*/) override;
@yuslepukhin (Member):

virtual is redundant in overriding functions. They are virtual by default.
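
For reference, the same declaration without the redundant keyword (override alone already implies virtual):

std::optional<Tensor> GetPrePackTensor(int /*input_index*/) override;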

private:
bool IsPackWeightsSuccessful(int qkv_index, AllocatorPtr alloc, size_t head_size,
size_t input_hidden_size, const T* weights_data,
size_t weight_matrix_col_size, PrePackedWeights* prepacked_weights);

void ConvertPrePackWeightsIntoTensor(onnxruntime::AllocatorPtr& alloc, const onnxruntime::Tensor& weights, PrePackedWeights* prepacked_weights);
@yuslepukhin (Member):

Going by the name, prepacked_weights cannot be an optional parameter; otherwise there is nothing to prepack. When can it be nullptr?
Is it an input or an output?


template <typename T>
void Attention<T>::ConvertTensorToPrePackWeight(void* tensor_data_raw) {
// buffer of packed_tensor is combine of:
@yuslepukhin (Member):

Can we create a struct describing the layout and then reinterpret the pointer into it? As written, the code is difficult to read and understand. (A sketch of this idea follows.)
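
A hypothetical sketch of that suggestion; the field names and layout are illustrative, not the PR's actual serialization format:

// Header describing the serialized prepacked-weight blob.
struct PackedWeightHeader {
  size_t is_packed;            // flag recorded by the packing code
  size_t packed_weights_size;  // byte size of the packed weight data
  size_t rank;                 // number of int64_t dimensions that follow
  // 'rank' int64_t dims follow the header, then the packed weight bytes.
};

// Reading then becomes one self-documenting step instead of manual offset math:
// const auto* header = reinterpret_cast<const PackedWeightHeader*>(tensor_data_raw);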


TensorShapeVector shape_vector = weight_shape_.AsShapeVector();
size_t shape_vector_mem_size = utils::CalculateTensorShapeVectorMemoryUsage(shape_vector);
void* shape_vector_ptr = static_cast<void*>(&shape_vector);
@yuslepukhin (Member):

This is undefined behavior. You need to copy the data, not the object of this class.
For this you can get the GetDims() span and copy the data from it (see the sketch below).
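
A minimal sketch of the suggested fix; destination_ptr is a hypothetical pointer into the serialization buffer:

// Copy the dimension values themselves, not the TensorShapeVector object.
gsl::span<const int64_t> dims = weight_shape_.GetDims();
std::memcpy(destination_ptr, dims.data(), dims.size_bytes());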

std::memcpy(weight_shape_buffer.get(),
static_cast<char*>(tensor_data_raw) + 4 * sizeof(size_t),
weight_shape_buffer_size);
auto weight_shape_vector = static_cast<const InlinedVector<int64_t>*>(weight_shape_buffer.get());
@yuslepukhin (Member):

This is just wrong. Undefined behavior.
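
A sketch of one well-defined alternative: rebuild the dimensions from the raw int64_t payload instead of casting the buffer to a vector object. Here rank and shape_offset are assumed to have been read from the serialized header:

// Reconstruct the shape from plain integer data; no object aliasing involved.
TensorShapeVector dims(rank);
std::memcpy(dims.data(), static_cast<const char*>(tensor_data_raw) + shape_offset,
            rank * sizeof(int64_t));
TensorShape weight_shape(gsl::make_span(dims));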

Tensor ConvertPackedBufferAndShapeToTensor(onnxruntime::AllocatorPtr& alloc,
const onnxruntime::Tensor& weights,
size_t packed_weights_size_,
TensorShape weight_shape_,
@yuslepukhin (Member):

  1. The tensor shape is passed by value? Did you mean to pass a span? (An illustrative revised signature follows this hunk.)
  2. Trailing underscores are reserved for member variables.
  3. I am confused about what the result of this function is. Is the last parameter also an output?

void* original_packed_buffer,
IAllocatorUniquePtr<void>& packed_buffer);
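
An illustrative revision of the signature addressing points 1 and 2 above; the parameter names are hypothetical, and the last two parameters are kept as-is because their intent is unclear from the diff:

// Sketch only: pass the shape as a span and drop the trailing underscores on parameters.
Tensor ConvertPackedBufferAndShapeToTensor(onnxruntime::AllocatorPtr& alloc,
                                           const onnxruntime::Tensor& weights,
                                           size_t packed_weights_size,
                                           gsl::span<const int64_t> weight_shape,
                                           void* original_packed_buffer,
                                           IAllocatorUniquePtr<void>& packed_buffer);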

Tensor ConvertPackedBufferAndShapeToTensorWithFlag(onnxruntime::AllocatorPtr& alloc,
@yuslepukhin (Member):

Same comments for all other new functions.


size_t CalculateTensorShapeVectorMemoryUsage(TensorShapeVector& tensor_shape_vector) {
// Calculate memory for the vector object itself (metadata)
size_t vector_metadata_size = sizeof(std::vector<int64_t>);
@yuslepukhin (Member):

You do not want to do it.

@yuslepukhin (Member):

How is tensor_shape_vector related to std::vector?
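
For context, a hedged sketch of a size computation based on the element data rather than sizeof(std::vector<int64_t>); TensorShapeVector is an InlinedVector, so the size of the vector object says nothing about the data it owns. The helper name is hypothetical:

// Serialized size of the dims: one count field plus the raw int64_t values.
size_t CalculateTensorShapeSerializedSize(gsl::span<const int64_t> dims) {
  return sizeof(uint64_t) + dims.size_bytes();
}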

// 3. original packed_weights buffer
TensorShapeVector shape_vector = weight_shape_.AsShapeVector();
size_t shape_vector_mem_size = utils::CalculateTensorShapeVectorMemoryUsage(shape_vector);
void* shape_vector_ptr = static_cast<void*>(&shape_vector);
@yuslepukhin (Member):

Undefined behavior

// 2. weight shape: first vector memory size, then vector content
// 3. original packed_weights buffer
TensorShapeVector shape_vector = weight_shape_.AsShapeVector();
size_t shape_vector_mem_size = utils::CalculateTensorShapeVectorMemoryUsage(shape_vector);
@yuslepukhin (Member):

I am struggling to understand what you are trying to do here.

Labels: None yet
Projects: None yet
2 participants