combining with np.lib.stride_tricks #236
-
Computations work fine, but issues arise when you need to compute gradients: overlapping elements produce overlapping (conflicting) writes during backpropagation. That's the main reason strides are either not exposed to the user or explicitly discouraged in DL frameworks. I suppose that on some hardware architectures (where memory can accumulate lock-free) this isn't an issue, but for GPU/TPU it is a problem.
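A minimal sketch of the issue described above (not from the thread; assumes NumPy >= 1.20 for `sliding_window_view`): the forward pass through an overlapping view is zero-copy, but the backward pass must scatter-add gradients because several output elements alias the same input element.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.arange(6, dtype=np.float64)            # input of length 6
win = sliding_window_view(x, window_shape=3)  # shape (4, 3), overlapping zero-copy view

# Pretend the upstream gradient w.r.t. the windows is all ones.
grad_win = np.ones_like(win)

# Backward of the view: every window element writes back to its source index.
# Overlapping windows hit the same index, so the writes must accumulate;
# a naive parallel "last write wins" would silently drop contributions.
grad_x = np.zeros_like(x)
for i in range(win.shape[0]):
    grad_x[i:i + 3] += grad_win[i]            # scatter-add, not plain assignment

print(grad_x)  # [1. 2. 3. 3. 2. 1.] -- interior elements appear in 3 windows
```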
-
Take a look at this:
-
Hello Alex, great work!
Are there any issues with using `einops` with tensors that overlap in memory? I see it works pretty well. With such a feature one could implement nearly any DL matrix operation as `einsum` + `einops`. I suggest this is a good opportunity for a uniform NN representation, probably a competitor for `onnx`.
Alexey Birukov
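A minimal sketch of the kind of operation the question has in mind (illustrative only; names and shapes are assumptions, and it uses plain `np.einsum` over an overlapping view rather than `einops` itself): a 1D convolution expressed as a strided view plus an einsum contraction.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.random.randn(16)   # signal of length 16
w = np.random.randn(5)    # kernel of size 5

windows = sliding_window_view(x, window_shape=5)  # (12, 5), zero-copy overlapping view
y = np.einsum('nk,k->n', windows, w)              # "valid" cross-correlation

# Reference check against NumPy's own correlate in 'valid' mode.
assert np.allclose(y, np.correlate(x, w, mode='valid'))
```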