Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support generating shader code to load vertices #2204

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

etang-cw
Copy link
Contributor

@etang-cw etang-cw commented Sep 27, 2023

Implementing vertex pipelines on mesh shaders will require shader code to load vertices, and what better way to make sure that code actually works properly than to implement it for use with anything so you can regression test CTS over it?
As a bonus, MoltenVK currently has a bunch of workarounds for Vulkan things that Metal doesn't support (read past stride, 0-divisor, etc), some of which are incredibly hacky. This removes the need for all of those, replacing them with a single "needs shader vertex loader" check: KhronosGroup/MoltenVK#2029.

Supports nearly every reasonable VkFormat, because why not

CTS status
  • macOS 13 (tested UHD 630, Radeon Pro 580X, M1): No regressions
    • (All) Now passes *_indirect_*_divisor_0
    • (All) Now passes tests in dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.* that didn't properly check VkPhysicalDevicePortabilitySubsetPropertiesKHR.minVertexInputBindingStrideAlignment before executing
    • (580X) Now passes dEQP-VK.pipeline.monolithic.input_attribute_offset.* tests. Binding non-4-byte-aligned vertex buffers is UB on non-Apple GPUs in Metal, but only AMD actually broke when you tried. Shader vertex loaders are (I think) UB-free here.
  • macOS 11 (tested GT 750M, Iris Pro 5200): Same as macOS 13
    • (750M) Now passes a number of tests that previously failed due to ICEs. Not going to question these, but hey, sometimes you get lucky
  • macOS 10.13 (tested Radeon Pro 560, HD 630):
    • (HD 630) No regressions, the dEQP-VK.pipeline.monolithic.input_attribute_offset.* tests fail here unlike macOS 11+, but they failed before as well, so not technically a regression.
    • (Radeon Pro 560) Fails some tests due to a compiler bug where loads from a uchar/char array of length > 1 that are then extended to an uint/int end up loading two bytes instead of one (so int(char_array[0]) loads the equivalent of int(reinterpret_cast<const device short&>(char_array[0]))). The code only generates arrays when vertex data is indexed past its stride, so this bug only affects us for code that loads single-byte types from buffers indexed past their stride with single-byte alignment, so given the age of 10.13, I'm considering this too obscure to bother adding a workaround, but if you think it's important, I can try.

Outstanding issues / questions:

  • Is it okay to bump the compiler requirement to C++14, or should I make this work with C++11? Now C++11
  • Should I keep everything in one file (e.g. before eafb97f) or is a separate file fine? Also, preferences for single massive function vs multiple smaller ones called in succession (also e.g. before eafb97f)? Keeping separate file
  • I duplicated VkFormat and VkVertexInputRate under slightly different names due to the lack of vulkan headers. Would it be better to just add vulkan headers as a dependency, or would that run into issues with fighting over whose vulkan headers to include when using SPIRV-cross as a dependency in a project that also uses vulkan headers? Left separate
  • build_implicit_builtins runs very early and seems to be the only thing that can easily generate definitions of builtin variables. We don't actually know what builtins we'll need until after add_interface_block(StorageClassInput) is called, at which point build_implicit_builtins has already run. For now, I'm just adding anything we might potentially need in build_implicit_builtins, which leads to a lot of unused gl_BaseVertex and gl_BaseInstance in many shaders, is there a better solution / should I be trying to refactor this (and if so, do you have any recommendations on how)? Left as-is
  • SRGB types are handled identically to UNORM as per Handling of SRGB vertex formats is confusing Vulkan-Docs#2214, should I temporarily disable support of SRGB types until that gets resolved? SRGB removed

@etang-cw etang-cw force-pushed the VertexLoader branch 8 times, most recently from adbb14b to e4954df Compare September 30, 2023 04:39
@HansKristian-Work
Copy link
Contributor

HansKristian-Work commented Oct 2, 2023

Is it okay to bump the compiler requirement to C++14, or should I make this work with C++11?

Keep it C++11. What are the real gains from using C++14 here? Min-spec is MSVC 2013 / GCC 4.8 (although GCC 4 CI is dead, so not sure if that builds anymore ...)

Would it be better to just add vulkan headers as a dependency.

No dependency on external headers is acceptable. It's fine to document that the MSLFormat is an alias of VkFormat to make client code easier to write.

which leads to a lot of unused gl_BaseVertex and gl_BaseInstance in many shaders, is there a better solution / should I be trying to refactor this (and if so, do you have any recommendations on how)

As long as this only occurs in shaders that use the special magic vertex loader and it doesn't cause actual real-world harm, I wouldn't care too much.

SRGB types are handled identically to UNORM

Just drop sRGB VBO support in MoltenVK. It's meaningless, and it seems it's the only one that exposes it.

This removes the need for all of those

What is the end-game for supporting extended_dynamic_state1 (core in 1.3 I think), which allows custom stride?

Should I keep everything in one file (e.g. before eafb97f) or is a separate file fine?

Given how massive it is, it's probably fine to keep it in a separate file.

@billhollings
Copy link
Contributor

What is the end-game for supporting extended_dynamic_state1 (core in 1.3 I think), which allows custom stride?

If by custom stride, you mean dynamic state stride, as of Metal 3.1 (this year's release), Metal now supports it.

Work on VK_EXT_extended_dynamic_state v1, v2 & v3 in MoltenVK is underway, and should be released in the next few weeks.

@etang-cw
Copy link
Contributor Author

etang-cw commented Oct 2, 2023

What are the real gains from using C++14 here

Mainly better constexpr support

Just drop sRGB VBO support in MoltenVK. It's meaningless, and it seems it's the only one that exposes it.

Done

What is the end-game for supporting extended_dynamic_state1 (core in 1.3 I think), which allows custom stride?

Probably use the stride=0 handling for struct generation, declare vertex buffers as Storage::Fill(buffer_align_log2), and load with reinterpret_cast<const device spvVertexDataXX&>(spvVertexBufferXX[gl_VertexIndex * (ubo.stride / buffer_align)])

Edit: Actually, it seems that while reinterpret_cast<const device spvVertexDataXX&>(char_array[idx * stride]) has the AGX compiler take the alignment info from the char array, and load everything one byte at a time, *reinterpret_cast<const device spvVertexDataXX*>(char_array + idx * stride) uses the alignment of spvVertexDataXX and loads everything nicely, so I'll probably go with that. Gotta love C++ compilers...

Work on VK_EXT_extended_dynamic_state v1, v2 & v3 in MoltenVK should be released in the next few weeks.

Guess I'd better implement it here

@etang-cw etang-cw force-pushed the VertexLoader branch 4 times, most recently from cc6d5d9 to 0999521 Compare October 2, 2023 21:20
@etang-cw
Copy link
Contributor Author

etang-cw commented Oct 2, 2023

Okay dynamic strides should be implemented, I indexed the stride buffer based on binding number, but maybe we should compress it to include only the bindings declared in the layout?

spirv_msl.hpp Outdated Show resolved Hide resolved
@HansKristian-Work
Copy link
Contributor

So, getting back to this after a long time. I've been dreading to stare at this PR. 1700 lines just to load vertices is arguably pretty nuts. I cannot maintain or understand this code.

The proposed MSL changes lately have been increasingly invasive and it's reaching the breaking point where I want to just outright reject the changes, but that's probably not acceptable.

What is the end-game here? Is the plan to emit mesh shaders from vertex/geom/tess? Using mesh shaders for pure VS pipelines isn't very useful. What would that look like in practice? Another 3000 lines of hacks in SPIRV-Cross? I don't think adding increasingly more complex transforms to SPIRV-Cross is the right approach. Large-scale rewriting of code should stay in the IR domain I think. Ideally, we'd have done tessellation that way.

Essentially, my main concern boils down to: Is there a path forward where SPIRV-Cross does not have to absorb all this complexity to work around Metal issues?

@HansKristian-Work
Copy link
Contributor

HansKristian-Work commented Nov 16, 2023

Other options maybe worth entertaining/discussing off-line is handing over long-term maintainership of the MSL backend to CW/MoltenVK if these things must land SPIRV-Cross upstream.

@etang-cw
Copy link
Contributor Author

etang-cw commented Nov 16, 2023

What is the end-game here? Is the plan to emit mesh shaders from vertex/geom/tess?

  1. Use it in normal non-mesh vertex shaders to work around differences between Metal's vertex loaders and Vulkan's (which you can see implemented here: WIP: In-shader vertex loading support MoltenVK#2029)
    This fixes things like
    • Hacks to work around Metal not allowing vertex attributes with offset greater than the buffer stride
    • Hacks to work around disagreements over the handling of zero divisors
    • VkPhysicalDevicePortabilitySubsetPropertiesKHR::minVertexInputBindingStrideAlignment > 1
    • Metal also requires vertex buffer offsets to have the same minimum alignment, but somehow that didn't make it into the portability subset properties
  2. Use it in GS to mesh conversion to replace the sorta-working-but-definitely-wouldnt-pass-cts one in the current GS PR

Large-scale rewriting of code should stay in the IR domain I think. Ideally, we'd have done tessellation that way.

For vertex loading, I think making a SPIR-V vertex loader generator would be harder than making an MSL one, and making a SPIR-V generator whose code translates to performant MSL code would be even harder than that. For reference, the generated MSL code currently uses the following things that don't translate well from SPIR-V to MSL:

  • packed vector types and alignas on structs to adjust alignment
  • reinterpret_cast on pointers/references to rectify any issues with multiple vertices referencing the same memory with different types
  • MSL's pixel data types (e.g. rgb9e5<float3>) to ensure we get the same quality of codegen as if you used Metal's fixed function vertex loader on Apple hardware

Doing the GS-mesh shader conversion might work fine over SPIR-V, but then it would want to call into either an MSL shader vertex loader or the MSL compute attribute binding system that tessellation shaders currently use, which wouldn't really be available to a shader that was generated outside of SPIRV-Cross.

Maybe the best solution here would be to make SPIRV-Cross more friendly to being used to mix SPIR-V and the target language. e.g. if MoltenVK could do the SPIR-V equivalent of struct mvkVertexData0; static mvkVertex mvkLoadVertex(uint vertexID, device struct mvkVertexData0* buffer0); (OpenCL SPIR-V seems to have the ability to represent an extern function definition, though it doesn't seem to be able to do the extern struct definition. Would it be better to add some SPIR-V extensions for this or just have SPIRV-Cross take a list of structs to convert to placeholders? Or maybe offer the ability to replace the definitions of structs/functions with arbitrary strings?), and have SPIRV-Cross generate a non-kernel static function and some reflection metadata saying what arguments it takes/returns, MoltenVK could then paste the vertex loader and the real kernel function that calls into the translated one at the bottom. This might also be useful to someone trying to use an unsupported feature like MSL's explicit imageblocks with an otherwise GLSL shader.

@HansKristian-Work
Copy link
Contributor

Would it be better to add some SPIR-V extensions for this or just have SPIRV-Cross take a list of structs to convert to placeholders?

This seems like a good idea. SPIR-V spec has

2.13. Linkage The ability to have partially linked modules and libraries is provided as part of the Linkage capability.

Other options is to use the non-semantic extensions or define a special SPIRV-Cross ExtInst that can be used to add arbitrary calls.

Maybe something simple like MoltenVK doing very light massage of the SPIR-V to add some wrap calls, SPIRV-Cross can just thunk the call and then all implementation detail that I shouldn't have to think or reason about is left to the API user.

Or maybe offer the ability to replace the definitions of structs/functions with arbitrary strings?)

That works too. When emitting functions, it should just be a matter of checking if there's a replacement string for that particular function body and it can be copy-pasted in, although using function declarations without a body seems more appropriate to me.

@HansKristian-Work
Copy link
Contributor

I don't know exactly what the solution would look like, but iteration on smaller proof of concepts first seems like the way to go and it can grow from there.

@billhollings
Copy link
Contributor

Other options maybe worth entertaining/discussing off-line is handing over long-term maintainership of the MSL backend to CW/MoltenVK if these things must land SPIRV-Cross upstream.

I've created #2229 to focus discussion on options and cost-benefits on this topic.

@goldenx86
Copy link

Any chance to push this forward?

@etang-cw
Copy link
Contributor Author

etang-cw commented Dec 6, 2023

Any chance to push this forward?

Working on it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants