Support generating shader code to load vertices #2204

etang-cw · 2023-09-27T20:14:41Z

Implementing vertex pipelines on mesh shaders will require shader code to load vertices, and what better way to make sure that code actually works properly than to implement it for use with anything so you can regression test CTS over it?
As a bonus, MoltenVK currently has a bunch of workarounds for Vulkan things that Metal doesn't support (read past stride, 0-divisor, etc), some of which are incredibly hacky. This removes the need for all of those, replacing them with a single "needs shader vertex loader" check: KhronosGroup/MoltenVK#2029.

Supports nearly every reasonable VkFormat, because why not

CTS status

macOS 13 (tested UHD 630, Radeon Pro 580X, M1): No regressions
- (All) Now passes *_indirect_*_divisor_0
- (All) Now passes tests in dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.* that didn't properly check VkPhysicalDevicePortabilitySubsetPropertiesKHR.minVertexInputBindingStrideAlignment before executing
- (580X) Now passes dEQP-VK.pipeline.monolithic.input_attribute_offset.* tests. Binding non-4-byte-aligned vertex buffers is UB on non-Apple GPUs in Metal, but only AMD actually broke when you tried. Shader vertex loaders are (I think) UB-free here.
macOS 11 (tested GT 750M, Iris Pro 5200): Same as macOS 13
- (750M) Now passes a number of tests that previously failed due to ICEs. Not going to question these, but hey, sometimes you get lucky
macOS 10.13 (tested Radeon Pro 560, HD 630):
- (HD 630) No regressions, the dEQP-VK.pipeline.monolithic.input_attribute_offset.* tests fail here unlike macOS 11+, but they failed before as well, so not technically a regression.
- (Radeon Pro 560) Fails some tests due to a compiler bug where loads from a uchar/char array of length > 1 that are then extended to an uint/int end up loading two bytes instead of one (so int(char_array[0]) loads the equivalent of int(reinterpret_cast<const device short&>(char_array[0]))). The code only generates arrays when vertex data is indexed past its stride, so this bug only affects us for code that loads single-byte types from buffers indexed past their stride with single-byte alignment, so given the age of 10.13, I'm considering this too obscure to bother adding a workaround, but if you think it's important, I can try.

Outstanding issues / questions:

~~Is it okay to bump the compiler requirement to C++14, or should I make this work with C++11?~~ Now C++11
Should I keep everything in one file (e.g. before eafb97f) or is a separate file fine? Also, preferences for single massive function vs multiple smaller ones called in succession (also e.g. before eafb97f)? Keeping separate file
I duplicated VkFormat and VkVertexInputRate under slightly different names due to the lack of vulkan headers. Would it be better to just add vulkan headers as a dependency, or would that run into issues with fighting over whose vulkan headers to include when using SPIRV-cross as a dependency in a project that also uses vulkan headers? Left separate
build_implicit_builtins runs very early and seems to be the only thing that can easily generate definitions of builtin variables. We don't actually know what builtins we'll need until after add_interface_block(StorageClassInput) is called, at which point build_implicit_builtins has already run. For now, I'm just adding anything we might potentially need in build_implicit_builtins, which leads to a lot of unused gl_BaseVertex and gl_BaseInstance in many shaders, is there a better solution / should I be trying to refactor this (and if so, do you have any recommendations on how)? Left as-is
~~SRGB types are handled identically to UNORM as per Handling of SRGB vertex formats is confusing Vulkan-Docs#2214, should I temporarily disable support of SRGB types until that gets resolved?~~ SRGB removed

HansKristian-Work · 2023-10-02T11:00:09Z

Is it okay to bump the compiler requirement to C++14, or should I make this work with C++11?

Keep it C++11. What are the real gains from using C++14 here? Min-spec is MSVC 2013 / GCC 4.8 (although GCC 4 CI is dead, so not sure if that builds anymore ...)

Would it be better to just add vulkan headers as a dependency.

No dependency on external headers is acceptable. It's fine to document that the MSLFormat is an alias of VkFormat to make client code easier to write.

which leads to a lot of unused gl_BaseVertex and gl_BaseInstance in many shaders, is there a better solution / should I be trying to refactor this (and if so, do you have any recommendations on how)

As long as this only occurs in shaders that use the special magic vertex loader and it doesn't cause actual real-world harm, I wouldn't care too much.

SRGB types are handled identically to UNORM

Just drop sRGB VBO support in MoltenVK. It's meaningless, and it seems it's the only one that exposes it.

This removes the need for all of those

What is the end-game for supporting extended_dynamic_state1 (core in 1.3 I think), which allows custom stride?

Should I keep everything in one file (e.g. before eafb97f) or is a separate file fine?

Given how massive it is, it's probably fine to keep it in a separate file.

billhollings · 2023-10-02T17:56:26Z

What is the end-game for supporting extended_dynamic_state1 (core in 1.3 I think), which allows custom stride?

If by custom stride, you mean dynamic state stride, as of Metal 3.1 (this year's release), Metal now supports it.

Work on VK_EXT_extended_dynamic_state v1, v2 & v3 in MoltenVK is underway, and should be released in the next few weeks.

etang-cw · 2023-10-02T18:36:43Z

What are the real gains from using C++14 here

Mainly better constexpr support

Just drop sRGB VBO support in MoltenVK. It's meaningless, and it seems it's the only one that exposes it.

Done

What is the end-game for supporting extended_dynamic_state1 (core in 1.3 I think), which allows custom stride?

Probably use the stride=0 handling for struct generation, declare vertex buffers as Storage::Fill(buffer_align_log2), and load with reinterpret_cast<const device spvVertexDataXX&>(spvVertexBufferXX[gl_VertexIndex * (ubo.stride / buffer_align)])

Edit: Actually, it seems that while reinterpret_cast<const device spvVertexDataXX&>(char_array[idx * stride]) has the AGX compiler take the alignment info from the char array, and load everything one byte at a time, *reinterpret_cast<const device spvVertexDataXX*>(char_array + idx * stride) uses the alignment of spvVertexDataXX and loads everything nicely, so I'll probably go with that. Gotta love C++ compilers...

Work on VK_EXT_extended_dynamic_state v1, v2 & v3 in MoltenVK should be released in the next few weeks.

Guess I'd better implement it here

etang-cw · 2023-10-02T21:40:26Z

Okay dynamic strides should be implemented, I indexed the stride buffer based on binding number, but maybe we should compress it to include only the bindings declared in the layout?

spirv_msl.hpp

HansKristian-Work · 2023-11-16T14:29:28Z

So, getting back to this after a long time. I've been dreading to stare at this PR. 1700 lines just to load vertices is arguably pretty nuts. I cannot maintain or understand this code.

The proposed MSL changes lately have been increasingly invasive and it's reaching the breaking point where I want to just outright reject the changes, but that's probably not acceptable.

What is the end-game here? Is the plan to emit mesh shaders from vertex/geom/tess? Using mesh shaders for pure VS pipelines isn't very useful. What would that look like in practice? Another 3000 lines of hacks in SPIRV-Cross? I don't think adding increasingly more complex transforms to SPIRV-Cross is the right approach. Large-scale rewriting of code should stay in the IR domain I think. Ideally, we'd have done tessellation that way.

Essentially, my main concern boils down to: Is there a path forward where SPIRV-Cross does not have to absorb all this complexity to work around Metal issues?

HansKristian-Work · 2023-11-16T14:58:08Z

Other options maybe worth entertaining/discussing off-line is handing over long-term maintainership of the MSL backend to CW/MoltenVK if these things must land SPIRV-Cross upstream.

etang-cw · 2023-11-16T21:22:44Z

What is the end-game here? Is the plan to emit mesh shaders from vertex/geom/tess?

Use it in normal non-mesh vertex shaders to work around differences between Metal's vertex loaders and Vulkan's (which you can see implemented here: WIP: In-shader vertex loading support MoltenVK#2029)
This fixes things like
- Hacks to work around Metal not allowing vertex attributes with offset greater than the buffer stride
- Hacks to work around disagreements over the handling of zero divisors
- VkPhysicalDevicePortabilitySubsetPropertiesKHR::minVertexInputBindingStrideAlignment > 1
- Metal also requires vertex buffer offsets to have the same minimum alignment, but somehow that didn't make it into the portability subset properties
Use it in GS to mesh conversion to replace the sorta-working-but-definitely-wouldnt-pass-cts one in the current GS PR

Large-scale rewriting of code should stay in the IR domain I think. Ideally, we'd have done tessellation that way.

For vertex loading, I think making a SPIR-V vertex loader generator would be harder than making an MSL one, and making a SPIR-V generator whose code translates to performant MSL code would be even harder than that. For reference, the generated MSL code currently uses the following things that don't translate well from SPIR-V to MSL:

packed vector types and alignas on structs to adjust alignment
reinterpret_cast on pointers/references to rectify any issues with multiple vertices referencing the same memory with different types
MSL's pixel data types (e.g. rgb9e5<float3>) to ensure we get the same quality of codegen as if you used Metal's fixed function vertex loader on Apple hardware

Doing the GS-mesh shader conversion might work fine over SPIR-V, but then it would want to call into either an MSL shader vertex loader or the MSL compute attribute binding system that tessellation shaders currently use, which wouldn't really be available to a shader that was generated outside of SPIRV-Cross.

Maybe the best solution here would be to make SPIRV-Cross more friendly to being used to mix SPIR-V and the target language. e.g. if MoltenVK could do the SPIR-V equivalent of struct mvkVertexData0; static mvkVertex mvkLoadVertex(uint vertexID, device struct mvkVertexData0* buffer0); (OpenCL SPIR-V seems to have the ability to represent an extern function definition, though it doesn't seem to be able to do the extern struct definition. Would it be better to add some SPIR-V extensions for this or just have SPIRV-Cross take a list of structs to convert to placeholders? Or maybe offer the ability to replace the definitions of structs/functions with arbitrary strings?), and have SPIRV-Cross generate a non-kernel static function and some reflection metadata saying what arguments it takes/returns, MoltenVK could then paste the vertex loader and the real kernel function that calls into the translated one at the bottom. This might also be useful to someone trying to use an unsupported feature like MSL's explicit imageblocks with an otherwise GLSL shader.

With the current spec, SRGB and UNORM are identical, making SRGB useless

HansKristian-Work · 2023-11-17T13:27:47Z

Would it be better to add some SPIR-V extensions for this or just have SPIRV-Cross take a list of structs to convert to placeholders?

This seems like a good idea. SPIR-V spec has

2.13. Linkage The ability to have partially linked modules and libraries is provided as part of the Linkage capability.

Other options is to use the non-semantic extensions or define a special SPIRV-Cross ExtInst that can be used to add arbitrary calls.

Maybe something simple like MoltenVK doing very light massage of the SPIR-V to add some wrap calls, SPIRV-Cross can just thunk the call and then all implementation detail that I shouldn't have to think or reason about is left to the API user.

Or maybe offer the ability to replace the definitions of structs/functions with arbitrary strings?)

That works too. When emitting functions, it should just be a matter of checking if there's a replacement string for that particular function body and it can be copy-pasted in, although using function declarations without a body seems more appropriate to me.

HansKristian-Work · 2023-11-17T17:53:04Z

I don't know exactly what the solution would look like, but iteration on smaller proof of concepts first seems like the way to go and it can grow from there.

billhollings · 2023-11-17T19:23:20Z

Other options maybe worth entertaining/discussing off-line is handing over long-term maintainership of the MSL backend to CW/MoltenVK if these things must land SPIRV-Cross upstream.

I've created #2229 to focus discussion on options and cost-benefits on this topic.

goldenx86 · 2023-12-06T00:13:25Z

Any chance to push this forward?

etang-cw · 2023-12-06T17:33:14Z

Any chance to push this forward?

Working on it

etang-cw force-pushed the VertexLoader branch 8 times, most recently from adbb14b to e4954df Compare September 30, 2023 04:39

etang-cw force-pushed the VertexLoader branch from e4954df to 2606371 Compare October 2, 2023 18:31

etang-cw force-pushed the VertexLoader branch 4 times, most recently from cc6d5d9 to 0999521 Compare October 2, 2023 21:20

etang-cw force-pushed the VertexLoader branch from 0999521 to 324b939 Compare October 3, 2023 00:55

etang-cw mentioned this pull request Oct 3, 2023

WIP: In-shader vertex loading support KhronosGroup/MoltenVK#2029

Open

etang-cw force-pushed the VertexLoader branch 2 times, most recently from 94b7455 to 3a1cac2 Compare October 4, 2023 20:04

HansKristian-Work reviewed Nov 16, 2023

View reviewed changes

spirv_msl.hpp Outdated Show resolved Hide resolved

etang-cw force-pushed the VertexLoader branch from 6bc56a5 to 5fad5d1 Compare November 16, 2023 19:16

etang-cw added 4 commits November 16, 2023 16:06

Allow fixup hooks to specify relative ordering

7b65cc0

Add FixupInPriorities for custom vertex loading

a73b95e

Add msl vertex index to CLI

712118f

Add MSL shader vertex loader generation

81a66d9

etang-cw added 5 commits November 16, 2023 16:08

Vertex Loader: Disable SRGB support

c9e9dc3

With the current spec, SRGB and UNORM are identical, making SRGB useless

Work around AMD compiler bug

97aedb1

Add vertex loader tests

f7543a3

Vertex Loader dynamic stride support

b890a10

Split spvLoadVertex call across multiple lines

127ef5d

etang-cw force-pushed the VertexLoader branch from 5fad5d1 to 127ef5d Compare November 16, 2023 22:08

billhollings mentioned this pull request Nov 17, 2023

How Do You Solve a Problem Like MSL? #2229

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support generating shader code to load vertices #2204

Support generating shader code to load vertices #2204

etang-cw commented Sep 27, 2023 •

edited

Loading

HansKristian-Work commented Oct 2, 2023 •

edited

Loading

billhollings commented Oct 2, 2023

etang-cw commented Oct 2, 2023 •

edited

Loading

etang-cw commented Oct 2, 2023

HansKristian-Work commented Nov 16, 2023

HansKristian-Work commented Nov 16, 2023 •

edited

Loading

etang-cw commented Nov 16, 2023 •

edited

Loading

HansKristian-Work commented Nov 17, 2023

HansKristian-Work commented Nov 17, 2023

billhollings commented Nov 17, 2023

goldenx86 commented Dec 6, 2023

etang-cw commented Dec 6, 2023

Support generating shader code to load vertices #2204

Are you sure you want to change the base?

Support generating shader code to load vertices #2204

Conversation

etang-cw commented Sep 27, 2023 • edited Loading

HansKristian-Work commented Oct 2, 2023 • edited Loading

billhollings commented Oct 2, 2023

etang-cw commented Oct 2, 2023 • edited Loading

etang-cw commented Oct 2, 2023

HansKristian-Work commented Nov 16, 2023

HansKristian-Work commented Nov 16, 2023 • edited Loading

etang-cw commented Nov 16, 2023 • edited Loading

HansKristian-Work commented Nov 17, 2023

HansKristian-Work commented Nov 17, 2023

billhollings commented Nov 17, 2023

goldenx86 commented Dec 6, 2023

etang-cw commented Dec 6, 2023

etang-cw commented Sep 27, 2023 •

edited

Loading

HansKristian-Work commented Oct 2, 2023 •

edited

Loading

etang-cw commented Oct 2, 2023 •

edited

Loading

HansKristian-Work commented Nov 16, 2023 •

edited

Loading

etang-cw commented Nov 16, 2023 •

edited

Loading