    Performance: Fastest quad drawing

    I am currently generating quads in my vertex shader. The vertex data lives in an SRV and the quads are generated from SV_VertexID, i.e. null index and vertex buffers are bound to the draw call, with no input layout. This cuts the bandwidth spent on index and vertex data to a quarter, because a single index to the quad vertex is reused for all four vertices the quad needs (see below).

    The problem is that I am currently invoking the VS six times per quad, because I am drawing triangle lists. I can reduce that to four invocations either by going back to an index buffer (and so incurring 4x the memory bandwidth for the indices) or by switching from triangle lists to quad lists, in the hope that the VS will then run only four times while the hardware splits each quad into two triangles under the hood.

    Nowadays, is there any performance overhead in using a quad list instead of a triangle list? How efficiently does the hardware break quads down into triangles? Does the quad list primitive make use of the post-transform cache, or will it effectively do what the code below does and invoke the VS six times per quad?

    I'm asking before I go on to write performance tests, in case anybody knows the definitive answer. (Ignore that I'm loading a full 32-bit index, position and size at the moment; I will be reducing this!)
static const int g_quad[6] = { 0, 1, 2, 2, 1, 3 };

Buffer<uint> g_indices : register(vs, t0);
ByteAddressBuffer g_quads : register(vs, t1);

psDepth VS_Quad(uint vertexId : SV_VERTEXID)
{
    uint quadIndex = vertexId / 6;
    uint vertexIndex = g_quad[vertexId % 6];
    uint quadAddress = g_indices[quadIndex] * PRIMITIVE_SIZE;

    uint4 vertexdata0 = g_quads.Load4(quadAddress);
    uint4 vertexdata1 = g_quads.Load4(quadAddress + 16); // I have other data stuffed in yzw, not shown here

    uint3 mask = uint3(vertexIndex & 1, (vertexIndex & 2) >> 1, 0);
    uint3 invmask = !mask;

    float3 position = asfloat(vertexdata0.xyz); // I can pack vertex data into smaller values than 32-bit and will do!
    float3 size = float3(asfloat(vertexdata0.w), asfloat(vertexdata1.x), 0.0f);
    float4 screen_pos = float4((position * invmask) + ((position + size) * mask), 1.0f);

    // ... (the rest of the shader was not included in the post)
}
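The vertex-ID arithmetic in VS_Quad can be checked on the CPU. The C++ sketch below mirrors the shader's decomposition; it is not HLSL and not the author's code, and QuadData, CornerPosition, VertexFromId6, VertexFromId4, BuildQuadIndices and UniqueVertexCount are hypothetical names. It also sketches the indexed alternative mentioned in the question, in which each quad owns four vertex IDs and an index buffer repeats the {0, 1, 2, 2, 1, 3} pattern per quad. VertexFromId4 assumes that SV_VertexID equals the value fetched from the index buffer. Counting distinct indices gives four shaded vertices per quad only under the assumption of an ideal post-transform cache; it says nothing about what real hardware does. CornerPosition also shows that position * invmask + (position + size) * mask reduces to position + size * mask for finite inputs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

struct Float2 { float x, y; };

// Hypothetical stand-in for one quad's data in g_quads: origin corner and size.
struct QuadData { Float2 position; Float2 size; };

// Same corner order as g_quad in the shader: triangles (0,1,2) and (2,1,3).
constexpr std::array<uint32_t, 6> kQuadCorners = {0, 1, 2, 2, 1, 3};

// Mirrors `mask` in VS_Quad: bit 0 selects +size.x, bit 1 selects +size.y.
Float2 CornerPosition(const QuadData& q, uint32_t corner) {
    const float mx = static_cast<float>(corner & 1u);
    const float my = static_cast<float>((corner & 2u) >> 1);
    // position * invmask + (position + size) * mask == position + size * mask
    return {q.position.x + q.size.x * mx, q.position.y + q.size.y * my};
}

// Non-indexed triangle-list path (as in the post): 6 VS invocations per quad.
Float2 VertexFromId6(uint32_t vertexId, const std::vector<QuadData>& quads) {
    return CornerPosition(quads[vertexId / 6], kQuadCorners[vertexId % 6]);
}

// Hypothetical indexed path: quad q owns vertex IDs 4q .. 4q+3.
std::vector<uint32_t> BuildQuadIndices(uint32_t quadCount) {
    std::vector<uint32_t> indices;
    indices.reserve(static_cast<std::size_t>(quadCount) * 6u);
    for (uint32_t q = 0; q < quadCount; ++q)
        for (uint32_t c : kQuadCorners) indices.push_back(q * 4u + c);
    return indices;
}

// What the VS would compute if SV_VertexID is the fetched index value (assumption).
Float2 VertexFromId4(uint32_t vertexId, const std::vector<QuadData>& quads) {
    return CornerPosition(quads[vertexId / 4], vertexId % 4);
}

// Distinct vertices referenced by an index buffer; with an ideal
// post-transform cache (an assumption) this equals the VS invocation count.
std::size_t UniqueVertexCount(const std::vector<uint32_t>& indices) {
    return std::set<uint32_t>(indices.begin(), indices.end()).size();
}
```

Both paths produce identical positions for every triangle-list slot; the indexed one simply references 4 distinct vertex IDs per quad instead of 6 invocations. With 16-bit indices this layout can address at most 16,384 quads per draw (65,536 vertex IDs divided by 4).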

    from GameDev.net http://bit.ly/2Jm1NOh