Understanding Vulkan device buffer address alignment
While writing the book on Vulkan, we had to leave out some content in the interest of time and space. Each chapter went through a few iterations from a code point of view, but the reader will only see the final version. Other times, we had to figure out issues with our implementation that might have been interesting to document, but they would have distracted from the content of the chapter and were left out.
One such issue had to do with the VK_KHR_buffer_device_address extension that we use for the ray tracing chapters. To illustrate the problem, I am going to use a simple vertex shader:
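#version 450
#extension GL_EXT_buffer_reference : require
// Array indexing on a buffer reference comes from GL_EXT_buffer_reference2
#extension GL_EXT_buffer_reference2 : require
#extension GL_EXT_shader_explicit_arithmetic_types_int64 : require
layout(buffer_reference, std430, buffer_reference_align = 8) buffer vec2_array_type {
    vec2 v;
};
layout( binding = 0, set = 0 ) uniform VertexBuffer
{
    uint64_t position_buffer;
};
layout (location = 0) out vec2 vTexCoord;
void main() {
    // Cast the raw 64-bit address to our reference type
    vec2_array_type uv_buffer = vec2_array_type( position_buffer );
    // Index the reference like an array, then read the block member
    vTexCoord.xy = uv_buffer[gl_VertexIndex].v;
}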
There’s a lot going on here, but it’s relatively simple:
- position_buffer contains the GPU address obtained through vkGetBufferDeviceAddressKHR (think of this as the base pointer)
- position_buffer is basically a void* at this point, so we have to define a type to access the underlying data. This is done by defining vec2_array_type.
- vec2_array_type uv_buffer = vec2_array_type( position_buffer ) is a cast to the type we just defined. This is what we will use to access the data as if it were an array
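As a rough CPU analogy of the three points above, the sketch below treats a raw 64-bit address as the base of an array of single-member structs. Vec2, Vec2Block and load_uv are hypothetical names used only for illustration; this is not how a GPU driver implements the access, and it assumes the address really points at an array of Vec2Block.

```cpp
#include <cstdint>

// CPU-side stand-ins for the GLSL types (hypothetical names).
struct Vec2 { float x, y; };
struct Vec2Block { Vec2 v; };  // mirrors the buffer_reference block with one vec2 member

// Treat a raw 64-bit address as the base of an array of Vec2Block
// and read the member of entry `index`.
inline Vec2 load_uv(std::uint64_t position_buffer, std::uint32_t index) {
    // Equivalent of: vec2_array_type uv_buffer = vec2_array_type( position_buffer );
    const auto* uv_buffer =
        reinterpret_cast<const Vec2Block*>(static_cast<std::uintptr_t>(position_buffer));
    // Equivalent of: uv_buffer[index].v
    return uv_buffer[index].v;
}
```

On the CPU the compiler knows the alignment of Vec2Block from its members. In the shader, that information comes from buffer_reference_align instead, which is why the value chosen there matters.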
Notice the alignment in the type definition. This was our original implementation which did not work. The GLSL spec for this extension describes the alignment rules as follows:
Each buffer reference type has an alignment that can be specified via the “buffer_reference_align” layout qualifier. This must be a power of two and be greater than or equal to the largest scalar/component type in the block. If the layout qualifier is not specified, it defaults to 16 bytes. All buffer reference addresses used for a particular buffer reference type are assumed to be aligned to this alignment value. That is, the base of the block must be aligned, and members of the block can be aligned within the block using standard layouts and offset layout qualifiers.
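The two constraints in this paragraph (power of two, at least as large as the largest scalar in the block) can be written as a small check. This is only a sketch of the rule as quoted: is_valid_reference_align and kFloatBytes are hypothetical names, and the check says nothing about whether the actual buffer address honours the alignment.

```cpp
#include <cstdint>

// True when `x` is a non-zero power of two.
constexpr bool is_power_of_two(std::uint32_t x) {
    return x != 0 && (x & (x - 1)) == 0;
}

// Checks the two constraints quoted above for a buffer_reference_align value:
// a power of two, and at least as large as the largest scalar in the block.
constexpr bool is_valid_reference_align(std::uint32_t align,
                                        std::uint32_t largest_scalar_bytes) {
    return is_power_of_two(align) && align >= largest_scalar_bytes;
}

// A GLSL float is 32 bits wide.
constexpr std::uint32_t kFloatBytes = 4;

// For a block holding a single vec2, the largest scalar is a float.
static_assert(is_valid_reference_align(4, kFloatBytes), "4 satisfies the rule");
static_assert(is_valid_reference_align(8, kFloatBytes), "8 also satisfies it, but promises more");
static_assert(!is_valid_reference_align(6, kFloatBytes), "6 is not a power of two");
static_assert(!is_valid_reference_align(2, kFloatBytes), "2 is smaller than a float");
```

Both 4 and 8 pass the check for our block, so the rule alone does not tell us which one to pick. The difference is in what the shader is then allowed to assume about every address it dereferences.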
We assumed that the alignment would work as on the CPU: our largest component is a vec2, which takes 8 bytes, and that should be our alignment. This however caused issues during rendering (see this tweet) and I couldn’t figure out why. Some of you might have understood the issue already, but before getting to the solution, we need to understand how the GL_EXT_buffer_reference2 spec defines the access computation:
This extension adds additional operator support to GL_EXT_buffer_reference to enable array indexing or “pointer math” on reference types. This can be used to access an array of structures stored consecutively in memory, without having to push the array declaration inside the block type. Array indexing “ref[i]” is similar to “&ref[i]” in C++ (if “ref” were a pointer type), and “ref + i” is equivalent to “(refType)((uint64_t)ref + i*sizeof(refType))”.
No surprise here, this looks like your standard array access. Let’s break down the SPIR-V for this shader (generated with glslangValidator.exe -V --target-env vulkan1.3 vertex.vert) to better understand what’s going on.
%float = OpTypeFloat 32
%v2float = OpTypeVector %float 2
%vec2_array_type = OpTypeStruct %v2float
%_ptr_PhysicalStorageBuffer_vec2_array_type = OpTypePointer PhysicalStorageBuffer %vec2_array_type
%_ptr_Function__ptr_PhysicalStorageBuffer_vec2_array_type = OpTypePointer Function %_ptr_PhysicalStorageBuffer_vec2_array_type
%ulong = OpTypeInt 64 0
%VertexBuffer = OpTypeStruct %ulong
%_ptr_Uniform_VertexBuffer = OpTypePointer Uniform %VertexBuffer
%_ = OpVariable %_ptr_Uniform_VertexBuffer Uniform
%int = OpTypeInt 32 1
%int_0 = OpConstant %int 0
%_ptr_Uniform_ulong = OpTypePointer Uniform %ulong
%_ptr_Output_v2float = OpTypePointer Output %v2float
%vTexCoord = OpVariable %_ptr_Output_v2float Output
%_ptr_Input_int = OpTypePointer Input %int
%gl_VertexIndex = OpVariable %_ptr_Input_int Input
%long = OpTypeInt 64 1
%ulong_8 = OpConstant %ulong 8
These are the types used in our program. It’s a bit hard to read, but the ones we are interested in are:
- %v2float defines a float[2] type
- %vec2_array_type defines a struct with a %v2float member
- %_ptr_PhysicalStorageBuffer_vec2_array_type defines a pointer to our struct
The other entries follow a similar pattern. Moving on to the implementation we have the following:
%uv_buffer = OpVariable %_ptr_Function__ptr_PhysicalStorageBuffer_vec2_array_type Function
%19 = OpAccessChain %_ptr_Uniform_ulong %_ %int_0
%20 = OpLoad %ulong %19
%21 = OpConvertUToPtr %_ptr_PhysicalStorageBuffer_vec2_array_type %20
OpStore %uv_buffer %21
%24 = OpLoad %_ptr_PhysicalStorageBuffer_vec2_array_type %uv_buffer
%25 = OpConvertPtrToU %ulong %24
This corresponds to the first line in our shader: we cast our uint64_t address to a pointer of vec2_array_type. Next we have the pointer arithmetic to load the entry.
%28 = OpLoad %int %gl_VertexIndex
%30 = OpSConvert %long %28
%31 = OpBitcast %ulong %30
%33 = OpIMul %ulong %31 %ulong_8
%34 = OpIAdd %ulong %25 %33
%35 = OpConvertUToPtr %_ptr_PhysicalStorageBuffer_vec2_array_type %34
%37 = OpAccessChain %_ptr_PhysicalStorageBuffer_v2float %35 %int_0
%38 = OpLoad %v2float %37 Aligned 8
This roughly translates to:
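vec2_array_type* vec2_array = ...;
// The offset is computed in bytes on the integer address, not with typed pointer arithmetic
vec2 v = ((vec2_array_type*)((uint64_t)vec2_array + gl_VertexIndex * 8))->v;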
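The address computation in instructions %30 to %34 can be mirrored step by step on the CPU. The sketch below is only an illustration of the integer math: element_address is a hypothetical name, and the 8-byte stride is the %ulong_8 constant from the listing.

```cpp
#include <cstdint>

// Mirrors instructions %30 to %34: sign-extend the index, reinterpret it as
// unsigned, scale it by the 8-byte element size, and add it to the base address.
constexpr std::uint64_t element_address(std::uint64_t base, std::int32_t vertex_index) {
    const std::int64_t  wide_index     = static_cast<std::int64_t>(vertex_index); // OpSConvert
    const std::uint64_t unsigned_index = static_cast<std::uint64_t>(wide_index);  // OpBitcast
    const std::uint64_t byte_offset    = unsigned_index * 8u;                     // OpIMul by %ulong_8
    return base + byte_offset;                                                    // OpIAdd
}

static_assert(element_address(0x1000, 0) == 0x1000, "first element sits at the base");
static_assert(element_address(0x1000, 3) == 0x1018, "3 * 8 = 24 = 0x18 bytes past the base");
```

Note that the alignment never enters this computation. It only shows up as the Aligned operand on the final load, which is a promise about the resulting address rather than something the shader enforces.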
Notice instruction %38: it asks for a load aligned to 8 bytes. And this, I think, is causing the problem. I am not sure why this would cause issues, as I would expect Vulkan buffer memory to be aligned to at least 8 bytes already, although I am not sure if that’s also the case when using VMA as the allocation library.
Re-reading the spec carefully, it says we should align to our largest scalar/component type. I thought that our component type was a vec2, which is aligned to 8 bytes. This type however is treated as an array of two floats, which means our largest scalar component is a float. Our alignment should then be just 4 bytes. Changing the buffer alignment to 4 produces the exact same code as before, except for the last instruction:
%38 = OpLoad %v2float %37 Aligned 4
This means our loads only assume 4-byte alignment, so the computed address is valid for any buffer base address that is a multiple of 4.
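To make the hypothesis concrete, the sketch below assumes a base address that is a multiple of 4 but not of 8. The value 0x1004 and the names is_aligned, kBase and element are made up for illustration; I have not verified that this is what actually happened in our renderer.

```cpp
#include <cstdint>

// True when `address` is a multiple of `alignment` (which must be a power of two).
constexpr bool is_aligned(std::uint64_t address, std::uint64_t alignment) {
    return (address & (alignment - 1)) == 0;
}

// Hypothetical base address: a multiple of 4 but not of 8.
constexpr std::uint64_t kBase = 0x1004;

// Address of element `i` with the 8-byte stride used by the shader.
constexpr std::uint64_t element(std::uint64_t i) { return kBase + i * 8; }

// Every element satisfies "Aligned 4"...
static_assert(is_aligned(element(0), 4) && is_aligned(element(5), 4),
              "4-byte alignment holds for every element");
// ...but none satisfies "Aligned 8", because the 8-byte stride preserves the base's remainder.
static_assert(!is_aligned(element(0), 8) && !is_aligned(element(5), 8),
              "8-byte alignment holds for no element");
```

With such a base address, declaring buffer_reference_align = 8 would make a promise that no element address can keep, while buffer_reference_align = 4 only promises what the data layout actually provides.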
I hope you found this useful, feel free to reach out on Twitter for any comments or suggestions :)