Making Sure to Profile Your Test Bench and Fighting Compilers
Context:
In my deque post, linked here, the test bench depended on the distance between pointers in the deque. That distance matters when accessing different fields within a struct, so it might have actually affected the measurements. This was the code:
int x[MAIN_DEQUE_SIZE];
for (int i = 0; i < 100; i++)
{
    deque_node* front = main_deque->first;
    deque_node* iterfront = main_deque->first;
    while (iterfront->next != front)
    {
        iterfront = iterfront->next;
        x[i] = iterfront->payload;
    }
}
This is the assembly that gcc 14.1.1 generated for the loop with the -O2 flag:
.L13:
    mov rax, QWORD PTR [rax+8]
    cmp rax, rdx
    jne .L13
Thus, the stores into the array that holds the payload values were optimized out: the loop only follows the next pointers and never loads the payload. I want the timing to include the payload access, though, as its offset from the struct pointer might result in different cache behavior.
The fix is quite simple (fortunately) -- just push the values into a deque that is never used.
deque* trash = init_deque();
for (int i = 0; i < MAIN_DEQUE_SIZE; i++)
{
    deque_push_front(trash, i);
}
With this change, gcc references the payload even with the -O3 flag, meaning this was sufficient in my fight with the compiler.
In future benches, I will make sure to continue profiling, but also figure out ways to bench without needing to fight gcc. If not, I'm sure that after enough battles I'll lose one quite spectacularly!