Since you're already using threading, instead of one semaphore and one buffer you could use two of each, with each buffer half as big, so one thread can always be reading one buffer while the other is being written. That could potentially remove all the useless idling from your solution.
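For what it's worth, here is a rough sketch of the ping-pong idea in Python. It's only an illustration under my own assumptions: `read_sample`, `run_ping_pong`, and the buffer sizes are made-up names and values, not anything from your code, and I've used a "free"/"ready" semaphore pair per buffer, which is one common way to do the handoff rather than the only one.

```python
import threading

def read_sample(i):
    # Hypothetical stand-in for the real sampling call.
    return i

def run_ping_pong(n_blocks, buf_size):
    # Two half-size buffers instead of one big one.
    buffers = [[0] * buf_size, [0] * buf_size]
    # free[i] is available when buffer i may be overwritten;
    # ready[i] is available when buffer i holds a full block.
    free = [threading.Semaphore(1), threading.Semaphore(1)]
    ready = [threading.Semaphore(0), threading.Semaphore(0)]
    collected = []

    def sampler():
        n = 0
        for block in range(n_blocks):
            idx = block % 2          # alternate between the two buffers
            free[idx].acquire()      # wait only if the consumer still holds it
            for j in range(buf_size):
                buffers[idx][j] = read_sample(n)
                n += 1
            ready[idx].release()     # hand the filled buffer to the consumer

    def consumer():
        for block in range(n_blocks):
            idx = block % 2
            ready[idx].acquire()     # wait for a filled buffer
            collected.extend(buffers[idx])
            free[idx].release()      # give the buffer back to the sampler

    threads = [threading.Thread(target=sampler),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return collected

if __name__ == "__main__":
    print(run_ping_pong(n_blocks=6, buf_size=4))
```

As long as the consumer finishes with a buffer before the sampler has filled the other one, the sampler never blocks on `free[idx].acquire()`, which is where the idling would otherwise come from.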
I didn't explain it well, but there are already two buffers. Even with the sampling loop as tight as I could get it, I think there are still a few ms of inherent delay between when it finishes one read and when it is ready to read again.