Thanks! With this the timings are even faster. I'll update the post.
julia> @code_llvm hamming_distance(Int8(33), Int8(125)) ; Function Signature: hamming_distance(Int8, Int8) ; @ /Users/lunaticd/code/tiny-binary-rag/rag.jl:13 within `hamming_distance` define i64 @julia_hamming_distance_16366(i8 signext %"x1::Int8", i8 signext %"x2::Int8") #0 { top: ; @ /Users/lunaticd/code/tiny-binary-rag/rag.jl:14 within `hamming_distance` ; ┌ @ int.jl:373 within `xor` %0 = xor i8 %"x2::Int8", %"x1::Int8" ; └ ; ┌ @ int.jl:415 within `count_ones` %1 = call i8 @llvm.ctpop.i8(i8 %0) ; │┌ @ int.jl:549 within `rem` %2 = zext i8 %1 to i64 ; └└ ret i64 %2 }
it lowers to the machine instruction now.
I also tried 8 Int64s vs 64 Int8s and it doesn't seem to make a difference when doing the search.
EDIT: apologize for the formatting
fn hamming_distance_u8(x1: u8, x2: u8) -> usize { (x1 ^ x2).count_ones() as usize }
Thanks! With this the timings are even faster. I'll update the post.