If you type two keys and have to "select a label" as the README says, then you have to parse the screen to see what actually happened so you can decide what to do next. That seems to be what the parent commenter is saying slows one down.
This doesn't seem significantly different from, say, /<c1><c2><CR> and then hitting n to go to the next match until you've found what you want. It's only one more character than this project claims to require (the extra <CR>). With incsearch enabled it also shows you the match as you type, and with hlsearch on as well it highlights every match.
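For reference, a minimal sketch of the settings I have in mind, assuming stock Vim or Neovim with no plugins (Neovim already turns both options on by default):

```vim
" Show the match for the pattern typed so far, while still typing it.
set incsearch
" Highlight every match of the search pattern, not just the current one.
set hlsearch
```

With those set, typing / followed by the two characters and <CR> jumps to the first match after the cursor, and n steps through the rest.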
You don't scan the screen looking for labels. You are already looking directly at the place you want to go, and a label should appear over it. That is the label you type to jump; you will never even notice the others, let alone scan them.
My understanding is that the "parsing the screen" step is slightly pipelined with the "typing the keys" step, as the labels you have to read/pick between show up after you've typed the first character of the pair. So in the ~300ms it takes you to type the second character, you've already become aware of the labels, and are well on your way to parsing them.