What do you mean by "bias in sample size"? A small sample reduces your statistical power, but if the effect size is huge (as described in the article), even a small sample gives you enough power to be confident in the result.
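To put rough numbers on the power point, here is a quick sketch using a normal approximation to a two-sided, two-sample comparison with equal group sizes. The function name `approx_power`, the effect sizes, and the group size of 15 are my own illustrative assumptions, not figures from the article.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test.

    d is the standardized effect size (difference in means divided by
    the common standard deviation). This is a normal approximation,
    so it is slightly optimistic for very small samples.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)      # critical value, about 1.96 at alpha=0.05
    shift = d * sqrt(n_per_group / 2)      # expected value of the test statistic
    # Probability of landing in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Hypothetical effect sizes: small, medium, very large.
    for d in (0.2, 0.5, 1.5):
        print(f"d={d}: power with 15 per group = {approx_power(d, 15):.2f}")
```

With a small effect, 15 students per group gives you almost no power; with a very large effect, the same 15 per group is plenty. That is all I mean by sample size mattering less when the effect is huge.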
There is no serious evidence that detracking has a positive impact. There is a lot of evidence that it harms students.
>Students — at all levels of performance, but especially our students who need the most support and for whom this model was intended to help most — aren’t having their needs met. In one of my multilevel classes, I received feedback that the lower-level students didn’t want to ask questions because they didn’t want to “look dumb,” and the higher-level students didn’t want to ask questions because they didn’t want their classmates to “feel dumb.” The result was a classroom that was far less dynamic than what I was typically able to cultivate.