I'm not the author and I haven't tried the software (I don't have a Windows machine handy), but I would use a word/collocation frequency list to hide any sentence in which every word/collocation is at or above some frequency threshold.
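To make the idea concrete, here is a minimal sketch in Python. The frequency table, the threshold of 50, and the name `should_hide` are all invented for illustration; this is not how the software actually works, and it only handles single words (collocations would need an n-gram lookup on top).

```python
import re

# Hypothetical frequency table (occurrences per million words; made-up numbers).
WORD_FREQ = {"the": 50000, "cat": 120, "sat": 90, "on": 30000, "mat": 15}

def should_hide(sentence, freq=WORD_FREQ, threshold=50):
    """Hide a subtitle only if every word in it is at or above the threshold."""
    words = re.findall(r"[a-z']+", sentence.lower())
    # Words missing from the table count as frequency 0, so they keep the
    # subtitle visible. An empty sentence is never reported as hidden.
    return bool(words) and all(freq.get(w, 0) >= threshold for w in words)
```

With this table, "The cat sat." would be hidden, while "The cat sat on the mat." would stay visible because "mat" falls below the threshold.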
As for polysemous words, I would use contextual analysis to disambiguate as well as possible, and then just show the words that I couldn't disambiguate until the threshold for the least frequently encountered meaning has been passed.
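One possible reading of that rule, sketched in Python. The `SenseTracker` class, the sense inventory, and the threshold are hypothetical, and the disambiguation step itself is assumed to happen elsewhere and to return `None` when it fails.

```python
from collections import Counter

class SenseTracker:
    def __init__(self, senses, threshold=3):
        self.senses = senses        # word -> list of sense labels (hypothetical inventory)
        self.threshold = threshold  # encounters needed before a sense counts as known
        self.seen = Counter()       # (word, sense) -> times encountered

    def record(self, word, sense):
        self.seen[(word, sense)] += 1

    def must_show(self, word, sense=None):
        """Return True if the word should stay visible; sense=None means ambiguous."""
        if sense is not None:
            return self.seen[(word, sense)] < self.threshold
        known = self.senses.get(word)
        if not known:
            return True  # no sense inventory for this word: play it safe and show it
        # Ambiguous: keep showing until even the least-encountered sense
        # has reached the threshold.
        least = min(self.seen[(word, s)] for s in known)
        return least < self.threshold
```

So an ambiguous "bank" would keep showing until both the "river" and "money" senses had each been seen often enough.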
Also, I would add something to the UI that lets the user, in one click, rewind the video by x seconds (not sure what the optimal value of x would be) and show all subtitles from there up to the point where the rewind was triggered.
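The state needed for that button is small. A rough sketch in Python, assuming a hypothetical `Player` object and a default of 10 seconds for x (a guess, not a recommendation):

```python
from dataclasses import dataclass

@dataclass
class Player:
    position: float = 0.0        # current playback time in seconds
    show_all_until: float = 0.0  # every subtitle is forced on before this time

    def rewind(self, seconds=10.0):
        # Remember where the user was, then jump back (never before the start).
        self.show_all_until = max(self.show_all_until, self.position)
        self.position = max(0.0, self.position - seconds)

    def subtitle_visible(self, cue_start, hidden_by_filter):
        # Inside the rewound stretch, show every subtitle regardless of the filter.
        if cue_start < self.show_all_until:
            return True
        return not hidden_by_filter
```

Rewinding from 42s by 10s puts playback at 32s with all subtitles shown until 42s; after that, the normal filter applies again.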
This seems like a Very Hard Problem. How did you approach it? Do you deal with polysemous words gracefully?