"Statistical significance" is starting to tire me. It is too binary for my test: either a given result "achieved" statistical significant, or it is not. Obviously you have to choose a threshold, and which it should be is much less obvious.
Couldn't we just do away with statistical significance, and publish likelihood ratios, or decibels of evidence (in favour of one hypothesis over another)? That way, we would know exactly how much an experiment is worth. No arbitrary threshold. Plus, you get to combine several experiments and obtain the compound evidence, which can be much stronger (or weaker) than the evidence from any single one of them. And then you may have found something worthwhile.
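A minimal sketch of the idea, with made-up numbers: decibels of evidence are just 10·log10 of the likelihood ratio, and because independent likelihood ratios multiply, the decibels from separate experiments simply add.

```python
import math

def decibels_of_evidence(likelihood_h1, likelihood_h2):
    """Evidence for H1 over H2, in decibels: 10 * log10 of the likelihood ratio."""
    return 10 * math.log10(likelihood_h1 / likelihood_h2)

# Three hypothetical experiments, each too weak on its own.
# Each pair is (P(data | H1), P(data | H2)) -- purely illustrative numbers.
experiments = [(0.30, 0.15), (0.22, 0.12), (0.40, 0.18)]

per_experiment = [decibels_of_evidence(p1, p2) for p1, p2 in experiments]
total = sum(per_experiment)  # independent evidence combines by simple addition

for i, db in enumerate(per_experiment, 1):
    print(f"Experiment {i}: {db:+.1f} dB")
print(f"Combined:     {total:+.1f} dB")
```

Each experiment alone contributes only about 3 dB, but together they add up to roughly 9 dB in favour of H1, which is the whole point of reporting the evidence itself rather than a pass/fail verdict.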
This is especially crucial when said evidence is expensive. In teaching, for instance, one researcher can hardly run experiments on more than two or three classrooms over little more than a year. That is often not enough evidence at once to reach statistical significance, but a bunch of such experiments may very well be enough. (Or not, if the first one proves to have been a fluke.)