The model already knows the text being recognized, and that is why it shows good results. If you test the same model on unrelated speech that it has not seen before, the results will not be nearly as good. The error rate might be significantly worse than that of other systems.
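
One way to check this is to measure the word error rate separately on text the model has seen and on text it has not. The following is a minimal sketch, assuming whitespace-tokenized reference and hypothesis transcripts; the function name `word_error_rate` and the sample sentences are hypothetical and are not taken from any particular toolkit.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and hyp[:j]; it starts with the empty prefix.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Made-up transcripts, for illustration only.
seen_wer = word_error_rate("the cat sat on the mat",
                           "the cat sat on the mat")
unseen_wer = word_error_rate("please call stella tomorrow",
                             "please call still a tomorrow")
print(seen_wer, unseen_wer)
```

A large gap between the two numbers suggests that the good results come from the model having seen the test text, not from the model being accurate in general.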