I wish we would start moving to better approaches for evaluating time series forecasts. Ideally, the forecaster reports a probability distribution over the future values of the series, and we then evaluate that predictive density with an error function (a scoring rule) suited to the intended application of the forecast.
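I use my package https://github.com/alexhallam/tablespoon to generate naive forecasts, then compare the CRPS (continuous ranked probability score) of the naive forecast against the CRPS of the alternative method. This “skill score” approach works very well, because it tells you directly how much the alternative improves on the naive baseline.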
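For anyone who wants to see the arithmetic, here is a minimal sketch of the comparison in plain NumPy. It does not use tablespoon's API. The naive forecast is a simple random-walk simulation that I am assuming as a stand-in, the data is synthetic, and the function names (`crps_from_samples`, `mean_crps`, `crps_skill_score`) are made up for illustration. CRPS is estimated from forecast samples as E|X - y| - 0.5 E|X - X'|, and the skill score is taken as 1 - CRPS_model / CRPS_naive.

```python
import numpy as np


def crps_from_samples(samples, y):
    """Sample-based CRPS estimate for one observation: E|X - y| - 0.5 * E|X - X'|."""
    samples = np.asarray(samples, dtype=float)
    spread_to_obs = np.mean(np.abs(samples - y))
    # Mean absolute difference over all pairs of samples (O(n^2) memory).
    spread_within = 0.5 * np.mean(np.abs(samples[:, None] - samples[None, :]))
    return float(spread_to_obs - spread_within)


def mean_crps(sample_paths, actuals):
    """Average CRPS over the horizon.

    sample_paths has shape (n_samples, horizon); actuals has shape (horizon,).
    """
    sample_paths = np.asarray(sample_paths, dtype=float)
    actuals = np.asarray(actuals, dtype=float)
    per_step = [
        crps_from_samples(sample_paths[:, h], actuals[h])
        for h in range(actuals.shape[0])
    ]
    return float(np.mean(per_step))


def crps_skill_score(crps_model, crps_naive):
    """1 is a perfect forecast, 0 matches the naive baseline, below 0 is worse."""
    return 1.0 - crps_model / crps_naive


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
history = np.cumsum(rng.normal(size=200))
actuals = history[-1] + np.cumsum(rng.normal(size=7))

# Assumed naive baseline: random walk from the last observed value, with
# step noise scaled to the historical first differences.
step_sd = np.std(np.diff(history))
naive_paths = history[-1] + np.cumsum(
    rng.normal(scale=step_sd, size=(500, 7)), axis=1
)

# Hypothetical stand-in for samples from the alternative method.
model_paths = actuals + rng.normal(scale=0.5, size=(500, 7))

skill = crps_skill_score(
    mean_crps(model_paths, actuals), mean_crps(naive_paths, actuals)
)
print(f"CRPS skill score vs naive: {skill:.3f}")
```

In practice you would swap `naive_paths` for the samples your naive forecaster produces and `model_paths` for the samples from the method under test. Both must be scored against the same held-out actuals.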