It looks interesting, but I’m slightly confused about the way this is presented. It feels like it’s coming from the wrong angle.
Specifically, reducing a time series to a sequence of patterns and trying to predict what happens next is something that has been done for decades in some form or another. To me the unique aspect of this is that it fits the approach into a transformer.
So I’d expect to see comparisons against other approaches that do the same thing, not against other transformer approaches.
I wouldn’t be confused if the title was “Inverted Transformers are MORE EFFECTIVE THAN NORMAL TRANSFORMERS For Time Series Forecasting”.
However, if the target audience is transformer folk, then it makes sense; it just seems that I’m looking at it from the other direction.
The equivalence between efficient AI and universal sequence prediction has been known for decades, so it would be surprising if AI algorithms were poor at sequence prediction. Of course, optimal universal sequence prediction is profoundly intractable and memory-hard, which has implications for the limits of AI efficiency and scalability.
There used to be a small hobbyist subculture on the Internet in the late 1990s that designed highly efficient approximate universal sequence predictor algorithms for the challenge of it. Now that AI is a thing, I've often wondered if there were some lost insights there on maximally efficient representations of learning systems on real computers. Most of those people would be deep into retirement by now.
I think we're living in a world where deep learning is winning so consistently that comparison to other methods is often just a time suck. It would be nice to provide a non-DL approach as a baseline, but I would expect it to lag behind the DL methods.
Furthermore, pre-DL methods can often be recast as hand-tuned special cases of DL models: some sequence of linear operations with hand-picked discontinuities sprinkled around. If you can implement the pre-DL method using standard neural network components, then gradient descent training of a neural network "should" find an equivalent or better solution.
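For what it's worth, here is a minimal sketch of that recasting (the lag order, optimizer, and toy random-walk series are my own assumptions, not anything from the paper): an AR(p) forecaster is literally a single linear layer over the last p values, so gradient descent on that layer "should" recover the classical fit or something at least as good.

    # Minimal sketch: an AR(p) forecaster expressed as a one-layer neural network.
    # The lag order p and the toy data are illustrative assumptions.
    import numpy as np
    import torch
    import torch.nn as nn

    p = 8                                        # lag order (assumed)
    rng = np.random.default_rng(0)
    series = np.cumsum(rng.normal(size=2000)).astype(np.float32)  # toy random walk

    # Build (lag-window, next-value) training pairs.
    X = np.stack([series[i:i + p] for i in range(len(series) - p)])
    y = series[p:]

    model = nn.Linear(p, 1)                      # exactly the AR(p) functional form
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.MSELoss()

    Xt, yt = torch.from_numpy(X), torch.from_numpy(y).unsqueeze(1)
    for _ in range(500):                         # gradient descent on the lag weights
        opt.zero_grad()
        loss = loss_fn(model(Xt), yt)
        loss.backward()
        opt.step()

    print(model.weight.data)                     # learned lag coefficients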
Deep learning models are not better for vast problem areas which have analytical design algorithms. Deep learning's succession of triumphs has been across areas where analytical design has proven difficult.
First, there are many optimal, or near-optimal, direct design algorithms for systems that are well characterized. These solutions are more concise, easier to analyze, reveal important insights, and come with guarantees regarding reliability, accuracy, stability, resource requirements, and operating regimes. Those are clear advantages over inductively learned solutions.
Second, just assuming that new algorithms are better than older algorithms is completely irrational, and anathema to the purpose and benefits of science, math, and responsible research in general.
If you are going to propose new algorithms, you need to compare the new algorithm against the previous state of the art.
Otherwise practitioners and future researchers will be driven into dead ends, deploy pointlessly bad designs, forget important knowledge, and, worst of all, lose out on what older algorithms can suggest for improving newer algorithms, with no excuse but gross carelessness.
This is something that DL researchers like to think, but it is definitely not true for time series forecasting. See https://forecastingdata.org/ for some examples where simple non-DL approaches beat state-of-the-art DL systems.
> I think we're living in a world where deep learning is winning so consistently that comparison to other methods is often just a time suck.
This is quite untrue. DL methods work well when there’s a lot of data in closed domains. DL works well by learning from corpuses of text and media where it can make reasonable interpolations.
When you don’t have enough data and you don’t have a known foundation model that you can do zero-shot prediction from, DL doesn’t work better than simpler conventional methods.
> It would be nice to provide a non-DL approach as a baseline, but I would expect it to lag behind the DL methods.
The M competitions have usually shown that very old forecasting algorithms work quite well, with, frankly, way less training overhead and data. Ensemble models usually do best, but for a lot of use cases, DL is probably overkill versus ARIMA or triple exponential smoothing.
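To make "overkill" concrete, here is a rough sketch of the kind of baseline I mean: Holt-Winters (triple exponential smoothing) via statsmodels. The toy monthly series and seasonal period of 12 are illustrative assumptions, not data from any of the competitions.

    # Rough sketch of a classical baseline: Holt-Winters / triple exponential smoothing.
    # Toy series with trend + yearly seasonality; seasonal period of 12 is assumed.
    import numpy as np
    from statsmodels.tsa.holtwinters import ExponentialSmoothing

    rng = np.random.default_rng(1)
    t = np.arange(240)
    series = (10 + 0.05 * t                          # slow trend
              + 3 * np.sin(2 * np.pi * t / 12)       # yearly seasonality
              + rng.normal(scale=0.5, size=t.size))  # noise

    train, test = series[:-24], series[-24:]

    fit = ExponentialSmoothing(train, trend="add", seasonal="add",
                               seasonal_periods=12).fit()
    forecast = fit.forecast(24)                      # fits in milliseconds, no GPU needed

    print("MAE:", np.mean(np.abs(forecast - test)))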