
Here's a list of tools for scaling up transformer context that have GitHub repos:

* FlashAttention: In my experience, the current best solution for n² attention, but it's very hard to scale up beyond the low tens of thousands of tokens. Memory use is O(n) but compute is O(n²); see the sketch after this list. Code: https://github.com/HazyResearch/flash-attention

* Heinsen Routing: In my experience, the current best solution for n×m attention, i.e., mapping n tokens to m tokens. It's like a souped-up version of attention. I've used it to pull up more than a million tokens as context. Memory use and compute are O(nm). It works, but in my (limited) experience, it doesn't work out-of-the-box as well as FlashAttention for n² attention. Code: https://github.com/glassroom/heinsen_routing

* RWKV: A sort-of-recurrent model which claims to have performance comparable to n² attention in transformers. In my (limited) experience, it doesn't. Others seem to agree: https://twitter.com/arankomatsuzaki/status/16390003799784038... . Code: https://github.com/BlinkDL/RWKV-LM

* RMT (this method): I'm skeptical that the recurrent connections will work as well as n² attention or n×m routing in practice, but I'm going to give it a try. Code: https://github.com/booydar/t5-experiments/tree/scaling-repor...
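
To make the memory/compute trade-off concrete, here's a minimal sketch using PyTorch 2's scaled_dot_product_attention, which can dispatch to a FlashAttention-style fused kernel. To be clear, this is the PyTorch API rather than the flash-attention repo's own interface, and the sizes are just illustrative:

    # A minimal sketch (not the flash-attention repo's own API): PyTorch 2's
    # scaled_dot_product_attention can dispatch to a FlashAttention-style fused
    # kernel, so the full n-by-n score matrix is never materialized (O(n) memory),
    # while compute remains O(n^2) in sequence length.
    import torch
    import torch.nn.functional as F

    b, h, n, d = 1, 8, 16384, 64  # illustrative sizes; n is the sequence length
    q = torch.randn(b, h, n, d, device="cuda", dtype=torch.float16)
    k, v = torch.randn_like(q), torch.randn_like(q)

    # Restrict dispatch to the flash kernel (requires CUDA and fp16/bf16 inputs).
    with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False,
                                        enable_mem_efficient=False):
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    print(out.shape)  # torch.Size([1, 8, 16384, 64])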

In addition, the group that developed FlashAttention is working on state-space models (SSMs) that look promising to me. The idea is to approximate n² attention dynamically using only O(n log n) compute. There's no code available, but here's a blog post about it: https://hazyresearch.stanford.edu/blog/2023-03-27-long-learn... [CORRECTION: Code is available. See comment by lucidrains below. I'm hopeful this will go to the top of my list.]
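
For intuition about where the O(n log n) comes from: the core trick in this line of work is replacing the dense n×n attention matrix with a long convolution computed via FFT. Here's a toy sketch of that idea only; it's not the safari repo's API, and causal_fft_conv is just an illustrative helper:

    # Toy illustration of O(n log n) token mixing via a causal FFT convolution,
    # the trick behind S4/H3/Hyena-style long-convolution models.
    import torch

    def causal_fft_conv(u, k):
        """Causally convolve a length-n signal u with a length-n filter k
        in O(n log n), instead of forming an n-by-n matrix."""
        n = u.shape[-1]
        u_f = torch.fft.rfft(u, n=2 * n)  # zero-pad to 2n to avoid circular wrap-around
        k_f = torch.fft.rfft(k, n=2 * n)
        return torch.fft.irfft(u_f * k_f, n=2 * n)[..., :n]  # first n outputs are the causal ones

    n = 1_000_000                       # million-token sequences are feasible at n log n
    u = torch.randn(1, n)               # (batch, length) input channel
    k = 0.001 * torch.randn(n)          # stand-in for a learned/implicitly parameterized long filter
    print(causal_fft_conv(u, k).shape)  # torch.Size([1, 1000000])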

If anyone here has other suggestions for working with long sequences (hundreds of thousands to millions of tokens), I'd love to learn about them.




Very nice list. I didn't know about Heinsen Routing; it looks very interesting.

From my tests, SSMs are a very promising line of research. In my (small) tests with S4, it really does have better characteristics than transformers: it learned faster, handled a larger context, and needed a smaller dataset.


Agree on SSMs: they look promising. They're on my list of "things to explore more thoroughly." I've done very little with them so far. I'm still making my way through the related papers, trying to get a superficial but accurate/intuitive understanding of these models.


The code is here: https://github.com/hazyresearch/safari. You should try it and let us know your verdict.


Thank you. Somehow I missed that. I'm still making my way through the related papers, trying to get a superficial but accurate/intuitive understanding of these models. Embarrassingly, the work hasn't quite 'clicked' for me yet. Looking forward to tinkering with the code!


cs702, fantastic comment. I'm sorta poking around this area too. I'd be curious what benchmark you're using to evaluate performance among these repos. If you're up for it, shoot me an email -- my email is in my profile.


Thank you!

Working on proprietary stuff. Not allowed to share details.

But I'll ask about connecting online :-)



