Oh good, because mine would perform very poorly once the conversation got past one turn, or if I started a new chat without completely reloading the model.
AllenNLP was written for research, not for production. Many of the design choices reflect that.
As far as the vocabulary goes, a lot of AllenNLP components are about experimenting with ways to turn text into vectors, and constructing the vocabulary is part of that. When pre-trained transformers became a thing, this wasn't needed anymore. That's part of why we decided to deprecate the library: very few people experiment with how to construct vocabularies these days, so we didn't want to live with the complexity anymore.
Maybe we need to re-work the docs if the DAG aspects stick out to you so much. The main functionality is the cache. If you have a complex experiment, you can still write the code as if all the steps were fast, and let them be slow only the first time you run it. The DAG stuff is also nice, but less important.
That said, you could run sklearn inside a step. If that's what your experiment needs, it's the right thing to do. The same flexibility is what lets us also support Jax: https://github.com/allenai/tango/pull/313
The DL-specific stuff is in the components we supply. Like the trainer, dataset handling stuff, file formats, and increasingly, https://github.com/allenai/catwalk.
It depends on what you use AllenNLP for. AllenNLP has a ton of functionality for vectorizing text. Most of the tokenizer/indexer/embedder stuff is about that. But these days we all use transformers for that, so there isn't much of a need to experiment with ways to vectorize.
If you like the trainer, or the configuration language, or some of the other components you should check out Tango (https://github.com/allenai/tango). One of Tango's origins is the question "What if AllenNLP supported workflow steps other than read -> train -> evaluate?". We noticed that a lot of work in NLP no longer fit that simple pattern, so we needed a new tool that can support more complex experiments.
If you like the metrics, try torchmetrics. Torchmetrics has almost exactly the same API as AllenNLP metrics.
If you like any of the nn components, please get in touch with the Tango team (on GitHub). We recently had some discussion around rescuing a few of those, since there seems to be some excitement.
AllenNLP has only ever supported Torch. At the moment, Tango only supports Torch as well, but Jax support is well underway.
And yeah, Tango is a lot like a build script. In fact, I used to manage my experiments with Makefiles. Tango is better, though. Results don't have to be single files, and they don't have to live in one filesystem either, so I can run the GPU-heavy parts of my experiments on one machine and the CPU-heavy parts on another. The way Tango versions your code is also better than what Makefiles can do: you have actual control beyond file modification time. And of course, there is the whole Python integration.
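The "beyond file modification time" point can be sketched with a hypothetical cache key that fingerprints the step's code itself, so re-saving an unchanged file invalidates nothing, while actually editing the logic does (unlike Make's mtime check):

```python
import hashlib

def cache_key(fn, *args):
    # Hypothetical sketch: hash the function's compiled bytecode plus its
    # inputs. The bytecode stands in for a "code version"; touching the
    # file without changing the logic leaves the key unchanged.
    fingerprint = hashlib.sha256(fn.__code__.co_code).hexdigest()
    payload = repr((fingerprint, args)).encode()
    return hashlib.sha256(payload).hexdigest()

def train(seed):
    return f"model-{seed}"

assert cache_key(train, 42) == cache_key(train, 42)  # same code, same inputs
assert cache_key(train, 42) != cache_key(train, 7)   # different inputs
```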
>The US is the only Western nation with the demographics (population size and age), political will, technological capacity, and economic ability to challenge a surging China or resurgent Russia (which inherited the might of the Soviet Union to build off of) on the world stage.
>How many Americans would change their tone on military spending if China or Russia were calling the shots on world issues? On spreading their views on governance or human rights? Or if the balance of power shifted so much that more nations decided it was time for them to get nuclear weapons too (imagine Saudi Arabia getting nukes...)?
This was written in 2017. Man, I would love to hear his assessment of how the US is doing vs. China today. So much has transpired since then.
I make a living selling working code, but I agree. I think that, just like how monitoring needs to run at higher priority than actual production workloads, the understanding of programmers needs to be at a higher level than the understanding actually encoded in the production program.
Professional software engineers develop and sell solutions. Code by itself is meaningless; it's the application of code within certain parameters, delivering expected results, that constitutes a solution.