As someone who has a very limited understanding but has tried to use BERT for classification, is BERT still relevant compared to LLMs? Asking because I hardly see any mention of BERT anymore.
- Encoder-based models have much faster inference (are auto-regressive) and are smaller. They are great for applications where speed and efficiency are key.
- Most embedding models are BERT-based (see the MTEB leaderboard), so they're widely used for retrieval.
- They are also used to filter data for pre-training decoder models. The Llama 3 authors used a quality classifier (DistilRoBERTa) to generate quality scores for documents. Something similar is done for FineWeb-Edu.
Wait, I thought GPTs were autoregressive and encoder-only models like BERT used masked tokens? You're saying BERT is auto-regressive, or am I misunderstanding?
You're right. Encoder-only models like BERT aren't auto-regressive and are trained with the MLM objective. Decoder-only (GPT) and encoder-decoder (T5) models are auto-regressive and are trained with the CLM and sometimes the PrefixLM objectives.
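To make the distinction concrete (my own notation, not from the thread): for a token sequence $x_1,\dots,x_T$ and a randomly chosen mask set $M$, MLM predicts the masked tokens from the visible rest of the sequence, while CLM predicts each token from its left context only:

$$
\mathcal{L}_{\mathrm{MLM}} = -\sum_{i \in M} \log p_\theta\left(x_i \mid x_{\setminus M}\right),
\qquad
\mathcal{L}_{\mathrm{CLM}} = -\sum_{i=1}^{T} \log p_\theta\left(x_i \mid x_{<i}\right)
$$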
They're still very useful on their own. But even more broadly, you can often use them in tandem with LLMs. A good example could be a classifier that's used as a "router" of sorts; could be for selecting a prompt template, directing to a specific model, or loading a LoRA or soft prompt vector to be used at inference time.
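As a rough illustration of that router pattern (a minimal sketch, not from the thread; `classifyIntent` is a hypothetical stand-in for whatever small encoder-only classifier you'd actually call):

```go
package main

import "fmt"

// classifyIntent is a hypothetical stand-in for a small encoder-only
// classifier (e.g. a fine-tuned BERT behind an HTTP endpoint). Only the
// routing structure around it matters here.
func classifyIntent(input string) string {
	// In a real setup this would call the classifier and return a label.
	return "chat"
}

// promptTemplates maps a predicted label to a prompt template; it could
// just as well map to a model name or a LoRA adapter to load.
var promptTemplates = map[string]string{
	"code":   "You are a coding assistant. Answer with code.\n\nUser: %s",
	"chat":   "You are a friendly general assistant.\n\nUser: %s",
	"search": "Rewrite the following as a search query: %s",
}

func route(input string) string {
	label := classifyIntent(input)
	tmpl, ok := promptTemplates[label]
	if !ok {
		tmpl = promptTemplates["chat"] // default template for unknown labels
	}
	return fmt.Sprintf(tmpl, input)
}

func main() {
	fmt.Println(route("how do I reverse a slice in Go?"))
}
```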
For many specialized tasks you can run BERTs (and simpler models in general) at scale, with lower latency, at lower cost, and with similar or even better results.
Depends what you’re trying to do. I’m writing a personal assistant app (speech to text) and want to classify the user input according to the current actions I support (or don’t). The flagship LLMs are pretty great at it if you include the classes in the prompt, and they will spit out structured output every time. But, man, they are expensive, and there’s the privacy aspect I’d prefer to respect. I’ve only got 24 GB of RAM, so I can’t run too many fancy local models, and things like llama3.1:8b don’t classify very well.
They’ve drowned in the LLM noise, but they’re definitely still relevant.
- Generative model outputs are often unnecessary, and sometimes actively undesirable
- BERT models are smaller and can run with lower latency and serve larger batches with lower VRAM requirements
- BERT models have bidirectional attention, which can improve performance in many applications
LLMs are “cheap” in the sense that they work well generically, without requiring fine-tuning. Where they overlap with BERT models is mostly that they may work better in low-training-data environments due to better generalization capabilities.
But mostly companies like them because they don’t “require” ML engineers or data scientists on staff. Given the lack of care I see around evaluation in LLM apps, I suspect that’s going to prove to be a faulty premise.
Growing up we called it a Carrom board, which is a square board with 4 pockets in the corners. I never knew there was an American version of it, the Crokinole board.
Canadian. I haven't played Carrom, but it's my understanding it's Indian in origin and plays a bit more like a billiards variant, even going so far as to use tiny pool cues.
I think there are a few different variants of this game. I played the version with the tiny pool cues as a kid, but we called it Couronne. Looking at images online, it seems the main difference between Carrom and Couronne is that in Couronne you hit the pieces with a cue and the pockets are much bigger than in Carrom.
I haven’t played in a while, but as far as I remember there aren’t any pool cues, though there are varying (house) rules on how/where the disc can be flicked.
Generally, crokinole is a much less punishing game than carrom, if we're talking about Indian carrom boards. American carrom boards, which were really popular in after-school programs when I was growing up, have relatively HUGE pockets compared to the Indian boards, in addition to being smaller boards. American carrom is like playing 8-ball, Indian carrom is like playing snooker.
I like carrom a lot, but I'm terrible at it. I'm at least a reasonable player at crokinole, and it's a lot easier to introduce others to the game without them getting too frustrated by it.
Good to see the author mention routing. I've been mentally stuck on mux for a long time and didn't pay attention to the new release features. Happy that I always find things like this on HN.
Thanks for the suggestion, will give it a try. I'm more familiar with Python than Go. I know my way around the Python ecosystem and can make informed decisions about which tool to use. Not so much with Go, so I appreciate your advice.
Chi is amazing. I love the philosophy of extending the stdlib instead of writing an alternative. I try to keep that in mind when writing my own libs or helpers now, and I'm very satisfied with the results.
For example I made a lib to write commands (like cobra or urfave/cli), but based entirely on the `flag` package: https://github.com/Thiht/go-command
Looks nice! I'd like an easier way of setting both long and short flags for a command, i.e. --verbose and -v should do the same. Using `flag` I have to declare everything twice to achieve this.
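For reference, the "declare everything twice" workaround with the standard `flag` package looks roughly like this (a minimal sketch):

```go
package main

import (
	"flag"
	"fmt"
)

func main() {
	// The stdlib flag package has no built-in aliasing, so the long and
	// short forms are registered separately, pointing at the same variable.
	var verbose bool
	flag.BoolVar(&verbose, "verbose", false, "enable verbose output")
	flag.BoolVar(&verbose, "v", false, "enable verbose output (shorthand)")
	flag.Parse()

	fmt.Println("verbose:", verbose)
}
```

Both `-v` and `--verbose` then set the same value, at the cost of the duplicated registration described above.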
This is a really great resource for someone getting started who wants to understand what's going on. I see many people run into issues and try out different git commands without understanding the outcome.
We are working on something content-driven (for an ad or subscription model) with a lot of effort and time, and I am concerned about how this technology will affect all that effort and, eventually, the monetization ideas. But I can see how helpful this tool can be for learning new stuff.
We always talk about how scarce freshwater is, but this image representation has made it difficult to imagine how much supply we have for an ever-growing human population and its growing demand for water, and how long it will last.
The comment you're replying to is about fresh water. Which becomes non-fresh when it mixes into seawater or waste or pollution. No need to leave the Earth.
Admittedly, it's probably better to talk about the cycle, since non-freshwater will be automatically converted back to freshwater via solar energy. But the rate can be slowed: e.g., dump a bunch of toxic stuff in one place, it'll drain to a river, and now everything from that point downstream is no longer freshwater. Or pump up enough groundwater. Or inject toxic crap down where the groundwater lives.
We're quite good at reducing the total amount of freshwater available.
Back in 2012 or sometime around then, I was trying Akka, a Java library, and experimenting with concurrency and such. Around the same time I gave Go a try and it was much less verbose and simpler. Never looked at Java after that, but I never felt Go was verbose.
If I understand correctly, this is like SQLite, but Postgres. I love SQLite, but sometimes I need a little more. So no more saving dates as text, and we get arrays, jsonb, etc. and all the good stuff from Postgres. Am I right?
Every use case and its expectations are different IMO. And yes, if it's a file system, you can always find a way to keep a snapshot. It's too early for this project to deliver everything in one go.