Hacker News | srameshc's comments

As someone who has a very limited understanding but has tried to use BERT for classification: is BERT still relevant compared to LLMs? Asking because I hardly see any mention of BERT anymore.

Yes, they are still used

- Encoder-based models have much faster inference (they aren't auto-regressive) and are smaller. They are great for applications where speed and efficiency are key.

- Most embedding models are BERT-based (see the MTEB leaderboard), so they're widely used for retrieval.

- They are also used to filter data for pre-training decoder models. The Llama 3 authors used a quality classifier (DistilRoBERTa) to generate quality scores for documents. Something similar is done for FineWeb-Edu.


Wait, I thought GPTs were autoregressive and encoder-only models like BERT used masked tokens? You're saying BERT is auto-regressive, or am I misunderstanding?

You're right. Encoder only models like BERT aren't auto-regressive and are trained with the MLM objective. Decoder only (GPT) and encoder-decoder (T5) models are auto-regressive and are trained with the CLM and sometimes the PrefixLM objectives.
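The distinction above can be sketched as attention masks: an encoder position attends to the whole sequence, while a decoder position only attends to the past. A toy NumPy illustration (not any particular model's code):

```python
import numpy as np

def bidirectional_mask(n: int) -> np.ndarray:
    # Encoder-style (BERT, MLM objective): every position may attend
    # to every other position, past and future.
    return np.ones((n, n), dtype=bool)

def causal_mask(n: int) -> np.ndarray:
    # Decoder-style (GPT, CLM objective): position i may attend only
    # to positions <= i, which is what makes generation auto-regressive.
    return np.tril(np.ones((n, n), dtype=bool))

print(causal_mask(3).astype(int))
# [[1 0 0]
#  [1 1 0]
#  [1 1 1]]
```

PrefixLM sits in between: bidirectional attention over the prompt prefix, causal attention over the generated suffix.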

You can mask out the tokens at the end, so it's technically autoregressive.

They're still very useful on their own. But even more broadly, you can often use them in tandem with LLMs. A good example could be a classifier that's used as a "router" of sorts; could be for selecting a prompt template, directing to a specific model, or loading a LoRA or soft prompt vector to be used at inference-time.
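A rough sketch of that router idea, with TF-IDF + logistic regression standing in for a fine-tuned BERT classifier (the intents, templates, and training strings are all made up for illustration):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data; in practice this would be a labeled intent dataset
# and the model would be a small fine-tuned encoder, not TF-IDF.
texts = ["summarize this article", "translate to french",
         "condense the report", "say this in french"]
labels = ["summarize", "translate", "summarize", "translate"]

router = make_pipeline(TfidfVectorizer(), LogisticRegression())
router.fit(texts, labels)

# Hypothetical prompt templates, one per predicted intent.
TEMPLATES = {"summarize": "Summarize:\n{input}",
             "translate": "Translate to French:\n{input}"}

def route(user_input: str) -> str:
    """Classify the request, then fill the matching prompt template."""
    intent = router.predict([user_input])[0]
    return TEMPLATES[intent].format(input=user_input)
```

The same shape works for picking a target model or a LoRA adapter instead of a template: the classifier's label just becomes a lookup key.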

For many specialized tasks you can run BERTs (and simpler models in general) at scale, with lower latency, at lesser cost, with similar or even better results.

Depends what you’re trying to do. I’m writing a personal assistant app (speech to text) and want to classify the user input according to the current actions I support (or don’t). The flagship LLMs are pretty great at it if you include the classes in the prompt and they will spit out structured output every time. But, man, they are expensive and there’s the privacy aspect I’d prefer to adhere to. I’ve only got 24 GB of RAM, so I can’t run too many fancy local models and things like llama3.1:8b don’t classify very well.

So I’m trying BERT models out :)


Try some of the Qwen models. They have some that are slightly larger than 8B that will fit in your 24 GB quite nicely. They have been amazing so far.

My understanding is that BERT can still outperform LLMs for sentiment classification?

To my understanding yes. But I never found a good use-case for sentiment classification.

It seems to be used by YouTube for comment censoring / shadow-banning.

That might make sense.

I used sentiment analysis a few times in recommender systems (for digital media consumption.)
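One way that combination can look: use a sentiment score to reweight candidate items before ranking. A toy lexicon-based scorer stands in for a real sentiment model here; the weighting formula is just illustrative:

```python
# Toy word lists standing in for a trained sentiment classifier.
POS = {"great", "love", "excellent"}
NEG = {"bad", "boring", "hate"}

def sentiment(text: str) -> float:
    """Crude score in [-1, 1]: (positive hits - negative hits) / words."""
    words = text.lower().split()
    return sum((w in POS) - (w in NEG) for w in words) / max(len(words), 1)

def rerank(items):
    """items: list of (item_id, base_score, review_text).
    Boost items with positive review sentiment, demote negative ones."""
    return sorted(items,
                  key=lambda it: it[1] * (1 + sentiment(it[2])),
                  reverse=True)
```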

Also for analyzing Trump's tweets (from 2016): https://mathematicaforprediction.wordpress.com/2016/11/21/te...


They’ve drowned in the LLM noise, but they’re definitely still relevant.

- Generative outputs are not always needed, and are often actively undesirable

- BERT models are smaller and can run with lower latency and serve larger batches with lower vram requirements

- BERT models have bidirectional attention, which can improve performance in many applications

LLMs are “cheap” in the sense that they work well generically, without requiring fine tuning. Where they overlap with BERT models is mostly that they may work better in low training data environments due to better generalization capabilities.

But mostly companies like them because they don’t “require” ML engineers or data scientists on staff. For the lack of care given to evaluation that I see around LLM apps, I suspect that’s going to prove to be a faulty premise.


> - BERT models are smaller and can run with lower latency and serve larger batches with lower vram requirements

The most recent version of Wolfram Language (aka Mathematica) uses BERT models by default for embeddings.

(Say, for this function: https://reference.wolfram.com/language/ref/CreateSemanticSea... .)


Sincere question: how is this different from something like k6? > https://github.com/grafana/k6

As far as I understand, CodSpeed is a tool for continuous benchmarking, while k6 is specifically an HTTP load-testing tool.

Growing up we called it a Carrom board, which is a square board with 4 pockets in the corners. I never knew there was an American version of it in the Crokinole board.


Canadian. I haven't played Carrom, but it's my understanding it's Indian in origin and plays a bit more like a billiards variant, even going so far as to use tiny pool cues.


I think there are few different variants of this game. I played the version with the tiny pool cues as a kid, but we called it Couronne. Looking at images online it seems the main difference between Carrom and Couronne is that in Couronne you hit the pieces with a cue and that the pockets are much bigger than in Carrom.


I haven’t played in a while, but as far as I remember there aren’t any pool cues, though there are varying (house) rules on how/where the disc can be flicked


Different game, but Carrom is supposed to be great as well. I haven't played it but many of the folks I follow on BGG prefer it to Crokinole.


I played both and own a Crokinole board, but strongly prefer Carrom. It's similar but still quite different.


Generally, crokinole is a much less punishing game than carrom, if we're talking about Indian carrom boards. American carrom boards, which were really popular in after-school programs when I was growing up, have HUGE pockets relative to the Indian boards, in addition to being smaller. American carrom is like playing 8-ball, Indian carrom is like playing snooker.

I like carrom a lot, but I'm terrible at it. I'm at least a reasonable player at crokinole, and it's a lot easier to introduce others to the game without them getting too frustrated by it.


Also, a lot of American carrom boards were produced with a checkerboard on one side and a crokinole board on the other.


Crokinole is Canadian


Good to see the author's mention of routing. I had been mentally stuck on mux for a long time and didn't pay attention to the new release features. Happy that I always find things like these on HN.


Nice new feature, would actually make me want to use Go without Gin.


I am over Gin and have been for years yet everyone keeps using it because it has inertia. The docs are garbage.

Big fan of Echo and it has much better docs.

https://echo.labstack.com/


Thanks for the suggestion, will give it a try. I'm more familiar with Python than Go. I know my way around the Python ecosystem and can make informed decisions about which tool to use. Not so much with Go, so I appreciate your advice.


I had to move from Gin to echo for my personal site, the routing in Gin was refusing to serve static resources at the root path without some headache.


I've grown to prefer go-chi over Gin (or Echo), since it's just the standard library with some QoL features on top.


Chi is amazing. I love the philosophy of extending the stdlib instead of writing an alternative. I try to keep that in mind when writing my own libs or helpers now, and I'm very satisfied with the results.

For example I made a lib to write commands (like cobra or urfave/cli), but based entirely on the `flag` package: https://github.com/Thiht/go-command


> For example I made a lib to write commands (like cobra or urfave/cli), but based entirely on the `flag` package: https://github.com/Thiht/go-command

Looks nice! I'd like an easier way of setting both long and short flags for a command, i.e. --verbose and -v should do the same. Using `flag` I have to declare everything twice to achieve this.
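For what it's worth, the usual stdlib workaround is exactly that double registration: bind both names to the same variable. A minimal sketch (the `newFlags` helper is hypothetical):

```go
package main

import (
	"flag"
	"fmt"
)

// newFlags registers --verbose and -v against the same bool,
// since the flag package has no built-in short/long aliasing.
func newFlags(verbose *bool) *flag.FlagSet {
	fs := flag.NewFlagSet("example", flag.ContinueOnError)
	fs.BoolVar(verbose, "verbose", false, "enable verbose output")
	fs.BoolVar(verbose, "v", false, "enable verbose output (shorthand)")
	return fs
}

func main() {
	var verbose bool
	fs := newFlags(&verbose)
	fs.Parse([]string{"-v"})
	fmt.Println(verbose) // prints true
}
```

It works, but both spellings show up separately in the usage text, which is part of why people reach for cobra or urfave/cli.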


Nice CLI lib! I'm still looking for an Argh or Typer equivalent though.


I like it, but with the new http.ServeMux rolled out in Go 1.22, is there any use for Chi anymore?


Good question. The middleware stack it provides is nice.


This is a really great resource for someone getting started to understand what's going on. I see many people run into issues and try out different git commands without understanding the outcome.


It's a really great resource to come back to for those of us who have been using git for over a decade, too!


https://onlywei.github.io/explain-git-with-d3/ is another one that I've used to demonstrate the repository state after various operations.


I always send this to new hires who don't know git. Pretty common in mechanical engineering


We are working on something content-driven (for an ad or subscription model) with a lot of effort and time, and I am concerned about how this technology will affect all that effort and eventually the monetization ideas. But I can see how helpful this tool can be for learning new stuff.


We always talk about how scarce freshwater is, but this image representation has made it hard not to wonder how much supply we have for an ever-growing human population, its growing demand for water, and how long it will last.


Does water leave earth when used?


It's more extreme than that, it stops existing.

The comment you're replying to is about fresh water. Which becomes non-fresh when it mixes into seawater or waste or pollution. No need to leave the Earth.

Admittedly, it's probably better to talk about the cycle, since non-freshwater will be automatically converted back to freshwater via solar energy. But the rate can be slowed—eg, dump a bunch of toxic stuff in one place, it'll drain to a river, now everything from that point and downstream is no longer freshwater. Or pump up enough groundwater. Or inject toxic crap down where the groundwater lives.

We're quite good at reducing the total amount of freshwater available.


Only tiny amounts


Back in 2012 or thereabouts, I was trying Akka, a Java library, experimenting with concurrency and such. Around the same time I gave Go a try and it was much simpler and less verbose. Never looked at Java after that, and I never felt Go was verbose.


If I understand correctly, this is like SQLite, but Postgres. I love SQLite, but sometimes I need a little more. So, no more saving dates as text, and we get arrays, jsonb, etc., and all the good stuff from Postgres. Am I right?


Exactly, all your favourite PG types, plus any that come with extensions such as vectors with pgvector.

We are working on PostGIS too, which will bring the geo types to PGlite.


IIRC SQLite can exist as a flat file and can be backed up. Will this work the same? And will it allow multiple writers?


Every use case and set of expectations is different, IMO. And yes, if it's on a file system, you can always find a way to keep a snapshot. Too early for this project to deliver everything in one go.


Valid question and I am sure it doesn't.

