bioxept's comments | Hacker News

I signed up, but I just get an alert saying I will get 10 free requests when I sign up. It quits the search when I press OK. Running it on Safari on iOS.


Sorry, you were a victim of the outage caused by HN flooding my website! It's back online now if you want to give it a try :)


How do you curate the content you consume? And how do you prevent yourself from consuming non-curated content and losing yourself in it?


Feed readers and self-discipline, I would guess. I don't want to pay for or host a feed reader right now, and I'm bad at self-discipline, so I just limit what social media I use to HN and some blogs.


It actually was a lot of fun. I could see myself exploring some more worlds in the future.


I was easily able to escape the guardrails by buying a teleportation stone from the merchant. It let me explore different parts of the town, free my magical creature from the town hall, and travel to an emerald dimension where I attacked the entire village.

Definitely a crazy ride when you leave the main storyline and just do whatever you like.


Hahah, alright Merlin! That is so far from the narrative arc I had planned, I don’t even know what to say, lol.


What a weird post and that ad is unacceptable.


It seems to me that the buzzword "vector db" leads people to not fully understand what it's actually about and how it even relates to LLMs. Vector databases, or nearest-neighbor algorithms (as they were called before), were already in use for lots of other tasks unrelated to language processing. If you look at them from that perspective, you will naturally think of vector DBs as just another way of doing plain old search. I hope we get some more advancements in hybrid search; most of the time, search is the limiting factor when doing RAG.
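To make that concrete, here is a minimal brute-force nearest-neighbor sketch (my own illustration, not any particular vector DB's API; real systems layer approximate indexes like HNSW on top for scale):

    import numpy as np

    def nearest(query, corpus, k=5):
        # Return indices of the k corpus vectors most similar to the query.
        # Normalize both sides so dot products equal cosine similarity.
        q = query / np.linalg.norm(query)
        c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
        return np.argsort(c @ q)[::-1][:k]

A hybrid setup would simply fuse these similarity scores with a keyword score such as BM25 before ranking.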


Good points... In many ways, before LLMs, vectors were getting so exciting: Sentence Transformers and BERT embeddings felt so instrumental, so powerful... work by the txtai author (especially things like semantic walking) felt incredible, like the next evolution. It's a shame in a way that all the creative and brilliant uses of text embeddings and similarity search didn't really have time to shine or make it into products before ChatGPT made everything except search use cases obsolete...


Btw - I published a paper at EMNLP with the txtai author (David) about using semantic graphs for automatic creation of debate cases!

https://aclanthology.org/2023.newsum-1.10/

Happy to see that David's excellent work is getting the love that it deserves!


Thanks for the nice words on txtai. There have been times this year I've thought about an alternate 2023 where the focus wasn't LLMs and RAG.

ChatGPT certainly set the tone for the year. Though I will say you haven't heard the last of semantic graphs, semantic paths, and some of the work that happened in late 2022 right before ChatGPT. A bit of a detour? Yes. Perhaps the combination will lead to even more interesting features - time will tell.


>It's a shame in a way that all the creative and brilliant uses of text embeddings and similarity search didn't really have time to shine or make it into products before ChatGPT

Yes, they did. Companies offering competitive search or recommendation feeds were all using these text models in production.


I was running one of them, and entering Kaggle competitions throughout 2021 and 2022 using them. Many efforts and uses of sentence-transformers (and new PhD projects) were thrown in the trash with the InstructGPT models and ChatGPT. I mean, it's like developing a much better bicycle (let's say an e-bike) and then cars come out. It was like that.

The future looked incredibly creative with cross-encoders, things like semantic paths, using the latent space to classify - everything was exciting. An all-in-one LLM that eclipsed embeddings on everything but speed was a bit of a killjoy.

Companies that changed their existing indexing to use sentence transformers aren't exactly innovating; that process has happened once or twice a decade for the last few decades. This was the parent's point, I believe, in a way. And tbh, the improvement in results has never been noticeable to me; exact match is actually 90% of the solution to retrieval (maybe not search) already - we just take it for granted because we are so used to it.

I fully believe that in a world without GPT-3, HN would be full of demos using sentence transformers and other cool technology in creative ways, compared to how rarely you see them now.


Also, people seem to have forgotten that the core technique behind sentence transformers (pooling embeddings) works as a form of "medium-term" memory in between "long-term" (vector DB retrieval) and "short-term" (the prompt).

You can compress a large number N of token embeddings into a smaller number of token embeddings, with some loss of information, using pooling techniques like the ones in sentence transformers.
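A minimal sketch of what I mean (my own illustration, not a specific library API): mean-pool consecutive groups of token embeddings, shrinking an (N, d) matrix down by roughly the group factor:

    import numpy as np

    def pool_tokens(embeds, group=4):
        # Lossy compression: mean-pool consecutive groups of token
        # embeddings, (N, d) -> (ceil(N / group), d).
        return np.stack([embeds[i:i + group].mean(axis=0)
                         for i in range(0, len(embeds), group)])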

But I've literally gotten into fights here on HN with people who claimed that "if this was so easy, people would be doing it" and other BS. The reality is that LLMs and embedding techniques are still massively under-tooled. For another example, why can't I average-pool tokens in ChatGPT, so that I could ask "What is the definition of {apple|orange}"? This is notably easy to do in Stable Diffusion land and even works in LLMs - yet even "greats" in our field will go and fight me in the comments when I post this[1] again and again, while I desperately try to get a properly good programmer to implement it for production use cases...

[1] https://gist.github.com/Hellisotherpeople/45c619ee22aac6865c...
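For anyone curious, this is roughly what the {apple|orange} trick looks like (a sketch under my own assumptions, not the code from the gist above; gpt2 and the prompt are just for illustration, and it assumes each word maps to a single token):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    emb = model.get_input_embeddings()  # token id -> embedding table

    prompt = tok("The definition of", return_tensors="pt").input_ids
    apple = torch.tensor(tok(" apple").input_ids)    # assumed single token
    orange = torch.tensor(tok(" orange").input_ids)  # assumed single token

    with torch.no_grad():
        # Average the two token embeddings into one blended "concept"
        # vector and splice it into the prompt as a pseudo-token.
        blended = (emb(apple) + emb(orange)) / 2              # (1, d)
        inputs = torch.cat([emb(prompt), blended.unsqueeze(0)], dim=1)
        logits = model(inputs_embeds=inputs).logits
    print(tok.decode(logits[0, -1].argmax().item()))  # likely next token

This is the same interpolation trick that prompt-blending tools in Stable Diffusion land rely on.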


Share use cases?


>Many efforts and uses of sentence-transformers (and new PhD projects) were thrown in the trash with the InstructGPT models and ChatGPT.

There still exists a need for fast and cheap models where LLMs do not make sense.


That’s cool. Also adding “R”s at various positions leads to interesting effects.

For example:

C[[[RC[[[RC[[[CR[[[C[[C[C[FCR]]FRR]]RRRCRRRF]]FFFRFFFFRR]]RRRRCRRFFFFFFFFF]]FFFRFFFFFFFFFFFFFFFFFFFFRR]]RRRRRRFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF]]FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFCFFFFFFFFFFCFFFFFFFFFFFFFFFFRR]]CC


You will probably have different problems when this happens.


https://www.marc-julian.de

Writing about data science, my own projects, and tech in general. You will also find some "today I learned" posts where I share things I found out while studying and working.


How does this compare to a small discord server? What can it do better?


So it's private and invite-only by default. There is no element of "public", and the focus is on real connections with people you know rather than people you've never met in person. There are also no notifications or "is typing" indicators. The goal is really not to emulate existing chat apps or bombard you with the need for immediate replies. It's a lot more focused on how most of our lives, mine included, operate now: less reactive and more laid back. Group sizes are also capped at 20.


no is-typing notifications is a killer feature. sold.

