
I've set up and use Vicuna-13b for text classification, summarization, and topic modelling. Works like a charm.
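
For the classification part, here's a minimal sketch of what a call can look like, assuming the model is served locally behind an OpenAI-compatible endpoint (e.g. FastChat's API server); the URL, model name, and label set below are placeholders, not what I actually run:

    import requests

    # hypothetical local endpoint; FastChat can expose an OpenAI-compatible API
    URL = "http://localhost:8000/v1/chat/completions"
    LABELS = ["billing", "shipping", "refund", "other"]  # example label set

    def classify(text):
        prompt = (f"Classify the following text into one of {LABELS}. "
                  f"Answer with just the label.\n\nText: {text}")
        resp = requests.post(URL, json={
            "model": "vicuna-13b",  # whatever name the local server registered
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0,
        })
        return resp.json()["choices"][0]["message"]["content"].strip()

    print(classify("My package never arrived and I want my money back."))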

It's also good for math lessons.



Maybe others' experiences are different, but I find smaller models to work just as well for "reductive" tasks.

Dolly sucks for generating long-form content (not very creative), but if I need a summary or classification, it's quicker and easier to spin up dolly-3b than vicuna-13b.
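
To give a sense of how little code "spin up dolly-3b" means here, a sketch following the usage from the databricks/dolly-v2-3b model card (the summarization prompt is just an example):

    import torch
    from transformers import pipeline

    # dolly-v2-3b in bfloat16 fits on a mid-range GPU
    generate = pipeline(model="databricks/dolly-v2-3b",
                        torch_dtype=torch.bfloat16,
                        trust_remote_code=True,
                        device_map="auto")

    doc = "..."  # the text to summarize
    out = generate(f"Summarize the following text in two sentences:\n\n{doc}")
    print(out[0]["generated_text"])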

I suspect OpenAI is routing prompts to select models based on similar logic.


Would like to know how you set this up. A post would be awesome.


There are various posts online on how to set it up, either for Linux or Windows. There was an older post here on how to install OPT-65B on a Mac Studio Ultra, and smaller models on Mac Pros. There was also a post, if I remember correctly, about running vicuna-7b on an iPhone.

Here are a few examples:

https://morioh.com/p/55296932dd8b

https://www.youtube.com/watch?v=iQ3Lhy-eD1s

https://news.ycombinator.com/item?id=35430432

Side note: you need bonkers hardware to run it efficiently. I'm currently using a 16-core CPU, 128 GB of RAM, a PCIe 4.0 NVMe drive, and an RTX 3090. There are ways to run it on less powerful hardware, like 8 cores, 64 GB of RAM, a plain SSD, and an RTX 3080 or 3070, but I happen to have a large corpus of data to process, so I went all in.
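
To put numbers on the VRAM side: a 13B model is roughly 26 GB of weights in fp16, which is why a single 24 GB card like the 3090 usually means quantizing. Here's a sketch with transformers + bitsandbytes 8-bit loading; the checkpoint name is an assumption (I'm using the lmsys release as an example), swap in whatever you actually downloaded:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL = "lmsys/vicuna-13b-v1.5"  # example checkpoint, adjust to yours

    tok = AutoTokenizer.from_pretrained(MODEL)
    # 13B params: ~26 GB in fp16, ~13-14 GB in 8-bit, so 8-bit fits a 24 GB 3090
    model = AutoModelForCausalLM.from_pretrained(
        MODEL,
        device_map="auto",   # spills layers to CPU RAM if the GPU is too small
        load_in_8bit=True,   # requires bitsandbytes
    )

    prompt = "USER: Summarize: <your text here>\nASSISTANT:"
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=128)
    print(tok.decode(out[0], skip_special_tokens=True))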


I think the previous commenter is more interested in your experience with your large corpus of data: what are you doing with it?

I have similar hardware at home, so I wonder how reliably you can process simple queries using domain knowledge + logic that work on mlc-llm, something like: "if you can chose the word food, or the word laptop, or the word deodorant, which one do you chose for describing "macbook air"? answer precisely with just the word you chose"

If it works, can you upload the weights somewhere? IIRC, vicuna is open source.


If these problems are all very similar in structure, then you may not need an LLM. Simple GloVe or W2V embeddings may suffice with a dot product. Then you can plow through a few terabytes in the time the LLM takes to go through a fraction of that.
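
For the "food / laptop / deodorant" query above, a minimal sketch of that idea using gensim's pretrained GloVe vectors and cosine similarity (the vector set and the averaging are my choices, just to illustrate):

    import numpy as np
    import gensim.downloader as api

    # ~130 MB download: 100-dim GloVe vectors (Wikipedia + Gigaword, lowercased)
    glove = api.load("glove-wiki-gigaword-100")

    candidates = ["food", "laptop", "deodorant"]
    query = "macbook air"

    def embed(text):
        words = [w for w in text.lower().split() if w in glove]
        return np.mean([glove[w] for w in words], axis=0)

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    q = embed(query)
    scores = {w: cosine(glove[w], q) for w in candidates}
    print(max(scores, key=scores.get))  # most likely "laptop" for these vectors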


There's an online demo of Vicuna-13b where you can test its efficiency:

https://chat.lmsys.org/


Yes, but can you replicate that functionality using llama.cpp?

If so, what did you run with the main binary?

I haven't been able to get an answer out of llama.cpp, while for the question above I can get 'I chose the word "laptop"' with mlc-llm.
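
For reference, this is roughly what I'm trying on my side, via the llama-cpp-python bindings rather than the raw main binary; the model path is a placeholder for whatever quantized Vicuna file you have:

    from llama_cpp import Llama  # pip install llama-cpp-python

    # placeholder path: a 4-bit quantized Vicuna-13B converted for llama.cpp
    llm = Llama(model_path="./models/vicuna-13b.q4_0.bin", n_ctx=2048)

    question = ('if you can chose the word food, or the word laptop, or the word '
                'deodorant, which one do you chose for describing "macbook air"? '
                'answer precisely with just the word you chose')

    # Vicuna-style chat formatting; greedy decoding so the answer is stable
    out = llm(f"USER: {question}\nASSISTANT:", max_tokens=16, temperature=0)
    print(out["choices"][0]["text"].strip())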


For the tasks I need it for, the efficiency is similar to the online model, only slower. I don't care about conversational functionality.


After two prompts I was astounded by the inaccuracies present in the answers. And they were pretty easy questions.



