Maybe others' experiences are different, but I find smaller models to work just as well for "reductive" tasks.
Dolly sucks for generating long-form content (not very creative), but if I need a summary or classification, it's quicker and easier to spin up dolly-3b than vicuna-13b.
I suspect OpenAI is routing prompts to select models based on similar logic.
There are various posts online on how to set it up, either for Linux or Windows. There was an older post here on how to install opt-65b on a Mac Studio Ultra, and smaller models on Mac Pros. There was also a post, if I remember correctly, about running vicuna-7b on an iPhone.
Side note: you need bonkers hardware to run it efficiently. I'm currently using a 16-core CPU, 128 GB of RAM, a PCIe 4.0 NVMe drive, and an RTX 3090. There are ways to run it on less powerful hardware (say 8 cores, 64 GB of RAM, a plain SSD, and an RTX 3080 or 3070), but I happen to have a large corpus of data to process, so I went all in.
I think the previous commenter is more interested in your experience with your large dataset: what are you doing with it?
I have similar hardware at home, so I wonder how reliably you can process simple queries that need domain knowledge + logic and that work on mlc-llm, something like: "if you can choose the word food, or the word laptop, or the word deodorant, which one do you choose for describing "macbook air"? answer precisely with just the word you chose".
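Roughly what I have in mind, in code (just a sketch: it assumes an OpenAI-compatible local endpoint like the one mlc_llm serve exposes on localhost:8000, and the model id is a placeholder):

    # Rough sketch: send a single-word classification prompt to a local
    # OpenAI-compatible endpoint (e.g. what "mlc_llm serve" exposes).
    # The URL and model id below are placeholders.
    import requests

    prompt = (
        'If you can choose the word "food", the word "laptop", or the word '
        '"deodorant", which one do you choose to describe "macbook air"? '
        'Answer precisely with just the word you chose.'
    )

    resp = requests.post(
        "http://localhost:8000/v1/chat/completions",
        json={
            "model": "vicuna-7b",  # placeholder model id
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0.0,    # deterministic; we want a single word back
            "max_tokens": 5,
        },
        timeout=60,
    )
    print(resp.json()["choices"][0]["message"]["content"].strip())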
If it works, can you upload the weights somewhere? IIRC, vicuna is open source.
If these problems are all very similar in structure, then you may not need an LLM. Simple GloVe or word2vec embeddings with a dot product may suffice. Then you can plow through a few terabytes in the time it takes the LLM to get through a fraction of that.
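For example, something like this (a rough sketch; it assumes you've downloaded the GloVe text vectors, and the file path is a placeholder):

    # Sketch: pick the closest label for a term with GloVe vectors and
    # cosine similarity. Assumes the GloVe text file has been downloaded;
    # the path is a placeholder.
    import numpy as np

    def load_glove(path, vocab):
        vecs = {}
        with open(path, encoding="utf-8") as f:
            for line in f:
                word, *values = line.rstrip().split(" ")
                if word in vocab:
                    vecs[word] = np.array(values, dtype=np.float32)
        return vecs

    labels = ["food", "laptop", "deodorant"]
    query = "macbook"  # GloVe is word-level, so use the head word of the phrase

    vecs = load_glove("glove.6B.300d.txt", set(labels + [query]))

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    best = max(labels, key=lambda w: cosine(vecs[query], vecs[w]))
    print(best)  # likely "laptop"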
It's also good for math lessons.