I've tried out DeepSeek on deepseek.com and it refuses conversations about several topics censored in China (Tiananmen, Xi Jinping as Winnie-the-Pooh).
Has anyone tried if this also happens when self-hosting the weights?
I haven't tried that base model yet, but I have tried the coder model before and experienced similar things: a lot of refusals to write code if the model thought it was unethical or could be used unethically. For example, asking it to write code to download images from an image gallery website would work or not depending on which site it thought it was going to retrieve from.
When I try out the topics you suggest at the huggingface endpoint you link, the answer is either my question translated into Chinese, or no answer when I prompt the model in Chinese:
Interesting - I can't speak to the Huggingface endpoint. I downloaded the 4-bit GGUF model locally and ran it through Oobabooga with instruct-chat template - I expressed my questions in English.
Chinese IPs are not allowed to use ChatGPT.
Chinese credit cards are not allowed for the OpenAI API.
Source: my own experience.
What puzzles me most is the second restriction. My credit card is accepted by AWS, Google, and many other services. It is also accepted by many services which use Stripe to process payments.
Perhaps they are unwilling to operate in a territory where they would be required to disclose every user's chat history to the government, which has potentially severe implications for certain groups of users and also for OpenAI's competitive interests.
> Chinese credit card is not allowed for OpenAI API.
A lot of online services don't accept Chinese credit cards, hosting providers for instance, so I don't think that is specific to OpenAI. The reason usually given for this is excessive chargebacks or (in the case of hosting) TOS violations like sending junk mail (followed by a chargeback when this is blocked). It sounds a little like collective punishment: while I don't doubt that there are a lot of problem users coming from China, with such a large population that doesn't mean the majority of users from the region are a problem. I can see the commercial PoV though: if the majority of chargeback issues and related problems come from a particular region and you get very few genuine customers from there¹, then blocking the area is a net gain despite potentially losing customers.
----
[1] due to preferring local variants (for reasons of just wanting to support local, due to local resources having lower latency, due to your service being blocked by something like the GFW, local services being in their language, any/all the above and more)
It's definitely not a commercial thing but political.
I'm located in Hong Kong and using Hong Kong credit cards has never been a problem with online merchants. I don't think Hong Kong credit cards are particularly bad with chargebacks or whatever. OpenAI has explicitly blocked Hong Kong (and China). Hong Kong and China, together with other "US adversaries" like Iran, N. Korea, etc., are not on OpenAI's supported countries list.
If you have been paying attention, you'll know that US policy makers are worried that Chinese access to AI technology will pose a security risk to the US. This is just one instance of these AI technology restrictions. Ineffectual of course given the many ways to workaround them, but it is what it is.
I don't understand, if ChatGPT is blocked by the firewall, how do you know that ChatGPT is blocking IPs in return? Are there chinese IP ranges that are not affected by censorship that a citizen can use?
Okay but the point is that ChatGPT is blocked by the firewall.
EDIT: I read the comment below about Hong Kong, but I can't reply because I'm typing too fast by HN standards, so I'm writing it here and yolo: "I'm from Italy and I remember when ChatGPT was blocked here after the Garante della Privacy complaint; of course the site wasn't blocked by Italy, but OpenAI complies with local obligations, so maybe that could be the reason for the block. The API was also not blocked in Italy."
EDIT 2: if the website is not actually blocked (the websites that check whether a site is reachable from mainland China lied to me), then I guess they are just complying with local regulations so that the entire website does not get blocked.
it's not blocked by the firewall. i'm in china and i can load openai's website and chatgpt just fine. openai just blocks me from accessing chatgpt or signing up for an account unless i use a VPN and US based phone number for signup
as in, if i open chat.openai.com in my browser without a VPN, from behind the firewall, i get an openai error message that says "Unable to load site" with the openai logo on screen
if the firewall blocks something the page just doesn't load at all and the connection times out
In so far as Hong Kong IPs are "Chinese IPs", we can access OpenAI's website, but their signup and login pages block Hong Kong phone numbers, credit cards and IP addresses.
Curiously, the OpenAI API endpoints work flawlessly with Hong Kong IP addresses as long as you have a working API key.
ChatGPT was not blocked by the GFW for the first few weeks after it was released (if not months, I don't remember), but at that time OpenAI already blocked China.
The geo check only happened once during login at that time, with a very clear message that it's "not available in your region". Once you are logged in with a proxy you can turn off your proxy/VPN/whatever and use ChatGPT just fine.
OpenAI does not allow users from China, including Hong Kong.
Hong Kong generally does not have a Great Firewall, so the only thing preventing Hong Kong users from using ChatGPT is OpenAI's policy. They don't allow registration from Hong Kong phone numbers, from Hong Kong credit cards, etc.
I'd say it's been pretty deliberate.
Reason? Presumably alignment with US government policies of trying to slow down China's development in AI, alongside the chip bans, etc.
Sounds plausible - this is in line with the modern trend to posture by sanctioning innocent people.
Of course, the only demographic these restrictions can affect is casuals. Even I know how to circumvent this; thinking that this could hinder a government agent - who surely has access to all the necessary infrastructure by default - is simply mental.
The now-former board member was a policy hawk. One of their big beliefs is that China is at no risk of keeping up with US companies, due to them not having the data.
I wouldn't be surprised if OpenAI blocking China is a result of them trying to prevent them from generating synthetic training sets.
i know how: you need a verified phone number to open an account, and open ai does not accept chinese phone numbers or known VoIP phone numbers like google voice.
they also block a lot of data center IP addresses, so if you're trying to access chatgpt from a VPN running on blacklisted datacenter IP range (a lot of VPN services or common cloud providers that people use to set up their own private VPNs are blacklisted), then it tells you it can't access the site and "If you are using a VPN, try turning it off."
Probably because of the cost of legal compliance. Various AI providers also blocked Europe until they were ready for GDPR compliance. China has even stricter rules w.r.t. privacy and data control: a lot of data must stay inside China while allowing authorities access. Typically, implementing this properly requires either a local physical presence or a local partner. This is why many apps/services have a completely segregated China offering. AWS's China region is completely sealed off from the rest of AWS, and is offered through a local partner. Similar story with Azure's China region.
I have no idea, but yiyan is short for wenxinyiyan(文心一言), which roughly translates to character-heart-one-(speech/word). Maybe someone who is Chinese could translate it better. So I don't think the name has anything to do with the model.
I do wonder what their backend is. They have the same 3.5/4 version numbering scheme that ChatGPT uses, which could be just marketing (and probably is), but I wonder.
> Also recently released: Yi 34B (with a 100B rumored soon), XVERSE-65B, Aquila2-70B, and Yuan 2.0-102B, interestingly, all coming out of China.
Most AI papers are from Chinese people (either from mainland China or of Chinese ancestry living in other countries). They have a huge pool of brains working on this.
If your GPU has ~16GB of VRAM, you can run a 13B model in "Q4_K_M.gguf" format and it'll be fast. Maybe even with ~12GB.
It's also possible to run on CPU from system RAM, to split the workload across GPU and CPU, or even from a memory-mapped file on disk. Some people have posted benchmarks online [1] and naturally, the faster your RAM and CPU the better.
My personal experience is that running from CPU/system RAM is painfully slow. But that's partly because I only experimented with models that were too big to fit on my GPU, so part of the slowness is due to their large size.
I get 10 tokens/second on a 4-bit 13B model with 8GB VRAM offloading as much as possible to the GPU. At this speed, I cannot read the LLM output as fast as it generates, so I consider it to be sufficient.
Mine is a laptop with an i7-11800H CPU + RTX 3070 Max-Q 8GB VRAM + 64GB RAM (though you can probably get away with 16GB RAM). I bought this system for work and casual gaming, and was happy when I found out that the GPU also enabled me to run LLMs locally at good performance. This laptop cost me ~$1600, which was a bargain considering how much value I get out of it. If you are not on a budget, I highly recommend getting one of the high end laptops that have an RTX 4090 and 16GB VRAM.
With my system, Llama.cpp can run Mistral 7B 8-bit quantized by offloading 32 layers to the GPU (35 total) at about 25-30 tokens/second, or 6-bit quantized by offloading all layers to the GPU at ~ 35 tokens/second.
I've tested a few 13B 4-bit models such as CodeLlama by offloading 37 layers to the GPU, which got me about 10-15 tokens/second.
A CPU would work fine for the 7B model, and if you have 32GB RAM and a CPU with a lot of cores you can run a 13B model as well, though it will be quite slow. If you don't care about speed, it's definitely one of the cheapest ways to run LLMs.
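For reference, a minimal sketch of what GPU offloading looks like with the llama-cpp-python bindings (the model path and layer count are just examples; set n_gpu_layers to whatever fits in your VRAM):

    from llama_cpp import Llama

    # Load a 4-bit quantized 13B model; n_gpu_layers controls how many
    # transformer layers are offloaded to VRAM (the rest run on the CPU).
    llm = Llama(
        model_path="./codellama-13b.Q4_K_M.gguf",  # example path
        n_gpu_layers=37,  # lower this if you run out of VRAM
        n_ctx=4096,
    )

    out = llm("Write a Python function that reverses a string.", max_tokens=256)
    print(out["choices"][0]["text"])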
It's not mentioned in the paper but this month OpenChat 3.5 released the first 7b model that achieves results comparable to ChatGPT in March 2023 [1]. Only 8k context window, but personally I've been very impressed with it so far. On the chatbot arena leaderboard it ranks above Llama-2-70b-chat [2].
In many ways open source LLMs are actually leading the industry, especially in terms of parameter efficiency and shipping useful models that consumers can run on their own hardware.
This month there's also Starling-7B, which is a fine tune of OpenChat with high-quality training data, and ranks even higher than OpenChat.
Strangely, despite the impressive-looking benchmarks of all these open source small models, they all seem a bit dumb to me when I invoke my standard test. I just ask: "who are you?" and they usually say they're ChatGPT. Okay, I can forgive that since they're obviously trained on ChatGPT-generated data. But then I also tried changing the identity with a prompt ("You are Starling, not ChatGPT, and you are created by Berkeley, not OpenAI. Who are you?") and it still gave weird responses that are somehow a mix of both identities. For example, it says in one sentence that it's ChatGPT and then in another sentence in the same response that it's not.
Oh wow, and it has far fewer guardrails than either Llama2 (which is horrible in that regard) or GPT3.5; that's the first time I'm actually really impressed by an open model.
But Mistral 7B has horrible writing. This one, in my tests, wrote actual sentences that made sense, which IME for 7B is extremely impressive. Writing is still far worse than GPT 3.5, but well, 7B.
In my tests, Mistral-based models' writing was excellent, particularly zephyr-7b-beta and starling-7b-alpha derivatives (original Mistral is somewhat too dry). Far better than everything before in OSS (including 70B models), and certainly on par with GPT-3.5.
Apparently trained on lots of refusals too, which speaks to the high competence of whoever was setting up the dataset. It's one string regex to filter them out and get more performance, for fuck's sake.
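For what it's worth, a minimal sketch of that kind of filter, assuming a list of instruction/response pairs and a few illustrative refusal phrases (the exact phrases depend on the dataset):

    import re

    # Hypothetical refusal markers; extend with whatever your dataset actually contains.
    REFUSAL_RE = re.compile(
        r"as an ai (language )?model|i cannot assist with|i'm sorry, but i can't",
        re.IGNORECASE,
    )

    def drop_refusals(samples):
        """Keep only samples whose response doesn't look like a canned refusal."""
        return [s for s in samples if not REFUSAL_RE.search(s["response"])]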
The problem with those numbers is they hit the internal limit before you use all those tokens. There's a limit to how many rules or factors their conditional probability model can keep track of. Once you hit that having a bigger context window doesn't matter.
That's insane. The highest I've personally seen in the open-source space is RWKV being trained on (IIRC) 4k but being able to handle much longer context lengths in practice due to being an RNN (you can simply keep feeding it tokens forever and ever). It doesn't generalize infinitely by any means but it can be stretched for sure, sometimes up to 16k.
It's not a transformer model though, and old context fades away much faster / is harder to recall because all the new context is layered directly on top of it. But it's quite interesting nonetheless.
> It's not a transformer model though, and old context fades away much faster / is harder to recall because all the new context is layered directly on top of it.
That's a well-known limitation. But if you actually know that a "context" comprises multiple sentences (or other elements of syntax) and that any ordering among them is completely arbitrary, the principled approach is to RNN-parse them all in parallel and sum the activations you end up with as vectors, like in a bag-of-words model, essentially enforcing commutativity on the network: that's pretty much how attention-based models work under the hood. The really basic intuition is just that a commutative and associative function can be expressed (hence "learned") as a vector sum, modulo some arbitrary conversion of the inputs and outputs.
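A toy numpy sketch of that idea, just to make the commutativity point concrete (the shapes and the bare-bones RNN are deliberately minimal, not any particular published architecture):

    import numpy as np

    def rnn_encode(tokens, W_h, W_x):
        """Run a bare-bones RNN over one sentence; return the final hidden state."""
        h = np.zeros(W_h.shape[0])
        for x in tokens:  # x: embedding vector for one token
            h = np.tanh(W_h @ h + W_x @ x)
        return h

    def encode_context(sentences, W_h, W_x):
        # Parse each sentence independently (so this can run in parallel),
        # then sum the resulting vectors: the order of sentences no longer matters.
        return sum(rnn_encode(s, W_h, W_x) for s in sentences)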
The numbers are high, but whether 8k is low depends on your use case. Do you want to process whole book chapters, or feed lots of related documents at the same time? If not, and you're just doing a normal question/answer session with some priming prompt, 8k is already a lot.
to be fair, I think the ability of these models to actually use these contexts beyond the standard 8k / 16k tokens is pretty weak. RAG based methods are probably a better option for these ultra long contexts
Most 4K models can use context window extension to get to 8K reasonably, but you're starting to see 16K, 32K, 128K (see YaRN for example) tunes become more common, or even a 200K version of Yi-34B.
YaRN is to blame for making llama.cpp misbehave if you accidentally zero-initialize the llama_context_params structure rather than calling llama_context_default_params :)
I'm finding Mistral good at creative literature, and it is fairly adept at taking instructions, good enough for my purposes, and it runs locally on a consumer CPU. The future of open source local models looks bright.
It depends on what you're doing... Just for reference, here is a small showcase of the capabilities that I've trained on a 13 billion parameter llama2 fine tune (done with qlora).
Amazing work, I've really wanted to get into knowledge graph generation with LLM's for the last year but haven't found the time. Glad to see someone making good progress on the idea!
I was busy adding `chat template` support to vLLM recently, so the model (and any others that implement it properly) will work seamlessly with a clone of the OpenAI chat/completions endpoint.
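For anyone curious, this is roughly what talking to a vLLM server through that OpenAI-compatible endpoint looks like (the URL and model name are placeholders for whatever you're serving):

    from openai import OpenAI

    # Point the standard OpenAI client at a local vLLM server instead of api.openai.com.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

    resp = client.chat.completions.create(
        model="teknium/OpenHermes-2.5-Mistral-7B",  # whatever model the server loaded
        messages=[{"role": "user", "content": "Explain what a chat template does."}],
    )
    print(resp.choices[0].message.content)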
We're nearing a point where we'll just need a prompt router in front of several specialised models (code, chat, math, sql, health, etc)... and we'll have a local Mixture of Experts kind of thing.
1. Send request to router running a generic model.
2. Prompt/question is deconstructed, classified, and proxied to expert(s) xyz.
3. Responses come back and are assembled by generic model.
Is any project working on something similar to this?
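To make it concrete, a rough sketch of steps 1-3, assuming every expert sits behind an OpenAI-compatible endpoint (the expert names, ports, and routing labels are all made up):

    from openai import OpenAI

    # Hypothetical expert endpoints, each serving a specialised model.
    EXPERTS = {
        "code": OpenAI(base_url="http://localhost:8001/v1", api_key="x"),
        "math": OpenAI(base_url="http://localhost:8002/v1", api_key="x"),
        "chat": OpenAI(base_url="http://localhost:8003/v1", api_key="x"),
    }
    router = OpenAI(base_url="http://localhost:8000/v1", api_key="x")  # generic model

    def ask(question: str) -> str:
        # 1. The generic model classifies the request.
        label = router.chat.completions.create(
            model="generic",  # placeholder model name
            messages=[{"role": "user",
                       "content": f"Answer with one word (code/math/chat): {question}"}],
        ).choices[0].message.content.strip().lower()

        # 2. Proxy the question to the matching expert (fall back to chat).
        expert = EXPERTS.get(label, EXPERTS["chat"])
        answer = expert.chat.completions.create(
            model="expert",  # placeholder model name
            messages=[{"role": "user", "content": question}],
        ).choices[0].message.content

        # 3. With a single expert the assembly step is trivial; with several,
        #    the generic model would merge their answers here.
        return answer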
I also think this is the route we are heading, a few 1-7B or 14B param models that are very good at their tasks, stitched together with a model that's very good at delegating. Huggingface has Transformers Agents which "provides a natural language API on top of transformers: we define a set of curated tools and design an agent to interpret natural language and to use these tools"
Some of the tools it already has are:
Document question answering: given a document (such as a PDF) in image format, answer a question on this document (Donut)
Text question answering: given a long text and a question, answer the question in the text (Flan-T5)
Unconditional image captioning: Caption the image! (BLIP)
Image question answering: given an image, answer a question on this image (VILT)
Image segmentation: given an image and a prompt, output the segmentation mask of that prompt (CLIPSeg)
Speech to text: given an audio recording of a person talking, transcribe the speech into text (Whisper)
Text to speech: convert text to speech (SpeechT5)
Zero-shot text classification: given a text and a list of labels, identify to which label the text corresponds the most (BART)
Text summarization: summarize a long text in one or a few sentences (BART)
Translation: translate the text into a given language (NLLB)
Text downloader: to download a text from a web URL
Text to image: generate an image according to a prompt, leveraging stable diffusion
Image transformation: modify an image given an initial image and a prompt, leveraging instruct pix2pix stable diffusion
Text to video: generate a small video according to a prompt, leveraging damo-vilab
It's written in a way that allows the addition of custom tools so you can add use cases or swap models in and out.
I like the analogy to a router and local Mixture of Experts; that's basically how I see things going, as well. (Also, agreed that Huggingface has really gone far in making it possible to build such systems across many models.)
There's also another related sense in which we want routing across models for efficiency reasons in the local setting, even for tasks with the same input modalities:
First, attempt prediction on small(er) models, and if the constrained output is not sufficiently high probability (with highest calibration reliability), route to progressively larger models. If the process is exhausted, kick it to a human for further adjudication/checking.
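In sketch form, that cascade might look something like this (the threshold and the way each model reports a calibrated confidence are assumptions, not a specific implementation):

    def cascade(prompt, models, threshold=0.9):
        """Try progressively larger models; give up to a human if none is confident.

        `models` is ordered smallest to largest; each callable returns
        (answer, confidence), where confidence is some calibrated probability
        assigned to the constrained output.
        """
        for model in models:
            answer, confidence = model(prompt)
            if confidence >= threshold:
                return answer
        return None  # process exhausted: route to a human for adjudication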
The first layer could be a mix of NLP and zero-shot classification to clarify the nature of the request.
Then, using an LLM, deconstruct the request into several specific parts that would be sent to specialized LLMs.
Then stitch it back together at the end, again with an LLM as the summarization machine.
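That first layer is cheap to prototype with an off-the-shelf zero-shot classifier, e.g. something along these lines (the labels are just examples):

    from transformers import pipeline

    # BART fine-tuned on MNLI works as a generic zero-shot classifier.
    classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

    labels = ["code", "math", "health", "sql", "general chat"]
    result = classifier("Write a query that joins orders and customers", labels)
    print(result["labels"][0])  # highest-scoring label, e.g. "sql"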
Problem is running so many LLMs in parallel means you need quite a bunch of resources.
Yeah, it shouldn't be too difficult to build this with python. I wonder why none of the popular routers like https://github.com/BerriAI/litellm have this feature.
> Problem is running so many LLMs in parallel means you need quite a bunch of resources.
Top-of-the-line MacBooks or Minis should be able to run several 7B or even 13B models without major issues. Models are also getting smaller and better. That's why we're close =)
Yeah that would save disk space! In terms of inference, you'd still need to hold multiple models in memory though, and I don't think we're that close to that (yet) on personal devices. You could imagine a system that dynamically unloads and reloads the models as you need them in this process, but that unloading and reloading would be pretty slow probably.
It was rumored a few months ago that this is how GPT-4 works: a controller model routing data to expert models. Perhaps also by running all the experts and comparing probabilities. So far as I know that's just speculation based on a few details leaked on Xitter though.
Current ~70B models like Llama 2 70B are on par with ChatGPT 3.5. The best smaller models can appear on par at first glance, but they hallucinate at a much higher rate and lack knowledge of the world. GPT 4 'gets' things at a deeper level and no open source model is even close.
A year is a good timeframe to evaluate things: the rest of the world seems to lag behind OpenAI by around 12-18 months, at least with LLMs and image generation.
On the other hand open source tech usually has additional features for controlling output that OpenAI never bothers to implement, like llama.cpp’s grammars or ControlNet. So in that sense open source is usually ahead of OpenAI in terms of customizability.
On the other hand, GPT models are converging down. GPT-4 Turbo degraded performance so much that certain 13B models now produce more consistent results in reasoning. I have a marathon test here, for example https://chat.openai.com/share/dfd9b9ae-7214-4dd7-ad20-7ee07a... with purposefully open-ended and somewhat ambiguous requests to see how models perform, and GPT-4 Turbo chat is just not that good: it confuses the persons, didn't pick the right one for abduction, didn't change topic when requested, picked a person from the wrong set when recalling, and didn't change language when asked... It knows a lot when asked zero-shot questions, but in terms of self-consistency and attention it is nowhere near GPT-4.
I don't think using examples derived from ChatGPT is a fair comparison of the underlying models. OpenAI has many optimization tricks on the ChatGPT side that are unrelated to the underlying models being used.
We do know of course that ChatGPT is most likely using 4-Turbo from the decrease in latency and increase in unhelpful answers.
We cannot say that the models are "converging down" though. I don't remember the marketing materials but from the model side we all realize that the Turbo models have some type of quantization/optimization that makes them cheap and fast. 4-Turbo is 3x cheaper than 4, substantially quicker and provides better results than 3.5-Turbo. Amazing progress in my arena.
There were many rumors (and it probably was true) that OpenAI was hemorrhaging cash on GPT4 requests. So it makes tons of sense for them to sprint towards a turbo model at the expense of some ability. GPT4-turbo still is ridiculously powerful anyway.
Your credibility is killed by thinking using an API can guarantee which model you're getting. It's entirely black box. If OpenAI wants to lie to you, they can.
The point is exactly that the model people are experiencing is converging down with every subsequent update, and I even mentioned that it's nowhere near the original GPT-4. idk, possibly read it again slower instead of jumping to credibility and whatnot.
We have to ban accounts that post this way. It's not what this site is for, and it destroys what it is for. Moreover, we've had to warn you about this more than once over the years.
I don't want to ban you, so if you'd please review https://news.ycombinator.com/newsguidelines.html and stick to the rules from now on, we'd appreciate it. That means no swipes and no personal attacks, among other things.
Edit: you've unfortunately been doing this in other recent threads too:
We really are going to have to ban you if this keeps up, so if you'd please make whatever change is needed to not post like this again, that would be good.
BoorishBear, please stop with the aggressive, condescending tone. You seem to have some productive counterarguments but they're getting lost in your disrespectful language.
I don't have a feature to flag your comments and your profile does not point to a way for me to message you privately so this was my only option to call you out
What you call "tone policing" I call "violating numerous HN guidelines":
> Be kind. Don't be snarky. Converse curiously; don't cross-examine. Edit out swipes.
> When disagreeing, please reply to the argument instead of calling names.
> Please don't sneer, including at the rest of the community.
> Please don't post shallow dismissals, especially of other people's work.
I don't think OpenAI is ever going to be ahead in image generation; they were lapped very soon after DALL-E, and every real workflow I've seen uses Midjourney or Stable Diffusion. The reverse (GPT-4 Vision) is well ahead of open source though.
The original is leagues behind anything current, but DALL-E 3 absolutely blows any state-of-the-art generative model out of the water, including Midjourney 5.2 and SDXL, in terms of pure prompt accuracy and coherence.
Midjourney still has the edge in quality, but it's a moot point if it takes you 1000 v-rolls to get to your original vision.
If all you're generating is anime waifus then MJ/NovelAI/Niji will suffice, but prompts featuring relatively complex scenes or actions come out amazing on DALL-E 3.
And of course, it unfortunately goes without saying that OpenAI's DALL-E is going to be the most restrictive in terms of censorship.
I generated these from DALL-E 3 instantly. Try to generate them in any other commercial offering. Go ahead. I'll wait...
An 80s photograph of the Koolaid Man breaking through the Berlin Wall.
Comic illustration set at a festive children's party. The main focus is on the magician who looks uncannily like a well-known fictional wizard. He's trying to say abracadabra but accidentally uses the killing curse.
SDXL has controlnet for other kinds of non-text input (like scribbles or just masks). The results are much easier to control in my opinion (a picture is worth thousands of prompt words).
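For context, driving SDXL with a ControlNet in diffusers looks roughly like this (the model IDs and the precomputed conditioning image are examples; depth, scribble, etc. variants work the same way):

    import torch
    from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline
    from diffusers.utils import load_image

    controlnet = ControlNetModel.from_pretrained(
        "diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16
    )
    pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        controlnet=controlnet,
        torch_dtype=torch.float16,
    ).to("cuda")

    canny_image = load_image("my_edge_map.png")  # precomputed edge/scribble image
    image = pipe("a watercolor castle on a cliff", image=canny_image).images[0]
    image.save("out.png")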
For pure prompt coherence though, I think Ideogram is not far behind DALL-E 3.
EDIT: okay, I just tried Ideogram. It's not terrible and seems to do an okay job on text generation, but I'd still say it's a distant second compared to DALL-E 3. However, having the ability to maintain image continuity and make refinements of your initial image based on corrections like "Make the building larger" or "He should have a more prominent forehead" is a game changer (e.g. InstructPix2Pix), and DALL-E 3 is the only one that's got it.
SDXL and even some SD 1.5 checkpoints are great. My current workflow is:
1. Generate initial draft image in DALL-E 3 (iterate as necessary) - it's essentially the ONLY good InstructPix2Pix model.
2. Bring into InvokeAI and inpaint with stuff that might be considered censored in DALL-E 3.
I'd like to see some proof of Ideogram - it looks... very mobile/instagrammy from the landing page. If you have an account, try out my prompts I'd like to see what you're able to produce.
> Midjourney still has the edge in quality, but it's a moot point if it takes you 1000 v-rolls to get to your original vision.
I can corroborate this. I wanted about 6 images for a presentation. I rolled ~300 MidJourney images. Most of them looked great, but none of them did what I wanted. I rolled ~50 DALL-E 3 images.
In the end, I only picked DALL-E 3 images. They were qualitatively not as good as MidJourney; for example, when you zoom in you see distortions, or they're a bad fit for 16:9 format. But only DALL-E 3 was able to draw the things I wanted.
While yours is what you'd want, this arguably looks more like the super cheesy children's TV commercials back in the day and beats the ideogram take.
The Midjourney generations all appear to be referencing Halloween costumes or terrible cosplays, as if there are no trademarked koolaid men in their training set.
yeah, I did some rolls of this image for MJ but that was back in v4 and I wasn't very impressed - doesn't look like it's made much progress. The original commercials, while silly looking, are very visually identifiable as the Koolaid man.
I remember hearing that the first versions of MJ used the LAION image set for training data - I'd be curious to see if it has any training data containing the Koolaid man.
I did a search through my MJ history from the past year and added the results to the imgur link to include my attempts at generating the Koolaid man from v3/v4/v5.2.
Strong disagree on this from me as well. DALL-E 3 is miles ahead of the latest Midjourney/Stable Diffusion in image generation. The only real area it falls short vs the other options right now is in how nannying it can be.
I have found OpenAI to be superior for complex prompts, especially where written messages like "Get better, Mom" are expected in the images. The distant second would be Ideogram.
I am using these tools to send custom personal messages to close friends and family.
LLMs perhaps (I'm not sure either way, everything moves too quickly), but SDXL 1.0 (July 26, 2023) was a lot better than DALL•E 2 (6 April, 2022). I think DALL•E 3 (August 10, 2023) is a bit better than SDXL, but other than text generation their quality seems very close to me.
(That said, perhaps I'm Clever Hans-ing myself by only using SDXL for what it's good at. It's terrible at dragons every time I've tried that…)
The only thing I would argue is that JSON generation and function calling show a noticeable decrease in quality of output in certain uses. I have had a hard time writing tests to measure it, but it's noticeable to my human eyes when I compare various implementations I have written.
No comment from me on the question in the title (because I don't know enough to have an opinion), but since others are discussing various open models I will mention another that I've been enjoying tonight: DeepSeek 67B
I’ve found Mistral OpenOrca is pretty much as good as GPT4-turbo for creative writing/analysis. Actually it tends to output very similar text, which is suspicious, but whatever it saves me a lot of money.
Mistral OpenOrca is very good at task following as well. It's slightly less reliable than GPT 3.5/4, but the difference in quality for my text processing tasks is pretty much a toss-up.
Long term it's almost unavoidable that open source LLMs start catching up. One factor that's worth considering too is cost. The open source community is much more resource constrained and they've really accelerated the pace of development in <30B parameter models.
Google and Meta and all the funded companies also are not even close to GPT 4, so I doubt cost is the biggest factor. Claude is the only model that is decent other than OpenAI's.
My understanding is that the models are pretty comparable, but nobody's reinforcement training set is nearly as good as OpenAI's, so OpenAI is able to fine-tune their model to give more accurate results.
This is an industry where cost will be an issue. It reminds me of Rackspace and others trying to win with OpenStack “because open.” AWS and Azure won. Even Google is third.
The big players will win, and there will be a niche for open tools.
Google only lost because they couldn’t re-adjust their business for their paid products to not be similar to their advertising products.
I can only speak for the European enterprise scene, but AWS came first and in the beginning they went a very "Googley" route of not having very great support and very little patience for local needs. Then Azure came along with their typical Microsoft approach to enterprise, which is where you get excellent support and you get contacts into Microsoft who will actually listen and make changes, well, if the changes align with what Microsoft wants. I know Microsoft isn't necessarily a popular company amongst people who've never interacted with them on an Enterprise level, but they really are an excellent IT business partner because they understand that part of being an Enterprise partner is letting CTOs tell their organisation that they know X is having issues but that Microsoft headquarters is giving them half-hourly updates by phone. Sort of useless from a technical perspective, immensely useful for the CTO when 2000 employees can't log into Outlook. Another good example is how, when Teams rolled out enabled for all users by default, basically every larger organisation in the world went through the official channels and went "nonononono", and a few hours later it was off by default.
Now, when Amazon first entered the European market they were very “Googley” as I said, but once they realized Microsoft business model was losing them customers, they changed. We went from having no contacts to having an assigned AWS person and from not wanting to adopt the GDPR AWS actually became more compliant than even what Azure currently is.
Google meanwhile somehow managed to make the one product they were actually selling (education) worse than it was originally, losing billions of dollars on all the European schools who could no longer use it and be GDPR compliant. The Chinese cloud options obviously had similar data privacy issues to Google and never really became valid options. At least not unless China achieves the same sort of diplomatic relationship with the EU that the US has, which is unlikely.
So that's the long story of why only two of the major cloud providers "won". With the massive price increases, however, more and more companies are leaving, especially Azure, for their own setups. This isn't necessarily a return to having your own iron in the basement; often it's going to smaller cloud providers and then having a third party vendor set up something like Kubernetes.
Right now, Microsoft is winning the AI battle. Not so much because it's better, but because it comes with Office365. Office365 was already a sort of monopoly on Office products, and is now even more so. A good example is again how Teams became dominant, even though it wasn't really the best option for a while and is now only the best option because of how it integrates directly with your SharePoint Online, which is where most enterprise orgs store documents these days. So too is Copilot currently winning the AI battle for organisations who can't really use a lot of the other options because of data privacy issues. So while Copilot isn't as good as GPT, it's still what we are using. But if it ever gets too expensive, Microsoft's position is not as secure as you may think. Especially not if we start seeing more training sets, or if EU and US relations worsen.
I think the most likely outcome, at least here in the EU, is that anti-competition laws eventually take a look at Office365 because of how monopolised it is. Or the EU actually follows through on their "a single vendor is a threat to national security" legislation and forces half of the banking/energy/defense/and-so-on industries to pick something other than Microsoft. Which will be hilariously hard, but if successful (which it probably won't be, because it's hilariously hard) will lead to more open products.
Out of personal experience, open source LLMs have not yet reached the quality of GPT 3.5, despite multiple claims with dubious benchmarks. That said, they are already useful as of today and can even run on your local machine. I regularly use them with my Neovim plugin gen.nvim [1] for simple tasks and they save me a lot of time. I'm excited about the future!
I've been somewhat disappointed with the performance of the open models.
The claims of certain models outperforming GPT-3.5-Turbo and approaching GPT-4 fail to hold up to their benchmark results in real-world scenarios, potentially due to data contamination in assessments, based on my testing.
As noted in the linked survey paper, some models may outperform 3.5-Turbo in specific, narrow areas, depending on the model. Yet, we still lack a general model that definitively exceeds 3.5-Turbo in all respects.
I'm concerned that while we're still striving to reach 3.5-Turbo's performance level, OpenAI may unveil a new next-generation model, further widening the performance gap! Back in the summer, I had higher hopes that we would have surpassed the 3.5 threshold by now.
The performance gap has been surprisingly large. It is especially noticeable in areas requiring consistent structured output or tool use from the LLM. This is where open models particularly falter.
There are tools that you can use to force the model to give structured output, such as llama.cpp's GBNF grammars for example. They're a bit harder to use than just asking GPT-4, but they work pretty well for what I use them for.
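As a small illustration, here's roughly how a GBNF grammar is passed through the llama-cpp-python bindings to constrain output (the grammar below is a trivial yes/no example, not a full JSON grammar, and the model path is a placeholder):

    from llama_cpp import Llama, LlamaGrammar

    # A tiny grammar: the model may only answer "yes" or "no".
    grammar = LlamaGrammar.from_string('root ::= "yes" | "no"')

    llm = Llama(model_path="./mistral-7b-instruct.Q4_K_M.gguf")  # example path
    out = llm("Is the sky blue? Answer yes or no:", grammar=grammar, max_tokens=4)
    print(out["choices"][0]["text"])  # constrained to "yes" or "no"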
Absolutely, pick a complicated problem and keep breaking it down with an existing model (whatever is SOTA) until you have a consistent output for each step of your problem.
And then stitch all the outputs together into a coherent single response for your training pipeline.
After that you can do things like create q&a pairs about the input and output values that will help the model understand the relationships involved.
With that, your training loss should be pretty reasonable for whatever task you are training.
The other thing is, don't try and embed knowledge. Try and train thought patterns when specific knowledge is available in the context window.
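A skeletal version of that loop, assuming an OpenAI-style client as the teacher model (the prompts, model name, and pair format are only illustrative):

    from openai import OpenAI

    teacher = OpenAI()  # whatever SOTA model you use to generate the data

    def build_training_example(problem: str) -> dict:
        # 1. Break the problem into smaller, consistent steps.
        steps = teacher.chat.completions.create(
            model="gpt-4",  # example teacher model
            messages=[{"role": "user",
                       "content": f"Break this problem into numbered sub-steps:\n{problem}"}],
        ).choices[0].message.content

        # 2. Solve the steps and stitch them into one coherent final answer.
        answer = teacher.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user",
                       "content": f"Solve these steps one by one, then give a single coherent final answer:\n{steps}"}],
        ).choices[0].message.content

        # 3. Emit an instruction/response pair for the fine-tuning pipeline;
        #    Q&A pairs about intermediate values can be generated the same way.
        return {"instruction": problem, "response": answer}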
Amethyst Mistral 13B q5 gguf is what I'm using most of the time now. Synthetic datasets are great to finetune with; there is no moat in having inaccessible literature data sets.
I'm offline now because too many of my ideas and domain names have been registered too soon after I conversed with ChatGPT-4.
I'm open to the idea of people reacting to similar stimuli and coming up with the same ideas at the same time, but I didn't like that experience, and I can run these models on my M1 with LM Studio so easily.
I do think some chats get flagged when the model says something seems novel, like Albert Einstein working at the patent office. Not worth making it my whole identity trying to prove that; it was just the catalyst I needed to try 7B and 13B models seriously, and I'm quite pleased.
Is "Amethyst Mistral 13B" a llama fine tune? I searched for it on huggingface and only found the GGUF version, the link to the original model is broken
you think OpenAI employees watch your conversations and register your domain names? Or that OpenAI has a system in place where they try to profit from registering domain names people talk about?
or somebody in between, yes. A random contractor, an intern, someone at the data center, an analytics package nobody put scrutiny on - who knows, but the difference doesn't matter after the experience. It's a vulnerability surface we all know exists and have to trust at all times no matter what assurance we get, as it could change at any time.
Although I find the model to be very agreeable, it will disagree and generally tell me when it finds a concept "novel" if I've identified a friction. I think certain words can be flagged for review to stand out in the sea of conversations it has.
Open source options lacking parts of the ChatGPT approach which make it successful doesn't make the comparisons unfair; it explains why ChatGPT wins right now. There's nothing stopping open source options from using the same MoE architecture, the solutions just don't (to the same effect at least) right now.
The history of open source has been that companies who have customers with massive customization requirements land on the open source side of the equation. Companies who don't view a component as core to their product often land in a similar state.
There is almost certainly at least one major firm that wants a GPT-5-like offering, but doesn't view the model as core to their business (Meta). It's also wholly unclear if large models are necessary, or simply convenient. In a similar vein, it's unclear that data must be labeled by humans; the open source data situation is getting better by the day.
I'd expect that we'll see OpenAI hold an edge for many years, maybe we'll see a number two player as well for the foundation model, but after that everybody else will base off an open source FM and maybe keep the fine tuning/model augmentation proprietary.
I've been using OpenHermes-2.5 [0] and NeuralHermes [1] which are both finetunes of the Mistral7B base model. The only objective test prompting I do is asking the models to generate a django timeclock/timesheets app. In this test they compare favorably to GPT-3.5. Also LMStudio [2] has a better UI than chatgpt and responses are much faster too (40tk/sec on my 2070).
`shiningvaliant-1.2-Q4_K_M` is my go-to. I appreciate that it doesn't top the boards in most metrics vs e.g. GPT-4, but I'm not in some A/B group on the quantization: it's more useful to me in practice more of the time.
I have it rigged up with a prompt about outputting markdown and wired up to `foo | glow -` and I get GPT-4 out when I want something to write JIRA tickets no one is going to read because it's better at that sort of thing.
A problem I have with the open source models is that they are all not remotely good in many languages other than English compared to the OpenAI models. I specifically need Dutch and the outputs are unusable for us.
Yeah, this can be an issue. Today, there are few specialized LLMs of high quality, so you end up having to use a massive all-in-one model like GPT-4 to reach the language you need.
There's movement here though and it will get better. GPT-SW3 is a new model developed by AI Sweden, trained specifically on the Nordic languages only + English.
And beyond this, you have TrustLLM which is a new project that aims to be a large, open, European model trained on the Germanic languages to start with: https://liu.se/en/research/trustllm
* Qwen 72B (and 1.8B) - 32K context, trained on 3T tokens, <100M MAU commercial license, strong benchmark performance: https://twitter.com/huybery/status/1730127387109781932
* DeepSeek LLM 67B - 4K context, 2T tokens, Apache 2.0 license, strong on code (although DeepSeek Coder 33B benches better on code) https://twitter.com/deepseek_ai/status/1729881611234431456
Also recently released: Yi 34B (with a 100B rumored soon), XVERSE-65B, Aquila2-70B, and Yuan 2.0-102B, interestingly, all coming out of China.
Personally, I'm also looking forward to the larger Mistral releasing soon as mistral-7b-v0.1 was already incredibly strong for its size.