The reason why Langchain is pointless is that it's trying to solve problems on t...

rchaves · on July 9, 2023

100% this! What is worse is that LangChain hides its prompts away. I had to read the source code and mess with private variables of nested classes just to change a single prompt in something like RetrievalQA. Not only that, the default prompt they use is actually bad. They are lucky things work because GPT-3.5 and GPT-4 are damn smart machines; with any other open LLM, things break. I was hoping for good defaults, but they are not good: the prompts I wrote over 6 months ago, shortly after the launch of ChatGPT, to do some of the same things work much better.

Would you have anything you can share with us about those "several features using highly sophisticated LLM chains that do all manner of reasoning"? I'm really curious about the challenges, the process, and the insights there.

sjnair96 · on July 9, 2023

Can you share some insights/examples, if you can, on how you improved the prompts? One set I feel is particularly poor is the next-question generation / past-question condensation prompts, which are used to refine the user's input based on the history so that the query includes all the context required for the question, thereby incorporating "memory".

rchaves · on July 10, 2023

Yeah I never know where memory goes exactly in langchain, it's not exactly clear all the time. But sure, the main insight I remember is this, take a look at their MULTI_PROMPT_ROUTER_TEMPLATE: https://github.com/hwchase17/langchain/blob/560c4dfc98287da1...

It's a lot of instructions for an LLM. They seem to forget that an LLM is an auto-completion machine, and what data it was trained on. Using <<>> for sections is not a normal thing; it's not markdown, which is probably what the model has read far more often on the internet. Instead of open JSON comments, why not type signatures? Instead of so many rules, why not give it examples? It is an autocomplete machine!

They are relying too much on the LLM being smart because they probably only test stuff on GPT-4 and 3.5. With GPT4All models this prompt was not working at all, so I had to rewrite it. For simple routing we don't even need JSON, and carrying the `next_inputs` here is weird if you don't need it.

So this is my version of it: https://gist.github.com/rogeriochaves/b67676977eebb1936b9b5c...

It's so basic it's dumb, yet it is more powerful, as it does not rely on GPT-4 level intelligence, it's just what I needed
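A minimal sketch of the examples-over-rules idea described above, in Python. It is not the code from the linked gist: the route names, the example questions, and the `complete` callable (a stand-in for whatever LLM client you use) are all hypothetical.

```python
from typing import Callable, Iterable, List, Set, Tuple

# Hypothetical routes and few-shot examples, for illustration only.
EXAMPLES: List[Tuple[str, str]] = [
    ("What is the integral of x^2?", "math"),
    ("Who won the Battle of Hastings?", "history"),
    ("Why does ice float on water?", "physics"),
]


def build_router_prompt(question: str,
                        examples: Iterable[Tuple[str, str]] = EXAMPLES) -> str:
    """Build a plain few-shot prompt that ends where the model should continue."""
    lines: List[str] = []
    for example_question, topic in examples:
        lines.append(f"Question: {example_question}")
        lines.append(f"Topic: {topic}")
        lines.append("")
    lines.append(f"Question: {question}")
    lines.append("Topic:")
    return "\n".join(lines)


def parse_route(completion: str, routes: Set[str], default: str = "default") -> str:
    """Take the first word of the completion; fall back if it is not a known route."""
    words = completion.strip().split()
    if not words:
        return default
    first = words[0].strip(".,:;\"'").lower()
    return first if first in routes else default


def route(question: str, complete: Callable[[str], str]) -> str:
    """Ask the model to continue the prompt and map its answer to a route."""
    routes = {topic for _, topic in EXAMPLES}
    return parse_route(complete(build_router_prompt(question)), routes)
```

No JSON and no rule list: the model only has to continue a pattern it has seen three times, and anything unexpected falls back to a default route instead of raising a parse error.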

rchaves · on July 9, 2023

This inspired me to write a new section in my project, "Prompts on the outside" (https://github.com/rogeriochaves/litechain#prompts-on-the-ou...)

Der_Einzige · on July 8, 2023

Much of why this stuff is not reusable is that eventually someone in the NLP world is going to properly migrate the prompt engineering features that the coomers over in stable-diffusion/automatic1111 land have "pioneered", such as token weighting, negative prompts, token averaging, etc. Literally all of these techniques work with regular LLMs (if you don't believe me, see here: https://gist.github.com/Hellisotherpeople/45c619ee22aac6865c...). NLP folks just haven't built the right tooling for it. Particularly sad since there's supposed to be an "Automatic1111 for LLMs" project called "Oobabooga", but it doesn't have any of the good features.

The future of LLM prompting will involve highly specialized and engineered prompts, much as is the case with most images seen on civit.ai

We are all likely to eventually throw away a lot of our current prompts

sandGorgon · on July 9, 2023

Automatic1111 is the domain of Jupyter - desktop experimentation. When you go into production, there are tons of additional pieces of complexity that start hitting you - like prompt routing. So the problem space is different.

We have a simple concept - Generative AI is config management. We model it on top of config management grammar that is proven to work in large production config - jsonnet.

A trivial example is this - https://github.com/arakoodev/EdgeChains/blob/main/Examples/r...

Do you think this is something that would work for you?

krainboltgreene · on July 9, 2023

> where there is some private excitement that we've built an AGI

This is a great litmus test for if you need to get a reality check.

teaearlgraycold · on July 8, 2023

100% agreed. I've used GPT professionally and we would try out different hosts, AI21, etc., and there were always clear quality issues with just re-using your prompt and hyperparameters. Some of that was down to other models being lower quality, but we'd also need to re-tune prompts when upgrading to new OpenAI models for the best effect. It turns out that LLMs aren't quite a commodity.

digitcatphd · on July 9, 2023

This is precisely why open source models will be limited. Most of the capabilities distinguishing GPT and later Gemini are emergent behaviors from the large parameter count the open source community is saying is not needed (at least for now).

8f2ab37a-ed6c · on July 8, 2023

Any chance you might have shared some of these hard-earned lessons, so that the rest of us could learn from them as well?

tensor · on July 8, 2023

And then they release an updated GPT that breaks all your tuned prompts.

viktor-ferenczi · on July 11, 2023

That's part of the reason why we need LLMs to run locally (on our own or rented infrastructure). Another reason is protecting the company IP. None of the medium/large corporations want their IP to be leaked to AI providers.

meghan_rain · on July 8, 2023

Especially where "updated" actually means "lobotomized to be less offensive"

space_fountain · on July 8, 2023

How do you deal with the prompt iteration phase, and how coupled is that to the DAG phase? I've only worked on a few proofs of concept in this phase, but a thing I struggled with was a strong desire to allow non-technical colleagues to mess with the prompts. It wasn't clear to me how much the prompts need to evolve in tandem with the DAG and how much they can exist separately.

LASR · on July 8, 2023

There are a few increasingly harder things when it comes to prompt customization:

1. Prompts ask LLM to generate input for the next step

2. Prompts ask LLM to generate instructions for the next step

3. Prompts ask LLM to generate the next step

Doing #3 across multiple steps is the promise of Langchain, AutoGPT et al. Pretty much impossible to do with useful quality. Attempting to do #3 very often either ends up completing the chain too early, or just spinning in a loop. Not the kind of thing you can optimize iteratively to good enough quality at production scale. "Retry" as a user-facing operation is just stupid IMO. Either it works well, or we don't offer it as a feature.

So we stopped doing 3 completely. The features now have a narrow usecase and a fully-defined DAG shape upfront. We feed some context on what all the steps are to every step, so it can understand the overall purpose.
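A rough Python sketch of what "fully-defined shape upfront, with every step told about all the steps" could look like. It shows the simplest case, a linear chain, and is an assumption about the approach rather than the commenter's actual code; `Step`, `build_prompt`, `run_pipeline`, and the `complete` callable are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Step:
    name: str
    instruction: str


def overview(steps: Sequence[Step]) -> str:
    """Numbered description of the whole pipeline, shared with every step."""
    return "\n".join(
        f"{i}. {step.name}: {step.instruction}" for i, step in enumerate(steps, 1)
    )


def build_prompt(steps: Sequence[Step], index: int, data: str) -> str:
    """Prompt for one step: the overall plan, then this step's instruction and input."""
    step = steps[index]
    return (
        "This task runs as a fixed pipeline:\n"
        f"{overview(steps)}\n\n"
        f"You are on step {index + 1} ({step.name}).\n"
        f"Instruction: {step.instruction}\n\n"
        f"Input:\n{data}\n\n"
        "Output:"
    )


def run_pipeline(steps: Sequence[Step], complete: Callable[[str], str], data: str) -> str:
    """Run the steps in their predefined order; the model never picks the next step."""
    for index in range(len(steps)):
        data = complete(build_prompt(steps, index, data)).strip()
    return data
```

The control flow lives entirely in ordinary code, so the chain cannot end early or loop; the model only fills in each step's output.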

#2, we tune these prompts internally within the team. It's very sensitive to specific words. Even things like newlines affect quality too much.

#1 - we've found it's doable for non-tech folks. In some of the features, we expose this to the user somewhat as additional context and mix that in with the pre-built instructions.

So #2 is where it's both hard to get right and still solvable. Every prompt change has to be tested with a huge number of full-chain invocations on real input data before it can be accepted and stabilized. The evaluation of quality is all human, manual work. We tried some other semi-automated approaches, but just not feasible.
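One way to organize that kind of manual review, sketched in Python under assumptions: run the full chain with the old and the new prompt over the same real inputs and write a side-by-side sheet for a human to grade. The function name, column names, and the two chain callables are hypothetical; the grading itself stays manual, as described above.

```python
import csv
from typing import Callable, Iterable, TextIO


def write_review_sheet(
    inputs: Iterable[str],
    old_chain: Callable[[str], str],
    new_chain: Callable[[str], str],
    out: TextIO,
) -> int:
    """Write one CSV row per input with both outputs; return how many outputs changed."""
    writer = csv.writer(out)
    writer.writerow(["input", "old_output", "new_output", "changed", "reviewer_verdict"])
    changed = 0
    for text in inputs:
        old_output = old_chain(text)
        new_output = new_chain(text)
        differs = old_output != new_output
        if differs:
            changed += 1
        # The verdict column is left empty on purpose: a person fills it in.
        writer.writerow([text, old_output, new_output, "yes" if differs else "no", ""])
    return changed
```

The `changed` column only tells reviewers where to look first; it says nothing about whether the new output is better.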

All of this is why there is no way Langchain or anything like it is currently useful to build actually valuable user-facing features at production scale.

remmargorp64 · on July 9, 2023

What if you built a scoring system for re-usable action sequences that are stored in a database, and then have the LLM generate alternate solutions and grade them according to their performance?

An action sequence of steps could be graded according to whether it was successful, its speed, efficiency, cleverness, cost, etc.

You could even introduce human feedback into the process, and pay people for proposing successful and efficient action sequences.

All action sequences would be indexed and the AI agent would be able to query the database to find effective action sequences to chain together.

The more money you throw at generating, iterating, and evolving various action sequences stored in your database, the smarter and more effective your AI agent becomes.
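A minimal sketch of such a store, using Python's built-in `sqlite3`. The schema, the function names, and the ranking rule (success rate first, then average cost) are simplifying assumptions for illustration; speed, cleverness, and human feedback would need extra columns.

```python
import json
import sqlite3
from typing import List, Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the hypothetical action-sequence store."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS sequences (
               id INTEGER PRIMARY KEY,
               task TEXT NOT NULL,
               steps TEXT NOT NULL,              -- JSON list of step names
               runs INTEGER NOT NULL DEFAULT 0,
               successes INTEGER NOT NULL DEFAULT 0,
               total_cost REAL NOT NULL DEFAULT 0
           )"""
    )
    return db


def add_sequence(db: sqlite3.Connection, task: str, steps: List[str]) -> int:
    """Store a candidate sequence for a task and return its row id."""
    cur = db.execute(
        "INSERT INTO sequences (task, steps) VALUES (?, ?)", (task, json.dumps(steps))
    )
    db.commit()
    return cur.lastrowid


def record_run(db: sqlite3.Connection, seq_id: int, success: bool, cost: float) -> None:
    """Update the running totals for one execution of a sequence."""
    db.execute(
        "UPDATE sequences SET runs = runs + 1, successes = successes + ?, "
        "total_cost = total_cost + ? WHERE id = ?",
        (int(success), cost, seq_id),
    )
    db.commit()


def best_sequence(db: sqlite3.Connection, task: str) -> Optional[List[str]]:
    """Return the tried sequence with the best success rate, cheapest on ties."""
    row = db.execute(
        "SELECT steps FROM sequences WHERE task = ? AND runs > 0 "
        "ORDER BY CAST(successes AS REAL) / runs DESC, total_cost / runs ASC LIMIT 1",
        (task,),
    ).fetchone()
    return json.loads(row[0]) if row else None
```

An agent would call `best_sequence` before planning from scratch and `record_run` after each attempt, so the ranking improves as more runs accumulate.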

zicon35 · on July 10, 2023

Would love to see an open-source version of the internal Langchain you built and what you did differently from an architecture standpoint that made it better in your use-case.

sandGorgon · on July 9, 2023

This is precisely the problem I encountered and tried to solve with EdgeChains. We think Generative AI is a config management problem (like Terraform or Kubernetes).

>None of this stuff is reusable. Langchain is attempting to set up abstractions to reuse everything. But what we end up with a mediocre DAG framework where all the instructions/data passing through is just garbage. The longer the chain, the more garbage you find at the output.

chains X prompts X LLMs == pods X services X nodes in Terraform.

So we model it on top of config management grammar that is proven to work in large production config - jsonnet.

A trivial example is this - https://github.com/arakoodev/EdgeChains/blob/main/Examples/r...

Would love to get an example of complex chains (even if you have an arXiv paper) that you think we could solve in EdgeChains-jsonnet.

applgo443 · on July 8, 2023

I saw your comment, got curious, and looked at a lot of your old comments. Lots of interesting insights - Thanks for sharing them.

If you don't mind me asking, what do you do? I'm a researcher at FAANG working on language models and starting a new company in the space. Would love to connect. Feel free to email me - idyllic.bilges0p@icloud.com

moneywoes · on July 8, 2023

In that case what pattern do you use for integrations?