I work with Langchain on a daily basis now, and so often I find myself asking: do I really need a whole LLM framework for this? At this point, the assistant I am writing would likely be more stable rewritten in pure Python. The deeper and more complex the application becomes, the more of a risk Langchain becomes to keeping it maintainable. But even at less complex levels, if I want to do this:
1. Have a huge dataset of documents.
2. Want to ask questions and have an LLM chat conversation based on these documents.
3. Be able to implement tools like math, wiki or Google search on top of the retrieval.
4. Implement memory management for longer conversations.
It's still a lot more straightforward to maintain in Python. The only place it gets interesting is having agents execute async, which is not that easy to replicate, but at the moment agents are not that helpful. Not trying to diss Langchain too much here, because it's an awesome framework, but for now I can't see it as anything more than a helpful tool for understanding LLMs and LLM programming.
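For a sense of scale, here's a minimal sketch of that whole pipeline in plain Python against the pre-1.0 OpenAI client (the `search_index` retrieval function and the history handling are hypothetical stand-ins for your own vector store and memory logic):

    import openai

    def answer(question, history, search_index):
        # Points 1+2: retrieve the chunks most similar to the question from
        # your documents (search_index is assumed to wrap your vector store).
        chunks = search_index(question, k=4)
        context = "\n\n".join(chunks)
        messages = [
            {"role": "system",
             "content": "Answer using only the context below.\n\n" + context},
            *history,  # point 4: prior turns, summarized/truncated as needed
            {"role": "user", "content": question},
        ]
        resp = openai.ChatCompletion.create(model="gpt-3.5-turbo",
                                            messages=messages)
        return resp["choices"][0]["message"]["content"]

Tools (point 3) are another couple of dozen lines of dispatch on top of this loop.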
The most important aspect of langchain is NOT having to use OpenAI for the LM.
The most useful aspect of using langchain is to use it with Galpaca (or vicuna/koala/etc) to spin up an assistant for your home.
This way, you can push all of your files through it - even terabytes or petabytes of files, at a fraction of the cost - and have it organize things for you. No privacy problems, no extreme costs, just ease of use, low latency, offline, blazingly fast beauty. That's at least the trajectory.
Meta may soon release a more official improvement to Galactica along the lines of Galpaca (the Georgia Tech attempt), perhaps with more multimodal focus, which would likely improve upon the llama-based models even further.
ChatGPT is just one model among many here, and it's not even the first to use RLHF (Deepmind, as usual, was a bit earlier).
The simple task of downloading RedPajama/The Pile/etc and building a vector db for it locally, then enhancing it with local files, effectively brings a local Google to everyone, and it may only require a decent spinning-disk HD for the DB storage with the typical langchain LLM setup to have a completely local 'jarvis'-like assistant. (Sure, I know some people care about 'news'-like info that requires connectivity, but most things don't)
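As a sketch of what that local index could look like (assuming sentence-transformers and FAISS as stand-ins; any local embedding model and disk-backed vector store would do):

    import faiss
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")  # small local embedder

    docs = load_text_chunks()     # hypothetical: The Pile + your own files
    vectors = model.encode(docs)  # float32 array, shape (n_chunks, dim)

    index = faiss.IndexFlatIP(vectors.shape[1])  # inner-product similarity
    index.add(vectors)
    faiss.write_index(index, "/mnt/bigdisk/local_google.faiss")  # spinning disk is fine

    # Query time: embed the question and pull the nearest chunks for the LLM.
    _, ids = index.search(model.encode(["how do I configure a raid array?"]), 5)
    print([docs[i] for i in ids[0]])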
The vast majority of people building LM apps with (or without) LangChain are using OpenAI.
I sincerely hope local LM tech like Galpaca (or vicuna/koala/etc) succeeds, but I don't understand why we are collectively pretending it is currently anywhere near gpt-3.5-turbo in terms of either speed or quality. Honestly, the local models feel more like first-generation BERT/GPT-1 models that have been fine-tuned for QA using RLHF.
The models now are probably closer to GPT-2, and get very close to GPT-3 with decent hardware (larger-param models) and effort.
They're certainly worse when it comes to 'knowledge' or memorized parts of the training data, but a decent bit of this is ameliorated by having "the pile" + other data locally available for reference.
There are some issues that arise from not having decent priors due to that lack of knowledge, which may or may not be important for the given task.
A (perhaps somewhat bad) example may be: if you ask me "what is a good one-liner in bash for parsing xml from a stream", I may give you an answer using xmlstarlet. However, this may not be the best answer - since Xalan can handle XSLT version 3, but xmlstarlet can't (XSLTv3 handles streams).
So when looking up information in the database, some things like that may be slightly missed - but this behavior would be close to what ChatGPT offers (ChatGPT is quite awful in this way most of the time).
You are right that it would fall short of GPT-4 by a good bit in these cases, though, but most people aren't using GPT-4 for this anyway.
Ultimately both can be used. OpenAI can do things you really want it for, such as a programming assistant, or things that may require much more "reasoning" (not a proper word, but conveys the message) capabilities.
Local models can do the really useful base work of completely re-organizing or re-encoding files to free up space if you set it to do so, integrating with a HomeAssistant system, setting up a HomeAssistant system if you want one, answering your vocal 'Alexa/Siri'-like questions completely offline, setting up backup solutions for all your computers, setting up servers that perform more tasks that you may want - essentially a complete personal assistant. OpenAI shouldn't be needed for this, and it is highly desired to not have them do any of this (due to costs and the number of credentials it would give to them).
It's more or less an advanced personal project to stay on top of the LLM learning curve, rather than just being exposed to news and press releases. I also have an appetite for further wrapping my mind around all of this. I already work in the AI space as a web developer on the B2B & enterprise side of things. My opinion here is that there are going to be loads of use-cases and necessary plugins which, for privacy, legal, and security reasons, need a proprietary solution and won't be able to interface with any third-party APIs, plugins, or frameworks.
I think that is a phase which will pass; eventually, most of the plugins and apps that become popular will likely run their own models or use open-source models, because API calls to OpenAI for complex applications are currently far from economical. In the end you will have to charge the user, and that is going to be the crux. As a sole developer, doing loads of experiments on fewer than 50 documents, I am already crossing $50 in API calls within a month. I can already run llama.cpp, but it's just not good enough; the cost, though, would effectively be $0 (not counting my hardware).
We cannot give my company's information to OpenAI/MS. No legal paperwork will change this. This information is so important, it is only on offline computers.
I've experimented with LangChain for my chatbot as well, but ultimately, I resorted to using custom Python. Here are a few issues I faced with LangChain:
- By developing your own solutions, you can engineer the specific components LangChain would otherwise provide to better suit your use case. For example, with prompts tailored to your use case you can get better results for conversation history, context, and summarization (see the sketch after this list). If you look at the prompts within langchain, they are pretty basic.
- LangChain is designed around the idea that an entire chat logic resides within a single "REPL loop." In my use case, I had a single-page web app frontend, a standard web "RESTful" backend, and a separate chat service. Different parts of the information are stored and managed by these components. Using LangChain would have forced me to consolidate all logic into the chat service, which doesn't align with the overall architecture of my system beyond just the chat functionality.
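To illustrate the first point: a hand-rolled conversation memory is roughly one summarization prompt you fully control, plus a buffer. A sketch (`complete` is a hypothetical stand-in for your LLM call):

    SUMMARIZE = ("Condense the following chat into a short summary that "
                 "preserves names, decisions, and open questions:\n\n{history}")

    def compact_history(turns, complete, keep_last=4):
        # Summarize everything except the most recent turns, so the prompt
        # stays short while recent context stays verbatim.
        if len(turns) <= keep_last:
            return turns
        old, recent = turns[:-keep_last], turns[-keep_last:]
        summary = complete(SUMMARIZE.format(history="\n".join(old)))
        return ["Summary so far: " + summary] + recent

Every word of that summarization prompt is yours to tune, which is the whole point.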
Please note that I'm not a LangChain expert, so my assessment might not be entirely accurate about its capabilities. However, based on my evaluation, LangChain introduced too many constraints in comparison to what it provided.
Thus, I pretty much ended up using the basic LLMChain and doing my own custom flow. The built-in agents are close to useless for anything but a toy project; they are simply way too unreliable.
We've been building LLM chains for over a year and found that for simple use-cases it's easy enough to grab a couple of APIs. However, a Notebook experience built for iterating and collaborating whilst managing the complexity is something we've seen companies care deeply about.
LangChain has been so frequently discussed that I thought it must be this amazing piece of software. I was recently reading about vector databases and how they can be used to provide context to LLMs. I came across a LangChain class called RetrievalQA, which takes in a vector database and a question and produces an answer based on documents stored in the vector db. My curiosity was piqued! How did it work? Well... it works like this:
prompt_template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.
{context}
Question: {question}
Helpful Answer:"""
My sense of wonder was instantly deflated. "Helpful Answer:". Seriously? I think LLMs are cool, but this made me realize people are just throwing darts in the dark here.
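Stripped of the class hierarchy, my understanding is that the whole chain reduces to roughly this (a paraphrase of what it does, not LangChain's actual code):

    def retrieval_qa(question, vector_db, llm):
        # 1. Similarity search: find the stored chunks nearest the question.
        docs = vector_db.similarity_search(question, k=4)
        context = "\n\n".join(d.page_content for d in docs)
        # 2. Stuff them into the template above and call the model once.
        return llm(prompt_template.format(context=context, question=question))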
It got a lot of traction among non-coders who think it's doing some magic, but if you read the source it boils down to some brittle prompts and lots of Python class boilerplate.
It's good for building vector indices without worrying about writing adapters to milvus, pinecone, qdrant, etc. separately - in case you want to switch later.
The class structure provides a clean API for building on, even if the internals are basic. With some refinement, it could be a good starting point for more advanced models.
I had the same experience and kept wondering if I was missing something important. I'm not a fan of Python, so I was anxious about not using the thing everybody recommended, but for my project I ultimately went with what I know well (C#). I've happily had zero issues.
LangChain docs and tutorials were useful for understanding the popular practices for approaching AI-driven development, but the biggest challenge by far has been establishing a baseline prompt and measuring the performance of alternative implementations against it in a sensible way that doesn't break the bank. Mitchell Hashimoto's Prompt Engineering article [1] was way more helpful in this regard than anything I saw in LangChain.
To that end I've also been working on a tool to save me money by caching requests and responses, blocking unexpectedly expensive requests, keeping a granular history of requests for prompt cost analysis, etc. Maybe I should open source it and get some VC bux too?
But wait, there's more! In Langchain you can build constitutional chains on top of your chains, to validate whether the answer was really helpful, by doing just one more API call with a new prompt asking if the answer addressed the question, based on the initial prompt, in a helpful way! And if it didn't, revise the answer with another API call to be more helpful! And then you can chain these chains with even further API calls, until you've gone through as many prompts as you think are necessary to answer a single-sentence question (What is the weather today on the moon?).
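In other words, the pattern is just a critique-and-revise loop, each round costing one more completion. A minimal sketch (`complete` being a hypothetical wrapper around the API call):

    def constitutional_answer(question, complete, max_rounds=2):
        answer = complete("Question: " + question + "\nAnswer:")
        for _ in range(max_rounds):
            critique = complete(
                "Question: " + question + "\nAnswer: " + answer +
                "\nWas this answer helpful? Reply YES or explain what is wrong:")
            if critique.strip().startswith("YES"):
                break
            answer = complete(
                "Question: " + question + "\nPrevious answer: " + answer +
                "\nCritique: " + critique + "\nRevised, more helpful answer:")
        return answer  # up to 1 + 2*max_rounds API calls for one question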
LangChain is meant to reduce/remove the amount of boilerplate code needed to build a lot of applications with LLMs. I see your point, but I still think LangChain is useful for a particular segment of early stage developers.
I tried to make a youtube video exploring the code and it was fairly short https://www.youtube.com/watch?v=Joby-58DuBE. I think if the prompts were put front and center in the documentation it would clear up a lot of mystery.
I'm not sure if "helpful answer" as opposed to "answer" makes much of a difference in answer quality -- I'd believe it helps a little, just don't know that it's been studied -- but a lot of silly stuff like that does definitely make a big difference in response quality on certain tasks. "Let's think step by step:" at the end of your prompt is probably the best-known one: https://arxiv.org/pdf/2205.11916.pdf
It basically gathers text that is similar to what you are asking and feeds it into the prompt, yes. No magic. The worst part is that if you ask "please get me the summary to this doc", it will actually search the vector db using the entire question. It's not very smart. Depending on how you split the embeddings, you could end up with a bunch of crap.
We’re building an easier-to-use langchain that lets you preprocess inputs and remove unnecessary wrapper text by going like “please get me the summary to this {{INPUT}}”
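i.e., something along these lines (a sketch of the idea, not the actual API):

    import re

    # Wrapper text to strip before the query hits the vector DB.
    TEMPLATE = re.compile(r"please get me the summary to this (?P<INPUT>.+)",
                          re.IGNORECASE)

    def search_query(user_message):
        # Embed only the substantive part, not the boilerplate around it.
        m = TEMPLATE.match(user_message.strip())
        return m.group("INPUT") if m else user_message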
I believe it can reduce tokens. It also essentially obfuscates text, if you want the security theatre of your prompts being hidden. Most importantly, it enables more complex chains where multiple questions are asked, using the response from the llm to decide whether a further question is needed.
If it makes you feel any better, the researchers building these things don't really know what they're doing either. They just throw data and compute at the problem and hope for the best.
Am I the only one who is not convinced by the value proposition of langchain? 99% of it is interface definitions and implementations for external tools, most of which are super straightforward. I can write the integrations my app needs in less than an hour myself, so why bring in a heavily opinionated external framework? It feels like the npm "left-pad" situation to me. Everyone uses it because it seems popular, not because they need it.
For us, LangChain actually caused more problems than it solved. We had a system in production which, after working fine for a few weeks, suddenly started experiencing frequent failures (more than 30% of requests). On digging, it turned out that LangChain sets a default timeout of 60 seconds for every request. And this behaviour isn't documented! Such spurious decisions are everywhere in LangChain, and they will all eventually come back to bite you. In the end we replaced everything with vanilla request clients. Definitely not recommended to build a system on a library that provides very limited value while hiding a huge number of details and decisions from you.
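For reference, the vanilla replacement makes the timeout an explicit decision in your own code rather than a hidden default. A sketch using `requests` against the OpenAI HTTP API:

    import os
    import requests

    resp = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": "Bearer " + os.environ["OPENAI_API_KEY"]},
        json={"model": "gpt-3.5-turbo",
              "messages": [{"role": "user", "content": "ping"}]},
        timeout=300,  # our choice, visible at the call site
    )
    resp.raise_for_status()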
Langchain is absolutely perfect, though: it's bad enough that you'll be driven to write something better out of pure frustration, but it gives you enough good ideas and breadcrumbs to actually do it.
It's probably the best on-ramp for "practical uses of llms" because it scratches just the right developer itch.
I am slowly coming around to the same conclusion. It isn't always clear how some agent types differ from others. Sometimes the prompts expect JSON blobs and sometimes they expect something else. I tried it out because I could see the potential, but I don't think it's architected in a way that is suitable for anything beyond simple PoCs.
It would probably be much better to start with the basic OpenAI API and then build on top of it.
What I find particularly frustrating is the difficulty in easily interfacing with my existing python tools (not "add two numbers", but somewhat complex analytics on top of structured data). If anybody has had any success interfacing with existing tools/scripts, I would love to know how people are going about it.
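The most direct route I can think of is skipping agents entirely: ask the model to reply with a JSON tool call and dispatch to your own functions yourself (a sketch; `run_analytics` stands in for an existing script):

    import json

    def run_analytics(table, year):
        ...  # stand-in for existing, somewhat complex analytics code

    TOOLS = {"run_analytics": run_analytics}

    def dispatch(llm_reply):
        # The prompt asks the model to answer only with a JSON call, e.g.
        # {"tool": "run_analytics", "args": {"table": "sales", "year": 2022}}
        call = json.loads(llm_reply)
        return TOOLS[call["tool"]](**call["args"])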
It's brilliant for experimentation and prototyping, though. Granted, I've not deployed anything LLM-related yet, so I haven't thought about that side, but I don't want to hand-write every integration I think I need just to experiment.
Yeah, the basics of LangChain are fairly simple, and reimplementing a loop like that in Go, including tool usage, was very straightforward when I was writing Cuttlefish[0] (a toy desktop chat app for ChatGPT that can use stuff like your local terminal or Google).
The magic in LangChain, though, is the ecosystem. I.e. they have integrations with tons of indexes, they have many tool implementations, etc. This is the real value of LangChain. The core ReAct loop is quite trivial (as this article demonstrates).
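For the curious, the core loop really is about this small (a sketch of the ReAct pattern, not LangChain's implementation):

    REACT_PROMPT = ("Answer the question. To use a tool, emit a line\n"
                    "'Action: <tool> Input: <input>' and wait for an\n"
                    "Observation. End with 'Final Answer: <answer>'.\n\n"
                    "Question: {q}\n")

    def react(question, llm, tools, max_steps=5):
        transcript = REACT_PROMPT.format(q=question)
        for _ in range(max_steps):
            reply = llm(transcript)  # model reasons and/or requests a tool
            transcript += reply
            if "Final Answer:" in reply:
                return reply.split("Final Answer:")[-1].strip()
            if "Action:" in reply:   # parse and execute the tool call
                tool, _, arg = reply.split("Action:")[-1].partition("Input:")
                observation = tools[tool.strip()](arg.strip())
                transcript += "\nObservation: " + str(observation) + "\n"
        return "(gave up after max_steps)"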
I got the chance to try Langchain as part of a hiring process. I already had my eye on it for a personal project, though.
The moment I tried it and went through the docs, the entire abstraction felt weird to me. I know a bit here and there about LLMs, but Langchain makes me feel like I'm learning something entirely new.
How agents and tools work, and how to write one, wasn't straightforward from the docs, and the idea of having an AI attach itself to an eval, or write its own error/hallucination-prone API requests based on docs, doesn't give me a lot of confidence.
The hiring assignment specifically mentioned using Langchain though, so I did. But just as a glorified abstraction to call GPT and parse the NL output as JSON.
I did the actual API calls, post-processing, etc. manually, which gives me granular control. It's also cheaper in terms of token usage. You could say I ended up writing my own agent/tool that doesn't exactly match Langchain's specifications, but it works.
I guess Langchain has its use cases, but it feels pretty weird for me to use.
I've been working with langchain and llamaindex and did notice that they're pretty hefty abstractions on top of pretty simple concepts; I too eventually ended up dropping both and simply writing the underlying code without the framework on top.
There’s always DSP for those who need a lightweight but powerful programming model — not a library of predefined prompts and integrations.
It’s a very different experience from the hand-holding of LangChain, but it packs reusable magic in generic constructs like annotate, compile, etc that work with arbitrary programs.
I cannot praise Deepset Haystack enough for how simple they make things compared to LangChain, between the Preprocessor, the Reader/Retriever, and the PromptNode - the APIs, docs, and tutorials are quite easy to modify to your use-case.
Not affiliated, just a happy defector from LangChain.
I also was underwhelmed by langchain, and started implementing my own "AIPL" (Array-Inspired Pipeline Language) which turns these "chains" into straightforward, linear scripts. It's very early days but already it feels like the right direction for experimenting with this stuff. (I'm looking for collaborators if anyone is interested!)
As someone who has created several LLM-based applications running in production, my personal experience with langchain has been that it is too high of an abstraction for steps that in the end are actually fairly simple.
And as soon as you want to slightly modify something to better accommodate your use-case, you are trapped in layers and layers of Python boilerplate code and unnecessary abstractions.
Maybe our llm applications haven't been complex enough to warrant the use of langchain, but if that's the case, then I wonder how many such complex applications actually exist today.
-> Anyways, I came away feeling quite let down by the hype.
For my own personal workflow, a more “hackable” architecture would be much more valuable. Totally fine if that means it’s less “general”.
As a comparison, I remember the early days of Huggingface Transformers, where they did not try to create a 100% high-level general abstraction on top of every conceivable neural network architecture. Instead, each model architecture was kept somewhat separate from the others, making it much easier to "hack".
Comparing Langchain to Hugging Face Transformers is apples and oranges. One is for research, one is for production. Production ML requires more abstraction, not less.
I disagree. Production systems don't need to be full of AbstractSingletonProxyFactoryBeans which is basically what LangChain is. For example, Linux certainly isn't.
It's worse than that. The documentation is a confusing mess that completely omits the explanation of key default parameters and details. And the abstractions are horrendously brittle. And difficult to fix, because there are too many layers.
The best use of LangChain is probably just looking at the included prompts in the source code for inspiration.
If you know little about prompt engineering and want to throw together a demo of something that kind of works extremely quickly, or experiment with an LLM agent exactly as it's defined in some paper, LangChain is pretty useful.
If you want to develop a real LLM application, you're probably better off skipping the library completely, or at least fully understanding each abstraction, to make sure it does everything you want before you decide to incorporate it.
I’ll repeat what I’ve said in another thread the other day —
To put together a basic question/answer demo that didn't quite fit the LangChain templates, I had to hunt through a bunch of doc pages and cobble together snippets from multiple notebooks. Sure, the final result was under 30 lines of code, BUT:
It uses fns/classes like `load_qa_with_sources_chain` and `ConversationalRetrievalChain`, and to learn what these do under the hood, I tried stepping through them in the debugger. It was a nightmare of call after call up and down the object hierarchy. They have verbose mode so you can see what prompts are being generated, but there is more to it than just the prompts. I had to spend several hours piecing together a simple flat recipe from this object-hierarchy hunting.
It very much feels like what happened with PyTorch Lightning -- sure, you can accomplish things with "just a few lines of code", but now everything is in one giant function, and you have to understand all the settings. If you ever want to do something different, good luck digging into their code -- I've been there, for example trying to implement a version of k-fold cross-validation: again, an object-hierarchy mess.
VC's are in full-blown FOMO mode for things they barely understand. Even the engineering backgrounds are pretty lost; imagine the finance backgrounds that have barely written a lick of code.
For me Langchain is glue code between a lot of commonly used LLM building blocks and prompts.
It is great to get a prototype 80% of the way there fast in order to validate an idea or run something short lived.
I suspect that, if you want to go further (simpler code, better control message length, reliability, etc), you will be better served by implementing the functionality you need yourself.
For the calculator tool I suggest instead just generating Javascript as the output with temperature set to 0 (system prompt set to something along the lines of: "Generate native Javascript code only. Don't provide any explanations. Don't import any extraneous libraries") and then evaluating that Javascript code in a VM. Deno is a good candidate for this, as it has good security settings, with access to the filesystem and network turned off by default. You can use something like deno-vm [1] to execute it separately from your running process too. Setting GPT-4 as the model works even better. I have seen it perform better than Wolfram Alpha in many cases, so I wonder why OpenAI chose to integrate with Wolfram Alpha for this. GPT-4 was able to solve some really complex math problems I threw at it.
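A sketch of that setup (the system prompt is quoted from above; this shells out to a local `deno` binary rather than using deno-vm, and assumes the generated code console.logs its result):

    import subprocess
    import openai

    SYSTEM = ("Generate native Javascript code only. Don't provide any "
              "explanations. Don't import any extraneous libraries. "
              "console.log the final result.")

    def calculate(question):
        resp = openai.ChatCompletion.create(
            model="gpt-4",
            temperature=0,
            messages=[{"role": "system", "content": SYSTEM},
                      {"role": "user", "content": question}])
        js = resp["choices"][0]["message"]["content"]
        # (In practice you may need to strip markdown fences from the reply.)
        # `deno eval` runs with filesystem/network access off by default.
        out = subprocess.run(["deno", "eval", js],
                             capture_output=True, text=True, timeout=10)
        return out.stdout.strip()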
Personal experience: I was using LangChain and its output parsers for getting structured data. It had a very high error rate (probably the prompt was becoming too long and confusing). But it is just a prompt + some parsing logic. I replaced it by straight-up asking OpenAI GPT for JSON that matches some Rust struct / Python dataclass. The errors went down, and I got one extra dependency out of the project. I also tried its self-hosted embeddings, but the implementation (strangely) seemed tied to something called Runhouse.
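The replacement is almost embarrassingly small; a sketch of the approach (the `Invoice` class and `complete` helper are hypothetical):

    import json
    from dataclasses import dataclass

    @dataclass
    class Invoice:
        vendor: str
        total: float
        currency: str

    PROMPT = ("Extract the invoice fields from the text below. Respond with "
              "ONLY a JSON object with keys vendor (string), total (number), "
              "currency (string).\n\nText:\n{text}")

    def parse_invoice(text, complete):
        reply = complete(PROMPT.format(text=text))  # complete() = plain LLM call
        return Invoice(**json.loads(reply))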
Not to belittle the library, but most of it is very thin wrapper classes that reek of premature abstraction, coupled with hit-and-miss docs. At this point, given the hype, it is primarily optimized for cooking up demos quickly. I'm not sure the valuations or production use are justified.
There are a few ways to use Langchain. Firstly, the docs are a mess. What I personally did was follow the notebook from the OpenAI cookbook on embedding a code base, and the one on embedding the docs, and querying over that with GPT-4.
After a while of doing that, I realised, like many others, that it's too high of an abstraction. In the end I think you're better off just looking at their source code, seeing how they've implemented the stuff in normal python, and then adapting it for your own needs.
We ported the core of LangChain to Ruby, and while it is way more than 100 lines, I would give similar feedback to the author's. Here is the repo if anyone is interested: https://github.com/BoxcarsAI/boxcars
The problem with all these new fields is that the first code that gets popular is from people who are good at marketing not those who are good at programming.
We're still in the stage of LLM adoption where we can have "eye-opening" simple discoveries weekly. Langchain has momentum because of this, as a library of simple ideas. This period will end, and if they don't figure out the next step, they're gone.