My current hypothesis here is that the way to make coding assistants as reliable as possible is to shift the balance towards making their output rely on context provided in-prompt rather than information stored in LLM weights. As all the major providers shift towards larger context windows, it seems increasingly viable to give the LLM the necessary docs for whatever libraries are being used in the current file. I've been working on an experiment in this space[0], and while it's obviously bottlenecked by the size of the documentation index, even a couple hundred documentation sources seem to help a ton when working with less-used languages/libraries.
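Roughly, the mechanism looks something like this - a toy Python sketch, where DOC_INDEX stands in for a real documentation index and the import detection is deliberately naive:

    import re

    # Stand-in for a real documentation index (library name -> docs excerpt).
    DOC_INDEX = {
        "requests": "requests.get(url, params=None, **kwargs) -> Response ...",
    }

    def build_prompt(source: str, question: str) -> str:
        # Naively detect which libraries the current file imports.
        libs = set(re.findall(r"^\s*(?:import|from)\s+(\w+)", source, re.MULTILINE))
        docs = "\n\n".join(DOC_INDEX[lib] for lib in sorted(libs) if lib in DOC_INDEX)
        # Put the docs in-prompt so the model leans on provided context
        # rather than whatever is baked into its weights.
        return f"Documentation:\n{docs}\n\nFile:\n{source}\n\nTask: {question}"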
Yeah, I've been using it with prompts that ask it to cite sources as well. Honestly, I think the best results come when I'm still interacting with the docs directly in addition to having the LLM look at them - still can't quite replace needing to RTFM!
This is the way forward imo, particularly as we've started to flesh out the relationship between model size and true context reliability. We've found that raw context-window size is not representative of what a model can consistently recall, but also that recall is reliable out to a point. I suspect more robust theoretical models of superposition will move us a long way towards understanding the limits of context reliability, rather than having to probe those limits experimentally as we do now.
I've been working on Lightrail for a couple months now, just added gpt-4-vision-preview support and figured it was as good a time as any to do a Show HN. It's still very rough around the edges, but I'm hoping it can provide a compelling alternative to siloed per-app AI assistants. So many of my favorite AI workflows involve working across apps, so I wanted to build something that made those workflows easy -- and that can support a community of new integrations and workflows/actions that are simple to throw together. Would love to hear feedback & happy to answer any questions!
Not the GP, but I've been working on an open platform [0] for integrating OpenAI APIs with other tools (VSCode, JupyterLab, bash, Chrome, etc.) that you might find interesting; the VSCode integration supports editing specific files/sections.
Also worth taking a look at GitHub Copilot Chat[1]; it's a bit limited, but in certain cases it works well for editing specific parts of files.
Check out marcel: https://marceltheshell.org, and https://github.com/geophile/marcel. Both marcel and nushell start with the idea of piping structured data instead of strings, which is incredibly powerful. (This also applies to osh. I am the author of osh and marcel.)
Marcel (and osh) rely on Python types and the Python language where typical shells have sublanguages. So instead of awk or find and their sublanguages, you just use Python. Instead of piping strings, you pipe streams of Python values.
Marcel lets you use Python on the command line. It also has an API which allows you to use shell-like commands inside Python programs.
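To make the contrast concrete (plain Python below, not marcel's actual syntax): a traditional shell pipes text and you parse columns back out with awk, whereas with value streams the rows are already structured objects:

    import os

    # String piping (traditional shell) forces you to re-parse text:
    #   ls -l | awk '$5 > 1000000 {print $9}'
    # With value streams, each "row" is already a structured Python value.
    entries = (e for e in os.scandir(".") if e.is_file())
    for e in entries:
        if e.stat().st_size > 1_000_000:
            print(e.name, e.stat().st_size)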
- Typed inputs and IntelliSense. There is very basic support for types; even so, I'd love stricter types and type inference, as in TypeScript, so my terminal and shell can give me a hint about what kind of input a command is expecting. At the same time, IntelliSense should tell me which command flags are still available and what the viable inputs are, like cd suggesting only directories, or kubectl --namespace suggesting available namespaces.
- A concept of past commands as building blocks and/or interactive data wrangling. Many times I am mucking around in zsh to find a chain of commands that reliably gets the data I need out of some CSV or other source, retyping the same few commands with some new links until it works the way I want it to.
- A command like EXPLAIN in SQL so I can see where I should rethink what I am doing so I can refactor that part of the chain. At the same time, I'd love it if I could take one of those magic snippets from Stack Overflow and have an EXPLAIN-like command pick the components apart and explain the flags via some structured docstring format.
- Snippets in the Shell for some regularly used patterns of command chains.
- The concept of transactions, like in SQL, so I can run a command or script without worrying about it failing halfway through, with the shell automatically undoing its changes. Maybe this should even work at the level of a whole shell session (rough sketch below).
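For the transaction idea, a toy Python sketch of the rollback behavior I have in mind - snapshot a working directory, restore it if any command fails. (A real version would want filesystem-level snapshots, and this gives no isolation.)

    import shutil
    import subprocess
    import tempfile
    from pathlib import Path

    def run_transaction(commands: list[str], workdir: str) -> None:
        # Snapshot the directory before running anything.
        snapshot = Path(tempfile.mkdtemp(prefix="txn-")) / "snap"
        shutil.copytree(workdir, snapshot)
        try:
            for cmd in commands:
                subprocess.run(cmd, shell=True, cwd=workdir, check=True)
        except subprocess.CalledProcessError:
            # Roll back: discard the modified tree, restore the snapshot.
            shutil.rmtree(workdir)
            shutil.copytree(snapshot, workdir)
            raise
        finally:
            shutil.rmtree(snapshot.parent)

    # run_transaction(["./step1.sh", "./step2.sh"], "build")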
This is basically a function on streams that takes an input N and filters for files that changed in the last N days. To use it:
    ls -fr | recent 3
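In plain Python terms (not marcel's actual pipeline syntax), recent amounts to a filter over a stream of files:

    import os
    import time

    def walk_files(root):
        # Rough analogue of `ls -fr`: files only, recursive.
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                yield os.path.join(dirpath, name)

    def recent(files, n):
        # Pass through only files modified in the last n days.
        cutoff = time.time() - n * 86400
        return (f for f in files if os.path.getmtime(f) >= cutoff)

    for path in recent(walk_files("."), 3):
        print(path)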
I don't understand what EXPLAIN would do. In SQL, EXPLAIN helps you see the actual implementation chosen for a non-procedural statement. Marcel, nushell, etc. have no optimizer, so there is no invisible execution plan that EXPLAIN could make visible.
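For anyone unfamiliar, here's the sort of thing SQL's EXPLAIN surfaces (sqlite's EXPLAIN QUERY PLAN variant, via Python):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, x INTEGER)")
    # EXPLAIN QUERY PLAN reveals the access path the optimizer picked.
    for row in conn.execute("EXPLAIN QUERY PLAN SELECT x FROM t WHERE id = 1"):
        print(row)  # e.g. (..., 'SEARCH t USING INTEGER PRIMARY KEY (rowid=?)')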
Transactions: might be feasible with a shell built on some kind of COW filesystem, but I'm dubious about how much transaction isolation is really possible even then.
Basically, I think that LLMs will enable a whole new set of app UXes, and I'm trying to build a platform for those UXes. In a sense, a "shell" for LLM apps. It's a command-bar style UI with an SDK that makes it really easy to build functionality on top of LLMs / vector-dbs etc and to interact with other software/files (e.g. VSCode, Chrome, etc.). Currently very limited docs but if you're interested in building LLM workflows/tools, I'd love to collab!
Author of the piece, and I... can't really argue with any of this, tbh. I'll admit both those parts you called out were probably heavily tinged by my own desires, and your more-sober predictions are a very reasonable counterargument for what will happen in the general case. I suppose, especially regarding swappable LLMs, I _do_ only expect it to be an option for devs or sophisticated users; I assume most folks probably wouldn't care, and I'm just hoping there are enough of us who do care that at least some options offer that swappable functionality. Fwiw, I also use Linux (Fedora) as my daily driver, and I'd be more than content if the predictions from this post came true in a similar vein, e.g. as an OSS option (or family of OSS options) that some subset of users can opt to use.
OP here, as someone who does love cooking, I've gone down this route pretty heavily in the last few years - been growing my collection of physical cookbooks and definitely enjoy flipping through them in search of inspiration. So, yeah, very much endorse the cookbook UX!
I'm the author, and I don't disagree with this at all - I do use LLMs pretty meaningfully in my day-to-day engineering work, and I definitely agree that they hold a ton of promise in situations like the one you mentioned. To be clear, I'm very much bullish on the applications of LLMs to e.g. coding! The point I was trying to make there was just that for _certain_ tasks, the process of chatting with an LLM is, by nature, less precise and more arduous than a purpose-built UX. By analogy, we might have "describe-your-change" functionality in Photoshop, but it's no replacement for all the other pixel-perfect editing functionality that Photoshop provides, and I'd struggle to imagine a world where Photoshop is ever _replaced entirely_ by a chat UX.
Love it! Reminds me of the hilarious "Typing the technical interview" [1] and "Typescripting the technical interview" [2], a couple of my favorite blog posts of all time.
[0]: https://indexical.dev/