If only there were a self-hosted option that I could point at any git repo, one that could be called from the CLI with instructions and would create, e.g., a feature branch with the proposed changes committed to it.
As it is, the apparent reliance on GitHub and GitHub Actions/Apps, and the lack of openness (I can't run the LLM magic myself in an environment I control), make it A) impossible to use in a corporate environment and B) not entirely well aligned with my own FOSS values for private use.
Edit: to expand, this is the "getting started" section.
- Add the Sweep GitHub app to desired repos
- Create new issue in repo, like "Sweep: Write tests". [...]
- Watch the magic happen
I get that this is how we do software now: walled gardens, handing your data off to some opaque AI box, the convenience of a nice integration valued above control over how and where code runs... but personally, that's not for me.
You can use my open source tool aider from the cli. It is integrated with your local git repo and commits each change from the AI.
Right now only GPT-3.5 and GPT-4 seem to be capable of this sort of "code editing" task. But aider has support for connecting to other/local LLMs and an active community of folks experimenting with them.
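The core loop described above (each AI change becomes its own git commit) can be sketched roughly like this. This is an illustrative sketch, not aider's actual implementation; `ask_llm` and `apply_and_commit` are hypothetical names, and the model call is a placeholder for whatever backend you connect:

```python
# Hypothetical sketch of an aider-style loop: apply an LLM-proposed
# edit to a file in a local git repo, then commit that change so every
# AI edit is individually recorded in history.
import subprocess
from pathlib import Path

def ask_llm(prompt: str) -> str:
    # Placeholder: swap in a call to GPT-4, a local model, etc.
    return "print('hello world')\n"

def apply_and_commit(repo: Path, filename: str, instruction: str) -> None:
    new_content = ask_llm(f"Rewrite {filename}: {instruction}")
    (repo / filename).write_text(new_content)
    subprocess.run(["git", "-C", str(repo), "add", filename], check=True)
    subprocess.run(
        ["git", "-C", str(repo), "commit", "-m", f"AI edit: {instruction}"],
        check=True,
    )
```

Committing per edit keeps the human in control: any bad change from the model is one `git revert` away.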
Aider's awesome, and we're using a few ideas from the aider blogs, like ctags and the search-and-replace editing prompt. I'm wondering: is anyone experimenting with fine-tuning Llama 2 to get it to reply in this search-and-replace format? Excited to see open-source models catch up to GPT-4 on code editing.
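For readers unfamiliar with the format being discussed: the idea is that the model emits an exact "search" snippet plus its replacement, and the tool applies the edit only if the search text matches the file verbatim. Here's a hedged sketch under assumed block markers (they're illustrative, not aider's exact prompt format):

```python
# Sketch of applying one search-and-replace edit block. The edit is
# rejected unless the SEARCH section matches the source verbatim, which
# catches model hallucinations about the current file contents.
SEARCH_MARK = "<<<<<<< SEARCH"
DIVIDER = "======="
REPLACE_MARK = ">>>>>>> REPLACE"

def apply_edit(source: str, edit_block: str) -> str:
    lines = edit_block.splitlines()
    s_idx = lines.index(SEARCH_MARK)
    d_idx = lines.index(DIVIDER)
    r_idx = lines.index(REPLACE_MARK)
    search = "\n".join(lines[s_idx + 1 : d_idx])
    replace = "\n".join(lines[d_idx + 1 : r_idx])
    if search not in source:
        raise ValueError("search text not found verbatim; reject the edit")
    return source.replace(search, replace, 1)
```

Fine-tuning an open model for this task would amount to training it to emit exactly this kind of block structure reliably.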
Thanks for the suggestion! CLI tools can be simple to deploy and set up, but they're limited in the long run. Running the logic server side lets us perform far more validation, such as GitHub Actions, linting, and pre-commit hooks. You can do this in your terminal, but it's not as easily parallelized, and the process is not as easily persisted. There are definitely tradeoffs to be made.
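The validation idea mentioned here is worth making concrete. A minimal sketch, assuming the server runs a configurable list of checks (the specific commands below are stand-ins, not Sweep's actual pipeline) against a working tree containing the proposed change, and keeps the change only if everything passes:

```python
# Illustrative server-side validation loop: run each check command in
# the working tree of a proposed change; any nonzero exit rejects it.
import subprocess

CHECKS = [
    # Stand-in for a linter; real setups might run flake8, pre-commit,
    # or a test suite like ["pytest", "-q"].
    ["python", "-m", "py_compile", "main.py"],
]

def validate(workdir: str) -> bool:
    """Return True only if every check passes in the given working tree."""
    for cmd in CHECKS:
        result = subprocess.run(cmd, cwd=workdir, capture_output=True)
        if result.returncode != 0:
            return False
    return True
```

Running this in parallel across many proposed changes is where a server-side setup has an edge over a single terminal session.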
Also, the onboarding flow for GitHub Apps is significantly smoother and is a familiar interface for many people. We built a CLI-installed tool previously and ran into a ton of versioning and environment-related problems, with three OSes to support. Unfortunately, it becomes harder to focus on building the best tool for our users if our focus is too broad.
To your last point, you can dig into our codebase and our blogs to see what we do under the hood; "Watch the magic happen" just refers to the simplicity of our installation process. Also, we only store logs for debugging, and they're only persisted for 30 days (which is also how long OpenAI stores them). We put a lot of thought into designing a search engine that doesn't store any code in plaintext. Check it out here: https://docs.sweep.dev/blogs/search-infra
This is a great overview of a RAG workflow, and also of decomposing a complex task into simple tasks that fit within the context window!
I'm really interested in whether there's any software or open-source project that makes this sort of thing easier: specifically, the idea of creating multiple "phases" or "tasks", each with its own LLM prompt and validation rules (like each node in the flowchart). I think something like that could be very helpful!
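The phases-with-validation idea described above can be sketched in a few lines. This is a hypothetical design, not an existing library: `call_llm` is a stand-in for any model backend, and each `Phase` pairs a prompt template with a validation rule, retrying on failure:

```python
# Sketch of a multi-phase LLM pipeline: each phase has its own prompt
# and validator, and the output of one phase feeds the next.
from dataclasses import dataclass
from typing import Callable

def call_llm(prompt: str) -> str:
    # Placeholder: replace with a real model call.
    return prompt.upper()

@dataclass
class Phase:
    name: str
    prompt_template: str               # e.g. "Summarize: {input}"
    validate: Callable[[str], bool]    # rule the output must satisfy

def run_pipeline(phases: list[Phase], initial_input: str, retries: int = 2) -> str:
    data = initial_input
    for phase in phases:
        for _attempt in range(retries + 1):
            output = call_llm(phase.prompt_template.format(input=data))
            if phase.validate(output):
                data = output
                break
        else:
            raise RuntimeError(f"phase {phase.name!r} failed validation")
    return data
```

Each phase maps to one node in the flowchart, and the per-phase validator is what keeps a bad intermediate output from poisoning the rest of the run.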
There are a few tools out there, like AgentGPT (https://github.com/reworkd/AgentGPT, though it's a more conversational interface) and Langflow (https://github.com/logspace-ai/langflow), among others. I think most developers prefer a code-first interface, like a library, but we haven't found one that's great yet. We've used these tools in the past and didn't have the best experience, so we'd love to hear if anyone has worked with a library they found really flexible.