> 99% of the code in this PR [for llama.cpp] is written by DeepSeek-R1
It's definitely possible for AI to do a large fraction of your coding, and for it to contribute significantly to "improving itself". As an example, aider currently writes about 70% of the new code in each of its releases.
I automatically track and share this stat as a graph [0] with aider's release notes.
Before Sonnet, most releases were less than 20% AI-generated code. With Sonnet, that jumped to >50%. For the last few months, about 70% of the new code in each release has been written by aider. The record is 82%.
Folks often ask which models I use to code aider, so I automatically publish those stats too [1]. I've been shifting more and more of my coding from Sonnet to DeepSeek V3 in recent weeks. I've been experimenting with R1, but the recent API outages have made that difficult.
Thank you so much for linking me to that! I think an `aider stats`-type command would be really cool; it could calculate stats based on activity since the first aider commit, or across all of the repo's commits.
Aider has a command to add files to the prompt. For files that are not added, it uses tree-sitter to extract a high-level summary. So for a `.env` file, it tells the LLM that the file exists, but not what is in it. If the model thinks it needs to see that file, it can request it, at which point you get a prompt asking whether it's okay to make that file available.
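To make the "summary, not contents" idea concrete, here's a rough stand-in sketch. Aider itself does this with tree-sitter so it works across many languages; the snippet below uses Python's built-in ast module instead, and the file path is just an example.

import ast

def outline(path):
    """Return only the top-level signatures of a Python file, not its body."""
    with open(path, encoding="utf-8") as f:
        tree = ast.parse(f.read())
    lines = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = ", ".join(a.arg for a in node.args.args)
            lines.append(f"def {node.name}({args}): ...")
        elif isinstance(node, ast.ClassDef):
            lines.append(f"class {node.name}: ...")
    return lines

# e.g. print("\n".join(outline("aider/models.py")))

A skeleton like that is roughly what the LLM sees for files you haven't added, which is enough for it to decide whether to ask for the full file.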
> 99% of the code in this PR [for llama.cpp] is written by DeepSeek-R1
you're assuming the PR will land:
> Small thing to note here, for this q6_K_q8_K, it is very difficult to get the correct result. To make it work, I asked deepseek to invent a new approach without giving it prior examples. That's why the structure of this function is different from the rest.
This certainly wouldn't fly in my org (even with test coverage/passes).
>> Small thing to note here, for this q6_K_q8_K, it is very difficult to get the correct result. To make it work, I asked deepseek to invent a new approach without giving it prior examples. That's why the structure of this function is different from the rest.
> This certainly wouldn't fly in my org (even with test coverage/passes).
To be fair, this seems expected. A distilled model might struggle more with aggressive quantization (like q6) since you're stacking two forms of quality loss: the distillation loss and the quantization loss. I think the answer would be to just use the higher cost full precision model.
To some extent, yes. I would not run production off of it, even if it can eke out performance gains on the hardware at hand. I'd suggest vLLM or TGI or something similar instead.
I think the secret of DeepSeek is basically using RL to train a model that will generate high quality synthetic data. You then use the synthetic dataset to fine-tune a pretrained model and the result is just amazing: https://open.substack.com/pub/transitions/p/the-laymans-intr...
> It's definitely possible for AI to do a large fraction of your coding, and for it to contribute significantly to "improving itself". As an example, aider currently writes about 70% of the new code in each of its releases.
That number by itself doesn't say much.
Let's say I have an academic article written in Word (yeah, I hear some fields do it like that). I get feedback, change 5 sentences, save the file. Then 20 kB of the new file differs from the old one. But the change I made was only 30 words, maybe 200 bytes. Does that mean Word wrote 99% of that update? Hardly.
Or in C: I write a few functions where my old-school IDE handled the indentation and automatically inserted the closing curly braces. Would I say that the IDE wrote part of the code?
Of course the AI-supplied code goes beyond my two examples, but claiming that some tool wrote 70% "of the code" assumes the utility of code scales linearly with its volume, which doesn't represent reality very well.
Every metric has limitations, but git blame line counts seem pretty uncontroversial.
Typical aider changes are not like autocompleting braces or reformatting code. You tell aider what to do in natural language, like a pair programmer. It then modifies one or more files to accomplish that task.
Here's a recent small aider commit, for flavor.
-# load these from aider/resources/model-settings.yml
-# use the proper packaging way to locate that file
-# ai!
+import importlib.resources
+
+# Load model settings from package resource
 MODEL_SETTINGS = []
+with importlib.resources.open_text("aider.resources", "model-settings.yml") as f:
+    model_settings_list = yaml.safe_load(f)
+    for model_settings_dict in model_settings_list:
+        MODEL_SETTINGS.append(ModelSettings(**model_settings_dict))
The point is that not all lines are equal. The 30% that the tool didn't write is the hard stuff, and not just in line count. Once an approach, an architecture, or a design is clear, implementing it is merely manual labor. Progress is not linear.
You shouldn't judge your software engineers by lines of code either. The people who think through the hard stuff often don't have that many lines checked in, but they are the key to your success.
"The stats are computed by doing something like git blame on the repo, and counting up who wrote all the new lines of code in each release. Only lines in source code files are counted, not documentation or prompt files."
I don't, as I'm not in that ecosystem, but Groq is OpenAI-compatible, so any tool that speaks the OpenAI API (99% do) and lets you set your own base URL should work.
For example, many tools let you use local LLMs. Instead of putting in the URL of the local LLM, you would just plug in the Groq URL and key.
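For instance, with the official openai Python client the swap really is just the base URL. (The Groq endpoint and model name below are examples of what I believe is current; check their docs for up-to-date values.)

from openai import OpenAI

client = OpenAI(
    base_url="https://api.groq.com/openai/v1",  # instead of a local LLM's URL
    api_key="YOUR_GROQ_API_KEY",
)

resp = client.chat.completions.create(
    model="llama-3.3-70b-versatile",  # whichever model Groq serves; check their model list
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)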
Continue.dev is available for JetBrains, though the plugin is not as good as its VS Code counterpart. You can plug in any OpenAI-compatible API. Under the experimental settings, you can also define an applyCode model (and others), which you could set to a faster, cheaper one (e.g. Sonnet).
aider looks amazing - I'm going to give it a try soon. I just had a question on API costs to see if I can afford it. Your FAQ says you used about 850k tokens for Claude, and their API pricing says output tokens are $15/MTok. Does that mean it cost you under $15 for your Claude 3.5 usage, or am I totally off base? (Sorry if this has an obvious answer ... I don't know much about LLM API pricing.)
When I was mostly just using Sonnet I was spending ~$100/month on their API. That included some amount of bulk API use for benchmarking, not just my interactive AI coding.
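To do the arithmetic from the question explicitly: if, worst case, all 850k tokens were billed at Sonnet's $15/MTok output rate, that's under $13; in practice a lot of those tokens are input tokens at the cheaper $3/MTok rate, so that particular usage would cost even less.

tokens = 850_000
output_price_per_mtok = 15.00  # USD per million output tokens (Claude 3.5 Sonnet)
cost = tokens / 1_000_000 * output_price_per_mtok
print(f"${cost:.2f}")  # -> $12.75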
If you're concerned about API costs, the experimental Gemini models with API keys from AI Studio tend to have very generous free quotas. The quality of e.g. Flash 2.0 Experimental is definitely good enough to try out Aider and see if the workflow clicks. (For me, the quality has been good enough that I just stuck with it and haven't gotten around to experimenting with any of the paid models yet.)
In case you are on a 32+GB Mac, you could try deepseek-r1-distill-qwen-32b-mlx in LM Studio. It’s just barely usable speed-wise, but gives useful results most of the time.
When a log line contains {main_model, weak_model, editor_model}, does the existence of main_model mean the person was using Aider in Architect/Editor mode?
Do you usually use that mode and, if so, with which architect?
[0] https://aider.chat/HISTORY.html
[1] https://aider.chat/docs/faq.html#what-llms-do-you-use-to-bui...