
> 99% of the code in this PR [for llama.cpp] is written by DeepSeek-R1

It's definitely possible for AI to do a large fraction of your coding, and for it to contribute significantly to "improving itself". As an example, aider currently writes about 70% of the new code in each of its releases.

I automatically track and share this stat as a graph [0] with aider's release notes.

Before Sonnet, most releases were less than 20% AI-generated code. With Sonnet, that jumped to >50%. For the last few months, about 70% of the new code in each release is written by aider. The record is 82%.

Folks often ask which models I use to code aider, so I automatically publish those stats too [1]. I've been shifting more and more of my coding from Sonnet to DeepSeek V3 in recent weeks. I've been experimenting with R1, but the recent API outages have made that difficult.

[0] https://aider.chat/HISTORY.html

[1] https://aider.chat/docs/faq.html#what-llms-do-you-use-to-bui...


First off I want to thank you for Aider. I’ve had so much fun playing with it and using it for real work. It’s an amazing tool.

How do you determine how much was written by you vs the LLM? I assume it consists of parsing the git log and getting LoC from that or similar?

If the scripts are public could you point me at them? I’d love to run it on a recent project I did using aider.


Glad to hear you’re finding aider useful!

There's a FAQ entry about how these stats are computed [0]. Basically it uses git blame, since aider is tightly integrated with git.

The FAQ links to the script that computes the stats. It's not designed to be used on any repo, but you (or aider) could adapt it.

You’re not the first to ask for these stats about your own repo, so I may generalize it at some point.

[0] https://aider.chat/docs/faq.html#how-are-the-aider-wrote-xx-...
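
The general idea is roughly this (just an illustrative sketch, not the actual blame.py script; it assumes aider-authored commits can be recognized by an "(aider)" suffix in the git author name, and the file path is made up):

    import subprocess
    from collections import Counter

    def blame_line_counts(repo, path, rev="HEAD"):
        """Count surviving lines per git author for one file, via git blame."""
        out = subprocess.run(
            ["git", "-C", repo, "blame", "--line-porcelain", rev, "--", path],
            capture_output=True, text=True, check=True,
        ).stdout
        authors = Counter()
        for line in out.splitlines():
            if line.startswith("author "):
                authors[line[len("author "):]] += 1
        return authors

    # Hypothetical usage: treat lines last touched by an "(aider)" author as AI-written.
    counts = blame_line_counts(".", "aider/models.py")
    ai = sum(n for a, n in counts.items() if "(aider)" in a)
    total = sum(counts.values())
    print(f"aider wrote {ai}/{total} lines ({100 * ai / max(total, 1):.0f}%)")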


Thank you so much for linking me to that! I think an `aider stats`-type command would be really cool (it would be cool to calculate stats based on activity since the first aider commit, or on all-time commits of the repo).

Slightly longer than `aider stats` but here you go:

  uv run --with=semver,PyYAML,tqdm https://raw.githubusercontent.com/Aider-AI/aider/refs/heads/main/scripts/blame.py

Does this mean lines/diffs otherwise untouched are considered written by Aider?

If a small change is made by an end-user to adjust an Aider result, who gets "credit"?


It works like normal git blame -- it literally uses git blame.

Whoever changed a line last gets credit. Only the new or newly changed lines in each release are considered.

So no, "lines/diffs otherwise untouched" are NOT considered written by aider. That wouldn't make sense?


Maybe this is answered, but I didn't see it. How does aider deal with secrets in a git repo? Like if I have passwords in a `.env`?

Edit: I think I see. It only adds files you specify.


Aider has a command to add files to the prompt. For files that are not added, it uses tree-sitter to extract a high-level summary. So for a `.env`, it will mention to the LLM the fact that the file exists, but not what is in it. If the model thinks it needs to see that file, it can request it, at which point you receive a prompt asking whether it's okay to make that file available.

It's a very slick workflow.


You can use an .aiderignore file to ensure aider doesn't use certain files/dirs/etc. It conforms to the .gitignore spec.
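
For example, a hypothetical .aiderignore (it uses the same pattern syntax as .gitignore):

    # keep secrets and generated output away from aider
    .env
    .env.*
    secrets/
    dist/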

> 99% of the code in this PR [for llama.cpp] is written by DeepSeek-R1

You're assuming the PR will land:

> Small thing to note here, for this q6_K_q8_K, it is very difficult to get the correct result. To make it works, I asked deepseek to invent a new approach without giving it prior examples. That's why the structure of this function is different from the rest.

This certainly wouldn't fly in my org (even with test coverage/passes).


>> Small thing to note here, for this q6_K_q8_K, it is very difficult to get the correct result. To make it works, I asked deepseek to invent a new approach without giving it prior examples. That's why the structure of this function is different from the rest.

> This certainly wouldn't fly in my org (even with test coverage/passes).

To be fair, this seems expected. A distilled model might struggle more with aggressive quantization (like q6), since you're stacking two forms of quality loss: the distillation loss and the quantization loss. I think the answer would be to just use the higher-cost, full-precision model.


llama.cpp optimises for hackability, not necessarily maintainability or cleanliness. You can look around the repository to get a feel for what I mean.

I guess that means no one should use it for anything serious? Good to know.

To some extent, yes. I would not run production off of it, even if it can eke out performance gains on the hardware at hand. I'd suggest vLLM or TGI or something similar instead.

I think the secret of DeepSeek is basically using RL to train a model that will generate high quality synthetic data. You then use the synthetic dataset to fine-tune a pretrained model and the result is just amazing: https://open.substack.com/pub/transitions/p/the-laymans-intr...

> It's definitely possible for AI to do a large fraction of your coding, and for it to contribute significantly to "improving itself". As an example, aider currently writes about 70% of the new code in each of its releases.

That number itself is not saying much.

Let's say I have an academic article written in Word (yeah, I hear some fields do it like that). I get feedback, change 5 sentences, save the file. Then 20 kB of the new file differs from the old file. But the change I made was only 30 words, so maybe 200 bytes. Does that mean that Word wrote 99% of that update? Hardly.

Or in C: I write a few functions for which my old-school IDE did the indentation and automatically inserted the closing curly braces. Would I say that the IDE wrote part of the code?

Of course the AI-supplied code is more than my two examples, but claiming that some tool wrote 70% "of the code" suggests a linear utility of code, which just doesn't represent reality very well.


Every metric has limitations, but git blame line counts seem pretty uncontroversial.

Typical aider changes are not like autocompleting braces or reformatting code. You tell aider what to do in natural language, like a pair programmer. It then modifies one or more files to accomplish that task.

Here's a recent small aider commit, for flavor.

  -# load these from aider/resources/model-settings.yml
  -# use the proper packaging way to locate that file
  -# ai!
  +import importlib.resources
  +
  +# Load model settings from package resource
  MODEL_SETTINGS = []
  +with importlib.resources.open_text("aider.resources", "model-settings.yml") as f:
  +    model_settings_list = yaml.safe_load(f)
  +    for model_settings_dict in model_settings_list:
  +        MODEL_SETTINGS.append(ModelSettings(**model_settings_dict))
  
https://github.com/Aider-AI/aider/commit/5095a9e1c3f82303f0b...

The point is that not all lines are equal. The 30% that the tool didn't write is the hard stuff, and not just in line count. Once an approach or an architecture or a design is clear, implementing it is merely manual labor. Progress is not linear.

You shouldn't judge your software engineers by lines of code either. Those who think through the hard stuff often don't have that many lines of code checked in, but those people are the key to your success.


That's quite a reach, though, if you're comparing an AI to a formatter. Presumably 70% of a new Aider release isn't formatting.

"The stats are computed by doing something like git blame on the repo, and counting up who wrote all the new lines of code in each release. Only lines in source code files are counted, not documentation or prompt files."

R1 is available on both together.ai and fireworks.ai; it should be a drop-in replacement using the OpenAI API.

The problem is it's very expensive. More expensive than Claude.

You can use the distilled version on Groq for free for the time being. Groq is amazing but frequently has capacity issues or other random bugs.

Perhaps you could set up Groq as your primary and then fall back to Fireworks, etc., by using litellm or another proxy.
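
A minimal sketch of that failover idea using the OpenAI Python client directly (the base URLs, model names, and env var names here are assumptions to illustrate the pattern; litellm or another proxy can do the equivalent with a fallbacks config):

    # Try Groq first, fall back to Fireworks if the call fails.
    # Base URLs and model names below are assumptions; check each provider's docs.
    import os
    from openai import OpenAI

    PROVIDERS = [
        ("https://api.groq.com/openai/v1", os.environ["GROQ_API_KEY"],
         "deepseek-r1-distill-llama-70b"),
        ("https://api.fireworks.ai/inference/v1", os.environ["FIREWORKS_API_KEY"],
         "accounts/fireworks/models/deepseek-r1"),
    ]

    def complete(prompt):
        last_err = None
        for base_url, api_key, model in PROVIDERS:
            try:
                client = OpenAI(base_url=base_url, api_key=api_key)
                resp = client.chat.completions.create(
                    model=model,
                    messages=[{"role": "user", "content": prompt}],
                )
                return resp.choices[0].message.content
            except Exception as err:  # capacity issues, rate limits, etc.
                last_err = err
        raise last_err

    print(complete("Summarize the last commit message in one line."))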


Do you know of any assistants for JetBrains that can plug into Groq + DeepSeek?

I do not, as I'm not in the ecosystem, but Groq is OpenAI-compatible, so any tool that is OpenAI-compatible (99% are) and lets you put in your own base URL should work.

For example, many tools will let you use local LLMs. Instead of putting in the URL of the local LLM, you would just plug in the Groq URL and key.

see: https://console.groq.com/docs/openai


Continue.dev is available for JetBrains, though the plugin is not as good as the VS Code counterpart. You can plug in any OpenAI-compatible API. Under experimental settings, you can also define an applyCode model (and others), which you could set to a faster, cheaper one (e.g. Sonnet).

Run the DeepSeek R1 model on your own hardware.

Only various distillations are available for most people’s hardware, and they’re quite obviously not as good as actual R1 in my testing.

"$6,000 computer to run Deepseek R1 670B Q8 locally at 6-8 tokens/sec"

https://reddit.com/r/LocalLLaMA/comments/1ic8cjf/6000_comput...


> I've been shifting more and more of my coding from Sonnet to DeepSeek V3 in recent weeks.

For what purpose, considering Sonnet 3.5 still outperforms V3 on your own benchmarks (which also tracks with my personal experience comparing them)?


That's amazing data. How representative do you think your Aider data is of all coding done?

aider looks amazing - I'm going to give it a try soon. Just had a question on API costs to see if I can afford it. Your FAQ says you used about 850k tokens for Claude, and their API pricing says output tokens are $15/MTok. Does that mean it cost you under $15 for your Claude 3.5 usage, or am I totally off base? (Sorry if this has an obvious answer ... I don't know much about LLM API pricing.)

I built a calculator for that here: https://tools.simonwillison.net/llm-prices

It says that for 850,000 Claude 3.5 output tokens the cost would be $12.75.

But... it's not 100% clear to me whether the Aider FAQ numbers are for input or output tokens.


It's "total" tokens, input plus output. I'd guess more than two-thirds of them are input tokens.

If we guess 500,000 for input and 350,000 for output that's a grand total of $6.75. This stuff is so cheap these days!
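
As a quick sanity check on that arithmetic (the $15/MTok output price is from the comment above; the $3/MTok input price is an assumption based on Claude 3.5 Sonnet's published pricing):

    # rough cost check; assumes Claude 3.5 Sonnet pricing of $3/MTok in, $15/MTok out
    input_tokens, output_tokens = 500_000, 350_000
    cost = (input_tokens / 1e6) * 3.00 + (output_tokens / 1e6) * 15.00
    print(f"${cost:.2f}")  # -> $6.75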

When I was mostly just using Sonnet I was spending ~$100/month on their API. That included some amount of bulk API use for benchmarking, not just my interactive AI coding.

If you're concerned about API costs, the experimental Gemini models with API keys from AI Studio tend to have a very generous free quota. The quality of e.g. Flash 2.0 Experimental is definitely good enough to try out Aider and see if the workflow clicks. (For me, the quality has been good enough that I just stuck with it and haven't gotten around to experimenting with any of the paid models yet.)

> As an example, aider currently writes about 70% of the new code in each of its releases.

Yeah, but part of that is because it's practically impossible to stop it from making random edits for the sake of it.


Love aider, thank you for your work! Out of curiosity, what are your future plans and ideas for aider in terms of features and workflow?

Hello...

Is it possible to use aider with a local model running in LM Studio (or Ollama)?

From a quick glance I did not see an obvious way to do that...

Hopefully I am totally wrong!



Thanks for your interest in aider.

Yes, absolutely you can work with local models. Here are the docs for working with lmstudio and ollama:

https://aider.chat/docs/llms/lm-studio.html

https://aider.chat/docs/llms/ollama.html


Yes, absolutely.

In the left sidebar there's a "connecting to LLMs" section.

Check out ollama as an example.


Yes, and it's easy.

yeah:

    aider --model ollama_chat/deepseek-r1:32b
(or whatever)

This didn't work well for me; no changes ever get made, but maybe that's because I'm just using the 14B model.

If you are on a 32+ GB Mac, you could try deepseek-r1-distill-qwen-32b-mlx in LM Studio. It's just barely usable speed-wise, but it gives useful results most of the time.

When a log line contains {main_model, weak_model, editor_model}, does the existence of main_model mean the person was using Aider in architect/editor mode?

Do you usually use that mode and, if so, with which architect?

Thank you!


Can you make a plot like HISTORY but with the axes changed? X: date, Y: work leverage (i.e. 50% = 2x, 90% = 10x, 95% = 20x; leverage = 1/(1 - pct)).
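
Something like this rough sketch of what I mean (the release labels and percentages below are placeholders, not aider's real release data):

    # work leverage = 1 / (1 - fraction of code written by aider)
    import matplotlib.pyplot as plt

    releases = ["v0.50", "v0.60", "v0.70"]    # placeholder x-axis values
    pct_ai = [0.50, 0.70, 0.82]               # placeholder fractions

    leverage = [1 / (1 - p) for p in pct_ai]  # 50% -> 2x, 70% -> ~3.3x, 82% -> ~5.6x

    plt.plot(releases, leverage, marker="o")
    plt.xlabel("release")
    plt.ylabel("work leverage (1 / (1 - pct))")
    plt.show()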

Could you share how you track AI vs human LoC?

That's covered here, including a link to the script: https://aider.chat/docs/faq.html#how-are-the-aider-wrote-xx-...


