
I just tried the following prompt:

> please write a rust library implementing a variant of simple8b integer compression augmented to use run-length encoding whenever it's beneficial to do so.

Initially I was sort of impressed, it quickly generated a program which looked like rust code, and provided an explanation that, while not as technically detailed as I'd hoped, seemed to be at least related to the topic.

Then I tried to compile the program. Turns out the bot didn't quite actually write rust, it had written something closely resembling rust though, and the compiler errors helped me fix it.

Then I tried to run the tests--yes! the bot even wrote tests, although it did so in a totally bone-headed way by writing multiple distinct tests in one test function--not good. The result: a panic on integer overflow while trying to left-shift a value. There were also multiple pages of compiler warnings complaining about dead code, unused functions, enum variants, etc. I always fail on warnings.

This is not a lot of code. 190 lines including tests. At this point, given that I already have concerns about its correctness, I don't think there's anything I can really use here. I'm worried the deeper I dig the worse it'll get, so better to cut my losses now, sit down and read the simple8b paper, and implement this from first principles.

Every time I try to use one of these things it's the same story. I cannot understand the hype. I'm genuinely trying but I just can't understand it.



It feels exactly like me programming. The first pass resembles whatever I'm trying to do, and only after some struggle with compile errors, squiggles from the LSP, and some Google-fu do I get something meaningful running.


Not unlike me! The difference is it's incremental. I write one function, then write a test, and get it working. Then, building on that stable foundation I write another function, more tests, etc. Crapping out an entire pile of garbage at once is not the way.

I guess I'm holding it wrong? Is there a better way I could phrase my query?


You are prompting it to output the entire thing at once. If you want it to approach the problem incrementally, prompt it incrementally.


So should I be following up and asking it to refine its solution like this?

> The program you wrote doesn't compile. Please fix it such that it compiles.

Then, maybe, if we're lucky, we progress to the second step:

> Ok, now the program compiles but there are tons of warnings about dead code, unreachable code, blanket trait implementations which aren't actually used, etc. Could you please fix those?

Then assuming we clear that hurdle,

> Great! The program compiles without warnings, but when I run the tests it panics due to an integer overflow. I see in your encode_rle function you're inexplicably left-shifting a small unsigned integer by 60, which will absolutely for certain cause it to overflow and panic. Would you mind explaining why in the actual fuck you did this and please fix it? Kthx.

And on, and on... You know what? No. Fuck that shit. I refuse. I have absolutely no confidence this process will come up with a working, trustworthy implementation of the algorithm.


Not the person you were replying to, but I think a better example of "incrementally" here would be:

- write me a file with the function definitions for this problem
- compile that
- write a test that tests x outcome
- compile that
- then have it start writing functionality

If it's trying to one shot a complex problem that you would typically break up, your prompt is probably too vague.
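
As a rough sketch of what that first increment could look like for this task: signatures only, plus one small helper that is actually implemented and testable on its own. Every name here (`bits_needed`, `encode`, `decode`) is made up for illustration and is not taken from the generated program.

```rust
/// Number of bits needed to represent `v` (0 needs 0 bits).
/// Hypothetical helper: the first piece to write and test.
pub fn bits_needed(v: u64) -> u32 {
    u64::BITS - v.leading_zeros()
}

/// Placeholder signature only; panics if called.
/// To be filled in once `bits_needed` is tested.
pub fn encode(_input: &[u64]) -> Vec<u64> {
    todo!("pack values into 64-bit words")
}

/// Placeholder signature only; panics if called.
/// To be filled in once `encode` has passing tests.
pub fn decode(_words: &[u64]) -> Vec<u64> {
    todo!("unpack 64-bit words back into values")
}
```

The file compiles as-is, and the first test only needs to cover `bits_needed` (e.g. that 255 needs 8 bits and 256 needs 9) before anything else gets written.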


"I'm genuinely trying but I just can't understand it." to "You know what? No. Fuck that shit. I refuse." in 3 hours - Are you genuinely trying? Or do you just not like how it works? The hype is that lots of people are happy to work in the manner you outright refused. To be fair to you: if I could drive, I probably wouldn't take the bus either, fuck that shit. :)


I absolutely do not want to review multiple haphazardly written[1] attempts at the same computer program over and over again. That's a ridiculous way to spend time. So I'm not willing to try asking the bot over and over again to rewrite it. I'd rather write it myself and be confident in the result.

I think this only works for things where it just doesn't matter whether it actually works correctly, which to me seems synonymous with "problems that aren't worth working on".

[1] just look at this https://play.rust-lang.org/?version=stable&mode=debug&editio...


I gave your prompt to o1-preview and with one correction it did something that seems good to me (I am not a rust programmer, so please double check). :)

first attempt: https://onecompiler.com/rust/42w2duuqh

final result: https://onecompiler.com/rust/42w2e3jr4

PS: it "thought" about it for 2 x 60 seconds


This is looking better! L49 is giving me a little anxiety though, I'd really love to see some kind of justification for that decision. Compare e.g. to Lemire's[1].

EDIT: To be clear, I do understand that I'm making an unreasonable demand. I know the process that came up with this program has no ability to justify this or that "decision" (deliberately scare quoted because it doesn't actually have agency and can't decide at all). And that's the problem. That's why I find it very difficult to trust it.

[1] https://github.com/lemire/FastPFor/blob/52e45deeab9c3a481daa...


If you drop it down a level and ask for block level code or functions you'll find it works. At this point users still have to organize the output. But I'm getting the sense that latter task is something LLMs are going to get better at.


Was it a word choice issue on my part then? Like, this task should be achievable using two functions. Should I ask it to write the encode function and then ask it to write the corresponding decode function? Then finally in a third step ask it to write various test functions?


I recommend this for a more nuanced view:

https://nicholas.carlini.com/writing/2024/how-i-use-ai.html


Yes, I've read that and it seems like the use cases the author can really get behind are basically "fuzzy search" queries, not implementing things. I don't think I really have those needs? My entire adult life I've cultivated a "precise search" skillset (e.g. using google and (rip)grep) that continues to serve me well--and very quickly! So I'm not seeing the value there. I've tried those use cases too and it doesn't really add up either...


The biggest acceleration is for mediocre coders like me - the ones who know 90% of the code but will spend 95% of the time (perhaps several hours) trying to get the data structure correct. These systems can get the code almost all the way there, and I can now spend those couple of hours running tests rather than pounding my head against the wall, only to realize this is faster and easier to understand in a dictionary than the dumb tuple (round peg) I would have spent hours trying to jam through the square hole.


I think the problem you're describing might be a symptom of coding as the first step instead of the last. I find once I specify a problem and my proposed solution in sufficient detail, the structure of the code becomes obvious. This is best done with the various tools of human communication--visual diagrams, prose, mathematics, and algorithmic descriptions in the form of pseudocode. Only then, when I sufficiently understand what I'm actually trying to do, should I actually start writing code in a programming language. Otherwise I get pigeonholed into some half-baked idea by the various rigours of the language itself. Writing code before I truly understand what I'm trying to accomplish, I've learned over time, is an awfully costly form of premature optimization.

EDIT: I don't mean to suggest programming languages aren't tools of human communication--they absolutely are. In fact, that's their primary purpose--to communicate ideas about the structure of a computation to other programmers. But starting with structural ideas about the implementation rather than conceptual ones about the nature of the problem and the shape the solution should therefore take is putting the cart before the horse.


o1 found a new feature I hadn't noticed became available and replaced some nasty regex code I had been working on for hours with 1 library call.

4o had been happy to attempt to help me fix my function, o1 just went "well that's interesting, meatbag, but have you considered reading the manual?"


Did you feed it the simple8b paper along with your prompt?


No, that's an interesting idea though. Hard to imagine how that would help with the code correctness issues, though. I haven't even dug into algorithmic correctness yet so I have no real idea whether there's room for improvement there--although I sure do suspect there is!

EDIT: Oh my. After digging into the code I found this gem:

  fn encode_rle(&self, value: u64, count: usize) -> u64 {
      let selector = Simple8bSelector::RLE as u64;
      (selector << 60) | ((count as u64) << 30) | (value & 0x3FFFFFFF)
  }

And this one:

  fn try_rle(&self, input: &[u64]) -> Option<(u64, usize)> {
      if input.is_empty() {
          return None;
      }

      let value = input[0];
      let mut count = 1;

      for &x in input.iter().skip(1) {
          if x != value || count >= 0x3FFFFFFF {  // Max 30-bit run length
              break;
          }
          count += 1;
      }

      Some((value, count))
  }

What even is going on here? Compare to an actually sane implementation like [1] or [2].

[1] https://github.com/lemire/FastPFor/blob/master/headers/simpl...

[2] https://github.com/timescale/timescaledb/blob/403782a5899c75...
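
For reference, the quoted `encode_rle` packs three fields into one 64-bit word: a 4-bit selector on top, then a 30-bit count, then a 30-bit value. The `& 0x3FFFFFFF` mask means any value wider than 30 bits silently loses its high bits. Below is a minimal sketch of the same layout with that case rejected instead, plus a matching decoder. The selector value 15 is an assumption (the enum's actual discriminant isn't shown), and `decode_rle` is a hypothetical counterpart, not part of the generated code.

```rust
/// Assumed selector value; the real `Simple8bSelector::RLE` discriminant isn't shown.
const RLE_SELECTOR: u64 = 15;
/// 30-bit field mask, same constant as in the quoted code.
const FIELD_MASK: u64 = 0x3FFF_FFFF;

/// Pack a run into one word: [4-bit selector | 30-bit count | 30-bit value].
/// Returns None instead of truncating when a field doesn't fit.
pub fn encode_rle(value: u64, count: u64) -> Option<u64> {
    if value > FIELD_MASK || count == 0 || count > FIELD_MASK {
        return None;
    }
    Some((RLE_SELECTOR << 60) | (count << 30) | value)
}

/// Unpack a word produced by `encode_rle` into (value, count).
/// Returns None if the selector bits don't mark an RLE word.
pub fn decode_rle(word: u64) -> Option<(u64, u64)> {
    if word >> 60 != RLE_SELECTOR {
        return None;
    }
    Some((word & FIELD_MASK, (word >> 30) & FIELD_MASK))
}
```

With this shape a round-trip test is one line: `encode_rle(7, 100).and_then(decode_rle)` should give back `Some((7, 100))`.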


Actually, I was wrong about where the error was here: encode_rle works like it should, and the shift isn't the problem there. It blew up later, in a different place. The second one is just a bizarre way to write that, but sure, it counts the first N repeats in the input slice. There's plenty of bizarre stuff in here[1], but mostly the general shape of the idea is directionally correct. There are a couple of notably questionable things, though, like the assumption that RLE is always the way to go if the run length is greater than 8, perplexing style choices, etc.

[1] https://play.rust-lang.org/?version=stable&mode=debug&editio...
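
To make those two points concrete, here is a hedged sketch: a more conventional way to count the leading run, and a size check that depends on the value instead of a fixed threshold. The width table is the one usually given for Simple8b (60 payload bits per word) and is an assumption here, as are all the function names. It also only compares the run in isolation and ignores whatever follows it in the input.

```rust
/// Bit widths of the packed selectors in the commonly described Simple8b
/// layout (60 payload bits per word). Assumed, not taken from the thread.
const WIDTHS: [u32; 14] = [1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 15, 20, 30, 60];

/// Length of the run of `input[0]` at the start of `input`, capped at `max`.
pub fn run_length(input: &[u64], max: usize) -> usize {
    match input.first() {
        Some(&first) => input.iter().take(max).take_while(|&&x| x == first).count(),
        None => 0,
    }
}

/// How many copies of `value` fit in one bit-packed word.
/// Values wider than 60 bits don't fit at all; this sketch treats them as 1 per word.
pub fn packed_per_word(value: u64) -> usize {
    let bits = (u64::BITS - value.leading_zeros()).max(1);
    let width = WIDTHS.iter().copied().find(|&w| w >= bits).unwrap_or(60);
    (60 / width) as usize
}

/// One RLE word beats bit-packing only if packing the run needs more than one word.
pub fn rle_saves_space(value: u64, run: usize) -> bool {
    run > packed_per_word(value)
}
```

Under these assumptions a run of nine 1s still fits in a single packed word (60 one-bit values per word), so RLE buys nothing there, while a run of three 30-bit values already does better as RLE.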



