Hacker News | newswasboring's comments

>there is a bit of non-determinism in batched non-associative math that can vary by batch / hardware

Maybe a dumb question but does this mean model quality may vary based on which hardware your request gets routed to?
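For context on the quoted claim, here is a minimal sketch of why floating-point addition is order-sensitive. It is plain Python, not anything a specific inference stack runs, and the helper `naive_sum` and the sample values are made up for illustration. The point is only that regrouping or reordering the same numbers, which different batch sizes or kernels may do, can change the rounded result.

```python
def naive_sum(values):
    """Left-to-right float accumulation (hypothetical helper for illustration)."""
    total = 0.0
    for v in values:
        total += v
    return total

# Same three numbers, different grouping.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)
print(left == right)  # False

# Same four numbers, different order: the small terms are lost when they
# are added to a huge intermediate value before it is cancelled.
values = [1e16, 1.0, -1e16, 1.0]
reordered = [1e16, -1e16, 1.0, 1.0]
print(naive_sum(values), naive_sum(reordered))  # 1.0 2.0
```

Whether differences this small ever change a model's output depends on the model and the serving setup; the sketch only shows that the arithmetic itself is not order-independent.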


It doesn't have to be this messy. If I were the maker I would treat this as a good first version and transfer the ownership to the business slowly. This is just like working with any consultant.

The business is run by his wife, and if they had a SWE(-like) already, that person would’ve made this. But instead, the husband did and now owns it. He also open sourced it, so he has to live with the inevitable consequences of that too.

Doesn't this proposal closely match the approach OpenSpec is taking? (Possibly other SDD toolkits too; I'm just familiar with this one.) I spend way more time making my spec artifacts (proposal, design, spec, tasks) than I do in code review. While each of these artifacts is generated, the code is referenced, which surfaces at least some of the purely architectural issues.

"For this invention will produce forgetfulness in the minds of those who learn to use it, because they will not practice their memory. Their trust in writing, produced by external characters which are no part of themselves, will discourage the use of their own memory within them. You have invented an elixir not of memory, but of reminding; and you offer your pupils the appearance of wisdom, not true wisdom, for they will read many things without instruction and will therefore seem [275b] to know many things, when they are for the most part ignorant and hard to get along with, since they are not wise, but only appear wise."

- Socrates on Writing.


> we have decided that journals should not be the arbiters of quality.

At that point why even have a journal, let's just put everything as a Reddit post and be done with it. We will get comment abilities for free.

Maintaining quality standards is a good service. The journal system isn't perfect, but it's the only real check we have left.


> At that point why even have a journal

Great question.

> the journal system isn't perfect, but it's the only real check we have left.

I wish I could agree but Nature et al continually publish bad, attention-grabbing science, while holding back the good science because it threatens the research programmes that gave the editorial board successful careers.

"Isn't perfect" is a massive understatement.


My favorite form is when someone shouts "concurrency" in the middle of the sentence.


Isn't data entry a really good use case for LLM technologies? Depending on the exact use case, of course. But most "data entry" jobs are data transformation jobs, and they get automated using ML techniques all the time. Current LLMs are really good at data transformation too.


No, because they aren't reliable. You don't want to be storing hallucinated data. They can help write the scripts that do the actual work, though.
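A rough sketch of what such a script might look like, assuming a made-up record layout (`name`, `date` as DD/MM/YYYY, `amount` with thousands separators); none of these field names or formats come from the thread. The idea it illustrates is that a deterministic transform either produces the same output every time or fails loudly, instead of guessing.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation

def transform_row(raw: dict) -> dict:
    """Normalize one hypothetical record; raise ValueError rather than guess."""
    name = raw["name"].strip()
    if not name:
        raise ValueError("name is empty")

    # Accept exactly one date format; anything else is rejected.
    try:
        date = datetime.strptime(raw["date"].strip(), "%d/%m/%Y").date()
    except ValueError as exc:
        raise ValueError(f"bad date: {raw['date']!r}") from exc

    # Strip thousands separators, then parse as an exact decimal.
    try:
        amount = Decimal(raw["amount"].strip().replace(",", ""))
    except InvalidOperation as exc:
        raise ValueError(f"bad amount: {raw['amount']!r}") from exc

    return {"name": name, "date": date.isoformat(), "amount": str(amount)}

row = transform_row({"name": " Ada ", "date": "03/02/2024", "amount": "1,234.50"})
print(row)  # {'name': 'Ada', 'date': '2024-02-03', 'amount': '1234.50'}
```

Rows that fail validation can be routed to a human instead of being stored, which is the part a text generator cannot guarantee on its own.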


We can't even use AI language translation because of compliance / liability - we translate food ingredients.

"It says 'no shellfish', go ahead - eat it"

Even with lots of context, the various services we tried would get something wrong.

e.g. "huile" is "oil" in French, and sometimes it would get translated as "motor oil".


No, data replication or transformation is not a good use case for text generators.


If your core feature is data entry, you probably want to get as close to 100% accuracy as possible.

"AI" (LLM-based automation) is only useful if you don't really care about the accuracy of the output. It usually gets most of the data transformations mostly right, enough for people to blindly copy/paste its output, but sometimes it goes off the rails. But hey, when it does, at least it'll apologise for its own failings.


Ah yes, because hallucinations will definitely improve our data entry!


I grew up on the Borland Turbo series. Learned C, then C++ on it. Such nostalgia.

I was wondering, is there a way to get VS Code to look like this? Maybe Neovim?


What difference does it make how many people use it? Complex software exists all over the world for a handful of users. I personally work in an industry where anything we create will be used by at most 100 people worldwide. Does that diminish the complexity of the code? I think not.


> #3: Limit what they test, as most LLM models tend to write overeager tests, including testing if "the field you set as null is null", wasting tokens.

Heh, I write this for some production code too (Python). I guess because Python is not statically typed, I'm testing whether my Pydantic implementation works.

