
Even for those people, there are still a ton of old text files in Windows-1252 etc. floating around.

You can choose to never work on projects where you have to deal with files like that.

But there may come a day when you have to choose between not paying your rent and writing a tool that converts old text files to UTF-8. At that point, it's nice to have references on the internet on how other people have actually dealt with it and what works. "Abort with an error" is not very useful advice then.
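For a rough idea of what such a tool does at its core, here is a minimal sketch in Python. The function name `decode_legacy` and the choice of Windows-1252 as the fallback are my own assumptions for illustration; a real tool would need a better guess at the source encoding than "try two and hope".

```python
from typing import Tuple


def decode_legacy(raw: bytes, fallback: str = "cp1252") -> Tuple[str, str]:
    """Return (text, encoding_used) for bytes of unknown encoding.

    Hypothetical helper: the fallback order is an assumption, not a standard.
    """
    try:
        # Strict UTF-8 first: legacy 8-bit text containing non-ASCII bytes
        # rarely happens to be valid UTF-8 by accident.
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass
    try:
        return raw.decode(fallback), fallback
    except UnicodeDecodeError:
        # Python's cp1252 codec leaves a few bytes (e.g. 0x81) undefined,
        # so fall back to latin-1, which maps every byte to a code point.
        return raw.decode("latin-1"), "latin-1"


def to_utf8(raw: bytes) -> bytes:
    """Re-encode the decoded text as UTF-8."""
    text, _ = decode_legacy(raw)
    return text.encode("utf-8")
```

Note that the last fallback never fails, which also means it never tells you when the guess was wrong. That is exactly the kind of pitfall you only see once you stop aborting on the first error.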




Why would you write a tool that does that instead of just digging up one that's already written? This sounds like the folly of writing one's own encryption library.


Which library do you pick? Which ones are good? Bad?

What are the common pitfalls that the library must deal with?

How do you know how to evaluate them if all you know is "not UTF-8? Abort with error"?


As far as implementing new tech goes, this sounds like about the easiest research project you could ask for. Grab a few and test 'em out. Not sure why you're making this out to be some kind of intractable problem.


If you look for comments by "jcranmer" elsewhere on this page you'll find examples of why it's not as simple as you make it sound.

Do you have any experience at all with different charsets under the hood by the way?


I do. I cut my IT teeth on an old-ass system from the '80s in the '00s. I remember having problems feeding the reports into modern systems. Goofy problems with EOL and EOF and some other hiccups. It wasn't that bad.


Honestly if you can't review/read/figure that out without writing a library of your own, you probably shouldn't be writing a library of your own in the first place.


You can't pick and use a library like this without understanding the underlying concepts. That goes for both encryption and the charset conversion issue. It's not always just plug and play.

There are examples of where people used encryption libraries in the wrong way and undermined the strength of the encryption (for example, CVE-2024-31497 in PuTTY).

A very big part of software development is dealing with leaky abstractions. We don't work with perfect black boxes. We need to understand enough of how things work in the lower layers to avoid problems. Note here that I wrote "enough", not "everything", or "write everything yourself".

I would not want a person writing software to handle charset conversion if he refuses to learn how the various encodings work, which bytes written in one charset will or won't also decode as another charset, etc.
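To make the "decodes as another charset" point concrete, here is a small Python sketch. The helper name `decodes_as` and the sample string are made up for illustration; the codecs are the ones in Python's standard library.

```python
def decodes_as(raw: bytes, encoding: str) -> bool:
    """Return True if raw decodes under encoding without an error.

    Hypothetical helper: "no error" is not the same as "correct text".
    """
    try:
        raw.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# Text with an accented letter and curly quotes, stored as Windows-1252.
cp1252_bytes = "na\u00efve \u201cquote\u201d".encode("cp1252")

for enc in ("utf-8", "cp1252", "latin-1"):
    print(enc, decodes_as(cp1252_bytes, enc))

# The reverse direction: UTF-8 bytes read as Windows-1252 give mojibake,
# not an error.
mojibake = "\u00e9".encode("utf-8").decode("cp1252")
print(repr(mojibake))
```

The Windows-1252 bytes are rejected by a strict UTF-8 decoder, but latin-1 accepts them and silently turns the curly quotes into C1 control characters. Going the other way, UTF-8 bytes usually decode as Windows-1252 without any error and just come out garbled. A library can do the decoding for you, but it cannot tell you which of those "successful" results is the right one.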


Your example seems to be due to, paraphrasing, lack of a library rather than inability to choose one.

"older approach, PuTTY developers said, was devised at a time when Microsoft Windows lacked native support for a cryptographic random number generator."

So "enough of how things work" could just be "pick a modern encryption library" that doesn't come from the dark ages when there were no random numbers.

Same with encodings: picking a library requires a much lower level of understanding, because you can rely on the expertise of others.



