> That's the main reason why CSV absolutely sucks too [...]
Is it? I think you're absolutely right that naive points of view like the one you're responding to will lead to avoidable bugs, but I'm not so sure the problem is CSV so much as people who assume CSV is simple enough to parse or generate without using a library.
> I'm not so sure the problem is CSV so much as people who assume CSV is simple enough to parse or generate without using a library.
The simplicity of CSV is what tells people they can parse and generate it without a library, and even that doing so is a feature of CSV. You only have to read the comments disagreeing with me to see exactly that.
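To make the failure mode concrete, here's a sketch in Rust (the record is made up, but any quoted field containing the delimiter will do):

```rust
fn main() {
    // A three-field record whose middle field contains a quoted comma.
    let line = r#"1,"Smith, Jane",42"#;

    // The "CSV is simple" approach: just split on commas.
    let naive: Vec<&str> = line.split(',').collect();

    // This yields 4 fields instead of 3, silently corrupting the record:
    // ["1", "\"Smith", " Jane\"", "42"]
    assert_eq!(naive.len(), 4);
}
```

A real CSV library handles quoting, escaped quotes, and embedded newlines; naive splitting handles none of them.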
> It has nothing to do with what the model is being used for.
I may be misunderstanding this passage of the article, but I thought the author was claiming that machine learning (specifically training) was equivalent to compression, while language understanding is equivalent to decompression. Therefore, they can't be the same thing. Why does language understanding have to be analogous to training an ML model rather than using an ML model for inference?
> Why does language understanding have to be analogous to training an ML model rather than using an ML model for inference?
Why would you look at ML model inference in particular? There is no compression or decompression going on during inference; you're just running data through the existing weights.
Creating an ML model, on the other hand, is lossy compression. You reduce the size of the data (training set -> model) in exchange for reduced accuracy (100% -> 90-95% or whatever).
NLU is decompression because you are extracting information that doesn't exist in the text.
I see ML as ahead-of-time compression (creating a model), whereas NLU is just-in-time decompression (extracting information from the current context). Looking specifically at inference time doesn't make sense to me because all the work for ML is done during training, not inference.
I was hoping there would be discussion of this point higher up in the thread, because I had essentially the same reaction as you while reading this passage. I'm no expert in machine learning, NLP, or linguistics, but this struck me as a pretty obvious flaw in the author's argument.
You've gotten a couple of responses arguing that the atoms required for a language like Lisp are fundamentally simpler than eval in Perl, and that this makes the difference... Fwiw, I don't find these arguments very compelling. It's not so much that they're entirely false, but the "most beautiful program ever written [sic]" that we're talking about here is "beautiful" precisely because it elides all these "simple" details. I do get the vague sense that it is essentially a more verbose `eval $1`.
I agree. You could probably say, though, that once you decide to implement anything that's not `eval $1`, you jump to the next level that forces you to parse the input and then use the host's built-in primitives to implement themselves. I think any minimal interpreter for any dynamic language will probably end up being pretty simple, though not necessarily as elegant as Lisp's. But elegance aside, there's nothing groundbreaking here.
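For what it's worth, here's roughly what that next level looks like: a minimal s-expression calculator in Rust. The grammar (just integers, `+`, and `*`) is invented for illustration; it's a sketch, not a real interpreter:

```rust
// Tokenize by padding parentheses with spaces, then splitting.
fn tokenize(src: &str) -> Vec<String> {
    src.replace('(', " ( ")
        .replace(')', " ) ")
        .split_whitespace()
        .map(String::from)
        .collect()
}

// Recursively evaluate one expression starting at `pos`.
fn eval(tokens: &[String], pos: &mut usize) -> i64 {
    let tok = tokens[*pos].clone();
    *pos += 1;
    if tok == "(" {
        let op = tokens[*pos].clone();
        *pos += 1;
        let mut args = Vec::new();
        while tokens[*pos] != ")" {
            args.push(eval(tokens, pos));
        }
        *pos += 1; // consume ")"
        match op.as_str() {
            "+" => args.into_iter().sum(),
            "*" => args.into_iter().product(),
            _ => panic!("unknown operator {op}"),
        }
    } else {
        tok.parse().expect("expected an integer")
    }
}

fn main() {
    let tokens = tokenize("(+ 1 (* 2 3))");
    let mut pos = 0;
    assert_eq!(eval(&tokens, &mut pos), 7);
}
```

Even here, `+` and `*` are just delegated to the host language's own operators (via `sum` and `product`), which is exactly the "primitives implementing themselves" move.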
It's one thing if you have a source of revenue to justify your S3 costs. My interpretation of the parent commenter's concern is that this person has opened up a multi-gigabyte S3 file publicly and sent lots of traffic its way (via Hacker News) for what appears to be a passion project.
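To put rough numbers on that concern (all of them assumed, since the actual file size and traffic are unknown): S3 egress is on the order of $0.09/GB, so a 5 GB file fetched 1,000 times would run roughly 5 × 1,000 × $0.09 ≈ $450.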
This isn't about throwing away tools for some idealized goal. It's about using the tools that are available to achieve the best results without becoming reliant on them to the point that you don't know what your program is going to do without compiling and running it.
An IDE helps catch a lot of stupid simple mistakes, and that saves time. Why would that be bad?
I don't think using an IDE to catch lots of stupid simple mistakes is bad. It's how I prefer to work.
> It looks really strange to me to observe other developers constantly compiling and running their code just to see if it works. It kinda looks as if they did not exactly understand what they are doing because if they did, they would be confident the implementation works.
Explain to me how this statement doesn't apply to your use of an IDE, yet the other engineers you've observed don't understand what they're doing.
It's legitimately surprising that you would double down here instead of realizing that your tooling is recompiling your code and showing you the result continuously, making your workflow essentially the same as that of the people you seem to feel so superior to.
They did read that sentence with comprehension. It is you who can't connect the dots. Your IDE already does typechecking and finds other issues for you. Basically the only thing you are missing is the ability to run and test your program.
One can't help but wonder what kind of a person would tweet their "apology" for unintentionally defacing the entirety of Scots Wikipedia in more broken Scots. Seems... Insensitive, no?
`mem::transmute` is roughly equivalent to a `reinterpret_cast` in C++. It treats the bits of a `u8` as an `i8`.
In the Rust definition of safety (mutable xor shared, no data races, memory safety, etc.), treating the bits of a `u8` as an `i8` is safe and can be done with an `as` cast.
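A minimal sketch contrasting the two (the values are arbitrary):

```rust
fn main() {
    let x: u8 = 255;

    // Safe, idiomatic bit-preserving conversion between same-width
    // integer types: `as` keeps the bit pattern, so 255u8 becomes -1i8.
    let y = x as i8;
    assert_eq!(y, -1);

    // The same bits via transmute. It's `unsafe` because transmute is
    // unsafe in general, not because this particular conversion can
    // actually go wrong.
    let z: i8 = unsafe { std::mem::transmute::<u8, i8>(x) };
    assert_eq!(z, -1);
}
```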