I think it's a bit much to name this a bug. This should have very little to no effect on real-world code, and the only way to fix it for real is to actually parse the languages.
It should in fact have an effect on real-world code. It looks like sloc in general doesn't handle nested block comments, so any source file that uses nested block comments will be counted incorrectly by sloc.
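To make the nested-comment point concrete, here is a minimal sketch of a toy line counter. It is an assumption for illustration only, not sloc's actual implementation: `count_naive`, `count_nested`, and `SAMPLE` are hypothetical names, and neither function handles string literals or line comments. The regex version ends the comment at the first `*/`, while the depth-tracking version treats `/* /* */ */` as a single comment, as languages such as Rust or Swift do.

```python
import re

# Hypothetical toy counter for illustration; NOT how sloc is actually written.
BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)


def count_naive(source: str) -> int:
    """Strip /* ... */ with a non-nesting regex, then count non-blank lines."""
    stripped = BLOCK_COMMENT.sub("", source)
    return sum(1 for line in stripped.splitlines() if line.strip())


def count_nested(source: str) -> int:
    """Track comment depth so that nested block comments are skipped whole."""
    kept = []
    depth = 0
    i = 0
    while i < len(source):
        pair = source[i:i + 2]
        if pair == "/*":
            depth += 1
            i += 2
        elif pair == "*/" and depth > 0:
            depth -= 1
            i += 2
        else:
            # Keep newlines even inside comments so line boundaries survive.
            if depth == 0 or source[i] == "\n":
                kept.append(source[i])
            i += 1
    return sum(1 for line in "".join(kept).splitlines() if line.strip())


# Assumed sample in a language whose block comments nest.
SAMPLE = """/* outer
/* inner */
still inside the outer comment
*/
let x = 1;
"""

print(count_naive(SAMPLE), count_nested(SAMPLE))
```

On this sample the regex version reports three lines of code where the depth-tracking version reports one, so the miscount can go in either direction depending on how the comments are arranged.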
The accuracy cost is trivial as well; it's a trade-off. I'm pretty sure the implementers were fully aware that regex can't properly handle every case and decided not to care.
..in source size. What I meant was that for a single source file, the performance implications aren't noticeable. But when we run a tool like this, we usually run it on an entire project, e.g. imagine running it on the Kubernetes repo. If it had to parse entire syntax trees, the performance implications would be significant.
This is an IOCCC entry for the world's smallest quine, sizing up at 0 bytes of C. Some C compilers will compile an empty source file into a program which does nothing when executed, i.e. outputs zero bytes, i.e. is a quine.
Oh I thought this was Jon Skeet's "hello world" in 1 byte, written in his custom "H" language. Of course he could have done it in 0 bytes but that would just be silly.
Jon Skeet would implement `sloc` in -1 bytes to begin with, and still fix the regex heuristics so that none of the posted solutions in that code golf works. ;)
There are quite a few esoteric programming languages used on code golf that use more or less the whole Unicode character set to solve even complicated puzzles in a few characters.
It's interesting the same way any other code golf is interesting: artificial constraints breed creativity.
This sort of thing actually does have real-world applicability though: e.g. security issues caused by a validation layer parsing input differently from the layer that actually executes it.
I don't think it's quite so bad - seems more playful than manipulative. I briefly added scare quotes around "zero" but it made the title look unusually weird so I took them off.
No limit, since it needn't even fit on one (real) line, as even in languages with semantic newlines (and no alternative) you can just wrap every line in multiline comments.
The current winner seems to be HTML at 23 bytes, though I would argue it doesn't meet the requirements of the question, which call for a "full program".
The argument against is that HTML qua HTML is not a programming language, and that it is reasonable in English to say that "0 lines of code" implies the use of something that could have code in it, not something that degenerately contains no code because it is incapable of having any code at all. A text file with "Hello World" wouldn't be a solution either, nor the minimal CSV document with "Hello World", nor a PNG showing "Hello World", etc. These may also have "zero lines of code" in that strict, pedantic sense, but that makes them uninteresting.
The argument for including it even so goes like this: if you think of the underlying challenge as "fooling sloc", and accept that the original challenge was phrased in terms of "code" simply because sloc mostly deals with code, then it's reasonable to include HTML, since sloc also tries to count it even though it isn't (necessarily) code.
I have to qualify HTML qua HTML because once you include Javascript, it goes from trivially not code to trivially being code, or at least capable of carrying code.
But it is important not to blur the lines between a document format and code. There are real and important differences. HTML qua HTML is not a programming language.
Edit: Oh, the solution to the other debate going on is that programming languages ALWAYS execute in the context of a virtual machine of some sort, or in the case of assembly, a real machine (at least in potential). The distinguishing characteristic of a programming language is something along the lines of being at least one of "Turing complete" and "able to read input, change behavior based on it, and write output", although I'm deliberately not getting too far into the weeds on that one. You can't use "whether or not they need a specialized execution environment to function" as a distinction for whether or not something is a programming language, because they all do.
The very straightforward counterargument is that sloc counts non-comment HTML lines as lines of code. The challenge is "trick sloc into not counting something that it would normally count as code".
Would have to be a terminal-based browser, since the rules do specify that the message and nothing else must be printed to stdout. Presumably 'links' can do that.
It is slightly disappointing that all the solutions I see for various languages are just different ways to trigger pretty much the same multi-line comment parsing problem in sloc.
It is a valid Python program. This is because Python joins adjacent string literals (e.g. "foo " "bar" is evaluated just like "foo bar"), and while multiline strings are commonly used as comments, they are evaluated as string literals rather than as comments.
You missed the point. The examples obviously all have at least one line of code, otherwise there would be no program to display "Hello, World!" at all. The goal is to have a program that is _counted as 0 lines by the software "sloc"_ even though it is not 0 lines.
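A small sketch of the two Python behaviors described above, using only the standard library. `SOURCE` is a hypothetical snippet written for this illustration, not one of the posted solutions; it shows that adjacent literals are merged and that a triple-quoted string used as an argument is an ordinary expression.

```python
import ast

# Adjacent string literals are merged into a single string.
joined = "foo " "bar"

# Hypothetical two-line program: triple-quoted strings passed to print().
SOURCE = 'print("""Hello, """\n"""World!""")\n'

# Parse the program and inspect the single argument of the print() call.
tree = ast.parse(SOURCE)
call = tree.body[0].value
arg = call.args[0]

print(joined)
print(type(arg).__name__, repr(arg.value))
```

The parser sees one string constant as the argument to `print`, so a line counter that treats every triple-quoted string as a comment is looking at real code.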
Did he or did you? Some arbitrary game of how you count lines of code is not interesting enough to make the front page.
Using an arbitrary metric that is not commensurate with reality is inaccurate. The point is that the title is clickbait for what amounts to a singularly exploited bug in a tool, not what was purported. Literally the same misinformation goes on in the political space all the time, where it is lamented. If it's related to quirky coding eventualities, it's lauded.
You can argue against the merits of this game or call this clickbait, and that's okay. But someone that blithely points out that there is in fact code, not even realizing the game exists, is missing the point.
There is a big difference between disagreeing with a point versus not realizing the point is there.
Which point? Are you talking about the point notinventedhear made? That point was not useful and the initial reply was not derailing and that wasn't me making that reply.
If you're talking about something else, I have no idea what you mean at all.
And nobody here has said anything in bad faith. Why do you think there was bad faith?
> But someone that blithely points out that there is in fact code, not even realizing the game exists, is missing the point.
notinventedhear didn't miss the point.
He disputed the topic (and implication), which is not "the game" but a description that is factually incorrect. You don't want to agree, that's fine. It's not subtle or complicated. To claim anyone misunderstands where this premise comes from is a bad-faith interpretation, which you inexplicably double down on.
The page says right at the very top that this is about tricking the line counter.
notinventedhear is not disputing the actual page when they say "There is still one line of code." Nor do they appear to be objecting to the title, because they're quoting part of the actual page and re-explaining it even though the page already does so.
This seems like pretty strong evidence to me that they missed the point; they missed the part about tricking.
It's not that I don't agree with what they said, it's that what they said was already thoroughly covered in the linked page, right up front and in more detail. If they understood that, then why did they even make a comment like that?
Most of the top solutions seem to take advantage of bugs in the way sloc tries to parse commented lines. e.g. many are:
And I'm surprised I don't see a related bug issue in the git repo: https://github.com/flosse/sloc/issues
Still, this is a fun read. I'm curious to see what other tricks are out there.