
The StackOverflow database was open source for a while, but I haven't checked for a data dump since the new management came in.



True, may not be open source, but it is still viewable.

Best for Society: Open Source <-- Where SO was

Medium for Society: Proprietary, but openly browsable <-- Where SO is

Worst for Society: Proprietary, not browsable <-- LLM-based code assist tools


This is just like the majority of communities moving to Discord over forums. The barrier to entry might be a lot lower than getting a VPS and hosting phpBB or whatever, but the discoverability and searchability have gone towards 0. Everything is just moving into opaque black boxes.


Your comment sparked a thought -- maybe this is the "Dark Forest" of the Web:

Either your site's content stays hidden behind discord, or an LLM's bot/minion scrapes all your content and makes visiting your site superfluous, thereby effectively killing your site.


This is an interesting thought because there is probably a real dynamic like this with some kinds of content, but as an HN comment it is also somewhat self-refuting.


I never understood the move to Discord. Maybe I should host a phpBB and bring some sanity back.


> phpBB

Whoa there Nelly!

If we're going to resurrect things, can we do it whilst leaving PHP in the past?


PHP hasn't failed me yet. Been using it for about 20 years. Not heavily, but it gets the job done and it's still improving. It's really quite sane if you start comparing it to the alternatives. It continues to 'just work'.


Less competition for me


...where would you classify Llama in here? That's not really "open source" despite what Facebook calls it, but I wouldn't call it proprietary, anyone can download and use the whole thing locally.


"public weights"?


Can those weights be interpreted by anyone viewing them? If not, it seems like publicly available, obfuscated code at best.


"model available" would be my preferred term.

Is Photoshop.exe "interpretable" by anybody with a copy (of windows)? How about a binary that's been heavily decompiled, like a Mario game?


Photoshop doesn't claim to be open source like llama does though, I'm not sure of the connection you're making.

Don't get me wrong, llama is at least more open than OpenAI and that may be meaningful.


The license aside, the question is what can be done with a carefully arranged blob of binary? Without additional software (Windows) I can't really do anything with Photoshop.exe. Similarly, Llama.gguf is either useful, with Ollama.app, or not, standing alone. So (looking past the difference in license), would you consider Photoshop.exe similar in that it's a binary blob that's useless by itself, or is it a useful collection of bytes, and why is/is not an ML model available on Hugging Face the same?


The license used isn't important in my opinion, when talking about open source the question is whether the source code is available to be modified and reviewed/interpreted.

Photoshop, or any compiled binary, isn't meant to be open source and the code isn't meant to be reviewable. Llama is called open source, though the most important part isn't publicly available for review. If llama didn't claim to be open source I don't think it would matter that the model itself and all the weights aren't available.

If your argument is just that most software is shipped as compiled and/or obfuscated code, sure that's how it is usually done. That isn't considered open source though, and the line with LLMs seems to be very gray - it can be "open source" if the source code for the training logic is available even though the actual code/model being run can't be reviewed or interpreted.


The source data for the training needs to be public and freely licensed too, otherwise it's IMO not an open source model.


Is that really necessary if the resulting model was actually available and comprehensible?

Personally I can't say I care as much about what the training set is, I want to know what's actually in the model and used at runtime/interpretation.


Yes. Without the data you can't know what kind of poisoning was done in the initial training data set, you can't review the data, you can't review any human inputs, and you can't retrain from scratch. All those are things the model author can do; downstream folks/companies/governments should be able to do them too. Otherwise it isn't open source.


I think this discussion is silly in the context of a modern LLM. Nobody really understands how an LLM works, and you absolutely do not actually want to retrain Llama from scratch.

When I said "it's not really open source", I was referring to the fact that there are restrictions on who can use Llama.


Well that's a much deeper rabbit hole - we shouldn't be using such massive systems or throwing so many resources at them when no one even knows how they work.




