
Even though the README.md calls the license the Databricks Open Source License, the LICENSE file includes paragraphs such as

> You will not use DBRX or DBRX Derivatives or any Output to improve any other large language model (excluding DBRX or DBRX Derivatives).

and

> If, on the DBRX version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Databricks, which we may grant to you in our sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Databricks otherwise expressly grants you such rights.

This is a source-available model, not an open model.



> This is a source-available model, not an open model.

To me, "source available" implies that everything you need to reproduce the model is also available, and that doesn't appear to be the case. How is the resulting model more "free as in freedom" than a compiled binary?


I like:

- “open weights” for no training data and no restrictions on use,

- “weights available” for no training data and restrictions on use, like in this case.


I don't think it's possible to have an "open training data" model because it would get DMCA'd immediately and open you up to lawsuits from everyone who found their works in the training set.

I hope we can fix the legal landscape to enable publicly sharing training data, but I can't really judge the companies keeping it a secret today.


> I don't think it's possible to have an "open training data" model because it would get DMCA'd immediately…

This isn't a problem because OpenAI says, "training AI models using publicly available internet materials is fair use". /s

https://openai.com/blog/openai-and-journalism


I don't think it's that crazy. Even if you're sure it's fair use, I wouldn't paint a huge target on my back before there's a definitive ruling, and I doubly wouldn't test the legality of re-hosting copyrighted content to be downloaded by randos who won't be training models with it.

If they're going to get away with this, collecting the data and keeping a legal chain of custody, so you can actually say it was only used to train models and no one else has access to it, goes a long way.


1. Open source is a well-defined model and I reasonably expect Databricks to be aware of this due to their use of open source models in their other projects.

2. The stated licensing terms are clearly and decisively not open source.

3. It is reasonable to conclude that this model is dual licensed, under this restrictive proprietary license, and an undisclosed open source license.

4. Just use this model under the open source license with the assumption that they will release the open source license later.

I jest. In all seriousness, you should just disregard their licensing terms entirely, as copyright does not apply to weights. https://news.ycombinator.com/item?id=39847147


Sorry, I forgot to link the repository [1] and missed the edit window by the time I realized.

The bottom of the README.md [2] contains the following license grant with the misleading "Open Source" term:

> License

> Our model weights and code are licensed for both researchers and commercial entities. The Databricks Open Source License can be found at LICENSE, and our Acceptable Use Policy can be found here.

[1] https://github.com/databricks/dbrx

[2] https://github.com/databricks/dbrx/blob/main/README.md


The first clause sucks, but I’m perfectly happy with the second one.


Maybe the license is “open” as in a can of beer, not OSS.


Identical to Llama's license, FWIW.



