Even though the README.md calls the license the Databricks Open Source License, the LICENSE file includes paragraphs such as
> You will not use DBRX or DBRX Derivatives or any Output to improve any other large language model (excluding DBRX or DBRX Derivatives).
and
> If, on the DBRX version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Databricks, which we may grant to you in our sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Databricks otherwise expressly grants you such rights.
This is a source-available model, not an open model.
> This is a source-available model, not an open model.
To me, "source available" implies that everything you need to reproduce the model is also available, and that doesn't appear to be the case. How is the resulting model more "free as in freedom" than a compiled binary?
I don't think it's possible to have an "open training data" model because it would get DMCA'd immediately and open you up to lawsuits from everyone who found their works in the training set.
I hope we can fix the legal landscape to enable publicly sharing training data, but I can't really judge the companies keeping it a secret today.
I don't think it's that crazy. Even if you're sure it's fair use, I wouldn't paint a huge target on my back before there's a definite ruling, and I doubly wouldn't test the waters on the legality of re-hosting copyrighted content to be downloaded by randos who won't be training models with it.
If they're going to get away with this, collecting the data and keeping a legal chain of custody, so you can actually say it was only used to train models and no one else has access to it, goes a long way.
1. Open source is a well-defined model and I reasonably expect Databricks to be aware of this due to their use of open source models in their other projects.
2. The stated licensing terms are clearly and decisively not open source.
3. It is reasonable to conclude that this model is dual licensed, under both this restrictive proprietary license and an undisclosed open source license.
4. Just use this model under the open source license, on the assumption that they will release the open source license later.
Sorry, I forgot to link the repository [1] and missed the edit window by the time I realized.
The bottom of the README.md [2] contains the following license grant with the misleading "Open Source" term:
> License
> Our model weights and code are licensed for both researchers and commercial entities. The Databricks Open Source License can be found at LICENSE, and our Acceptable Use Policy can be found here.
> You will not use DBRX or DBRX Derivatives or any Output to improve any other large language model (excluding DBRX or DBRX Derivatives).
and
> If, on the DBRX version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Databricks, which we may grant to you in our sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Databricks otherwise expressly grants you such rights.
This is a source-available model, not an open model.