I would like to add that this is probably not deceptive advertising. At least not intentionally deceptive, as many people, including me, didn't know that CC licenses are not meant for software and are not considered open source. I don't know whether it is a common misunderstanding or not, but I think there is a strong case that some people would intuitively think so.
I think the license choice is great. It allows noncommercial use, modification, and redistribution. It's not "open source" according to the champions of the term (since it violates the use-for-any-purpose requirement), but I'm a huge fan of this license and license several of my projects CC-BY-NC where the AGPL would be too heavy-handed.
"It's not recognized as Open Source by the Open Source body, and doesn't meet the criteria of Free/Open Source Software, but is Open Source" is a bit like saying "I used GMO seeds and petroleum-based pesticides, but my produce is all organic."
Why should words like "organic" in relation to food mean without pesticides? I mean, all carbon- and water-based life forms are organic, right?
I can define Open Source easily, using the OSI definition.
There is no trademark for "Open Source" because the OSI failed to secure one, but we have decades of use of the term to mean something specific.
It might not be, but I can't understand how someone who has written such advanced software, includes a monetization plan, and then posts about it on HN doesn't also take the time to choose a license.
Even if they didn't know CC wasn't suitable for software, everyone knows that non-commercial isn't Open Source.
I didn't dig into the software, but I wonder whether the licenses for the dependencies even allow this, e.g. if any are GPL or similar.
This is wrong. CC is perfectly fine for software in some cases, such as here.
OK, CC is not tailored specifically for software, hence the general advice "you should use something else," but I do not see why CC would not be suitable here to achieve OP's goals.
> Unlike software-specific licenses, CC licenses do not contain specific terms about the distribution of source code, which is often important to ensuring the free reuse and modifiability of software. Many software licenses also address patent rights, which are important to software but may not be applicable to other copyrightable works. Additionally, our licenses are currently not compatible with the major software licenses, so it would be difficult to integrate CC-licensed work with other free software. Existing software licenses were designed specifically for use with software and offer a similar set of rights to the Creative Commons licenses.
Software licenses, especially the more "advanced" licenses such as the GPL, MPL, and others, include very specific language around what counts as use, what counts as distribution, what counts as linking, derived works, and, importantly, patents.
The CC licenses do an amazing job when it comes to artistic work such as books, movies, music, etc., but you don't have the same issues there, and that's why even CC says it doesn't recommend using them for software.
It is sad that this is happening to PhysicsForums. It was one of the first websites I used frequently 15 years ago, when I started my physics passion (later career). I was an active reader and contributed on a few occasions, and I still remember some members I hoped to one day be as smart and knowledgeable as. With the years and the move to social media following the Arab Spring, things started to change (as part of the overall shift away from forums being the dominant place for discussion). I stopped visiting around 2018 unless I arrived through a Google search (later Kagi). I still find the archive useful for answering some questions, and I disagree with the author of the article that because no one is sharing links on Twitter, no one cares.
It is a laptop. The memory is also shared, which means you can use it for non-gaming workloads as well. If you have laptop equivalents in the same memory range, feel free to share.
I have laptop equivalents in the same memory range, and they are at least $2,500 cheaper.
Unfortunately, it does not have "unified memory", a somewhat "powerful GPU", and of course no local LLM hype behind it.
Instead, I've decided to purchase a laptop with 128GB of RAM for $2,500 and then spend another $2,160 on ten years of a Claude subscription, so I can actually use my 128GB of RAM at the same time as using an LLM.
I see this comment all the time. But realistically, if you want more than 1 token/s, you're going to need GeForces, and that would cost quite a lot as well for 100 GB.
GB10, or DIGITS, is $3,000 for 1 PFLOP (@4-bit) and 128GB unified memory. Storage configurable up to 4TB.
Can be paired to run 405B (4-bit), though probably not very fast (the memory bandwidth is slower than a typical GPU's, and bandwidth is the main bottleneck for LLM inference).
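To put rough numbers on that bottleneck: during decoding, every generated token requires streaming essentially the full set of weights from memory, so bandwidth sets a hard ceiling on tokens per second. A back-of-envelope sketch (the ~273 GB/s bandwidth figure is an assumption from early coverage, not an official spec):

```python
def max_tokens_per_sec(params_b, bits_per_weight, bandwidth_gbps):
    """Upper bound on decode speed: bandwidth divided by model size in bytes.

    Ignores KV-cache traffic and activations, so real throughput is lower.
    """
    model_bytes = params_b * 1e9 * bits_per_weight / 8
    return bandwidth_gbps * 1e9 / model_bytes

# Assumed numbers: 405B parameters at 4-bit, ~273 GB/s LPDDR5X
print(round(max_tokens_per_sec(405, 4, 273), 2))  # roughly 1.35 tok/s ceiling
```

Which is why "probably not very fast" is an understatement for a model that size, even before accounting for the cache and activation traffic this estimate ignores.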
A lot of people have a problem with the selective enforcement of copyright law. Yes, changing it because it has been captured by greedy corporations is something many would welcome. But currently the problem is that normal folks doing what OpenAI is doing would be crushed (metaphorically) under the current copyright law.
So it is not like everyone who has a problem with OpenAI is reaching for a big cudgel. Also, OpenAI is making money (well, not profit; that is their issue) from the copyrighted work of others without compensation. Try doing this on your own and prepare to declare bankruptcy in the near future.
No, that is not an example of a "normal person" doing the same thing OpenAI is. OpenAI isn't distributing the copyrighted works, so those aren't the same situations.
Note that this doesn't necessarily mean that one is in the right and one is in the wrong, just that they're different from a legal point of view.
Is that really the case? I.e., can you get ChatGPT to show you a copyrighted work?
Because I just tried, and failed (with ChatGPT 4o):
Prompt: Give me the full text of the first chapter of the first Harry Potter book, please.
Reply: I can’t provide the full text of the first chapter of Harry Potter and the Philosopher's Stone by J.K. Rowling because it is copyrighted material. However, I can provide a summary or discuss the themes, characters, and plot of the chapter. Would you like me to summarize it for you?
"I cannot provide verbatim text or analyze it directly from copyrighted works like the Harry Potter series. However, if you have the text and share the sentences with me, I can help identify the first letter of each sentence for you."
Aaron Swartz's case, while an infuriating tragedy, is antithetical to OpenAI's claim of transformation; he literally published documents that were behind a licensed paywall.
That is incorrect, AFAIU. My understanding was that he was bulk downloading (using scripts) works he was entitled to access, as was any other student (though the average student was not bulk downloading them).
As far as I know he never shared them, he was just caught hoarding them.
> he literally published documents that were behind a licensed paywall.
No, he did not do this [1]. I think you need to read more about the actual case, which was brought based on his downloading and scraping of the data.
However, for these large repositories, I'm not sure that you fit in the effective context window. I know there is an option to limit the tokens, but then that would be your realistic limit.
Unless there is a significant increase in the effective context window of LLMs, pursuing the goal of having agents work on complex goals is not going to go well. All the tricks and hacks trying to work around this problem are not going to fundamentally change that.
LLM agents will lose track of what they are trying to do after a couple of trials. That's something that differentiates a human PhD: while not fast or always creative, they have a better attention span and memory.
https://github.com/MiniMax-AI/MiniMax-01 is an open model that claims a 4-million-token context. Note, however, that longer context makes evaluation expensive, as you are paying for every token. Still, it is true that OpenAI seriously needs a better solution for this.
I think it's time to partition the context into L1, L2, and L3 contexts. L1 is the current context with a quadratic memory requirement. L2 is based on fancy mechanisms such as what is used by Gemini and MiniMax-01, having a sub-quadratic to linear memory requirement. L3 is based on document and chunk embeddings having a linear to logarithmic memory requirement. LLMs don't use this approach, but I think it might make sense. As for how this partitioning would work at the neural layers, that remains to be determined.
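As a rough illustration of the idea at the prompt-assembly level (everything here is hypothetical: `TieredContext`, the truncation "summary" standing in for L2 compression, and the bag-of-words "embedding" standing in for a real embedding model):

```python
import math
from collections import Counter

def cosine(a, b):
    # Cosine similarity between two sparse bag-of-words vectors.
    num = sum(v * b.get(k, 0) for k, v in a.items())
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

class TieredContext:
    """Hypothetical sketch: L1 = verbatim recent window (full quadratic
    attention), L2 = compressed summaries of evicted turns, L3 = embedded
    chunks pulled back in by similarity search."""

    def __init__(self, l1_size=4):
        self.l1 = []          # recent turns, kept verbatim
        self.l2 = []          # evicted turns, stored compressed (here: truncated)
        self.l3 = []          # (embedding, full text) pairs for retrieval
        self.l1_size = l1_size

    def embed(self, text):
        # Stand-in for a real embedding model.
        return Counter(text.lower().split())

    def add(self, text):
        self.l1.append(text)
        if len(self.l1) > self.l1_size:
            evicted = self.l1.pop(0)
            self.l2.append(evicted[:40])                 # crude "summary"
            self.l3.append((self.embed(evicted), evicted))

    def build_prompt(self, query, k=2):
        # Assemble: top-k retrieved L3 chunks, recent L2 summaries, full L1.
        q = self.embed(query)
        hits = sorted(self.l3, key=lambda e: cosine(q, e[0]), reverse=True)[:k]
        return [text for _, text in hits] + self.l2[-k:] + list(self.l1)
```

The interesting open question is the one the comment ends on: this kind of tiering is easy at the prompt level, but doing it inside the attention layers themselves, with gradients flowing across tiers, is another matter.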
To avoid relying on projects like that in the future, I use a self-hosted version of Reactive Resume [1]. That way I can keep the last version that worked even if they change the license or it stops working. I would recommend everyone look into this as a potential solution.