Is there any real justification behind this fear of OpenAI's closed nature, or is this just frustration coming out?
We had this debate of closed vs. open source 20 years back, and open source eventually won for various reasons.
Won't those same reasons apply to OpenAI's closed approach? If so, why are people worried about this? What is different this time?
Closed source and open source developers use the same $300-3,000 laptops / desktops. Everybody can afford them.
Training a large model in a reasonable time costs much more. According to
https://lambdalabs.com/blog/demystifying-gpt-3/ the cost of training GPT-3 was $4.6 million. Multiply that by the number of trial-and-error runs.
Of course we can't expect something that costs tens or hundreds of millions of dollars to be given away for free, or to be reproducible without some collective training effort that spreads the cost across at least thousands of volunteers.
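As a rough illustration of where a number like that comes from, here is a hedged back-of-envelope sketch in Python. The 6 * params * tokens FLOP count is a common rule of thumb; the GPU throughput and hourly price are assumptions for illustration, not Lambda Labs' exact figures:

    # Back-of-envelope GPT-3 training cost estimate (all figures approximate).
    params = 175e9               # GPT-3 parameter count
    tokens = 300e9               # training tokens (approximate)
    flops = 6 * params * tokens  # rule of thumb: ~6 FLOPs per parameter per token

    gpu_flops = 28e12            # assumed sustained throughput per GPU (FLOP/s)
    dollars_per_gpu_hour = 1.50  # assumed cloud price per GPU-hour

    gpu_hours = flops / gpu_flops / 3600
    cost = gpu_hours * dollars_per_gpu_hour
    print(f"~{gpu_hours:,.0f} GPU-hours, ~${cost / 1e6:.1f}M")

With these made-up but plausible numbers the estimate lands in the same ballpark as the blog post's $4.6 million figure, and every full retraining run pays that cost again.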
OpenAI only trained the full-sized GPT-3 once. The hyperparameter sweep was conducted on significantly smaller models (see https://arxiv.org/abs/2001.08361); a rough sketch of that extrapolation idea follows below.
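For intuition, a minimal sketch of the scaling-law approach from that paper: fit a power law L(N) = (N_c / N)^alpha to losses measured on small models, then extrapolate to a larger one. The functional form is from Kaplan et al.; the data points and fitted constants below are made up for illustration:

    import numpy as np

    # Hypothetical (made-up) small-model runs: parameter count -> final loss.
    sizes = np.array([1e7, 1e8, 1e9])
    losses = np.array([4.5, 3.6, 2.9])

    # Kaplan et al. model loss as a power law in parameter count:
    #   L(N) = (N_c / N) ** alpha, i.e. log L = alpha * log N_c - alpha * log N,
    # which is a straight line in log-log space.
    slope, intercept = np.polyfit(np.log(sizes), np.log(losses), 1)
    alpha = -slope
    N_c = np.exp(intercept / alpha)

    # Extrapolate the fitted curve to a GPT-3-sized model.
    predicted = (N_c / 175e9) ** alpha
    print(f"alpha={alpha:.3f}, predicted loss at 175B params: {predicted:.2f}")

The point is that you can choose architecture and hyperparameters where experiments are cheap, and only pay the full-size training cost once.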
This. Plus the growing number of opaque results.
Training data is private, so it's impossible to even try to recreate results, validate methods, or find biases/failure cases.
This is dead right and very important. The data you train on is much more important than model architecture in terms of validation and compliance, and yet it’s a closely held secret. Producing or obtaining good data is a pain in the ass.
For this reason, EAI has made the data we are training on public. I can’t link to it because of anonymity policies at conferences, but if you look at our website I’m sure you can find a paper detailing it and a link to download it.