If this ran on Google's own cloud, it amounts to internal bookkeeping.
The only cost is then the electricity and used capacity. Not consumer pricing. So negligible.
It is rather unfortunate that this sort of paper is hard to reproduce.
That is a BIG downside, because it makes the result unreliable. They invested effort and money in getting an unreliable result. But perhaps other research will corroborate it. Or it may give them an edge in their business, for a while.
They chose to publish. So they are interested in seeing it reproduced or improved upon.
Marketing of some sort. Either “come to Google and you’ll have access to H100s and freedom to publish and get to work with other people who publish good papers”, which appeals to the best researchers, or for smaller companies, benchmark pushing to help with brand awareness and securing VC funding.
Come be dishwashers in the fancy kitchen! You can only have one chef after all, and the line cook positions were filled long ago too, but dishes don't wash themselves.
It's commonly said in AI/ML circles that a paper at a top conference is "worth a million dollars." Not all papers; some are worth more. But the figure is really shorthand for the downstream revenue. To a student, it is a job and future earnings. To a lab, it is funding and connections to big tech labs (which creates a feedback loop). And to corporations, it is worth far more than that in advertising.
The unfortunate part is that it can have odd effects, like people renaming well-known things to make the work appear more impressive, obscure the concepts, and drive up their citations.[0] The incentives are not aligned with making your paper as clear and concise as possible to communicate your work.
> The only cost is then the electricity and used capacity. Not consumer pricing. So negligible.
I don’t think this is valid, as it ignores the fact that the data center where this compute took place required a massive investment.
A paper like this is more akin to HEPP research. Nobody has the capability to reproduce the Higgs results outside the facility where the research was conducted (CERN).
I don’t think reproduction was a concern of the researchers.
The Higgs results were reproduced because there are two independent detectors at CERN (ATLAS and CMS). Both collaborations are run almost entirely independently, and the press are only called in to announce a scientific discovery if both find the same result.
Obviously the 'best' result would be to have a separate collider as well, but no one is going to fund a new collider just to reaffirm the result for a third time.
The point I was trying to make was the fact that nobody (meaning govt bodies) was willing to make another collider capable of repeating the results. At least not yet ;).
Kinda, but Google sells compute, so it makes money off the data centre investment. Assuming they had spare capacity for this, it's negligible at Google scale.
Opportunity cost is cost. What you could have earned by selling the resources to customers instead of using them yourself is what the resources are worth.
This assumes that you can sell 100% of the resources' availability 100% of the time. Whenever you have more capacity than you can sell, there's no opportunity cost in using it yourself.
A few months back, a lot of the most powerful GPU instances on GCP seemed to be sold out 24/7.
I suppose it's possible Google's own infrastructure is partitioned from GCP infrastructure, so they have a bunch of idle GPUs even while their cloud division can sell every H100 and A100 they can get their hands on?
I'd expect they have both: dedicated machines that they usually use and are sometimes idle, but also the ability to run a job on GCP if it makes sense.
(I doubt it's the other way round, that the Deepmind researchers could come in one day and find all their GPUs are being used by some cloud customer).
As someone who worked for a compute time provider, I can tell you that the last people who get to use the system for free are internal people, because external people bring in cash revenue while internal people only bring in potential future revenue.
I think Google produces their own power, so they don’t pay distribution costs, which are at least a third of the price of power, and even more for large customers.
I'd argue it's not hard to reproduce per se, just expensive; thankfully there are at least half a dozen (cloud) computing providers that have the necessary resources to do so. Google Cloud, AWS and Azure are the big competitors in the west (it seems / from my perspective), but don't underestimate the likes of Alibaba, IBM, DigitalOcean, Rackspace, Salesforce, Tencent, Oracle, Huawei, Dell and Cisco.
Is the electricity cost negligible? It's a pretty compute intensive application.
Of course it would be a tiny fraction of the $10m figure here, but even 1% would be $100,000. Negligible to Google, but then again, to Google even the full $10 million is couch cushion money.
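For a rough sense of scale, here's a back-of-envelope sketch in Python; every number in it (accelerator count, power draw, run length, electricity rate) is an illustrative assumption on my part, not a figure from the paper:

    # Back-of-envelope electricity estimate for a large training run.
    # Every input below is an illustrative assumption, not a figure from the paper.
    gpus = 4000              # assumed accelerator count
    watts_per_gpu = 400      # assumed average draw per device, in watts
    overhead = 1.5           # assumed PUE-style overhead for cooling, networking, etc.
    hours = 30 * 24          # assumed one-month run
    usd_per_kwh = 0.07       # assumed industrial electricity rate

    kwh = gpus * watts_per_gpu * overhead * hours / 1000
    cost = kwh * usd_per_kwh
    print(f"{kwh:,.0f} kWh, about ${cost:,.0f}")
    # With these assumptions: ~1.7 million kWh and roughly $120k --
    # real money on a power bill, but small next to a $10M all-in estimate.

Double any of those inputs and the bill doubles, but the point stands: under assumptions like these the electricity line lands in the low hundreds of thousands, not the millions.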
The electricity cost is not negligible - I ran a service that had multiples of $10M in marginal electricity spend (i.e., servers running at 100% utilization draw significantly more power than when idle or partly idle). Ultimately, the scientific discoveries weren't worth the cost, so we shut the service down.
$10M is about what Google would spend to get a publication in a top-tier journal. But Google's internal pricing and costs don't look anything like what people cite for external costs; it's more like a state-supported economy with some extremely rich oligarch-run profit centers that feed all the various cottage industries.
I feel like your comment answers itself: If you have the money to be running a datacenter of thousands of A100 GPUs (or equivalent), the cost of the electricity is negligible to you, and definitely worth training a SOTA model with your spare compute.
Is it really spare compute? Is the demand from others so low that these systems are truly idle? Does this also artificially make it look like demand is high because internal tasks are using it?
We ran Folding@Home at Google. We were effectively the largest single contributor of cycles for at least a year. It wasn't scientifically worthwhile, so we shut it down after a couple of years.
That was using idle cycles on Intel CPUs, not GPUs or TPUs though.