
If this ran on Google's own cloud, it amounts to internal bookkeeping. The only cost is then the electricity and used capacity. Not consumer pricing. So negligible.

It is rather unfortunate that this sort of paper is hard to reproduce.

That is a BIG downside, because it makes the result unreliable. They invested effort and money in getting an unreliable result. But perhaps other research will corroborate. Or it may give them an edge in their business, for a while.

They chose to publish. So they are interested in seeing it reproduced or improved upon.



> They chose to publish. So they are interested in seeing it reproduced or improved upon.

Call me cynical, but in my experience this is not the #1 reason for publishing AI papers.


I hope someone could share their insight on this comment. I think the other comments are fragile and don't hold too strongly.


Marketing of some sort. Either “come to Google and you’ll have access to H100s and freedom to publish and get to work with other people who publish good papers”, which appeals to the best researchers, or for smaller companies, benchmark pushing to help with brand awareness and securing VC funding.


Come be dishwashers in the fancy kitchen! You can only have one chef after all and the line cook positions are filled long ago too, but dishes don't wash themselves.


It's commonly discussed in AI/ML groups that a paper at a top conference is "worth a million dollars." Not all papers, some papers are worth more. But it is in effect discussing the downstream revenues. As a student, it is your job and potential earnings. As a lab it is worth funding and getting connected to big tech labs (which creates a feedback loop). And to corporations, it is worth far more than that in advertising.

The unfortunate part of this is that it can have odd effects, like people renaming well-known things to make the work appear more impressive, obscuring concepts, and driving up their citations.[0] The incentives do not align with making your paper as clear and concise as possible to communicate your work.

[0] https://youtu.be/Pl8BET_K1mc?t=2510


As someone not in the AI space, what do you think is the reason for publishing? Marketing and hype for your products?


Retaining your researchers so they don't get frustrated and move to another company that lets them publish.


And attracting other researchers, so your competitors can't pick them up and potentially harm your own business.


> The only cost is then the electricity and used capacity. Not consumer pricing. So negligible.

I don’t think this is valid, as this point seems to ignore the fact that the data center that this compute took place in required a massive investment.

A paper like this is more akin to HEPP research. Nobody has the capability to reproduce the Higgs results outside the facility where the research was conducted (CERN).

I don’t think reproduction was a concern of the researchers.


The Higgs results were reproduced because there are two independent detectors at CERN (ATLAS and CMS). Both collaborations are run almost entirely independently, and the press is only called in to announce a scientific discovery if both find the same result.

Obviously the 'best' result would be to have a separate collider as well, but no one is going to fund a new collider just to reaffirm the result for a third time.


Absolutely, and well stated.

The point I was trying to make was the fact that nobody (meaning govt bodies) was willing to make another collider capable of repeating the results. At least not yet ;).


Kinda, but Google sells compute, so it makes money off the data centre investment. Assuming they had spare capacity for this, it's negligible at Google scale.


Opportunity cost is cost. What you could have earned by selling the resources to customers instead of using them yourself is what the resources are worth.


This assumes that you can sell 100% of the resources' availability 100% of the time. Whenever you have more capacity than you can sell, there's no opportunity cost in using it yourself.
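
To put rough numbers on the two positions above, here is a minimal sketch. It assumes the opportunity cost scales with the share of the capacity that a paying customer would otherwise have bought. The function name, the price, and the hour count are all hypothetical and not taken from the paper or from any real price list.

```python
def opportunity_cost(gpu_hours: float, price_per_hour: float, sell_through: float) -> float:
    """Expected revenue forgone by using capacity internally.

    sell_through is the assumed fraction of those hours that a paying
    customer would otherwise have bought (0.0 = fully idle, 1.0 = sold out).
    """
    if not 0.0 <= sell_through <= 1.0:
        raise ValueError("sell_through must be between 0 and 1")
    return gpu_hours * price_per_hour * sell_through


# Hypothetical inputs, chosen only for illustration.
hours = 1_000_000
price = 2.0  # assumed list price in dollars per GPU-hour

for share in (0.0, 0.5, 1.0):
    cost = opportunity_cost(hours, price, share)
    print(f"sell-through {share:.0%}: ${cost:,.0f} forgone")
```

With a sell-through of zero the forgone revenue is zero, which is the "spare capacity" argument. With a sell-through of one it equals the full list price, which is the "opportunity cost is cost" argument. The disagreement is really about which value is realistic.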


A few months back, a lot of the most powerful GPU instances on GCP seemed to be sold out 24/7.

I suppose it's possible Google's own infrastructure is partitioned from GCP infrastructure, so they have a bunch of idle GPUs even while their cloud division can sell every H100 and A100 they can get their hands on?


I'd expect they have both: dedicated machines that they usually use and are sometimes idle, but also the ability to run a job on GCP if it makes sense.

(I doubt it's the other way round, that the Deepmind researchers could come in one day and find all their GPUs are being used by some cloud customer).


As someone who worked for a compute time provider, I can tell you that the last people who can use the system for free are internal people. Because external people bring in cash revenue while internal people just bring in potential future revenue.


Not if you’re only using the resources when they’re available because no customer has paid to use them.


I think Google produces their own power, so they don’t pay distribution cost which is at least one third of the price of power, even higher for large customers.


I'd argue it's not hard to reproduce per se, just expensive; thankfully there are at least half a dozen (cloud) computing providers that have the necessary resources to do so. Google Cloud, AWS and Azure are the big competitors in the west (it seems / from my perspective), but don't underestimate the likes of Alibaba, IBM, DigitalOcean, Rackspace, Salesforce, Tencent, Oracle, Huawei, Dell and Cisco.


> They chose to publish. So they are interested in seeing it reproduced or improved upon.

Not necessarily, publishing also ensures that the stuff is no longer patentable.


Forgive me if I am wrong, but all of the techniques explored are already well known. So, what is going to be patented?


The fundamental algorithms have been, sure, but there are innumerable enhancements upon those base techniques to be found and patented.


I merely listed another reason why someone would publish something. This did not imply they did it for that reason.


Is the electricity cost negligible? It's a pretty compute intensive application.

Of course it would be a tiny fraction of the $10m figure here, but even 1% would be $100,000. Negligible to Google, but for Google even $10 million is couch cushion money.


The electricity cost is not negligible. I ran a service that had multiples of $10M in marginal electricity spend (i.e., servers running at 100% utilization, consuming significantly more power than when idle or partly idle). Ultimately, the scientific discoveries weren't worth the cost, so we shut the service down.

$10M is about what Google would spend to get a publication in a top-tier journal. But Google's internal pricing and costs don't look anything like what people cite for external costs; it's more like a state-supported economy with some extremely rich oligarch-run profit centers that feed all the various cottage industries.


I feel like your comment answers itself: If you have the money to be running a datacenter of thousands of A100 GPUs (or equivalent), the cost of the electricity is negligible to you, and definitely worth training a SOTA model with your spare compute.


Is it really spare compute? Is the demand from others so low that these systems are truly idle? Does this also artificially make it look like demand is high because internal tasks are using it?


I’d imagine publishing is more oriented toward attracting and retaining talent. You need to scratch that itch or the academics will jump ship.


It's like them running SETI@home ;)


We ran Folding@Home at Google. We were effectively the largest single contributor of cycles for at least a year. It wasn't scientifically worthwhile, so we shut it down after a couple of years.

That was using idle cycles on Intel CPUs, not GPUs or TPUs though.



