I think a better description than “underutilized” would be “sunk capex cost” - Google (or any cloud provider) cannot run at 100% customer utilization because then they could neither acquire new customers nor service transitory usage spikes for existing customers. So they stay ahead of predicted demand, which means that they will almost always have excess capacity available.
Cloud providers pay capital costs (CapEx) for servers, GPUs, data centers, employees, etc. Utilization allows them to recoup those costs faster.
Cloud customers pay operational expenses (OpEx) for usage.
So Google generally has excess capacity, and while they would prefer revenue-generating customer usage, they’ve already paid for everything but the electricity, so it’s extremely cheap for them to run their own jobs if the hardware would otherwise be sitting idle.
As you run close to 100% utilization, waiting times grow toward infinity. You don't want that. It might be acceptable for your internal projects (the actual waiting time won't be infinite, and you'll cancel jobs if it gets too long), but it's certainly not acceptable for customers.
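For a rough sense of how sharply this bites, here's a minimal sketch using the textbook M/M/1 queueing formula; the service rate and utilization values are invented for illustration, not anything specific to any real fleet:

```python
# Rough illustration of why waiting times blow up near 100% utilization.
# Uses the textbook M/M/1 queue result: mean wait in queue Wq = rho / (mu * (1 - rho)),
# where mu is the service rate and rho is utilization. Numbers are invented.

def mean_wait(utilization: float, service_rate: float = 1.0) -> float:
    """Mean time a job spends waiting in an M/M/1 queue, in units of the service time."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1) -- at 1.0 the wait diverges")
    return utilization / (service_rate * (1 - utilization))

for rho in (0.50, 0.80, 0.90, 0.95, 0.99, 0.999):
    print(f"utilization {rho:6.1%} -> mean wait {mean_wait(rho):8.1f}x the service time")

# utilization  50.0% -> mean wait      1.0x the service time
# utilization  99.9% -> mean wait    999.0x the service time
```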
There is a genre of game called "time management games" which will hammer this point home if you play them. They're not really considered 'serious' games, so you can find them in places where the audience is basically looking to kill time.
3. The way a task gets done is that you click on it; the next time a worker is available, the worker starts on that task, which occupies the worker for some fixed amount of time until the task is complete.
4. Some tasks can't be queued until you meet a requirement such as completing a predecessor task or having enough resources to pay the costs of the task.
You will learn immediately that having a long queue means flailing helplessly while your workers ignore hair-on-fire urgent tasks in favor of completely unimportant ones that you clicked on while everything seemed relaxed. It's far more important that you have the ability to respond to a change in circumstances than to have all of your workers occupied at all times.
> You will learn immediately that having a long queue means flailing helplessly while your workers ignore hair-on-fire urgent tasks in favor of completely unimportant ones that you clicked on while everything seemed relaxed.
In practice it's more complicated than this: Borg isn't actually a queue, it's a priority-based system with preemption, although people layered queue systems on top. Further, granularity mattered a lot: you could get much more access to compute by asking for smaller slices (fractions of a CPU core, or a fraction of a whole TPU cluster). There was a lot of "empty crack filling" at Google.
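This isn't Borg's actual code, but a toy sketch of priority scheduling with preemption (all names, priorities, and capacities invented) shows how it differs from a plain queue: a high-priority job doesn't wait behind low-priority batch work, it evicts it, and small fractional requests can fill whatever cracks are left:

```python
# Toy sketch of priority scheduling with preemption (not Borg; everything here
# is invented). A high-priority job never waits behind low-priority work: it
# evicts the lowest-priority running jobs until it fits.

from dataclasses import dataclass

@dataclass
class Job:
    name: str
    priority: int   # higher number = more important
    cpus: float     # fractional CPUs -- the "smaller slices" mentioned above

class Cell:
    def __init__(self, capacity_cpus: float):
        self.capacity = capacity_cpus
        self.running: list[Job] = []

    def free(self) -> float:
        return self.capacity - sum(j.cpus for j in self.running)

    def submit(self, job: Job) -> list[Job]:
        """Schedule job, preempting lower-priority jobs if needed. Returns evictions."""
        evicted = []
        victims = sorted((j for j in self.running if j.priority < job.priority),
                         key=lambda j: j.priority)
        while self.free() < job.cpus and victims:
            victim = victims.pop(0)
            self.running.remove(victim)
            evicted.append(victim)
        if self.free() >= job.cpus:
            self.running.append(job)
        # (a real scheduler would restore victims if the job still doesn't fit; omitted here)
        return evicted

cell = Cell(capacity_cpus=4.0)
cell.submit(Job("batch-fill-1", priority=0, cpus=2.5))
cell.submit(Job("batch-fill-2", priority=0, cpus=1.5))
print(cell.submit(Job("serving", priority=100, cpus=3.0)))  # evicts both batch jobs
```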
TL;DR: You should think of and use queues as shock absorbers, not sinks. Also, you need to monitor them.
Queues are useful for decoupling the output of one process from the input of another when the two processes are not synchronized velocity-wise. Like a shock absorber, a queue lets both processes continue at their own pace, absorbing instantaneous spikes in producer load above the steady-state rate of the consumer. (Side note: if the queue is isolated code- and storage-wise from the consumer process, you can also use it to keep the producer running undisrupted when you need to take the consumer down for maintenance or whatever.)
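Here's a minimal simulation of that shock-absorber behavior (all rates invented): a bursty producer feeds a consumer that is only slightly faster on average, so the backlog spikes during the burst and then drains back to zero.

```python
# Minimal simulation of a queue as a shock absorber between a bursty producer
# and a steady consumer. The consumer's steady-state rate exceeds the producer's
# *average* rate, so bursts are absorbed and the backlog drains afterwards.

from collections import deque

backlog: deque[int] = deque()
CONSUMER_RATE = 3                        # items drained per tick
produced = [2, 2, 8, 6, 1, 1, 1, 1]      # burst at ticks 2-3; average ~2.75 < 3

for tick, burst in enumerate(produced):
    backlog.extend(range(burst))         # producer pushes its burst onto the queue
    for _ in range(min(CONSUMER_RATE, len(backlog))):
        backlog.popleft()                # consumer drains at its steady rate
    print(f"tick {tick}: produced {burst:2d}, backlog {len(backlog)}")
```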
Running with very small queue lengths is generally fine and generally healthy.
If you have a queue that consistently runs with a substantial backlog, then you have a mismatch between the workloads of the processes it connects: you either need to reduce the load from the producer or increase the throughput of the consumer.
Very large queues tend to hide the workload-mismatch problem, or worse. Often work put into a queue is not stored locally on the producer, or is quickly overwritten, so a consumer-end problem can mean potentially irrecoverable loss of everything in the queue, and the larger the queue, the bigger the loss. Another problem with large queues is that if your consumer process is only slightly faster than the producer process, a large backlog can take a very long time to work down, and it's even possible (admission of guilt) to configure systems using such queues so that they cannot recover from a lengthy outage, even if all the work items were stored in the queue.
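Back-of-envelope numbers make the slow-drain problem concrete (the rates and outage length here are invented for illustration):

```python
# How long a backlog takes to drain when the consumer is only slightly faster
# than the producer. All rates are invented for illustration.

producer_rate = 1000          # items/sec arriving
consumer_rate = 1050          # items/sec processed (only 5% headroom)
outage_seconds = 4 * 3600     # a 4-hour consumer outage

backlog = producer_rate * outage_seconds     # 14.4M items queued up
drain_rate = consumer_rate - producer_rate   # net 50 items/sec of catch-up
print(f"hours to recover: {backlog / drain_rate / 3600:.0f}")   # -> 80 hours
```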
If you have queues, you need to monitor your queue lengths and alarm when queue lengths start increasing significantly above baseline.
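A sketch of the kind of check that implies, with the window size and the "significantly above baseline" threshold as invented placeholders:

```python
# Sketch of a queue-depth alarm: compare the current queue length to a rolling
# baseline and fire when it climbs well past it. Window and multiplier are
# invented placeholders.

from collections import deque
from statistics import mean

class QueueDepthAlarm:
    def __init__(self, window: int = 60, threshold: float = 3.0):
        self.history: deque[int] = deque(maxlen=window)   # recent depth samples
        self.threshold = threshold                        # "significantly above baseline"

    def observe(self, depth: int) -> bool:
        """Record a sample; return True if the alarm should fire."""
        baseline = mean(self.history) if self.history else 0.0
        self.history.append(depth)
        return depth > max(10, baseline * self.threshold)  # floor avoids noise near zero

alarm = QueueDepthAlarm()
for depth in [5, 6, 5, 7, 6, 40]:          # sudden growth on the last sample
    if alarm.observe(depth):
        print(f"ALERT: queue depth {depth} is well above baseline")
```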
I doubt they are doing this, but if they ran burn-in tests with 3 machines doing identical workloads, they could validate the workloads and also test new infra. Unlike customer workloads, it would be OK to retry on error.
This would be 100% free, as the electricity and "wear and tear" would be incurred anyhow.
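As a sketch of the idea (run_workload and the host names are hypothetical stand-ins): farm the same deterministic job out to a few otherwise-idle machines and flag any machine whose result disagrees with the majority; a retry or re-run costs nothing but already-sunk capacity.

```python
# Sketch of the burn-in idea: run the same deterministic workload on several
# otherwise-idle machines and flag any machine whose result disagrees with the
# majority. run_workload() is a hypothetical stand-in for the real job dispatch.

from collections import Counter
import hashlib

def run_workload(machine: str, seed: int) -> str:
    """Hypothetical: ship the job to `machine`, return a digest of its output."""
    # Stand-in computation so the sketch runs locally; a flaky machine would
    # return a digest that differs from its peers.
    return hashlib.sha256(f"{seed}".encode()).hexdigest()

def burn_in(machines: list[str], seed: int = 42) -> list[str]:
    results = {m: run_workload(m, seed) for m in machines}
    majority_digest, _ = Counter(results.values()).most_common(1)[0]
    return [m for m, digest in results.items() if digest != majority_digest]

suspect = burn_in(["rack1/host07", "rack3/host12", "rack9/host01"])
print("suspect machines:", suspect or "none")
```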