
I've been using Cloud Run for my GPT-2 text generation apps (https://github.com/minimaxir/gpt-2-cloud-run) in order to survive random bursts, and also for small Twitter bots (https://github.com/minimaxir/twitter-cloud-run/tree/master/h...) which can be invoked via Cloud Scheduler to take advantage of the efficiency benefits. It has been successful in those tasks.

The only complaint I have with Cloud Run now (after many usability updates since the initial release) is that there is no IP rate limiting to prevent abuse, which has been the primary cause of unexpected costs. (Due to how Cloud Run works, IP rate limiting has to be on Google's end; implementing it on your end via a proxy eliminates the ease-of-use benefits.)
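
For reference, a minimal sketch (my own illustration, not from the linked repos) of what per-IP limiting inside the container would look like, which also shows why it doesn't really solve the billing problem: each instance keeps its own counters, and the request has already spun up a billable instance by the time it's rejected.

    # Hand-rolled per-IP rate limiting inside the container (illustrative only).
    import time
    from collections import defaultdict, deque

    from flask import Flask, abort, request

    app = Flask(__name__)

    WINDOW_SECONDS = 60
    MAX_REQUESTS_PER_WINDOW = 10
    _hits = defaultdict(deque)  # per-instance state; reset whenever the instance is recycled

    def client_ip() -> str:
        # Behind Cloud Run's load balancer the caller's IP arrives in X-Forwarded-For.
        forwarded = request.headers.get("X-Forwarded-For", "")
        return forwarded.split(",")[0].strip() or request.remote_addr

    @app.before_request
    def rate_limit():
        now = time.time()
        hits = _hits[client_ip()]
        while hits and now - hits[0] > WINDOW_SECONDS:
            hits.popleft()
        if len(hits) >= MAX_REQUESTS_PER_WINDOW:
            abort(429)
        hits.append(now)

    @app.route("/")
    def generate():
        return "generated text would go here\n"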



I'm currently serving an API that uses a 500 MB ResNet v2 model. The bootup takes too long, so now I have a single instance that can't handle any peaks and costs too much. Doesn't your model take too long to spin up before being able to serve a request?


From: https://github.com/ahmetb/cloud-run-faq#how-to-keep-a-cloud-...

---

How to keep a Cloud Run service “warm”?

You can work around "cold starts" by periodically making requests to your Cloud Run service which can help prevent the container instances from scaling to zero.

Use Google Cloud Scheduler to make requests every few minutes.
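
(Not part of the quoted FAQ, but as a sketch of that tip, assuming the google-cloud-scheduler Python client and placeholder project/region/URL values:)

    # Creates a Cloud Scheduler job that pings the service every 5 minutes so
    # the instance count doesn't drop to zero. All values below are placeholders.
    from google.cloud import scheduler_v1

    PROJECT_ID = "my-project"                                # placeholder
    REGION = "us-central1"                                   # placeholder
    SERVICE_URL = "https://my-service-abc123-uc.a.run.app/"  # placeholder

    client = scheduler_v1.CloudSchedulerClient()
    parent = client.common_location_path(PROJECT_ID, REGION)

    job = {
        "name": f"{parent}/jobs/keep-warm",
        "http_target": {
            "uri": SERVICE_URL,
            "http_method": scheduler_v1.HttpMethod.GET,
        },
        "schedule": "*/5 * * * *",  # every 5 minutes
        "time_zone": "Etc/UTC",
    }

    client.create_job(parent=parent, job=job)

The same job can also be created from the console or with gcloud scheduler jobs create http.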

Does my application get multiple requests concurrently?

Contrary to most serverless products, Cloud Run is able to send multiple requests to your container instances to be handled simultaneously.

Each container instance on Cloud Run is (currently) allowed to handle up to 80 concurrent requests. This is also the default value.

What if my application can’t handle concurrent requests?

If your application cannot handle that many concurrent requests, you can configure the concurrency value when deploying your service with gcloud or in the Cloud Console.

Most of the popular programming languages can process multiple requests at the same time thanks to multi-threading. But some languages may need additional components to serve concurrent requests (e.g. PHP with Apache, or Python with gunicorn).
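
(Again not part of the FAQ, but a sketch of the Python-with-gunicorn case, with illustrative worker/thread/concurrency values:)

    # A minimal Flask app; gunicorn threads let one container instance handle
    # several requests at once, which is what Cloud Run's concurrency setting
    # assumes. Values below are illustrative, not recommendations.
    #
    # Container start command (illustrative):
    #   gunicorn --bind :8080 --workers 1 --threads 8 main:app
    # Matching deploy-time setting (illustrative):
    #   gcloud run deploy my-service --concurrency 8 ...
    from flask import Flask

    app = Flask(__name__)

    @app.route("/")
    def handle():
        # With --threads 8, up to 8 copies of this handler can run at the same
        # time in a single instance; keep Cloud Run's concurrency setting at or
        # below what the worker pool can actually serve.
        return "ok\n"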


Yes, unfortunately, but that's the caveat of services-on-demand. I'm looking into more efficient/cheaper model deployment workflows. (It might be just running the equivalent of Cloud Run on Knative/GKE, backed by GPUs.)


Does GCP have an equivalent of Aurora Serverless? If so, would choosing that over Cloud SQL have been cheaper?

If you're familiar with AWS, would using AWS Batch exclusively with Spot pricing [0] (or Fargate with Spot pricing) and Aurora Serverless [1] have been cheaper than Cloud Run + Cloud SQL?

[0] Say the service runs for 5 minutes every hour for 30 days. The respectable a1.large instance would cost $0.50 per month, while the cheapest t3.nano would cost around $0.19 per month.

[1] Say the service stores a rolling 10 GiB per month ($1.00) and makes about 1,000 calls during each 5-minute run every hour ($0.20), using 2 ACUs ($3.60). That would come to $4.80 per month.


TensorFlow Serving, or whatever name Google Cloud has given their managed version of it.


You can run it on GKE with autoscaling.


How much do you end up paying on average for a tweet-sized generated text?


I haven't created a service for auto-generated tweets yet (just human-curated ones), but for a similar service which outputs tweet-length text (with 2 GiB of RAM), it takes about 30s on a cold boot (which makes sense, as it has to load the model into RAM), and ~12s to generate text after a cold boot.

From the pricing (https://cloud.google.com/run/pricing):

12 * ($0.00002400) + 12 * (2 * $0.00000250) = $0.000348 per text

...and that's assuming you go over the free tier limit.
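
Spelled out (a quick sketch of the same arithmetic, assuming one vCPU and 2 GiB as the formula implies; the rates are the per-vCPU-second and per-GiB-second prices above, and the monthly figure is just an illustrative extrapolation at one text per hour):

    # Cloud Run cost per generated text: 12s of a 1 vCPU / 2 GiB instance,
    # using the rates from the calculation above.
    CPU_PER_VCPU_SECOND = 0.000024
    MEM_PER_GIB_SECOND = 0.0000025
    SECONDS = 12
    MEMORY_GIB = 2

    cost_per_text = SECONDS * CPU_PER_VCPU_SECOND + SECONDS * MEMORY_GIB * MEM_PER_GIB_SECOND
    print(f"${cost_per_text:.6f} per text")                       # $0.000348
    print(f"${cost_per_text * 24 * 30:.2f}/month at 1 text/hour")  # ~$0.25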



