The only complaint I have with Cloud Run now (after many usability updates since the initial release) is that there is no IP rate-limiting to prevent abuse, which has been the primary cause of unexpected costs. (due to how Cloud Run works, IP rate-limiting has to be on Google's end; implementing it on your end via a proxy eliminates the ease-of-use benefits)
I'm currently serving an API that uses a 500 MB ResNet v2 model.
The bootup takes too long, so now I have a single instance that can't handle any peaks and costs too much.
Doesn't your model take too long to spin up before being able to serve a request?
You can work around "cold starts" by periodically making requests to your Cloud Run service, which helps prevent the container instances from scaling to zero.
Use Google Cloud Scheduler to make requests every few minutes.
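For example, something along these lines (job name and service URL are placeholders, and depending on your project you may also need to pass --location):

    gcloud scheduler jobs create http keep-warm \
        --schedule="*/5 * * * *" \
        --uri="https://my-service-abc123-uc.a.run.app/" \
        --http-method=GET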
Does my application get multiple requests concurrently?
Contrary to most serverless products, Cloud Run is able to send multiple requests to be handled simultaneously to your container instances.
Each container instance on Cloud Run is (currently) allowed to handle up to 80 concurrent requests. This is also the default value.
What if my application can’t handle concurrent requests?
If your application cannot handle that many concurrent requests, you can lower the limit when deploying your service with gcloud or in the Cloud Console.
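For example, to pin it to one request per container instance (service and image names are placeholders):

    gcloud run deploy my-service \
        --image=gcr.io/my-project/my-image \
        --concurrency=1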
Most popular programming languages can process multiple requests at the same time thanks to multi-threading, but some need additional components to handle concurrent requests (e.g. PHP with Apache, or Python with gunicorn).
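A rough sketch for Python with gunicorn (module name and worker/thread counts are placeholders you'd tune for your app; Cloud Run expects the server to listen on $PORT, usually 8080):

    gunicorn --bind :8080 --workers 1 --threads 8 main:app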
Yes, unfortunately, but that's the caveat of on-demand services. I'm looking into more efficient/cheaper model deployment workflows. (it might be just running the equivalent of Cloud Run on Knative/GKE, backed by GPUs)
Does GCP have an equivalent for Aurora Serverless? If so, would choosing that over Cloud SQL have been cheaper?
If you're familiar with AWS, would using AWS Batch exclusively with Spot pricing [0] (or Fargate with Spot pricing) and Aurora Serverless [1] have been cheaper than Cloud Run + Cloud SQL?
[0] Say, the service runs for 5 mins every hour for 30 days. The respectable a1.large instance would cost about $0.50 per month, and the cheapest t3.nano around $0.19 per month.
[1] Say, the service stores rolling 10 GiB/Month ($1.00) and does about 1000 calls per 5 minutes every hour ($0.20) using 2 ACUs ($3.60). This would cost $4.80 per month.
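Rough math behind those numbers, just to make the assumptions explicit (the hourly rates are backed out of the monthly figures above rather than quoted from a price list):

    # back-of-envelope sketch (Python)
    runtime_hours = (5 / 60) * 24 * 30      # 5 min every hour for 30 days = 60 h/month
    a1_large_rate = 0.50 / runtime_hours    # implies roughly $0.0083/h (spot)
    t3_nano_rate  = 0.19 / runtime_hours    # implies roughly $0.0032/h (spot)
    aurora_total  = 1.00 + 0.20 + 3.60      # storage + requests + 2 ACUs = $4.80/month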
I haven't created a service for auto-generated tweets yet (just human-curated ones), but for a similar service which outputs tweet-length text (w/ a 2 GiB RAM size), it takes about 30s on a cold boot (which makes sense as it has to load the model into RAM), and ~12s to generate text after a cold boot.