(Disclosure: I work on Google Cloud and contributed to this effort).
As I said back in November [1] for our initial announcement, the most exciting thing (for me) is that we let you mix and match cores and GPUs. You can see that spelled out in a nice table in the docs [2].
Sorry for being off-topic: can you guys consider introducing something similar to AWS's free tier? It doesn't have to include a lot of free resources, but it should be lengthy (6+ months, ideally at least 1 year). I recently started a new project, and while I'd prefer to use GC, I had to go with AWS because they offer one year for free (and GC's 2 months is not nearly enough to validate an idea).
The tricky part of a 2 month free trial is that it's not nearly enough time to explore the offerings unless you're a company that's already got a product ready to go and staff to migrate it full-time (and I assume the latter is exactly who you'd want to charge full price to). I took advantage of the 2-month GCP free trial, and ended up using 1 GCE instance, 2 Cloud Storage buckets, your cloud storage webhosting & DNS, and....that's it. I don't even really know what other services exist. It takes time to program things...by the time you get around to realizing that there are other GCP services that may help your project, the trial's already over.
Since there's a dollar cap as well, I don't see the problem with extending the trial out to a long calendar time-period.
Any way to get a discounted Cloud Next ticket? I'm here in San Francisco, but even the $549 (one-day) ticket is out of budget. Would really love to go. I am a GCP user and have an ops/devops consultancy startup (https://elasticbyte.net).
To each her own! The most common issue is that you have some idea you start working on (T=0), put it away for a while (T=3 months), and come back to realize your free trial has expired. Alternatively, if you're just tinkering in the background or on weekends, 60 days is suddenly just 9 weekends.
What type of architecture are you using? We're starting to set up a distributed backend at http://www.bugdedupe.com, and there don't seem to be a lot of resources out there about how to do it right.
I was excited when I read that Google would offer both NVIDIA and AMD GPUs in the November announcement[1]. Is there any timeline on when AMD GPUs will show up?
As I said below, "Soon" (and the page says so). You might want to attend Google Cloud NEXT [1], with most of your ticket price being turned into credits for Cloud usage :).
Can you explain how this is done? Does this mean that VM and the GPUs can be on different host machines? Will it affect the GPU-CPU communication performance?
Sounds like if you max out the GPUs per CPU core, they have CPU cores "left over" that they then use for CPU-only instances, instead of splitting a host into a predefined pattern of GPU-only instances that use up all the CPU as well.
A bit off topic, but I recently realized that the cost per CPU is really competitive in cloud offerings compared to reserved instances at regular hosting providers, while the network costs are absolutely outrageous.
CPU costs may be something like 5 to 10 times more expensive in the cloud, but network costs are close to 100 times. Any hosting provider will offer you 250 Mbps of unlimited network access for your machine, whereas consuming that much bandwidth on Google Cloud for a month will cost you more than $1,000.
> the cost per CPU is really competitive in cloud offerings
> CPU costs may be something like 5 to 10 times more expensive in the cloud
So are CPUs more or less expensive in the cloud?
Also, I assume by "cloud offerings" you mean AWS, Azure, Google Cloud and the like, and by "regular hosting providers" you mean GoDaddy, BlueHost and the like? Or perhaps you mean PaaS vs. SaaS offerings?
Yes and no: yes you can get a Quad-Core 64GB RAM + SSD server for $55/month (source: https://www.hetzner.de/de/hosting/produkte_rootserver/ex51ss...).
But there's one more spec that matters: networking. It takes a completely different amount of effort to provide 1 Gbps connectivity versus 40 Gbps per server.
Most providers like Hetzner/OVH provide the former, while GCE provides the latter. I'm not saying it's bad; in fact, for most people 1 Gbps would be more than enough. But it's not something that's fair to omit.
Disclaimer: I don't work for Google; this is just from my experience.
I am sure Hetzner would be able to offer the same as an add-on if you contacted them directly.
That being said, you would continue to pay orders of magnitude less for your bandwidth than you would at Google or AWS. There's a bubble in cloud bandwidth pricing, and I don't think it's value-related.
CPUs are more expensive on Google Cloud than OVH, but that is expected. I'm OK paying 2x or even 5x for the agility and power those platforms offer (especially when you need to autoscale up and down every single day).
What I don't understand is the network cost factor of 100x to 1000x.
To a first approximation, CPU and RAM are required to get your site up and running at all. Bandwidth is less so. Bandwidth scales up as your growth scales up.
So it makes sense for cloud providers to make CPU and RAM relatively cheap, and charge unreasonable prices for bandwidth. If you're growing, you're more inclined to pay, since you're seeing success. Plus you're already locked in at that point.
That depends a lot on your CPU/bandwidth ratio. A modern website consumes a lot more CPU per request than an online videogame server (my use case), which is basically the equivalent of a WebSocket router with a little data transformation in the middle.
No kidding! I liken it to buying a soft drink with dinner. It only costs the restaurant 10c to fill the cup, but they charge $2.99 because the people that want it are willing to pay.
Google's $0.70 rate is in addition to the normal instance cost, so the math is more complicated if you don't have a machine already. (Although a low-end high-memory machine is $0.126/hr, so the sum is still lower than Amazon's.)
It really depends on what you're doing. If you are doing large transfers over PCIe back and forth, not really. But lots of things work just fine.
The bigger challenge for large ML models is the memory you'd need to back it. But with GCE you can happily do a custom machine with up to 6.5 GB per vCPU, just so you can fit the output ;).
Slightly off-topic, but I don't know where to get help with this.
I see one of our projects has gotten a quota of 16 GPUs in asia-east1, us-central1, us-east1. However we seem to have been allocated nothing in europe-west1. Is this an error, or simply something I have to manually ask for somewhere?
"Quota 'NVIDIA_K80_GPUS' exceeded. Limit: 0.0" :(
I'm so psyched to finally be able to use GPUs on GCE instead of AWS, so any help would be appreciated.
Yep! And for distributed training, the per-minute billing is incredibly important. But just a correction: Azure also does per-minute billing (following suit from GCE way back in the day).
We do not currently apply sustained-use discounts to GPUs, but we may do so in the future (we need to gather data first on usage, to understand if people will be running 24x7) [1]:
> You cannot attach GPUs to preemptible instances. GPUs do not receive sustained use discounts.
1. Are GPUs covered by the free trial (ie, can that $300 be spent towards GPU instances)?
2. How is support for GCP?
I've been curious about trying GCP, but held off over GPU support (since AWS was covering my needs and I do ML stuff mostly) and general support (since Google doesn't have a great reputation for supporting products).
Also, perhaps an affiliated person can chime in with something about the roadmap to stable GPU support. (It currently says there may be breaking changes.)
1. At this time, we don't grant quota for GPUs for free trial customers (hello various coin miners!). However, if you upgrade from your free trial, you do keep your $300, and that's just money :).
2. Unlike consumer-facing products, GCP is focused on business. We offer paid support plans [1] with high (measured) customer satisfaction. I know several of the people in the support teams (and you see them here on HN as well), and we're really trying to defeat the meme of "Google doesn't do support".
As far as "stable GPU support", this is just confusing language surrounding our usual "Beta" terms [2]. Once it becomes Generally Available, no changes would be made. But moreover (for GCE anyway), we don't make API breaking changes from Beta to GA (Beta to GA is "just" about stability in production).
>1. At this time, we don't grant quota for GPUs for free trial customers (hello various coin miners!). However, if you upgrade from your free trial, you do keep your $300, and that's just money :).
What do I have to upgrade to? Isn't GCE all per-minute? Do I just have to pay $0.05 for a standard instance and then use the $300 from my trial?
Also, is coin mining against any TOS? I'm not planning on doing it, I'm just curious.
Sorry, upgrading to a paid account (abuse risk is just too high for the free trial, so we keep the quota limits low).
Coin mining is not against the TOS. However, because it's usually economically irrational, it's usually abuse. If you don't pay for your GPUs (fake credit card) it's awfully economically rational though ;).
"The board is designed for a maximum input power consumption of 300 W"
So one hour at home:
Power used: 300 Wh
Price: $0.20/kWh
Total cost: $0.06/hour
Google cost: $0.70
You could say it's 10x more expensive, but you need to include all the additional costs; power is just a fraction of them. The GPU itself is $5k, which over, say, 3 years comes to about $5/day, or $0.20/hour, and if you're only using it a third of the time, just that amounts to $0.60/hour. Add everything else (and you might use it less than a third of the time) and it would be far more expensive than using it in the cloud. As it is expected to be.
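If you want to plug in your own numbers, here's a minimal sketch of that comparison. The constants are the rough figures from this thread ($5k card, 3-year life, 300 W, $0.20/kWh, 1/3 utilization, $0.70/hour on GCE); they're assumptions, not measured data:

    # Back-of-the-envelope: owning a K80 at home vs. renting one on GCE.
    CARD_PRICE_USD = 5000.0        # full K80 board
    LIFETIME_HOURS = 3 * 365 * 24  # amortize over 3 years
    POWER_KW = 0.300               # 300 W maximum board power
    HOME_RATE_PER_KWH = 0.20       # assumed residential electricity price
    UTILIZATION = 1 / 3            # fraction of time the card is actually busy
    GCE_RATE_PER_HOUR = 0.70       # advertised price per K80 die-hour

    power_per_hour = POWER_KW * HOME_RATE_PER_KWH       # ~$0.06
    card_per_hour = CARD_PRICE_USD / LIFETIME_HOURS     # ~$0.19
    # Idle time still costs you hardware, so divide by utilization:
    card_per_useful_hour = card_per_hour / UTILIZATION  # ~$0.57

    print(f"own:  ${power_per_hour + card_per_useful_hour:.2f} per useful hour")
    print(f"rent: ${GCE_RATE_PER_HOUR:.2f} per hour")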
Keep in mind that they're tricking you. Yes, the entire card is $5k, but they're selling access per GPU. This card has two processors on it, so they're actually selling access to half that card.
I think that's why they chose the K80 as well. By today's standards it's an extremely old card (2 generations behind), but it's the last professional series card to feature two processors on it. It's easier for them to sell, I guess.
Actually, it's more that it's the best Kepler class part. Maxwell, unfortunately, doesn't have full-speed double precision, locking out an entire (important) segment of the market. As I said in another thread, we'll have P100s and so on as soon as we can.
Fair enough, but Maxwell makes up for the lack of DP with other features that other customers appreciate more. Pascal will solve all those issues, but the P100 unfortunately will probably be extremely expensive to use because of the cost of the card.
But the double-precision performance of the M60 is very low, lower than the K80's, due to limitations of the Maxwell microarchitecture. So we could say the K80 is the last decent dual-GPU NVIDIA card...
Ah, sorry, I forgot about that one. The Tesla Maxwells came much, much later than the GeForce cards, so it wasn't very long until Pascal was released. To meet all the customer's needs, Maxwell was likely not a good choice, but its single precision was much better than Kepler's.
A major use for GPUs in the cloud is not being limited to the 8-12 GB(ish) limits of enthusiast cards. Processing performance is similar; the ability to hold everything you need in memory is much closer to the reason these cards go for $5k.
I'm not sure most GTX users doing HPC can be called "enthusiasts". After all, most research is done on these devices, so most models should fit their memory limits (unless you're working on some hardcore, or lazily engineered, research model). Correct me if I'm wrong, but AFAIK the Tesla cards are designed for regulated markets and aren't mass-produced like the GeForces, hence the elevated prices.
Do you mean that they fail? I have built a mining setup with multiple GPUs in the past and haven't lost a card. I am 100% sure that Google is smarter than me at such setups.
No. It's charged at a rate of $.70/hour per die, but usage is billed per minute. For example, if you use a K80 for 24 minutes (i.e., 40% of an hour) you pay $.7 * .4 = $0.28 total.
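In code form, ignoring any minimum-billing increment (which I haven't checked):

    # Per-minute proration at the hourly rate.
    rate_per_hour = 0.70    # K80 price per die-hour
    minutes_used = 24
    print(f"${rate_per_hour * minutes_used / 60:.2f}")  # -> $0.28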
Which doesn't even remotely mean their energy is free (though they don't pay as much per unit as you do at home, because they specifically build datacenters where power is cheaper; it's a huge cost at datacenter scale).
I didn't say it was even remotely free, but at their scale, compared to consumers, it's pretty damn close. 70 cents an hour is A LOT to pay to run these GPUs on Google's end. Which is the point of this discussion.
On the market, renewable energy costs more than non-renewable, simply because there is a higher demand for renewable, and non-renewable has no benefit for consumers.
SIGH. Because of the GPL, we can't pre-install the NVIDIA driver. At that point, it sadly makes more sense to have you roll your own and then bake the image. I'll gladly hand out refunds for the X minutes this takes (don't forget that on kernel upgrades you get to do it again when the kernel headers change!), but like you I wish it weren't so.
Yes, that's correct. But you still must have the user do so. Otherwise, you're distributing the resulting artifact (the NVIDIA driver linked against the kernel), which combines GPL code (the kernel) with non-GPL code (NVIDIA's).
I know you are working on better GPU support in Kubernetes, but it would be awesome if I could just grab my image that already runs in nvidia-docker and run it on GKE.
You can do that on Nimbix/Jarvice. Submit a Docker image, start batch job and get an e-mail once it's finished. 4 core 32GB RAM machine with K80 costs ~$1.06 there.
Anyone checked if this is compatible with Blender GPU rendering? On my old Mac machine GPU rendering doesn't work; I can still get a 16-core VPS for a few hours when I'm in a hurry but this has the potential for more performance I guess..?
I believe Cycles (the Blender renderer) relies only on CUDA [1], so it should work. Depending on which AMD card you had in your Mac, it sounds like it might not be supported by Cycles. Because the NVIDIA K80 doesn't do display, though, you'd need to run, say, VNC, or run Cycles from the command line.
My iMac 2011 doesn't work with GPU Cycles, but that's OK. And I've already successfully cut render times by renting the multicore VPS by the hour (command line, specifying a different frame range on each machine to render).
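For anyone curious, here's roughly what that frame-range splitting looks like; a minimal sketch only, with a made-up scene file, output path, and machine count (and you'd still have to ship each command to its machine yourself):

    # Print one Blender command line per rented machine, each covering a slice
    # of the animation's frames (-b = headless, -E CYCLES = Cycles engine,
    # -s/-e = frame range, -a = render the animation, -o = output pattern).
    TOTAL_FRAMES = 240      # hypothetical animation length
    NUM_MACHINES = 4        # hypothetical number of rented instances
    SCENE = "scene.blend"   # hypothetical scene file

    per_machine = TOTAL_FRAMES // NUM_MACHINES
    for i in range(NUM_MACHINES):
        start = i * per_machine + 1
        end = TOTAL_FRAMES if i == NUM_MACHINES - 1 else (i + 1) * per_machine
        print(f"blender -b {SCENE} -E CYCLES -o //frames/frame_##### "
              f"-s {start} -e {end} -a")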
Hopefully soon someone will create a GCE image for WebPageTest. It's been a pig to get up and running, but Amazon's per-hour billing is expensive for a machine needed 10-15 minutes every hour.
If you manage to build your app/infrastructure in a way that can survive nodes shutting down at random times (or if you don't care about restarts), then you can reduce your infra costs by 80%.
We have a staging cluster running on preemptible instances, and as soon as one instance goes away we get a different one. Everything gets deployed automatically. Regular internal users checking out various webpages don't even notice.
We're looking into moving our 24/7 infra (which needs to be 24/7) to something that runs mostly on preemptible instances (with a couple of normal instances for services that can't be randomly killed).
Super happy about our move to GCP and our K8S experience.
Sigh. When I first used Google cloud (lowercase c), it was only App Engine (CMIIW). I woke up one day, and now it has more than a dozen offerings, as confusing as AWS.
Does anyone have, specifically for AI/ML, a list of "If you want to do X, use Y" for Google Cloud offerings? The official list (https://cloud.google.com/products/machine-learning/) doesn't help much. Would appreciate if it's explained by layer (higher such as ready-to-use Speech Recognition and lower where you possibly need to setup some infra stuff).
TPUs are for TensorFlow-based computation only, unlike GPUs, which are generally CUDA-compatible and can run everything from deep learning to fluid dynamics simulations. I believe they are also for Google-internal applications right now, so not generally available.
As Cloud ML is "just" hosted TensorFlow, once you train the model it stores a .meta file in GCS for you. You can import this in TensorFlow [1] for serving elsewhere if you so choose. Is that what you're after?
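For reference, a minimal TensorFlow 1.x sketch of what re-importing a trained graph looks like elsewhere, assuming you've copied the checkpoint files down from GCS (the file names and tensor names here are placeholders, not Cloud ML's actual output layout):

    import tensorflow as tf

    with tf.Session() as sess:
        # Rebuild the graph structure from the exported .meta file ...
        saver = tf.train.import_meta_graph("model.ckpt.meta")
        # ... then restore the trained variable values from the checkpoint.
        saver.restore(sess, "model.ckpt")

        graph = tf.get_default_graph()
        inputs = graph.get_tensor_by_name("inputs:0")            # placeholder name
        predictions = graph.get_tensor_by_name("predictions:0")  # placeholder name
        print(sess.run(predictions, feed_dict={inputs: [[1.0, 2.0, 3.0]]}))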
The copy on https://cloud.google.com/ml/ under Portable Models says "In future phases, models trained using Cloud Machine Learning can be downloaded for local execution" so I hadn't looked into it further.
Huh... maybe I'm mistaken. Lemme ask the experts and get back to you.
[Edit: Okay, we've decided that the exported model it produces is what you'd expect. We're going to update the landing page, once we can agree on what it should say.]
> AMD FirePro and NVIDIA® Tesla® P100s are coming soon.
from cloud.google.com/gpu (our landing page). That's currently limited by availability of hardware, testing, etc., but it really should be "soon". Note though that P100s are massive and expensive, so we don't intend to get rid of K80s or anything once we have P100s.
Out of curiosity, are there strong reasons to leverage GPUs in standard web application development from a general-purpose standpoint? Can I leverage this to enhance a general-purpose server or database? If so, anywhere I can read more?
Always interested in what kinds of things one can do when new offerings like this are made.
Short answer: no. Long answer: only if you can really parallelize your processing into very small calculations, each of which needs little memory, and you have a lot of time to write special code using a special framework. Perhaps real-time streaming encoding could be one case, but that's already sort of beyond a general web application.
GPUs are powerful because a GPU usually has hundreds (or thousands) of cores. Each core is weak and inefficient on its own, but the power adds up when you have that many cores available.
Besides 3D graphics and machine learning, GPUs are also good for image resizing/encoding and video encoding/transcoding. But I haven't heard of anyone trying to accelerate Node.js or MongoDB on GPUs.
SQream DB (http://www.sqream.com) is an SQL analytics database which uses GPUs.
It is on both AWS P2 and Azure NC machines, and it can do very fast analytics on hundreds of raw terabytes on those.
Having said that, on-premise or bare-metal is about 25% faster than AWS P2 for large datasets (over 2TB). For smaller datasets that may fit in-memory, they function about the same.
No. The K80 is a Compute-only device from NVIDIA (i.e., it won't run OpenGL or DirectX). We've previously announced that cards with Display are coming "soon".
Does anyone else find it kind of odd that Google, a company famous for having no human support at all anywhere, a company that shuts down any attempt at contacting a human within its ranks for support on ANY of its products or services, always seems to deploy a small army of senior tech people to answer questions every time there is an article posted on HN? Strange double standards afoot: some users matter, others not so much.
> Does anyone else find it kind of odd that Google, a company famous for having no human support at all anywhere
While a number of people push that false meme repeatedly, everything I've heard from people who've used it (or worked on it, but the latter comes with obvious bias) is that the paid human support on GCP is good.
Heck, on the consumer side, I've gotten good (quality and speed) human support on Google Express, too.
GCP gold support is great, as you actually get through to an SRE; silver (what I use)... not so great. Often you'll submit a ticket with a ton of technical detail only to have them come back with something asinine rather than escalating it if it's past their understanding. The difference in price is pretty striking though, so it's no surprise the quality of service differs.
Honestly though, I can't say I've ever really needed support on GCP; most of the times I've raised tickets it's been due to funny behaviour (slow spinning disks in us-central1 was the last one).
Yeah, the latter is literally the case. No one is 'deployed' anywhere, and it's a little amusing to think that we have "no support," but would deploy people to web forums, when looking to promote a commercial business.
(We do have support, more than 600 security engineers alone; we just offer commercial support for a commercial product.)
No, but I do find these regurgitated posts regarding their lack of human interaction annoying. I've always been able to talk to someone at Google on Google Play. Why don't you try it? Go to Google Play and initiate a chat session or ask for someone to call you.
[1] https://news.ycombinator.com/item?id=12963902
[2] https://cloud.google.com/compute/docs/gpus/