Vast.ai – marketplace for renting out your GPU, or renting someone else's GPU (vast.ai)
179 points by andrewstuart on March 21, 2022 | 74 comments



I tried out this service last time bitcoin was in a slump, in 2018. I was out of work, and toying with the idea of commercializing a Norwegian (bokmål-nynorsk) language model which I had trained.

Prices for compute were competitive, but most instances had lousy bandwidth. You don't need bandwidth to run bitcoin miners, apparently.

The people behind it were very friendly and helpful. However, there was an incident in 2020, long after I'd gotten a job and abandoned my ML project. They tried to charge my card a small amount ($1.70), and sent a message that my card was expired, which indeed it was. I asked them what that was all about, since I'd not rented any instances for years, and I never got any explanation other than "that's weird".


I'm glad you mostly? liked our service. Here's a better (long overdue) explanation: our billing system attempted to bill you because you had a negative credit balance (you owed us) - most likely from forgetting to destroy an instance. However, there's a threshold of around $5 before any credit card charge is attempted. I suspect we later experimented with a different threshold criterion, which caused the billing system to attempt to charge your card the balance owed - a long while after it was accrued.

We do now have better billing history reports so you can see what caused a charge, although sadly the customer service still isn't much better. Source: I am the CEO/founder and still handle most customer service.


Your service helped me immensely when I was getting started with machine learning, but had no budget and just a laptop. For me, everything “just worked”, and I learned a ton about Docker. I even created my own system to handle the ephemeral nature of the instances by automatically backing up training checkpoints, programmatically finding the next best “deal”/“bid”, and resuming the training on a different instance.
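That checkpoint-and-resume pattern can be sketched roughly like this. This is a minimal, stdlib-only illustration under my own assumptions (the JSON state file and the stand-in loss value are hypothetical); a real setup would serialize framework state such as model and optimizer weights and sync the checkpoint to durable storage between instances:

```python
import json
import os
import tempfile

def train(checkpoint_path, total_steps=10):
    """Training loop that survives instance preemption by resuming from a checkpoint."""
    # Load the last checkpoint if one exists (e.g. restored from cloud storage).
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)
    else:
        state = {"step": 0, "loss_history": []}

    while state["step"] < total_steps:
        state["step"] += 1
        state["loss_history"].append(1.0 / state["step"])  # stand-in for a real loss

    # Write-then-rename so a preempted instance never leaves a corrupt checkpoint.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(checkpoint_path) or ".")
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
        os.replace(tmp, checkpoint_path)
    return state
```

Run it, kill the process at any point, rerun with the same path, and it picks up where it left off - which is exactly what makes bouncing between spot instances tolerable.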

Was very very cool, and I generally recommend it to anyone I come across who has modest needs (<25 GPUs).

I did find the throughput was not always accurate at the time (this was about 2 years ago). It was fairly frequent that a listing would say “300” to “1000” Mbps up+down, but would actually get an order of magnitude less to any of the big cloud services (GCP, AWS, etc). It wasn’t important to me that the speed was low, but it was important that it didn’t match what was “advertised”. For certain workloads I would have gladly paid more for higher throughput, but that’s not really an option when the listings can’t be fully trusted.

I also heard there may be some opportunities to market your technology/platform for “on-prem”, “on-demand” GPU clouds for large enterprises, so that a pool of corporate GPUs could be efficiently used and accurately billed to a variety of internal stakeholders. That could improve asset utilization for capital-intensive on-premise GPUs.


Wow, I was definitely considering trying this out until I saw your comment. They charged you mistakenly, didn't detect it on their side, and then never gave you a reasonable explanation for why? For me, that's a definite no-go.


That's such a small thing, though. It was a tiny amount, and presumably they would've refunded it had it gone through. It's also possible they would've discovered it themselves soon enough, since many companies only review the books at the end of the month.

Also, I doubt you'd get a "reasonable explanation" from any other company - at best they'll say "sorry it was a (human|software) error, we've taken steps to ensure it won't happen again", as disclosing anything more is usually against company policy.


Scammers use small charges on stolen cards as a test. If the test charge doesn't get noticed or flagged, a much larger one is sent through later.

"That's weird" is a not an acceptable answer for a card was charged for any amount.


Yeah, that was my first thought too, but the card was already expired, and a scammer would probably have known that immediately without testing (since they presumably need the expiry date as well as the card number to charge anything).

I've had mini-charges like these from other cloud providers, too, but then I could confirm the reason for it (some sort of storage I'd missed while deleting the instances).


The amount doesn't matter; it's about getting charged for no reason. You never know how much it will be for the next person they charge.


It matters when evaluating their response. If they said "huh, weird" for a $200 charge I'd be suspicious, but for such a small amount, I totally see why they wouldn't bother explaining how it happened.


you can't keep customers like that


I’ve used vast.ai many times over several years. I use the “top up” approach. Never any unsanctioned payments.


Key context here is vintermann had a balance due (and gave no indication that he disputes that). Of course there should have been a better explanation - especially if asked for. But to the extent there was a mistake, arguably it was simply in not charging the card earlier.


I'm glad I saw this comment before using this website. Never trust services that charge you years later without any reason. They would've probably continued charging you without you knowing if the card hadn't expired!


I found this service interesting. This is how it pitches itself:

"Rent out your GPU to make your hobby pay for itself. Transform your mining farm into a GPU training center and earn ~2x to ~4x more per gpu-hour than mining cryptocurrency. We connect you with customers and provide simple tools to streamline hosting. You set your own prices and schedules. Get started today."

The biggest obstacle of course is security.

The founders described it as "kinda like airbnb for compute" here: https://www.reddit.com/r/gpumining/comments/8xu04h/vastai_be...


For me, the second-biggest obstacle would be data size and, relatedly, data transfer speed. I regularly analyse GBs to TBs of data on supercomputers which have PBs of free space; regular consumer PCs don't have the hard drive space for that kind of thing. And even if they did, they may not have the transfer speeds to make copying my data there feasible - by the time copying has finished, I may have already run the analysis on a slower CPU.


I don't work with ML, so maybe this is stupid, but wouldn't a "regular consumer PC" have orders of magnitude less GPU power than a "supercomputer"? I'm assuming the latter has some form of GPU farm, whereas I don't know if you can fit more than 3-4 fat GPUs in the former.

So if you'd be able to split the workload such that it would be able to run on consumer PCs in a reasonable time, wouldn't that also split the storage requirements the same way?

Or, if instead you're OK with waiting for ages for the regular GPU to do its thing, is transfer speed that much of an issue?


Yes, there are models that are trained on hundreds of GPUs, but from my limited experience in scientific computing, most of the time researchers run their programs on a single node: going multi-GPU or multi-node requires somewhat large code changes, and a single node is "good enough" for their use case - or they come from a heavy science background and don't even know how to utilize a multi-node architecture. Their main benefit from using a cluster is 100% uptime, large storage, and large memory. I've been to multiple research institutes where there is an institute-wide HPC cluster that researchers share; that way no one needs any kind of high-end computer and can just connect to the cluster. This service could help in that area if researchers could somehow schedule a job from their low-end laptop and get the results when the job is done.


Are the PBs of free space necessary? I thought training a ML model on a terabyte of data was computation expensive, but not storage intensive.


Sometimes! Not for the GPU part, but for producing the summary data that the ML model is trained on.

For example, genome sequencing data and intermediate results are easily in the TB area of space, but the resulting table of genomic variants (k-mers like n-grams in NLP) is only a few hundred GB.


I'm confused from your post where the petabytes of storage come in.


I'm guessing that there are just many data sets, and each one is in the TBs of size.


> The founders described it as "kinda like airbnb for compute" here:

"... (business model: Uber for GPU waste heat)"


I immediately thought of n-gate and then laughed out loud at this. I miss n-gate.


so an open market for scalpers to sell you a "GPU time share"...


Sorry for the tangent, but I followed the thread, and something that confused me was this:

> Power outages, well there's not much you can do about that, again I don't know where you live, but those are pretty rare where I am. Maybe once a year for a few hours.

Once a year, a few hours. That’s not pretty rare, that’s close to frequent. I’ve had two outages in 17 years, only one of those was longer than a few minutes.


99.95% uptime is a few hours a year of downtime. Some people lose power almost daily. The thing about exponential/logarithmic distributions is that your sense of what is frequent or rare is quite arbitrary and based on your own experiences, which are rarely all that representative.

When I lived on country roads miles out of town, power outages happened many times a year, occasionally for days. Other times I've lived in big buildings with buried power lines and never had an outage in years. People in some developing economies only have occasional electricity. It's quite hard to say which one is normal.
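The "few hours a year" figure falls straight out of the uptime percentage. A quick sanity check (the sample uptime levels below are just illustrative):

```python
# Convert an uptime fraction into hours of downtime per (non-leap) year.
HOURS_PER_YEAR = 365 * 24  # 8760

for uptime in (0.9995, 0.999, 0.99):
    downtime_h = (1 - uptime) * HOURS_PER_YEAR
    # 99.95% works out to roughly 4.4 hours of downtime per year.
    print(f"{uptime:.2%} uptime -> {downtime_h:.1f} h/year down")
```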


We have 5-10 outages a year where I live (rapidly growing city in the US), usually 1-3 hours in length but occasionally longer.

The power lines are all above ground and there are lots of trees. Every big storm brings a substantial risk of knocking the power out.


Wow, I did not know that. For me, outages are connected to major disasters.


This seems like a no go for anything where the dataset used for training includes any private data at all, or even public images that have been privately labeled by the company. It makes sense for hobbyists, but I doubt this entire platform was created for only hobbyist use in mind.

They say they intend to implement encrypted hosting environments in the future, but considering the number of security exploits that Intel SGX and its equivalents have had, I'm not sure I'd trust that either.


I love the idea of this.

As a potential user, I'm disliking how many clicks it takes to find even an approximate price. Prices should be easiest to find.


You can see spot prices here: https://vast.ai/console/create/


Thanks! I'm UI-impaired, I never would have found it. =)

(also, wow, that page renders slowly!)


> wow, that page renders slowly!

Dogfooding?


From the homepage (https://vast.ai/) it's literally one click ("Search Marketplace" button). From the FAQ page, it's literally two clicks (first homepage, then marketplace button).

Or am I missing something obvious here? Maybe the page looked different 7 hours ago?


"Search Marketplace" is not the text for a button I'm expecting to pricing. After looking further, I resorted to clicking "Search Marketplace" but gave up after it took forever with zero indication that anything was happening so I clicked something else. Something about the first loading of that page takes a lot of time. Since it's faster after the first load, so I'm assuming some large initial download (probably a large JS library???).


You're not expecting the pricing for products in a marketplace, on a page labeled "marketplace"? I'm not sure how much more explicit it could get.

The loading time sure is dumb though, I agree. But that doesn't affect the number of clicks it takes to find something, only the time (which, arguably, is worse).

Also, you might want to investigate why your browser doesn't show any loading indicator when it's loading content. Firefox and Chrome should both do it by default, but if you're not seeing it maybe something is broken or you've changed some setting.


Yes, I finally looked at the browser tab to see that some indication of "activity" was occurring. However, if the website knows it is going to take some time to pull in a large asset, it would be better to have a page that loads quickly and shows some sort of activity indicator in the UI, rather than relying on the browser default. In fact, I don't even know what the default indicator on a mobile browser looks like.


Speaking as someone who has very little knowledge of ML, let's say I wanted to create a site like www.remove.bg and wanted to use this service to train my own models (from scratch). Is it feasible to do? How many hours, ballpark, would it take to train something like this? How would I upload my training images to the target machine?

I see the interruptible pricing for RTX3080 is $0.130/hr. Anybody with some ML experience calculate approx cost using this?

P.S. I have no intention of creating a remove.bg clone; I'm just curious whether it can be done and, if so, approximately how much it would cost.


I would guess a few days of training, but this will vary widely depending on the type of model and how much data you train on.
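As rough arithmetic only, using the interruptible RTX 3080 rate quoted in the question (the three-day duration here is purely an assumed figure, not a measured one):

```python
# Back-of-envelope cost at the interruptible RTX 3080 rate from the thread.
rate_per_hour = 0.130   # USD/hr, as quoted above
days = 3                # assumed "few days" of single-GPU training
hours = days * 24       # 72 GPU-hours

cost = rate_per_hour * hours
print(f"{hours} GPU-hours at ${rate_per_hour}/hr is about ${cost:.2f}")
```

So even if the estimate is off by several times, a single-GPU training run lands in the tens of dollars rather than the thousands - which is the whole appeal.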


An interesting sci-fi/spy thriller could be getting someone arrested by training NeuralHash 2.0 with CSAM on their rig using a service like this. After all that news last year I'd be super paranoid about that, even if it's extremely unlikely that a hobbyist would attempt such a thing without backing from law enforcement.


Used this several times, works great. But for smaller training tasks (max 1 GPU), colab pro plus is much cheaper


Absolutely incredible how cheap the compute power is on this platform. Vast has saved me tens of thousands of dollars running ML experiments.


What code do you actually run on those instances? For example, is it possible to run hashcat on a rented machine?


Yes. You get a full Linux shell. You can deploy images, and some come with hashcat already installed.


I'm wondering why researchers would rather use this than pay AWS?

Is it significantly cheaper?


It's practically impossible now to get new GPU capacity on AWS, so AWS is not really a competitive option for labs that didn't have it before the recent restrictions. Vast.ai is much more usable.


Surprising


Yes, it’s been a while since I used it but about half the price of an AWS spot instance.


Half price is a good deal


I've always wanted to create something like this and make people use a specific cryptocurrency to rent or receive payments that they could also mine with a GPU and call it 'Ouroboros'.


please do


I wonder how much data mining is going on on these machines. Hosts could easily benefit from the huge amount of free, potentially well-labeled data passing through, which they could re-sell to other companies.


Neat idea, but it seems to be born out of current market conditions and might not be sustainable going forward once supply/demand for GPUs normalize.

As someone running these workloads I'd only be desperate enough to pay $250/mo for a 3090 today because I cannot buy it for $1500 at the store straight up. And if you drop the price by a third now it isn't worth it for the host anymore.


Electricity plus some small premium should be tenable under any market condition. At least it works for public clouds so far even as hardware prices have fluctuated over time. There will likely always be someone who'd rather briefly rent than buy.

I do wonder if what's charged for electricity and premium can be competitive with large data centers in the long run. Amazon already has spot pricing. I expect it would be favorable for consumers with cheap electricity or who are already paying to heat their house.

I also wonder what effect renewables increasing electricity spot price volatility might have on this market in the future.


Public clouds have economies of scale, reliability and capacity all working in their favor. There's a reason there has never been a market for renting out spare CPU cycles or hard disk space or any other resource piecemeal from your computer.


DO and friends wouldn't exist on the CPU side if that were true, so a spot economy on the GPU side makes sense - even more so because these are higher-end time shares.

Even if we didn't have an open ticket with Azure since Christmas for more GPUs, the ability to burst non-sensitive tasks on say 20 cheaper GPUs for 5-30 min is attractive.

That stuff adds up fast, and lowering cost both opens accessibility for low-end users and helps scale what power users can do.

My bigger surprise is that Vast has been around a while and has only ~20 available servers, so I'm guessing < 100 total. Is this a friction issue?


What if instead of 1 GPU for a month, you need 10 GPUs for 24 hours? Renting is a lot more flexible in that way.


While I love the idea, the requirement for Ubuntu means I can't contribute my compute though (and neither can any of the PC gamers with Windows machines, or anyone else on other distros it seems). As much as I would love to do this over mining a cryptocurrency (which I don't do in the first place), I can't.


Ah yes, a new take on the botnet-as-a-service genre.

Running untrusted code, even inside a container, is a terrible idea. Container escapes get discovered all the time and even without them you could get yourself in serious trouble with the police by just letting random people use your network. Sure, you'll be able to defend yourself in court, but nobody has time to go to court over this stuff.

If you run this inside a VM with PCIe passthrough and all network traffic tunneled through some kind of VPN, then maybe it's worth the effort, but I just wouldn't risk it.


Good luck getting your hardware back after it's been confiscated, too.


Why bother though, the number of machines you can hack is very limited.


These devices have their performance and internet connection uplink listed, which makes for very interesting targets. Yes, a hundred shitty smart scales are a nice target, but they're underpowered for more involved attacks.

With a fiber uplink, a decent CPU and plenty of disk space advertised, this service can be very attractive, especially since you only need to pay for a minute to escape the boundaries of the container runtime if there's an exploit.

Also, it doesn't matter that there's only a few computers if you're using other people's computers to break the law.


The threat is using your IP address to do illegal stuff (read: serve/proxy CSAM).


I am interested in the technical aspect. What is the tech stack to enable such a vastly distributed host network? Presumably, people have to open up their home networks, but the company has to ensure that the host machines are secure.


Well that answers my question "Would it be worth renting out my unused 1080?"...

No...


Love vast.ai. Only wish is for it to be easier to work with persistent disks over time. I only use it for one off expensive jobs now.


Can this be profitable for folks to buy new gear for?

They claim 2-4x crypto revenues, and crypto revenues are basically break even now right?


IANAL, but isn’t this against GPU terms of service, as in why companies can’t just host 3080s etc in cloud?


The customer use license specifically talks about datacenters, not "cloud": "The SOFTWARE is not licensed for datacenter deployment, except that blockchain processing in a datacenter is permitted" https://www.nvidia.com/en-us/drivers/geforce-license/


TIL that a GPU has a ToS?! What's next ToS'd RAM and storage?


Technically it's not a ToS on the GPU, it's a license on the software: https://www.nvidia.com/en-us/drivers/geforce-license/ So using a completely open source or third-party driver and no CUDA is fine.


How much does it cost for training BERT large?


just do it on colab, save checkpoints, and keep training


thx bruh



