I tried out this service last time bitcoin was in a slump, in 2018. I was out of work, and toying with the idea of commercializing a Norwegian (bokmål-nynorsk) language model which I had trained.
Prices for compute were competitive, but most instances had lousy bandwidth. You don't need bandwidth to run bitcoin miners, apparently.
The people behind it were very friendly and helpful. However, there was an incident in 2020, long after I'd gotten a job and abandoned my ML project. They tried to charge my card a small amount ($1.70), and sent a message that my card was expired, which indeed it was. I asked them what that was all about, since I'd not rented any instances for years, and I never got any explanation other than "that's weird".
I'm glad you mostly? liked our service. Here's a better (long overdue) explanation: our billing system attempted to bill you because you had a negative credit balance (you owed us) - most likely from forgetting to destroy an instance. However, there's a threshold of around $5 for any credit card charge. I suspect that we later experimented with a different threshold criterion, which caused the billing system to attempt to charge your card the balance owed - a long while after it was accrued.
We do now have better billing history reports so you can see what caused a charge, although sadly the customer service still isn't much better. Source: I am the CEO/founder and still handle most customer service.
Your service helped me immensely when I was getting started with machine learning, but had no budget and just a laptop. For me, everything “just worked”, and I learned a ton about Docker and created my own system to handle the ephemeral nature of the computing systems by automatically backing up training checkpoints, programmatically finding the next best “deal”/“bid” and resuming the training on a different instance.
Was very very cool, and I generally recommend it to anyone I come across who has modest needs (<25 GPUs).
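For anyone curious, the resume logic can be sketched in a few lines. This is an illustrative toy, not my actual system: the "durable storage" is just a local directory standing in for something like S3, and the model state is a placeholder.

```python
import os
import shutil

CHECKPOINT_DIR = "checkpoints"   # local scratch on the rented instance
BACKUP_DIR = "backup"            # stand-in for durable off-instance storage

def save_checkpoint(step, state):
    """Write a checkpoint locally, then copy it to durable storage."""
    os.makedirs(CHECKPOINT_DIR, exist_ok=True)
    os.makedirs(BACKUP_DIR, exist_ok=True)
    path = os.path.join(CHECKPOINT_DIR, f"ckpt_{step:06d}.txt")
    with open(path, "w") as f:
        f.write(repr(state))
    shutil.copy(path, BACKUP_DIR)  # the instance may vanish; the backup survives
    return path

def latest_checkpoint():
    """Find the newest checkpoint in durable storage, or None."""
    if not os.path.isdir(BACKUP_DIR):
        return None
    ckpts = sorted(os.listdir(BACKUP_DIR))  # zero-padded names sort correctly
    return os.path.join(BACKUP_DIR, ckpts[-1]) if ckpts else None

def train(total_steps=10, save_every=3):
    """Resume from the last backed-up checkpoint, then keep training.

    Returns the step it resumed from, so a caller (e.g. the script that
    just rented a new instance) can see whether it picked up prior work.
    """
    start = 0
    ckpt = latest_checkpoint()
    if ckpt is not None:
        # filename is ckpt_NNNNNN.txt; resume one step past it
        start = int(os.path.basename(ckpt).split("_")[1].split(".")[0]) + 1
    for step in range(start, total_steps):
        state = {"step": step}  # placeholder for real model/optimizer state
        if step % save_every == 0:
            save_checkpoint(step, state)
    return start
```

On a real setup the bid-finding part would query the marketplace for the cheapest matching offer and run this loop on whichever instance won.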
I did find the throughput figures were not always accurate at the time (this was about two years ago). It was fairly common for a listing to say “300” to “1000” Mbps up+down but actually deliver an order of magnitude less to any of the big cloud services (GCP, AWS, etc.). It wasn’t important to me that the speed was low, but it was important that it didn’t match what was “advertised”. For certain workloads I would have gladly paid more for higher throughput, but that’s not really an option when the listings can’t be fully trusted.
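If you want to verify a listing yourself, it's easy to turn a timed transfer into a number you can compare against the advertised figure. The helper names here are illustrative; you'd feed in the byte count and wall-clock time of a real transfer to the endpoint you care about:

```python
def throughput_mbps(n_bytes, seconds):
    """Convert a timed transfer into megabits per second."""
    return (n_bytes * 8) / (seconds * 1_000_000)

def matches_advertised(measured_mbps, advertised_mbps, tolerance=0.5):
    """True if the measured speed is within `tolerance` (fraction)
    of the advertised figure; 0.5 = at least half of what was listed."""
    return measured_mbps >= advertised_mbps * tolerance
```

For example, a 125 MB transfer that takes 10 seconds works out to 100 Mbps, which against a “1000 Mbps” listing is exactly the order-of-magnitude shortfall described above.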
I also heard there may be some opportunities to market your technology/platform for “on-prem”, “on-demand” GPU clouds for large enterprises, so that a pool of corporate GPUs could be efficiently used and accurately billed to a variety of internal stakeholders. It could improve asset utilization for capital-intensive on-premises GPUs.
Wow, I was definitely considering trying this out until I saw your comment. They charged you mistakenly, didn't detect it on their side, and then never gave you a reasonable explanation for why? For me, that's a definite no-go.
That's such a small thing, though. It was a tiny amount, and presumably they would've refunded it had it gone through. It's also possible they would've discovered it themselves soon enough, since many companies only look at the books at the end of the month.
Also, I doubt you'd get a "reasonable explanation" from any other company - at best they'll say "sorry it was a (human|software) error, we've taken steps to ensure it won't happen again", as disclosing anything more is usually against company policy.
Yeah, that was my first thought too, but the card was already expired, and a scammer would probably have known that immediately without testing (since they presumably need the expiry date as well as the card number to charge anything).
I've had mini-charges like these from other cloud providers, too, but then I could confirm the reason for it (some sort of storage I'd missed while deleting the instances).
It matters when evaluating their response. If they said "huh, weird" for a $200 charge I'd be suspicious, but for such a small amount, I totally see why they wouldn't bother explaining how it happened.
Key context here is vintermann had a balance due (and gave no indication that he disputes that). Of course there should have been a better explanation - especially if asked for. But to the extent there was a mistake, arguably it was simply in not charging the card earlier.
I'm glad I saw this comment before using this website. Never trust services that charge you years later without any reason. They would've probably continued to charge you without you knowing if the card hadn't been expired!
I found this service interesting. This is how it pitches itself:
"Rent out your GPU to make your hobby pay for itself. Transform your mining farm into a GPU training center and earn ~2x to ~4x more per gpu-hour than mining cryptocurrency. We connect you with customers and provide simple tools to streamline hosting. You set your own prices and schedules. Get started today."
For me, the second-biggest obstacle would be data size and, with it, data transfer speed. I regularly analyse GBs to TBs of data on supercomputers which have PBs of free space; regular consumer PCs don't have the hard drive space for that kind of thing. And even if they do, they may not have the transfer speeds to make copying my data there feasible - by the time copying has finished, I may have already run the analysis on a slower CPU.
I don't work with ML, so maybe this is stupid, but wouldn't a "regular consumer PC" have orders of magnitude less GPU power than a "supercomputer"? I'm assuming the latter has some form of GPU farm, whereas I don't know if you can fit more than 3-4 fat GPUs in the former.
So if you'd be able to split the workload such that it would be able to run on consumer PCs in a reasonable time, wouldn't that also split the storage requirements the same way?
Or, if instead you're OK with waiting for ages for the regular GPU to do its thing, is transfer speed that much of an issue?
Yes there are models that are trained on hundreds of GPUs but from my limited experience in scientific computing, most of the time researchers run their programs on a single node because going multi-gpu or multi-cpu requires somewhat large code changes and a single node is "good enough" for their use case or they come from a heavy science background and don't even know how to utilize multi node architecture. Their main benefit from using a cluster is 100% uptime, large storage, and large memory. I've been to multiple research institutes where there is an institute-wide HPC and researchers share it, that way no one needs any kind of high end computer and can just connect to the cluster.
This service can help in that area if researchers can somehow schedule a job from their low-end laptop and get the results when the job is done.
Sometimes! Not for the GPU part, but for producing the summary data that feeds the ML.
For example, genome sequencing data and intermediate results are easily in the TB area of space, but the resulting table of genomic variants (k-mers like n-grams in NLP) is only a few hundred GB.
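As a toy illustration of that kind of summarisation (the k value and sequence are made up, and a real pipeline would stream reads from disk rather than hold them in memory): the raw sequences can be discarded once counted, and only the much smaller count table needs to travel to the GPU box.

```python
from collections import Counter

def kmer_counts(sequence, k=3):
    """Slide a window of length k over a sequence and count occurrences.

    For TBs of reads this table is what gets shipped to the training
    machine; the raw data never has to leave the cluster.
    """
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        counts[sequence[i:i + k]] += 1
    return counts
```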
Sorry for the tangent, but I followed the thread, and something that confused me was this:
> Power outages, well there's not much you can do about that, again I don't know where you live, but those are pretty rare where I am. Maybe once a year for a few hours.
Once a year, a few hours. That’s not pretty rare, that’s close to frequent. I’ve had two outages in 17 years, only one of those was longer than a few minutes.
99.95% uptime is a few hours a year of downtime. Some people lose power almost daily. The thing about exponential / logarithmic distributions is your sense of what is frequent or rare is quite arbitrary and based on your own experiences which are rarely all that representative.
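The arithmetic, for concreteness:

```python
HOURS_PER_YEAR = 24 * 365  # 8760

def downtime_hours(uptime_fraction):
    """Expected downtime per year for a given uptime fraction."""
    return (1 - uptime_fraction) * HOURS_PER_YEAR
```

So 99.95% uptime allows about 4.4 hours of downtime per year - i.e. one "once a year, a few hours" outage already eats the entire budget of a fairly respectable SLA.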
When I lived on country roads miles out of town, power outages happened many times a year, occasionally for days. At other times I've lived in big buildings with buried power lines and never had an outage in years. People in developing economies in some places only have occasional electricity. It's quite hard to say which one is normal.
This seems like a no-go for anything where the dataset used for training includes any private data at all, or even public images that have been privately labeled by the company. It makes sense for hobbyists, but I doubt this entire platform was created with only hobbyist use in mind.
They say they intend to implement encrypted hosting environments in the future, but considering the number of security exploits that Intel SGX and its equivalents have had, I'm not sure I'd trust that either.
From the homepage (https://vast.ai/) it's literally one click ("Search Marketplace" button). From the FAQ page, it's literally two clicks (first homepage, then marketplace button).
Or am I missing something obvious here? Maybe the page looked different 7 hours ago?
"Search Marketplace" is not the text of a button I'd expect to lead to pricing. After looking further, I resorted to clicking "Search Marketplace" but gave up after it took forever with zero indication that anything was happening, so I clicked something else. Something about the first load of that page takes a lot of time. It's faster after the first load, so I'm assuming some large initial download (probably a large JS library???).
You're not expecting the pricing for products in a marketplace, on a page labeled "marketplace"? I'm not sure how much more explicit it could get.
The loading time sure is dumb though, I agree. But that doesn't affect the number of clicks it takes to find something, only the time (which, arguably, is worse).
Also, you might want to investigate why your browser doesn't show any loading indicator when it's loading content. Firefox and Chrome should both do it by default, but if you're not seeing it maybe something is broken or you've changed some setting.
Yes, I finally looked at the browser tab to see that indication of "activity" was occurring. However, if the website knows it is going to take some time to pull in a large asset, it would be better to have a page that loads quickly and shows some sort of activity to the user via the UI rather than just the browser default. In fact, I don't even know what the default indicator on a mobile browser looks like.
Speaking as someone who has very little knowledge of ML: let's say I wanted to create a site like www.remove.bg and wanted to use this service to train my own models (from scratch). Is it feasible to do? How many hours, ballpark, would it take to train something like this? And how do I get my training images onto the target machine?
I see the interruptible pricing for RTX3080 is $0.130/hr. Anybody with some ML experience calculate approx cost using this?
P.S. I have no intention of creating a remove.bg clone; I'm just curious whether it can be done and, if so, roughly how much it would cost.
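Not an answer on how many GPU-hours a remove.bg-style segmentation model actually needs - that's the unknown - but turning an hour estimate into dollars is trivial. The inputs below are placeholders, using the quoted $0.130/hr interruptible rate:

```python
def training_cost(gpu_hours, price_per_hour=0.13, n_gpus=1):
    """Back-of-envelope rental cost for a training run."""
    return gpu_hours * price_per_hour * n_gpus
```

So 100 GPU-hours on one RTX 3080 is about $13, and a solid week of continuous training (168 hours) is still under $22 - the cost of experimenting is dominated by how many runs you need, not any single run.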
An interesting sci-fi/spy thriller could be getting someone arrested by training NeuralHash 2.0 with CSAM on their rig using a service like this. After all that news last year I'd be super paranoid about that, even if it's extremely unlikely that a hobbyist would attempt such a thing without backing from law enforcement.
It's practically impossible now to get new GPU capacity on AWS, so AWS is not really a competitive option for labs that didn't have it before the recent restrictions. Vast.ai is much more usable.
I've always wanted to create something like this and make people use a specific cryptocurrency to rent or receive payments that they could also mine with a GPU and call it 'Ouroboros'.
I wonder how much data mining is going on on these machines. Hosts could easily benefit from the huge amounts of free, potentially well-labeled data passing through, which they could re-sell to other companies.
Neat idea, but it seems to be born out of current market conditions and might not be sustainable going forward once supply/demand for GPUs normalize.
As someone running these workloads I'd only be desperate enough to pay $250/mo for a 3090 today because I cannot buy it for $1500 at the store straight up. And if you drop the price by a third now it isn't worth it for the host anymore.
Electricity plus some small premium should be tenable under any market condition. At least it works for public clouds so far even as hardware prices have fluctuated over time. There will likely always be someone who'd rather briefly rent than buy.
I do wonder if what's charged for electricity and premium can be competitive with large data centers in the long run. Amazon already has spot pricing. I expect it would be favorable for consumers with cheap electricity or who are already paying to heat their house.
I also wonder what effect renewables increasing electricity spot price volatility might have on this market in the future.
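Rough numbers on the "electricity plus premium" floor, assuming a ~350 W card and $0.12/kWh (both placeholders):

```python
def electricity_cost_per_hour(watts, price_per_kwh):
    """Hourly electricity cost of running a card at a given power draw."""
    return (watts / 1000) * price_per_kwh
```

At those assumptions a card costs about $0.042/hr to run, so even a $0.13/hr interruptible rate leaves the host a margin of roughly 3x over electricity - which is why the floor looks tenable regardless of GPU street prices.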
Public clouds have economies of scale, reliability and capacity all working in their favor. There's a reason there has never been a market for renting out spare CPU cycles or hard disk space or any other resource piecemeal from your computer.
DO and friends wouldn't exist on the CPU side if that were true, so a spot economy on the GPU side makes sense - even more so because these are higher-end time shares.
Even if we didn't have an open ticket with Azure since Christmas for more GPUs, the ability to burst non-sensitive tasks on say 20 cheaper GPUs for 5-30 min is attractive.
That stuff adds up fast, and lowering cost both opens accessibility for low-end users and helps scale what power users can do.
My bigger surprise is vast has been around awhile and only ~20 available servers, so I'm guessing < 100 total. Is this a friction issue?
While I love the idea, the requirement for Ubuntu means I can't contribute my compute though (and neither can any of the PC gamers with Windows machines, or anyone else on other distros it seems). As much as I would love to do this over mining a cryptocurrency (which I don't do in the first place), I can't.
Ah yes, a new take on the botnet-as-a-service genre.
Running untrusted code, even inside a container, is a terrible idea. Container escapes get discovered all the time and even without them you could get yourself in serious trouble with the police by just letting random people use your network. Sure, you'll be able to defend yourself in court, but nobody has time to go to court over this stuff.
If you run this inside a VM with PCIe passthrough and all network traffic tunneled through some kind of VPN, then maybe it's worth the effort, but I just wouldn't risk it.
These devices have their performance and internet connection uplink listed, which makes them very interesting targets. Yes, a hundred shitty smart scales are a nice target, but they're underpowered for more involved attacks.
With a fiber uplink, a decent CPU and plenty of disk space advertised, this service can be very attractive, especially since you only need to pay for a minute to escape the boundaries of the container runtime if there's an exploit.
Also, it doesn't matter that there's only a few computers if you're using other people's computers to break the law.
I am interested in the technical aspect. What is the tech stack to enable such a vastly distributed host network? Presumably, people have to open up their home networks, but the company has to ensure that the host machines are secure.
The customer use license specifically talks about datacenters, not "cloud": "The SOFTWARE is not licensed for datacenter deployment, except that blockchain processing in a datacenter is permitted" https://www.nvidia.com/en-us/drivers/geforce-license/