
There are a lot of comments on that Reddit thread about how awesome this would be as a service.

But there's a big problem of trust with this for ML.

How do I know you actually ran what I paid you for and not just generated random data that looks right in the shape I wanted it?

You could farm it out to two people and if the results disagree, then payment is decided by a third. But then you've just doubled the (paid) workload, and you've not really solved collusion by a significant portion of the workers.




It's pretty much letting people with GPUs become a sort of 'mini-cloud provider'. There's no job queue or fancy distributed computing setup. We just let you SSH into a container on someone's computer, and pay for the time used.

I was doing a lot of fun deep learning projects on the side & would often Venmo my friend who was mining Ethereum to use his GPU to train my models. He made more than he would have mining, and I paid less than I would have for AWS spot instances or Paperspace.

This is just a fun side project hoping to let people who want to train their deep learning models do it cheaply (on other people's computers!)


I love the way this idea has evolved across threads and stimulated a great discussion! As the idea attracts attention, the limits of scaling it become clear, as does the need for well-balanced network incentives. This is one of those problems that actually does benefit from a blockchain token, and a few implementations are just emerging. I imagine success with this concept will involve homomorphic encryption or zero-knowledge proofs, in order to prove unique processing took place. Value-added services seem like a natural fit as well. Check out OpenMined (https://github.com/OpenMined/Docs), Cardstack (particularly the recent post “The Tally Protocol: Scaling Ethereum With Untapped GPU Power” by @christse: https://medium.com/cardstack/the-tally-protocol-scaling-ethe...), Ocean Protocol, and, as others mentioned, SONM, iExec, Golem, and all the BOINC tokens (PascalCoin). I look forward to this whole niche maturing.


> It's pretty much letting people with GPUs become a sort of 'mini-cloud provider'. There's no job queue or fancy distributed computing setup. We just let you SSH into a container on someone's computer, and pay for the time used.

I had the same idea a few days ago - but in my head, the process would be wrapped up as a "cryptocurrency" where the AI researchers pay real money and the "proof of work" is useful/"real" work. I ran into 2 issues regarding trust: the first is, how do you verify that the hardware owner is running the real job and not NOOP'ing and sending false results? The second is, how do you protect the hardware from malicious jobs? GPUs have DMA access - how do you stop task submitters from rooting your box and recruiting it into an AI botnet (for free)? I ended up dismissing the idea, but if you could work out these 2 issues, there's money to be made...


> the first is, how do you verify that the hardware owner is running the real job and not NOOP'ing and sending false results?

Consensus. Have _n_ nodes perform the same work (if it’s deterministic), and only accept (and pay) if all the nodes match - or at least pay the nodes that were part of the majority.
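
A minimal sketch of that acceptance rule, where run_on_node, pay, and flag_cheater are hypothetical hooks, and in practice you would compare hashes of the outputs rather than the raw results:

    from collections import Counter

    def settle_job(job, nodes, quorum=0.5):
        """Send the same deterministic job to every node; pay only the
        nodes whose result agrees with the majority answer."""
        results = {node: run_on_node(node, job) for node in nodes}  # hypothetical RPC
        tally = Counter(results.values())       # hash outputs first for big results
        answer, votes = tally.most_common(1)[0]
        if votes <= quorum * len(nodes):
            return None                         # no clear majority: reject the round
        for node, result in results.items():
            if result == answer:
                pay(node)                       # hypothetical payment hook
            else:
                flag_cheater(node)              # hypothetical: mark for blacklisting
        return answer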

I don’t think this would be considerably different from SETI or folding@home, which have been going on for around twenty years.

For my senior project in college, one of our ideas was a distributed render farm that operated like what we’re talking about. There were some additional issues there (transferring assets, preventing node owners from extracting the output [say a studio was “letting” fans donate compute time to render a feature film], etc.).


If you can pay for a single fully trusted node to do the calculation once, then the n untrusted nodes redundantly calculating the same result to establish trust must be cumulatively cheaper than that one trusted node for there to be an economic incentive to do so, no?

My assumption is that you would have to place your faith in a low number of untrusted nodes for that to end up cheaper.
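
To put rough numbers on it, the break-even condition is just

    n * c_untrusted < c_trusted

so with trusted compute at, say, $3/hr and idle gaming GPUs renting for $0.40/hr (made-up figures), anything up to 7-way replication still undercuts the single trusted node, but the margin disappears fast as the price gap narrows.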

The cases of Folding@home and SETI are quite different, because there are institutions which have an interest in funding those programs, in part because their goal is a public cause. The same clearly doesn’t apply to micro-tasks, if you will.

But I can imagine cases in which you can accept bad actors returning bunk results for some percentage of the calculations you run. As long as you’re rotating nodes often enough (provided that they’re from distinct actors), I imagine it could work out to be more economical to spend the time working around that bad data than to hire fully trusted compute power directly.


> Consensus. Have _n_ nodes perform the same work (if it’s deterministic), and only accept (and pay) if all the nodes match - or at least pay the nodes that were part of the majority.

Sounds vulnerable to Sybil attacks.


It would work well for problems that are computationally hard to solve, but easy to verify solutions for. Unfortunately, such problems are ubiquitous in cryptocurrency, but rare in machine learning.


Well, not really. Training a model is one such task. It's hard to train a network but easy to verify that it has good performance (training vs inference).

The real problem here, I believe, and I've seen this idea pop up several times on Hacker News, is that almost no machine learning tasks are progress-free.

If the cryptocurrency is just paid out to whoever solves the task first in a problem that is not progress-free, then the person with the fastest GPU would mine all the coins and nobody else could participate. One of the key ideas behind proof of work is that if two people have the same compute and person A has a head start, then as long as person A has not succeeded by the time person B starts, both have the same probability of mining the block.
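
That memorylessness is easy to check numerically. A quick sketch (illustrative only, not any particular coin's code): sample miner A's geometric search time conditioned on having already burned a head start of failed attempts, and the distribution of A's remaining attempts matches a fresh draw for miner B. An accumulating task like model training has no such property; whoever starts first simply finishes first.

    import math
    import random

    def geometric(p):
        """Attempts until first success in a memoryless (progress-free) search."""
        u = 1.0 - random.random()                  # u in (0, 1]
        return int(math.log(u) / math.log(1.0 - p)) + 1

    p, head_start, trials = 1e-4, 10_000, 100_000

    # Condition A's total search time on having already failed `head_start`
    # attempts, then look only at the attempts A still has left.
    remaining_a = []
    while len(remaining_a) < trials:
        t = geometric(p)
        if t > head_start:
            remaining_a.append(t - head_start)

    fresh_b = [geometric(p) for _ in range(trials)]

    # Both means come out around 1/p = 10,000: the head start bought nothing.
    print(sum(remaining_a) / trials, sum(fresh_b) / trials)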

People seem to be just jumping on the crypto bandwagon and trying to come up with "useful" proof of work, but it's a pretty difficult task.


Consensus for every computation would be 2x as expensive, but you may be able to achieve something like it by randomly assigning 10% of the calculations to be double-checked, and double-checking more (all?) of a node's computations if it has ever returned an inconsistent result.
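
A sketch of that audit policy, assuming you keep one trusted machine around to re-run jobs on (recompute_trusted and the node bookkeeping are hypothetical):

    import random

    AUDIT_RATE = 0.10          # fraction of jobs re-checked at random
    suspects = set()           # nodes with at least one inconsistent result

    def accept_result(node, job, reported):
        """Randomly re-run ~10% of jobs on trusted hardware; once a node
        has produced a single bad result, audit everything it sends."""
        if node in suspects or random.random() < AUDIT_RATE:
            if recompute_trusted(job) != reported:   # hypothetical trusted re-run
                suspects.add(node)
                return False
        return True

A node cheating on a fraction f of its jobs gets caught after roughly 1 / (0.10 * f) jobs in expectation, so the overhead stays near 10% while persistent cheaters are found quickly.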


BOINC has quite a sophisticated system, but it's a long time since I looked at the details. I believe new participants are subject to greater scrutiny.



I’m under the impression that proof of work which verifies the authenticity of transactions on a blockchain must depend on those specific transactions as its input. If there are other uses for the work that are unrelated to securing specific transactions, then the fact that you performed the work says nothing about the authenticity of those transactions.
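
In Bitcoin, for instance, the header that miners grind over commits to the block's transactions through a Merkle root, so the work is inseparable from the transactions it secures. A stripped-down sketch of that binding (real headers also carry version, timestamp, and difficulty fields):

    import hashlib

    def h(data: bytes) -> bytes:
        return hashlib.sha256(hashlib.sha256(data).digest()).digest()

    def merkle_root(txs):
        layer = [h(tx) for tx in txs]
        while len(layer) > 1:
            if len(layer) % 2:
                layer.append(layer[-1])      # Bitcoin duplicates the odd leaf
            layer = [h(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]
        return layer[0]

    def mine(prev_hash, txs, difficulty=2):
        """Grind a nonce over a header committing to the transactions;
        change any transaction and all prior grinding is worthless."""
        root, nonce = merkle_root(txs), 0
        while True:
            header = prev_hash + root + nonce.to_bytes(8, "little")
            digest = h(header)
            if digest[:difficulty] == b"\x00" * difficulty:
                return nonce, digest
            nonce += 1

    # e.g. mine(b"\x00" * 32, [b"alice->bob:1", b"bob->carol:2"])

If the proof of work were instead some unrelated useful computation, nothing in the nonce search would commit to the transactions, which is exactly the point above.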


> how do you stop task submitters from rooting your box and recruiting it into an AI botnet (for free)?

The only real way to do this is to run the job in a VM with a GPU and a CPU+motherboard that support passthrough (read: not consumer NVIDIA GPUs; your CPU+board must support an IOMMU, and your card cannot freak out when being reset after initialization).


Golem is solving exactly this problem.


Saw a panel with the golem people just last night, and sure enough this question came up. The short answer is that they don't have a solution yet and IMO their thinking was no more advanced than what I'm seeing on this thread.


Give them some time. They are solving many large, complex problems. It looks like they're pretty close to having something that works too.


How is their problem substantially different from this project's? Apart, of course, from the overhead and complexity caused by trying to force what should probably just be a centralised service onto a blockchain.

Don't get me wrong, I think it's a great idea. I just don't see why it needs a blockchain and all the associated trustless infrastructure. Even nicehash doesn't bother with all that.


I suppose the approach by the Golem team is substantially different because of the ideology associated with it.

What you see as “trying to force what should probably just be a centralized service,” I see as “innovating a new approach to powering decentralized architecture.”

I’m not saying you’re wrong. It would be easier to solve the problem using existing tool sets and more mature protocols. Yet I’m pretty sure that the Golem team is doing something right. So there’s that. Maybe this isn’t a zero-sum thing.


> I’m pretty sure that the Golem team is doing something right

I'm not at all convinced that the golem team have any particular insight to solve this obvious and common problem that everyone else doesn't have. And frankly I think that the overhead of running unnecessary infrastructure will render them price-uncompetitive to any reasonable centralised provider. In short, I predict they will fail.

But eh, they raised USD$8m and I didn't, so what do I know.


>In short, I predict they will fail.

I guess that's why we're sitting in different camps.

One advantage that the Golem team has over a centralized, proprietary solution is the open nature of the project:

https://github.com/golemfactory/golem

The Golem team doesn't necessarily need to solve every problem. Being built on top of the Ethereum Network is advantageous. If they make an appealing, open platform with potential, maybe other developers will pick up the ball and run with it to power their own ends.

In short, I predict that they will succeed.


> I guess that's why we're sitting in different camps.

Indeed, doesn't mean I don't want to hear the other side's point of view though!

Open source is not going to save them. They have one main problem - how to tell if people did the work they claim they did? If centralised, they can "test" new users or perhaps periodically check up on long term users by secretly allocating duplicate work and verifying its content. How can you do that in public? The blockchain is actually working against them.

And who really needs a cryptographically secure attestation that on March 25th, 2018, user XYZ completed ML shard 456.7? This is a level of audit logging appropriate for a bank and basically nothing else. All you need is availability accounting of some sort. It's not rocket science. I couldn't write the client for this app, but I sure as hell could write the back end, and I wouldn't even think of using a blockchain. Make no mistake, their choice of technologies is for buzzword compliance, not technical necessity - a very bad sign.

There is also no need for the GNT. It solves no problem and users could just as easily be compensated in ETH or anything else. Sure, it's a funding mechanism, fine. We still haven't figured out how ICOs should even work.

Despite all the rigmarole, they have a product they need to sell like any other startup: rent us your GPU/CPU for $x/hr. Because of their overhead, I predict they will easily be outcompeted by centralised providers. People are not going to use golem over another, better-paying alternative just from the goodness of their hearts. And I cannot see any way how golem can be structurally more efficient than a centralised solution.

All said, I'm not as optimistic as you. Not like I want them to fail though, good luck to them!


Well I sure do appreciate you going out of your way to explain your perspective to me.

Just a couple more responses:

1. The use of blockchain is not for buzzword compliance. Julian (CEO) is a longtime Ethereum supporter/developer. This project has really been in development since 2014 or so, long before blockchain was "buzzy." So the use of blockchain here is not for grabbing cash. They actually think it's a better (perhaps harder) way forward.

2. Not only does the token allow investors to directly invest in the project, it also allows developers to "print" tokens that can be locked behind smart contracts. That way developers can be rewarded for reaching project goals with bowls of their own dog food. Not bad to eat when it's pretty much "real" money.

3. The decentralized and distributed nature of the project will allow the Golem Network to achieve goals and execute code that no centralized competitor could achieve/run. I'll leave it as a thought exercise for you to speculate what those goals/codes might look like.

Thanks for the engagement. It's great to test my beliefs through debate. Time will be the true arbiter here though. Best wishes.


Instead of having the other party SSH into a VM installed on the user's machine, potentially exposing most of the user's codebase, have you considered spinning up temporary containers on your back end and having contributors install something like remote CUDA or remote OpenCL, so that only the GPU kernels are transferred to the contributor, whose client software polls a network queue to check which kernel should be run and where the results should be sent?
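
A sketch of what the contributor's client loop might look like under that design. The queue URL, field names, and run_kernel are all invented; the point is just that the host only ever sees opaque compiled kernels and buffers, never the submitter's codebase:

    import time

    import requests

    QUEUE = "https://queue.example.invalid/api"   # hypothetical job queue

    def worker_loop(node_id: str):
        while True:
            job = requests.get(f"{QUEUE}/next", params={"node": node_id}).json()
            if not job:
                time.sleep(5)                     # nothing queued; back off
                continue
            # `kernel` is an opaque compiled GPU kernel plus its input buffers;
            # run_kernel (hypothetical) hands it to the local CUDA/OpenCL runtime.
            output = run_kernel(job["kernel"], job["inputs"])
            requests.post(job["result_url"], data=output)  # ship results onward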


Remote CUDA seems incredibly useful - this is an excellent idea - I'll look more into it tonite.


Good idea from the perspective of not exposing the code base. However, technologies such as remote CUDA/OpenCL, which rely on remote execution of compute kernels, generally require high-bandwidth, low-latency connectivity - this is especially true for deep learning / AI workloads, though not necessarily for applications with higher compute to data transfer/synchronization ratios. The latency of a typical internet connection will likely stall the GPUs on the remote system, yielding little compute benefit.


I think this is a great business idea with a lot of potential; if you can address the reliability, scalability, and trust problems, you might have a really large business opportunity here. I'm really impressed with your pragmatic and simple (for the user) implementation of this idea. I think you did a great job identifying a minimal set of useful features and implementing them to validate the idea - congratulations! How long have you been working on this, if I might ask?


I somehow missed the paragraph where you talked about building the platform and the link to the service. This looks awesome then.

As somebody who's spent a silly amount of money on EC2 spot instances to train models, I would certainly overlook the odd dodgy result for access to those GPUs at those prices.

I just hope you find a way so that the ingenious but disreputable people that seem to come when money's to be gained don't ruin it for everyone. However, I wish you every success.

I imagine you could do some kind of hardware fingerprinting, but there's nothing stopping a really bad actor from modifying their kernel to pretend to have a GPU and return NaNs on allocation. I suppose I'm descending into absurd levels of distrust over attacks that may never happen.

I also foresee annoying customers who claim they only get back NaNs when this is really down to instabilities in their own training, and who flood whatever bad-actor reporting you have.

I don't believe either is actually terminal with the right incentives.


That's an interesting arbitrage to make cloud computing costs drop towards mining costs, i.e. cost of electricity.


It’s Uber for AWS.


Haha, I will admit to using the 'Airbnb for GPUs' to explain this.


Nice work! I wouldn't let this minor objection keep you down. You can always spot check a computation on a trusted system (e.g. your own) and update your trust accordingly.


Thank ya! Plus, the way it's set up right now, you don't pay for anything until you're done and satisfied with a session! I just want both parties to be happy with the GPU compute transaction :)


This looks awesome, I just submitted a hosting application. I only have a single GTX 1060 on a Ryzen board, but I only use it 3-4 hours per day and I'm good with its downtime being used for passive income. Hopefully someone will find it useful.

One question: I noticed you only pay in crypto right now; do you plan to offer USD or other fiat currencies in the future? Crypto isn't a problem for me (I don't mine crypto myself, but I wouldn't be opposed to carrying a passively obtained *coin balance and watching it appreciate over time), just curious.

Anyway, I think you have the makings of a nifty project here. Good luck!


I was thinking about paying out in fiat, but crypto is so much easier because of no fees, instant transactions, and not having to deal with various currencies.

While something like Stripe Connect may be useful, the fees are unreasonable for smaller transactions. A quick hack to cash out to crypto is to use your Coinbase wallet address as the payout address, and just sell off the crypto the moment it hits your wallet.


Why no Bitcoin Cash? :)

But seriously, it does have the lowest fees of the four supported by Coinbase, and always should (since that’s its whole raison d’être!).

Cool idea, well done!


Honestly I wish I had a good answer to this. I didn't think about supporting BCash because I had never seen it used before.

If enough people are interested in it, I can add support since it's already supported by Coinbase/GDAX.


You don't verify every task. You verify some percent of them at random. And blacklist the people who cheat the system (Maybe with some leeway because random memory errors are a thing with consumer GPUs.)


How do you blacklist people who cheat when a proxy server and a freshly generated public key can make them appear as a new node in the network?


The same way any other website deals with bot abuse: you give extra scrutiny to new accounts, require captchas, flag suspicious email providers and bank accounts, and block IP ranges that correlate highly with abuse. I don't think you would need to go anywhere near that far, though, because it's pretty cheap to do lots of verification constantly, and fake users wouldn't last very long.


> You could farm it out to two people and if the results disagree, then payment is decided by a third. But then you've just doubled the (paid) workload, and you've not really solved collusion by a significant portion of the workers.

That would be the naive/brute-force way to verify trust.

Just like you can find a substring faster via Boyer–Moore than with a char-by-char match, there are more efficient ways to verify trust in a distributed environment like this. There are a few white papers on the topic.


Being able to generate random data in the right shape for arbitrary ML workloads would be a pretty impressive technical achievement.


According to the website (https://vectordash.com/hosting/) they use a highly isolated Ubuntu image, so the person hosting the service shouldn't have access to the VM with your model or data on it. It would be nice if there were a third-party audit of the software, though; the models, the code, and even the training data can be pretty sensitive for researchers.


If your training data is sensitive, then Vectordash may not be the best GPU provider. But if you're a broke CS student like me who wants to participate in a few Kaggle competitions (after having burned up their AWS student credits in 3 days) without shelling out a bunch for a K80, then Vectordash might be pretty helpful!


There is no way to "highly isolate" a VM from a host.


But there is (though I think they don't use it): TPM-based host attestation.


The Microsoft Secure Boot golden key got leaked; anything based on Secure Boot as a root of trust is 100% blown wide open.

https://web.archive.org/web/20170604013028/https://rol.im/se...


I am not sure this depends on TPM. Care to share a link?


If you don't want to claw your eyes out while reading:

https://bpaste.net/show/571ef50296ac


Theoretically possible via SGX.


Which can be defeated with SgxSpectre: https://arxiv.org/abs/1802.09085


Oh goodie, I wonder if Netflix is going to disable 4K support on PC as a result of this (the requirement for Skylake was due to SGX).


Worthless if the GPU doesn't have something similar. Otherwise you can monitor the PCIe lanes for all the data the CPU is sending over to the GPU.


While I think it's a valid point, I don't see this as a large issue for this platform: many ML workloads don't require a fully trusted environment, since it's easy to verify the results after training by running the test-set evaluation on a local machine. Also, many datasets can be transformed so that they don't reveal much about the underlying data (e.g. by using one-hot encoding, removing feature labels, or categorizing/tokenizing potentially sensitive fields like names), alleviating data-security concerns in many cases. Leakage/theft of your ML models/code might be a bigger concern, though for many companies this isn't a large problem either, since in my experience the models are often just slight modifications of "best practice" architectures.
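
To make the transformation idea concrete, a quick pandas sketch (the file and column names are invented); after this, the remote host sees only opaque labels, salted hashes, and one-hot indicators:

    import hashlib

    import pandas as pd

    df = pd.read_csv("customers.csv")        # hypothetical sensitive dataset

    # Tokenize directly identifying fields: a salted hash keeps rows linkable
    # across files without revealing the underlying values.
    SALT = b"keep-this-secret"
    df["name"] = df["name"].map(
        lambda s: hashlib.sha256(SALT + s.encode()).hexdigest()[:12])

    # One-hot encode categoricals so only 0/1 indicator columns leave the building.
    df = pd.get_dummies(df, columns=["city", "plan"])

    # Strip the semantic column labels entirely before shipping it off.
    df.columns = [f"f{i}" for i in range(len(df.columns))]
    df.to_csv("train_anonymized.csv", index=False)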


Piling on/agreeing with you: The “magic smoke” isn’t the model; it’s the infrastructure the model plugs into, the data->feature pipeline, AND the model. Assuming you’ve done the things you mention (and maybe with one additional assumption of several models operating at once), I would also consider the models themselves to USUALLY not be super super sensitive.


Couldn't you embed a task or sub-task with a known result and throw out responses from nodes that didn't successfully process the sub-task?


You could randomize the tests + test everyone at the same time. So every r(n) cycles, send a micro-problem with a known, precomputed, locally verified solution. Compare solutions; kick and ban the node if it's wrong.
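
A sketch of that scheme, with the transport and canary generation left as hypothetical helpers:

    import random

    CANARY_RATE = 0.05   # fraction of dispatches replaced by a known-answer probe
    banned = set()
    canaries = precompute_canaries()   # hypothetical: (micro_problem, answer) pairs

    def dispatch(node, real_job):
        if node in banned:
            raise RuntimeError(f"{node} is banned")
        if random.random() < CANARY_RATE:
            probe, expected = random.choice(canaries)
            if send_job(node, probe) != expected:   # hypothetical transport call
                banned.add(node)                    # wrong known answer: kick and ban
                raise RuntimeError(f"{node} failed a canary check")
        return send_job(node, real_job)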


This is identical to the problem blockchain solves - hard to solve, but easy to verify. For example, if the workload is training a neural network on a set of samples, it is relatively cheap to verify its performance on a training sample.

Alternatively you could give X% of tasks to multiple workers for cross-checking. In your example X% is 100%, but it does not have to be that high.
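
To make the cheap-verification half concrete, a small numpy sketch that only releases payment if one forward pass over locally held samples clears a threshold (the logistic-layer shape is an invented assumption about what the worker was asked to train):

    import numpy as np

    def verify(returned_weights, X_held_out, y_held_out, threshold=0.90):
        """One cheap forward pass over local data; training costs many
        epochs, so checking is orders of magnitude cheaper than doing."""
        W, b = returned_weights                 # assumed: a logistic-regression layer
        probs = 1.0 / (1.0 + np.exp(-(X_held_out @ W + b)))
        accuracy = np.mean((probs > 0.5) == y_held_out)
        return accuracy >= threshold            # release payment only on success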


Say you do a single feedforward pass. How do you verify that its output is the result of a trained network - and particularly a completely trained one?


Aside from doing the same work X% of the time, I can think of only one other option: the network could be overloaded with additional nodes such that it also computes some function with a known output on the input data - for example, a hashing function. Having said that, a single pass does not strike me as a particularly useful unit of work. You would want to batch the samples, and that leads back to a strategy of verifying a portion of them based on the host's trust metric.


Pay some nominal hourly fee, plus a "bonus" for how well the model classifies some reserved data? The fit itself could be the proof of work.


This sort of effort has existed for a long time, e.g. SETI@home and the Great Internet Mersenne Prime Search (although they don't tend to pay people). They've faced many of these same problems, and presumably devised minimum overhead solutions.


Yea, but their workload is much harder to fudge, and the solution they use to confirm primes is running the task twice. Training of models can be shortcut or even maliciously tampered with.


The person who runs the platform could do some calculations themselves and then throw people out if there's a disagreement.


This is quite a good solution. It reminds me of public transport in some European countries. Yes, you can get on board without a ticket. But there will be a guard who will be on board maybe 1 in 10 times and the fines are substantial.

The problem is you can't really issue substantial fines in this instance. I suppose you could pay less for the first few runs where verification is more likely.

I really want this to succeed, and I think these problems can be overcome.


You could require people to put up some money in advance that they only get back if they pass the test.


> How do I know you actually ran what I paid you for and not just generated random data that looks right in the shape I wanted it?

See https://pfrazee.hashbase.io/blog/nodevms-alpha


pepper data with known results, obviously



