I've been working on the idea of building synthetic workers. I'm trying to implement a planning workflow system for scenarios where the workflow definition, the environment, or the task is not well defined. I also ended up implementing a micro Palantir plugin system to support the action system for the synthetic users.
It's a cool project that gave me immense pleasure to build; unfortunately it's an intellectually masturbatory one, because although the tech is cool, I haven't found a compelling application for it. If anyone is interested, hit me up.
Location: Lisbon, Portugal
Remote: Yes
Willing to relocate: Maybe
Technologies: Django, Python, Vue, C, Java, JavaScript, Typescript, PyTorch, Nordic Firmware Development
Résumé/CV: https://www.surf-the-edge.com/wp-content/uploads/2024/01/Resume.pdf
Email: artur.ventura@gmail.com
Website: https://www.surf-the-edge.com/
My name is Artur Ventura. I'm a Senior Software Engineer with cross-functional expertise across every level of software development. I've been developing software professionally for 13 years, in scenarios with hundreds of thousands of users. I'm focused on Applied Artificial Intelligence, in particular Natural Language Processing pipelines. I'm just finishing the sale of a company and looking for a new challenge.
Interesting idea, but if you don't filter the elements that contain URLs, you're creating a CSS injection vector that lets visitors to the website be used as bots in a botnet to attack some URL.
This is really good, and I was really excited by it but then I read:
> running on a single 8XA100 40GB node in 38 hours of training
This is a $40-80k machine. Not a diss, but I would love to see an advance that would allow anyone with a high-end computer to improve on this model. Until that happens, this whole field is going to be owned by big corporations.
That's a great comparison. For a real number, I just checked Runpod and you can rent a system with 8xA100 for $17/hr, or ~$700 for 38 hours. Not cheap, but also pretty close to the cost of renting a premium vehicle for a few days. I've trained a few small models by renting a 1xA5000 system, and that only costs $0.44/hr, which is perfect for learning and experimentation.
The problem with that is that, currently, available memory scales with the class of GPU, and very large language models need 160-320GB of VRAM. So, sadly, there isn't anything out there you can load a model this large onto except a rack of 8+ A40s/A100s.
I know there are memory channel bandwidth limits and whatnot, but I really wish there was a card out there with a 3090-sized die but 96GB of VRAM, solely to make it easier to experiment with larger models. If it takes 8 days to train vs. 1, that's fine. Needing only two of them to get 192GB, while still fitting on a desk and drawing normal power, would be great.
Technically this is not true - there are a lot of techniques to shard models and store activations between layers, or even between smaller subcomponents of the network. For example, you can split the 175B-parameter BLOOM model into separate layers, load up a layer, read the previous layer's output from disk, and save this layer's output back to disk.
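As a rough illustration of that layer-by-layer approach, here's a minimal PyTorch sketch; the per-layer weight files and the stock nn.TransformerEncoderLayer are stand-ins for real model blocks, not BLOOM's actual layout.

```python
# Minimal sketch of layer-wise offloading: only one block's weights live in
# memory at a time, and activations are handed between steps via disk.
# The layer_{i}.pt files are assumed to have been saved beforehand.
import torch
import torch.nn as nn

NUM_LAYERS, D_MODEL = 8, 512

def build_layer():
    # Stand-in for a real transformer block of the sharded model.
    return nn.TransformerEncoderLayer(d_model=D_MODEL, nhead=8, batch_first=True)

hidden = torch.randn(1, 16, D_MODEL)  # output of the embedding step
with torch.no_grad():
    for i in range(NUM_LAYERS):
        layer = build_layer()
        layer.load_state_dict(torch.load(f"layer_{i}.pt", map_location="cpu"))
        hidden = layer(hidden)                # run just this block
        torch.save(hidden, "activations.pt")  # checkpoint for the next step
        del layer                             # free the weights before the next load
```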
And NVIDIA does make cards like the ones you're asking for - the A100 is the fast-memory offering, the A40 the bulk slower-memory one (though they added the 80GB A100 and did not double the A40 to 96GB, so this is less true now than in the P40 vs. P100 generation).
Oddly, you can get close to what you're asking for with an M1 Mac Studio - 128GB of decently fast memory with a GPU that is ~0.5x a 3090 in training.
Do you know if there's any work on peer-to-peer clustering of GPU resources over the internet? Imagine a few hundred people with 1-4 3080Tis each, running software that lets them form a cluster large enough to train and/or run a number of LLMs. Obviously the latency between shards would be orders of magnitude higher than a colocated cluster, but I wonder if that could be designed around?
Well, if it used to cost you $1 for 1hr at 1x speed, now it will take you 10hr at 0.1x speed and, if my math checks out, still $1. You need to shrink the model.
But of course now you run it on your own computer instead of in the DC, which changes the numbers. Especially if your student dorm has a shared electricity bill :)
Let's not forget that rendering 3D Animations in 3DSMAX or Maya used to take days for a single frame for a complex scene, and months for a few minutes.
Great news! Cloud instances' energy usage is included in their price, and because they're remote and transient, it's impossible to permanently damage them.
I think the equivalent of not being careful and getting a dent, in this context, is leaving it open to the internet and getting a bitcoin miner installed.
As you are paying for the resources you use, that's fine.
The closest would be if you used some form of software bug to actually cause physical damage, certainly not impossible, but extremely unlikely compared with actually physically damaging a car.
A better fit would be if you have unlimited liability, like with AWS, and you leak your key pair. Then someone runs up a $100k bill spinning up mining instances.
I think it was a DIY machine; those RTX 3090s have gotten cheaper, for sure.
In my experience, going beyond 4 GPUs is a pricey affair. See [§]. All but one model of the RTX 3090 require at least 3 slots.
If 4 GPUs connected via PCIe 4.0 x16 are enough, you can choose among various sTRX4 boards for 3000-series AMD Threadripper CPUs.
It's a $33/hour machine on AWS, so about $1250 for one training run. Not cheap, but easily in the reach of startups and educational or research institutions.
Edit: or about $340 if you get the 8xA100 instance from lambdalabs, in the realm of normal hobby spending
"...Spot instances can be interrupted, causing jobs to take longer to start or finish. You can configure your managed spot training job to use checkpoints. SageMaker copies checkpoint data from a local path to Amazon S3. When the job is restarted, SageMaker copies the data from Amazon S3 back into the local path. The training job can then resume from the last checkpoint instead of restarting...."
If you're doing something new/custom (which you presumably are if you aren't using someone else's prebuilt model), it could take a lot of runs to figure out the best training data and finetuning settings.
(I assume. I've never worked with GPT, but have done similar work in other domains).
Just download the model and run it on something much smaller and cheaper. Bigger models like GPT-J are a bit of a pain to run, but GPT2-sized models run just fine on consumer GPUs.
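For instance, a GPT-2-sized model runs on a single consumer GPU (or even CPU) with nothing more exotic than the Hugging Face transformers library; a minimal sketch:

```python
# Minimal sketch: generate text with stock GPT-2 on whatever hardware is available.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").to(device)

inputs = tokenizer("The cheapest way to experiment with language models is",
                   return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```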
Ahh okay, thanks. So how big is the model? Seems like it should be available to download so people don't have to train it. I understand you can train it on custom data but for a "default" model are there any available to download?
Depends on precision: you can run a ~5B model at fp32, or a ~11B model at fp16 max. Int8 is really bad for real-world use cases, so I'm not mentioning it.
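Assuming a 24GB consumer card like a 3090 (my assumption; the parent doesn't name one), the arithmetic is just parameters times bytes per parameter:

```python
# Rough VRAM needed just to hold the weights (ignores activations and KV cache).
def weight_vram_gb(params_billion, bytes_per_param):
    return params_billion * bytes_per_param  # 1e9 params * N bytes ~= N GB per billion

print(weight_vram_gb(5, 4))    # ~20 GB at fp32 -> fits a 24 GB card
print(weight_vram_gb(11, 2))   # ~22 GB at fp16 -> barely fits
print(weight_vram_gb(175, 2))  # ~350 GB at fp16 -> needs a multi-GPU rack
```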
But if you are looking to get the performance of ChatGPT or GPT-3, don't waste your time: all small GPT-3-like LLMs (below at least 60B params) are useless for any real-world use case; they are just toys.
If you specifically mean a general LLM trained on a general language corpus with instruction finetuning this is correct.
Fortunately very few real world use cases need to be this general.
If you are training an LLM on a domain-specific corpus, or finetuning it on specific downstream tasks, even relatively tiny models at 330M params are definitely useful and not “toys”; they can accurately perform tasks such as semantic text search, document summarization, and named entity recognition.
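A concrete example of the semantic search case, using sentence-transformers and a ~110M-parameter checkpoint (my choice of library and model, just to show the scale that's already useful):

```python
# Minimal sketch of semantic text search with a small encoder model on CPU.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-mpnet-base-v2")   # ~110M params
docs = [
    "Invoice from ACME Corp for Q3 consulting services",
    "Minutes of the board meeting on the new product line",
    "GPU cluster maintenance schedule for the data center",
]
query = "billing documents"

doc_emb = model.encode(docs, convert_to_tensor=True)
query_emb = model.encode(query, convert_to_tensor=True)
hits = util.semantic_search(query_emb, doc_emb, top_k=1)
print(docs[hits[0][0]["corpus_id"]])               # -> the invoice document
```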
> If you specifically mean a general LLM trained on a general language corpus with instruction finetuning this is correct.
Yes, thanks, that's what I meant.
> If you are training an LLM on a domain-specific corpus, or finetuning it on specific downstream tasks, even relatively tiny models at 330M params are definitely useful and not “toys”; they can accurately perform tasks such as semantic text search, document summarization, and named entity recognition.
> This creates a much smaller Transformer (4 layers, 4 heads, 64 embedding size), runs only on CPU, does not torch.compile the model (torch seems to give an error if you try), only evaluates for one iteration so you can see the training loop at work immediately, and also makes sure the context length is much smaller (e.g. 64 tokens), and the batch size is reduced to 8. On my MacBook Air (M1) this takes about 400ms per iteration. The network is still pretty expensive because the current vocabulary is hard-coded to be the GPT-2 BPE encodings of vocab_size=50257. So the embeddings table and the last layer are still massive. In the future I may modify the code to support simple character-level encoding, in which case this would fly. (The required changes would actually be pretty minimal, TODO)
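A quick back-of-the-envelope check of why the embedding table dominates that tiny config (standard GPT block formulas, biases and LayerNorms ignored):

```python
# Parameter count for the quoted debug config: 4 layers, n_embd=64, vocab_size=50257.
n_layer, n_embd, vocab = 4, 64, 50257

per_block = 12 * n_embd ** 2        # attention (4*d^2) + MLP (8*d^2)
blocks = n_layer * per_block        # ~0.2M params in the transformer blocks
embeddings = vocab * n_embd         # ~3.2M params in the token embedding / output head

print(f"blocks:     {blocks / 1e6:.2f}M")
print(f"embeddings: {embeddings / 1e6:.2f}M")  # dominates, as the quote says
```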
But how often do you need to run this? You can rent 8xA100 on LambdaLabs [0] (no affiliation) for $8.80/hr. So you should be able to train on the entire data set for less than $350.
If you can't fit the model on your resources, you can leverage DeepSpeed's ZeRO-Offload, which will let you train GPT-2 on a single V100 (32GB).
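For the ZeRO-Offload route, the gist is a config dict along these lines (keys follow DeepSpeed's documented ZeRO options; the model/optimizer setup is omitted, so this is only a sketch):

```python
# Sketch of enabling ZeRO stage 2 with optimizer-state offload to CPU RAM.
ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,                              # partition optimizer state + gradients
        "offload_optimizer": {"device": "cpu"},  # keep optimizer state in CPU memory
    },
}

# import deepspeed
# engine, optimizer, _, _ = deepspeed.initialize(
#     model=model, model_parameters=model.parameters(), config=ds_config)
```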
Alternatively, if you're doing research (with the caveat that you have to either publish, open-source, or share your results in a blog post), you can also get access to Google's TPU Research Cloud, which gives you a few v3-8s for 30 days (you can't do distributed training across devices, but you can run workloads in parallel). You can also ask nicely for a pod; I've been granted access to a v3-32 for 14 days pretty trivially, which (if optimized) has more throughput than 8xA100 on transformer models.
TPUs, and even more so pods, are a bit harder to work with, and TF performs far better than PyTorch on them.
I was curious about how much this would cost to rent, because the cost of those servers is definitely outside my budget! Lambda has 8xA100 40GB for $8.80/hr: https://lambdalabs.com/service/gpu-cloud#pricing
It seems about as likely as people being able to build automaker-grade cars with just the tools in their garage. More compute is going to keep producing better results, at least for LLMs.
Most decently large colleges have been investing in HPC for a while, and started investing in GPU HPC around 2014. You'd be surprised what sort of school projects the compute budget exists for.
I went to a smallish state university; even there we had our own HPC center and lab. We had a proper (IIRC) 6-row HPC data center across campus, and as an undergraduate research assistant I had a continuous budget for building Beowulf clusters for the graduate programs to run assignments on. I once got an allowance to buy 15 Raspberry Pis to build an ARM cluster.
That's to train it from scratch, though, right? If you preload the GPT-2 weights, you don't need to do this. You can just give it additional training on your texts.
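One way to do that (not the repo's own script; just the Hugging Face transformers route as an illustration, with a placeholder text file and hyperparameters):

```python
# Sketch of continuing training from the released GPT-2 weights on your own texts.
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling,
                          TextDataset)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")       # preloaded weights

train_data = TextDataset(tokenizer=tokenizer, file_path="my_texts.txt", block_size=128)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpt2-finetuned",
                           num_train_epochs=1,
                           per_device_train_batch_size=4),
    data_collator=collator,
    train_dataset=train_data,
)
trainer.train()
```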
Well, he does include instructions for running it on a personal computer, which looks like what I'm gonna be doing next week.
Besides the rental options discussed below, these NVIDIA boxen don't look too big, so either used ones will be available for cheap relatively soon, or you could just locate and liberate one in Promethean fashion.
Supposedly, even running the trained model behind ChatGPT is extremely expensive, unlike the image generators, which can largely be run on a consumer device.
So if I see it right, that would be a p4d.24xlarge instance, which goes for about $32.77 an hour nowadays, so the total training would be about $1,245. Not cheap, but certainly not a nation-state budget.
Edit: I just noticed Lambda Labs. It seems they ask $8.80 per hour for an instance of this caliber. That puts the total training cost around $334. I wonder why it is that much cheaper.
That is a key difference. You can't easily and cheaply rent an auto factory, but you're starting to be able to rent an LLM training "factory" once per model, after which you can run inference on that model much more cheaply.
Search on the web is becoming problematic. Quality is decreasing and competition is extremely hard.
Some time ago I came to the realisation that the biggest strength of Google might also be its Achilles heel. Google is forced to return a list of links because that's the main vehicle they drive profit from. If you were to send a question like "Who is Barack Obama?" you would still get a list of links, although Google knows there is a canonical answer.
However, if you were to build a new search engine from the ground up, you would need to build the infrastructure to crawl the web, index it, and build the interface. That takes a lot of money and time just to test one idea. And there are multiple possible angles of attack on Google's business model (privacy, subscription model, modality, etc.). You might get the chance to test one of them, and if that fails, starting again is so expensive that you might not be able to.
My idea is to have a single open web index database, continuously updated, so that you can apply ranking and embedding algorithms to it. This would reduce the cost of entry and enable developers to build competitors to Google on top of it, or create new products in the search space (for instance, a search engine for clothes). I don't know if this is interesting to anyone, but if it is, hit me up.
I'm working on building an AWS for anyone who wants to make their own search engine. The idea is to have a single open web index database, continuously updated, to which you can apply ranking and embedding algorithms. This would reduce the cost of entry and enable developers to build competitors to Google on top of it, or create new products in the search space, like a search engine for clothes. I don't know if this is interesting to anyone, but if it is, hit me up.
That sounds very cool, and I hope you (and your customers!) are successful. Out of curiosity, did you find an existing market need for that, or is it a "build it and they will come" model?
Also, have you thought about partnering with commoncrawl.org? I could see that relationship benefiting both sides: they get fresher indices, you get access to the historical web snapshots.
I've faced this problem. I think one of the main issues with Google is the modality of the results. Google is forced to return a list of links because that's the main vehicle they drive profit from. If you were to send a question like "Who is Barack Obama?" you would still get a list of links, although Google knows there is a canonical answer.
The problem is that if you were to build a new search engine from the ground up, it would take millions in infrastructure and a lot of time just to test one idea. And there are multiple angles of attack on Google's business model (privacy, subscription model, modality, etc.), but you might only get the chance to test one of them, and if that fails, starting again is so expensive that you might not be able to get funds to do it.
My approach then became to build something that others can build on top of.
I'm currently using Common Crawl, but my main problem is that I need to build a small toy to test it, and even processing Common Crawl is crazy expensive. A single snapshot is 150 TB, so this needs to be processed on metal, or you're going to pay a hefty AWS bill.
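For scale, even just streaming the data is non-trivial; a sketch of reading one WARC file with warcio (the URL is a placeholder, real paths come from each crawl's warc.paths.gz listing):

```python
# Sketch of streaming a single Common Crawl WARC file; one crawl has tens of
# thousands of these, which is where the ~150 TB per snapshot comes from.
import requests
from warcio.archiveiterator import ArchiveIterator

# Placeholder path, not a real file.
warc_url = "https://data.commoncrawl.org/crawl-data/<CRAWL-ID>/segments/<SEGMENT>/warc/<FILE>.warc.gz"

resp = requests.get(warc_url, stream=True)
for record in ArchiveIterator(resp.raw):
    if record.rec_type == "response":
        url = record.rec_headers.get_header("WARC-Target-URI")
        html = record.content_stream().read()
        # ...extract/index here
```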
> If you were to send a question like "Who is Barack Obama?" you would still get a list of links, although Google knows there is a canonical answer.
For that specific search I would start at Wikipedia, but for more general "data search" I lean towards Wolfram Alpha, which has some usability issues but an interesting maths engine for queries.
https://www.wolframalpha.com/input?i=Barack+Obama+vs+Donald+...
It sounds like just what we need to break free from Google.
I’ve been dreaming of an open web index and social graph for more than a decade.
Any company having the data + the algorithm + the presentation layer is way too much power. We can and should split that problem into its separate domains.
is what I saw as the primary difference. Whether that's going to pan out in reality as well as it does in HN comments is a case of "the devil's in the details", though.
My professional background is in artificial intelligence. Currently I'm the Tech Lead at Bond Touch, and I've worked at Unbabel, a YC company, as an artificial intelligence engineer on the machine learning team. I also have some interests in quantum computing, financial modeling, and robotics.
I love watching SpaceTime on YouTube! It gives you a surprisingly deep understanding of the state of the art in physics, but it's the kind of show where, if you don't have a massive background in physics, you either need to be extremely focused to understand it, or blazed out of your mind.