Clarification: The AMD iGPU driver (or Chrome) on Ubuntu 24.04 has bugs on your hardware. You could try a newer and different distro (just using a live-USB) to see if that has been fixed.
"Last week I ran a script on BigQuery for historical HTTP Archive data and was billed $14,000 by Google Cloud with zero warning whatsoever, *and they won’t remove the fee.*"
> I think Apple will come up with some crazy hardware to run good quality LLMs.
you mean like the "neural engine" that has been present in their SoCs for nearly a decade? (this is also why M1/M2s can run LLMs at comparable speeds to desktop GPUs... and they weren't even designed with LLMs in mind yet)
What are those NLP tasks, if I may ask? (I was thinking above about using it as a chatbot like ChatGPT or Bard, which currently seems the only application for end-users.)
News summarization, news data extraction, news question answering, news filtering. I can assure you that older 7B/13B models had trouble following directions and outputting (for example) JSON.
I'm pretty sure Apple won't offer those things locally on iPhones. The hardware requirement is too high and the value to average Apple customers too small.
if you use commodity GPUs, sure. if you use TPUs (which Apple is already building into their chips) the efficiency improvements are massive. seriously look at some Coral Edge TPUs and what they can do at power levels completely unheard of for GPUs. then look at how much faster M1/M2 Macs are than normal desktop GPUs for machine learning tasks because they have an onboard accelerator
It's not just inference time, RAM size is another bottleneck. Apple, being Apple, probably wouldn't want to offer anything less than GPT-3.5 level of intelligence. Which I would estimate at 220 billion parameters (the 8×220B MoE GPT-4 rumor), which would require 220 GB of RAM at 8-bit quantization.
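The arithmetic behind that estimate is just parameters × bytes-per-parameter. A minimal back-of-envelope sketch (weights only, ignoring KV cache, activations, and runtime overhead):

```python
def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Memory needed just to hold the weights, in decimal GB."""
    bytes_total = params_billions * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

print(weight_memory_gb(220, 8))  # 220.0 GB at 8-bit quantization
print(weight_memory_gb(220, 4))  # 110.0 GB at 4-bit quantization
```

So even aggressive 4-bit quantization leaves a 220B model far above what any phone ships with.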
apple probably has the attention to detail to train the absolute shit out of their models. they will not need 8x220B parameters to do what GPT4 does, if they ever get to that point. see LLaMA2 7b and 13b being (subjectively) far better than LLaMA1 even with the same number of parameters, just by having been trained more
apple is known to care a lot about stuff like this. like, a lot. they are pedantic as heck
Indeed, years ago I had scripts to automatically fetch URLs from IRC and I quickly realized that if I didn't spoof the user agent of a proper web browser many websites would reject the request. Googlebot's UA worked just fine, however.
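The technique amounts to setting one header. A hypothetical sketch (the URL and the exact UA strings here are illustrative, not from the original comment):

```python
import urllib.request

# Googlebot's published desktop UA string (illustrative; Google rotates these).
GOOGLEBOT_UA = ("Mozilla/5.0 (compatible; Googlebot/2.1; "
                "+http://www.google.com/bot.html)")

def build_request(url: str, user_agent: str) -> urllib.request.Request:
    # Attach the spoofed User-Agent header to the outgoing request.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

def fetch(url: str, user_agent: str = GOOGLEBOT_UA) -> bytes:
    # Sites that block unknown clients often let this header through.
    with urllib.request.urlopen(build_request(url, user_agent),
                                timeout=10) as resp:
        return resp.read()
```

Which is exactly why sites that care should verify crawler identity server-side rather than trusting the header, as the reply below notes.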
They obviously don't care enough then - Google says you should use reverse DNS to verify that Googlebot crawls are real[0]. Cloudflare now does this automatically as well for customers with WAF (Pro plan).