Yes, it was probably one of my better decisions and investments in recent years. I run a ton of things I never would have paid to run in the cloud, and I feel in control. I've even started making money from one of the projects.
Tons of small things feel nice: taking notes without having to use Google Docs if I don't want to, watching a movie from my home media center, etc.
I recently deleted my ChatGPT account and have been looking around at the other options. I kinda like fast responses, but Grok / Perplexity are behind a perpetual Cloudflare check for me. I'm looking for something that doesn't spend forever reasoning out the answer to a basic quick question.
It's interesting: as I type that out, it makes me wonder why I don't just go back to a search engine, since the AI summaries there have been getting better.
Finally, I do also like the longer reasoning when I have a tough question; I usually copy-paste it around to various models and compare their responses.
I've had similar friction, especially when reasoning-heavy modes take longer or get retried. That puts me off too.
On the search engine comparison: do you feel LLMs reduce cognitive load because they maintain context, whereas search requires more manual synthesis?
Also curious — do you think the frustration is mostly with the model itself, or with the serving/infrastructure layer (Cloudflare, routing, batching, etc.) around it? Both comments seem to point at that layer in different ways.
Location: SF, Taipei
Remote: Ok
Willing to relocate: Maybe
Technologies: Advertising, mobile ads, game marketing
Résumé/CV: jamesoclaire.com
Looking for anyone working at new mobile-ads companies or in programmatic advertising. I'm passionate about advertising and looking for companies that share that passion.
I really hope that at some point in the near future AI models shrink enough, or laptops get powerful enough, to run models locally. I haven't tried in the past year, but when I did, token output was very slow and my laptop was on fire to make it happen.
I've wanted to try some of the more recent 8B models for local tab completion or agentic use. Any experience with those kinds of smaller models?
I've been running local language models on an existing laptop with an 8GB GPU, currently ministral-3:8b. It's faster than other models of similar size I used previously; fast enough that I never wait for it, and if anything I have to scroll back to read the full output.
So far I'm using it conversationally and for scripting with tools. I wrote a simple chat interface / REPL in the terminal, but it's not integrated with my code editor, nor with agentic/claw-like loops. Last time I tried an open-source Codex-like thing (a popular one, but I forget its name), it was slow and not that useful for my coding style.
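For the curious, a REPL like that is basically a loop around a local chat endpoint. A minimal sketch, assuming the model is served through Ollama's HTTP API on its default port (the general shape of the idea, not my exact script; adjust the model tag to whatever `ollama list` shows):

    # Minimal terminal chat REPL against a local Ollama server.
    # Sketch only -- assumes Ollama is listening on its default port.
    import json
    import urllib.request

    MODEL = "ministral-3:8b"  # example tag; use your own
    URL = "http://localhost:11434/api/chat"

    history = []
    while True:
        try:
            user = input("> ")
        except EOFError:
            break  # Ctrl-D to quit
        history.append({"role": "user", "content": user})
        req = urllib.request.Request(
            URL,
            data=json.dumps({"model": MODEL, "messages": history,
                             "stream": False}).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            reply = json.load(resp)["message"]["content"]
        history.append({"role": "assistant", "content": reply})
        print(reply)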
It took some practice, but I've been able to get good use out of it: learning languages (human and programming), translation, producing code examples and snippets, and sometimes bouncing ideas off it, rubber-duck style.
qwen3-8b is good, and for tab completion it's more than adequate. You can get basic agentic use out of it, but if you really want to run a serious agent and do some serious work, then at the very least qwen3.5-27B if you have a 5090 with 32GB of VRAM, or qwen3.5-35-a3b if you have less than 24GB. If you want to use a laptop, get one with a built-in GPU or iGPU.
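The arithmetic behind those VRAM cutoffs is roughly parameters times bits per weight, plus headroom for KV cache and activations. A quick sketch (approximations, not measurements):

    # Back-of-the-envelope VRAM math: weights ~= params * bits / 8,
    # plus ~20-30% headroom for KV cache and activations at modest context.
    def approx_vram_gb(params_billion: float, bits_per_weight: int) -> float:
        return params_billion * bits_per_weight / 8

    for params, bits in [(8, 4), (27, 4), (27, 8)]:
        w = approx_vram_gb(params, bits)
        print(f"{params}B @ {bits}-bit: ~{w:.1f} GB weights, "
              f"~{w * 1.3:.1f} GB with headroom")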
> NTransformer
High-efficiency C++/CUDA LLM inference engine. Runs Llama 70B on a single RTX 3090 (24GB VRAM) by streaming model layers through GPU memory via PCIe, with optional NVMe direct I/O that bypasses the CPU entirely.
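For anyone wondering how a 70B model squeezes through 24GB at all: only one layer's weights need to be resident in VRAM at a time. Here's a toy PyTorch sketch of the general idea (not NTransformer's actual code; a real engine would overlap the PCIe transfer of layer N+1 with the compute of layer N on separate CUDA streams, and the NVMe path skips host RAM entirely):

    # Toy illustration of layer streaming: weights live in pinned host RAM,
    # and only one layer's weights occupy VRAM at any moment.
    import torch

    DIM, N_LAYERS = 4096, 8
    cpu_weights = [torch.randn(DIM, DIM).pin_memory() for _ in range(N_LAYERS)]

    @torch.no_grad()
    def forward_streamed(x: torch.Tensor) -> torch.Tensor:
        x = x.to("cuda")
        for w in cpu_weights:
            w_gpu = w.to("cuda", non_blocking=True)  # stream layer over PCIe
            x = x @ w_gpu.T                          # run the "layer"
            del w_gpu                                # free VRAM for the next one
        return x.cpu()

    print(forward_streamed(torch.randn(1, DIM)).shape)  # torch.Size([1, 4096])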
I had some luck with Ollama + Mistral Nemo models on consumer hardware; they seemed to punch above their "weight class". But they're still far enough behind ChatGPT et al. that I couldn't stop using the latter for real work.
What value does this give you? Part of why I deleted my account was that I couldn't think of a single thing of value in my chats from the past couple of years. Maybe some nostalgia in looking at what bugs I was fixing?
For me this is very valuable. The results of personal "research projects" are in there. I use it for reference. Of course I could ask Claude to get me those answers but why waste the energy?
Thanks, and I guess I understand the sentiment. I probably shouldn't have said that I couldn't think of "a single thing of value", since that added a bit of judgement on top of my question. Anyway, it's interesting hearing what people ask it. I think I've only ever used it like a search engine / for bug fixing, while it seems some people have much deeper conversations or discussions that are worth remembering.
I'm glad I upvoted. Your perspective and questions are valid, no matter the depth of conversation. You'd be surprised what fresh questions can do for a topic.
I, for one, might use these chats as input for switching over, to keep the learning process fast. It took a while for ChatGPT to get me. I know that other people delete memories because they want a clean-slate experience with every chat. I use ChatGPT mostly for private stuff (Claude Code for work, for instance), and I prefer that memories travel across chats.
In my case, I would rather keep it than lose it. It's just text, so a small amount of data. You can trivially get an embedding for it and search it in DuckDB later for things you've asked.
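A hypothetical sketch of that pipeline, assuming a recent DuckDB (fixed-size ARRAY columns plus its array_cosine_similarity function) and the OpenAI embeddings endpoint; the file name and query below are placeholders:

    # Hypothetical sketch: embed exported chats, store in DuckDB, search later.
    import duckdb
    from openai import OpenAI

    client = OpenAI()  # expects OPENAI_API_KEY in the environment

    def embed(text: str) -> list[float]:
        resp = client.embeddings.create(model="text-embedding-3-small",
                                        input=text)
        return resp.data[0].embedding  # 1536-dim vector

    con = duckdb.connect("chats.db")
    con.execute("CREATE TABLE IF NOT EXISTS chats (text VARCHAR, emb FLOAT[1536])")

    # Index the export ("chats.txt" stands in for however you dump the data).
    for line in open("chats.txt"):
        con.execute("INSERT INTO chats VALUES (?, ?)", [line.strip(), embed(line)])

    # Rank stored chats by cosine similarity to a query embedding.
    q = embed("that bug where the build cache went stale")
    hits = con.execute(
        "SELECT text FROM chats "
        "ORDER BY array_cosine_similarity(emb, ?::FLOAT[1536]) DESC LIMIT 5",
        [q]).fetchall()
    print(hits)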
I re-registered an account last year with the same email that was used previously; that account was deleted in 2023. They used to request your phone number back in the day and no longer do, though, so the email retention policies may also have changed.
For sure. If they weren't so self-righteous about not serving ads, it'd be a great revenue stream for them. It'd also align with Dario's seeming obsession with profitability.