I'm really confused by that take.
If you watch the Corridor channel on YouTube, you can catch plenty of cases where Unreal is treated as a draft or as the on-set reference, and almost always gets replaced before the final ships. Something doesn't add up here.
Having watched a great deal of Andromeda, Star Trek, and Hercules/Xena growing up, I would submit that weak video effects can be perfectly fine as long as the actors take them seriously enough.
I'm thinking quite a bit about this at the moment in the context of foundational models and their inherent (?) regression to the mean.
Recently there has been a big push into geospatial foundation models (e.g. Google AlphaEarth, IBM Terramind, Clay).
These take in vast amounts of satellite data and, with the usual autoencoder architecture, try to build embedding spaces that contain meaningful semantic features.
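For illustration, here's a minimal sketch of that autoencoder idea in PyTorch: compress a multi-band patch into a bottleneck vector and train on reconstruction, so the bottleneck becomes the embedding. This is not any of the actual models' architectures (they are far larger and typically masked-autoencoder transformers over multiple modalities); the class name, band count, and patch size are all illustrative.

```python
# Toy sketch only: a tiny convolutional autoencoder over multi-band satellite
# patches, where the bottleneck vector plays the role of the "embedding".
import torch
import torch.nn as nn

class PatchAutoencoder(nn.Module):
    def __init__(self, bands: int = 12, embed_dim: int = 256):
        super().__init__()
        # Encoder: 64x64 multi-band patch -> compact embedding vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(bands, 32, 3, stride=2, padding=1),   # 64 -> 32
            nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1),      # 32 -> 16
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, embed_dim),
        )
        # Decoder: embedding -> reconstructed patch (training signal only).
        self.decoder = nn.Sequential(
            nn.Linear(embed_dim, 64 * 16 * 16),
            nn.Unflatten(1, (64, 16, 16)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),     # 16 -> 32
            nn.ReLU(),
            nn.ConvTranspose2d(32, bands, 4, stride=2, padding=1),  # 32 -> 64
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

model = PatchAutoencoder()
patch = torch.randn(8, 12, 64, 64)           # batch of Sentinel-2-like patches
recon, embedding = model(patch)
loss = nn.functional.mse_loss(recon, patch)  # reconstruction objective
print(embedding.shape)                       # torch.Size([8, 256])
```

The real foundation models obviously do much more than this, but the embedding-out-of-a-bottleneck idea is the part the downstream benchmarks probe.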
The issue at the moment is that in benchmark suites (https://github.com/VMarsocci/pangaea-bench), only a few of these foundation models have recently started to surpass a basic U-Net on some of the tasks.
There's also an observation by one of the authors of the Major-TOM project, which provides satellite input data for training such models, that the scaling law does not seem to hold for geospatial foundation models: more data does not seem to result in better models.
My (completely unsupported) theory on why that is: unlike writing or coding, with satellite data you are often looking for the needle in the haystack. You do not want what has been done thousands of times before and is proven to work. Segmenting out forests and water? Sure, easy; these models have seen millions of examples of forests and water. But most often we are interested in things that are much, much rarer: flooding, wildfires, earthquakes, landslides, destroyed buildings, new airstrips in the Amazon, and so on. As I see it, the currently used frameworks do not support that very well.
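To make the imbalance concrete, here's a toy sketch with made-up pixel counts; only the ratios matter. A plain cross-entropy barely notices the rare classes, and inverse-frequency weighting is shown as one common (imperfect) counterweight.

```python
# Illustrative only: the pixel counts are invented, but the ratios show why
# "rare event" classes are hard for a loss dominated by forest/water pixels.
import torch
import torch.nn as nn

# Hypothetical pixel counts per class in a training set.
class_counts = torch.tensor([5_000_000_000,  # forest
                             3_000_000_000,  # water
                             2_000_000_000,  # cropland
                             500_000,        # flooded area
                             50_000])        # destroyed buildings

# Inverse-frequency weights: rare classes get upweighted by orders of magnitude.
weights = class_counts.sum() / (len(class_counts) * class_counts.float())
criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(16, 5)           # e.g. per-pixel logits from a segmentation head
labels = torch.randint(0, 5, (16,))
print(weights)                        # forest ~0.4 vs destroyed buildings ~4e4
print(criterion(logits, labels))
```

Weighting is a blunt instrument, but without something like it the classes people actually care about barely register in the training signal.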
But I'd be curious how others who are more knowledgeable in the area see this.
I once played with hosting a VS Code server on a Raspberry Pi for general development, and it was actually quite powerful when used from an iPad. Just not for Swift specifically, unfortunately.
I'm hosting a VS Code server in a Termux/Ubuntu container on my old Pixel 6a, and I cannot overstate how awesome it is as a fun dev setup, especially with a tablet. Easy to nuke and start clean, too!
The ecosystem is fine for non-Apple development. It's just that building apps for iOS, macOS, etc. is impossible on an iPad right now beyond some basic applications.
Same here. I just didn't want to expend energy racing trigger-happy mods. It was so odd; to this day I vividly remember how they cleaned up their arguments once proven wrong on the closing vote, literally minutes before it would hit the close threshold.
Correct, most of r/LocalLlama has moved on to next-gen MoE models. DeepSeek introduced a few good optimizations that every new model seems to use now, too. Llama 4 was generally seen as a fiasco, and Meta hasn't made a release since.
Llama 4 isn't that bad, but it was overhyped, and people generally "hold it wrong".
I recently needed an LLM to batch-process some queries for me. I ran an ablation on 20+ models from OpenRouter to find the best one. Guess which ones got 100% accuracy? GPT-5-mini, Grok-4.1-fast and... Llama 4 Scout. For comparison, DeepSeek v3.2 got 90%, and the community darling GLM-4.5-Air got 50%. Even the newest GLM-4.7 only got 70%.
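For context, this is roughly the shape of such an ablation against OpenRouter's OpenAI-compatible API. The model slugs, test cases, and exact-match scoring here are illustrative stand-ins, not the actual queries or the full 20+ model list, and the slugs should be checked against OpenRouter's current catalog.

```python
# Sketch of a quick model ablation via OpenRouter: same batch of prompts to
# every model, score by exact-match accuracy on an expected substring.
from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="sk-or-...")

MODELS = [               # illustrative slugs, verify against OpenRouter
    "meta-llama/llama-4-scout",
    "deepseek/deepseek-chat",
    "z-ai/glm-4.5-air",
]

# (prompt, expected substring) pairs standing in for the real batch queries.
CASES = [
    ("Extract the ISO date from: 'shipped on 3rd of March 2024'", "2024-03-03"),
    ("Return only the currency code in: 'total 12,50 €'", "EUR"),
]

def accuracy(model: str) -> float:
    hits = 0
    for prompt, expected in CASES:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0,
        )
        if expected in (resp.choices[0].message.content or ""):
            hits += 1
    return hits / len(CASES)

for m in MODELS:
    print(m, accuracy(m))
```

Swapping the model list and the scoring function is all it takes to rerun this kind of quick ablation on your own queries.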
Of course, this is just a single anecdotal data point which doesn't mean much on its own, but it does suggest that Llama 4 is probably underrated.
Oh, this is very interesting. Will have to test it out on coding too.
Very good point about testing. Had I only followed benchmarks, I'd have missed a few gems completely (long-context models and 4B vision models that are unbelievably capable for their size).
I'd encourage anyone to test the models on actual problems you're working on.
The Llama 4 models were instruct models at a time when everyone was hyped about and expecting reasoning models. As instruct models, I agree they seemed fine, and I think Meta mostly dropped the ball by taking the negative community feedback as a signal that they should just give up. They’ve had plenty of time to train and release a Llama-4.5 by now, which could include reasoning variants and even stronger instruct models, and I think the community would have come around. Instead, it sounds like they’re focusing on closed source models that seem destined for obscurity, where Llama was at least widely known.
On the flip side, it also shows how damaging echo chambers can be, where relatively few people even gave the models a chance, just repeating the negativity they heard from other people and downvoting anyone who voiced a different experience.
I think this was exacerbated by the fact that Llama models had previously come in small, dense sizes like 8B that people could run on modest hardware, where even Llama 4 Scout was a large model that a lot of people in the community weren’t prepared to run. Large models seem more socially accepted now than they were when Llama 4 launched.
Large MoE models are more socially accepted because medium/large MoE models can still be quite small with respect to expert size (which is what sets the amount of VRAM required). But a large dense model is still challenging to get running.
GLM 4.7 is new and promising. MinMax 2.1 is good for agents. And of course the Qwen3 family; the VL versions are spectacular. NVIDIA Nemotron Nano 3 excels at long context, and the Unsloth variant has been extended to 1M tokens.
I thought the last one was a toy until I tried it with a full 1.2 MB repomix project dump. It actually works quite well for general code comprehension across the whole codebase, CI scripts included.
GPT-OSS-120B is good too, although I have yet to try it out for coding specifically.
Since I'm just a pleb with a 5090, I run GPT-OSS 20B a lot; it fits comfortably in VRAM with the max context size. I find it quite decent for a lot of things, especially after setting reasoning effort to high, disabling top-k and top-p, and setting min-p to something like 0.05.
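For reference, here's roughly what that sampler setup looks like against llama.cpp's llama-server native /completion endpoint. The port is the default, the prompt is a placeholder, and reasoning effort for gpt-oss is configured on the server/template side rather than as a sampling parameter, so it's not shown here.

```python
# Sketch of the sampling setup described above, assuming llama-server is
# running locally with gpt-oss-20b loaded. top_k=0 and top_p=1.0 effectively
# disable those samplers, leaving min_p to do the filtering.
import requests

payload = {
    "prompt": "Explain the difference between top-p and min-p sampling.",
    "n_predict": 512,
    "top_k": 0,       # disabled
    "top_p": 1.0,     # disabled
    "min_p": 0.05,
    "temperature": 1.0,
}

resp = requests.post("http://127.0.0.1:8080/completion", json=payload, timeout=600)
print(resp.json()["content"])
```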
For the Qwen3-VL, I recently read that someone got significantly better results by using F16 or even F32 versions of the vision model part, while using a Q4 or similar for the text model part. In llama.cpp you can specify these separately[1]. Since the vision model part is usually quite small in comparison, this isn't as rough as it sounds. Haven't had a chance to test that yet though.
I just let myself use AI on non-critical software: personal projects and projects without deadlines or high quality standards.
If it uses anything I don't know, some tech I haven't grasped yet, I have it produce a markdown summary of the conversation and make sure it includes an overview of the technical solutions. I then shove that into my note software for later and, at a convenient time, use it in study mode to make sure I understand the implications of whatever the AI chose. I'm mostly a backend developer, and this has been a great HTML+CSS primer for me.
If you're interested in the large-codebase use case... the best I've found so far are extended-context models.
Using the newest Nemotron Nano 3, you can feed it a 1M-token (roughly 3 MB of text) pure code dump (I use repomix --style markdown) and just ask around; a rough sketch of that workflow is below.
That's been one of the biggest wow moments I've had with LLMs so far. A much better experience than any RAG setup I've used.
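For anyone who wants to try it, here's a minimal sketch of that repomix + long-context workflow. The repomix flags are real; the local endpoint, model name, and output filename are assumptions about a typical llama.cpp / OpenAI-compatible setup, so adjust to whatever server you're actually running.

```python
# Dump the whole repo with repomix, then hand the dump to a long-context model
# behind a local OpenAI-compatible endpoint (endpoint/model are assumptions).
import subprocess
from openai import OpenAI

# Pack the repository into a single markdown file.
subprocess.run(["repomix", "--style", "markdown", "--output", "repo.md"], check=True)
code_dump = open("repo.md", encoding="utf-8").read()

client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="none")
resp = client.chat.completions.create(
    model="nemotron-nano",  # whatever long-context model the server is hosting
    messages=[
        {"role": "system", "content": "You answer questions about the codebase below."},
        {"role": "user", "content": code_dump + "\n\nWhere is the CI pipeline defined?"},
    ],
)
print(resp.choices[0].message.content)
```

The whole dump goes in as one message, which is exactly why an honest 1M-token context window matters here instead of retrieval.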